This page describes presents the general process to extract, ingest, prepare, integrate and expose the data on dashboards and reports.architecture for data development. It's responsable to describe the data extraction, ingestion, preparation, integration and share. Not all the domains are using this architecture. Please, refers to the other subpages of Data Development to get to know the details and the differences.
Summary
| Table of Contents | ||||
|---|---|---|---|---|
|
Macro Diagram - ETL Phases
Data Sources
ELN
Labware
Instruments
1 - Data Ingestion
Data ingestion is the process of transporting data from one or more sources to a target site for further processing and analysis. For this project data is extracted or copied to Talend server as files using Talend.
Data Sources
ELN
(From Source to Local Files)
See ACN document:
Many spreadsheets are extracted from Electronic Laboratory Notebook (ELN) which is responsable to gather lab documents and result tests. It gives scientists a common view of data across disparate research areas, enabling complete visibility of research information.
ELN data is extracted in JSON format using an available API.
Related documents:
ZIFO - API Documentation for extracting E-Workbook Spreadsheet Data
| Google Drive Live Link | ||
|---|---|---|
|
Labware
LabWare LIMS is a laboratory information management system (LIMS) that is "configurable or pre-configured for laboratories of all types and sizes." The software consists of the core LIMS application with access to LabWare's library of LIMS software modules.
Instruments/Cyclers
The instruments files are available in different formats (txt, csv, xml, etc.) usually on Lab server folders or Google drive. The rules to structure the files in columns are specific for every workflow/method.
2 - Data Preparation
Data preparation is the process of cleaning and transforming raw data prior to processing and analysis. Often involves reformatting data and making corrections to data.
Talend jobs are responsable for cleaning and structuring the data.
(From Local Files to BigQuery Staging Dataset)
In this process, we also define if a source file should go to Accepted Files or Rejected Files bucket. The files with the expected structure go to Accepted and the files that we cannot load on BigQuery for a structure or data type issue go to Rejected. The process will not stop in case of rejected files.
3 - Data Integration and Enrich
Data enrichment refers to the process of appending or otherwise enhancing collected data with relevant context obtained from additional sources.
(From BigQuery Staging to ODS et DW datasets)
Data Visualization
(Tableau dashboards on ODS/DW data)
Data Access
4 - Data Presentation
All the reports, dashboards and users should access the data using this layer. Data marts (DM) with views will be created by subject, limiting access to non treated data.
5- Data Visualization
Check with the Data Viz team the specific documentation.The access to the data is controlled using...
