What are Talend and GCS
Talend is a leading ETL and big data integration software with an open-source environment for data planning, integration, processing, and cloud storage. It helps organisations become data-driven by moving data faster to the preferred location for real-time decision-making. Among the various ETL tools available on the market, Talend is considered a next-generation leader in cloud and big data integration software.
What Solvay uses Talend for
Solvay uses Talend to deal with heterogeneous data. Handling such data is tedious and only becomes more tiresome as the volume grows. Talend helps transform it into homogeneous data that the business can analyse to derive the information it needs.
Talend acts as a one-stop solution for data integration strategies by allowing us to gather data from multiple sources and consolidate it into a single, centralised location. It is the main ETL tool used at Solvay for batch processing, thanks to its many connectors, which allow it to connect easily to various on-premise and cloud data sources and perform data transformations.
Talend is used on the following projects:
Consolidating RnI data coming from ELN into BigQuery
Analysing the carbon footprint of our products
Retrieving BW data to feed Tableau dashboards
Extracting MES data that allows machine learning models to optimize the efficiency of our Soda Ash plants
Who should use it
Data Scientists and Data Engineers who develop and implement ETL solutions at Solvay.
When should you use Talend
When you want to:
Cloud Storage is a service for storing objects in Google Cloud. An object is an immutable piece of data consisting of a file of any format. Objects are stored in containers called buckets. Every bucket is associated with a project, and projects can be grouped under an organization.
Google Cloud Storage lets users store data as files, both the raw files coming from source systems and the transformed data sent to the destination application. The service combines the performance and scalability of Google's cloud with advanced security and sharing capabilities, in a cost-effective way.
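The object, bucket and project hierarchy described above can be pictured with a small Python model. This is only an illustrative sketch: the classes `StoredObject` and `Bucket`, and the names `example-landing-zone` and `example-project`, are hypothetical and are not part of any Google client library. It shows that an object is never edited in place, and that writing under an existing name replaces the whole object.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class StoredObject:
    """An immutable piece of data: a name plus the file's bytes."""
    name: str
    data: bytes


@dataclass
class Bucket:
    """A container for objects, associated with exactly one project."""
    name: str
    project: str
    objects: Dict[str, StoredObject] = field(default_factory=dict)

    def upload(self, name: str, data: bytes) -> StoredObject:
        # Objects are never modified in place: uploading under an
        # existing name swaps in a brand-new object.
        obj = StoredObject(name, data)
        self.objects[name] = obj
        return obj


# Hypothetical bucket and project names, for illustration only.
bucket = Bucket(name="example-landing-zone", project="example-project")
first = bucket.upload("sales/2024-01.csv", b"id,amount\n1,10\n")
second = bucket.upload("sales/2024-01.csv", b"id,amount\n1,10\n2,20\n")
```

After the second upload, `first` still holds its original bytes, while the bucket now points to `second` under the same name.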
What Solvay uses GCS for
Google Cloud Storage will be used as a staging area, or landing zone: an intermediate storage area used for data processing during the extract, transform and load (ETL) process.
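The role of a staging area in an ETL flow can be sketched as follows. This is a minimal illustration under stated assumptions, not Solvay's actual pipeline: a local temporary directory stands in for the GCS bucket, a Python list stands in for the destination table, and the function names and sample rows are hypothetical. No Talend or Google API is called.

```python
import csv
import tempfile
from pathlib import Path


def extract_to_staging(rows, staging_dir: Path) -> Path:
    """Extract: drop the raw source rows into the staging area as a file."""
    raw_file = staging_dir / "raw_orders.csv"
    with raw_file.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["order_id", "amount"])
        writer.writerows(rows)
    return raw_file


def transform_and_load(raw_file: Path, destination: list) -> None:
    """Transform and load: read the staged file, convert types, append rows."""
    with raw_file.open(newline="") as handle:
        for record in csv.DictReader(handle):
            destination.append(
                {"order_id": int(record["order_id"]),
                 "amount": float(record["amount"])}
            )


warehouse = []  # stand-in for the destination table (assumption)
with tempfile.TemporaryDirectory() as tmp:
    # The temporary directory plays the role of the landing zone.
    staged = extract_to_staging([("1", "10.50"), ("2", "4.25")], Path(tmp))
    transform_and_load(staged, warehouse)
```

Keeping the raw file in an intermediate location separates the extract step from the transform and load steps, so each can run and be retried independently.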
Who should use it
All application users who seek easy, cloud-based storage and access for their data. It is also helpful for business and individual users who want to:
Back up data: GCS provides highly reliable, highly available backup solutions for storing data
Analyse large amounts of data: GCS supports Google's analytics tools (Prediction API and BigQuery), letting data owners and data scientists swiftly analyse terabytes of data for powerful business insights
When should you use GCS
- Transform and load data from any source system to Google BigQuery
- Extract data from Google BigQuery to deliver the extracts to downstream systems
- Process large volumes of events continuously coming from source system(s) and store into Google BigQuery
- To stage the data coming from source systems before it is processed by Talend/GCP functions
- To archive or back up data for the long term.
- To let the analytics team analyse data for business intelligence, ad hoc analysis, and machine learning.
What outputs it will give you
It helps in making real-time decisions and becoming more data-driven:
- Easily connect to various data sources (Excel, SQL databases, Google Drive)
- Perform data transformations using a no-code/low-code approach that simplifies maintenance
- Store the results in various databases or data warehouses
- Create standard job templates that can be re-used by other developers to speed up and standardize the development of data pipelines in the company
- Integrate with version control systems (GitLab, BitBucket…), allowing multiple developers to work on the same project at the same time and easily revert to previous versions in case of problems
- The objects/files stored in GCP can help data owners or data scientists swiftly analyse terabytes of data for powerful business insights
- Acts as a landing zone for source systems to drop files, which can be consumed by Talend/GCP functions