What is Talend

Talend is a leading ETL and big data integration platform with an open-source environment for data planning, integration, processing, and cloud storage. It helps organisations become data-driven by moving data faster to the preferred location for real-time decision-making. Among the many ETL tools available on the market, Talend is considered a next-generation leader in cloud and big data integration software.

What Solvay uses Talend for

Solvay uses Talend to deal with heterogeneous data, a tedious task that only gets more tiresome as data volumes grow. Talend helps transform this data into homogeneous data that the business can analyse to derive the information it needs.

Talend acts as a one-stop solution for data integration strategies, allowing us to gather data from multiple sources and consolidate it in a single, centralised location. It is the main ETL tool used at Solvay for batch processing, thanks to its many connectors, which allow it to connect easily to various on-premise and cloud data sources and perform data transformations.
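As a rough illustration of what turning heterogeneous data into homogeneous data means, the sketch below consolidates two differently shaped inputs into one common record layout. It is plain Python, not a Talend job, and the sample data, field names (`plant`, `temp_c`, `site`, `temperature`) and helper functions are all hypothetical.

```python
import csv
import io
import json

# Two hypothetical sources describing the same kind of measurement
# in different shapes (semicolon-separated CSV vs nested JSON).
CSV_SOURCE = "plant;temp_c\nA;21.5\nB;19.0\n"
JSON_SOURCE = '[{"site": "C", "temperature": {"value": 68.0, "unit": "F"}}]'


def from_csv(text):
    """Read the CSV source and map it to the common schema."""
    reader = csv.DictReader(io.StringIO(text), delimiter=";")
    return [{"plant": row["plant"], "temp_c": float(row["temp_c"])} for row in reader]


def from_json(text):
    """Read the JSON source, convert Fahrenheit to Celsius, map to the common schema."""
    rows = []
    for item in json.loads(text):
        value = item["temperature"]["value"]
        if item["temperature"]["unit"] == "F":
            value = (value - 32) * 5 / 9
        rows.append({"plant": item["site"], "temp_c": round(value, 1)})
    return rows


def consolidate(*batches):
    """Merge already-normalised batches into a single list of records."""
    return [row for batch in batches for row in batch]


rows = consolidate(from_csv(CSV_SOURCE), from_json(JSON_SOURCE))
for row in rows:
    print(row)
```

Once every source is mapped to the same keys and units, the consolidated records can be loaded into a single destination table and analysed together.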

Talend is used on the following projects: 

  • Consolidating RnI data coming from ELN into BigQuery

  • Analysing the carbon footprint of our products

  • Retrieving BW data to feed Tableau dashboards

  • Extracting MES data that machine learning models use to optimise the efficiency of our Soda Ash plants

Who should use it

Data Scientists and Data Engineers who develop and implement ETL solutions at Solvay.

When should you use Talend

When you want to:

  • transform and load data from any source system into Google BigQuery
  • extract data from Google BigQuery and deliver the extracts to downstream systems
  • process large volumes of events arriving continuously from source systems and store them in Google BigQuery

What is GCS

Google Cloud Storage (GCS) is a service for storing objects in Google Cloud. An object is an immutable piece of data consisting of a file of any format. Objects are stored in containers called buckets. Every bucket is associated with a project, and projects can be grouped under an organization.

GCS lets users store data as files, both the files coming from source systems and the transformed data sent to destination applications. The service combines the performance and scalability of Google's cloud with advanced security and sharing capabilities and cost-effectiveness.

What Solvay uses GCS for

GCS will be used as a staging area, or landing zone: an intermediate storage area used for data processing during the extract, transform and load (ETL) process.

Who should use it

Any application user who needs easy, cloud-based storage and access for their data. It is also helpful for business and individual users who want to:

  • Back up data: GCS provides highly reliable, highly available backup solutions for storing data.
  • Analyse large amounts of data: GCS works with Google's analytics tools (Prediction API and BigQuery), letting data owners and data scientists swiftly analyse terabytes of data for powerful business insights.

When should you use GCS

  1. To stage data coming from source systems before it is processed by Talend/GCP functions
  2. To archive or back up data for the long term
  3. To let the analytics team analyse data for business intelligence, ad hoc analysis, and machine learning
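
The staging and archiving uses listed above usually rely on a predictable object naming scheme inside the bucket. The sketch below shows one possible convention in plain Python; it does not call any Google Cloud API, and the bucket name, the `landing` and `archive` prefixes, and the `LandingZone` class are assumptions made for illustration, not Solvay's actual layout.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath


@dataclass(frozen=True)
class LandingZone:
    """Hypothetical naming convention for a GCS staging bucket."""

    bucket: str                       # bucket name (assumed for illustration)
    landing_prefix: str = "landing"   # where source systems drop files
    archive_prefix: str = "archive"   # where processed files are kept long term

    def _uri(self, prefix: str, source: str, load_date: date, filename: str) -> str:
        # Object names are flat strings; "/" only acts as a visual separator.
        name = PurePosixPath(prefix, source, load_date.isoformat(), filename)
        return f"gs://{self.bucket}/{name}"

    def landing_uri(self, source: str, load_date: date, filename: str) -> str:
        """URI where a source system would drop a file before processing."""
        return self._uri(self.landing_prefix, source, load_date, filename)

    def archive_uri(self, source: str, load_date: date, filename: str) -> str:
        """URI where the same file would be moved once it has been processed."""
        return self._uri(self.archive_prefix, source, load_date, filename)


zone = LandingZone(bucket="example-staging-bucket")
print(zone.landing_uri("mes", date(2024, 1, 15), "readings.csv"))
# prints: gs://example-staging-bucket/landing/mes/2024-01-15/readings.csv
```

Partitioning object names by source system and load date keeps each delivery separate, so a downstream job can pick up one day's files without scanning the whole bucket.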


What outputs it will give you

Talend helps teams make real-time decisions and become more data-driven. It lets you:

  • Easily connect to various data sources (Excel, SQL databases, Google Drive)
  • Perform data transformations using a no-code/low-code approach that simplifies maintenance
  • Store the results in various databases or data warehouses
  • Create standard job templates that other developers can reuse to speed up and standardise the development of data pipelines across the company
  • Integrate with version control systems (GitLab, Bitbucket…), allowing multiple developers to work on the same project at the same time and easily revert to previous versions if problems arise

GCS gives you:

  1. Objects and files that data owners and data scientists can swiftly analyse, at terabyte scale, for powerful business insights
  2. A landing zone where source systems drop files for Talend/GCP functions to consume