Description
Tools: Talend
Detail job
- J061_Helix_Case_ITSM
- Connect to the source system API, reading the context from the flow job
- Set up a loop to fetch the data in batches (see the pagination sketch after this list)
- tSetGlobalVar: set the maximum number of records to read per iteration ("limit") and initialize the variable "nb" that controls when the loop exits (starting at 0)
- tLoop: set the loop condition so that the loop exits when the variable nb < 0
- tJava: set the record offset so that each iteration fetches a new batch of records
- Get the data from the source using the start row number from "nb" and the maximum row number from "limit"; the schema is read from the source metadata
- Generate the output file and save it to DATA\DEV\DATA_OCEAN_DOMAIN_DT\Tmp
- Update the offset: "nb" = "nb" + "limit"
- Set "nb" = -1 when ((Integer) globalMap.get("tReplace_1_NB_LINE")) <= 0 in order to exit the loop
- Upload all the files in the folder to the bucket (cs-ew1-prj-data-dm-dt-[dev]-staging)
- Delete all the files in the folder from point number 5
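The pagination logic spread across tSetGlobalVar, tLoop and the tJava steps can be summarized as the minimal sketch below. The globalMap keys "nb", "limit" and "tReplace_1_NB_LINE" come from the steps above; the fetchBatch helper and the batch size of 1000 are assumptions for illustration, and the real job wires this through Talend components rather than a while loop.

```java
import java.util.HashMap;
import java.util.Map;

public class PaginationSketch {
    public static void main(String[] args) {
        // globalMap stands in for Talend's globalMap used by tSetGlobalVar / tJava.
        Map<String, Object> globalMap = new HashMap<>();

        // tSetGlobalVar: initialize the loop variables
        globalMap.put("limit", 1000); // max records per call (assumed value)
        globalMap.put("nb", 0);       // start offset; -1 means "exit the loop"

        // tLoop: keep iterating while nb >= 0
        while ((Integer) globalMap.get("nb") >= 0) {
            int nb = (Integer) globalMap.get("nb");
            int limit = (Integer) globalMap.get("limit");

            // Placeholder for the real API call: fetch rows [nb, nb + limit) from the source.
            int rowsRead = fetchBatch(nb, limit);
            globalMap.put("tReplace_1_NB_LINE", rowsRead);

            // tJava: advance the offset, or set nb = -1 to exit when nothing was read
            if (((Integer) globalMap.get("tReplace_1_NB_LINE")) <= 0) {
                globalMap.put("nb", -1);           // no more records: exit the loop
            } else {
                globalMap.put("nb", nb + limit);   // move the offset to the next batch
            }
        }
    }

    // Hypothetical stand-in for the source API read; returns the number of rows fetched.
    private static int fetchBatch(int startRow, int maxRows) {
        return 0; // no data in this sketch
    }
}
```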
- Data Anonymization
The Talend process extracts all cases from the ITSM source, and as part of data anonymization, we have anonymized the "Human-Resources" case data.
Below is the list of columns that are anonymized during ingestion. Users cannot view the original data; it appears as "*** Anonymized info ***" instead. Because the anonymization happens at the ingestion stage, the original data is never stored in Cloud Storage or the BigQuery tables. A sketch of the replacement follows the column list.
- Description
- Dynamic_data_audit_info
- Dynamic_data_definitionid
- Dynamic_data_parameter
- Summary
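A minimal sketch of that replacement is shown below. The mask value and the column list come from the description above; the CaseRecord class, its field names, and the department check used to identify Human-Resources cases are assumptions for illustration. In the actual job the replacement is performed inside the Talend flow before the file is written, not by hand-written Java.

```java
// Sketch of the ingestion-time anonymization, assuming a flat record class and
// that Human-Resources cases are identified by a department field (assumption).
public class CaseAnonymizer {

    private static final String MASK = "*** Anonymized info ***";

    public static void anonymize(CaseRecord row) {
        // Only Human-Resources case data is anonymized, per the description above.
        if (!"Human-Resources".equalsIgnoreCase(row.department)) {
            return;
        }
        // The columns listed above are masked before anything reaches GCS / BigQuery.
        row.description             = MASK;
        row.dynamicDataAuditInfo    = MASK;
        row.dynamicDataDefinitionId = MASK;
        row.dynamicDataParameter    = MASK;
        row.summary                 = MASK;
    }

    // Hypothetical flat representation of one ITSM case row.
    public static class CaseRecord {
        public String department;
        public String description;
        public String dynamicDataAuditInfo;
        public String dynamicDataDefinitionId;
        public String dynamicDataParameter;
        public String summary;
    }
}
```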
Flow job
- F061_Helix_Case_ITSM
- Set up meta_run_id and the filename of the output file
- Get the last load from the table STG.incremental_load, controlled by the variable I_VAR_BQ_TABLE_INC_LOAD, and configure the incremental-load logic in tJava so that the date from incremental_load is applied to the create or change date field in SAP (see the sketch after this list)
- Call the detail job and pass parameters such as user/password and the query from point number 2 to perform the incremental load and save the file to GCS
- Call the standard job to upload the files from GCS to ODS
- If the load is OK and the parameter l_VAR_helix_[table_name]_reload = incremental, update the time in the table incremental_load. If the value is not incremental, the run is a reload
- If everything is OK, update the log
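The incremental-load handling can be sketched as follows. The parameter names l_VAR_helix_[table_name]_reload and I_VAR_BQ_TABLE_INC_LOAD and the table STG.incremental_load come from the steps above; the date column used in the filter, the helper methods, and the timestamp handling are assumptions for illustration. In the real job this logic lives in tJava and in the job's links between subjobs.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

public class IncrementalLoadSketch {

    public static void main(String[] args) {
        // Parameters as described above; values here are placeholders.
        String reloadMode   = "incremental";          // l_VAR_helix_[table_name]_reload
        String incLoadTable = "STG.incremental_load"; // controlled by I_VAR_BQ_TABLE_INC_LOAD

        // 1) Read the last load timestamp from STG.incremental_load (stubbed here).
        LocalDateTime lastLoad = readLastLoad(incLoadTable);

        // 2) Build the filter on the create/change date field for the detail job
        //    ("Last_Modified_Date" is an assumed column name for illustration).
        String filter = "Last_Modified_Date >= '"
                + lastLoad.format(DateTimeFormatter.ISO_LOCAL_DATE_TIME) + "'";

        // 3) Call the detail job with user/password and the filter, then load GCS -> ODS
        //    (both represented by a single stub in this sketch).
        boolean loadOk = runDetailAndStandardJobs(filter);

        // 4) Only a successful incremental run moves the watermark forward;
        //    any other value of the reload parameter means a full reload.
        if (loadOk && "incremental".equalsIgnoreCase(reloadMode)) {
            updateLastLoad(incLoadTable, LocalDateTime.now());
        }
    }

    private static LocalDateTime readLastLoad(String table) { return LocalDateTime.now().minusDays(1); }
    private static boolean runDetailAndStandardJobs(String filter) { return true; }
    private static void updateLastLoad(String table, LocalDateTime timestamp) { /* stub */ }
}
```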
Access rights
Source
Format
- JSON
Destination
Location
- Bucket = cs-ew1-prj-data-dm-dt-[dev]-staging/xxx
- DataOcean GCP = prj-data-dm-dt-[env]
- STG Table name = prj-data-dm-dt-[env].STG.STG_HLX_0000_0000_F001_I_H_Cases_ITSM
- ODS Table name = prj-data-dm-dt-[env].ODS.ODS_HLX_0000_F001_I_H_Cases_ITSM
- DPL View name = prj-data-dm-dt-[env].DPL.V_FACT_hlx_case_itsm
Format
- Columnar format