Description
Tools: Talend
Detail job
- J070_Kadiska_to_GCS
- Define the date/time variables
- Selection window: "Begin" to "End"
- Stop time: "Stop"
- Current time (today): "Now"
- Login to the Kadiska API on URL: https://app.kadiska.com/api/v1/config/clients/client:5b77758db8/tokens (l_VAR_kadiska_url_token) and keep the token as a global variable
- Start looping
- Extract data from the Kadiska API on URL: https://app.kadiska.com/api/v1/query (l_VAR_kadiska_url_query), controlled by the variables described under Loading below and the following cases (see the loop sketch after this list):
- In case [no more data]: there is no data and the end time >= stop time; stop the loop
- In case [there is data, more than limit]: there is data and the number of lines = limit; add the number of lines to the offset
- In case [no more data during the interval, go to next hour]: there is data and the number of lines < limit, or there is no data and the End time < Stop; set offset = 0 and advance Begin and End by 1 hour (the interval)
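The loop can be summarised in pseudocode. The sketch below is a minimal Python illustration (the real job is implemented in Talend); fetch_page() is a hypothetical helper that wraps the call to l_VAR_kadiska_url_query with the bearer token, and the limit/nloop values are examples.

# Minimal sketch of the J070 extraction loop (illustrative only)
INTERVAL_MS = 3600000   # l_VAR_kadiska_q_interval: 1 hour
LIMIT = 10000           # l_VAR_kadiska_q_limit (example value)
MAX_LOOPS = 100         # l_VAR_kadiska_q_nloop (example value)

def extract(begin_ms, stop_ms, fetch_page):
    """Walk 1-hour windows from begin_ms until stop_ms, paging by offset within each window."""
    rows_out = []
    end_ms = begin_ms + INTERVAL_MS
    offset = 0
    for _ in range(MAX_LOOPS):
        rows = fetch_page(begin_ms, end_ms, offset, LIMIT)
        rows_out.extend(rows)
        if not rows and end_ms >= stop_ms:
            break                       # [no more data]: stop the loop
        if rows and len(rows) == LIMIT:
            offset += len(rows)         # [data = limit]: keep paging the same window
        else:                           # [no more data during the interval]: go to next hour
            offset = 0
            begin_ms += INTERVAL_MS
            end_ms += INTERVAL_MS
    return rows_out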
Flow job
- F070_Kadiska_rum_to_ODS
- Define variables such as the filename
- Get meta_run_id
- Define meta_file_name for incremental or reload, depending on l_VAR_kadiska_q_reload; the name is fixed to KADISKA_RUM[_RELOAD]
- Get the last successful load from the incremental_loading table
- Call detail job
- Load to ODS
- Update the last successful date in the incremental_loading table (see the bookkeeping sketch after this list)
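As a rough illustration of the bookkeeping around the detail job, the Python sketch below reads and updates STG.incremental_loading with the BigQuery client. It is a simplified sketch only; run_detail_job() and the project id are placeholders, and the real flow is implemented in Talend.

# Simplified sketch of the F070 flow bookkeeping (illustrative only)
from google.cloud import bigquery

client = bigquery.Client(project="prj-data-dm-dt-dev")  # [env] project, example value

def run_flow(reload_mode, run_detail_job):
    # meta_file_name is fixed to KADISKA_RUM (incremental) or KADISKA_RUM_RELOAD (reload)
    meta_file_name = "KADISKA_RUM" if reload_mode == "incremental" else "KADISKA_RUM_RELOAD"

    # Get the last successful load date for this object
    last_date = list(client.query(
        "SELECT meta_last_process_date FROM STG.incremental_loading "
        f"WHERE meta_file_name = '{meta_file_name}'"
    ).result())[0].meta_last_process_date

    # Call the detail job (J070) and load the extracted file to ODS; placeholder here
    new_last_date = run_detail_job(start=last_date)

    # Update the last successful date after the ODS load
    client.query(
        "UPDATE STG.incremental_loading "
        f"SET meta_last_process_date = '{new_last_date}' "
        f"WHERE meta_file_name = '{meta_file_name}'"
    ).result()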
Access rights
Access must be requested from the network team: bilal.seddiki@solvay.com (manager =
Source
https://app.kadiska.com/api/v1/query
Format
JSON
Destination
Location
- Bucket = cs-ew1-prj-data-dm-dt-[dev]-staging/KADISKA/
- DataOcean GCP = prj-data-dm-dt-[env]
- STG Table names =
- prj-data-dm-dt-prod.STG.STG_KDK_0000_0000_F001_I_H_rum
- ODS Table names =
- prj-data-dm-dt-prod.ODS.ODS_KDK_0000_F001_I_H_rum
- DPL View names =
- prj-data-dm-dt-[env].DPL.V_FACT_hlx_work_order
Format
columnar format
Sizing
Assessment
Loading
1.1 Incremental Load
It is controlled by the following variables (an illustrative set of values is sketched after this list):
- l_VAR_kadiska_q_condition : normally blank, but an additional filter can be added, such as ["=", "watcher_name", ["$","Google Calendar"]]
- l_VAR_kadiska_q_interval : during project design we agreed on an interval of 1 hour = 3600000 milliseconds
- l_VAR_kadiska_q_limit : maximum number of records to fetch from the API per call
- l_VAR_kadiska_q_nloop : maximum number of loops
- l_VAR_kadiska_q_offset : starting record index for the API query
- l_VAR_kadiska_q_reload : "incremental" for the incremental load; for a reload, set this variable to the end time in epoch format (the start time is taken from the incremental_loading table where meta_file_name = KADISKA_RUM_RELOAD)
- l_VAR_kadiska_email_flag : "yes" when Talend should send a notification email if the incremental load date is not increasing in the incremental_loading table, which could lead to loading duplicate data
- l_VAR_kadiska_email_recipient : list of email addresses that receive the notification
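For reference, an illustrative set of context values for a normal incremental run is shown below; only the 1-hour interval was fixed at design time, and the limit, nloop and recipient values are placeholders.

# Example context values for an incremental run (limit/nloop/recipient are illustrative)
kadiska_context = {
    "l_VAR_kadiska_q_condition": "",           # normally blank; e.g. ["=", "watcher_name", ["$","Google Calendar"]]
    "l_VAR_kadiska_q_interval": 3600000,       # 1 hour, agreed at design time
    "l_VAR_kadiska_q_limit": 10000,            # example value
    "l_VAR_kadiska_q_nloop": 100,              # example value
    "l_VAR_kadiska_q_offset": 0,               # start at the first record
    "l_VAR_kadiska_q_reload": "incremental",   # set to an epoch end time only for a reload
    "l_VAR_kadiska_email_flag": "yes",         # alert when the incremental date is not increasing
    "l_VAR_kadiska_email_recipient": "dataops@example.com",  # example recipient
}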
1.2 Full load
N/A
1.3 Reloading data
It is controlled by the following variables:
- l_VAR_kadiska_q_limit : the number of records to fetch from the API per call
- l_VAR_kadiska_q_nloop : the number of loops per load (limit x nloop is the maximum number of records in the file loaded to BQ each time)
- l_VAR_kadiska_q_offset : the record index offset for the API query; it starts at 0 and is changed only in specific reload cases
- l_VAR_kadiska_q_reload : if it is NOT "incremental", the job runs a reload bounded by this parameter, e.g. [1724371200000], which is the end time in epoch format (see the conversion sketch after this list); the start date is always taken from the incremental_loading table where meta_file_name = KADISKA_RUM_RELOAD
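Since the reload end time must be given in epoch milliseconds, the conversion sketch below may help; for example, 1724371200000 corresponds to 2024-08-23 00:00:00 UTC.

# Convert a UTC date/time string to the epoch-millisecond value used by l_VAR_kadiska_q_reload
from datetime import datetime, timezone

def to_epoch_ms(dt_str):
    dt = datetime.strptime(dt_str, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

print(to_epoch_ms("2024-08-23 00:00:00"))  # 1724371200000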
How to:
1. Stop the scheduled job
2. Change the start date from which to reload:
UPDATE STG.incremental_loading
SET meta_last_process_date = '2021-12-22 00:00:00' -- the date from which to reload
where meta_file_name = 'KADISKA_RUM_RELOAD' -- identifies the object to reload; the name must end with _RELOAD (the entry without _RELOAD is used only for the incremental load)
3. Change the context variable l_VAR_kadiska_q_reload to the end date in epoch format
4. Recheck the variables: limit, nloop, offset
5. Run the job until the KADISKA_RUM_RELOAD date reaches the end time entered in l_VAR_kadiska_q_reload, meaning the reload is complete (a verification sketch follows this list)
6. Change the context variable l_VAR_kadiska_q_reload back to "incremental"
7. Reschedule the job
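For step 5, the reload progress can be compared with the configured end time. The check below is a minimal sketch assuming the BigQuery Python client and that meta_last_process_date is a UTC timestamp.

# Check whether KADISKA_RUM_RELOAD has reached the reload end time (illustrative)
from google.cloud import bigquery

client = bigquery.Client(project="prj-data-dm-dt-dev")  # example project
reload_end_ms = 1724371200000                           # value entered in l_VAR_kadiska_q_reload

row = list(client.query(
    "SELECT meta_last_process_date FROM STG.incremental_loading "
    "WHERE meta_file_name = 'KADISKA_RUM_RELOAD'"
).result())[0]

progress_ms = int(row.meta_last_process_date.timestamp() * 1000)
if progress_ms >= reload_end_ms:
    print("Reload complete; switch l_VAR_kadiska_q_reload back to 'incremental'")
else:
    print("Reload still in progress, last processed:", row.meta_last_process_date)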
1.4 Plan to schedule
Every hour
1.5 Timing
5 minutes
Criticality
High/Medium/Low
Logging
1. Check the last load
select * from STG.incremental_loading
where meta_file_name like '%KADISKA_RUM'
order by meta_file_name
2. Check the loading records / errors
select job.job_name, job.meta_start_date, logs.meta_run_id, logs.meta_source_system, logs.meta_step, logs.meta_status, logs.meta_num_lines, logs.meta_error_lines
from STG.log_tables logs
join STG.run_jobs job on logs.meta_run_id = job.meta_run_id
where logs.meta_run_id in (SELECT meta_run_id FROM STG.run_jobs order by meta_start_date desc limit 100)
and job_name like '%Kadiska%'
and meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
order by meta_start_date desc



