
Description

Tools: Talend

Detail job

  • J070_Kadiska_to_GCS

  1. Define the dates
    1. "Begin" to "End": the time window to select
    2. "Stop": the time at which to stop
    3. "Now": today's time
  2. Log in to the Kadiska API on URL: https://app.kadiska.com/api/v1/config/clients/client:5b77758db8/tokens (l_VAR_kadiska_url_token) and keep the token as a global variable
  3. Start looping (the loop logic is sketched after this list)
  4. Extract data from the Kadiska API on URL: https://app.kadiska.com/api/v1/query (l_VAR_kadiska_url_query), controlled by the variables described in 1.1 Incremental Load
  5. In case [no more data] (there is no data and the End time >= the Stop time), stop the loop
  6. In case [more data than the limit] (there is data and the number of lines = the limit), add the number of lines to the offset
  7. In case [no more data during the interval, go to the next hour] (there is data with fewer lines than the limit, or there is no data and the End time < the Stop time), set offset = 0 and move Begin and End forward by 1 hour (the interval)
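For readability, the branch logic of steps 5-7 can be sketched in Python as below. This is an illustration, not the Talend job itself: the payload field names, the LIMIT value, and the fetch_page helper are assumptions.

import requests

# Sketch of the J070_Kadiska_to_GCS paging loop (steps 3-7 above).
# The query payload shape and token handling are assumed for illustration.
QUERY_URL = "https://app.kadiska.com/api/v1/query"   # l_VAR_kadiska_url_query
INTERVAL_MS = 3600000                                # l_VAR_kadiska_q_interval (1 hour)
LIMIT = 5000                                         # l_VAR_kadiska_q_limit (example value)

def fetch_page(token, begin_ms, end_ms, offset):
    # Hypothetical wrapper around the query endpoint; field names are assumed.
    payload = {"begin": begin_ms, "end": end_ms, "offset": offset, "limit": LIMIT}
    resp = requests.post(QUERY_URL, json=payload,
                         headers={"Authorization": "Bearer " + token}, timeout=60)
    resp.raise_for_status()
    return resp.json().get("data", [])

def extract(token, begin_ms, stop_ms):
    end_ms, offset = begin_ms + INTERVAL_MS, 0
    while True:
        rows = fetch_page(token, begin_ms, end_ms, offset)
        # ... write rows to the staging file here ...
        if not rows and end_ms >= stop_ms:   # step 5: no more data and past Stop
            break
        elif len(rows) == LIMIT:             # step 6: page within the same window
            offset += LIMIT
        else:                                # step 7: move to the next 1-hour interval
            offset = 0
            begin_ms += INTERVAL_MS
            end_ms += INTERVAL_MS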

Flow job

  • F070_Kadiska_rum_to_ODS

  1. Define variables such as the file name
  2. Get meta_run_id
  3. Define meta_file_name for the incremental load or a reload, depending on l_VAR_kadiska_q_reload; the name is fixed to KADISKA_RUM[_RELOAD] (see the sketch after this list)
  4. Get the last successful load from the table incremental_loading
  5. Call the detail job
  6. Load to ODS
  7. Update the last successful date in the incremental_loading table
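The meta_file_name and run-window resolution in steps 3-4 can be sketched as follows; the helper is illustrative and assumes l_VAR_kadiska_q_reload is either "incremental" or an end time in epoch milliseconds.

from datetime import datetime, timezone

# Sketch of steps 3-4: pick meta_file_name and the stop time from
# l_VAR_kadiska_q_reload; last_process_ms comes from STG.incremental_loading.
def resolve_run(reload_value, last_process_ms):
    if reload_value == "incremental":
        meta_file_name = "KADISKA_RUM"
        stop_ms = int(datetime.now(timezone.utc).timestamp() * 1000)  # "Now"
    else:
        meta_file_name = "KADISKA_RUM_RELOAD"
        stop_ms = int(reload_value)  # reload end time in epoch milliseconds
    return meta_file_name, last_process_ms, stop_ms

# Example: an incremental run starting from the last successful load
print(resolve_run("incremental", 1724371200000))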

Access rights

Access is requested from the network team: bilal.seddiki@solvay.com (manager = 

Source

https://app.kadiska.com/api/v1/query

Format

JSON

Destination

Location

  • Bucket = cs-ew1-prj-data-dm-dt-[dev]-staging/KADISKA/
  • Data Ocean GCP = prj-data-dm-dt-[env]
  • STG Table names =
    • prj-data-dm-dt-prod.STG.STG_KDK_0000_0000_F001_I_H_rum
  • ODS Table names =
    • prj-data-dm-dt-prod.ODS.ODS_KDK_0000_F001_I_H_rum
  • DPL View names =  
    •  prj-data-dm-dt-[env].DPL.V_FACT_hlx_work_order

Format

columnar format

Sizing


Assessment

Loading

1.1 Incremental Load

It is controlled by the following variables (example values are sketched after this list):

  • l_VAR_kadiska_q_condition: normally blank, but it can add an additional filter such as ["=", "watcher_name", ["$","Google Calendar"]],
  • l_VAR_kadiska_q_interval: during project design we agreed on an interval of 1 hour = 3600000 milliseconds
  • l_VAR_kadiska_q_limit: maximum number of records to get from the API on each call
  • l_VAR_kadiska_q_nloop: maximum number of loops
  • l_VAR_kadiska_q_offset: start index of the records in the API query
  • l_VAR_kadiska_q_reload: "incremental" for the incremental load; in case of a reload, change this variable to the end time in epoch format (the starting time is taken from the table incremental_loading where meta_file_name = KADISKA_RUM_RELOAD)
  • l_VAR_kadiska_email_flag: "yes" when Talend should send an email alert when the incremental load is not advancing in the incremental_loading table, which can lead to loading duplicate data
  • l_VAR_kadiska_email_recipient: list of email addresses that receive the alert
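As a reading aid, plausible context values could look like the Python dictionary below; every value is an example, not the production setting.

# Example values only; the real values live in the Talend context.
context = {
    "l_VAR_kadiska_q_condition": ["=", "watcher_name", ["$", "Google Calendar"]],  # optional filter
    "l_VAR_kadiska_q_interval": 3600000,       # 1 hour in milliseconds
    "l_VAR_kadiska_q_limit": 5000,             # example: max records per API call
    "l_VAR_kadiska_q_nloop": 100,              # example: max number of loops
    "l_VAR_kadiska_q_offset": 0,               # paging start index
    "l_VAR_kadiska_q_reload": "incremental",
    "l_VAR_kadiska_email_flag": "yes",
    "l_VAR_kadiska_email_recipient": ["team@example.com"],  # placeholder address
}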

1.2 Full load

N/A

1.3. Reloading data

It is controlled by the following variables:

  • l_VAR_kadiska_q_limit: the number of records to get from the API on each call
  • l_VAR_kadiska_q_nloop: the number of loops for each load (limit x nloop is the maximum number of records in the file loaded to BQ each run)
  • l_VAR_kadiska_q_offset: the index of the first record returned by the API. It starts at 0 and is changed to another value only in specific reload cases
  • l_VAR_kadiska_q_reload: if it is NOT "incremental", the job runs a reload bounded by this parameter, such as [1724371200000]; it is the end time in epoch format (a conversion sketch follows this list). The start date is always taken from the incremental_loading table where meta_file_name = KADISKA_RUM_RELOAD
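To get the end time in epoch format, any converter works; a minimal Python sketch that reproduces the example value above:

from datetime import datetime, timezone

# 2024-08-23 00:00:00 UTC -> 1724371200000 (the example end time above)
end = datetime(2024, 8, 23, tzinfo=timezone.utc)
print(int(end.timestamp() * 1000))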

How to:

1. Stop the scheduled job

2. Change the start date from which the data must be reloaded:

UPDATE STG.incremental_loading

 SET meta_last_process_date = '2021-12-22 00:00:00'   -- the date from which to reload

WHERE meta_file_name = 'KADISKA_RUM_RELOAD' -- identifies the object to reload; the name must end with _RELOAD. Without _RELOAD, the row is used for the incremental load only.

3. Change the context variable l_VAR_kadiska_q_reload to the end date in epoch format

4. Recheck the variables: limit, nloop, offset

5. Run the job until meta_last_process_date for KADISKA_RUM_RELOAD reaches the end time entered in l_VAR_kadiska_q_reload, meaning the reload is complete (a quick check is sketched after these steps)

6. Change the context variable l_VAR_kadiska_q_reload back to "incremental"

7. Reschedule the job
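A quick way to verify step 5, assuming meta_last_process_date has already been read from STG.incremental_loading (both values below are examples):

from datetime import datetime, timezone

# Compare the last processed date with the reload end time (epoch ms).
reload_end_ms = 1724371200000
last_process = datetime(2024, 8, 22, 23, 0, tzinfo=timezone.utc)  # from incremental_loading

if last_process.timestamp() * 1000 >= reload_end_ms:
    print("reload complete: switch l_VAR_kadiska_q_reload back to 'incremental'")
else:
    print("reload still in progress")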

1.4 Plan to schedule

Every hour

1.5 Timing

5 minutes

Criticality

High/Medium/Low

Logging

1. Check the last loading

select * from STG.incremental_loading
where meta_file_name like '%KADISKA_RUM'
order by meta_file_name

2. Check the loading records / error

select job.job_name, job.meta_start_date, logs.meta_run_id, logs.meta_source_system,
  logs.meta_step, logs.meta_status, logs.meta_num_lines, logs.meta_error_lines
from STG.log_tables logs
join STG.run_jobs job on logs.meta_run_id = job.meta_run_id
where logs.meta_run_id in (select meta_run_id from STG.run_jobs order by meta_start_date desc limit 100)
and job_name like '%Kadiska%'
and meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
order by meta_start_date desc