Description

Tools: Talend

Detail job

  1. Define the date
    1. Time to select  "Begin" to "End"
    2. Time to stop "Stop"
    3. Time today = "Now"
  2. Login to Kadiska API on URL: https://app.kadiska.com/api/v1/config/clients/client:5b77758db8/tokens (l_VAR_kadiska_url_token)and keep the token as global variable
  3. Start looping 
  4. Extract data from Kadiska API on URL: https://app.kadiska.com/api/v1/query (l_VAR_kadiska_url_query) controlling by following variables
  5. In case [no more data] there is no data or the end time >= stop time, stop the loop
  6. In case [there is data more than limit] there is data and number of line = limit, add offset + number of line
  7. In case [no more data during the interval, go to next hour]there is data and number of line < limit  or there is no data and End time < Stop,  set offset = 0 and add Begin and End time 1 hour (interval)

Flow job

  1. Define variable such as filename
  2. Get meta_run_id
  3. Define meta_file_name for incremental or reload which depending on l_VAR_kadiska_q_reload but the name is fixed to KADISKA_RUM[_RELOAD]
  4. Get the last successfully load from table incremental_loading 
  5. Call detail job 
  6. Load to ODS
  7. Update the last successful date to incremetnal_loading table

Access rights

It is request to network team : bilal.seddiki@solvay.com 

Source

https://app.kadiska.com/api/v1/query

Format

JSON

Destination

Location

Format

columnar format

Sizing


Assessment

Loading

1.1 Incremental Load

It is control by variables:

1.2 Full load

N/A

1.3. Reloading data

It is control by variables:

How to:

1. Stop schedule job

2. Change the start date that is required to reload on

UPDATE STG.incremental_loading

 SET meta_last_process_date = '2021-12-22 00:00:00'   --the date that want to reload

where meta_file_name = 'KADISKA_RUM_RELOAD' -- identify the object that want to reload but the name to change must be _RELOAD.  Without _RELOAD will use for incremental load only.

3. Change context l_VAR_kadiska_q_reload to the end date with epoch format 

4. Recheck the variables: limit, nloop, offset

5. Run the job to load until KADISKA_RUM_RELOAD is reach the end time that we enter on l_VAR_kadiska_q_reload. Meaning reload is complete

6. Change the context  l_VAR_kadiska_q_reload to be "incremental" 

7. Reschedule the job

1.4 Plan to schedule

Every hour

1.5 Timing

5 minutes

Criticality

Low

Logging

1. Check the last loading

select * from STG.incremental_loading
where meta_file_name like '%KADISKA_RUM'
order by meta_file_name

2. Check the loading records / error

select job.job_name, job.meta_start_date, logs.meta_run_id, logs.meta_source_system, logs.meta_step, logs.meta_status, logs.meta_num_lines, logs.meta_error_lines from STG.log_tables logs join STG.run_jobs job on logs.meta_run_id = job.meta_run_id
where  logs.meta_run_id in (SELECT meta_run_id FROM STG.run_jobs order by meta_start_date desc limit 100)
and job_name like '%Kadiska%'
and meta_start_date >  DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 DAY)
order by meta_start_date desc