Description
Wave is an external tool that calculates the cost of savings on each project. Talend retrieves the data through this server's API; the API output is an Excel file.
A connection must be created and an access window configured so that Talend can read the data. If Talend connects outside the allowed window, it will not get the file.
The password must also be reset every 6 months, since it expires after that period.
Wave Connection
How to create connection
1. Log in at https://solvay.mckinseywave.com/login/auth
2. Go to Data Connection
3. Click on ADD DATA CONNECTION
4. Enter the name of the connection and the date/time window during which access is allowed.
If no schedule is set, you must log in on the web in order to allow the access.
Times are in the GMT+1 time zone.
5. It will generate URL endpoints, a username and a password. Click on generate password and save it.
Please note that each password expires every 6 months, and the interface must be accessed at least once every 17 days; otherwise, the connection will be deactivated.
How to change the access time
1. Log in at https://solvay.mckinseywave.com/login/auth
2. Go to Data Connections, as when creating a connection
3. Right-click the connection you want to change and select Update
4. Change the date/time and click UPDATE
5. If the connector has been deactivated, mark it active here and keep logging in to Wave to avoid further deactivation.
How to reset password
1. Log in at https://solvay.mckinseywave.com/login/auth
2. Go to Data Connections, as when creating a connection
3. Right-click the connection you want to change and select Connect
4. Click on reset password
Talend
The Talend project is in DATA_OCEAN_DOMAIN_INDUSTRIAL.
The project requires 4 sheets from the file output by the API. To improve loading performance and reduce data size, the impactStar and sustainabilityImpactStar sheets are each normalized into 2 tables.
Detail job
- J001_WAVE_extraiction_to_GCS
- tJava to check l_VAR_WAVE_reload: if true, use the existing file wave.xlsx; if false, get the file from the Wave API
- tHttpRequest to call the Wave API when l_VAR_WAVE_reload = false
- if the return code = 200, the API call generates the file wave.xlsx
- if the return code != 200, an email is sent to inform DataOps
- tSetGlobalVar to set the current date as the snapshot value of this load
- tFileInputExcel4 to read the projectStar sheet of wave.xlsx, then transform it with tMap and save the file as "WAV_IND_0000_0000_F001_[yyyyMMddHHmmss]_0000_F_D_project_star.csv"
- tFileInputExcel1 to read the activityStar sheet of wave.xlsx, then transform it with tMap and save the file as "WAV_IND_0000_0000_F002_[yyyyMMddHHmmss]_0000_F_D_activity_star.csv"
- tFileInputExcel2 to read the sustainabilityImpactStar sheet of wave.xlsx, then split it into 2 files
- month: unpivot by turning the period columns into rows, remove the lines that have no data (blank or 0), and save to "WAV_IND_0000_0000_F003_[yyyyMMddHHmmss]_0000_F_D_sust_impact_star_month.csv"
- annual: keep all columns except Purpose and period, aggregate the rows to remove duplicates, and save the file as "WAV_IND_0000_0000_F003_[yyyyMMddHHmmss]_0000_F_D_sust_impact_star_annaul.csv"
- tFileInputExcel3 to read the impactStar sheet of wave.xlsx, then split it into 2 files
- month: unpivot by turning the period columns into rows, remove the lines that have no data (blank or 0), and save to "WAV_IND_0000_0000_F004_[yyyyMMddHHmmss]_0000_F_D_impact_star_month.csv"
- annual: keep all columns except Purpose and period, aggregate the rows to remove duplicates, and save the file as "WAV_IND_0000_0000_F004_[yyyyMMddHHmmss]_0000_F_D_impact_star_annaul.csv"
- Upload all 6 files to GCS
- Delete the 6 files from the Talend server
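The month-file unpivot and the output file naming described above can be sketched in plain Python. The column names (impactStar, the period headers) and the sample rows are illustrative assumptions; the real layout comes from the Wave Excel export.

```python
# Sketch of the "month" unpivot step: period columns become rows, and rows
# with no data (blank or 0) are removed, as in the Talend job.
from datetime import datetime

def unpivot_month(rows, id_cols):
    """Turn period columns into rows; drop values that are blank or 0."""
    out = []
    for row in rows:
        for col, value in row.items():
            if col in id_cols:
                continue
            if value in (None, "", 0):  # line has no data: skip it
                continue
            record = {c: row[c] for c in id_cols}
            record["period"] = col
            record["value"] = value
            out.append(record)
    return out

def output_filename(feed, suffix, ts):
    """File name pattern used when saving the CSV before the GCS upload."""
    return f"WAV_IND_0000_0000_{feed}_{ts:%Y%m%d%H%M%S}_0000_{suffix}.csv"

rows = [
    {"impactStar": "I1", "2024-01": 10, "2024-02": None},
    {"impactStar": "I2", "2024-01": 0, "2024-02": 5},
]
monthly = unpivot_month(rows, id_cols={"impactStar"})
name = output_filename("F004", "F_D_impact_star_month",
                       datetime(2024, 1, 2, 3, 4, 5))
```

In the real job this logic is implemented with tMap components; the sketch only mirrors the row-level behaviour.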
Flow job
- F001_WAVE_TO_BQ
- Set up meta_run_id and the filename of the output file
- Call the detail job and pass parameters
- Call the standard job to upload the projectStar files from GCS to ODS
- Call the standard job to upload the activityStar files from GCS to ODS
- Call the standard job to upload the sustainabilityImpactStar annual files from GCS to ODS
- Call the standard job to upload the sustainabilityImpactStar month files from GCS to ODS
- Call the standard job to upload the impactStar annual files from GCS to ODS
- Call the standard job to upload the impactStar month files from GCS to ODS
- Update the log.
- F011_DIM_wave_project_star_SCD1
Select only the last load in ODS and update DIM_wave_project_star
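This "last load only" selection can be sketched with a small helper, assuming each ODS row carries the snapshot field described in this document (field names here are illustrative):

```python
# Sketch of the F011 step: keep only the rows from the most recent load
# (snapshot) before updating DIM_wave_project_star.
def latest_snapshot_rows(ods_rows):
    """Return only the rows belonging to the newest snapshot."""
    if not ods_rows:
        return []
    newest = max(r["snapshot"] for r in ods_rows)
    return [r for r in ods_rows if r["snapshot"] == newest]

ods = [
    {"snapshot": "2024-01-01", "projectStar": "P1"},
    {"snapshot": "2024-01-02", "projectStar": "P1"},
    {"snapshot": "2024-01-02", "projectStar": "P2"},
]
latest = latest_snapshot_rows(ods)
```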
Access rights
Access must be requested from the WAVE admin: ednamaria.coan@solvay.com
Source
https://solvay.mckinseywave.com/login/auth → prj-data-dm-industrial-[env]
- The API URL is configured on the Wave connection, and the URL is kept in the parameter l_VAR_WAVE_url_api
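The API pull and the 200 / non-200 branching used by the detail job can be sketched as follows; the URL is taken from l_VAR_WAVE_url_api, and the notification side is only a placeholder (the real job sends an email to DataOps):

```python
# Sketch of the tHttpRequest step: download the Wave export and branch on the
# HTTP status code. Names and the alerting action are placeholders.
import urllib.error
import urllib.request

def next_action(status_code):
    """Mirror the job's branching: 200 saves wave.xlsx, else alert DataOps."""
    return "save_wave_xlsx" if status_code == 200 else "email_dataops"

def fetch_wave_export(url, out_path="wave.xlsx"):
    """Download the export; return the action that was taken."""
    try:
        with urllib.request.urlopen(url) as resp:
            action = next_action(resp.status)
            if action == "save_wave_xlsx":
                with open(out_path, "wb") as f:
                    f.write(resp.read())
            return action
    except urllib.error.URLError:
        return next_action(None)
```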
Format
- Tabular format (Excel, .xlsx)
Destination
Data Ocean
- Bucket = cs-ew1-prj-data-dm-industrial-[env]-staging
- DataOean GCP = prj-data-dm-industrial-[env]
- STG Table name = prj-data-dm-industrial-[env].STG.
- STG_WAV_0000_0000_F001_F_D_project_star
- STG_WAV_0000_0000_F002_F_D_activity_star
- STG_WAV_0000_0000_F003_F_D_sust_impact_star_annual
- STG_WAV_0000_0000_F003_F_D_sust_impact_star_month
- STG_WAV_0000_0000_F004_F_D_impact_star_annual
- STG_WAV_0000_0000_F004_F_D_impact_star_month
- ODS Table name = prj-data-dm-industrial-[env].ODS
- ODS_WAV_0000_F001_F_D_project_star
- ODS_WAV_0000_F002_F_D_activity_star
- ODS_WAV_0000_F003_F_D_sust_impact_star_annual
- ODS_WAV_0000_F003_F_D_sust_impact_star_month
- ODS_WAV_0000_F004_F_D_impact_star_annual
- ODS_WAV_0000_F004_F_D_impact_star_month
- DM Tables name = prj-data-dm-industrial-[env].DM
- DIM_wave_project_star
- DIM_wave_activity_star
Product
- Bucket = cs-ew1-prj-data-industrial-dash-[env]-staging
- Product GCP = prj-data-industrial-dash-[env]
- Dataset = prj-data-industrial-dash-dev.DataOcean → DPL
- V_ODS_wave_activity_star_fl
- V_ODS_wave_impact_star_annual_fl
- V_ODS_wave_impact_star_month_fl
- V_ODS_wave_project_star_fl
- V_wave_activity_star
- V_wave_impact_star_annual
- V_wave_impact_star_month
- V_wave_project_star
- V_wave_sust_impact_star_annual
- V_wave_sust_impact_star_month
Format
- columnar format
Sizing
ProjectStar: around 7,000 records
Sust ImpactStar annual: around 9,000 records
Sust ImpactStar month: around 54,000 records
ImpactStar annual: around 85,000 records
ImpactStar month: around 1,226,000 records
Assessment
The data should be the same as the output file, and the query below should show the current date as the most recent snapshot:
SELECT DISTINCT snapshot FROM `prj-data-dm-industrial-dev.ODS.ODS_WAV_0000_F004_F_D_impact_star_annual`
ORDER BY snapshot DESC
The current snapshot date should be available.
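This check can be automated with a small helper; here `snapshots` stands in for the distinct snapshot values returned by the query above (the BigQuery call itself is omitted):

```python
# Sketch of the assessment check: the newest snapshot returned by the
# query should match the load date.
from datetime import date

def snapshot_is_current(snapshots, today):
    """True only when the most recent snapshot equals the given date."""
    return bool(snapshots) and max(snapshots) == today

snaps = [date(2024, 1, 1), date(2024, 1, 2)]
ok = snapshot_is_current(snaps, date(2024, 1, 2))
```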
Loading
1.1 Incremental Load
Source to ODS
N/A: always a full load from the file
ODS to DM
l_VAR_dim_wave_project_star_reload = incremental
1.2 Full load
Set l_VAR_WAVE_reload = false
PL_INDUS_WAVE will load the output Excel file from the Wave API and save the current time in the snapshot field
1.3. Reloading data
Source to ODS
Set l_VAR_WAVE_reload = true (used to load historical data from a manual file, i.e. not fetched from the Wave API)
Key users will provide the historical file; load it manually by renaming it to wave.xlsx in the folder \\Acew1dtlndeng02\data\DEV\DATA_OCEAN_DOMAIN_INDUSTRIAL\InOut\WAVE. The file must have the same column sequence as the normal extraction.
ODS to DM
l_VAR_dim_wave_project_star_reload = QUALIFY ROW_NUMBER() OVER (PARTITION BY snapshot, projectStar ORDER BY meta_ods_insert_date DESC) = 1
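Since the manual file must keep the same column sequence as the normal extraction, a quick header check before renaming it to wave.xlsx can avoid a bad reload. The EXPECTED_COLUMNS list below is a placeholder, not the real Wave column list:

```python
# Sanity check for the manual reload path: the header of the provided file
# must match the expected column sequence exactly, in the same order.
EXPECTED_COLUMNS = ["projectStar", "period", "value"]  # placeholder only

def same_column_sequence(header, expected=EXPECTED_COLUMNS):
    """True only if the header matches the expected columns in order."""
    return list(header) == list(expected)

ok = same_column_sequence(["projectStar", "period", "value"])
bad = same_column_sequence(["period", "projectStar", "value"])
```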
1.4 Plan to schedule
Weekdays at 04:01 AM CET
1.5 Timing
The expected average loading time is around 20 minutes
Criticality
High/Medium/Low
Logging
On project prj-data-dm-industrial-[environment]
SELECT job.job_name, job.meta_start_date, job.meta_execution_id, logs.meta_run_id, logs.meta_source_system, logs.meta_step, logs.meta_status, logs.meta_num_lines, logs.meta_error_lines
FROM STG.log_tables logs
JOIN STG.run_jobs job ON logs.meta_run_id = job.meta_run_id
WHERE logs.meta_run_id IN (SELECT meta_run_id FROM STG.run_jobs ORDER BY meta_start_date DESC LIMIT 100)
AND job_name LIKE '%WAVE%'
AND meta_start_date > DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
ORDER BY job.meta_start_date DESC










