1. Objectives of the document:
This document provides a technical overview of the solution delivered by the BigData & Analytics Team, inspired by the previous analytics project developed by D3S (https://wiki.solvay.com/display/BDA/PCM+-+Predictive+Credit+Management).
It explains the building blocks of the source code and the methodology followed to obtain:
- insightful predictions for cash recovery (data)
- strategies to prioritize document lots in the user interface
- an end-to-end application to manage daily cash collection activities
2. General presentation of the solution:
The main business stake is to increase overdue coverage with the existing task force. As of today, dunning and pre-dunning actions focus on the largest outstanding amounts, leaving aside smaller accounts (below a threshold). Pre-dunning includes additional rules applied by cash collection teams through a time-consuming manual process.
Cash collection is steered with End of Month KPIs. Although not necessarily representative of the cost of working capital, EOM metrics are relevant as they are fully aligned with other business steering indicators. Predictive analytics are a way forward, especially to better address the smaller accounts on which the overdue rate is higher.
Figure 1: objectives and core principles
Functional overview
Figure 1bis: functional overview
Machine learning methodology via Dataiku
The predictive solution leverages machine learning technology. A model is first trained on payment history to learn customer behavior from all available characteristics. For new customers, the model infers behavior from the available data (country, currency, sector, invoice characteristics, etc.).
Figure 2: Machine learning description
Data engineering orchestration via Talend & Bigquery
F200_Daily_Delta :
Process flow:
TALEND :
Several Talend pipelines cover the whole project during each iteration of the update. They interact with several components of the project (SAP BW, SFTP server, Google BigQuery, Google Cloud Storage, Dataiku Data Science Studio, Google Cloud Functions).
List of all Talend pipelines used for the project:
- 210_Extract_Delta_From_SFTP_WBP_WAIT
Extract files from BW
SFTP folder (test environment) →
/exploit/BW/PREDICTCM:
- All Csv Files (data + structure)
/exploit/BW/PREDICTCM/V2:
- "dbfiar20_"+context.CET_timezone+"*"
- "S_dbfiar20_"+context.CET_timezone+"*"
with CET_timezone computed in a Talend routine as:
// Current time at a fixed UTC+1 offset, formatted as yyyyMMdd_HH
LocalDateTime s = LocalDateTime.now(ZoneId.of("UTC+1"));
String myDate = s.format(DateTimeFormatter.ofPattern("yyyyMMdd_HH"));
context.CET_timezone = myDate;
System.out.println(context.CET_timezone);
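For reference, the same timestamp prefix can be sketched in Python (an illustrative equivalent of the Talend routine above, assuming the same fixed UTC+1 offset and yyyyMMdd_HH pattern):

```python
from datetime import datetime, timedelta, timezone

def cet_timezone_prefix(now=None):
    """Build the yyyyMMdd_HH prefix used in the delta file names,
    using a fixed UTC+1 offset as in the Talend routine."""
    cet = timezone(timedelta(hours=1))
    now = now or datetime.now(cet)
    return now.strftime("%Y%m%d_%H")

# The SFTP file patterns are then assembled from this prefix:
prefix = cet_timezone_prefix()
patterns = [f"dbfiar20_{prefix}*", f"S_dbfiar20_{prefix}*"]
```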
- J220_Prepare_Files
Loop on the file list and get the metadata of each file, based on the following dictionary:
\DATA\DEV\SBS\PCM\Input\metadata\data_dictionnary_mssql.csv
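The per-file metadata lookup performed by J220 can be sketched as follows (the column names `file_prefix`, `target_table`, and `separator` are hypothetical; the real columns of data_dictionnary_mssql.csv may differ):

```python
import csv
import io

# Hypothetical excerpt of the data dictionary CSV.
DICTIONARY_CSV = """file_prefix,target_table,separator
dbfiar20,credit_mgt,;
order_blocked,order_blocked,;
country,country,;
"""

def load_dictionary(text):
    """Index dictionary rows by file prefix for fast lookup."""
    return {row["file_prefix"]: row for row in csv.DictReader(io.StringIO(text))}

def metadata_for(filename, dictionary):
    """Return the dictionary entry whose prefix matches the file name."""
    for prefix, row in dictionary.items():
        if filename.startswith(prefix):
            return row
    return None

d = load_dictionary(DICTIONARY_CSV)
meta = metadata_for("dbfiar20_20240105_09.csv", d)
```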
- J240_Push_to_BQ_Step_2_MD
Uploads the master data files into the BigQuery database:
- PAYMENT_DESC_TABLE
- GBU_Description
- company_code
- sub_activity
- customer
- gl_account
- customer_credit_control_area
- currency_data
- country
- salemp_text
- market_text
- J230_Push_to_BQ_Step_1_TRX
Uploads the delta (transactional) files into the BigQuery database:
- credit_mgt
- order_blocked
- J250_Launch_Queries_From_GCS
Executes the SQL scripts stored in the bucket (Google Storage: predict-credit-mgt-v2-dev-queries).
Execution mode: "delta_trx":
Append mode
- 06 Delta/520 Delta Remove duplicates from credit_mgt.sql
- 06 Delta/521 Delta Credit Mgt.sql
- 06 Delta/522 Delta Remove duplicates from order_blocked.sql
- 06 Delta/523 Delta Order Blocked.sql
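The "remove duplicates, then append" pattern of scripts 520–523 can be illustrated in Python (a sketch with hypothetical `document_id` key and `load_ts` ordering fields; the real logic lives in the BigQuery SQL scripts listed above):

```python
def dedupe_delta(rows, key="document_id", order="load_ts"):
    """Keep only the most recent row per key, mirroring the
    'remove duplicates' step applied before appending a delta."""
    latest = {}
    for row in sorted(rows, key=lambda r: r[order]):
        latest[row[key]] = row  # later loads overwrite earlier ones
    return list(latest.values())

def append_delta(table, delta):
    """Append a deduplicated delta to the target table (a list of rows)."""
    return table + dedupe_delta(delta)

table = [{"document_id": 1, "load_ts": 1, "amount": 100}]
delta = [
    {"document_id": 2, "load_ts": 2, "amount": 50},
    {"document_id": 2, "load_ts": 3, "amount": 60},  # duplicate; newer wins
]
result = append_delta(table, delta)
```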
Execution mode: "delta_md":
Creates the following tables from the files:
- 06 Delta/500 Delta Transform Payment Desc.sql
- 06 Delta/501 Delta Transform GBU Description.sql
- 06 Delta/502 Delta Prepare sub_activity with description.sql
- 06 Delta/510 Delta Update Company Code.sql
- 06 Delta/511 Delta Update Country.sql
- 06 Delta/512 Delta Update Currency Data.sql
- 06 Delta/513 Delta Update Customer.sql
- 06 Delta/514 Delta Update CCA.sql
- 06 Delta/515 Delta Update GL Account.sql
- 06 Delta/516 Delta Update Payment Desc Table.sql
- 06 Delta/517 Delta Update Sub Activity.sql
- 06 Delta/518 Delta Update Sub Activity With Description.sql
- 06 Delta/519 Delta Update Market Text.sql
- 06 Delta/519 Delta Update Salemp Text.sql
Execution mode: "daily_post":
Append mode
- 04 Core/111 Core Perimeter.sql
- 04 Core/112 Outstanding amount.sql
- 04 Core/113 Core Perimeter future.sql
- J260_Run_Scenario_on_DSS
Sends a daily trigger to Dataiku to run the prediction and strategy compilation. Executes the Dataiku model, which updates input_from_dss.result_table.
- J271_Call_PubSub_to_update_CloudSQL
Uses a Cloud Function component to update, via an API, the data stored in Cloud SQL.
Google Storage :
Project:
Dev → predict-credit-mgt-v2-dev
Prod → predict-credit-mgt-v2-prod
Bucket:
predict-credit-mgt-v2-dev-queries
BigQuery :
Project:
Dev → predict-credit-mgt-v2-dev
Prod → predict-credit-mgt-v2-prod
Dataset:
F500_Training :
Note: this flow exists only in dev and is executed on demand.
- J250_Launch_Queries_From_GCS
- 02 Currency/101 Generate Currency Data Per Month.sql
- 02 Currency/102 Currency Preprocessing.sql
- 02 Currency/103 Currency Exceptions.sql
- 03 Credit Limit/104 Credit Limit Preprocessing.sql
- 04 Core/106 Create Documents Perimeter.sql
- 04 Core/107 Create Clearing List for Payer History.sql
- 04 Core/108 Order Blocked (Run).sql
- 04 Core/109 Payer History per Month.sql
- 04 Core/110 Open items at a particular date.sql
- 04 Core/111 Core Perimeter.sql
- 05 Training/210 Concatenate training result on Core Perimeter Concat.sql
- 205 Build Training Data View (only available in dev under: predict-credit-mgt-v2-dev.Saved queries.Project queries)
Google Storage :
Project:
Dev : predict-credit-mgt-v2-dev
Bucket:
predict-credit-mgt-v2-dev-queries
BigQuery :
Project:
Dev : predict-credit-mgt-v2-dev
User interface exposure via AppEngine
A simple webapp has been developed to monitor and prioritize cash collections. For UI details, see the documentation in Confluence. The source code is available on the version control tool (Bitbucket repository) and through the Google SDK given user credentials.
Figure 4 : Web interface with main features
3. Workflow description
There are several building blocks in or interacting with the solution:
- BW/SAP server: inbound/outbound interaction with the SAP system
- Google Sheets: end-user interaction with the project
- Google BigQuery: data lake storage for big data such as long-term document history
- Dataiku: machine learning processing and priority computation
- Cloud SQL / AppEngine project: hosting of the transactional UI
These building blocks are linked through the functional steps described below.
Figure 5: Building blocks and interactions of the whole solution
4. Workflow details
Step 1. Full & Daily raw data ingestion
Figure 6 : Schematic description
Through Data Transfer Processes in SAP BW and SFTP connectors, all raw data retrieved from SAP for the project is transferred from the SAP environment to GBQ datasets in the GCP project corresponding to each environment:
Several details to handle and upgrade this workflow:
- DTP reference in BW :
- DBFIAR20 -> OH_PCM_01 – Delta
- DBFIAR21 -> OH_PCM_01 – FULL
- SFTP host and login and/or location :
- See in Talend Documentation
- GBQ dedicated dataset :
- Raw_data
- GCP Dedicated project :
- List of the retrieved raw data and detailed variables: More details in the following Google Spreadsheet: Data Material
Step 2. CCT raw data ingestion
Figure 7 : Schematic description
Several pieces of information are manually stored each month in several tabs of a collaborative spreadsheet. This information is transformed into KPIs for the forecast and strategy computation:
- Number of times - Postponed Payment in the last 12 months
- Number of times - Other Reasons
- Last twelve months - recurrent
- Last six months - recurrent
- Last three months - recurrent
- Correct GBU
- And others
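A KPI such as "Number of times - Postponed Payment in the last 12 months" could be derived along these lines (a sketch with hypothetical `date` and `reason` field names; the real computation happens downstream in the pipeline):

```python
from datetime import date

def count_in_last_months(events, reason, today, months=12):
    """Count events with a given reason whose date falls within the last
    `months` months before `today` (months approximated as 30.5 days)."""
    cutoff_days = round(months * 30.5)
    return sum(
        1 for e in events
        if e["reason"] == reason and 0 <= (today - e["date"]).days <= cutoff_days
    )

events = [
    {"date": date(2024, 3, 1), "reason": "postponed_payment"},
    {"date": date(2023, 8, 15), "reason": "postponed_payment"},
    {"date": date(2021, 1, 1), "reason": "postponed_payment"},  # too old
    {"date": date(2024, 2, 1), "reason": "other"},
]
kpi = count_in_last_months(events, "postponed_payment", today=date(2024, 4, 1))
```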
Several details to handle and upgrade this workflow:
- Google Spreadsheet embedding the raw data : Diagnosis Report - Google Spreadsheet
- GBQ dedicated dataset : extra_data and tables for each area ()
- Talend pipeline : (To be )
Step 3. Iterative orchestration and data preparation to GBQ
This step handles the daily and on-demand updates of the master data and transactional data stored in GBQ. Talend communicates with Google Cloud Storage to launch GBQ saved queries that organize the extract-transform-load process.
Figure 8 : List of sql files for Talend transformations
Step 4. Model Design
The model is built in Dataiku's Data Science Studio platform, a user-friendly interface coded in Python.
This SaaS tool provides several data connectors to external databases and storage, such as Google Cloud Storage and GCP.
The model object (New RF on train sample) is computed on demand by a data scientist with a dedicated Python library, scikit-learn, stored as a pickle object inside the platform, and made available to other Dataiku projects depending on the access policy.
Figure 9 : Model details on dataiku.
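The store-and-share pattern described above can be sketched with the standard pickle module (a minimal illustration: the `ThresholdModel` class is a hypothetical stand-in for the real scikit-learn random forest, used here to keep the sketch dependency-free):

```python
import pickle

class ThresholdModel:
    """Stand-in for the real scikit-learn RandomForest: any object
    with a predict() method can be pickled and shared the same way."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, amounts):
        return ["late" if a > self.threshold else "on_time" for a in amounts]

# Train time: fit the model (here: choose a threshold) and persist it.
model = ThresholdModel(threshold=1000)
blob = pickle.dumps(model)

# Serving time: another project loads the pickle and scores documents.
restored = pickle.loads(blob)
preds = restored.predict([500, 2000])
```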
Step 5. Validation & Accuracy computation
Each version of the model is evaluated automatically by the data science platform. The split between train and test samples is made on the « net due date » variable: the oldest documents go to training (approximately M-48 to M-7), the newest to test (M-6 to M-1).
Figure 10: Accuracy computation on dataiku.
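The time-based split can be sketched as follows (field name `net_due_date` follows the text above; month boundaries are approximated as 30.5-day windows for the illustration):

```python
from datetime import date, timedelta

def split_by_net_due_date(documents, today, test_months=6, train_months=48):
    """Time-based split: the oldest documents (≈M-48 to M-7) go to train,
    the most recent months (≈M-6 to M-1) to test."""
    test_cutoff = today - timedelta(days=round(test_months * 30.5))
    train_cutoff = today - timedelta(days=round(train_months * 30.5))
    train = [d for d in documents if train_cutoff <= d["net_due_date"] < test_cutoff]
    test = [d for d in documents if test_cutoff <= d["net_due_date"] < today]
    return train, test

docs = [
    {"id": 1, "net_due_date": date(2021, 6, 1)},  # old document -> train
    {"id": 2, "net_due_date": date(2024, 2, 1)},  # recent document -> test
]
train, test = split_by_net_due_date(docs, today=date(2024, 4, 1))
```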
Step 6. Daily prediction
Each working day, at approximately 10:30 for the dev environment and 11:00 for the prod environment, the prediction based on document data is released.
Figure 11: Daily prediction process
Step 7. Strategies specifications
The strategy design process is shown below:
Figure 13: Strategy specification
The link for the source spreadsheet:
Step 8. Strategies daily computation
Figure 14 : Strategy Daily computation
Here is the link to a useful document for the design:
Step 9. User Roles specifications for Back End :
Each user of the application accesses a specific document portfolio: split by region for a standard user, and related to a dedicated group.
Figure 15: User roles specification
More information in the dedicated spreadsheet on the dev environment:
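The portfolio restriction rule can be sketched as a simple filter (the `region` and `group` field names are illustrative; the real role model is defined in the spreadsheet referenced above):

```python
def portfolio_for(user, documents):
    """Return the documents a standard user may see: restricted to
    the user's region and collection group."""
    return [
        d for d in documents
        if d["region"] == user["region"] and d["group"] == user["group"]
    ]

documents = [
    {"id": 1, "region": "EMEA", "group": "G1"},
    {"id": 2, "region": "EMEA", "group": "G2"},
    {"id": 3, "region": "APAC", "group": "G1"},
]
user = {"name": "collector", "region": "EMEA", "group": "G1"}
visible = portfolio_for(user, documents)
```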
Step 10. Daily update / archive of the UI
Figure 15bis : Daily update
For more detail, please see the documentation: https://drive.google.com/file/d/1W9Qsa7aO4lVHNYybhsc14pZ3JPudQF33ajr3GbZ6xso/view
Step 11. Real time actions archiving
Figure 16: Actions archiving
Step 12. Real time dashboard
Link to the dashboard: https://datastudio.google.com/u/0/reporting/1l7Utyq5GIaVdRCbMpkJrKIbMjc5OFsEo/page/SPkf
Figure 17: Dashboard
Step 13. Update of BW Dashboard (pending)
Figure 18 : Update BW reporting
5. Contacts:
For maintenance
Figure 19 : Maintenance rules
For editable documentation:
https://docs.google.com/presentation/d/1bKRxVjK6c34jx980FuduR_Hnlb7mZaCDPam5hWNLLvQ/edit?usp=sharing
Remark:
PO2 Project (2023)
Solvay split into ECO and SCO, with the spin-off on 8 Dec 2023. As a result, the data engineers needed to make the following modifications for PO2:
- Add a filter on credit controlling area SYEN (previously the only value was SOLV). This change starts at the DTP in BW; the new value SYEN was then added to all the filters in BQ as well.
- Change the intercompany calculation. Before, a transaction between SCO and ECO was considered intercompany; after the spin-off it is considered external. To implement this new calculation, a scope field (c_scope) was added to the company and customer master data and to the BQ view views.documents_view: a document is intercompany only when company_scope = customer_scope, and only in that case is amount_EUR set to null.
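The post-spin-off intercompany rule can be sketched as follows (a minimal illustration of the scope comparison described above):

```python
def interco_amount_eur(company_scope, customer_scope, amount_eur):
    """After the PO2 spin-off, a transaction is intercompany only when
    company and customer belong to the same scope (e.g. both SCO or
    both ECO); in that case amount_EUR is nulled in documents_view."""
    if company_scope == customer_scope:
        return None  # intercompany: amount_EUR set to null
    return amount_eur  # cross-scope (SCO <-> ECO) now counts as external

same_scope = interco_amount_eur("SCO", "SCO", 100.0)
cross_scope = interco_amount_eur("SCO", "ECO", 100.0)
```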