Table of contents:

1.    Objectives of the document:

This document provides a technical overview of the solution delivered by the BigData & Analytics team, inspired by the previous version of the analytics project developed by D3S (https://wiki.solvay.com/display/BDA/PCM+-+Predictive+Credit+Management).

It explains the building blocks of the source code and the methodology followed to obtain:

2.    General presentation of the solution:

The main business stake is to increase overdue coverage with the existing task force. As of today, dunning and pre-dunning actions focus on the largest outstanding amounts, leaving aside smaller accounts (below a threshold). Pre-dunning includes additional rules applied by the cash collection teams through a time-consuming manual process.

Cash collection is steered with End of Month KPIs. Although not necessarily representative of the cost of working capital, EOM metrics are relevant as they are fully aligned with other business steering indicators. Predictive analytics are a way forward, especially to better address the smaller accounts on which the overdue rate is higher.


 

Figure 1: objectives and core principles 

Functional overview

Figure 1bis:  functional overview 


Machine learning methodology via Dataiku

The predictive solution leverages machine learning technology. A model is first trained on payment history to learn customer behavior from all available characteristics. For new customers, the model infers behavior from the available data (country, currency, sector, invoice characteristics, etc.).


Figure 2: Machine learning description 


Data engineering orchestration via Talend & BigQuery

F200_Daily_Delta:

Process flow:

TALEND:

Several Talend pipelines cover the whole project at each iteration of the update. They interact with the various components of the project (SAP BW, SFTP server, Google BigQuery, Google Cloud Storage, Dataiku Data Science Studio, Google Cloud Functions).

List of all Talend pipelines used for the project:


Extract files from BW

SFTP folder (test environment) →

 /exploit/BW/PREDICTCM: 


/exploit/BW/PREDICTCM/V2:

With the CET time zone:


// tJava snippet: build a yyyyMMdd_HH timestamp for the CET time zone (fixed UTC+1 offset, not DST-aware)
// Requires java.time.LocalDateTime, java.time.ZoneId and java.time.format.DateTimeFormatter in the tJava import settings
LocalDateTime s = LocalDateTime.now(ZoneId.of("UTC+1"));
String myDate = s.format(DateTimeFormatter.ofPattern("yyyyMMdd_HH"));

// Store the timestamp in the Talend context variable
context.CET_timezone = myDate;

System.out.println(context.CET_timezone);


Loop on the file list and get the metadata of each file, based on the following dictionary:

\DATA\DEV\SBS\PCM\Input\metadata\data_dictionnary_mssql.csv
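As an illustration, here is a minimal sketch of such a lookup in plain Java, assuming a semicolon-separated dictionary whose first column holds the file prefix (the delimiter, the column order and the file prefix used below are assumptions, not the actual Talend implementation):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class MetadataLookup {
    public static void main(String[] args) throws IOException {
        // Dictionary path as referenced above; delimiter and column layout are assumptions
        List<String> lines = Files.readAllLines(
                Paths.get("\\DATA\\DEV\\SBS\\PCM\\Input\\metadata\\data_dictionnary_mssql.csv"));
        String filePrefix = "example_extract";  // hypothetical file name prefix from the SFTP listing
        for (String line : lines) {
            String[] columns = line.split(";");
            if (columns[0].equals(filePrefix)) {
                System.out.println("Metadata row for " + filePrefix + ": " + line);
            }
        }
    }
}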


Upload files into the BigQuery database

   

Upload delta files into the BigQuery database
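As an illustration, a minimal sketch of such a load in append mode with the Google Cloud BigQuery Java client (the dataset, table and GCS object names below are hypothetical; in the project this step is performed by the Talend BigQuery components):

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.FormatOptions;
import com.google.cloud.bigquery.Job;
import com.google.cloud.bigquery.JobInfo;
import com.google.cloud.bigquery.LoadJobConfiguration;
import com.google.cloud.bigquery.TableId;

public class DeltaLoader {
    public static void main(String[] args) throws InterruptedException {
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        // Hypothetical target table and delta file landed on Google Cloud Storage
        TableId table = TableId.of("predict-credit-mgt-v2-dev", "raw_dataset", "delta_trx");
        String sourceUri = "gs://predict-credit-mgt-v2-dev/landing/example_delta_file.csv";
        LoadJobConfiguration config = LoadJobConfiguration.newBuilder(table, sourceUri)
                .setFormatOptions(FormatOptions.csv())
                .setWriteDisposition(JobInfo.WriteDisposition.WRITE_APPEND)  // delta files are appended
                .build();
        Job job = bigquery.create(JobInfo.of(config)).waitFor();
        System.out.println(job.getStatus().getError() == null ? "Load OK" : job.getStatus().getError().toString());
    }
}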

Execution of the SQL scripts stored in the bucket (Google Cloud Storage: predict-credit-mgt-v2-dev-queries)

Execution mode "delta_trx": append mode.

Execution mode "delta_md": create the following tables from the files.

Execution mode "daily_post": append mode.

Send a daily trigger to Dataiku to run the prediction and strategy computation. This executes the Dataiku model, which updates input_from_dss.result_table.
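For illustration, a minimal sketch of such a trigger through the Dataiku DSS public REST API (the DSS host, project key and scenario id are hypothetical, and the exact endpoint should be checked against the DSS version in use; in the project the call is made from a Talend component):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class DataikuDailyTrigger {
    public static void main(String[] args) throws Exception {
        String apiKey = System.getenv("DSS_API_KEY");  // DSS API key, passed as the basic-auth user name
        String auth = Base64.getEncoder().encodeToString((apiKey + ":").getBytes());
        // Hypothetical DSS host, project key and scenario id
        String url = "https://dss.example.com/public/api/projects/PCM/scenarios/DAILY_PREDICTION/run";
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Authorization", "Basic " + auth)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("DSS responded with status " + response.statusCode());
    }
}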


Use the Cloud Function component to update, through an API, the data stored in Cloud SQL.
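A minimal sketch of such a call to an HTTP-triggered Cloud Function (the function URL and the JSON payload are hypothetical, and authentication, for example with an identity token, is omitted):

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class CloudSqlRefreshCall {
    public static void main(String[] args) throws Exception {
        // Hypothetical HTTP-triggered Cloud Function that refreshes the data stored in Cloud SQL
        String url = "https://europe-west1-predict-credit-mgt-v2-dev.cloudfunctions.net/update-cloudsql";
        String payload = "{\"table\": \"result_table\", \"action\": \"refresh\"}";  // illustrative body
        HttpRequest request = HttpRequest.newBuilder(URI.create(url))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(payload))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println("Cloud Function responded with status " + response.statusCode());
    }
}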


Google Cloud Storage:

Project:

Dev → predict-credit-mgt-v2-dev

Prod → predict-credit-mgt-v2-prod

Bucket:

predict-credit-mgt-v2-dev-queries


BigQuery:

Project:

Dev → predict-credit-mgt-v2-dev

Prod → predict-credit-mgt-v2-prod

Dataset:


F500_Training:


Note: this flow exists only in the dev environment and is executed on demand.



Google Cloud Storage:

Project:

Dev: predict-credit-mgt-v2-dev

Bucket:

predict-credit-mgt-v2-dev-queries

BigQuery:

Project:

Dev: predict-credit-mgt-v2-dev


User interface exposure via AppEngine

A simple web app has been developed to monitor and prioritize cash collection. For UI details, see the documentation in Confluence. The source code is available in the version control tool (Bitbucket repository) and through the Google Cloud SDK with the appropriate user credentials.

Figure 4 :  Web interface with main features

3.    Workflow description

There are several building blocks in, or interacting with, the solution:

These building blocks are linked through the functional steps described above.

Figure 5: Building blocks and interactions of the whole solution

4.    Workflow details

Step 1. Full & Daily raw data ingestion

Figure 6 : Schematic description

Through Data Transfer Processes in SAP BW and SFTP connectors, all raw data retrieved from SAP for the project is moved from the SAP environment to GBQ datasets in the GCP project corresponding to each environment:

Several details to consider when maintaining and upgrading this workflow:




 

Step 2. CCT raw data ingestion


Figure 7 : Schematic description

Several pieces of information are manually entered each month in several tabs of a collaborative spreadsheet. This information is transformed into KPIs for the forecast and strategy computation:

Several details to consider when maintaining and upgrading this workflow:

  

Step 3. Iterative orchestration and data preparation to GBQ

This step handles the daily and on-demand updates of the master data and transactional data stored in GBQ. Talend communicates with Google Cloud Storage to launch GBQ saved queries that organize the extract-transform-load process, as sketched below.
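As an illustration, a minimal sketch of this pattern with the Google Cloud Java clients: the SQL text is read from the queries bucket and submitted to BigQuery (the object name is hypothetical; in the project this is orchestrated by Talend components):

import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
import com.google.cloud.bigquery.QueryJobConfiguration;
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

public class SavedQueryRunner {
    public static void main(String[] args) throws InterruptedException {
        // Read the saved query from the queries bucket; the object name is hypothetical
        Storage storage = StorageOptions.getDefaultInstance().getService();
        Blob blob = storage.get("predict-credit-mgt-v2-dev-queries", "delta_trx/example_transform.sql");
        String sql = new String(blob.getContent());

        // Execute the transformation in BigQuery and wait for completion
        BigQuery bigquery = BigQueryOptions.getDefaultInstance().getService();
        bigquery.query(QueryJobConfiguration.newBuilder(sql).build());
        System.out.println("Saved query executed");
    }
}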


Figure 8 : List of sql files for Talend transformations

 

Step 4. Model Design

The model is built in Dataiku's Data Science Studio (DSS) platform, a user-friendly interface built on Python.

This SaaS tool provides several data connectors to external databases and storage, such as Google Cloud Storage and GCP services.

The model object ("New RF on train sample") is computed on demand by a data scientist with a dedicated Python library, scikit-learn, stored as a pickle object inside the platform, and made available to other Dataiku projects depending on the access policy.


Figure 9 : Model details on dataiku.

 

 Step 5. Validation & Accuracy computation

Each version of the model is evaluated automatically by the data science platform. The split between train and test samples is based on the "net due date" variable: the oldest documents are used for training (approximately M-48 to M-7), the newest for testing (M-6 to M-1).
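As an illustration, a minimal sketch of this date-based split (the exact window boundaries and the handling of the current month are assumptions drawn from the description above):

import java.time.LocalDate;
import java.time.YearMonth;

public class TrainTestSplit {
    // Classifies a document by its net due date relative to the current month
    static String sampleFor(LocalDate netDueDate) {
        YearMonth month = YearMonth.from(netDueDate);
        YearMonth current = YearMonth.now();
        if (!month.isBefore(current.minusMonths(48)) && month.isBefore(current.minusMonths(6))) {
            return "train";      // oldest documents, approximately M-48 to M-7
        }
        if (!month.isBefore(current.minusMonths(6)) && month.isBefore(current)) {
            return "test";       // newest documents, M-6 to M-1
        }
        return "excluded";       // outside the evaluation window
    }

    public static void main(String[] args) {
        System.out.println(sampleFor(LocalDate.now().minusMonths(12)));  // train
        System.out.println(sampleFor(LocalDate.now().minusMonths(3)));   // test
    }
}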


Figure 10: Accuracy computation on dataiku.


Step 6. Daily prediction

Each working day, at approximately 10:30 for the dev environment and 11:00 for the prod environment, the prediction based on document data is released.

Figure 11: Daily prediction process

Step 7. Strategy specifications

The strategy design process is shown below:


Figure 13: Strategy specification

 The link for the source spreadsheet:

https://docs.google.com/spreadsheets/d/1m5s2--qrjfdl5QfNG1rHBNnzWXRY8bzdCwtA23omHPo/edit?usp=sharing_eip&ts=5e5fb009

Step 8. Strategies daily computation

 

Figure 14 : Strategy Daily computation


Here is the link to a useful document for the design:

              https://drive.google.com/file/d/1XhsQsZnmlyEPA2PIs2pkXYFHxVQyWw8F_AomyPTjFmk/view


Step 9. User roles specification for the back end:

Each user of the application should have access to a specific documents portfolio: split by region for a standard user, and related to a dedicated group otherwise.
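As an illustration, a minimal sketch of this access rule (the field names, roles and record types are hypothetical, not the actual back-end data model):

import java.util.List;
import java.util.stream.Collectors;

public class PortfolioFilter {
    // Hypothetical records carrying only the fields needed for the access rule
    record CollectionDocument(String id, String region, String collectionGroup) {}
    record User(String name, String role, String region, String group) {}

    // Standard users see the documents of their region; other users see those of their dedicated group
    static List<CollectionDocument> portfolioFor(User user, List<CollectionDocument> documents) {
        return documents.stream()
                .filter(d -> "STANDARD".equals(user.role())
                        ? d.region().equals(user.region())
                        : d.collectionGroup().equals(user.group()))
                .collect(Collectors.toList());
    }
}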



 

Figure 15: User roles specification

 

More information is available in the dedicated spreadsheet on the dev environment:

Step 10. Daily update / archive of the UI

Figure 15bis : Daily update

For more detail, please see the documentation:  https://drive.google.com/file/d/1W9Qsa7aO4lVHNYybhsc14pZ3JPudQF33ajr3GbZ6xso/view


Step 11. Real time actions archiving

 

Figure 16: Actions archiving



Step 12. Real time dashboard


Link to the dashboard: https://datastudio.google.com/u/0/reporting/1l7Utyq5GIaVdRCbMpkJrKIbMjc5OFsEo/page/SPkf


Figure 17: Dashboard  

Step 13. Update of BW Dashboard (pending)


Figure 18 :  Update BW reporting  


5.    Contacts:

For maintenance

Figure 19 :  Maintenance rules 

For editable documentation:

https://drive.google.com/file/d/10BIBnqLgH9Ek1Axv3QHv8CBtZdJYbq38k7ue31N2JGo/view


Remark:

PO2 Project (2023) 

Solvay split into Eco and Sco, with the spin-off effective on 8 Dec 2023. As a result of this, the Data Engineer needs to modify the following for PO2.