Objectives of the document
This document provides a technical overview of the solution delivered by D3S. It explains the building blocks of the source code and the methodology followed to obtain insightful predictions for cash recovery.
General presentation of the solution
Business stake and approach
The main business stake is to increase overdue coverage with the existing task force. As of today, dunning and pre-dunning actions focus on the largest outstanding amounts, leaving aside smaller accounts (below a threshold). Pre-dunning includes additional rules applied by cash collection teams through a time-consuming manual process.
Cash collection is steered with End of Month KPIs. Although not necessarily representative of the cost of working capital, EOM metrics are relevant as they are fully aligned with other business steering indicators.
Predictive analytics are a way forward, especially to better address the smaller accounts on which the overdue rate is higher.
Wrap Up
Figure 1: objectives and core principles
- Machine learning technology models deep correlation patterns from past behavior
- A model is trained on 2 years' history, including all documents paid in month or not
- The predictive engine generates risk scores allowing collectors to focus on the main stakes
- Cash collectors follow the proposed priority, applying their expert judgment as required (known event at payer, dunning blocked, etc.)
- Model performance is monitored monthly through a dedicated report (QlikView)
- Statistics on each payer's past payment behavior are also made available without drilling down into SAP
Machine learning methodology
The predictive solution leverages machine learning technology. A model is first trained on payment history to learn customer behavior based on all available characteristics. For new customers, the model infers behavior from available data (country, currency, sector, invoice characteristics, etc.).
Figure 2: overview of the machine learning methodology
Outputs available to Cash collectors
The predictive solution generates cash recovery insights at payer and document levels. The illustration below shows the indicators computed at PRS Customer Zone/GBU/PRS Customer level. Similar information is available to drill down on underlying documents, either open or due in the current month.
Figure 3: illustration of indicators computed by the solution (view per GBU)
Technical description
Solution Architecture
Overview
There are four building blocks in or interacting with the solution:
- BW/SAP server: interaction in/out from the SAP system
- SFTP/SSH server/client: file exchange (in/out) and remote process execution
- Python server: predictive engine and working list generation
- MSSQL server: data storage, features engineering and processing
Figure 4 is an overview of the solution architecture. BW pushes new data onto the BW SFTP server (1) and launches a remote command to trigger all processing (2). The Python server connects to the BW SFTP server (using an SFTP client) to download data files (3). These files are inserted into SQL Server tables, where they are processed and pushed back to the Python server for predictive modeling. Results are then pushed back to the SQL server. Result tables are available through BW, connecting directly to the SQL server (4).
Figure 4. Solution architecture overview
Communication process between BW and the solution server
- Execution is launched through an SSH command, running a .bat file on the SFTP server
- End of processing is indicated through the .bat return code
- BW then reads a specific status table in SQL Server, with a code indicating the status and the actions required:
- CODE = 0: results are ready to be uploaded; BW reads the result tables in the SQL server
- CODE = 1: error #1, data upload issue; BW regenerates the input files and re-launches the SSH command
- CODE = 2: error #2, issue encountered; specific manual action required (e.g. full reload of the data set)
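The status-code convention above can be sketched as a small dispatch helper. This is illustrative only; the actual handling is implemented on the BW side, and `next_action` is a hypothetical name:

```python
def next_action(code):
    """Map the exit CODE from the SQL status table to the follow-up action.

    Codes follow the convention described above; this helper is illustrative
    and not part of the delivered application.
    """
    actions = {
        0: "read the result tables in the SQL server",
        1: "regenerate the input files and re-launch the SSH command",
        2: "specific manual action required (e.g. full reload of the data set)",
    }
    if code not in actions:
        raise ValueError("unknown status code: %r" % code)
    return actions[code]
```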
Table 1 shows an example of the status table. The columns are the following:
- TIMESTAMP: start timestamp of the command
- TIMESTAMP_END: end timestamp of the command
- COMMAND: name of the command executed
- CODE: exit code of the application
- MESSAGE: description of the exit code
Table 1 Example of STATUS table
TIMESTAMP | TIMESTAMP_END | COMMAND | CODE | MESSAGE |
26/04/2017 13:16 | 26/04/2017 13:19 | run | 0 | OK |
Inbound connectors
For the inbound connector, BW interacts with the SFTP/SSH server (see Figure 5).
Figure 5. Inbound connectors sequence diagram
The BW server pushes CSV files onto the BW SFTP server. The following CSV files are expected:
- C_CST_CA2: credit code table (full replacement at each upload)
- C_COMPCDE: company code table (full replacement at each upload)
- C_CUSTID: customer code table (full replacement at each upload)
- GL_ACCOUNT: gl account code table (full replacement at each upload)
- G_CWWE01: sub activity table (full replacement at each upload)
- TCURR: exchange rate table (full replacement at each upload)
- COUNTRY: country table (full replacement at each upload)
- DBFIAR20: credit management table (incremental upload)
- DBFIAR21: order blocked table (incremental upload)
An SSH remote command then launches the process called "pacm-run.bat".
Outbound connectors
Figure 6. outbound connectors with MSSQL sequence diagram
For the outbound connector, BW interacts with the MSSQL database (Figure 6) to read the result tables (schema cm_data). The result table names are suffixed according to the server:
- PRODUCTION SERVER: <SERVER> = PRD
- PRE-PRODUCTION/TEST SERVER: <SERVER>=TST
The results tables are the following:
- PRIORITY_PER_AMOUNT_<SERVER>: predictive indicators and statistics on each open document within prechasing and chasing scope for the month
- PRIORITY_PER_PAYER_<SERVER>: predictive indicators aggregated by payers, used for cash collection priorities setting
- PERF_MONITORING_<SERVER>: stores predictive performance of the past 4 months (predictive performance indicators are computed on the first day of each month, from previous months' results).
For example, on the production server the name of the first result table is PRIORITY_PER_AMOUNT_PRD.
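As a minimal sketch, the suffixing convention can be expressed as follows (`result_table_name` is a hypothetical helper, not part of the delivered code):

```python
def result_table_name(base, server):
    """Build a result table name from its base name and the server suffix.

    The suffix is 'PRD' on production and 'TST' on pre-production/test,
    following the convention described above.
    """
    if server not in ("PRD", "TST"):
        raise ValueError("unexpected server suffix: %r" % server)
    return "%s_%s" % (base, server)

print(result_table_name("PRIORITY_PER_AMOUNT", "PRD"))  # PRIORITY_PER_AMOUNT_PRD
```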
Solution building blocks
Figure 7 describes all the interactions between the different components.
Figure 7. Sequence diagram
MS SQL server
SQL server is used for data storage, features engineering and processing.
Version : MS SQL Server 2012
Connection
Type | Instance | DB |
Pre-production | WDCPABP21\PACM_DBPRD | PACM_DBTST |
Production | WDCPABP21\PACM_DBPRD | PACM_DBPRD |
Inputs/outputs
The input tables are attached to the schema cm_data:
- C_CST_CA2: credit code table
- C_COMPCDE: company code table
- C_CUSTID: customer code table
- GL_ACCOUNT: gl account code table
- G_CWWE01: sub activity table
- TCURR: exchange rate table
- COUNTRY: country table
- DBFIAR20: credit management table
- DBFIAR21: order blocked table
Results tables are attached to the schema cm_data:
- PRIORITY_PER_AMOUNT_<SERVER>: list of priority per invoices
- PRIORITY_PER_PAYER_<SERVER>: list of priority per payer
- PERF_MONITORING_<SERVER>: tracks the performance of the past 4 months
Python
Version :
- python 3.5.2
- scikit-learn 0.17.1
- pandas 0.18.1
- pymssql 2.1.3
- openpyxl 2.3.2
- xlrd 1.0.0
Connection
Type | Instance |
Pre-production | WDCPAAT21 |
Production | WDCPAAP21 |
Inputs/outputs
All interactions with the SFTP server go through the pacm workspace ("F:\pacm_workspace"). The workspace folders are the following:
- archive: stores, for each run, the content of the inbox folder
- bin: contains all the executables (run, train, reset_db, ping)
- etc: contains the configuration files for logging (loging.config) and for the application (pacm.config)
- inbox: composed of one folder (in) used to receive new data
- logs: stores the logs
SFTP server
Version : Bitvise SSH Server 7.16
Connection
Type | Instance |
Pre-production | WDCPAAT21 |
Production | WDCPAAP21 |
Inputs/outputs
Inputs are pushed into the workspace inbox folder F:\pacm_workspace\inbox\in to feed new data to the application.
SFTP client
Version : Bitvise SSH Client 7.22
Connection
Type | Instance |
Dev | wbdsapr3.ibm.be.solvay.com |
Production | wbpsapr3.ibm.be.solvay.com |
Inputs/outputs
Download all the files from /exploit/BW/PREDICTCM into the workspace inbox folder F:\pacm_workspace\inbox\in using the following command :
sftpc -profile=F:\pacm-workspace\bin\bw_prod_sftp.tlp -hostKeyFile=F:\pacm-workspace\bin\bw_prod_sftp.pub -cmd="cd PREDICTCM; get * F:\pacm-workspace\inbox\in -o"
Deep-dive on solution building blocks
Table 2 describes the main steps of the solution. There are four main steps:
- Get data from BW
- Load data and compute features
- Predictive engine and working list generation
- Performance assessment
Table 2. Main steps
ID | Main step | Step | Description | Location |
|---|---|---|---|---|
0 | Get data from BW | From BW to python | Download the data from BW to the python inbox folder | Batch script |
1 | Load data and compute features | From python to SQL | Push the raw data to SQL | Python function |
 | | Features computation | Compute the customer features based on the raw data | SQL queries |
2 | Predictive engine and working list generation | Load the prediction models | Load the prediction models from the user workspace | Python function |
 | | Get the data from SQL | Load the data from SQL in python | Python function |
 | | Predict the model | Apply the prediction model | Python function |
 | | Chasing adjustment | Adjust the prediction for chasing invoices | Python function |
 | | Push the predictive indicators to SQL | Write the predictions in SQL | Python function |
 | | Generate result tables | Generate result tables in SQL | SQL queries |
3 | Performance assessment | Get the results of the past month | Compute the results of the past month | SQL queries |
 | | Compute performance report | Compare the prediction and the reality | SQL queries |
4 | Archive input data | Archive input data | Move the input data to the archive folder | Python function |
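The sequence of main steps can be sketched as a simple orchestration loop. This is a hypothetical illustration (the real application is driven by run.bat and the python functions listed in Table 8); the exit codes follow the STATUS table convention (0 = OK, 1 = data upload issue, 2 = other error):

```python
def run_pipeline(steps):
    """Run pipeline steps in order and return a STATUS-table-style exit code.

    `steps` is a list of (name, callable) pairs. By convention here, a failure
    of the data-download step maps to code 1 (data upload issue) and any other
    failure maps to code 2; success of all steps returns 0.
    """
    for name, step in steps:
        try:
            step()
        except Exception:
            return 1 if name == "get_data_from_bw" else 2
    return 0

# Usage sketch with placeholder steps mirroring Table 2:
steps = [
    ("get_data_from_bw", lambda: None),                 # download CSV files from BW
    ("load_data_and_compute_features", lambda: None),   # push raw data, compute features
    ("predict_and_generate_working_list", lambda: None),
    ("assess_performance", lambda: None),
    ("archive_input_data", lambda: None),
]
print(run_pipeline(steps))  # 0
```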
MS SQL server
MS SQL server is used for data storage, features engineering and processing. All the queries are located in: *app\core\sql*
We focus on the online version (app\core\sql\online) of the sql queries in this section. The offline queries (app\core\sql\offline) are used to build up the training dataset. They can be found in annex "Off line sql queries" (page ). For each source table, all column names and types are described in annex DB table details (page 29).
Figure 8 describes the data model used in the MS SQL server. There are four main steps:
- Create schema and table in the database
- Preprocess data to compute predictive features (used by the Python predictive model)
- Generate result tables, including predictive insights and statistics about past behavior
- Evaluate predictive performance, comparing what was predicted and what was ultimately observed (monthly)
Figure 8. SQL Data model
Create database
Folder: app\core\sql\create_db
Description: clear and recreate the full schemas:
- cm : containing all input data
- cm_view : temporary views for online
- cm_view_training : temporary views for training
- cm_output: results
- cm_data: view used to interact with BW
List of files:
- 000_create_schema_cm_view_training: : create cm_view_training schema
- 0a_create_schema_cm.sql: create cm schema
- 0b_create_schema_cm_view.sql: create cm_view schema
- 0c_create_schema_cm_output.sql: create cm_output schema
- 1a_create_current_date_table.sql: create a table with the current date
- 1_create_table.sql: create input data tables
- 2_Solvay_functions.sql: define all the specific functions used in the queries
- 3_create_input_data_view.sql: create cm_data schema and view with the input data
- 4_create_payment_table.sql: create PAYMENT_DESC_TABLE with the description of the payment method
Preprocess data to compute predictive features
This section follows the graph numbering in Figure 8.
00 – current_data_table
Python function: app.simul.predictive_model.update_simulation_day()
This table contains the current timestamp in UTC; it is updated by the python script after loading the data in the database.
Table 3. current_date_table preview
01 – currency_data_per_month
Folder: app\core\sql\create_view\online
Files: 01_transform_currency.sql
From: TCURR table
Where: -
Group by: -
Description: Fill the currency table based on the TCURR table, with one exchange rate per month of the year. If no rate is given for a month, the last available rate is used
Note: K_FFACT and K_TFACT cannot be zero; zero values are replaced by 1
Table 4. currency_data_per_month table preview
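The forward-fill logic can be illustrated in pure Python. This is a sketch of the behavior of 01_transform_currency.sql, not the query itself; `fill_monthly_rates` and its input shapes are hypothetical:

```python
def fill_monthly_rates(rates_by_month, months):
    """Forward-fill monthly exchange rates, mimicking 01_transform_currency.sql.

    rates_by_month: {month: (rate, k_ffact, k_tfact)} for months present in
    TCURR; months: ordered list of months to fill. When a month has no entry,
    the last available rate is reused. Zero conversion factors are replaced
    by 1, as stated in the note above.
    """
    filled, last = {}, None
    for m in months:
        if m in rates_by_month:
            rate, ff, tf = rates_by_month[m]
            last = (rate, ff or 1, tf or 1)  # replace zero factors by 1
        if last is not None:
            filled[m] = last
    return filled

rates = {"2017-01": (1.07, 1, 1), "2017-03": (1.08, 0, 1)}
print(fill_monthly_rates(rates, ["2017-01", "2017-02", "2017-03"]))
```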
02 – currency
Folder: app\core\sql\create_view\online
Files: 02_Currency_preprocessing.sql
From: currency_data_per_month
Where: -
Group by: -
Description: Compute historical statistics for each currency
Table 5. currency table preview
02a – update currency table
Folder: app\core\sql\create_view\online
Files: 02a_modif_currency_table_for_JPY_KRW
From: currency_data_per_month
Where: -
Group by: -
Description: Update the CURRENCY table for the currencies JPY and KRW. SAP stores JPY and KRW amounts divided by 100. This query modifies the currency table to add a multiplication by 100 (K_TFACT column).
Warning: Hard-coded data modification on the specific currencies JPY and KRW
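As an illustration of this correction (`adjust_factor` is a hypothetical helper mirroring what the query does to the K_TFACT column):

```python
def adjust_factor(currency, k_tfact):
    """Return the K_TFACT to use for a currency.

    JPY and KRW amounts are stored divided by 100 in SAP, so their factor is
    multiplied by 100 (hard-coded correction mirroring
    02a_modif_currency_table_for_JPY_KRW).
    """
    return k_tfact * 100 if currency in ("JPY", "KRW") else k_tfact

print(adjust_factor("JPY", 1))  # 100
print(adjust_factor("EUR", 1))  # 1
```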
03 – credit_limit
Folder: app\core\sql\create_view\online
Files: 03_Credit_limit_preprocessing.sql
From: C_CST_CA2, currency
Where: -
Group by: -
Description: Convert the credit limit in EUR
Table 6. credit_limit table preview
04– documents_view
Folder: app\core\sql\create_view\online
Files: 04_Documents_list.sql
From: DBFIAR20, C_CUSTID, GL_ACCOUNT, current_date_table, currency, C_COMPCDE, G_CWWE01, PAYMENT_DESC_TABLE
Where: GL_ACCOUNT."C_GL_TYPE" = 'RECEIVABLES' and (GL_ACCOUNT."C_GL_STYP" = 'PRODUCT AND SERVICES' or GL_ACCOUNT."C_GL_STYP" = 'SERVICES NON DOUBT' or GL_ACCOUNT."C_GL_STYP" = 'PRODUCT NON DOUBT') and DBFIAR20."0C_CTR_AREA" = 'SOLV'
Group by: -
Description: Preliminary filter to reduce dimensions and transform key data used in other views. Converts and consolidates the amounts in EUR.
Note: For converting the amount to EUR, the rate associated with the "0CLEAR_DATE" is used. If the document is still open, the most recent exchange rate is used.
Warning: Hard coded document filtering is implemented here
05– documents_perimeter
Folder: app\core\sql\create_view\online
Files: 05_Documents_filtered_perimeter.sql
From: documents_view
Where: (documents_view."0CREATEDON" < documents_view."0CLEAR_DATE" or documents_view."0CLEAR_DATE" is null) and documents_view."Amount EUR" is not null
Group by: -
Description: Full data sample on which customer features are computed. Targets to be predicted are computed, as well as most features directly related to the document or static customer attributes. The condition on documents."Amount EUR" allows removing documents outside the scope. Table 7 explains the WHERE clause.
Table 7 Explanation of the where clause for documents_perimeter
Query part | Explanation |
documents_view."0CREATEDON" < documents_view."0CLEAR_DATE" | remove documents created and cleared the same day (automatic clearing) |
documents_view."0CLEAR_DATE" is null | keep open documents |
documents_view."Amount EUR" is not null | remove documents out of the scope (see Amount EUR calculation in documents_view) |
06– clearing_list_for_payer_history
Folder: app\core\sql\create_view\online
Files: 06a_Clearing_list_by_payer_for_payer_history.sql
From: documents_view
Where: -
Group by: "Payer ID"
Description: this table computes the average payment day and week of a customer for the last 12 months
07– OB_payer_current_date
Folder: app\core\sql\create_view\online
Files: 07_OB_payer_per_current_date.sql
From: DBFIAR21
Where: -
Group by: "Payer ID "
Description: compute the block orders information per payer (number of blocked order, average resolution time)
08– payer_history_per_month
Folder: app\core\sql\create_view\online
Files: 08_Payer_history_per_month.sql
From: documents_perimeter
Where: ((documents_perimeter.C_FCONNUM is not null and documents_perimeter.[0LOGSYS] = 'PI1_020' and ([0CLEAR_DATE] >= '2017-02-01' or [0PSTNG_DATE] >= '2017-02-01')) or ([0CLEAR_DATE] < '2017-02-01') or (documents_perimeter.C_FCONNUM is null))
Group by: "Payer ID"
Description: past payer behavior over the last 12, 6 and 3 months, accounting for all previous documents and keeping only the legal document for documents with a factoring contract number.
10– OD_at_current_Date
Folder: app\core\sql\create_view\online
Files: 10_Open_Documents_at_current_date.sql
From: documents, current_date_table
Where: -
Group by: -
Description: for each payer and current date, this view computes contextual indicators accounting for all open documents
11– core_perimeter
Folder: app\core\sql\create_view\online
Files: 11_Core_perimeter.sql
From: documents, current_date_table, currency, payer_history, payer_history_distinct, OD_at_Due_Date, credit_limit, OB_payer_due_date, DBFIAR20, C_COMPCDE, C_CST_CA2, C_CUSTID, GL_ACCOUNT, G_CWWE01, COUNTRY
Where:
- documents_perimeter."0FI_DOCSTAT" = 'O' : keep only open document
- (current_date_table."Current Date" < C_COMPCDE."C_MERGDAT" or C_COMPCDE."C_MERGDAT" is null): remove company merged
- documents_perimeter."0NETDUEDATE" <= EOMONTH(current_date_table."Current Date"): keep only open document with a net due date in the current month
Group by: -
Description: aggregates all previously computed information into one single table. This table is then used to apply the predictive model in Python.
12–result_table
Python function:
- app.core.model.predictive_model_pckg.get_open_documents_from_db(): extract from core_perimeter all the data where "0NETDUEDATE" is before or in the current month, then apply the predictive model
- app.core.model.predictive_model_pckg.push_pred_in_db(): push the predictions into the result table
13–PRIORITY_PER_AMOUNT
Folder: app\core\sql\report
Files: 0a_outstanding_amount.sql, 0b_priority_per_amount.sql
From: DBFIAR20, result_table
Where: -
Group by: -
Description: for each document, this view computes contextual indicators accounting for all open documents
14–PRIORITY_PER_PAYER
Folder: app\core\sql\report
Files: 00_Clearing_list_by_payer_for_payer_history_per_GBU.sql, 00_Payer_history_per_month_per_zone_logsys_c_custid.sql, priority_payer.sql, 1_priority_C_CUSTID_payer.sql
From: PRIORITY_PER_AMOUNT
Where: -
Group by: [Payer ID], [0LOGSYS], [C_CUSTID], [PRS Cust. Zone (FI)], [GBU]
Description: for each GBU/payer, this view computes contextual indicators accounting for all open documents
15a–result_table_with_truth
Folder: app\core\sql\evaluate_perf
Files: 01_create_truth_table.sql
Description: Merge the result table of the prediction with what was observed during the previous month.
15b–PERF_MONITORING
Used for the performance evaluation – see User guide/Monitoring
Folder: app\core\sql\evaluate_perf
File: 02_perf_per_day.sql
Description: Performance of the algorithm is computed for each day and each zone from the previous month information. This query is executed at the beginning of each month.
Python
Table 8 describes the main python functions called during the generation of the working list.
Table 8. Main python functions for working list generation
ID | Main step | Description | Python function |
0 | Get data from BW | Download the data from BW to the python inbox folder | app.process.get_data_from_bw_sftp() |
1 | Load data | Push the raw data to SQL | app.core.predictive_model.PredictiveModel.insert_data_in_db() |
 | | Compute the customer features based on the raw data | app.predictive_model.create_all_view() |
2 | Predictive engine and working list generation | Load the prediction models from the user workspace | app.core.predictive_model.PredictiveModel.load() |
 | | Load the data from SQL in python | app.core.predictive_model.PredictiveModel.get_open_documents_from_db() |
 | | Apply the prediction model | app.core.predictive_model_pckg.load_and_apply_model() |
 | | Adjust the prediction for chasing invoices | app.core.predictive_model_pckg.apply_reguralize_ratio_model() |
 | | Write the predictions in SQL | app.core.predictive_model.PredictiveModel.push_pred_in_db() |
 | | Execute SQL queries to generate the result tables | app.predictive_model.create_report() |
3 | Performance assessment | Select results of the previous month | app.predictive_model.evaluate_perf() |
 | | Compare past predictions with observed payment behavior | app.predictive_model.evaluate_perf() |
4 | Write reports and archive | Move the input data and report results to the archive folder | app.process.archive_inbox() |
The main parts of the python code are in:
- "app/core": main functions for model prediction
- "app/simul": functions used for simulation
- "app/sklearn_ext": functions used to preprocess the data
- "app/io": functions to read or write files
- "app/utils": utility functions (SQL connection, etc.)
User guide
Installation
Two zipped files are needed to install the solution:
- pacm-0.8.0.zip: pacm application
- pacm-model.zip: predictive model
Step-by-step installation guide:
- Unzip the application file in the folder "E:\PACM<version>"
- Copy the folder "E:\PACM<version>\workspace_template" to "F:\pacm_workspace"
- Update the BIN_FOLDER in "F:\pacm_workspace\bin\_env.bat":
set BIN_FOLDER=E:\PACM<version>\bin
Configure the "F:\pacm_workspace\etc\pacm.config":
[python server]
# server where the application is run: TST or PRD
Server=TST
[sql_server]
server=***
user=***
password=***
database=***
port=***
# folder used to bulk insert the results in the table
base_bulk_insert_folder=\\WDCPAAT21\bulk_insert
[model]
# to skip the import_data step in case data is inserted directly in db
#skip_import_data=True
[predict]
# model folder to use
model_folder=F:\pacm_workspace\models\20161111
[train]
# base model folder where the new trained model folder will be created,
# named according to current timestamp
model_base_folder=F:\pacm_workspace\models\
training_period_in_month = 36
testing_period_in_month = 6
[dynamic_threshold]
# dynamic threshold used to adjust the risk for customers with large outstanding amounts
amount_threshold_list = [(300000, 0.5), (600000, 0.25), (1200000, 0)]
- To test the connection to the db run: "F:\pacm_workspace\bin\ping.bat"
- To create the db run: "F:\pacm_workspace\bin\create_db.bat"
- Unzip the model (pacm-model.zip) in: "F:\pacm_workspace\models\20161111"
- Launch the executable: "F:\pacm_workspace\bin\run.bat"
Figure 9. pacm workspace
Figure 9 shows the folders in the pacm workspace
Tuning of the dynamic model
A single parameter, described in the dynamic threshold section of the configuration file, allows adapting the risk based on the outstanding amount of a customer. This dynamic threshold is used to compute the Risk Amount for documents with a positive amount, with the following formula:
Risk Amount = Outstanding Amount × Risk Probability
with Risk Probability = 1 if Probability not PIM > threshold, else Risk Probability = Probability not PIM
Table 9 shows the dynamic threshold.
Table 9 Dynamic probability threshold
Total amount outstanding per customer | probability threshold |
0 - 300 K€ | 100% |
300 K€ - 600 K€ | 50% |
600 K€ - 1 200 K€ | 25% |
1 200 K€ - | 0% |
These thresholds can be adjusted in the configuration file, in the dynamic threshold section:
# dynamic threshold used to adjust the risk for customers with large outstanding amounts
amount_threshold_list = [(300000, 0.5), (600000, 0.25), (1200000, 0)]
For example:
- If a customer has an outstanding amount of 1 500 K€, then the probability threshold is set to 0%. It means that all documents of the customer with a positive amount will have a risk probability set to 1. The other documents will keep the value of probability computed by the model.
- If a customer has an outstanding amount of 700 K€, then the probability threshold is set to 25%. It means that all documents of the customer with a positive amount and with a probability not PIM over 25% will have a risk probability set to 1. The other documents will keep the value of probability computed by the model.
- If a customer has an outstanding amount of 100 K€, then the probability threshold is set to 100%. It means that all documents of the customer will keep the value of probability computed by the model.
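The threshold lookup and the risk probability rule can be sketched in Python, consistent with the formula and the three examples above. `probability_threshold` and `risk_probability` are illustrative helpers, not the application's API:

```python
def probability_threshold(outstanding,
                          thresholds=((300000, 0.5), (600000, 0.25), (1200000, 0.0))):
    """Return the probability threshold for a payer's total outstanding amount.

    Mirrors the amount_threshold_list setting: below the first bound the
    threshold is 100% (the model probability is always kept); each bound
    reached lowers the threshold.
    """
    t = 1.0
    for bound, thr in thresholds:
        if outstanding >= bound:
            t = thr
    return t

def risk_probability(p_not_pim, threshold):
    """Risk probability of a positive-amount document: forced to 1 when the
    model probability exceeds the payer's threshold, else kept as-is."""
    return 1.0 if p_not_pim > threshold else p_not_pim

# The three examples above:
print(probability_threshold(1500000))  # 0.0
print(probability_threshold(700000))   # 0.25
print(risk_probability(0.4, 0.25))     # 1.0 (forced)
print(risk_probability(0.4, 1.0))      # 0.4 (kept)
```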
Monitoring
Performance
The solution generates a performance monitoring table at the beginning of each month, comparing the predictive insights generated with the realized payment behavior. Descriptions of the performance monitoring columns are in annex "Description of the performance monitoring table" (page )
The usual predictive performance monitoring metrics are:
- Recall Score: quantifies the true positive rate (value between 0 and 1).
- Precision Score: quantifies how many of the documents flagged as positive (depending on the target, 'not paid in month' or 'overdue') actually are so (value between 0 and 1).
- AUC: Area Under the Curve, for the curve plotting recall and precision at different decision thresholds.
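As a reminder of how the first two scores are derived from confusion counts (an illustrative stdlib sketch, not the solution's monitoring code):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from confusion counts.

    'Positive' here means flagged as not paid in month / overdue, as in the
    monitoring report described above.
    """
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

print(precision_recall(60, 20, 40))  # (0.75, 0.6)
```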
As the main objective is to prioritize cash collection actions, the solution also generates Lorenz curves. They sort customers with different ordering criteria (simulating the working list ranking) and compare the total amount not paid in month for the first N customers. The curves allow comparing three different criteria and monitoring the predictive ranking gain.
Figure 10. Ranking performance (x-axis: customer number)
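The Lorenz-curve construction can be sketched as follows (`cumulative_captured` is a hypothetical helper; the actual curves are produced by the monitoring report). Each ordering criterion yields one curve, and a better criterion captures more of the not-paid-in-month amount earlier:

```python
def cumulative_captured(customers, order_key):
    """Cumulative 'amount not paid in month' captured by the first N customers
    when visited in descending order of order_key.

    Illustrative sketch of the Lorenz-curve construction described above.
    """
    ranked = sorted(customers, key=order_key, reverse=True)
    total, curve = 0.0, []
    for cust in ranked:
        total += cust["not_pim"]
        curve.append(total)
    return curve

customers = [
    {"risk": 0.9, "not_pim": 100.0},
    {"risk": 0.2, "not_pim": 0.0},
    {"risk": 0.7, "not_pim": 50.0},
]
print(cumulative_captured(customers, lambda c: c["risk"]))  # [100.0, 150.0, 150.0]
```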
Technical
All application logs are stored in the folder "F:\pacm_workspace\logs".
All the input data are archived in the volume "F:\pacm_workspace\archive". If the folder size exceeds the current volume size (100 GB), the volume size should be increased.
If the size of the database exceeds the SQL database size limit, the database size should be increased.
Procedure to retrain the model
Attention point: model training is a long process (around 12 hours)
Three models are used:
- Model for prediction of the documents not paid in the current month (prediction_model_training_calibrated_1_PIM.pk)
- Model for prediction of the documents overdue (prediction_model_training_calibrated_1_overdue.pk)
- Regularisation model to correct the probability for long overdue documents (reg_model_All_norm.pk)
When to retrain the model?
The model should be retrained when one of the following happens:
- Major point-in-time degradation of the predictive performance, as shown by the Lorenz curve metrics (see above)
- Smaller but progressive drift over two months, assessed through either:
- Lorenz curve metrics (see above)
- Expected not PIM amount ratio (comparing the expected not PIM amount with the realized amount not Paid In Month). A point-in-time delta can happen due to very large amounts on some overdue documents, but a continuous drift requires attention.
How to retrain the model (step-by-step)?
Retrain model for prediction of not paid documents in the current month / documents overdue
- Set the training period in the configuration file (train section):
[train]
# base model folder where the new trained model folder will be created,
# named according to current timestamp
model_base_folder=F:\pacm_workspace\models\
training_period_in_month = 30
testing_period_in_month = 6
- To train the model run: "F:\pacm_workspace\bin\train.bat"
- A new folder is created in the "model_base_folder"
- Change the new folder path in the predict section of the configuration file:
[predict]
# model folder to use
model_folder=F:\pacm_workspace\models\20161111
This procedure uses a lot of memory and is time consuming (~12h). It is advised to retrain the model on a local computer. To retrain the model locally:
- Copy the online database to your local environment
- Install the application on your computer
- Run the train process
Retrain model for regularization model
To retrain the regularization model, the data of the past four months are used (in table result_table_with_truth).
- To train the model run: "F:\pacm_workspace\bin\train_regularization_model.bat"
- A new folder is created in the "model_base_folder"
- Change the new folder path in the predict section of the configuration file:
[predict]
# model folder to use
model_folder=F:\pacm_workspace\models\20161111
What are the checkpoints?
Check the model validity using the performance assessment file stored in the folder with the trained model. For the overdue model the file is prediction_model_training_calibrated_1_overdue.csv, and for the not-paid-in-month model it is prediction_model_training_calibrated_1_PIM.csv.
Model information stored in the file is the following:
- Total_observation : total number of observations
- Total_observation_Train : number of observations used to train the model
- Total_observation_Test : number of observations used to test the model
- AUC_score : indicator of the global model performance. This score corresponds to the area under the precision-recall curve and should be over 0.85 (see the monitoring performance section).
- Confusion matrix : precision, recall and support on the test data. The first line is for documents paid in month, the second for documents not paid in month.
- Features importances : list of the importance of each feature used by the model
You will find below an example of prediction_model_training_calibrated_1_PIM.csv file.
prediction_model_training_calibrated_1_PIM
Total_observation : 8083490
Total_Observations_Train : 7028867
Total_Observations_Test : 1054623
AUC_score : 0.909
Scores_Matrix :
"precision,recall,support"
"0.889,0.939,807373.000"
"0.756,0.616,247250.000"
Features importances :
"variable_name,feature_importance"
"scenario,0.208"
"Rate_Not_PIM_last_12_month,0.087"
"AVG_delay_over_remaining_days_last_12_month,0.072"
"AVG_delay_over_remaining_days_last_3_month,0.052"
"AVG_delay_over_remaining_days_last_6_month,0.045"
"Ratio_remaining_days_in_month,0.041"
"C_COMPCAF,0.036"
"Amount_Rate_Not_PIM_last_12_month,0.033"
"Ratio_AVG_day_of_month,0.031"
"Rate_Not_PIM_last_6_month,0.029"
"Payment Term,0.029"
"0POST_KEY,0.028"
"Nb_docs_last_12_month,0.026"
"Nb_docs_last_3_month,0.025"
"Amount EUR,0.025"
"Nb_docs_last_6_month,0.024"
"Relative_Amount_vs_OD,0.020"
"Amount_ratio_Late_OD,0.019"
"Ratio_AVG_week_number,0.019"
"Rate_Not_PIM_last_3_month,0.017"
"nb_distinct_clear_date,0.016"
"STDEV_week_number,0.016"
"STDEV_day_of_month,0.015"
"Amount_Rate_Not_PIM_last_6_month,0.014"
"C_DOCTYP,0.014"
"LAST_C_PM_MTHD,0.012"
"GBU_group,0.012"
"Amount_Rate_Not_PIM_last_3_month,0.012"
"Create_post_lag,0.010"
"OD_Due_Date_vs_CL_EUR,0.005"
"OB_nb_overdue,0.004"
"C_COMPCDE__K_INTRAT,0.002"
Procedure for Exec builder
If a modification is made to the source code, the application executable must be rebuilt. The steps to package the python source code into an application are:
- Check the version of the application to build in the file "version.py"
- Run "build.bat"
The compressed application named "pacm-<version>.zip" is created in the folder "dist"
Then follow the installation steps of the user guide.
APPENDIX
Off line sql queries
For training/offline, all the functions work in the cm_view_training schema.
These offline queries are run to compute the dataset used to train the model. To simulate, for each day in the training period, what was available online at that date, a view of DBFIAR20 is created with only the data available at the simulation date. Based on this view, all the customer features are computed (same queries as online, on the cm_view_training schema). To reduce the size of the training data, only days 1, 5, 10, 15, 20, 25 and the last day of each month are used. Figure 11 shows the training data generation mechanism.
Figure 11 Training data generation
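The choice of simulation days can be sketched as follows (`simulation_days` is an illustrative helper; the real selection happens in the offline SQL queries):

```python
import calendar

def simulation_days(year, month):
    """Simulation days used to build the training set: days 1, 5, 10, 15, 20,
    25 and the last day of the month (deduplicated in case the last day
    coincides with one of the fixed days)."""
    last = calendar.monthrange(year, month)[1]
    return sorted(set([1, 5, 10, 15, 20, 25, last]))

print(simulation_days(2017, 2))  # [1, 5, 10, 15, 20, 25, 28]
```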
01 – create schema cm_view_training
Folder: app\core\sql\create_view\offline
Files: 000_create_schema_cm_view_training.sql
From: -
Where: -
Group by: -
Description: create the schema cm_view_training
02 – create credit_mgt_view
Folder: app\core\sql\create_view\offline
Files: 001a_modif_input_data_to_keep_doc.sql
From: DBFIAR20
Where: [0PSTNG_DATE] <= cm_view_training.current_date_table."Current Date"
Group by: -
Description: This view is a preliminary filter that simulates the DBFIAR20 data available at a given date. All the online queries are then applied to this view.
03 – Apply the online queries on credit_mgt_view in schema cm_view_training
Folder: app\core\sql\create_view\offline
Files: 01_transform_currency.sql, 02a_modif_currency_table_for_JPY_KRW.sql, 02_Currency_preprocessing.sql, 03_Credit_limit_preprocessing.sql, 04_Documents_list.sql, 05_Documents_filtered_perimeter.sql, 06a_Clearing_list_by_payer_for_payer_history.sql, 07_OB_payer_per_current_date.sql, 08_Payer_history_per_month.sql, 10_Open_Documents_at_current_date.sql
Description: Same queries as online, applied to the cm_view_training schema and based on credit_mgt_view (the FROM clause of 04_Documents_list.sql is modified)
04 – Save the core_perimeter table in core_perimeter_concat
Folder: app\core\sql\create_view\offline
Files: 12_Core_perimeter_concat.sql
From: core_perimeter
Where: -
Group by: -
Description: Save the previous core_perimeter table
05 – Create training data with the updated target
Folder: app\core\sql\create_view\offline\final_table
Files: 0_create_result_view.sql
From: core_perimeter_concat
Where: ((C_FCONNUM is not null and [0LOGSYS] = 'PI1_020' and ([0CLEAR_DATE] >= '2017-02-01' or [0PSTNG_DATE] >= '2017-02-01')) or ([0CLEAR_DATE] < '2017-02-01') or (C_FCONNUM is null))
Group by: -
Description: Generate the training data, keeping only the legal documents for documents with a factoring contract number
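The WHERE clause above can be restated as a Python predicate for readability. This is an illustrative translation, not code from the application; SQL NULL values are modeled here as None, and parameter names are ours:

```python
from datetime import date

CUTOFF = date(2017, 2, 1)

def keep_document(fconnum, logsys, clear_date, pstng_date):
    """Restatement of the WHERE clause of 0_create_result_view.sql.

    fconnum:    factoring contract number (None if absent)
    logsys:     source logical system, e.g. 'PI1_020'
    clear_date: clearing date (None if the document is still open)
    pstng_date: posting date
    """
    # Documents without a factoring contract number are always kept.
    if fconnum is None:
        return True
    # Documents cleared before the cutoff are kept regardless of system.
    if clear_date is not None and clear_date < CUTOFF:
        return True
    # Otherwise, only factored documents on 'PI1_020' that were cleared
    # or posted on/after the cutoff are kept.
    return (
        logsys == "PI1_020"
        and ((clear_date is not None and clear_date >= CUTOFF)
             or pstng_date >= CUTOFF)
    )
```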
DB table details
Table 11 describes the input format of the tables in the database.
Table 11. Table column definition
TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | DATA_TYPE |
cm_data | STATUS_TST | TIMESTAMP | datetime |
cm_data | STATUS_TST | TIMESTAMP_END | datetime |
cm_data | STATUS_TST | COMMAND | varchar |
cm_data | STATUS_TST | CODE | int |
cm_data | STATUS_TST | MESSAGE | varchar |
cm_data | PRIORITY_PER_PAYER_TST | 0LOGSYS | char |
cm_data | PRIORITY_PER_PAYER_TST | C_CUSTID | char |
cm_data | PRIORITY_PER_PAYER_TST | PRS Cust. Zone (FI) | varchar |
cm_data | PRIORITY_PER_PAYER_TST | GBU | varchar |
cm_data | PRIORITY_PER_PAYER_TST | PRS Customer | varchar |
cm_data | PRIORITY_PER_PAYER_TST | outstanding amount | numeric |
cm_data | PRIORITY_PER_PAYER_TST | overdue amount | numeric |
cm_data | PRIORITY_PER_PAYER_TST | overdue amount_0_4 | numeric |
cm_data | PRIORITY_PER_PAYER_TST | overdue amount_5_10 | numeric |
cm_data | PRIORITY_PER_PAYER_TST | overdue amount_11_30 | numeric |
cm_data | PRIORITY_PER_PAYER_TST | overdue amount_31_60 | numeric |
cm_data | PRIORITY_PER_PAYER_TST | overdue amount_61_90 | numeric |
cm_data | PRIORITY_PER_PAYER_TST | overdue amount_90+ | numeric |
cm_data | PRIORITY_PER_PAYER_TST | Prechasing amt lst 3 dom | numeric |
cm_data | PRIORITY_PER_PAYER_TST | Risk amount not PIM | numeric |
cm_data | PRIORITY_PER_PAYER_TST | probability_Not_PIM | decimal |
cm_data | PRIORITY_PER_PAYER_TST | Prechasing amount | numeric |
cm_data | PRIORITY_PER_PAYER_TST | Risk not PIM prechasing | numeric |
cm_data | PRIORITY_PER_PAYER_TST | Risk overdue prechasing | numeric |
cm_data | PRIORITY_PER_PAYER_TST | Per. Exp. Amt overdue | numeric |
cm_data | PRIORITY_PER_PAYER_TST | chasing amount | numeric |
cm_data | PRIORITY_PER_PAYER_TST | Risk not PIM chasing | numeric |
cm_data | PRIORITY_PER_PAYER_TST | Risk ranking | int |
cm_data | PRIORITY_PER_PAYER_TST | OB_nb_overdue | int |
cm_data | PRIORITY_PER_PAYER_TST | LAST_C_PM_MTHD | char |
cm_data | PRIORITY_PER_PAYER_TST | Nb_docs_last_12_month | int |
cm_data | PRIORITY_PER_PAYER_TST | Nb_docs_Not_PIM_last_12_month | int |
cm_data | PRIORITY_PER_PAYER_TST | Nb_docs_last_6_month | int |
cm_data | PRIORITY_PER_PAYER_TST | Nb_docs_Not_PIM_last_6_month | int |
cm_data | PRIORITY_PER_PAYER_TST | Nb_docs_last_3_month | int |
cm_data | PRIORITY_PER_PAYER_TST | Nb_docs_Not_PIM_last_3_month | int |
cm_data | PRIORITY_PER_PAYER_TST | Nb_late_docs_12_month | int |
cm_data | PRIORITY_PER_PAYER_TST | Cumulated_Delay_last_12_month | int |
cm_data | PRIORITY_PER_PAYER_TST | Nb_late_docs_6_month | int |
cm_data | PRIORITY_PER_PAYER_TST | Cumulated_Delay_last_6_month | int |
cm_data | PRIORITY_PER_PAYER_TST | Nb_late_docs_3_month | int |
cm_data | PRIORITY_PER_PAYER_TST | Cumulated_Delay_last_3_month | int |
cm_data | PRIORITY_PER_PAYER_TST | Pay cycle_dom | numeric |
cm_data | PRIORITY_PER_PAYER_TST | Pay_cycle_week | numeric |
cm_data | PRIORITY_PER_PAYER_TST | Nb GBU for the payer | int |
cm_data | PRIORITY_PER_AMOUNT_TST | simulation_ts | datetime |
cm_data | PRIORITY_PER_AMOUNT_TST | 0LOGSYS | char |
cm_data | PRIORITY_PER_AMOUNT_TST | C_COMPCDE | char |
cm_data | PRIORITY_PER_AMOUNT_TST | C_CUSTID | char |
cm_data | PRIORITY_PER_AMOUNT_TST | 0FISCPER | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | 0AC_DOC_NO | char |
cm_data | PRIORITY_PER_AMOUNT_TST | 0ITEM_NUM | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | 0FI_DSBITEM | char |
cm_data | PRIORITY_PER_AMOUNT_TST | PRS Cust. Zone (FI) | varchar |
cm_data | PRIORITY_PER_AMOUNT_TST | GBU | varchar |
cm_data | PRIORITY_PER_AMOUNT_TST | PRS Customer | varchar |
cm_data | PRIORITY_PER_AMOUNT_TST | Payer ID | varchar |
cm_data | PRIORITY_PER_AMOUNT_TST | outstanding amount | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | overdue amount | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | overdue amount_0_4 | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | overdue amount_5_10 | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | overdue amount_11_30 | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | overdue amount_31_60 | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | overdue amount_61_90 | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | overdue amount_90+ | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | pre-chasing amount last three | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | 0COUNTRY | char |
cm_data | PRIORITY_PER_AMOUNT_TST | Amount EUR | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | 0POST_KEY | char |
cm_data | PRIORITY_PER_AMOUNT_TST | 0NETDUEDATE | date |
cm_data | PRIORITY_PER_AMOUNT_TST | scenario | varchar |
cm_data | PRIORITY_PER_AMOUNT_TST | scenario_last_3_days | varchar |
cm_data | PRIORITY_PER_AMOUNT_TST | Number of days past due date | int |
cm_data | PRIORITY_PER_AMOUNT_TST | Risk amount not PIM | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | probability_Not_PIM | decimal |
cm_data | PRIORITY_PER_AMOUNT_TST | probability_Not_PIM_Risk | decimal |
cm_data | PRIORITY_PER_AMOUNT_TST | proba_overdue | decimal |
cm_data | PRIORITY_PER_AMOUNT_TST | Prechasing amount | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | Risk not PIM prechasing | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | Risk overdue prechasing | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | chasing amount | numeric |
cm_data | PRIORITY_PER_AMOUNT_TST | Risk not PIM chasing | numeric |
cm_data | PERF_MONITORING_TST | evaluation day | datetime |
cm_data | PERF_MONITORING_TST | simulation_day | datetime |
cm_data | PERF_MONITORING_TST | C_CUSTID__C_ZONEFI | char |
cm_data | PERF_MONITORING_TST | precision_Paid_In_Month | numeric |
cm_data | PERF_MONITORING_TST | recall_Paid_In_Month | numeric |
cm_data | PERF_MONITORING_TST | precision_Not_Paid_In_Month | numeric |
cm_data | PERF_MONITORING_TST | recall_Not_Paid_In_Month | numeric |
cm_data | PERF_MONITORING_TST | Expected not PIM amount | numeric |
cm_data | PERF_MONITORING_TST | Not PIM amount | numeric |
cm_data | PERF_MONITORING_TST | Expected not PIM amount ratio | numeric |
cm_data | PERF_MONITORING_TST | NOT_PIM | numeric |
cm_data | PERF_MONITORING_TST | CB_NOT_PIM_50 | numeric |
cm_data | PERF_MONITORING_TST | CB_NOT_PIM_100 | numeric |
cm_data | PERF_MONITORING_TST | CB_NOT_PIM_200 | numeric |
cm_data | PERF_MONITORING_TST | CB_NOT_PIM_300 | numeric |
cm_data | PERF_MONITORING_TST | CB_NOT_PIM_400 | numeric |
cm_data | PERF_MONITORING_TST | CB_NOT_PIM_500 | numeric |
cm_data | PERF_MONITORING_TST | CB_NOT_PIM_1000 | numeric |
cm_data | PERF_MONITORING_TST | A_NOT_PIM_50 | numeric |
cm_data | PERF_MONITORING_TST | A_NOT_PIM_100 | numeric |
cm_data | PERF_MONITORING_TST | A_NOT_PIM_200 | numeric |
cm_data | PERF_MONITORING_TST | A_NOT_PIM_300 | numeric |
cm_data | PERF_MONITORING_TST | A_NOT_PIM_400 | numeric |
cm_data | PERF_MONITORING_TST | A_NOT_PIM_500 | numeric |
cm_data | PERF_MONITORING_TST | A_NOT_PIM_1000 | numeric |
cm_data | PERF_MONITORING_TST | EA_NOT_PIM_50 | numeric |
cm_data | PERF_MONITORING_TST | EA_NOT_PIM_100 | numeric |
cm_data | PERF_MONITORING_TST | EA_NOT_PIM_200 | numeric |
cm_data | PERF_MONITORING_TST | EA_NOT_PIM_300 | numeric |
cm_data | PERF_MONITORING_TST | EA_NOT_PIM_400 | numeric |
cm_data | PERF_MONITORING_TST | EA_NOT_PIM_500 | numeric |
cm_data | PERF_MONITORING_TST | EA_NOT_PIM_1000 | numeric |
cm_data | PERF_MONITORING_TST | REA_NOT_PIM_50 | numeric |
cm_data | PERF_MONITORING_TST | REA_NOT_PIM_100 | numeric |
cm_data | PERF_MONITORING_TST | REA_NOT_PIM_200 | numeric |
cm_data | PERF_MONITORING_TST | REA_NOT_PIM_300 | numeric |
cm_data | PERF_MONITORING_TST | REA_NOT_PIM_400 | numeric |
cm_data | PERF_MONITORING_TST | REA_NOT_PIM_500 | numeric |
cm_data | PERF_MONITORING_TST | REA_NOT_PIM_1000 | numeric |
cm_data | PERF_MONITORING_TST | S_NOT_PIM | numeric |
cm_data | PERF_MONITORING_TST | S_CB_NOT_PIM_50 | numeric |
cm_data | PERF_MONITORING_TST | S_CB_NOT_PIM_100 | numeric |
cm_data | PERF_MONITORING_TST | S_CB_NOT_PIM_200 | numeric |
cm_data | PERF_MONITORING_TST | S_CB_NOT_PIM_300 | numeric |
cm_data | PERF_MONITORING_TST | S_CB_NOT_PIM_400 | numeric |
cm_data | PERF_MONITORING_TST | S_CB_NOT_PIM_500 | numeric |
cm_data | PERF_MONITORING_TST | S_CB_NOT_PIM_1000 | numeric |
cm_data | PERF_MONITORING_TST | S_A_NOT_PIM_50 | numeric |
cm_data | PERF_MONITORING_TST | S_A_NOT_PIM_100 | numeric |
cm_data | PERF_MONITORING_TST | S_A_NOT_PIM_200 | numeric |
cm_data | PERF_MONITORING_TST | S_A_NOT_PIM_300 | numeric |
cm_data | PERF_MONITORING_TST | S_A_NOT_PIM_400 | numeric |
cm_data | PERF_MONITORING_TST | S_A_NOT_PIM_500 | numeric |
cm_data | PERF_MONITORING_TST | S_A_NOT_PIM_1000 | numeric |
cm_data | PERF_MONITORING_TST | S_EA_NOT_PIM_50 | numeric |
cm_data | PERF_MONITORING_TST | S_EA_NOT_PIM_100 | numeric |
cm_data | PERF_MONITORING_TST | S_EA_NOT_PIM_200 | numeric |
cm_data | PERF_MONITORING_TST | S_EA_NOT_PIM_300 | numeric |
cm_data | PERF_MONITORING_TST | S_EA_NOT_PIM_400 | numeric |
cm_data | PERF_MONITORING_TST | S_EA_NOT_PIM_500 | numeric |
cm_data | PERF_MONITORING_TST | S_EA_NOT_PIM_1000 | numeric |
cm_data | PERF_MONITORING_TST | S_REA_NOT_PIM_50 | numeric |
cm_data | PERF_MONITORING_TST | S_REA_NOT_PIM_100 | numeric |
cm_data | PERF_MONITORING_TST | S_REA_NOT_PIM_200 | numeric |
cm_data | PERF_MONITORING_TST | S_REA_NOT_PIM_300 | numeric |
cm_data | PERF_MONITORING_TST | S_REA_NOT_PIM_400 | numeric |
cm_data | PERF_MONITORING_TST | S_REA_NOT_PIM_500 | numeric |
cm_data | PERF_MONITORING_TST | S_REA_NOT_PIM_1000 | numeric |
cm_data | DBFIAR21 | 0LOGSYS | char |
cm_data | DBFIAR21 | 0DOC_NUMBER | char |
cm_data | DBFIAR21 | C_BLOCKTY | char |
cm_data | DBFIAR21 | 0CSM_CRTI | numeric |
cm_data | DBFIAR21 | 0RECORDMODE | char |
cm_data | DBFIAR21 | 0CSM_CLTI | numeric |
cm_data | DBFIAR21 | 0DATEFROM | date |
cm_data | DBFIAR21 | 0DATETO | date |
cm_data | DBFIAR21 | C_COMPPRS | char |
cm_data | DBFIAR21 | C_CRDACC | char |
cm_data | DBFIAR21 | 0C_CTR_AREA | char |
cm_data | DBFIAR21 | C_CUSTPR | char |
cm_data | DBFIAR21 | 0G_CWWE01 | char |
cm_data | DBFIAR21 | 0CRED_GROUP | char |
cm_data | DBFIAR21 | 0REPR_GROUP | char |
cm_data | DBFIAR21 | K_COUNTER | decimal |
cm_data | DBFIAR21 | C_NBHOURS | numeric |
cm_data | DBFIAR21 | C_MTHFROM | numeric |
cm_data | DBFIAR21 | C_MTHTO | numeric |
cm_data | DBFIAR21 | C_CUSTID | char |
cm_data | DBFIAR21 | C_CUSTPRS | char |
cm_data | DBFIAR21 | C_PAYERID | char |
cm_data | DBFIAR21 | C_CRD_AC | char |
cm_data | DBFIAR21 | C_CST_CA2 | char |
cm_data | DBFIAR21 | CDOC_TYPE | char |
cm_data | DBFIAR21 | C_TIMESTP | numeric |
cm_data | GL_ACCOUNT | 0CHRT_ACCTS | char |
cm_data | GL_ACCOUNT | 0GL_ACCOUNT | char |
cm_data | GL_ACCOUNT | 0BAL_FLAG | char |
cm_data | GL_ACCOUNT | 0INCST_FLAG | char |
cm_data | GL_ACCOUNT | 0LOGSYS | char |
cm_data | GL_ACCOUNT | 0SEM_POSIT | char |
cm_data | GL_ACCOUNT | 0SOURSYSTEM | char |
cm_data | GL_ACCOUNT | C_GL_TYPE | char |
cm_data | GL_ACCOUNT | C_GL_STYP | char |
cm_data | GL_ACCOUNT | C_MGN_ACC | char |
cm_data | GL_ACCOUNT | C_EXTRFLG | char |
cm_data | GL_ACCOUNT | C_TEMPRES | char |
cm_data | GL_ACCOUNT | C_STKACCT | char |
cm_data | GL_ACCOUNT | C_INTRFLG | char |
cm_data | GL_ACCOUNT | C_GLFAMIL | char |
cm_data | C_CST_CA2 | 0C_CTR_AREA | char |
cm_data | C_CST_CA2 | C_CST_CA2 | char |
cm_data | C_CST_CA2 | 0CRED_GROUP | char |
cm_data | C_CST_CA2 | 0CURRENCY | char |
cm_data | C_CST_CA2 | 0CRED_LIMIT | decimal |
cm_data | C_CST_CA2 | 0RC_LIM_CUR | char |
cm_data | C_CST_CA2 | 0REC_CR_LM | decimal |
cm_data | C_CST_CA2 | 0RISK_CATEG | char |
cm_data | C_CST_CA2 | C_GARAM | numeric |
cm_data | C_CST_CA2 | 0CRED_ACCNT | char |
cm_data | C_CST_CA2 | 0REPR_GROUP | char |
cm_data | C_CST_CA2 | 0RATING | char |
cm_data | C_CST_CA2 | 0PMNT_INDEX | char |
cm_data | C_CST_CA2 | 0NXT_REVIEW | decimal |
cm_data | C_CST_CA2 | 0LST_REVIEW | decimal |
cm_data | C_CST_CA2 | 0LST_INT_RV | decimal |
cm_data | C_CST_CA2 | 0LOGSYS | char |
cm_data | C_CST_CA2 | 0CUST_GR_CM | char |
cm_data | C_CST_CA2 | C_CRDSTAT | char |
cm_data | C_CST_CA2 | C_RISKMAN | char |
cm_data | C_CST_CA2 | C_CRDACC | char |
cm_data | C_CST_CA2 | C_ACTDAT | date |
cm_data | C_CST_CA2 | C_ACTMONT | numeric |
cm_data | C_CST_CA2 | C_LRE_DAT | date |
cm_data | C_CST_CA2 | C_NRE_DAT | date |
cm_data | C_CUSTID | 0LOGSYS | char |
cm_data | C_CUSTID | C_CUSTID | char |
cm_data | C_CUSTID | C_CUSTPRS | char |
cm_data | C_CUSTID | C_CUSTMPR | char |
cm_data | C_CUSTID | C_CUSTPR | char |
cm_data | C_CUSTID | 0ACCNT_GRP | char |
cm_data | C_CUSTID | 0ADDR_NUMBR | char |
cm_data | C_CUSTID | 0AF_CUSTDC | char |
cm_data | C_CUSTID | 0AF_CUSTID | char |
cm_data | C_CUSTID | 0APO_LOCNO | char |
cm_data | C_CUSTID | 0BPARTNER | char |
cm_data | C_CUSTID | 0CITY_2 | char |
cm_data | C_CUSTID | 0COUNTRY | char |
cm_data | C_CUSTID | 0CUST_CLASS | char |
cm_data | C_CUSTID | 0CUST_MKT | char |
cm_data | C_CUSTID | 0DBDUNS_NUM | numeric |
cm_data | C_CUSTID | 0FISCVARNT | char |
cm_data | C_CUSTID | 0ID_XCPD | char |
cm_data | C_CUSTID | 0INDUSTRY | char |
cm_data | C_CUSTID | 0NAME | varchar |
cm_data | C_CUSTID | 0TAX_NUMB | char |
cm_data | C_CUSTID | 0TAX_NUMB2 | char |
cm_data | C_CUSTID | 0VISIT_RYT | char |
cm_data | C_CUSTID | C_INT_GRP | char |
cm_data | C_CUSTID | C_DUNSHQ | numeric |
cm_data | C_CUSTID | C_DUNSPA | numeric |
cm_data | C_CUSTID | C_DUNSDU | numeric |
cm_data | C_CUSTID | C_DUNSGU | numeric |
cm_data | C_CUSTID | 0ACCNT_GRPV | char |
cm_data | C_CUSTID | 0DEL_INDIC | char |
cm_data | C_CUSTID | C_STCEG | char |
cm_data | C_CUSTID | C_STAT_SL | char |
cm_data | C_CUSTID | 0LANGU | char |
cm_data | C_CUSTID | 0TRANSPZONE | char |
cm_data | C_CUSTID | C_LOCATIO | numeric |
cm_data | C_CUSTID | C_ZONEFI | char |
cm_data | C_CUSTID | C_FLGINT | char |
cm_data | C_CUSTID | C_CSMETH | char |
cm_data | C_CUSTID | C_ENTRP | numeric |
cm_data | C_CUSTID | C_ZONE | char |
cm_data | C_CUSTID | C_MZONE | char |
cm_data | C_CUSTID | C_PCOMPAN | char |
cm_data | C_CUSTID | 0CLM_CLGR | char |
cm_data | C_CUSTID | 0CLM_CLSP | char |
cm_data | C_CUSTID | C_GZONE | char |
cm_data | TCURR | KURST | char |
cm_data | TCURR | FCURR | char |
cm_data | TCURR | TCURR | char |
cm_data | TCURR | GDATU | date |
cm_data | TCURR | UKURS | decimal |
cm_data | TCURR | FFACT | decimal |
cm_data | TCURR | TFACT | decimal |
cm_data | DBFIAR20 | 0AC_DOC_NO | char |
cm_data | DBFIAR20 | 0FISCPER | numeric |
cm_data | DBFIAR20 | 0FISCVARNT | char |
cm_data | DBFIAR20 | 0FI_DSBITEM | char |
cm_data | DBFIAR20 | 0ITEM_NUM | numeric |
cm_data | DBFIAR20 | C_CUSTID | char |
cm_data | DBFIAR20 | C_COMPCDE | char |
cm_data | DBFIAR20 | 0LOGSYS | char |
cm_data | DBFIAR20 | 0DOC_CURRCY | char |
cm_data | DBFIAR20 | 0LOC_CURRCY | char |
cm_data | DBFIAR20 | 0CHRT_ACCTS | char |
cm_data | DBFIAR20 | 0CLEAR_DATE | date |
cm_data | DBFIAR20 | 0CLR_DOC_NO | char |
cm_data | DBFIAR20 | 0COUNTRY | char |
cm_data | DBFIAR20 | 0CREATEDON | date |
cm_data | DBFIAR20 | 0DOC_DATE | date |
cm_data | DBFIAR20 | 0FI_DOCSTAT | char |
cm_data | DBFIAR20 | 0G_CWWE01 | char |
cm_data | DBFIAR20 | 0G_CWWE13 | char |
cm_data | DBFIAR20 | 0GL_ACCOUNT | char |
cm_data | DBFIAR20 | 0LAST_DUNN | date |
cm_data | DBFIAR20 | 0NETDUEDATE | date |
cm_data | DBFIAR20 | 0POST_KEY | char |
cm_data | DBFIAR20 | 0PSTNG_DATE | date |
cm_data | DBFIAR20 | C_CST_CA2 | char |
cm_data | DBFIAR20 | C_CUSTPRS | char |
cm_data | DBFIAR20 | C_DOCTYP | char |
cm_data | DBFIAR20 | C_DUNN_BL | char |
cm_data | DBFIAR20 | C_PM_MTHD | char |
cm_data | DBFIAR20 | 0C_CTR_AREA | char |
cm_data | DBFIAR20 | C_SALEMP | char |
cm_data | DBFIAR20 | 0DEB_CRE_DC | decimal |
cm_data | DBFIAR20 | 0DEB_CRE_LC | decimal |
cm_data | DBFIAR20 | C_FCONNUM | char |
cm_data | DBFIAR20 | C_TIMESTP | numeric |
cm_data | DBFIAR20 | C_LGSYSAF | char |
cm_data | DBFIAR20 | C_COMPCAF | char |
cm_data | COUNTRY | 0COUNTRY | char |
cm_data | COUNTRY | C_ZONE | char |
cm_data | COUNTRY | C_GZONE | char |
cm_data | COUNTRY | C_MZONE | char |
cm_data | COUNTRY | C_ZREACH | char |
cm_data | COUNTRY | C_ZONEFI | char |
cm_data | COUNTRY | C_PZONE | char |
cm_data | COUNTRY | C_STDPTRM | numeric |
cm_data | C_COMPCDE | 0LOGSYS | char |
cm_data | C_COMPCDE | C_COMPCDE | char |
cm_data | C_COMPCDE | 0CHRT_ACCTS | char |
cm_data | C_COMPCDE | 0COMPANY | char |
cm_data | C_COMPCDE | 0COUNTRY | char |
cm_data | C_COMPCDE | 0CURRENCY | char |
cm_data | C_COMPCDE | 0C_CTR_AREA | char |
cm_data | C_COMPCDE | 0FISCVARNT | char |
cm_data | C_COMPCDE | 0SOURSYSTEM | char |
cm_data | C_COMPCDE | 0OFYEAR | numeric |
cm_data | C_COMPCDE | 0OFPER | numeric |
cm_data | C_COMPCDE | 0OFPER3 | numeric |
cm_data | C_COMPCDE | 0RETROPOST | char |
cm_data | C_COMPCDE | C_ZONE | char |
cm_data | C_COMPCDE | C_STAT_SL | char |
cm_data | C_COMPCDE | C_CONTRIB | numeric |
cm_data | C_COMPCDE | C_DEFAREA | char |
cm_data | C_COMPCDE | C_DEFPCTR | char |
cm_data | C_COMPCDE | C_FGPTOB | char |
cm_data | C_COMPCDE | 0CO_AREA | char |
cm_data | C_COMPCDE | C_COMPPRS | char |
cm_data | C_COMPCDE | C_LANDSCP | char |
cm_data | C_COMPCDE | C_FLGINT | char |
cm_data | C_COMPCDE | C_CSMETH | char |
cm_data | C_COMPCDE | C_ENTRP | numeric |
cm_data | C_COMPCDE | C_ZONEPUR | char |
cm_data | C_COMPCDE | K_INTRAT | decimal |
cm_data | C_COMPCDE | C_MNGAREA | char |
cm_data | C_COMPCDE | C_MNGCTRY | char |
cm_data | C_COMPCDE | C_PZONE | char |
cm_data | C_COMPCDE | C_PSCOPE | char |
cm_data | C_COMPCDE | 0VALIDTO | date |
cm_data | C_COMPCDE | 0VALIDFROM | date |
cm_data | C_COMPCDE | C_MERBK | char |
cm_data | C_COMPCDE | C_PMERBK | char |
cm_data | C_COMPCDE | C_BFCCOMP | char |
cm_data | C_COMPCDE | C_MERGDAT | date |
cm_data | G_CWWE01 | 0G_CWWE01 | char |
cm_data | G_CWWE01 | 0G_CWWE02 | char |
cm_data | G_CWWE01 | C_PFCTR2 | char |
cm_data | G_CWWE01 | C_MAGNITU | char |
cm_data | G_CWWE01 | C_PFCTR1 | char |
cm_data | G_CWWE01 | CPFCTR1_2 | char |
cm_data | G_CWWE01 | CPFCTR2_2 | char |
cm_data | G_CWWE01 | C_PFCTR_3 | char |
Description of the performance monitoring table
TABLE_NAME | COLUMN_NAME | DESCRIPTION |
PERF_MONITORING | evaluation day | evaluation day of the performance |
PERF_MONITORING | simulation_day | day of the prediction |
PERF_MONITORING | C_CUSTID__C_ZONEFI | zone of the payer |
PERF_MONITORING | precision_Paid_In_Month | precision score of Paid In Month (PIM) documents |
PERF_MONITORING | recall_Paid_In_Month | recall score of Paid In Month documents |
PERF_MONITORING | precision_Not_Paid_In_Month | precision score of Not Paid In Month documents |
PERF_MONITORING | recall_Not_Paid_In_Month | recall score of Not Paid In Month documents |
PERF_MONITORING | Expected not PIM amount | expected amount Not Paid In Month |
PERF_MONITORING | Not PIM amount | amount Not Paid In Month |
PERF_MONITORING | Expected not PIM amount ratio | ratio between Expected not PIM amount and Not PIM amount |
PERF_MONITORING | CB_NOT_PIM_50 | Lorenz curve point with first 50 customers sorted by the true not PIM amount |
PERF_MONITORING | CB_NOT_PIM_100 | Lorenz curve point with first 100 customers sorted by the true not PIM amount |
PERF_MONITORING | CB_NOT_PIM_200 | Lorenz curve point with first 200 customers sorted by the true not PIM amount |
PERF_MONITORING | CB_NOT_PIM_300 | Lorenz curve point with first 300 customers sorted by the true not PIM amount |
PERF_MONITORING | CB_NOT_PIM_400 | Lorenz curve point with first 400 customers sorted by the true not PIM amount |
PERF_MONITORING | CB_NOT_PIM_500 | Lorenz curve point with first 500 customers sorted by the true not PIM amount |
PERF_MONITORING | A_NOT_PIM_50 | Lorenz curve point with first 50 customers sorted by the amount |
PERF_MONITORING | A_NOT_PIM_100 | Lorenz curve point with first 100 customers sorted by the amount |
PERF_MONITORING | A_NOT_PIM_200 | Lorenz curve point with first 200 customers sorted by the amount |
PERF_MONITORING | A_NOT_PIM_300 | Lorenz curve point with first 300 customers sorted by the amount |
PERF_MONITORING | A_NOT_PIM_400 | Lorenz curve point with first 400 customers sorted by the amount |
PERF_MONITORING | A_NOT_PIM_500 | Lorenz curve point with first 500 customers sorted by the amount |
PERF_MONITORING | EA_NOT_PIM_50 | Lorenz curve point with first 50 customers sorted by the Expected not PIM amount |
PERF_MONITORING | EA_NOT_PIM_100 | Lorenz curve point with first 100 customers sorted by the Expected not PIM amount |
PERF_MONITORING | EA_NOT_PIM_200 | Lorenz curve point with first 200 customers sorted by the Expected not PIM amount |
PERF_MONITORING | EA_NOT_PIM_300 | Lorenz curve point with first 300 customers sorted by the Expected not PIM amount |
PERF_MONITORING | EA_NOT_PIM_400 | Lorenz curve point with first 400 customers sorted by the Expected not PIM amount |
PERF_MONITORING | EA_NOT_PIM_500 | Lorenz curve point with first 500 customers sorted by the Expected not PIM amount |
PERF_MONITORING | REA_NOT_PIM_50 | Lorenz curve point with first 50 customers sorted by the Risk amount not PIM |
PERF_MONITORING | REA_NOT_PIM_100 | Lorenz curve point with first 100 customers sorted by the Risk amount not PIM |
PERF_MONITORING | REA_NOT_PIM_200 | Lorenz curve point with first 200 customers sorted by the Risk amount not PIM |
PERF_MONITORING | REA_NOT_PIM_300 | Lorenz curve point with first 300 customers sorted by the Risk amount not PIM |
PERF_MONITORING | REA_NOT_PIM_400 | Lorenz curve point with first 400 customers sorted by the Risk amount not PIM |
PERF_MONITORING | REA_NOT_PIM_500 | Lorenz curve point with first 500 customers sorted by the Risk amount not PIM |
PERF_MONITORING | S_CB_NOT_PIM_50 | Lorenz curve point with first 50 customers (under 1 million €) sorted by the true not PIM amount |
PERF_MONITORING | S_CB_NOT_PIM_100 | Lorenz curve point with first 100 customers (under 1 million €) sorted by the true not PIM amount |
PERF_MONITORING | S_CB_NOT_PIM_200 | Lorenz curve point with first 200 customers (under 1 million €) sorted by the true not PIM amount |
PERF_MONITORING | S_CB_NOT_PIM_300 | Lorenz curve point with first 300 customers (under 1 million €) sorted by the true not PIM amount |
PERF_MONITORING | S_CB_NOT_PIM_400 | Lorenz curve point with first 400 customers (under 1 million €) sorted by the true not PIM amount |
PERF_MONITORING | S_CB_NOT_PIM_500 | Lorenz curve point with first 500 customers (under 1 million €) sorted by the true not PIM amount |
PERF_MONITORING | S_A_NOT_PIM_50 | Lorenz curve point with first 50 customers (under 1 million €) sorted by the amount |
PERF_MONITORING | S_A_NOT_PIM_100 | Lorenz curve point with first 100 customers (under 1 million €) sorted by the amount |
PERF_MONITORING | S_A_NOT_PIM_200 | Lorenz curve point with first 200 customers (under 1 million €) sorted by the amount |
PERF_MONITORING | S_A_NOT_PIM_300 | Lorenz curve point with first 300 customers (under 1 million €) sorted by the amount |
PERF_MONITORING | S_A_NOT_PIM_400 | Lorenz curve point with first 400 customers (under 1 million €) sorted by the amount |
PERF_MONITORING | S_A_NOT_PIM_500 | Lorenz curve point with first 500 customers (under 1 million €) sorted by the amount |
PERF_MONITORING | S_EA_NOT_PIM_50 | Lorenz curve point with first 50 customers (under 1 million €) sorted by the Expected not PIM amount |
PERF_MONITORING | S_EA_NOT_PIM_100 | Lorenz curve point with first 100 customers (under 1 million €) sorted by the Expected not PIM amount |
PERF_MONITORING | S_EA_NOT_PIM_200 | Lorenz curve point with first 200 customers (under 1 million €) sorted by the Expected not PIM amount |
PERF_MONITORING | S_EA_NOT_PIM_300 | Lorenz curve point with first 300 customers (under 1 million €) sorted by the Expected not PIM amount |
PERF_MONITORING | S_EA_NOT_PIM_400 | Lorenz curve point with first 400 customers (under 1 million €) sorted by the Expected not PIM amount |
PERF_MONITORING | S_EA_NOT_PIM_500 | Lorenz curve point with first 500 customers (under 1 million €) sorted by the Expected not PIM amount |
PERF_MONITORING | S_REA_NOT_PIM_50 | Lorenz curve point with first 50 customers (under 1 million €) sorted by the Risk amount not PIM |
PERF_MONITORING | S_REA_NOT_PIM_100 | Lorenz curve point with first 100 customers (under 1 million €) sorted by the Risk amount not PIM |
PERF_MONITORING | S_REA_NOT_PIM_200 | Lorenz curve point with first 200 customers (under 1 million €) sorted by the Risk amount not PIM |
PERF_MONITORING | S_REA_NOT_PIM_300 | Lorenz curve point with first 300 customers (under 1 million €) sorted by the Risk amount not PIM |
PERF_MONITORING | S_REA_NOT_PIM_400 | Lorenz curve point with first 400 customers (under 1 million €) sorted by the Risk amount not PIM |
PERF_MONITORING | S_REA_NOT_PIM_500 | Lorenz curve point with first 500 customers (under 1 million €) sorted by the Risk amount not PIM |
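All the Lorenz curve columns above follow the same pattern: sort the payers by a given criterion, keep the first N, and measure the share of the total Not PIM amount those payers capture. A minimal sketch of one such point, with illustrative field names (the dictionary keys below are assumptions, not actual column names):

```python
def lorenz_point(payers, key, n):
    """Share of the total Not-PIM amount captured by the first n payers
    when payers are sorted (descending) by the given criterion.

    payers: list of dicts holding the sort criterion and 'not_pim_amount'
    key:    name of the sort criterion, e.g. 'expected_not_pim_amount'
    n:      number of top payers to keep (50, 100, ..., 1000)
    """
    total = sum(p["not_pim_amount"] for p in payers)
    if total == 0:
        return 0.0
    top = sorted(payers, key=lambda p: p[key], reverse=True)[:n]
    return sum(p["not_pim_amount"] for p in top) / total
```

For instance, CB_NOT_PIM_50 corresponds to sorting by the true not PIM amount with n = 50, while EA_NOT_PIM_50 sorts by the Expected not PIM amount; the S_ variants apply the same computation restricted to payers under 1 million €.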