CPC definition
The term "CPC" appears throughout this documentation because it is the main level of granularity used across the application: most of our datasets have one record per CPC of a given GBU.
The Customer Product Combination (CPC) is the identifier representing a specific product sold to a specific customer.
The definitions of product and customer vary from one GBU to another.
For example:
- For Novecare, the product considered is the material and the customer is the shipto.
- For SpP, the product considered is the material_group and the customer is the soldto.
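The two definitions above can be sketched as a small lookup. This is only an illustration: the column names come from the examples above, while `CPC_KEY_COLUMNS`, `build_cpc_id`, the sample record and the `|` separator are hypothetical and do not describe how the application actually stores the identifier.

```python
# Hypothetical mapping of each GBU to the (product, customer) columns defining its CPC.
CPC_KEY_COLUMNS = {
    "Novecare": ("material", "shipto"),
    "SpP": ("material_group", "soldto"),
}


def build_cpc_id(record: dict, gbu: str) -> str:
    """Return an illustrative CPC identifier for one sales record of the given GBU."""
    product_col, customer_col = CPC_KEY_COLUMNS[gbu]
    # The "|" separator and the product-then-customer order are arbitrary choices.
    return f"{record[product_col]}|{record[customer_col]}"


# Made-up record, for illustration only.
record = {"material": "M001", "material_group": "MG10", "shipto": "S123", "soldto": "C456"}
print(build_cpc_id(record, "Novecare"))  # M001|S123
print(build_cpc_id(record, "SpP"))       # MG10|C456
```

The same record therefore belongs to a different CPC depending on the GBU it is analysed under.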
Target
The target is the value we are trying to optimize: in this case, the unit price of a CPC (Customer Product Combination).
The unit price we use in our models is the result of several computations made in our data preparation steps, which can be summarized as follows:
For Novecare:
- Gathering and aggregation of the CPC forecasted sales and volumes over the next 12 months.
  - Computation of the resulting forecasted unit price.
- Gathering of the CPC last invoiced price from the forecast data.
- Gathering and aggregation of the CPC historical sales and volumes over the last 12 months.
  - Computation of the resulting historical unit price.
- Gathering of the CPC last invoiced price from the historical data.
- Selection of the first non-zero unit price, in this order:
  - Forecasted unit price
  - Last invoiced price from forecasts
  - Last invoiced price from historical data
  - Historical unit price
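The Novecare selection rule can be sketched as a simple fallback chain. This is a minimal illustration, not the actual data preparation code: the function and parameter names are hypothetical, and it assumes that the "resulting" unit price is the aggregated sales divided by the aggregated volume and that a missing value is handled like a zero.

```python
from typing import Optional


def unit_price(sales: float, volume: float) -> float:
    """Aggregated sales divided by aggregated volume (assumed formula); 0.0 without volume."""
    return sales / volume if volume else 0.0


def select_unit_price(
    forecast_unit_price: Optional[float],
    forecast_last_invoiced_price: Optional[float],
    historical_last_invoiced_price: Optional[float],
    historical_unit_price: Optional[float],
) -> Optional[float]:
    """Return the first non-zero price, following the priority order listed above."""
    candidates = [
        forecast_unit_price,
        forecast_last_invoiced_price,
        historical_last_invoiced_price,
        historical_unit_price,
    ]
    for price in candidates:
        if price:  # skips both None and 0
            return price
    return None  # no usable price for this CPC


# A CPC with no forecast data falls back to the historical last invoiced price.
print(select_unit_price(unit_price(0.0, 0.0), None, 12.5, 10.0))  # 12.5
```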
For SpP:
The rule used to define the unit price is as follows:
- Average of volume-weighted prices for the last 3 months with sales.
Note: as of now, this unit price includes all costs, both fixed and variable.
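The SpP rule can be read in more than one way. The sketch below assumes one interpretation: each month's price is volume-weighted (monthly sales divided by monthly volume), and the unit price is the plain average of those monthly prices over the 3 most recent months that have sales. The function name, the input layout and the sample figures are hypothetical.

```python
from typing import List, Optional, Tuple


def spp_unit_price(monthly: List[Tuple[str, float, float]], n_months: int = 3) -> Optional[float]:
    """monthly holds (month as 'YYYY-MM', sales, volume) tuples for one CPC."""
    # Keep only months with sales, most recent first.
    with_sales = sorted(
        (m for m in monthly if m[1] > 0 and m[2] > 0),
        key=lambda m: m[0],
        reverse=True,
    )
    recent = with_sales[:n_months]
    if not recent:
        return None  # the CPC has no month with sales
    # Monthly volume-weighted price: total sales / total volume of that month.
    monthly_prices = [sales / volume for _, sales, volume in recent]
    return sum(monthly_prices) / len(monthly_prices)


# Made-up data: months without sales are skipped, older months are ignored.
monthly = [
    ("2022-12", 50.0, 10.0),   # has sales but is older than the 3 retained months
    ("2023-01", 100.0, 10.0),
    ("2023-02", 0.0, 0.0),     # no sales, skipped
    ("2023-03", 240.0, 20.0),
    ("2023-04", 90.0, 10.0),
]
price = spp_unit_price(monthly)
```

Here the retained months are 2023-04, 2023-03 and 2023-01, with monthly prices of 9, 12 and 10.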
More details are available below on how these computations are included in our global data preparation flow.
Price drivers
To select the final list of the most relevant price drivers, we collected, built and tested more than 50 features.
These price drivers come from several data sources, described below.
Forecasts and historical data
The main data source we currently use is the Pricing Data Lake in BigQuery, in particular the following two datasets:
- V_FACT_sales_forecast_enriched_current: forecast data.
- V_FACT_sales_history_cpc_last12months: historical data for the past 12 months.
These datasets include:
- Sales and volume measures, which are also used to generate the unit price used as the target for our models (see the dedicated section above).
- Dimensions used as features / price drivers (see the dedicated section above).
[Novecare] Detailed processing steps
[SpP] Detailed processing steps
Manual input
All manual data sources are gathered in spreadsheets specific to each GBU and maintained under the responsibility of the business.
For Novecare, a single GSheet is used to process the manual inputs for product groupings, manual regions and manufacturing plant groups.
For SpP, one GSheet is used to process the manual inputs for product taxonomy and a second one for manual regions.
Update the inputs directly from the GSheet (Novecare only)
Follow these steps to use this manual process:
- The user clicks the link button displayed in the "README" tab to trigger an update of the file's content (this takes a few seconds and is finished as soon as the button reappears).
- This launches a Dataiku scenario that retrieves the data from the latest run and writes it to the relevant tabs of the GSheet.
Taking the "Regions" tab as an example: new records appear for the countries that do not yet have a manual_region value for a given product_family_h4.
- The user can now add the right manual_region in the column with the green background, or update an existing value. Note: the original_ prefixed columns always display the current value, so that users can revert their changes if needed.
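To illustrate what the refresh does for the "Regions" tab, here is a minimal sketch of how the missing combinations could be detected and how an edited row could be spotted. It does not reflect the actual Dataiku scenario: the `country` and `original_manual_region` column names, the function names and the sample data are assumptions; only manual_region, product_family_h4 and the original_ prefix come from the description above.

```python
def find_missing_regions(run_records, region_mapping):
    """Return the (country, product_family_h4) pairs of the latest run with no manual_region."""
    missing = set()
    for rec in run_records:
        key = (rec["country"], rec["product_family_h4"])
        if not region_mapping.get(key):  # absent or empty manual_region
            missing.add(key)
    return sorted(missing)


def has_changed(row):
    """True when the user edited manual_region compared with the original_ column."""
    return row["manual_region"] != row["original_manual_region"]


# Made-up data, for illustration only.
run_records = [
    {"country": "FR", "product_family_h4": "H4_A"},
    {"country": "BR", "product_family_h4": "H4_A"},
    {"country": "BR", "product_family_h4": "H4_B"},
]
region_mapping = {("FR", "H4_A"): "EMEA", ("BR", "H4_A"): ""}

print(find_missing_regions(run_records, region_mapping))
# [('BR', 'H4_A'), ('BR', 'H4_B')]
print(has_changed({"manual_region": "LATAM", "original_manual_region": ""}))  # True
```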


