Target:

The price target is built following these steps (presented in these recipes in Dataiku: Prepare recipe 1, Group recipe 1, Prepare recipe 3):

1. Gathering of the CPC forecast prices and last invoice price:

Corresponds to the two fields: forecast_unit_price (Fcst_Unit_Price_Est_Act_E for next 12 months) and last_invoice_price (Last_Invoice_Price_EUR_KG) in the Transparency Dashboard extracts.

2. Cleaning of both variables by replacing "-" and non-decimal values with empty cells

3. Aggregating of the forecasted months to keep a single forecast_unit_price at a CPC level. This price being the average of the forecasts on the 12 next months.

4. Computing the final unit price using the following formula:

computed_unit_price = if(forecasted_unit_price == 0 || isNull(forecasted_unit_price), last_invoiced_price, forecasted_unit_price)

We take as a priority the forecast_unit_price from the step above if it is greater than 0, else we revert to the last invoice price from the forecast data.

Price drivers:

To select the final list of the most relevant price drivers, we collected, built and tested more than 50 features:

Pricing Front End - 11240 > Processing steps (CS) > image-2023-11-23_11-30-8.png

The preprocessing of different features can be summarized as follow:

Pricing Front End - 11240 > Processing steps (CS) > image-2023-11-23_14-38-14.png

For Sales and Volume:

- in the Group recipe 1, forecasted_sales (Fcst_Sales_Est_Act_EUR) and forecasted_volume (Fcst_Volume_Est_Act_KG) have been aggregated by CPC in order to keep the sum of forecasts over the whole 12 months for each CPC.

- CPCs with zero values in forecasted_volume are filtered out in the Prepare recipe 3.

- historical sales and volume have been processed with same method in the Group recipe 2, and are included in the final dataset in the Python recipe, to be used used later in the feature engineering stage.

For ICM:

- historical_unit_icm (Fcst_Unit_ICM_Est_Act_EUR) has also been aggregated by CPC to compute the average over the last 12 months (Group recipe 2).

- historical_icm (Fcst_ICM_Est_Act_EUR) has been included in the Python recipe, and used further in the feature engineering step to compute ICM ratios.

For other features:

incoterms	manufacturing_plant	product_name
material_code	grp_of_activities	product_code
material_name	country_shipto	product_brand_name
shipto_code	gbu_region	country_soldto
shipto_name	gbu_product_family	grp_customer_seg
soldto_name	entreprise_seg	shipping_plant
soldto_code	market	end_use
soldto_group	gbu_customer_seg

- in the forecast dataset these features were aggregated by CPC to retain the earliest available data (Group recipe 1).

- the same features in the historical dataset were aggregated by CPC to retain the latest available data (Group recipe 2).

- in the Python recipe, the two dataset were combined to fill empty values of forecast data by historical data and to generate the features dataset.

For manual inputs:

- the product grouping are collected from this GSheet, and supplied by the business.

- the manual region mapping for each product family comes from this GSheet, and is also supplied by by the business.

- in the Join recipe ,both datasets are added to the feature dataset obtained in the previous steps.

For Product characteristics:

- both product composition files are provided in in this GSheet file.

- in the Python recipe, we have calculated the product composition features by adding the content ratio per product for each selected component and the total number of substances per product.

- the Join recipe adds the material code to the dataset using the "EHS_Product" key mapping.