
The objective of this page is to provide the business with a simple explanation (with few technical details) of all the steps of the modelization.


1.Data Encoding

In our original data, we have numeric and categorical features (region, product taxonomy features, …).

For a machine learning model, or to compute a similarity distance between CPCs, we need to have only numeric features.

So we transform categorical features into numeric ones by applying a "Target Encoding":

  • We replace each modality (category value) of a variable with a numeric value: the average of the Target (price with "log" transformation) for that modality.
  • An example on the region for the Amodel family:

    • This gives us several pieces of information:
    • The average price in Americas is higher than in the other regions.
    • EMEA and Other APAC will be considered close in the similarity distance computation, while Americas is the farthest from EMEA.

From a categorical feature with no information about order or proximity between modalities, we obtain an ordered numeric variable that is usable by the machine learning model and in the similarity distance calculation.
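The encoding step can be sketched in a few lines of Python. This is a minimal illustration, not the production code: the rows, the prices, and the function name `target_encode` are invented for the example, and the natural logarithm is assumed for the "log" transformation.

```python
import math
from collections import defaultdict

# Hypothetical rows (region, price). The values are invented for illustration.
rows = [
    ("Americas", 12.0),
    ("Americas", 15.0),
    ("EMEA", 8.0),
    ("EMEA", 9.0),
    ("Other APAC", 8.5),
]


def target_encode(rows):
    """Map each category to the average of log(price) over its rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, price in rows:
        sums[category] += math.log(price)  # assumption: natural log
        counts[category] += 1
    return {category: sums[category] / counts[category] for category in sums}


encoding = target_encode(rows)

# Replace the categorical value by its encoded numeric value.
encoded_rows = [(encoding[region], price) for region, price in rows]

# Regions with similar average log-price end up close to each other.
gap_emea_apac = abs(encoding["EMEA"] - encoding["Other APAC"])
gap_emea_americas = abs(encoding["EMEA"] - encoding["Americas"])
```

With these invented numbers, the gap between EMEA and Other APAC is much smaller than the gap between EMEA and Americas, which is the kind of proximity information the similarity distance can then use.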



2.Modelization of the Target (price with log transformation)

One model is created for each family.

The objective is to predict the target (price with "log" transformation) from all the selected numeric features. To do this, some CPCs are used to train the model, and the others are used to test its performance. An optimization is run to find the best model parameters for each family.

We use the R² metric to measure the model performance; it generally lies between 0 (bad) and 1 (perfect). In general, a model is considered good if its R² is between 0.4 and 0.9.
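For readers who want to see what R² measures, here is a minimal sketch using the standard definition, R² = 1 - (sum of squared errors) / (total sum of squares around the mean). The test targets and predictions are invented for the example; the function name `r_squared` is ours.

```python
def r_squared(y_true, y_pred):
    """Standard R²: share of the target's variance explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented log-price targets for test CPCs, and the model's predictions.
y_test = [2.0, 2.5, 3.0, 3.5]
y_pred = [2.1, 2.4, 3.2, 3.3]

score = r_squared(y_test, y_pred)

# Reference point: always predicting the average target gives R² = 0.
mean_target = sum(y_test) / len(y_test)
baseline = r_squared(y_test, [mean_target] * len(y_test))
```

An R² of 0 therefore means "no better than always predicting the average". On test data, a model that does worse than this baseline gets a negative R², which is why the 0 to 1 range is only the usual case.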

The objective is not to have a perfect model, because in that case we have probably fit our current data too well, and the model will not generalize well to the new data that arrive each month.
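The risk of fitting the current data too well can be illustrated with a deliberately extreme stand-in model that simply memorizes the training data. This is only a sketch: the data are invented, and the `memorizer` function is not the model used for the families.

```python
def r_squared(y_true, y_pred):
    """Standard R²: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Invented data: one numeric feature, a noisy target for training,
# and new points that follow the underlying trend.
train_x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
train_y = [1.0, 2.6, 2.4, 4.5, 4.4, 6.2]
test_x = [1.4, 2.6, 3.4, 4.6, 5.4]
test_y = [1.4, 2.6, 3.4, 4.6, 5.4]


def memorizer(x):
    """Return the target of the closest training point (pure memorization)."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


train_r2 = r_squared(train_y, [memorizer(x) for x in train_x])
test_r2 = r_squared(test_y, [memorizer(x) for x in test_x])
```

The memorizer is perfect on the data it has already seen (R² of 1) but noticeably worse on the new points, because it has also learned the noise. A gap of this kind between training and test performance is the sign of a model that fits the current data too well.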

But if the R² is too low, this means that:

  • Maybe some levers that could explain the price dispersion between CPCs are not available.
  • Or the dispersion cannot be explained (due to human behavior, price negotiation with different customers, …):
    • For example, if we have 2 CPCs with exactly the same values for all features, but with different prices.
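The last case can be made concrete. When two CPCs share exactly the same feature values, a model can only give them the same prediction, and the best single prediction (in terms of squared error) is their average target. The sketch below computes the resulting ceiling on R² for a small invented data set; the feature tuples, the targets, and the function name `r_squared_ceiling` are assumptions made for the illustration.

```python
from collections import defaultdict

# Hypothetical rows: (feature values, log-price target). Values are invented.
rows = [
    (("EMEA", 10), 2.0),
    (("EMEA", 10), 2.4),      # same features as the row above, different price
    (("Americas", 5), 3.0),
    (("Americas", 5), 3.0),
]


def r_squared_ceiling(rows):
    """Best R² reachable on these rows by any model that only sees the features."""
    groups = defaultdict(list)
    for features, target in rows:
        groups[features].append(target)
    # The group average minimizes the squared error within identical features.
    group_mean = {f: sum(values) / len(values) for f, values in groups.items()}

    targets = [target for _, target in rows]
    overall_mean = sum(targets) / len(targets)
    ss_res = sum((target - group_mean[f]) ** 2 for f, target in rows)
    ss_tot = sum((target - overall_mean) ** 2 for target in targets)
    return 1.0 - ss_res / ss_tot


ceiling = r_squared_ceiling(rows)
```

Here the ceiling is below 1 only because of the two EMEA rows: no choice of model or parameters can recover that part of the dispersion from the available features.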


The modelization is only a first step. Our objective is not to predict the price as well as possible, but to obtain coherent feature importances and volume curves that can be used to compute the similarity between CPCs.

In the next section, we describe the model outputs that should be reviewed to validate the modelization step.


3.Modelization outputs

xxx

4.Find comparables

xxx

5.Volume adjustment

xxx

6.Group volume adjustment

xxx

7.Price recommendation

Cap 30%

dqf

