The objective of this page is to give the business a simple explanation (with few technical details) of all the steps of the modelling process.

1. Data Encoding

In our original data, we have both numerical and categorical features (region, product taxonomy features, ...).

To train a machine learning model, or to compute a similarity distance between CPC, all features must be numerical.

So we transform the categorical features into numerical ones by applying a "Target Encoding":

  • We replace each modality (category value) of a variable with a numerical value: the mean of the target (the price after a "log" transformation) over the rows that have this modality.
  • An example on the region feature for the Amodel family:

    • This gives us several pieces of information:
    • The mean price in Americas is higher than in the other regions.
    • EMEA and Other APAC will be considered close in the similarity distance computation, while Americas is the farthest from EMEA.
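
For readers who want to see the mechanics, the target encoding described above could be sketched as follows. This is only a minimal illustration, not the code used in the project: the function name `target_encode`, the column names `region` and `price`, and all the price values are hypothetical and were chosen only to mimic the pattern described for the region feature.

```python
import numpy as np
import pandas as pd


def target_encode(df, column, target):
    """Replace each modality of `column` by the mean of log(target) for that modality.

    Hypothetical helper written for illustration only.
    """
    log_target = np.log(df[target])                 # "log" transformation of the price
    means = log_target.groupby(df[column]).mean()   # one mean per modality
    return df[column].map(means), means


# Made-up toy data (not real prices).
data = pd.DataFrame({
    "region": ["Americas", "Americas", "EMEA", "EMEA", "Other APAC", "Other APAC"],
    "price": [120.0, 130.0, 90.0, 100.0, 92.0, 101.0],
})

data["region_encoded"], region_means = target_encode(data, "region", "price")

# Absolute difference between encoded values: a simple one-feature distance.
gap_emea_apac = abs(region_means["EMEA"] - region_means["Other APAC"])
gap_emea_americas = abs(region_means["EMEA"] - region_means["Americas"])

print(region_means.sort_values())
print("EMEA vs Other APAC:", gap_emea_apac)
print("EMEA vs Americas:", gap_emea_americas)
```

With this toy data, the encoded value for Americas is the highest, and the gap between EMEA and Other APAC is much smaller than the gap between EMEA and Americas, which is why these two regions would be treated as close in a distance computed on the encoded feature.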

