The objective of this page is to give the business a simple explanation (with few technical details) of all the steps of the modeling.
1.Data Encoding
In our original data, we have both numeric and categorical features (region, product taxonomy features, …).
To train a machine learning model, or to compute a similarity distance between CPC, we need all features to be numeric.
So we transform the categorical features into numeric ones by applying a "Target Encoding":
- We replace each modality (category value) of a variable with a numeric value: the average of the Target (price with "log" transformation) over the rows having that modality.
- An example on the region feature for the Amodel family:
- This gives us several pieces of information:
  - The average price in Americas is higher than in the other regions.
  - EMEA and Other APAC will be considered close in the similarity distance computation, while Americas is the farthest from EMEA.
Starting from a categorical feature that carries no information about order or proximity between its modalities, we obtain an ordered numeric variable that can be used both by the machine learning model and in the similarity distance calculation.
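For readers who want to see the mechanics, the target encoding described above can be sketched in a few lines of Python with pandas. This is a minimal illustration, not the project's actual code: the column names (`region`, `price`, `log_price`, `region_encoded`), the helper `encoded_distance`, and all price values are assumptions made up for the example. It uses a plain average per modality; whether the real pipeline adds smoothing or other refinements is not stated on this page.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the prices below are invented for illustration only.
df = pd.DataFrame({
    "region": ["Americas", "Americas", "EMEA", "EMEA", "Other APAC", "Other APAC"],
    "price": [120.0, 150.0, 80.0, 90.0, 85.0, 95.0],
})

# The Target is the price with a log transformation.
df["log_price"] = np.log(df["price"])

# Target encoding: average Target per modality of the categorical feature.
region_means = df.groupby("region")["log_price"].mean()

# Replace each modality by its average Target.
df["region_encoded"] = df["region"].map(region_means)


def encoded_distance(a: str, b: str) -> float:
    """Distance between two modalities on the encoded (numeric) axis."""
    return float(abs(region_means[a] - region_means[b]))


print(region_means.sort_values())
print(encoded_distance("EMEA", "Other APAC"), encoded_distance("EMEA", "Americas"))
```

With these made-up values, the encoded region is now an ordinary numeric column: modalities with similar average log prices end up close to each other, and modalities with very different average log prices end up far apart.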
2.Modeling of the Target (price with log transformation)
xxx