The objective of this page is to provide a detailed description of the global variables used in Dataiku's workflow.

Versioning variables

These two variables define the version of the runs stored in Dataiku folders:

"versioning_version_name": "Q3.3_all_families_run",
"extract_month_date": "2023-09-01",

To get the version name in the Python recipes, we use the get_config_version() function from the config_helpers.py file.
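As an illustration, the two versioning variables could be read along the following lines. This is only a minimal sketch: the `variables` dict stands in for the project's global variables, and `read_version` is a hypothetical helper, not the actual get_config_version() from config_helpers.py, whose signature is not shown on this page.

```python
from datetime import date

# Stand-in for the project's global variables (values copied from this page).
variables = {
    "versioning_version_name": "Q3.3_all_families_run",
    "extract_month_date": "2023-09-01",
}


def read_version(variables):
    """Hypothetical helper: return the run's version name and extract month."""
    version_name = variables["versioning_version_name"]
    # Parse the ISO date so that a malformed value fails early.
    extract_month = date.fromisoformat(variables["extract_month_date"])
    return version_name, extract_month


version_name, extract_month = read_version(variables)
print(version_name, extract_month)
```

In a Dataiku recipe, the dict would presumably come from the project variables (for instance via dataiku.get_custom_variables()) rather than being hard-coded.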

GBU variables

GBU-specific measures and identifiers, which must be updated for each new GBU.

The "families_in_scope" variable contains the names of the product families integrated for the current GBU. It must be updated each time a new product family is added.

"GBU_measures": {
  "historical_revenue": "historical_sales",
  "historical_volume": "historical_volume",
  "historical_price": "historical_unit_price"
},
"GBU_identifiers": {
  "id_key": "cpc",
  "product_key": "material_code",
  "customer_key": "shipto_code",
  "soldto_key": "soldto_code",
  "soldto_group_key": "soldto_group",
  "shipto_key": "shipto_code",
  "family_key": "gbu_product_family",
  "sales_key": "forecasted_sales",
  "volumes_key": "forecasted_volume",
  "prices_key": "computed_unit_price"
},
"families_in_scope": [
  "Sulfosuccinate_Sulfosuccinamate",
  "Specialty_Monomers",
  "Phosphate_Esters",
  "Guars"
],

To get these variables, we use the get_config_gbu_ids() function from the config_helpers.py file.

The values of "families_in_scope" are matched against the column named by the GBU identifier "family_key" (here "gbu_product_family") to select the families in scope.
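The family selection described above can be sketched as follows. The rows are made up for illustration and `select_in_scope` is a hypothetical helper; the project most likely works on Dataiku datasets or DataFrames rather than plain lists of dicts.

```python
# Stand-ins for the GBU variables listed above.
gbu_identifiers = {"family_key": "gbu_product_family"}
families_in_scope = [
    "Sulfosuccinate_Sulfosuccinamate",
    "Specialty_Monomers",
    "Phosphate_Esters",
    "Guars",
]

# Made-up rows, for illustration only.
rows = [
    {"cpc": "A", "gbu_product_family": "Guars"},
    {"cpc": "B", "gbu_product_family": "Other_Family"},
    {"cpc": "C", "gbu_product_family": "Phosphate_Esters"},
]


def select_in_scope(rows, family_column, families):
    """Keep the rows whose family column holds one of the in-scope families."""
    allowed = set(families)
    return [row for row in rows if row[family_column] in allowed]


in_scope = select_in_scope(rows, gbu_identifiers["family_key"], families_in_scope)
print([row["cpc"] for row in in_scope])  # ['A', 'C']
```

Reading the column name from "family_key" instead of hard-coding it means a new GBU only needs a change in the variables, not in the recipe code.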

Product composition variables

Variables used to process product composition data. They select the component types to use and specify the identifiers of the product, component type, measure and unit.

"product_composition": {
  "component_values": [
    "COMPONENT",
    "IMPURITY",
    "SOLVENT",
    "ADDITIVE",
    "Z_CONST"
  ],
  "product_identifier": "EHS_Product",
  "component_type_identifier": "Component_Type",
  "measure_identifier": "Average",
  "unit_identifier": "Unit"
},

These variables are passed as arguments to the compute_product_composition() function, which computes the product composition features in this recipe.
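To show how such a configuration can drive the processing, here is a minimal sketch that keeps only the component types listed in "component_values" and sums the measure per product and component type. It is one plausible reading, not the actual compute_product_composition() implementation, which is not shown on this page; `total_by_component_type` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Stand-in for the "product_composition" variable above.
product_composition = {
    "component_values": ["COMPONENT", "IMPURITY", "SOLVENT", "ADDITIVE", "Z_CONST"],
    "product_identifier": "EHS_Product",
    "component_type_identifier": "Component_Type",
    "measure_identifier": "Average",
    "unit_identifier": "Unit",  # not used in this sketch
}

# Made-up composition rows, for illustration only.
rows = [
    {"EHS_Product": "P1", "Component_Type": "COMPONENT", "Average": 80.0, "Unit": "%"},
    {"EHS_Product": "P1", "Component_Type": "SOLVENT", "Average": 20.0, "Unit": "%"},
    {"EHS_Product": "P1", "Component_Type": "OTHER", "Average": 5.0, "Unit": "%"},
]


def total_by_component_type(rows, cfg):
    """Sum the measure per (product, component type), skipping types not in scope."""
    keep = set(cfg["component_values"])
    totals = defaultdict(dict)
    for row in rows:
        component_type = row[cfg["component_type_identifier"]]
        if component_type not in keep:
            continue  # e.g. the "OTHER" row above is ignored
        product = row[cfg["product_identifier"]]
        previous = totals[product].get(component_type, 0.0)
        totals[product][component_type] = previous + row[cfg["measure_identifier"]]
    return dict(totals)


totals = total_by_component_type(rows, product_composition)
print(totals)  # {'P1': {'COMPONENT': 80.0, 'SOLVENT': 20.0}}
```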

Pre-processing variables

Variables used in the various data preparation stages:

  • "preprocessing_filters" filters CPCs on the basis of column values (for example, here we filter all CPCs with "SSPH" in the product_group column). It is passed as an argument to the data_filters() function in the feature engineering recipe.
  • "imputers" specifies the imputation strategy to use for each feature with NaN values. It is passed as an argument to the simple_imputer() function in the feature engineering recipe.
  • "categorical_encoder" specifies the type of encoder to use for categorical features. It is passed as an argument to the encoding_data() function in the encoding recipe.
  • "ordinal_encoder" specifies the type of encoder to use for ordinal features. It is passed as an argument to the encoding_data() function in the encoding recipe.


"preprocessing_filters": {
  "product_group": [
    "SSPH"
  ],
  "material_name": [
    "AEROSOL OT-100 SURF 25KG FBD WHSKIN",
    "AEROSOL OT-100 SURF 11KG W/LBL BOX"
  ],
  "end_use": [
    "Hpc-Api"
  ]
},
"imputers": {
  "most_frequent": [
    "manual_region_SS",
    "manual_region_SM",
    "product_group"
  ],
  "constant": {
    "n_competitors": 1,
    "historical_unit_price_coalesce_ratio_on_12": 1,
    "historical_sales_coalesce_ratio_on_12": 1,
    "historical_unit_price_ratio_3_on_12_month": 1
  },
  "mean": [
    "COMPONENT_ratio",
    "IMPURITY_ratio",
    "SOLVENT_ratio",
    "n_components"
  ]
},
"categorical_encoder": "TargetMean",
"ordinal_encoder": "Ordinal",
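The filtering and imputation settings above can be illustrated with the sketch below. It rests on several assumptions: filtered rows are dropped (rather than kept), filtering runs before imputation, and missing values are represented by None instead of NaN. `drop_filtered` and `impute` are hypothetical helpers, not the project's data_filters() and simple_imputer() functions. The strategy names happen to match those of scikit-learn's SimpleImputer, which the project may or may not use.

```python
from collections import Counter
from statistics import mean

# Reduced stand-ins for the variables above.
preprocessing_filters = {"product_group": ["SSPH"], "end_use": ["Hpc-Api"]}
imputers = {
    "most_frequent": ["product_group"],
    "constant": {"n_competitors": 1},
    "mean": ["n_components"],
}

# Made-up rows, for illustration only (None marks a missing value).
rows = [
    {"cpc": "A", "product_group": "SSPH", "end_use": "X", "n_competitors": 2, "n_components": 3},
    {"cpc": "B", "product_group": "G1", "end_use": "Y", "n_competitors": None, "n_components": None},
    {"cpc": "C", "product_group": None, "end_use": "Y", "n_competitors": 4, "n_components": 5},
    {"cpc": "D", "product_group": "G1", "end_use": "Z", "n_competitors": 3, "n_components": 4},
]


def drop_filtered(rows, filters):
    """Assumption: drop a row when any filtered column holds a listed value."""
    return [
        row for row in rows
        if not any(row.get(col) in values for col, values in filters.items())
    ]


def impute(rows, imputers):
    """Fill None values column by column, following the configured strategy."""
    rows = [dict(row) for row in rows]  # work on copies, leave the input untouched
    plan = [(col, "most_frequent", None) for col in imputers.get("most_frequent", [])]
    plan += [(col, "mean", None) for col in imputers.get("mean", [])]
    plan += [(col, "constant", value) for col, value in imputers.get("constant", {}).items()]
    for col, strategy, constant in plan:
        observed = [row[col] for row in rows if row[col] is not None]
        if strategy == "constant":
            fill = constant
        elif strategy == "mean":
            fill = mean(observed)
        else:  # most_frequent
            fill = Counter(observed).most_common(1)[0][0]
        for row in rows:
            if row[col] is None:
                row[col] = fill
    return rows


kept = drop_filtered(rows, preprocessing_filters)
clean = impute(kept, imputers)
print([row["cpc"] for row in kept])  # ['B', 'C', 'D']
```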


Parameters for the get_interval_ratio() function, which computes the ratio of each column listed in "evolution_columns" over one or several months ("numerator_list") relative to another set of months ("denominator_list"):


"evolution_features_params": {
  "evolution_columns": [
    "historical_sales",
    "historical_volume",
    "historical_unit_price"
  ],
  "numerator_list": [
    1,
    3,
    6
  ],
  "denominator_list": [
    12
  ]
},
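One possible reading of these parameters is sketched below: for each column, the mean over the last n months is divided by the mean over the last d months. Both the use of means and the feature naming pattern (inferred from names such as "historical_unit_price_ratio_3_on_12_month" in the imputers) are assumptions; `interval_ratio` is a hypothetical helper, not the actual get_interval_ratio() function.

```python
from statistics import mean

# Reduced stand-in for "evolution_features_params".
evolution_features_params = {
    "evolution_columns": ["historical_sales"],
    "numerator_list": [1, 3, 6],
    "denominator_list": [12],
}

# Made-up monthly history for one CPC, oldest month first (12 months).
history = {"historical_sales": [10, 10, 10, 10, 10, 10, 20, 20, 20, 40, 40, 40]}


def interval_ratio(values, numerator_months, denominator_months):
    """Mean of the last `numerator_months` values over the mean of the last `denominator_months`."""
    denominator = mean(values[-denominator_months:])
    if denominator == 0:
        return None  # ratio undefined; left for a later imputation step
    return mean(values[-numerator_months:]) / denominator


features = {}
for column in evolution_features_params["evolution_columns"]:
    for n in evolution_features_params["numerator_list"]:
        for d in evolution_features_params["denominator_list"]:
            name = f"{column}_ratio_{n}_on_{d}_month"
            features[name] = interval_ratio(history[column], n, d)

print(features)
```

With this sample history, the 12-month mean is 20, so the 1-month and 3-month ratios are 2.0 and the 6-month ratio is 1.5.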