This page describes the global variables used in the Dataiku workflow.
Versioning variables
These two variables define the version of each run in the Dataiku folders:
- "versioning_version_name" is the version name and must be updated for each new run.
- "extract_month_date" is the first forecast month in the extracted forecast data, and is updated automatically during a scenario step (https://dss.solvay.com/projects/SPOT_CS_PO_918/scenarios/2DATAPREPARATION/steps).
"versioning_version_name": "Q3.3_all_families_run",
"extract_month_date": "2023-09-01",
To get the version name in the Python recipes, we use the get_config_version() function from the config_helpers.py file.
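As an illustration of how a version name can be read and used, here is a minimal sketch. The implementation of get_config_version() is not shown on this page, so read_version_name() and versioned_path() below are hypothetical stand-ins, and the variables are passed as a plain dictionary rather than read from the Dataiku project.

```python
# Hypothetical stand-in for the project's global variables. In a real recipe
# these values would come from the Dataiku project, not from a literal dict.
PROJECT_VARIABLES = {
    "versioning_version_name": "Q3.3_all_families_run",
    "extract_month_date": "2023-09-01",
}


def read_version_name(variables):
    """Return the run version name (assumed behaviour, not the real helper)."""
    try:
        return variables["versioning_version_name"]
    except KeyError as exc:
        raise KeyError("versioning_version_name is not defined") from exc


def versioned_path(variables, filename):
    """Prefix a file name with the version name (hypothetical folder layout)."""
    return f"{read_version_name(variables)}/{filename}"


print(versioned_path(PROJECT_VARIABLES, "forecast.csv"))
```

Keeping the lookup in one helper means a recipe fails early with a clear message when the version name has not been set for a new run.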
GBU variables
GBU-specific measures and identifiers, which must be updated for each new GBU.
"families_in_scope" contains the names of the product families integrated for the current GBU, and must be updated each time a new product family is added.
"GBU_measures": {
  "historical_revenue": "historical_sales",
  "historical_volume": "historical_volume",
  "historical_price": "historical_unit_price"
},
"GBU_identifiers": {
  "id_key": "cpc",
  "product_key": "material_code",
  "customer_key": "shipto_code",
  "soldto_key": "soldto_code",
  "soldto_group_key": "soldto_group",
  "shipto_key": "shipto_code",
  "family_key": "gbu_product_family",
  "sales_key": "forecasted_sales",
  "volumes_key": "forecasted_volume",
  "prices_key": "computed_unit_price"
},
"families_in_scope": [
  "Sulfosuccinate_Sulfosuccinamate",
  "Specialty_Monomers",
  "Phosphate_Esters",
  "Guars"
],
To get these variables, we use the get_config_gbu_ids() function from the config_helpers.py file.
The values of "families_in_scope" are matched against the column named by the GBU identifier "family_key" to select the families in scope.
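The family selection described above can be sketched as follows. This is only an illustration in plain Python: select_families_in_scope() is a hypothetical helper, and the sample rows are made-up values, not project data.

```python
# Subset of the GBU variables shown above.
GBU_IDENTIFIERS = {"id_key": "cpc", "family_key": "gbu_product_family"}
FAMILIES_IN_SCOPE = [
    "Sulfosuccinate_Sulfosuccinamate",
    "Specialty_Monomers",
    "Phosphate_Esters",
    "Guars",
]


def select_families_in_scope(rows, identifiers, families):
    """Keep the rows whose family column value is listed in `families`."""
    family_column = identifiers["family_key"]
    allowed = set(families)
    return [row for row in rows if row.get(family_column) in allowed]


# Made-up rows for illustration only.
rows = [
    {"cpc": "A-001", "gbu_product_family": "Guars"},
    {"cpc": "B-002", "gbu_product_family": "Out_Of_Scope_Family"},
    {"cpc": "C-003", "gbu_product_family": "Phosphate_Esters"},
]

in_scope = select_families_in_scope(rows, GBU_IDENTIFIERS, FAMILIES_IN_SCOPE)
print([row[GBU_IDENTIFIERS["id_key"]] for row in in_scope])
```

Because the column name is read from "family_key" rather than hard-coded, the same selection works for a new GBU once its identifiers are updated.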
Product composition variables
Variables used to process product composition data: they select the component types to be used and specify the identifiers of the product, component type, measure and unit columns.
"product_composition": {
  "component_values": [
    "COMPONENT",
    "IMPURITY",
    "SOLVENT",
    "ADDITIVE",
    "Z_CONST"
  ],
  "product_identifier": "EHS_Product",
  "component_type_identifier": "Component_Type",
  "measure_identifier": "Average",
  "unit_identifier": "Unit"
},
These variables are passed as arguments to the compute_product_composition() function, which computes the product composition features in this recipe.
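To show how these identifiers could drive the computation, here is a rough sketch. The actual compute_product_composition() is not documented on this page, so composition_features() is a hypothetical function. It assumes that all measures of a product share the same unit, and that the features are a per-type share (named like the "COMPONENT_ratio" column listed in the imputers) plus a component count ("n_components"). The sample rows are made up.

```python
from collections import defaultdict

PRODUCT_COMPOSITION = {
    "component_values": ["COMPONENT", "IMPURITY", "SOLVENT", "ADDITIVE", "Z_CONST"],
    "product_identifier": "EHS_Product",
    "component_type_identifier": "Component_Type",
    "measure_identifier": "Average",
    "unit_identifier": "Unit",
}


def composition_features(rows, config):
    """Aggregate composition rows into per-product features (assumed logic)."""
    product_col = config["product_identifier"]
    type_col = config["component_type_identifier"]
    measure_col = config["measure_identifier"]
    allowed = set(config["component_values"])

    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for row in rows:
        if row[type_col] not in allowed:
            continue  # ignore component types that are not selected
        totals[row[product_col]][row[type_col]] += row[measure_col]
        counts[row[product_col]] += 1

    features = {}
    for product, by_type in totals.items():
        grand_total = sum(by_type.values())
        product_features = {"n_components": counts[product]}
        for component_type in config["component_values"]:
            share = by_type.get(component_type, 0.0)
            ratio = share / grand_total if grand_total else 0.0
            product_features[f"{component_type}_ratio"] = ratio
        features[product] = product_features
    return features


# Made-up rows for illustration only.
rows = [
    {"EHS_Product": "P1", "Component_Type": "COMPONENT", "Average": 80.0, "Unit": "%"},
    {"EHS_Product": "P1", "Component_Type": "SOLVENT", "Average": 20.0, "Unit": "%"},
    {"EHS_Product": "P1", "Component_Type": "OTHER", "Average": 5.0, "Unit": "%"},
]
print(composition_features(rows, PRODUCT_COMPOSITION)["P1"])
```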
Pre-processing variables
Variables used in the various data preparation stages:
- "preprocessing_filters" filters CPCs on the basis of column values (for example, here we filter all CPCs with "SSPH" in the product_group column). It is passed as an argument to the data_filters() function in the feature engineering recipe.
- "imputers" specifies the imputation strategy to be used for each feature with NaN values. It is passed as an argument to the simple_imputer() function in the feature engineering recipe.
- "categorical_encoder" specifies the type of encoder to be used for categorical features. It is passed as an argument to the encoding_data() function in the encoding recipe.
- "ordinal_encoder" specifies the type of encoder to be used for ordinal features. It is passed as an argument to the encoding_data() function in the encoding recipe.
"preprocessing_filters": {
  "product_group": [
    "SSPH"
  ],
  "material_name": [
    "AEROSOL OT-100 SURF 25KG FBD WHSKIN",
    "AEROSOL OT-100 SURF 11KG W/LBL BOX"
  ],
  "end_use": [
    "Hpc-Api"
  ]
},
"imputers": {
  "most_frequent": [
    "manual_region_SS",
    "manual_region_SM",
    "product_group"
  ],
  "constant": {
    "n_competitors": 1,
    "historical_unit_price_coalesce_ratio_on_12": 1,
    "historical_sales_coalesce_ratio_on_12": 1,
    "historical_unit_price_ratio_3_on_12_month": 1
  },
  "mean": [
    "COMPONENT_ratio",
    "IMPURITY_ratio",
    "SOLVENT_ratio",
    "n_components"
  ]
},
"categorical_encoder": "TargetMean",
"ordinal_encoder": "Ordinal",
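The three imputation strategies in "imputers" can be illustrated with a small plain-Python sketch. The strategy names match those of scikit-learn's SimpleImputer, but whether simple_imputer() wraps it is not stated on this page, so impute() below is a hypothetical helper that only mimics the intended behaviour on a reduced configuration with made-up rows.

```python
from collections import Counter
from statistics import mean

# Reduced version of the "imputers" variable shown above.
IMPUTERS = {
    "most_frequent": ["product_group"],
    "constant": {"n_competitors": 1},
    "mean": ["n_components"],
}


def is_missing(value):
    """Treat None and float NaN as missing (NaN is the only value != itself)."""
    return value is None or (isinstance(value, float) and value != value)


def impute(rows, imputers):
    """Return copies of `rows` with missing values filled per strategy."""
    fill_values = dict(imputers.get("constant", {}))
    for column in imputers.get("most_frequent", []):
        observed = [r[column] for r in rows if not is_missing(r.get(column))]
        if observed:
            fill_values[column] = Counter(observed).most_common(1)[0][0]
    for column in imputers.get("mean", []):
        observed = [r[column] for r in rows if not is_missing(r.get(column))]
        if observed:
            fill_values[column] = mean(observed)

    filled = [dict(row) for row in rows]
    for row in filled:
        for column, value in fill_values.items():
            if is_missing(row.get(column)):
                row[column] = value
    return filled


# Made-up rows for illustration only.
rows = [
    {"product_group": "PG1", "n_competitors": None, "n_components": 2},
    {"product_group": "PG1", "n_competitors": 3, "n_components": float("nan")},
    {"product_group": None, "n_competitors": 2, "n_components": 4},
]
for row in impute(rows, IMPUTERS):
    print(row)
```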
Parameters for the get_interval_ratio() function, which computes, for each column in "evolution_columns", the ratio of that column over one or several months ("numerator_list") relative to another set of months ("denominator_list").
"evolution_features_params": {
  "evolution_columns": [
    "historical_sales",
    "historical_volume",
    "historical_unit_price"
  ],
  "numerator_list": [
    1,
    3,
    6
  ],
  "denominator_list": [
    12
  ]
},
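A possible reading of these parameters is sketched below. The real get_interval_ratio() is not shown on this page, so interval_ratio() and evolution_features() are hypothetical. The sketch assumes that each ratio compares the mean of the last N months with the mean of the last M months, and that features are named like "historical_unit_price_ratio_3_on_12_month", a column that appears in the imputers. The monthly values are made up.

```python
from statistics import mean

EVOLUTION_FEATURES_PARAMS = {
    "evolution_columns": ["historical_sales", "historical_volume", "historical_unit_price"],
    "numerator_list": [1, 3, 6],
    "denominator_list": [12],
}


def interval_ratio(history, numerator_months, denominator_months):
    """Mean of the last `numerator_months` values over the mean of the last
    `denominator_months` values; None when the ratio cannot be computed."""
    if len(history) < max(numerator_months, denominator_months):
        return None
    denominator = mean(history[-denominator_months:])
    if denominator == 0:
        return None
    return mean(history[-numerator_months:]) / denominator


def evolution_features(series_by_column, params):
    """Compute one ratio per (column, numerator, denominator) combination."""
    features = {}
    for column in params["evolution_columns"]:
        history = series_by_column[column]
        for num in params["numerator_list"]:
            for den in params["denominator_list"]:
                name = f"{column}_ratio_{num}_on_{den}_month"
                features[name] = interval_ratio(history, num, den)
    return features


# Made-up monthly history, oldest first: nine months at 10, then three at 20.
monthly = [10.0] * 9 + [20.0] * 3
series = {column: monthly for column in EVOLUTION_FEATURES_PARAMS["evolution_columns"]}
for name, value in evolution_features(series, EVOLUTION_FEATURES_PARAMS).items():
    print(name, value)
```

Returning None when the history is too short or the denominator is zero leaves a missing value, which can then be handled by the imputation step.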