Refer to this article for branch management, including creating, rebasing, and merging the branch.

Note: remember to update the relevant Dataiku run versions Google Sheet (CS / SPP) with the version names and the set of variables you use, so that developments stay traceable.
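Below is a minimal sketch of how one row for the run versions sheet could be assembled. It assumes one row per run holding the version name, the run date, and a JSON dump of the variables used. The `trace_row` helper and this column layout are hypothetical, and pushing the row to the sheet itself is left out.

```python
import json
from datetime import date
from typing import Optional


def trace_row(variables: dict, run_date: Optional[date] = None) -> list:
    """Build one row for the run versions sheet (hypothetical column layout)."""
    run_date = run_date or date.today()
    return [
        str(variables.get("versioning_version_name", "")),
        run_date.isoformat(),
        # sort_keys keeps the dump stable, so rows for two runs are easy to diff
        json.dumps(variables, sort_keys=True),
    ]
```

Storing the full variable snapshot rather than a hand-picked subset means nothing is lost if a variable later turns out to matter.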

Once the branch is created:

  1. Update the "versioning_version_name" variable.

  2. Make sure the data for the new product family is included in the sources.

  3. Add the family name to the global variable "families_in_scope".

  4. Run the flow up to the encoding folder and conduct a correlation analysis using the dedicated notebook (or the Encoding_folder statistical sheets). Check for null values to see whether any columns need to be imputed in the config.

  4.bis Run the Features_selection_analysis recipe to evaluate all combinations of non-correlated features and see which one returns the best R2.

  5. Enable the "random variables" in the config to see which features rank above the random variable (RV) features. A feature whose weight is below that of at least one RV feature should not be kept in the model.

  6. Launch a first run up to the weights model on its own version.

  7. Check the results in the "SHAP analysis" tab of the dashboard, filter out features that are less important than the RV features, and test new features if needed (refer to this article for details).

  8. Run the similarity model and check the health of the volume curves. Try to identify outliers and get them explained with the help of the business. Use this to adapt the volume thresholds or filter out CPCs if needed.

  9. To perform hyperparameter optimization, set the "run_cross_val" variable to True and add the family to the "families_to_optimize" variable.

  10. Set "use_hyper_params" to True and launch a second run up to the similarity model on its own version.

  11. Finish the run through to the final output and provide the results for validation.

  12. Rebase and merge the branch back into dev and rerun the full flow there to keep the history of the version.
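Updating versioning_version_name and adding the family to families_in_scope can be sketched as a pure function over a variables dict. This assumes families_in_scope holds a list of family names, which the page does not state, and `prepare_branch_variables` is a hypothetical helper. In DSS, project variables can be fetched with `project.get_variables()` from the `dataikuapi` client (a dict with `standard` and `local` sections) and written back with `project.set_variables(...)`. Whether these particular variables live at project or instance level is not specified here, so the sketch only works on a plain dict.

```python
import copy


def prepare_branch_variables(variables: dict, version_name: str, family: str) -> dict:
    """Return a copy of `variables` set up for a new product family.

    Hypothetical helper: assumes `families_in_scope` is a list of family names.
    """
    updated = copy.deepcopy(variables)  # never mutate the caller's snapshot
    updated["versioning_version_name"] = version_name
    families = list(updated.get("families_in_scope", []))
    if family not in families:  # keep the list free of duplicates
        families.append(family)
    updated["families_in_scope"] = families
    return updated
```

Returning a copy makes it easy to compare the before and after snapshots before writing anything back.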
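The idea behind the Features_selection_analysis recipe (try every combination of non-correlated features and keep the one with the best R2) can be sketched as an exhaustive search. This is not the recipe's code. The helper names are hypothetical, and `score` stands in for whatever fits the model on a subset and returns its R2.

```python
from collections.abc import Callable, Iterable, Iterator
from itertools import combinations


def uncorrelated_subsets(features: list, correlated: Iterable,
                         min_size: int = 1) -> Iterator[tuple]:
    """Yield every feature subset that contains no correlated pair (hypothetical helper)."""
    banned = {frozenset(pair) for pair in correlated}
    for size in range(min_size, len(features) + 1):
        for subset in combinations(features, size):
            if not any(frozenset(p) in banned for p in combinations(subset, 2)):
                yield subset


def best_subset(features: list, correlated: Iterable,
                score: Callable[[tuple], float]) -> tuple:
    """Return (subset, score) with the highest score, e.g. the model's R2 on that subset."""
    scored = ((s, score(s)) for s in uncorrelated_subsets(features, list(correlated)))
    return max(scored, key=lambda item: item[1])
```

The search is exhaustive, so its cost grows exponentially with the number of features; excluding correlated pairs prunes it but does not change that.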
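The random-variable rule (drop any feature whose weight is below at least one RV feature) amounts to using the highest RV importance as the bar. A minimal sketch, assuming one importance score per feature (for example the mean absolute SHAP value) and a hypothetical `random_` prefix on the RV columns:

```python
def keep_above_random(importances: dict, rv_prefix: str = "random_") -> list:
    """Keep real features whose importance beats every random-variable (RV) feature.

    The `random_` prefix is a hypothetical naming rule for the RV columns.
    """
    rv_scores = [v for k, v in importances.items() if k.startswith(rv_prefix)]
    if not rv_scores:
        raise ValueError("no random-variable features found; enable them in the config")
    bar = max(rv_scores)  # below at least one RV means below the highest RV
    kept = [k for k, v in importances.items()
            if not k.startswith(rv_prefix) and v > bar]
    # Most important first, as in a SHAP importance ranking.
    return sorted(kept, key=importances.get, reverse=True)
```

A feature that ties with the best RV is dropped here; that tie-breaking choice is an assumption.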
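Outliers on the volume curves can be pre-screened before going to the business with Tukey's fences (the IQR rule). This technique is swapped in here for illustration; the page does not say how outliers are detected. The sketch assumes one aggregated volume per CPC and only flags candidates. The business explanation still decides what to filter.

```python
import statistics


def flag_volume_outliers(volumes: dict, k: float = 1.5) -> dict:
    """Return CPCs whose volume falls outside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR).

    Assumes one aggregated volume per CPC; k = 1.5 is the conventional default.
    """
    q1, _, q3 = statistics.quantiles(volumes.values(), n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return {cpc: v for cpc, v in volumes.items() if v < low or v > high}
```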
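The two variable changes around hyperparameter optimization can be sketched the same way. This assumes run_cross_val and use_hyper_params are stored as booleans and families_to_optimize as a list. The helper names are hypothetical, and the sketch leaves every other flag unchanged because the page does not say whether run_cross_val should be reset for the second run.

```python
import copy


def enable_hyperparameter_search(variables: dict, family: str) -> dict:
    """Optimization run: turn on run_cross_val and queue `family` (hypothetical helper)."""
    updated = copy.deepcopy(variables)
    updated["run_cross_val"] = True
    to_optimize = list(updated.get("families_to_optimize", []))
    if family not in to_optimize:
        to_optimize.append(family)
    updated["families_to_optimize"] = to_optimize
    return updated


def enable_tuned_hyperparameters(variables: dict) -> dict:
    """Second run: use the optimized hyperparameters; other flags are left untouched."""
    updated = copy.deepcopy(variables)
    updated["use_hyper_params"] = True
    return updated
```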

