Overview

This page is used to scope, plan and deliver the Proof of Concept of the AdEx application, which was running on Dataiku and was then moved to GCP in June 2023.

Product

Vision

We want researchers and smart tools to help each other quickly find the best products for the right applications.

Goal

AdEx is an application that allows lab researchers to upload a dataset so that it can predict the best results based on recommended inputs.

The value: researchers can reduce the number of trials with unsuccessful outputs.

Documentation

AdEx deck for "From Dataiku to GCP" migration here

Quick User Manual Documentation here.

Full User Manual here.

LLD here.

Projects

BM Aubervilliers & ARO Lyon

Product Features

This product is composed of a few modules, presented to the user as 5 steps going from the upload to the recommendation:

Upload and customize Dataset

The upload

A single file can be uploaded, from the user's computer only
Accepted file types are “.csv“ or “.pkl” (export from a previous AdEx analysis); otherwise:
- If a wrong file format is uploaded: nothing happens, no error message is shown, and nothing prevents the user from re-uploading a file
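The file-type rule above can be sketched as a small check. This is only an illustration, not the application's actual code: the names `ACCEPTED_EXTENSIONS`, `is_accepted_upload` and `upload_feedback` are hypothetical, the case-insensitive comparison is an assumption, and the error message is not something the application currently displays.

```python
from pathlib import Path
from typing import Optional

# Accepted extensions, taken from the upload rule above.
ACCEPTED_EXTENSIONS = {".csv", ".pkl"}


def is_accepted_upload(filename: str) -> bool:
    """Return True if the file name ends with an accepted extension.

    Assumption: the comparison ignores case, so "DATA.CSV" is accepted.
    """
    return Path(filename).suffix.lower() in ACCEPTED_EXTENSIONS


def upload_feedback(filename: str) -> Optional[str]:
    """Hypothetical helper: return a message for a rejected file, else None."""
    if is_accepted_upload(filename):
        return None
    return f"Unsupported file type for '{filename}': expected .csv or .pkl"
```

A message like this could replace the current silent behaviour when a wrong file format is uploaded, if the team decides to add feedback.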

Dataset customization and selection

The user can visualize the uploaded dataset in the right-hand panel. With the original radio button/tab selected, the data table shows exactly the same columns, cells and values as the uploaded file
- There are no column/row limits
Format cleanup, list of accepted rules
The Clear session button clears the session's dataset so that a new one can be uploaded.
2 types of upload with different formats - the file must already be in that format:
- Trials in rows
- Trials in columns
Select the “Variable Selection“ tab
Select the Trial ID column if an identifier (primary key) is available
Select inputs and outputs from the available list in the dropdown menus
- Minimum of 2 inputs and 1 output
Click on the “Only ID, Inputs & Targets“ radio button/tab to verify the selected columns from the original dataset
If those steps have been fully completed, the button in the next tab will be orange
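The variable-selection rules above (optional Trial ID, at least 2 inputs and 1 output) can be sketched as a validation function. This is a minimal sketch under stated assumptions: `selection_errors` is a hypothetical name, and the rule that a column cannot be both an input and an output (or reused as the Trial ID) is an assumption not stated in this page.

```python
from typing import List, Optional, Sequence

MIN_INPUTS = 2   # from the rule above
MIN_OUTPUTS = 1  # from the rule above


def selection_errors(
    columns: Sequence[str],
    inputs: Sequence[str],
    outputs: Sequence[str],
    trial_id: Optional[str] = None,
) -> List[str]:
    """Return a list of problems with the selection; empty means it is valid."""
    errors: List[str] = []
    available = set(columns)
    chosen = list(inputs) + list(outputs)
    if trial_id is not None:
        chosen.append(trial_id)

    # Every selected column must exist in the uploaded dataset.
    missing = sorted({c for c in chosen if c not in available})
    if missing:
        errors.append(f"Unknown columns: {missing}")

    if len(set(inputs)) < MIN_INPUTS:
        errors.append(f"Select at least {MIN_INPUTS} inputs")
    if len(set(outputs)) < MIN_OUTPUTS:
        errors.append(f"Select at least {MIN_OUTPUTS} output")

    # Assumption: a column plays only one role.
    overlap = sorted(set(inputs) & set(outputs))
    if overlap:
        errors.append(f"Columns used as both input and output: {overlap}")
    if trial_id is not None and trial_id in set(inputs) | set(outputs):
        errors.append("Trial ID column cannot also be an input or output")
    return errors
```

An empty list would correspond to the state in which the button in the next tab turns orange.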

Set Design space - Select Dataset Range

Select “Design Space“ tab
Click “USE DEFAULT DESIGN SPACE AND MAXIMIZE TARGETS“
- Grey if previous step was incomplete
- Orange shows if the previous step to select variables was completed
- Green if the design space is set
Or click on the dropdown and select a column to change the range of its values. (It is not possible to switch back and forth with the USE DEFAULT DESIGN SPACE… button)
- Resetting to the default value still lets the user apply the default range
Proceed to “Model Optimize“ story and perform tasks.
Select each column and change ranges.
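The button colours listed in this section can be summarised as a function of the two conditions they depend on. This is only an illustrative sketch: `design_space_button_colour` and its parameter names are hypothetical and do not come from the application code.

```python
def design_space_button_colour(variables_selected: bool, design_space_set: bool) -> str:
    """Return the colour of the design space button.

    grey   : the previous step (variable selection) is incomplete
    orange : variables are selected but the design space is not set yet
    green  : the design space is set
    """
    if not variables_selected:
        return "grey"
    if design_space_set:
        return "green"
    return "orange"
```

Checking the variable selection first means the button stays grey even if a design space was set earlier and the selection is later cleared; this ordering is an assumption.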

Model Optimize

Select “FIT MODEL & SEARCH NEXT TRIAL“ to compute the model.
- Grey if previous step was incomplete
- Orange shows if the previous step to select variables was completed
- Green if the design space is set
To reset the design space, click the green button to change it back to orange
If the user double-clicks before the computation is complete and the button has turned green, the button turns red with the message “TOO MANY CLICKS - REFRESH PAGE“ - at this point the user needs to start from a fresh page
After computation is complete and button turns green, the user can visualize the results of the fit:
- “Model Info“: the user selects one output from the drop down menu and a graph displays predicted vs measured output along with error bars (5-Model Info Graph)
  - SHAP graphs (See attached 11-SHAP Graphs):
    - A bar chart with horizontal bars
    - A graph displaying features, feature values vs SHAP value
    - One graph per input showing SHAP values vs input values
- “Recommended Trials“:
  - A table sorted by trial ranking is displayed (See attached 6-Table Sorted by Trial Ranking)
  - X Output graphs (See attached 7-X Output Graphs):
    - In red, it shows output value from the design table (historical trials) sorted by the identifier (primary key)
    - In green, it shows the top 10 recommended trials
  - Below the graphs, a contour plot can be visualized for each output by selecting X and Y in drop down menus below these graphs (See attached 8-Contour Plots)
- “One-Dimensional Profile of the Model“: the user selects the trial ID if available (selected in “Select Variables“ tab), the input name and the target to display a banded graph. A red cross marks each historical trial. Reference values for the profile are also displayed below the graph (See attached 9-Banded Graph)
- “Prediction Visualizer“: only available for multiple targets. Select two targets at a time and display a scatter plot with historical trials (red points and error bars) and suggested trials (green points). Clicking on a point displays the trial's experimental values below the scatter plot (See attached 10-Prediction Plot)
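The click behaviour of the “FIT MODEL & SEARCH NEXT TRIAL“ button described at the start of this section (green once the computation completes, red if clicked again while computing) can be sketched as a small state model. The class `FitButton` and its methods are hypothetical. The sketch assumes the previous steps are complete, that the button stays orange while computing, and that clicks on a green button are ignored; none of these three points is confirmed by this page.

```python
class FitButton:
    """Toy model of the 'FIT MODEL & SEARCH NEXT TRIAL' button."""

    TOO_MANY_CLICKS = "TOO MANY CLICKS - REFRESH PAGE"

    def __init__(self) -> None:
        self.colour = "orange"   # assumption: previous steps are complete
        self.computing = False
        self.message = ""

    def click(self) -> None:
        if self.colour == "red":
            return  # locked: the user must start from a fresh page
        if self.computing:
            # A second click before the computation completes.
            self.colour = "red"
            self.message = self.TOO_MANY_CLICKS
            self.computing = False
            return
        if self.colour == "green":
            return  # assumption: clicks after completion are ignored here
        self.computing = True  # first click starts the computation

    def computation_finished(self) -> None:
        """Called when the model fit completes."""
        if not self.computing:
            return  # nothing running, or the button is already red
        self.computing = False
        self.colour = "green"
```

For example, `click()` followed by `computation_finished()` leaves the button green, while two `click()` calls in a row leave it red with the refresh message.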

Update Scores

Timeline

First Phase

The data scientists from Materials R&I (TAMBURRO, Alessio and Ongari, Daniele) have been developing a small application running on Dataiku where researchers can upload their dataset and get recommendations.
The example used while developing the app shows how the yield can be optimized.

This proof of concept also shows that the application running on Dataiku cannot scale to our users due to poor performance.

Current documentation can be found here: https://docs.google.com/presentation/d/1VPZLjZ05u780Y9Unwead3OEk_TOrSmL2Tpzr4teGaGg/edit#slide=id.g1683c30a2ec_0_1

Second Phase

In order to have a few users testing and using the application, we need to move it to GCP.

How?

We need the DataLab squad, a UI/UX designer and full-stack developers to re-design the application, which is coded in Python, on DataLab.

The application would keep exactly the same feature set we currently have.

Business requirements we can improve:

DataLab UI/UX experience
Users would still upload their dataset in the DataEx module
Solvay users could use the Google SSO to access DataEx (No need to sign an NDA if the user will only have strict access to DataEx)

Steps

We need to review the scope, create epics, stories and design with the current DataLab Team
1. Meetings with the Architect, Product Manager, Product Owner, Business Analyst and UI/UX Designer
Breakdown and estimate non-functional re-design with the current team
1. Meetings with squad
Prioritize with an additional statement of work (growing the DataLab team with Lab Booster current budget)
1. Meeting with Cloud/Vanenburg team to pitch the additional workload with a solution