Note: - Please create a similar document for each project that the target project depends on.

Data Flow: -

Upload the architecture diagram of the project (data flow).

For example: -

[Example architecture diagram image]

List all upstream and downstream dependencies.

    1. Upstream Dependencies
      1. Upstream Dependency 1
        1. Details about the dependency.
          1. How is the data generated?
          2. What is the source of the data?
          3. What is the output of the data?
          4. How frequently is the data loaded to the output?
          5. Mention the point of contact/team to reach out to in case of any issue.
        2. Dataset 1 (input source to Dataiku).
          1. List any intermediate data transformations.
          2. Give details of each data transformation (e.g. function, calculation).
        3. Dataset 2
          1. Mention the same details as for Dataset 1.
        4. Dependent scenario
          1. Add the name of the scenario that is triggered by the upstream dependency and mention the time at which it runs.
          2. Mention whether the Dataiku team is required to trigger the dependent scenario.
      2. Upstream Dependency 2
        1. Mention the same details for any other dependencies.
    2. Downstream Dependencies
      1. Downstream Dependency 1
        1. Details about the dependency.
          1. Where is the data loaded (i.e. data source, or a platform such as Talend or Tableau)?
          2. How frequently should the data be loaded?
          3. Do any additional reporters need to be added (i.e. who will receive an email on success or failure)?
          4. Number of users impacted.
          5. Mention the point of contact/team to reach out to in case of any issue.
        2. Dataset 1
          1. Mention any quality requirements or specific details that need to be maintained or checked.
          2. Potential issues/concerns in case of data loss.
          3. Mention any backup/rollback options available.
        3. Dataset 2
          1. Mention the same details as for Dataset 1 for any other connected dataset.
      2. Downstream Dependency 2
        1. Mention the same details for any other dependencies.
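
The dependency details above can also be kept in a structured form so that missing answers are easy to spot. The following is only a minimal sketch, not part of the template itself: the class names, field names, and the sample values are hypothetical and simply mirror the questions in the list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DatasetNote:
    """One dataset attached to a dependency (hypothetical structure)."""
    name: str
    transformations: List[str] = field(default_factory=list)


@dataclass
class Dependency:
    """One upstream or downstream dependency (hypothetical structure)."""
    name: str
    direction: str  # "upstream" or "downstream"
    source: str = ""
    load_frequency: str = ""
    contact: str = ""
    datasets: List[DatasetNote] = field(default_factory=list)

    def missing_fields(self) -> List[str]:
        """Return the names of the required text fields that are still empty."""
        required = ("source", "load_frequency", "contact")
        return [name for name in required if not getattr(self, name).strip()]


# Sample values are placeholders for illustration only.
dep = Dependency(
    name="Upstream Dependency 1",
    direction="upstream",
    source="example source system",
    datasets=[DatasetNote("Dataset 1", ["example calculation"])],
)
print(dep.missing_fields())
```

Running the check before handover shows which questions for a dependency have not been answered yet.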

Procedures: -

  1. Project name
  2. URL
  3. Short description of the project (overview).
  4. Diagram of the complete flow of the project, showing its zones.

  1. List the number of zones.
    1. Zone 1
    2. Zone 2
    3. Zone 3
  2. List the number of recipes in the project.
    1. Number of code recipes
      1. Number of SQL recipes
      2. Number of Python recipes
    2. Number of visual recipes
    3. Provide a table of all recipe details.

      Recipe Name     | Recipe Type                   | Environment Name
      Visual Recipe 1 | Recipe type (e.g. Sync, Join) | Visual recipe
      Visual Recipe 2 | Recipe type (e.g. Sync, Join) | Visual recipe
      SQL Recipe 1    | SQL                           | SQL recipe
      Python Recipe 1 | Python                        | Env Name 1
  3. List the number of datasets in the project.
    1. List all the datasets and their details.

      Dataset Name | Connection Type                | Connection Name | Dataset Type
      Dataset 1    | Connection type (e.g. GCS, BQ) | Connection name | Dataset type (e.g. Base/Input/Source, Transformation, Output)
      Dataset 2    | Connection type (e.g. GCS, BQ) | Connection name | Dataset type (e.g. Base/Input/Source, Transformation, Output)
      Dataset 3    | Connection type (e.g. GCS, BQ) | Connection name | Dataset type (e.g. Base/Input/Source, Transformation, Output)

  4. Description of all the zones
    1. Zone 1
      1. Snapshot of Zone 1.
      2. List of all sources to the zone (e.g. dataset names).
      3. Short description of how the source data is generated and where it is generated from.
      4. Mention any queries used in the datasets.
      5. Provide a detailed description of the data transformations and data flow in the zone.
      6. Mention each recipe and the problem it addresses.
      7. Mention the queries used in the code recipes (e.g. BW queries, SQL queries).
      8. Where is the destination/output dataset used?
      9. Mention whether any dataset is written in append mode, and any dataset backups.
    2. Zone 2
      1. Provide the same description as for Zone 1 for any other zones.
  5. Details about libraries
    1. Library 1
      1. Mention any Python code available in the library.
      2. Detailed description of the Python code.
      3. Mention whether maintenance is needed from the DataOps team.
      4. Mention any changes that need to be made.
      5. Where is this library used?
    2. Library 2
      1. Mention the same details for any other library.
  6. Details about the API.
    1. Mention any API created in the API Designer.
    2. Short description of the API.
    3. Mention any maintenance/code changes needed per instance (e.g. URL change, variable change).
  7. Project/scenario variables
    1. Mention any variables stored in the project variables.
    2. Mention any data that needs to be maintained in variables.
    3. Purpose of each variable and how it is used.
  8. Details about webapps/dashboards.
    1. Webapp/Dashboard 1
      1. Description of the webapp/dashboard.
      2. List of datasets connected to the webapp/dashboard.
      3. Mention any scenario/job that updates or loads the webapp/dashboard.
      4. Frequency at which the webapp/dashboard is updated.
      5. etc.
    2. Webapp/Dashboard 2
      1. Mention the same details as for Webapp/Dashboard 1 for any other webapp/dashboard.
  9. Plugin info.
    1. Mention any plugins used.
    2. Detailed description of each plugin.
    3. Mention whether the plugin is installed from the Dataiku store or self-developed.
    4. Mention the environment details if self-developed.
  10. Resource consumption details.
    1. Memory consumption.
    2. CPU consumption.
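
The recipe counts and the recipe table requested in the list above can be produced from a simple export of recipe metadata. The sketch below is only an illustration under stated assumptions: it takes recipes as plain dictionaries with hypothetical keys `name`, `type`, and `env`, and it treats the labels `sql` and `python` as code recipes and everything else as visual recipes. The sample rows are placeholders taken from the template, not real project data.

```python
from collections import Counter
from typing import Dict, List

# Assumption: these type labels mark code recipes; adjust to your project.
CODE_TYPES = {"sql", "python"}


def summarize_recipes(recipes: List[Dict[str, str]]) -> Dict[str, int]:
    """Count recipes in total, by code type, and visual (everything else)."""
    by_type = Counter(r["type"].lower() for r in recipes)
    code_total = sum(n for t, n in by_type.items() if t in CODE_TYPES)
    return {
        "total": len(recipes),
        "code": code_total,
        "sql": by_type.get("sql", 0),
        "python": by_type.get("python", 0),
        "visual": len(recipes) - code_total,
    }


def format_table(recipes: List[Dict[str, str]]) -> str:
    """Render the recipe details as an aligned plain-text table."""
    header = ("Recipe Name", "Recipe Type", "Environment Name")
    rows = [header] + [(r["name"], r["type"], r.get("env", "")) for r in recipes]
    widths = [max(len(row[i]) for row in rows) for i in range(len(header))]
    lines = [
        " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()
        for row in rows
    ]
    return "\n".join(lines)


# Placeholder rows for illustration only.
sample = [
    {"name": "Visual Recipe 1", "type": "Sync", "env": "Visual recipe"},
    {"name": "Visual Recipe 2", "type": "Join", "env": "Visual recipe"},
    {"name": "SQL Recipe 1", "type": "SQL", "env": "SQL recipe"},
    {"name": "Python Recipe 1", "type": "Python", "env": "Env Name 1"},
]
summary = summarize_recipes(sample)
table = format_table(sample)
print(summary)
print(table)
```

The same approach would work for the dataset table by swapping in the dataset columns (dataset name, connection type, connection name, dataset type).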

Scheduling and Monitoring: - 

  1. Details about the scenarios.
    1. Scenario 1
      1. Short description of the scenario (what does it do?).
      2. Mention the steps.
      3. Mention the datasets that it builds.
      4. Mention the trigger details.
      5. Mention any custom code in the steps/triggers.
      6. Mention any further changes needed in the scenario.
    2. Scenario 2
      1. Mention the same details as for Scenario 1 for any other scenario.

Error Handling: -

  1. Potential Problems/Concerns
    1. Problem 1
      1. Any recurring and potential issues for the project.
      2. Point of contact in case of any issues.
      3. Related documentation.
    2. Problem 2
      1. Mention the same details as for Problem 1.


  • Links to any additional documents you have (Dataiku projects)
    • Dataset-related documents
    • PPTs
    • Docs
  • Points of contact for any issue in the Dataiku project, from the DT team and the business team
  • Number of impacted users.