As Data architect when you start on the project DA needs to participate in below activities : 

  1. Functional Analysis Review

  2. "Discussion with Dataviz if not covered under Functional Analysis"

  3. Source System Analysis

    1. Description : In this activity DA needs to understand the source that has been requested by business, document the source tables etc
    2.  Tools : If the source is BW then use Xtract to asses the SAP BW, if source is anyother system for ex SAP, salesforce, request access of it for analysis of tables.
    3. Deliverable : Documenting the sources in mapping document 
  4. Create Xtract configuration for SAPBW Description :

    1. Description : If the source for the project is the SAP BW then use use Xtract to configure and retrive data.
    2. Step by Step guide to connect to Xtract and get the URL for Talend
    3. Once you get the URL , insert it the mapping document
  5. Tables / Views conventions

    1. Tables conventions:
      1. STG
        1. Naming:
          1. Tables
            • The name must begin with the prefix STG_
            • Staging can only contain lowercase letters for the file name description, numeric characters, underscores (_). 
            • Spaces are not allowed. 
            • The name cannot be a reserved word in Google BigQuery such as WHERE or VIEW.
            • The name cannot be the same as another Google BigQuery object that has the same type.
            • When you create a table in BigQuery, the table name must be unique per dataset.
            • Only use approved acronyms which are known in the organization.
            • The table name cannot exceed the 80 characters.
            • Include the system name
            • Include the domain name
            • Include the Site name
            • Include the System Reference
            • Include the File Code
            • Include a sequential number
            • Include an extraction type 
            • Include the frequency


            Name convention: [001]_[002]_[003]_[004]_[005]_[006]_[007]_[008]

            Where:

            #

            Category

            Description

            001

            Area

            Start with the prefix STG to identify 

            3 characters for this category

            002

            System Name

            System the data come from. 

            Ex: HLX ; ELN ; MES

            3 Characters for this category

            003

            Site Name

            Site that data come from

            Ex: 0000

            4 Characters for this category

            004

            System Reference

            System the data come from. 

            Ex: HLX ; ELN ; MES
            Ex:0000

            4 Characters for this category

            005

            File Code

            For each file code we will have a dedicated staging table 

            Ex: F001; F002 

            4 Characters for this category

            006

            Extraction Type

            If it’s a full or incremental extraction (F= Full; I = Incremental)

            1 character for this category

            007

            Frequency

            If it’s Monthly, weekly, daily, quarterly etc…

            Ex: M = monthly, W = weekly

            1 character for this category

            008

            File name

            Identify the content of the table.

            Ex:

            Cannot exceed 63 characters


            Example : STG_HLX_0000_0000_F001_F_W_stellar_escalation_follow_up

          2. Columns:

            • Can only contain lowercase letters.
            • Spaces are not allowed.
            • names must start with a letter and finish with a letter or number.
            • The name cannot contain special characters. (only “_” is allowed)
            • The name cannot be a reserved word  such as WHERE or VIEW.
            • A primary key column should usually have only 1 column serving as a primary key. It would be best to simply name this column “id”. 
            • For dates, it’s good to describe what the date represents. Names like start_date and end_date are pretty descriptive. If you want, you can describe them even more precisely, using names like call_start_date and call_end_date. 


          3. Views

            • The name must begin with the prefix "vw_" 
            • Views follow many of the same rules that apply to naming tables convention.


        2.  Keys
        3. Mandatory Columns
          1. #

            Field Name

            Description

            Type

            Example

            001

            meta_run_id

            Id of the run, this information come from STG_runs

            Integer

            E.g. 12408

            002

            meta_md5_hsh

            file check sum

            String

            E.g. 69095ac6258ec1c9fd151cec979ace71

            003

            meta_file_name

            Bucket File name

            String

            E.g. 1291619-3d223530ac3c11ecae3500000ad02795-Emulsion Polymerization.json

            004

            meta_file_path

            Path of the bucket

            String

            E.g. gs://cs-ew1-prj-dashb-rational-dev-staging/STG_SEA.csv

            005

            meta_execution_id

            System Execution id

            Prefix of the tool used to populate the field + execution id that match tool log.

            String

            E.g. Talend_6Oyhb6

            006

            meta_bucket_load_date

            Date the bucket file was loaded (UTC)

            timestamp

            E.g. 2022-12-12 17:00:21 UTC

            007

            meta_business_date

            This date comes from the business (UTC), it’s when the data was refreshed from the business side.

            timestamp

            E.g. 2022-12-12 17:00:21 UTC

            008

            meta_stg_insert_date

            When the data was inserted in the staging

            timestamp

            E.g. 2022-12-12 18:00:42 UTC

            009

            meta_source_system

            Identification of the source where the data comes from.

            String

            E.g SAP; BW; ORA

      2. ODS
        1. Naming
          1. Tables
            • adopt convention names from the source, it will facilitates debugging or finding data     
            • The name must begin with the prefix ODS_
            • ODS can only contain lowercase letters for the file name description, numeric characters, underscores (_). 
            • Spaces are not allowed. 
            • The name cannot be a reserved word in Google BigQuery such as WHERE or VIEW.
            • The name cannot be the same as another Google BigQuery object that has the same type.
            • When you create a table in BigQuery, the table name must be unique per dataset.
            • Only use approved acronyms which are known in the organization.
            • The table name cannot exceed the 80 characters.
            • Include the domain name
            • Include the Site name
            • Include the File Code
            • Include a sequential number
            • Include an extraction type 
            • Include the frequency


              Name convention: [001]_[002]_[003]_[004]_[005]_[006]_[007]

              Where:

              #

              Category

              Description

              001

              Area

              Start with the prefix ODS to identify 

              3 characters for this category

              002

              System Name

              System the data come from. 

              Ex: HLX ; ELN ; MES

              3 Characters for this category

              003

              Site Name

              Site that data come from

              Ex: 0000

              4 Characters for this category

              004

              File Code

              For each file code we will have a dedicated staging table 

              Ex: F001; F002 

              4 Characters for this category

              005

              Extraction Type

              If it’s a full or incremental extraction (F= Full; I = Incremental)

              1 character for this category

              006

              Frequency

              If it’s Monthly, weekly, daily, quarterly etc…

              Ex: M = monthly, W = weekly

              1 character for this category

              007

              File name

              Identify the content of the table.

              Ex:

              Cannot exceed 63 characters


              Example of the staging naming convention:

              ODS_0000_F001_F_W_stellar_escalation_follow_up

          2. Columns
            • adopt convention names from the source, it will facilitates debugging or finding data     
            • Can only contain lowercase letters.
            • Spaces are not allowed.
            • names must start with a letter and finish with a letter or number.
            • The name cannot contain special characters. (only “_” is allowed)
            • The name cannot be a reserved word  such as WHERE or VIEW.
            • A primary key column should usually have only 1 column serving as a primary key. It would be best to simply name this column “id”. 
            • For dates, it’s good to describe what the date represents. Names like start_date and end_date are pretty descriptive. If you want, you can describe them even more precisely, using names like call_start_date and call_end_date.
        2. Keys
        3. Mandatory Columns
  1.