Control Table

Author

Version

Comments

Date

João Fonseca

0.1

Initial Version

 

Fernando Girante

1.1

Revision

 





Introduction

Purpose of the document

This document describes the naming for all the objects to be used in a project or in domain for GCP and Talend technologies.This will help to normalize all the names and facilitate the way of work.

Intended Audience

This document is intended for the Data Architectures and the Data Engineering team.
It will be used as reference for any project or domain for the developments of the models.

Bucket Naming Convention

Description

A Bucket is anything you want to store in the GCS in order to perform any operations on them.
Naming your buckets is a great way to make it easy for you to find your data in the Cloud Storage service.

Buckets

Name conventions for buckets are:


To generate the name of a bucket use this work document for all the conventioning names, filling the 11 first fields in the google sheet:
{+}https://docs.google.com/spreadsheets/d/1pGYnumYbJCOuu3MvVHMARUsLpiB_miXVMx1geReQYkE/edit#gid=1870517346+

#

Category

Description

001

System Name

System the data come from.
Ex: HLX; ELN; MES
3 Characters for this category

002

Domain

Domain that data comes from.
Ex: HR; RI; etc…
If we don't know the Domain, use the generic one: IT
2 Characters for this category

003

Site Name

Site is the same system existing in a different location.
This information comes from the source.
Ex: 0000
4 Characters for this category

004

System Reference

Reference of the system.
Ex:0000
4 Characters for this category

005

File Code

For each system and project we can receive more than 1 file and we need to identify each of them. This will help to understand how many files that project receives for a specific system, if one of them is missing the process will abort.
Ex: F001; F002
4 Characters for this category

006

Business Date

When data was extracted from the business side.
Format: YYYYMMDDHHMISS
Ex:20220812000000
14 characters for this category

007

Sequential Number

If we have more than one file with exactly the same name.
This can happen when we extract big volumes of data and we need to split the file in 2 or 3
Ex: 0000; 0001
4 characters for this category

008

Extraction Type

If it's a full or incremental extraction (F= Full; I = Incremental)
1 character for this category

009

Frequency

If it's Monthly, weekly, daily, quarterly etc…
Ex: M = monthly, W = weekly
1 character for this category

010

File name

Identify the content of the table. Need to be the same name used by the table source.
Ex:
Cannot exceed 60 characters


Example of the bucket naming convention:
HLX_IT_0000_0000_F001_20220812000000_0000_F_W_Stellar_escalation_follow_up

Fields

For the buckets the naming of the fields usually comes from the google sheet source, to guarantee the naming convention, this needs to be discussed with the source to send the files with the fields names exactly as we need, in case it is not possible we need to normalize the fields in GCP side.
Name conventions for fields are:

Staging Naming Convention

Description

A staging area, or landing zone, is an intermediate storage area used for data processing during the extract, transform and load (ETL) process. The data staging area sits between the data source(s) and the data target(s), which are often data warehouses, data marts, or other data repositories.
We will use this staging for merging bucket files (in case we have more than one file for the same table).
Datasets name always needs to be in CAPITAL letter ,

Dataset

The name of the Dataset is STG

Tables

Name conventions for tables are:


To generate the name of the staging table use this work document for all the conventioning names, filling the 11 first fields in the google sheet:
{+}https://docs.google.com/spreadsheets/d/1pGYnumYbJCOuu3MvVHMARUsLpiB_miXVMx1geReQYkE/edit#gid=1870517346+

#

Category

Description

001

Area

Start with the prefix STG to identify
3 characters for this category

002

System Name

System the data come from.
Ex: HLX; ELN; MES
3 Characters for this category

003

Site Name

Site that data come from
Ex: 0000
4 Characters for this category

004

System Reference

System the data come from.
Ex: HLX; ELN; MESEx:0000
4 Characters for this category

005

File Code

For each file code we will have a dedicated staging table
Ex: F001; F002
4 Characters for this category

006

Extraction Type

If it's a full or incremental extraction (F= Full; I = Incremental)
1 character for this category

007

Frequency

If it's Monthly, weekly, daily, quarterly etc…
Ex: M = monthly, W = weekly
1 character for this category

008

File name

Identify the content of the table.
Ex:
Cannot exceed 63 characters


Example of the staging naming convention:
STG_HLX_0000_0000_F001_F_W_stellar_escalation_follow_up

Fields

Name conventions for fields are:

Views

At this level the Views are only used for security or to reduce the number of field purposes, we will not deliver data from the staging area to any report or extraction layer.
Name conventions for views are:

Routines

A routine usually runs a set of actions and returns a dataset.
Name conventions for routine are:


Note: All the ETL will be performed by the Talend tool.

WDL Naming Convention

Description

Working data layer, it's an area for temporary tables or for tables we need to work on to deliver a final.

Dataset

The name of the Dataset is WDL

Tables

Name conventions for tables are:

Fields

Name conventions for the fields are:

Views


Routines


ODS Naming Convention

Description

An operational data store (ODS) is used for operational reporting and as a source of data for the enterprise data warehouse (EDW). It is a complementary element to an EDW in a decision support environment, and is used for operational reporting, controls, and decision making, as opposed to the EDW, which is used for tactical and strategic decision support.

Dataset

The name of the Dataset is ODS

Tables

Name conventions for tables are:


To generate the name of the staging table use this work document for all the conventioning names, filling the 11 first fields in the google sheet:
{+}https://docs.google.com/spreadsheets/d/1pGYnumYbJCOuu3MvVHMARUsLpiB_miXVMx1geReQYkE/edit#gid=1870517346+

#

Category

Description

001

Area

Start with the prefix STG to identify
3 characters for this category

002

System Name

System the data come from.
Ex: HLX; ELN; MES
3 Characters for this category

003

Site Name

Site that data come from
Ex: 0000
4 Characters for this category

004

File Code

For each file code we will have a dedicated staging table
Ex: F001; F002
4 Characters for this category

005

Extraction Type

If it's a full or incremental extraction (F= Full; I = Incremental)
1 character for this category

006

Frequency

If it's Monthly, weekly, daily, quarterly etc…
Ex: M = monthly, W = weekly
1 character for this category

007

File name

Identify the content of the table.
Ex:
Cannot exceed 63 characters


Example of the staging naming convention:
ODS_0000_F001_F_W_stellar_escalation_follow_up
Note: ODS dont need systemref because if have several system on the same site and domain in staging, they will merge on the same table in the ODS and the table on the ODS will have a column with the systemref to be identified

Fields

Name conventions for the fields are:

Views

The views at this level can be used for:


Name conventions for views are:

Routines

A routine usually runs a set of actions and returns a dataset.
Name conventions for routine are:


Note: All the ETL will be performed by the Talend tool.

Models Naming Conventions

Description

This area will be used to describe naming conventions models for the Data warehouse and Data Marts.
A data warehouse is a large collection of business data used to help an organization make decisions.
A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject area.

Dataset

The name of the Dataset is DM

Tables

Name conventions for tables are:

Fields




Product Naming Conventions

Description

This area will be used to describe naming conventions for the product naming conventions.

Dataset

The name of the Dataset is PRD_namingoftheproduct

ex: PRD_PricingDataLake



Views

V_AUT
V_ETL
V_REP ⇒ for reporting purpose

Routines

A routine usually runs a set of actions and returns a dataset.
Name conventions for routine are:


Note: All the ETL will be performed by the Talend tool.