Author

Version

Comments

Date

João Fonseca

0.1

Initial Version

26/12/2022

Fernando Girante

1.1

Revision

22/05/2023














Summary
1- Introduction
1.1- Purpose of the document
1.2- Intended Audience
2- Bucket Naming Convention
2.1- Description
2.2- Buckets
2.2 - Fields
3- Staging Naming Convention
3.1- Description
3.2- Tables
3.3- Fields
3.4- Views
3.5- Routines
4- WDL Naming Convention
4.1- Description
4.2- Tables
4.3- Fields
4.3- Views
4.3- Routines
5- ODS Naming Convention
5.1- Description
5.2- Tables
5.3- Fields
5.4- Views
5.5- Routines
6- Models Naming Conventions
6.1- Description
6.2- Tables
6.3- Fields
6.4- Views
6.5- Routines


1- Introduction
1.1- Purpose of the document
This document describes the naming for all the objects to be used in a project or in domain for GCP and Talend technologies.This will help to normalize all the names and facilitate the way of work.
1.2- Intended Audience
This document is intended for the Data Architectures and the Data Engineering team.
It will be used as reference for any project or domain for the developments of the models.
2- Bucket Naming Convention
2.1- Description
A Bucket is anything you want to store in the GCS in order to perform any operations on them.
Naming your buckets is a great way to make it easy for you to find your data in the Cloud Storage service.
2.2- Buckets
Name conventions for buckets are:


To generate the name of a bucket use this work document for all the conventioning names, filling the 11 first fields in the google sheet:
{+}https://docs.google.com/spreadsheets/d/1pGYnumYbJCOuu3MvVHMARUsLpiB_miXVMx1geReQYkE/edit#gid=1870517346+

#

Category

Description

001

System Name

System the data come from.
Ex: HLX; ELN; MES
3 Characters for this category

002

Domain

Domain that data comes from.
Ex: HR; RI; etc…
If we don't know the Domain, use the generic one: IT
2 Characters for this category

003

Site Name

Site is the same system existing in a different location.
This information comes from the source.
Ex: 0000
4 Characters for this category

004

System Reference

Reference of the system.
Ex:0000
4 Characters for this category

005

File Code

For each system and project we can receive more than 1 file and we need to identify each of them. This will help to understand how many files that project receives for a specific system, if one of them is missing the process will abort.
Ex: F001; F002
4 Characters for this category

006

Business Date

When data was extracted from the business side.
Format: YYYYMMDDHHMISS
Ex:20220812000000
14 characters for this category

007

Sequential Number

If we have more than one file with exactly the same name.
This can happen when we extract big volumes of data and we need to split the file in 2 or 3
Ex: 0000; 0001
4 characters for this category

008

Extraction Type

If it's a full or incremental extraction (F= Full; I = Incremental)
1 character for this category

009

Frequency

If it's Monthly, weekly, daily, quarterly etc…
Ex: M = monthly, W = weekly
1 character for this category

010

File name

Identify the content of the table. Need to be the same name used by the table source.
Ex:
Cannot exceed 60 characters





Example of the bucket naming convention:
HLX_IT_0000_0000_F001_20220812000000_0000_F_W_Stellar_escalation_follow_up
2.2 - Fields
For the buckets the naming of the fields usually comes from the google sheet source, to guarantee the naming convention, this needs to be discussed with the source to send the files with the fields names exactly as we need, in case it is not possible we need to normalize the fields in GCP side.
Name conventions for fields are:


3- Staging Naming Convention
3.1- Description
A staging area, or landing zone, is an intermediate storage area used for data processing during the extract, transform and load (ETL) process. The data staging area sits between the data source(s) and the data target(s), which are often data warehouses, data marts, or other data repositories.
We will use this staging for merging bucket files (in case we have more than one file for the same table).
Datasets name always needs to be in CAPITAL letter ,
3.2- Tables
Name conventions for tables are:


To generate the name of the staging table use this work document for all the conventioning names, filling the 11 first fields in the google sheet:
{+}https://docs.google.com/spreadsheets/d/1pGYnumYbJCOuu3MvVHMARUsLpiB_miXVMx1geReQYkE/edit#gid=1870517346+

#

Category

Description

001

Area

Start with the prefix STG to identify
3 characters for this category

002

System Name

System the data come from.
Ex: HLX; ELN; MES
3 Characters for this category

003

Site Name

Site that data come from
Ex: 0000
4 Characters for this category

004

System Reference

System the data come from.
Ex: HLX; ELN; MESEx:0000
4 Characters for this category

005

File Code

For each file code we will have a dedicated staging table
Ex: F001; F002
4 Characters for this category

006

Extraction Type

If it's a full or incremental extraction (F= Full; I = Incremental)
1 character for this category

007

Frequency

If it's Monthly, weekly, daily, quarterly etc…
Ex: M = monthly, W = weekly
1 character for this category

008

File name

Identify the content of the table.
Ex:
Cannot exceed 63 characters





Example of the staging naming convention:
STG_HLX_0000_0000_F001_F_W_stellar_escalation_follow_up

3.3- Fields
Name conventions for fields are:


3.4- Views
At this level the Views are only used for security or to reduce the number of field purposes, we will not deliver data from the staging area to any report or extraction layer.
Name conventions for views are:


3.5- Routines
A routine usually runs a set of actions and returns a dataset.
Name conventions for routine are:


Note: All the ETL will be performed by the Talend tool.
4- WDL Naming Convention
4.1- Description
Working data layer, it's an area for temporary tables or for tables we need to work on to deliver a final
4.2- Tables
Name conventions for tables are:


4.3- Fields
Name conventions for the fields are:


4.3- Views
4.3- Routines
5- ODS Naming Convention
5.1- Description
An operational data store (ODS) is used for operational reporting and as a source of data for the enterprise data warehouse (EDW). It is a complementary element to an EDW in a decision support environment, and is used for operational reporting, controls, and decision making, as opposed to the EDW, which is used for tactical and strategic decision support.

5.2- Tables
Name conventions for tables are:



To generate the name of the staging table use this work document for all the conventioning names, filling the 11 first fields in the google sheet:
{+}https://docs.google.com/spreadsheets/d/1pGYnumYbJCOuu3MvVHMARUsLpiB_miXVMx1geReQYkE/edit#gid=1870517346+

#

Category

Description

001

Area

Start with the prefix STG to identify
3 characters for this category

002

System Name

System the data come from.
Ex: HLX; ELN; MES
3 Characters for this category

003

Site Name

Site that data come from
Ex: 0000
4 Characters for this category

004

File Code

For each file code we will have a dedicated staging table
Ex: F001; F002
4 Characters for this category

005

Extraction Type

If it's a full or incremental extraction (F= Full; I = Incremental)
1 character for this category

006

Frequency

If it's Monthly, weekly, daily, quarterly etc…
Ex: M = monthly, W = weekly
1 character for this category

007

File name

Identify the content of the table.
Ex:
Cannot exceed 63 characters


Example of the staging naming convention:
ODS_0000_F001_F_W_stellar_escalation_follow_up
Note: ODS dont need systemref because if have several system on the same site and domain in staging, they will merge on the same table in the ODS and the table on the ODS will have a column with the systemref to be identified

5.3- Fields
Name conventions for the fields are:


5.4- Views
The views at this level can be used for:


Name conventions for views are:



5.5- Routines
A routine usually runs a set of actions and returns a dataset.
Name conventions for routine are:


Note: All the ETL will be performed by the Talend tool.
6- Models Naming Conventions
6.1- Description
This area will be used to describe naming conventions models for the Data warehouse and Data Marts.
A data warehouse is a large collection of business data used to help an organization make decisions.
A data mart is a subset of a data warehouse focused on a particular line of business, department, or subject area.
6.2- Tables
Name conventions for tables are:



6.3- Fields
6.4- Views
V_AUT
V_ETL
V_REP ⇒ for reporting purpose
6.5- Routines
A routine usually runs a set of actions and returns a dataset.
Name conventions for routine are:


Note: All the ETL will be performed by the Talend tool.