| Jira | ||||||
|---|---|---|---|---|---|---|
|
| Info |
|---|
| Info |
|
...
- application contains raw ingested data.
- platform contains cleaned and standardized data.
- entreprise contains business-ready datasets optimized for analytics.
...
2.2 One Object = One Concept
...
The general naming structure of the platform is:<layer>. <grouping>.<object>
Examples:
- application .: sap.vbak
- platform .: sap.sales_order_header
- entreprise .: sales.fact_sales_order
...
4. Table Naming Standards
...
application tables store raw source data exactly as ingested.
Naming pattern:application. <source_system>.<source_object>
Examples:
- application.sap.vbak
- application.sap.vbap
- application.salesforce.account
- application.workday.worker
Rules:
- table names follow the source system object names
- minimal transformationappend-only ingestion preferred
4.2 Platform - Tables
Platform tables contain cleaned and standardized datasets aligned to source systems.
Naming pattern:platform. <source_system>.<object_name>
Examples:
- platform.sap.sales_order_header
- platform.sap.sales_order_item
- platform.salesforce.account
Rules:
- normalized column naming
- deduplicated records
- standardized data types
- still aligned with source system entities
...
Entreprise tables contain enterprise analytics datasets organized by business domain.
Naming pattern:entreprise. <domain>.<object>
Examples:
- entreprise.sales.fact_sales_order
- entreprise.sales.dim_customer
- entreprise.finance.fact_invoice
...
5. Entreprise Table - Types
...
- _country_cd → Country code based on a standard reference (e.g., ISO).
Example: country_cd = "FR" - _region_cd → Region or state code.
Example: region_cd = "IDF" - _city → City name.
Example: city = "Paris" - _postal_cd → Postal or ZIP code.
Example: postal_cd = "75001"
...
7. Column
...
Type Standards
| Suffix |
|---|
...
| Expected Type |
|---|
| _id |
...
| string or integer | |
| _ |
...
| key |
...
| integer |
...
| _ |
...
| ref | string |
...
| _ |
...
| cd | string |
...
| _ |
...
| type |
...
| string |
| _ |
...
| category |
...
| string |
...
| _ |
...
| segment | string |
| _level_cd |
...
| string | |
| _name | string |
| _desc |
...
| string |
...
| _ |
...
| label | string |
...
| _ |
...
| dt |
...
| date |
| _ |
...
| ts | timestamp |
| _ |
...
| time |
...
| time |
...
| _ |
...
| amt | decimal |
...
| _ |
...
| cost_ |
...
| amt |
...
| decimal |
| _price_amt |
...
| decimal | |
| _ |
...
| rev_amt | decimal |
| _ |
...
| tax_amt | decimal |
| _disc_amt | decimal |
| _qty | decimal or integer |
| _cnt | integer |
| _vol | decimal |
| _wt | decimal |
| _len | decimal |
| _pct | decimal |
| _rate | decimal |
| _ratio | decimal |
| _idx | decimal or integer |
| is_ | boolean |
| has_ | boolean |
| _status_cd | string |
| _status_desc | string |
| _hash | string |
| _version | string |
| _seq | integer |
| _event | string |
| _country |
Examples:
- sales_amt
- order_qty
- item_cnt
- margin_pct
Boolean Fields
Preferred format: is_<condition>
Examples:
- is_active
- is_deleted
- is_returned
7. Column Type Standards
Suffix | Expected Type |
_id | string or integer |
_key | integer |
| _cd | string |
| _nameregion_cd | string |
| _desccity | string |
| _postal_amtcd | decimalstring |
| created_qtyts | Decimal or integertimestamp |
| updated_cntts | integertimestamp |
| load_pctts | decimaltimestamp |
| pipeline_run_dtid | datestring |
| record_tssource | timestampstring |
| isrow_hash | booleanstring |
...
8. Mandatory Metadata Columns
...
Column Name | Type | Description | Example |
|---|---|---|---|
_ingestion_ts | timestamp | Timestamp when the record was ingested into the platform. Used to track ingestion time and support replay or debugging of ingestion pipelines. | 2025-03-12 14:25:32 |
_batch_id | string | Identifier of the ingestion batch or pipeline execution that loaded the record. Useful for pipeline monitoring and troubleshooting. | batch_20250312_01 |
_source_system | string | Name of the source system from which the record originated. Helps identify data lineage and upstream systems. | sap |
_source_file (or source object) | string | Name or path of the file used to ingest the record (when file-based ingestion is used). Enables traceability back to the original data file. | Sap_sales_20250312.csv |
_record_hash | string | Hash value representing the content of the record. Used to detect changes between loads or identify duplicates efficiently. | a9f4b12c89d2 |
...
Platform Layer
Column Name | Type | Description | Example |
|---|---|---|---|
created_ts | timestamp | Timestamp when the record was first created in the platform layer. Helps identify when the data became available in the curated layer. | 2025-03-12 14:25:32 |
updated_ts | timestamp | Timestamp of the latest update applied to the record in the platform layer. Useful for incremental processing and debugging transformations. | 2025-03-12 16:02:01 |
row_hash | string | Hash calculated from the business columns of the row. Used to detect if the data has changed between loads. | 8fa21a3b5d |
record_source | string | Source system or pipeline that generated the record in the platform layer. Used for lineage and audit. | sap_sales_pipeline |
is_deleted | boolean | Logical deletion flag used when the source record is removed or marked as inactive. Enables soft deletion instead of physical removal. | false |
...
Entreprise Layer
Column Name | Type | Description | Example |
|---|---|---|---|
created_ts | timestamp | Timestamp when the record was first created in the entreprise dataset. Useful for lineage and auditing. | 2025-03-12 17:30:10 |
updated_ts | timestamp | Timestamp of the latest modification applied to the record. Helps identify refresh cycles of the dataset. | 2025-03-13 02:00:00 |
Others Technical standards
Column Name | Type | Description | Example |
|---|---|---|---|
load_ts | timestamp | Timestamp when the record was loaded into the table. | 2025-03-13 02:00:00 |
pipeline_run_id | string | Identifier of the pipeline or job execution that produced the object. | pipeline_run_84721 |
...
9. Error and Data Quality Tables
...
- lower layers are source-system oriented
- upper layers are business-domain oriented
...
map_<source_object>_<business_object>
Example:
- map_sap_customer_enterprise_customer
Purpose
Mapping tables are used to:
- align source system values with enterprise standards
- resolve inconsistencies between systems
- standardize business definitions
- enable cross-system integration
Naming Pattern
map_<source_object>_<business_object>
Example:
- map_sap_customer_enterprise_customer
When to Use Mapping Tables
Use mapping tables when:
- multiple systems use different identifiers or codes
- business definitions differ across systems
- values need to be harmonized for analytics
...
- source_system → Origin of the data
- Example: source_system = "sap"
- source_value → Value from the source system
- Example: source_value = "ELEC"
- target_value → Standardized enterprise value
- Example: target_value = "ELECTRONICS"
- mapping_type → Type of mapping applied
- Example: mapping_type = "CATEGORY_MAPPING"
- valid_from_dt → Start date of the mapping validity
- Example: valid_from_dt = "2024-01-01"
- valid_to_dt → End date of the mapping validity
- Example: valid_to_dt = "9999-12-31"