| Jira | ||||||
|---|---|---|---|---|---|---|
|
| Info |
|---|
| Info |
|
...
- application contains raw ingested data.
- platform contains cleaned and standardized data.
- entreprise contains business-ready datasets optimized for analytics.
2.2 One Object = One Concept
...
- _country_cd → Country code based on a standard reference (e.g., ISO).
Example: country_cd = "FR" - _region_cd → Region or state code.
Example: region_cd = "IDF" - _city → City name.
Example: city = "Paris" - _postal_cd → Postal or ZIP code.
Example: postal_cd = "75001"
...
7. Column Type Standards
| Suffix | Expected Type |
|---|---|
| _id | string or integer |
| _key | integer |
| _ |
| ref | string |
| _ |
| cd | string |
| _ |
| type | string |
| _ |
| category |
| string |
| _ |
| segment |
| string |
| _level_ |
| cd |
| string |
| _ |
| name |
decimal
_dt
date
_ts
timestamp
is_
| string | |
| _desc | string |
| _label | string |
| _dt | date |
| _ts | timestamp |
| _time | time |
| _amt | decimal |
| _cost_amt | decimal |
| _price_amt | decimal |
| _rev_amt | decimal |
| _tax_amt | decimal |
| _disc_amt | decimal |
| _qty | decimal or integer |
| _cnt | integer |
| _vol | decimal |
| _wt | decimal |
| _len | decimal |
| _pct | decimal |
| _rate | decimal |
| _ratio | decimal |
| _idx | decimal or integer |
| is_ | boolean |
| has_ | boolean |
| _status_cd | string |
| _status_desc | string |
| _hash | string |
| _version | string |
| _seq | integer |
| _event | string |
| _country_cd | string |
| _region_cd | string |
| _city | string |
| _postal_cd | string |
| created_ts | timestamp |
| updated_ts | timestamp |
| load_ts | timestamp |
| pipeline_run_id | string |
| record_source | string |
| row_hash | string |
...
8. Mandatory Metadata Columns
...
Column Name | Type | Description | Example |
|---|---|---|---|
_ingestion_ts | timestamp | Timestamp when the record was ingested into the platform. Used to track ingestion time and support replay or debugging of ingestion pipelines. | 2025-03-12 14:25:32 |
_batch_id | string | Identifier of the ingestion batch or pipeline execution that loaded the record. Useful for pipeline monitoring and troubleshooting. | batch_20250312_01 |
_source_system | string | Name of the source system from which the record originated. Helps identify data lineage and upstream systems. | sap |
_source_file (or source object) | string | Name or path of the file used to ingest the record (when file-based ingestion is used). Enables traceability back to the original data file. | Sap_sales_20250312.csv |
_record_hash | string | Hash value representing the content of the record. Used to detect changes between loads or identify duplicates efficiently. | a9f4b12c89d2 |
...
Platform Layer
Column Name | Type | Description | Example |
|---|---|---|---|
created_ts | timestamp | Timestamp when the record was first created in the platform layer. Helps identify when the data became available in the curated layer. | 2025-03-12 14:25:32 |
updated_ts | timestamp | Timestamp of the latest update applied to the record in the platform layer. Useful for incremental processing and debugging transformations. | 2025-03-12 16:02:01 |
row_hash | string | Hash calculated from the business columns of the row. Used to detect if the data has changed between loads. | 8fa21a3b5d |
record_source | string | Source system or pipeline that generated the record in the platform layer. Used for lineage and audit. | sap_sales_pipeline |
is_deleted | boolean | Logical deletion flag used when the source record is removed or marked as inactive. Enables soft deletion instead of physical removal. | false |
...
Entreprise Layer
Column Name | Type | Description | Example |
|---|---|---|---|
created_ts | timestamp | Timestamp when the record was first created in the entreprise dataset. Useful for lineage and auditing. | 2025-03-12 17:30:10 |
updated_ts | timestamp | Timestamp of the latest modification applied to the record. Helps identify refresh cycles of the dataset. | 2025-03-13 02:00:00 |
...
Others Technical standards
Column Name | Type | Description | Example |
|---|---|---|---|
load_ts | timestamp | Timestamp when the record was loaded into the table. | 2025-03-13 02:00:00 |
pipeline_run_id | string | Identifier of the pipeline or job execution that produced the object. | pipeline_run_84721 |
...
9. Error and Data Quality Tables
Error and data quality tables are used to manage invalid, incomplete, or inconsistent data during ingestion and transformation processes.
...
- lower layers are source-system oriented
- upper layers are business-domain oriented
Naming Pattern
map_<source_object>_<business_object>
Example:
- map_sap_customer_enterprise_customer
...
Mapping tables are used to:
- align source system values with enterprise standards
- resolve inconsistencies between systems
- standardize business definitions
- enable cross-system integration
Naming Pattern
map_<source_object>_<business_object>
Example:
- map_sap_customer_enterprise_customer
When to Use Mapping Tables
Use mapping tables when:
- multiple systems use different identifiers or codes
- business definitions differ across systems
- values need to be harmonized for analytics
...
- source_system → Origin of the data
- Example: source_system = "sap"
- source_value → Value from the source system
- Example: source_value = "ELEC"
- target_value → Standardized enterprise value
- Example: target_value = "ELECTRONICS"
- mapping_type → Type of mapping applied
- Example: mapping_type = "CATEGORY_MAPPING"
- valid_from_dt → Start date of the mapping validity
- Example: valid_from_dt = "2024-01-01"
- valid_to_dt → End date of the mapping validity
- Example: valid_to_dt = "9999-12-31"