Task Name | Description | Env | Responsibility
Obtain [SOURCE_NAME] Access & Technical Documentation

Objective

Get all the access, credentials, and documentation needed so we can safely and reliably ingest data from the [SOURCE_NAME] system.

---------------------------------------------------------------------------
Description

This task covers everything required to connect to the [SOURCE_NAME] system from our environment. It includes setting up user/service access, collecting connection and authentication details, understanding the data structure, and documenting how often data will be updated.

The outcome is that we can successfully test a connection from Dev and have all information stored securely and centrally.

---------------------------------------------------------------------------

Scope

  • Access to source system
    • Request and obtain a user or service account for the [SOURCE_NAME] system.
    • Ensure the account has the correct permissions for data access (read-only unless otherwise agreed).
  • Connection details (as applicable)
    • Collect API details:
      • Base URL
      • Endpoints
      • Required headers
      • Query parameters

    • Collect database details:
      • Server/host
      • Port
      • Database name
      • Schema
      • Any required network info
    • Collect SFTP details:
      • Host
      • Port
      • Folder/path
      • File naming conventions
  • Authentication details
    • Obtain the required auth method and details, e.g.:
      • OAuth (client ID/secret, token URL, scopes, etc.)
      • API key(s)
      • Username and password
      • Certificates/keys
    • Clarify any token expiry, rotation, or renewal process.
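Token expiry and renewal can be handled with a small cache that refreshes the token shortly before it expires. A minimal sketch; the `fetch_token` callable is a hypothetical wrapper around the source's actual token call (e.g. an OAuth client-credentials request), not a confirmed API:

```python
import time

class TokenCache:
    """Cache an access token and refresh it shortly before expiry.

    `fetch_token` is a hypothetical callable returning
    (token, expires_in_seconds) from the source's token URL.
    """

    def __init__(self, fetch_token, refresh_margin: float = 60.0):
        self._fetch = fetch_token
        self._margin = refresh_margin
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Refresh when no token is cached or we are within the safety margin.
        if self._token is None or time.time() >= self._expires_at - self._margin:
            self._token, ttl = self._fetch()
            self._expires_at = time.time() + ttl
        return self._token
```

The refresh margin avoids using a token that expires mid-request; the exact expiry/rotation rules still need to be confirmed with the source owner as described above.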

  • Network / IP whitelisting
    • Collect the list of IP addresses or ranges from which we will connect (Dev at minimum).
    • Share these with the source owner for whitelisting if required.
  • Sample data
    • Request sample data files or sample API responses that are representative of real data.
    • Use samples to validate structure, formats, and edge cases.
  • Data dictionary / schema
    • Obtain data dictionary or schema documentation describing:
      • Tables/endpoints/files
      • Fields, data types, allowed values
      • Key relationships and important business rules
  • Frequency and SLA
    • Confirm with the source owner:
      • Data refresh frequency (e.g. real-time, hourly, daily)
      • SLA for data availability and expected response times
---------------------------------------------------------------------------

Deliverables

  • Credentials
    • All access credentials (accounts, keys, secrets, certificates) are stored securely in Key Vault (or the agreed secure secrets store).
  • Documentation
    • All technical and data documentation (connection details, auth steps, schema, IPs, refresh frequency, SLA) is stored in the code repository.
  • Source owner confirmation
    • A confirmation email/message from the [SOURCE_NAME] owner stating that:
      • Access has been granted
      • Connection details and expectations (frequency/SLA) are correct
---------------------------------------------------------------------------

Definition of Done

  • A test connection from the Dev environment to [SOURCE_NAME] is successful using the stored credentials.
  • All received documentation is uploaded to the agreed central repository.
  • All credentials are stored securely in Key Vault.
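In practice the application code never reads Key Vault directly if app settings are used as Key Vault references (Azure resolves `@Microsoft.KeyVault(SecretUri=...)` settings into environment variables). A minimal sketch of the access pattern, assuming that setup; the setting name is illustrative:

```python
import os

def get_secret(name: str) -> str:
    """Read a secret exposed to the app as an environment variable.

    With Key Vault references in app settings, the code only sees env
    vars, so no secret ever lives in the repository.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required secret/app setting: {name}")
    return value
```

Failing fast on a missing setting makes the Dev connection test surface misconfiguration immediately.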


Dev
Test
Prod
Data Engineer
[ENV]: Network Whitelisting for [SOURCE_NAME]

Objective

Ensure network connectivity from the Dev environment to [SOURCE_NAME].

Scope


  • Obtain outbound IP of Azure Function / Container App
  • Validate IP with Tanish/Marc before raising request
  • Raise firewall/whitelisting request for:
    • VDI
    • Azure Function
    • Container App
    • GitHub runners (if required)
  • Confirm port-level access (443/22/etc.)
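Port-level access can be verified from each environment with a small TCP check; a minimal sketch (host and port come from the connection details collected earlier):

```python
import socket

def check_connectivity(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running this from the VDI, Azure Function, and Container App confirms the whitelisting request actually took effect on each path.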

Deliverables


  • Approved firewall change request
  • Connectivity test successful

Definition of Done


  • Connection test from Dev Azure Function successful
Dev
Test
Prod
DevOps
Collect [SOURCE_NAME] Schema and Table Metadata

Objective

Identify all source tables, fields, and any dependencies needed for [PROJECT], and document how they will be used for the purpose of [PROJECT PURPOSE].

Note: Check whether the source will become obsolete or be replaced in the near future; if so, plan for possible rework/effort.

Identify the complexity/priority for the data load.

---------------------------------------------------------------------------

Scope

  • Identify required source tables
    • List all source tables needed for [PROJECT].
    • Capture each table’s structure: schema name, table name.
  • Key and identifier details
    • Identify primary keys, composite keys, and any important business/CLI identifiers used for joins or lookups.
  • Data volume estimation
    • Estimate data volume for each required table (e.g. row counts, growth per day/month).
    • Note any high-volume tables that may impact performance or storage.
  • Sensitive data assessment
    • Identify columns that contain sensitive or PII (Personally Identifiable Information) data.
    • Flag these fields clearly for later masking, encryption, or access control.
---------------------------------------------------------------------------

Deliverables

  • Source Table Inventory Document
    • List of all required source tables, with basic details (schema, table name, purpose, dependencies).
  • Column-level Metadata
    • For each table: column name, data type, key/identifier flags, sensitivity/PII flags, and short description where available.
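The column-level metadata can be kept as structured records rather than free text, which makes the PII flags queryable for the later masking/encryption work. A minimal sketch; field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ColumnMeta:
    """One row of the column-level metadata deliverable (fields illustrative)."""
    table: str
    column: str
    data_type: str
    is_key: bool = False
    is_pii: bool = False
    description: str = ""

def pii_columns(inventory: list[ColumnMeta]) -> list[str]:
    """Fully-qualified columns flagged as PII, for masking/encryption follow-up."""
    return [f"{c.table}.{c.column}" for c in inventory if c.is_pii]
```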
---------------------------------------------------------------------------

Definition of Done

  • All required source tables for [PROJECT] are identified.
  • Table and column details (including keys and sensitive fields) are documented in the Source Table Inventory and column-level metadata.


Dev
DEV: Create Azure Function – [SOURCE_NAME]

Objective

Build an Azure Function in the Development environment to extract data from [SOURCE_NAME] for the [USE_CASE].

Scope

  • Create a Python Azure Function to connect to [SOURCE_NAME] and extract the required data.
  • Use Python 3.11 as the runtime environment for the function.
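The extraction logic can be kept independent of the Azure Functions trigger so it is testable locally. A minimal sketch for a page-based source API; `fetch_page` is a hypothetical callable wrapping the real [SOURCE_NAME] call (auth, base URL, query parameters), and the actual trigger/bindings are defined separately in the Function app:

```python
from typing import Callable, Iterator

def extract_pages(fetch_page: Callable[[int], list[dict]],
                  page_size: int = 100) -> Iterator[dict]:
    """Iterate over all records from a page-based source API.

    A page shorter than `page_size` signals the end of the data.
    """
    page = 0
    while True:
        rows = fetch_page(page)
        yield from rows
        if len(rows) < page_size:
            break
        page += 1
```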

Deliverables

  • The Azure Function is deployed and available in the Dev environment.

Definition of Done

  • The Azure Function runs successfully in Dev.
Dev
DevOps
Implement Data Ingestion from [Source] to Kafka

Objective

Implement an ingestion pipeline to publish data from [SOURCE_NAME] to Kafka (full load).

Scope


  • Implement producer logic in Azure Function
  • Handle error scenarios
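The error-handling requirement above can be sketched as a retry wrapper around the actual produce call. `send` stands in for the real producer call (e.g. a Kafka client's produce+flush), which is injected so the pattern is broker-agnostic; the backoff values are illustrative:

```python
import time

def publish_with_retry(send, record, max_attempts: int = 3,
                       base_delay: float = 0.5) -> bool:
    """Publish one record via `send`, retrying transient failures
    with exponential backoff; re-raise once attempts are exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(record)
            return True
        except Exception:
            if attempt == max_attempts:
                raise  # give up: surface the error for alerting
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Re-raising on the final attempt keeps failures visible to the monitoring/alerting set up later, instead of silently dropping records.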

Deliverables


  • Data published successfully to Kafka topic and Fabric Application zone
  • Sample messages validated
  • Throughput validated

Definition of Done


  • Messages visible in Kafka
  • No data loss during retry
  • Error handling tested
Dev
Data Engineer
Implement Delta/Incremental Logic for [Source]

Objective


Implement logic to ingest only new or updated records from source.

Scope


  • Load Incremental data
  • Validate late-arriving data handling
  • Backfill support
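The watermark pattern behind the scope items above can be sketched in a few lines; the strictly-greater comparison makes re-runs idempotent (no duplicate ingestion), while late-arriving rows with older timestamps would need a lookback window or a backfill run. Field names are illustrative:

```python
def incremental_batch(rows: list[dict], watermark: str) -> tuple[list[dict], str]:
    """Keep only rows newer than the stored watermark and compute the
    new watermark to persist after a successful run.

    Assumes each row carries a sortable `updated_at` string (e.g. ISO-8601).
    """
    new_rows = [r for r in rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark
```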

Deliverables


  • Metadata storage created
  • Successful incremental test run

Definition of Done


  • No duplicate ingestion
  • Only new records processed
  • Metadata updated after each successful run
Dev
Data Engineer
Setup GitHub Repo and Create CI/CD Pipeline

Objective

Set up an automated deployment pipeline for [SOURCE_NAME] so that code can be built, tested, and deployed to Dev automatically, and to Prod with manual approval.

Scope

  1. Create GitHub repository
     • Create a new GitHub repository for the [SOURCE_NAME] codebase.
  2. Define branching strategy
     • Use two main branches:
       • main – for production-ready code
       • dev – for development and testing
     • Document how and when code is merged between dev and main (e.g. via pull requests).
  3. Set up GitHub Actions pipeline
     • Create a GitHub Actions workflow that runs on relevant events (e.g. pull request, push to dev or main).
  4. Implement pipeline steps
     • The pipeline should include at least:
       a. Code linting – run static code analysis / style checks.
       b. Unit tests – run the automated unit test suite and fail the build if tests fail.
       c. Build validation – build/pack the application to ensure it compiles/builds successfully.
       d. Dev deployment – automatically deploy successful builds from dev (or a chosen branch) to the Dev environment.
       e. Prod deployment with approval – deploy to the Prod environment only after a manual approval step (e.g. environment protection rule or manual approval job).
  5. Configure environment variables per environment
     • Define and store configuration values separately for:
       • Dev environment
       • Prod environment
     • Ensure secrets are stored securely (e.g. GitHub Secrets) and are correctly used by the pipeline.
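The pipeline steps above can be sketched as a GitHub Actions workflow. This is a skeleton, not the final pipeline: the deploy steps are placeholders, and the tool choices (ruff, pytest) and environment names are assumptions to be replaced with whatever the team agrees on:

```yaml
name: ci-cd-[SOURCE_NAME]

on:
  pull_request:
  push:
    branches: [dev, main]

jobs:
  build-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt ruff pytest
      - run: ruff check .   # code linting
      - run: pytest         # unit tests

  deploy-dev:
    needs: build-test
    if: github.ref == 'refs/heads/dev'
    runs-on: ubuntu-latest
    environment: dev        # holds Dev variables/secrets
    steps:
      - run: echo "deploy to Dev here (e.g. Azure Functions deploy action)"

  deploy-prod:
    needs: build-test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: prod       # protection rule enforces the manual approval
    steps:
      - run: echo "deploy to Prod here after approval"
```

Using GitHub environment protection rules on `prod` gives the manual-approval gate without any custom approval logic in the workflow itself.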

Deliverables

  • A fully working CI/CD pipeline in GitHub Actions for [SOURCE_NAME].
  • Automatic deployment to Dev on successful pipeline runs (as defined in the branching strategy).
  • Manual approval-based deployment to Prod with a clear approval step and responsible approvers defined.

Definition of Done

  • A sample change merged to the dev branch is automatically built, tested, and successfully deployed to the Dev environment via the pipeline.
  • A sample change promoted to the main branch is successfully deployed to the Prod environment via the pipeline after passing the manual approval step.
  • Pipeline status is visible in GitHub Actions, and basic run instructions are documented in the repository (e.g. in README.md).




Dev to Prod
Data Engineer
DEV: Setup Monitoring & Data Validation for [Source]

Objective

Implement monitoring and validation checks for the ingestion pipeline.

Scope

  • Enable Application Insights
  • Create ingestion success/failure logs
  • Implement row count validation
  • Implement schema validation check
  • Track ingestion duration
  • Validate Kafka message count vs source count
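The row-count and schema checks above can be sketched as small pure functions that the pipeline calls after each run; the expected-schema shape is illustrative:

```python
def validate_row_counts(source_count: int, kafka_count: int,
                        tolerance: int = 0) -> bool:
    """Compare the source row count with the Kafka message count (± tolerance)."""
    return abs(source_count - kafka_count) <= tolerance

def validate_schema(record: dict, expected: dict[str, type]) -> list[str]:
    """Return schema problems (missing fields, wrong types) for one record."""
    problems = [f"missing: {name}" for name in expected if name not in record]
    problems += [
        f"wrong type: {name}"
        for name, typ in expected.items()
        if name in record and not isinstance(record[name], typ)
    ]
    return problems
```

A non-empty problem list (or a failed count check) is what should raise the validation alert in the Definition of Done.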

Deliverables

  • Monitoring dashboard created
  • Validation queries implemented
  • Test failure scenario validated

Definition of Done

  • Metrics visible in monitoring tool
  • Validation alerts triggered on failure


Dev
Test
Prod
Data Engineer
Set Up Alerting and Logging

Objective

Implement automated alerting for ingestion pipeline failures and performance degradation.

Scope

  • Configure and validate alerts for the ingestion pipeline, including:
    • Function execution failures
    • Kafka publish failures
    • Zero records ingested for a scheduled run
    • Abnormally long execution time / SLA breach
  • Integration of alerts with email and/or Microsoft Teams channels
  • Definition and configuration of appropriate severity levels (e.g. Critical, High, Medium, Low)
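The severity mapping for the scenarios above can be kept in one place so every alert is classified consistently. A minimal sketch; the event names and payload shape are illustrative (the real Teams/email card schema will differ), and the payload would be posted to the configured webhook/channel separately:

```python
# Illustrative mapping of failure scenarios to severity levels.
SEVERITY = {
    "function_failure": "Critical",
    "kafka_publish_failure": "Critical",
    "zero_records": "High",
    "sla_breach": "Medium",
}

def build_alert(event: str, detail: str) -> dict:
    """Build a simple alert payload for email/Teams delivery."""
    severity = SEVERITY.get(event, "Low")
    return {
        "title": f"[{severity}] Ingestion alert: {event}",
        "text": detail,
        "severity": severity,
    }
```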

Deliverables

  • Alerts tested
  • Alert documentation created

Definition of Done

  • Failure simulation triggers alert
  • Alert reaches responsible team
Dev
Test
Prod
Data Engineer
Deploy [Source] data to Production

Objective

Deploy the finalized Python code for the [SOURCE_NAME] system to the Production environment, following deployment and security best practices.

---------------------------------------------------------------------------

Description

This task covers preparing, validating, and deploying the final Python code for [SOURCE_NAME] into Production. It ensures that the code is production-ready, uses proper configuration and secrets management, and that deployment is done in a controlled and auditable way.

---------------------------------------------------------------------------

Scope

Set up a pull request and review it.

  • Deployment readiness
    • Confirm that the Python code has passed all required checks in lower environments (Dev / QA / UAT).
    • Ensure all known defects for this release are either resolved or accepted.
  • Configuration & secrets
    • Verify that all Prod configuration (environment variables, connection strings, endpoints) is set correctly.
    • Ensure all secrets (keys, passwords, tokens) are stored in a secure store (e.g. Key Vault, GitHub/Azure DevOps secrets) and not in code.
  • Deployment process
    • Use the approved CI/CD pipeline or standard deployment process to deploy the Python code to Production.
    • Follow the agreed change management process (e.g. change ticket, approvals, CAB if required).
    • Perform a controlled deployment (e.g. scheduled window, blue/green/canary if applicable).
  • Post-deployment validation
    • Run smoke tests or basic functional checks to confirm that the Python code runs correctly in Production.
    • Verify logging and monitoring are working (logs, alerts, dashboards).
  • Documentation & handover
    • Update deployment notes / release documentation with:
      • Deployed version / commit
      • Deployment date and time
      • Any known issues or follow-up items
    • Inform relevant stakeholders that the deployment is complete.
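The post-deployment smoke tests can be run through a small harness that records pass/fail per check instead of aborting on the first failure; the individual checks (connectivity, auth, a trivial read) are supplied per source:

```python
def run_smoke_checks(checks: dict) -> dict:
    """Run named post-deployment checks; never raise, just record pass/fail."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results
```

A full results dict makes it easy to paste the smoke-test outcome into the release notes.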

---------------------------------------------------------------------------

Deliverables

  • Python code for [SOURCE_NAME] successfully deployed to the Production environment.
  • Updated configuration and secrets for Production stored in the approved secure store.
  • Deployment / release notes documented in the project Confluence/SharePoint or release tracker.

---------------------------------------------------------------------------

Definition of Done

  • Deployment to Production completes without errors using the approved process.
  • Smoke tests in Production pass and the application behaves as expected.
  • No secrets are stored in source code or plain text; all are managed via the secure store.
  • Deployment details are documented and communicated to stakeholders.
Prod
Support Engineer
Implement Security Policy
Dev
Test
Prod
Data Engineer
Ingest [SOURCE_NAME] Data to [INTERMEDIATE_LAYER_NAME] for Fabric Processing 

Objective

Extract data from [SOURCE_NAME] and store it in the [INTERMEDIATE_LAYER_NAME] (e.g. ADLS Gen2, Blob Storage, File Share).

This intermediate layer will be the trusted source for downstream loading into Microsoft Fabric.

Direct ingestion into Fabric is not feasible because of:
[Network restriction / API limitation / Security constraint / Performance limitation / Architectural decision].

---------------------------------------------------------------------------

Scope

  • Build or configure a process to extract data from [SOURCE_NAME].
  • Write the extracted data to the defined intermediate layer ([INTERMEDIATE_LAYER_NAME]).
  • Ensure the data in the intermediate layer is complete, consistent, and ready for ingestion into Microsoft Fabric.
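The load metadata this task is expected to maintain (load time, source system, record counts, watermark) can be captured as one record per run; the field names here are illustrative:

```python
from datetime import datetime, timezone

def build_load_metadata(source: str, entity: str,
                        record_count: int, watermark: str) -> dict:
    """Metadata captured after each load to the intermediate layer."""
    return {
        "source_system": source,
        "entity": entity,
        "load_time_utc": datetime.now(timezone.utc).isoformat(),
        "record_count": record_count,
        "watermark": watermark,
    }
```

Persisting this record alongside the data gives downstream Fabric loads (and the monitoring checks) a single place to read counts and watermarks from.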
---------------------------------------------------------------------------

Deliverables

  • Data written to intermediate layer
    • Data from [SOURCE_NAME] is successfully stored in [INTERMEDIATE_LAYER_NAME].
  • Folder / path structure created
    • Clear and consistent folder or path structure in the intermediate layer (e.g. /source_name/entity/date=YYYY-MM-DD/).
  • Incremental load logic implemented
    • Logic in place to load only new or changed data (e.g. based on timestamp, watermark, or change flag).
  • Metadata updated
    • Relevant metadata is captured and maintained (e.g. load time, source system, record counts, watermark value).
  • Monitoring configured
    • Monitoring and logging set up for:
      • Job status (success/failure)
      • Data volume checks
      • Error handling

---------------------------------------------------------------------------

Definition of Done (DoD)

  • Successful test execution in [ENV]
    • End-to-end run in the target environment ([ENV], e.g. Dev/UAT/Prod) completes successfully.
  • No duplicate files on re-run
    • Re-running the job does not create duplicate files or duplicate data in the intermediate layer.
  • Watermark updated correctly
    • Watermark or equivalent mechanism is updated after each run and used correctly for incremental loads.
  • Logs visible in monitoring system
    • Execution logs (success, failures, metrics) are visible in the agreed monitoring/logging tool.
  • Alert tested successfully
    • At least one failure/alert scenario has been tested and notifications are received by the right team.
  • Documentation updated
    • Technical documentation (process flow, paths, incremental logic, watermark strategy, monitoring) is updated in the central repository (e.g. Confluence/SharePoint).
  • Sign-off received from [Team/Owner]
    • Formal sign-off from [Team/Owner] confirming the solution meets requirements and is ready for use by downstream consumers.
Dev
Test
Prod
Data Engineer
Setup [INTERMEDIATE_LAYER_NAME]

Objective

Define and set up the [INTERMEDIATE_LAYER_NAME] (e.g. ADLS Gen2, Blob Storage, File Share) for data coming from [SOURCE], so that data can be loaded into Microsoft Fabric in a controlled and reliable way.

Direct ingestion into Fabric is not feasible due to:
[Network restriction / API limitation / Security constraint / Performance limitation / Architectural decision].

---------------------------------------------------------------------------

Scope of Work

  1. Intermediate Layer Setup
     • Storage account
       • Validate that the storage account [storage_account_name] exists and is suitable for this use case,
       • Or create a new storage account [storage_account_name] if one does not exist.
     • Container
       • Create or validate the container [container_name] in the storage account to store data from [SOURCE].
     • Folder structure
       • Define and document the folder/path structure for organizing data (for example):
         • /[source]/[entity]/date=YYYY-MM-DD/
       • Ensure the structure supports incremental loads and downstream consumption by Fabric.
     • File naming convention
       • Define and document a standard naming pattern:
         • [source]_[entity]_[YYYYMMDDHHMMSS].[format]
       • Clarify how each part (source, entity, timestamp, format) will be populated.
     • Retention policy
       • Define and document retention rules for data stored in the intermediate layer:
         • Raw retention: [X days]
         • Archive retention: [X days]
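The folder and file naming conventions above can be encoded in small helpers so every ingestion job produces identical, deterministic paths (which also keeps re-runs from creating differently-named duplicates); the parquet default is illustrative:

```python
from datetime import datetime

def blob_path(source: str, entity: str, run_date: datetime) -> str:
    """Folder path following /[source]/[entity]/date=YYYY-MM-DD/."""
    return f"/{source}/{entity}/date={run_date:%Y-%m-%d}/"

def file_name(source: str, entity: str, run_ts: datetime,
              fmt: str = "parquet") -> str:
    """File name following [source]_[entity]_[YYYYMMDDHHMMSS].[format]."""
    return f"{source}_{entity}_{run_ts:%Y%m%d%H%M%S}.{fmt}"
```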
---------------------------------------------------------------------------

Deliverables

  • Documented intermediate layer design (storage account, container, folder structure, naming convention, retention).
  • Storage account [storage_account_name] created/validated for use as the intermediate layer.
  • Container [container_name] created/validated in the storage account.
  • Agreed and documented folder structure and file naming convention.
  • Documented retention policy for raw and archived data.
---------------------------------------------------------------------------

Definition of Done

  • The storage account [storage_account_name] and container [container_name] are available and ready to use.
  • Folder structure and file naming convention are clearly documented and approved by the relevant team/owner.
  • Retention policy (raw and archive) is defined, documented, and agreed.
  • The intermediate layer design is stored in the central documentation location (e.g. Confluence/SharePoint) and referenced for future ingestion tasks.



Dev
Test
Prod
Data Engineer