Task Name | Description | Env | Responsibility
Obtain [SOURCE_NAME] Access & Technical Documentation

Objective

Get all the access, credentials, and documentation needed so we can safely and reliably ingest data from the [SOURCE_NAME] system.

---------------------------------------------------------------------------
Description

This task covers everything required to connect to the [SOURCE_NAME] system from our environment. It includes setting up user/service access, collecting connection and authentication details, understanding the data structure, and documenting how often data will be updated.

The outcome is that we can successfully test a connection from Dev and have all information stored securely and centrally.

---------------------------------------------------------------------------

Scope

  • Access to source system
    • Request and obtain a user or service account for the [SOURCE_NAME] system.
    • Ensure the account has the correct permissions for data access (read-only unless otherwise agreed).
  • Connection details (as applicable)
    • Collect API details:
      • Base URL
      • Endpoints
      • Required headers
      • Query parameters

    • Collect database details:
      • Server/host
      • Port
      • Database name
      • Schema
      • Any required network info
    • Collect SFTP details:
      • Host
      • Port
      • Folder/path
      • File naming conventions
  • Authentication details
    • Obtain the required auth method and details, e.g.:
      • OAuth (client ID/secret, token URL, scopes, etc.)
      • API key(s)
      • Username and password
      • Certificates/keys
    • Clarify any token expiry, rotation, or renewal process.
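Token expiry and renewal can be handled with a small cache that refreshes the token shortly before it expires. A minimal sketch; the `fetch_token` callable is a hypothetical wrapper around the source's actual token call (e.g. an OAuth client-credentials request), not a confirmed API:

```python
import time

class TokenCache:
    """Cache an access token and refresh it shortly before expiry.

    `fetch_token` is a hypothetical callable returning
    (token, expires_in_seconds) from the source's token URL.
    """

    def __init__(self, fetch_token, refresh_margin: float = 60.0):
        self._fetch = fetch_token
        self._margin = refresh_margin
        self._token = None
        self._expires_at = 0.0

    def get(self) -> str:
        # Refresh when no token is cached or we are within the safety margin.
        if self._token is None or time.time() >= self._expires_at - self._margin:
            self._token, ttl = self._fetch()
            self._expires_at = time.time() + ttl
        return self._token
```

The refresh margin avoids using a token that expires mid-request; the exact expiry/rotation rules still need to be confirmed with the source owner as described above.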

  • Network / IP whitelisting
    • Collect the list of IP addresses or ranges from which we will connect (Dev at minimum).
    • Share these with the source owner for whitelisting if required.
  • Sample data
    • Request sample data files or sample API responses that are representative of real data.
    • Use samples to validate structure, formats, and edge cases.
  • Data dictionary / schema
    • Obtain data dictionary or schema documentation describing:
      • Tables/endpoints/files
      • Fields, data types, allowed values
      • Key relationships and important business rules
  • Frequency and SLA
    • Confirm with the source owner:
      • Data refresh frequency (e.g. real-time, hourly, daily)
      • SLA for data availability and expected response times
---------------------------------------------------------------------------

Deliverables

  • Credentials
    • All access credentials (accounts, keys, secrets, certificates) are stored securely in Key Vault (or the agreed secure secrets store).
  • Documentation
    • All technical and data documentation (connection details, auth steps, schema, IPs, refresh frequency, SLA) is stored in the code repository.
  • Source owner confirmation
    • A confirmation email/message from the [SOURCE_NAME] owner stating that:
      • Access has been granted
      • Connection details and expectations (frequency/SLA) are correct
---------------------------------------------------------------------------

Definition of Done

  • A test connection from the Dev environment to [SOURCE_NAME] is successful using the stored credentials.
  • All received documentation is uploaded to the agreed central repository.
  • All credentials are stored securely in Key Vault.
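In practice the application code never reads Key Vault directly if app settings are used as Key Vault references (Azure resolves `@Microsoft.KeyVault(SecretUri=...)` settings into environment variables). A minimal sketch of the access pattern, assuming that setup; the setting name is illustrative:

```python
import os

def get_secret(name: str) -> str:
    """Read a secret exposed to the app as an environment variable.

    With Key Vault references in app settings, the code only sees env
    vars, so no secret ever lives in the repository.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing required secret/app setting: {name}")
    return value
```

Failing fast on a missing setting makes the Dev connection test surface misconfiguration immediately.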


Dev
Test
Prod
Data Engineer
[ENV]: Network Whitelisting for [SOURCE_NAME]

Objective

Ensure network connectivity from the Dev environment to [SOURCE_NAME].

Scope


  • Obtain outbound IP of Azure Function / Container App
  • Validate IP with Tanish/Marc before raising request
  • Raise firewall/whitelisting request for:
    • VDI
    • Azure Function
    • Container App
    • GitHub runners (if required)
  • Confirm port-level access (443/22/etc.)
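Port-level access can be verified from each environment with a small TCP check; a minimal sketch (host and port come from the connection details collected earlier):

```python
import socket

def check_connectivity(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Running this from the VDI, Azure Function, and Container App confirms the whitelisting request actually took effect on each path.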

Deliverables


  • Approved firewall change request
  • Connectivity test successful

Definition of Done


  • Connection test from Dev Azure Function successful
Dev
Test
Prod
DevOps
Collect [SOURCE_NAME] Schema and Table Metadata

Objective

Identify all source tables, fields, and any dependencies needed for [PROJECT], and document how they will be used for the purpose of [PROJECT PURPOSE].

Note: Check whether the source will become obsolete or be replaced in the near future; if so, plan for possible rework/effort.

Identify the complexity/priority for the data load.

---------------------------------------------------------------------------

Scope

  • Identify required source tables
    • List all source tables needed for [PROJECT].
    • Capture each table’s structure: schema name, table name.
  • Key and identifier details
    • Identify primary keys, composite keys, and any important business/CLI identifiers used for joins or lookups.
  • Data volume estimation
    • Estimate data volume for each required table (e.g. row counts, growth per day/month).
    • Note any high-volume tables that may impact performance or storage.
  • Sensitive data assessment
    • Identify columns that contain sensitive or PII (Personally Identifiable Information) data.
    • Flag these fields clearly for later masking, encryption, or access control.
---------------------------------------------------------------------------

Deliverables

  • Source Table Inventory Document
    • List of all required source tables, with basic details (schema, table name, purpose, dependencies).
  • Column-level Metadata
    • For each table: column name, data type, key/identifier flags, sensitivity/PII flags, and short description where available.
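The column-level metadata can be kept as structured records rather than free text, which makes the PII flags queryable for the later masking/encryption work. A minimal sketch; field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ColumnMeta:
    """One row of the column-level metadata deliverable (fields illustrative)."""
    table: str
    column: str
    data_type: str
    is_key: bool = False
    is_pii: bool = False
    description: str = ""

def pii_columns(inventory: list[ColumnMeta]) -> list[str]:
    """Fully-qualified columns flagged as PII, for masking/encryption follow-up."""
    return [f"{c.table}.{c.column}" for c in inventory if c.is_pii]
```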
---------------------------------------------------------------------------

Definition of Done

  • All required source tables for [PROJECT] are identified.
  • Table and column details (including keys and sensitive fields) are documented in the Source Table Inventory and column-level metadata.


Dev
DEV: Create Azure Function – [SOURCE_NAME]

Objective

Build an Azure Function in the Development environment to extract data from [SOURCE_NAME] for the [USE_CASE].

Scope

  • Create a Python Azure Function to connect to [SOURCE_NAME] and extract the required data.
  • Use Python 3.11 as the runtime environment for the function.
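The extraction logic can be kept independent of the Azure Functions trigger so it is testable locally. A minimal sketch for a page-based source API; `fetch_page` is a hypothetical callable wrapping the real [SOURCE_NAME] call (auth, base URL, query parameters), and the actual trigger/bindings are defined separately in the Function app:

```python
from typing import Callable, Iterator

def extract_pages(fetch_page: Callable[[int], list[dict]],
                  page_size: int = 100) -> Iterator[dict]:
    """Iterate over all records from a page-based source API.

    A page shorter than `page_size` signals the end of the data.
    """
    page = 0
    while True:
        rows = fetch_page(page)
        yield from rows
        if len(rows) < page_size:
            break
        page += 1
```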

Deliverables

  • The Azure Function is deployed and available in the Dev environment.

Definition of Done

  • The Azure Function runs successfully in Dev.
Dev
DevOps
Implement Data Ingestion from [Source] to Kafka

Objective

Implement an ingestion pipeline to publish data from [SOURCE_NAME] to Kafka (full load).

Scope


  • Implement producer logic in Azure Function
  • Handle error scenarios
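The error-handling requirement above can be sketched as a retry wrapper around the actual produce call. `send` stands in for the real producer call (e.g. a Kafka client's produce+flush), which is injected so the pattern is broker-agnostic; the backoff values are illustrative:

```python
import time

def publish_with_retry(send, record, max_attempts: int = 3,
                       base_delay: float = 0.5) -> bool:
    """Publish one record via `send`, retrying transient failures
    with exponential backoff; re-raise once attempts are exhausted."""
    for attempt in range(1, max_attempts + 1):
        try:
            send(record)
            return True
        except Exception:
            if attempt == max_attempts:
                raise  # give up: surface the error for alerting
            time.sleep(base_delay * 2 ** (attempt - 1))
```

Re-raising on the final attempt keeps failures visible to the monitoring/alerting set up later, instead of silently dropping records.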

Deliverables


  • Data published successfully to Kafka topic and Fabric Application zone
  • Sample messages validated
  • Throughput validated

Definition of Done


  • Messages visible in Kafka
  • No data loss during retry
  • Error handling tested
Dev
Data Engineer
Implement Delta/Incremental Logic for [Source]

Objective


Implement logic to ingest only new or updated records from source.

Scope


  • Load Incremental data
  • Validate late-arriving data handling
  • Backfill support
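The watermark pattern behind the scope items above can be sketched in a few lines; the strictly-greater comparison makes re-runs idempotent (no duplicate ingestion), while late-arriving rows with older timestamps would need a lookback window or a backfill run. Field names are illustrative:

```python
def incremental_batch(rows: list[dict], watermark: str) -> tuple[list[dict], str]:
    """Keep only rows newer than the stored watermark and compute the
    new watermark to persist after a successful run.

    Assumes each row carries a sortable `updated_at` string (e.g. ISO-8601).
    """
    new_rows = [r for r in rows if r["updated_at"] > watermark]
    new_watermark = max((r["updated_at"] for r in new_rows), default=watermark)
    return new_rows, new_watermark
```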

Deliverables


  • Metadata storage created
  • Successful incremental test run

Definition of Done


  • No duplicate ingestion
  • Only new records processed
  • Metadata updated after each successful run
Dev
Data Engineer
Setup GitHub Repo and Create CI/CD Pipeline

Objective

Set up an automated deployment pipeline for [SOURCE_NAME] so that code can be built, tested, and deployed to Dev automatically, and to Prod with manual approval.

Scope

  1. Create GitHub repository
     • Create a new GitHub repository for the [SOURCE_NAME] codebase.
  2. Define branching strategy
     • Use two main branches:
       • main – for production-ready code
       • dev – for development and testing
     • Document how and when code is merged between dev and main (e.g. via pull requests).
  3. Set up GitHub Actions pipeline
     • Create a GitHub Actions workflow that runs on relevant events (e.g. pull request, push to dev or main).
  4. Implement pipeline steps
     • The pipeline should include at least:
       a. Code linting – run static code analysis / style checks.
       b. Unit tests – run the automated unit test suite and fail the build if tests fail.
       c. Build validation – build/pack the application to ensure it compiles/builds successfully.
       d. Dev deployment – automatically deploy successful builds from dev (or a chosen branch) to the Dev environment.
       e. Prod deployment with approval – deploy to the Prod environment only after a manual approval step (e.g. environment protection rule or manual approval job).
  5. Configure environment variables per environment
     • Define and store configuration values separately for:
       • Dev environment
       • Prod environment
     • Ensure secrets are stored securely (e.g. GitHub Secrets) and are correctly used by the pipeline.
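The pipeline steps above can be sketched as a GitHub Actions workflow. This is a skeleton, not the final pipeline: the deploy steps are placeholders, and the tool choices (ruff, pytest) and environment names are assumptions to be replaced with whatever the team agrees on:

```yaml
name: ci-cd-[SOURCE_NAME]

on:
  pull_request:
  push:
    branches: [dev, main]

jobs:
  build-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt ruff pytest
      - run: ruff check .   # code linting
      - run: pytest         # unit tests

  deploy-dev:
    needs: build-test
    if: github.ref == 'refs/heads/dev'
    runs-on: ubuntu-latest
    environment: dev        # holds Dev variables/secrets
    steps:
      - run: echo "deploy to Dev here (e.g. Azure Functions deploy action)"

  deploy-prod:
    needs: build-test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: prod       # protection rule enforces the manual approval
    steps:
      - run: echo "deploy to Prod here after approval"
```

Using GitHub environment protection rules on `prod` gives the manual-approval gate without any custom approval logic in the workflow itself.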

Deliverables

  • A fully working CI/CD pipeline in GitHub Actions for [SOURCE_NAME].
  • Automatic deployment to Dev on successful pipeline runs (as defined in the branching strategy).
  • Manual approval-based deployment to Prod with a clear approval step and responsible approvers defined.

Definition of Done

  • A sample change merged to the dev branch is automatically built, tested, and successfully deployed to the Dev environment via the pipeline.
  • A sample change promoted to the main branch is successfully deployed to the Prod environment via the pipeline after passing the manual approval step.
  • Pipeline status is visible in GitHub Actions, and basic run instructions are documented in the repository (e.g. in README.md).




Dev to Prod
Data Engineer
DEV: Setup Monitoring & Data Validation for [Source]

Objective

Implement monitoring and validation checks for the ingestion pipeline.

Scope

  • Enable Application Insights
  • Create ingestion success/failure logs
  • Implement row count validation
  • Implement schema validation check
  • Track ingestion duration
  • Validate Kafka message count vs source count
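The row-count and schema checks above can be sketched as small pure functions that the pipeline calls after each run; the expected-schema shape is illustrative:

```python
def validate_row_counts(source_count: int, kafka_count: int,
                        tolerance: int = 0) -> bool:
    """Compare the source row count with the Kafka message count (± tolerance)."""
    return abs(source_count - kafka_count) <= tolerance

def validate_schema(record: dict, expected: dict[str, type]) -> list[str]:
    """Return schema problems (missing fields, wrong types) for one record."""
    problems = [f"missing: {name}" for name in expected if name not in record]
    problems += [
        f"wrong type: {name}"
        for name, typ in expected.items()
        if name in record and not isinstance(record[name], typ)
    ]
    return problems
```

A non-empty problem list (or a failed count check) is what should raise the validation alert in the Definition of Done.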

Deliverables

  • Monitoring dashboard created
  • Validation queries implemented
  • Test failure scenario validated

Definition of Done

  • Metrics visible in monitoring tool
  • Validation alerts triggered on failure


Dev
Test
Prod
Data Engineer
Set Up Alerting and Logging

Objective

Implement automated alerting for ingestion pipeline failures and performance degradation.

Scope

  • Configure and validate alerts for the ingestion pipeline, including:
    • Function execution failures
    • Kafka publish failures
    • Zero records ingested for a scheduled run
    • Abnormally long execution time / SLA breach
  • Integration of alerts with email and/or Microsoft Teams channels
  • Definition and configuration of appropriate severity levels (e.g. Critical, High, Medium, Low)
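The severity mapping for the scenarios above can be kept in one place so every alert is classified consistently. A minimal sketch; the event names and payload shape are illustrative (the real Teams/email card schema will differ), and the payload would be posted to the configured webhook/channel separately:

```python
# Illustrative mapping of failure scenarios to severity levels.
SEVERITY = {
    "function_failure": "Critical",
    "kafka_publish_failure": "Critical",
    "zero_records": "High",
    "sla_breach": "Medium",
}

def build_alert(event: str, detail: str) -> dict:
    """Build a simple alert payload for email/Teams delivery."""
    severity = SEVERITY.get(event, "Low")
    return {
        "title": f"[{severity}] Ingestion alert: {event}",
        "text": detail,
        "severity": severity,
    }
```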

Deliverables

  • Alerts tested
  • Alert documentation created

Definition of Done

  • Failure simulation triggers alert
  • Alert reaches responsible team
Dev
Test
Prod
Data Engineer
Deploy [Source] data to Production

Objective

Deploy the finalized Python code for the [SOURCE_NAME] system to the Production environment, following deployment and security best practices.

---------------------------------------------------------------------------

Description

This task covers preparing, validating, and deploying the final Python code for [SOURCE_NAME] into Production. It ensures that the code is production-ready, uses proper configuration and secrets management, and that deployment is done in a controlled and auditable way.

---------------------------------------------------------------------------

Scope

Set up a pull request and review it.

  • Deployment readiness
    • Confirm that the Python code has passed all required checks in lower environments (Dev / QA / UAT).
    • Ensure all known defects for this release are either resolved or accepted.
  • Configuration & secrets
    • Verify that all Prod configuration (environment variables, connection strings, endpoints) is set correctly.
    • Ensure all secrets (keys, passwords, tokens) are stored in a secure store (e.g. Key Vault, GitHub/Azure DevOps secrets) and not in code.
  • Deployment process
    • Use the approved CI/CD pipeline or standard deployment process to deploy the Python code to Production.
    • Follow the agreed change management process (e.g. change ticket, approvals, CAB if required).
    • Perform a controlled deployment (e.g. scheduled window, blue/green/canary if applicable).
  • Post-deployment validation
    • Run smoke tests or basic functional checks to confirm that the Python code runs correctly in Production.
    • Verify logging and monitoring are working (logs, alerts, dashboards).
  • Documentation & handover
    • Update deployment notes / release documentation with:
      • Deployed version / commit
      • Deployment date and time
      • Any known issues or follow-up items
    • Inform relevant stakeholders that the deployment is complete.
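The post-deployment smoke tests can be run through a small harness that records pass/fail per check instead of aborting on the first failure; the individual checks (connectivity, auth, a trivial read) are supplied per source:

```python
def run_smoke_checks(checks: dict) -> dict:
    """Run named post-deployment checks; never raise, just record pass/fail."""
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False
    return results
```

A full results dict makes it easy to paste the smoke-test outcome into the release notes.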

---------------------------------------------------------------------------

Deliverables

  • Python code for [SOURCE_NAME] successfully deployed to the Production environment.
  • Updated configuration and secrets for Production stored in the approved secure store.
  • Deployment / release notes documented in the project Confluence/SharePoint or release tracker.

---------------------------------------------------------------------------

Definition of Done

  • Deployment to Production completes without errors using the approved process.
  • Smoke tests in Production pass and the application behaves as expected.
  • No secrets are stored in source code or plain text; all are managed via the secure store.
  • Deployment details are documented and communicated to stakeholders.
Prod
Support Engineer
Implement Security Policy
Dev
Test
Prod
Data Engineer
Ingest [SOURCE_NAME] Data to [INTERMEDIATE_LAYER_NAME] for Fabric Processing 

Objective

Extract data from [SOURCE_NAME] and store it in the [INTERMEDIATE_LAYER_NAME] (e.g. ADLS Gen2, Blob Storage, File Share).

This intermediate layer will be the trusted source for downstream loading into Microsoft Fabric.

Direct ingestion into Fabric is not feasible because of:
[Network restriction / API limitation / Security constraint / Performance limitation / Architectural decision].

---------------------------------------------------------------------------

Scope

  • Build or configure a process to extract data from [SOURCE_NAME].
  • Write the extracted data to the defined intermediate layer ([INTERMEDIATE_LAYER_NAME]).
  • Ensure the data in the intermediate layer is complete, consistent, and ready for ingestion into Microsoft Fabric.
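The load metadata this task is expected to maintain (load time, source system, record counts, watermark) can be captured as one record per run; the field names here are illustrative:

```python
from datetime import datetime, timezone

def build_load_metadata(source: str, entity: str,
                        record_count: int, watermark: str) -> dict:
    """Metadata captured after each load to the intermediate layer."""
    return {
        "source_system": source,
        "entity": entity,
        "load_time_utc": datetime.now(timezone.utc).isoformat(),
        "record_count": record_count,
        "watermark": watermark,
    }
```

Persisting this record alongside the data gives downstream Fabric loads (and the monitoring checks) a single place to read counts and watermarks from.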
---------------------------------------------------------------------------

Deliverables

  • Data written to intermediate layer
    • Data from [SOURCE_NAME] is successfully stored in [INTERMEDIATE_LAYER_NAME].
  • Folder / path structure created
    • Clear and consistent folder or path structure in the intermediate layer (e.g. /source_name/entity/date=YYYY-MM-DD/).
  • Incremental load logic implemented
    • Logic in place to load only new or changed data (e.g. based on timestamp, watermark, or change flag).
  • Metadata updated
    • Relevant metadata is captured and maintained (e.g. load time, source system, record counts, watermark value).
  • Monitoring configured
    • Monitoring and logging set up for:
      • Job status (success/failure)
      • Data volume checks
      • Error handling

---------------------------------------------------------------------------

Definition of Done (DoD)

  • Successful test execution in [ENV]
    • End-to-end run in the target environment ([ENV], e.g. Dev/UAT/Prod) completes successfully.
  • No duplicate files on re-run
    • Re-running the job does not create duplicate files or duplicate data in the intermediate layer.
  • Watermark updated correctly
    • Watermark or equivalent mechanism is updated after each run and used correctly for incremental loads.
  • Logs visible in monitoring system
    • Execution logs (success, failures, metrics) are visible in the agreed monitoring/logging tool.
  • Alert tested successfully
    • At least one failure/alert scenario has been tested and notifications are received by the right team.
  • Documentation updated
    • Technical documentation (process flow, paths, incremental logic, watermark strategy, monitoring) is updated in the central repository (e.g. Confluence/SharePoint).
  • Sign-off received from [Team/Owner]
    • Formal sign-off from [Team/Owner] confirming the solution meets requirements and is ready for use by downstream consumers.
Dev
Test
Prod
Data Engineer
Setup [INTERMEDIATE_LAYER_NAME]

Objective

Define and set up the [INTERMEDIATE_LAYER_NAME] (e.g. ADLS Gen2, Blob Storage, File Share) for data coming from [SOURCE], so that data can be loaded into Microsoft Fabric in a controlled and reliable way.

Direct ingestion into Fabric is not feasible due to:
[Network restriction / API limitation / Security constraint / Performance limitation / Architectural decision].

---------------------------------------------------------------------------

Scope of Work

  1. Intermediate Layer Setup
     • Storage account
       • Validate that the storage account [storage_account_name] exists and is suitable for this use case,
       • Or create a new storage account [storage_account_name] if one does not exist.
     • Container
       • Create or validate the container [container_name] in the storage account to store data from [SOURCE].
     • Folder structure
       • Define and document the folder/path structure for organizing data (for example):
         • /[source]/[entity]/date=YYYY-MM-DD/
       • Ensure the structure supports incremental loads and downstream consumption by Fabric.
     • File naming convention
       • Define and document a standard naming pattern:
         • [source]_[entity]_[YYYYMMDDHHMMSS].[format]
       • Clarify how each part (source, entity, timestamp, format) will be populated.
     • Retention policy
       • Define and document retention rules for data stored in the intermediate layer:
         • Raw retention: [X days]
         • Archive retention: [X days]
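The folder and file naming conventions above can be encoded in small helpers so every ingestion job produces identical, deterministic paths (which also keeps re-runs from creating differently-named duplicates); the parquet default is illustrative:

```python
from datetime import datetime

def blob_path(source: str, entity: str, run_date: datetime) -> str:
    """Folder path following /[source]/[entity]/date=YYYY-MM-DD/."""
    return f"/{source}/{entity}/date={run_date:%Y-%m-%d}/"

def file_name(source: str, entity: str, run_ts: datetime,
              fmt: str = "parquet") -> str:
    """File name following [source]_[entity]_[YYYYMMDDHHMMSS].[format]."""
    return f"{source}_{entity}_{run_ts:%Y%m%d%H%M%S}.{fmt}"
```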
---------------------------------------------------------------------------

Deliverables

  • Documented intermediate layer design (storage account, container, folder structure, naming convention, retention).
  • Storage account [storage_account_name] created/validated for use as the intermediate layer.
  • Container [container_name] created/validated in the storage account.
  • Agreed and documented folder structure and file naming convention.
  • Documented retention policy for raw and archived data.
---------------------------------------------------------------------------

Definition of Done

  • The storage account [storage_account_name] and container [container_name] are available and ready to use.
  • Folder structure and file naming convention are clearly documented and approved by the relevant team/owner.
  • Retention policy (raw and archive) is defined, documented, and agreed.
  • The intermediate layer design is stored in the central documentation location (e.g. Confluence/SharePoint) and referenced for future ingestion tasks.



Dev
Test
Prod
Data Engineer