| Task Name | Description | Env | Responsibility |
| Obtain [SOURCE_NAME] Access & Technical Documentation | Objective Get all the access, credentials, and documentation needed so we can safely and reliably ingest data from the [SOURCE_NAME] system. Description This task covers everything required to connect to the [SOURCE_NAME] system from our environment. It includes setting up user/service access, collecting connection and authentication details, understanding the data structure, and documenting how often data will be updated. The outcome is that we can successfully test a connection from Dev and have all information stored securely and centrally. Scope
Collect API details:
Collect database details:
Collect SFTP details:
Obtain the required auth method and details, e.g.:
Clarify any token expiry, rotation, or renewal process.
Obtain data dictionary or schema documentation describing:
Confirm with the source owner:
--------------------------------------------------------------------------- Deliverables
A confirmation email/message from the [SOURCE_NAME] owner stating that:
--------------------------------------------------------------------------- Definition of Done
| Dev | Data Engineer |
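A minimal connectivity check such as the sketch below can serve as the "successful test connection from Dev" evidence for this task. It assumes a REST-style API with bearer-token authentication; the base URL, the `/health` endpoint, and the environment variable names are placeholders, not confirmed details of [SOURCE_NAME].

```python
import os
import requests

# Placeholder values; the real base URL and credential come from the access
# details collected in this task (stored in Key Vault / secrets, never in code).
BASE_URL = os.environ["SOURCE_BASE_URL"]      # e.g. https://api.example.com
API_TOKEN = os.environ["SOURCE_API_TOKEN"]

def test_connection() -> None:
    """Call a lightweight endpoint and fail loudly if access is not working."""
    response = requests.get(
        f"{BASE_URL}/health",                 # hypothetical health/ping endpoint
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        timeout=30,
    )
    response.raise_for_status()
    print(f"Connection OK, status={response.status_code}")

if __name__ == "__main__":
    test_connection()
```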
| DEV: Network Whitelisting for [SOURCE_NAME] | Objective Ensure network connectivity from the Dev environment to [SOURCE_NAME]. Scope
Deliverables
Definition of Done
| Dev | DevOps |
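A simple TCP reachability test run from the Dev environment, before and after the whitelisting change, can confirm whether the route is open. The host and port below are placeholders for the actual [SOURCE_NAME] endpoint.

```python
import socket

def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError as exc:
        print(f"Cannot reach {host}:{port} -> {exc}")
        return False

if __name__ == "__main__":
    # Placeholder endpoint; replace with the real [SOURCE_NAME] host and port.
    print("reachable" if can_reach("source.example.com", 443) else "blocked")
```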
| Collect [SOURCE_NAME] Schema and Table Metadata Details | Objective Identify all source tables, fields, and any dependencies needed for [PROJECT], and document how they will be used for the purpose of [PROJECT PURPOSE]. Note - Be cautious while working and check whether the source will become obsolete after some time; if so, some rework/effort may be needed. Identify the complexity/priority for the data load. Scope
--------------------------------------------------------------------------- Deliverables
--------------------------------------------------------------------------- Definition of Done
| Dev | |
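If [SOURCE_NAME] is a relational database, the table and column metadata can be exported from INFORMATION_SCHEMA as in this sketch; pyodbc and the connection-string variable are assumptions, and for an API or SFTP source the data dictionary collected in the access task is the reference instead.

```python
import csv
import os
import pyodbc

# Connection string is an assumption; use the details collected in the access task.
conn = pyodbc.connect(os.environ["SOURCE_ODBC_CONNECTION_STRING"])
cursor = conn.cursor()

query = """
    SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE, IS_NULLABLE
    FROM INFORMATION_SCHEMA.COLUMNS
    ORDER BY TABLE_SCHEMA, TABLE_NAME, ORDINAL_POSITION
"""

# Write one row per column so the result can be attached to the task as evidence.
with open("source_schema_metadata.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["schema", "table", "column", "data_type", "is_nullable"])
    for row in cursor.execute(query):
        writer.writerow(list(row))

cursor.close()
conn.close()
print("Schema metadata exported to source_schema_metadata.csv")
```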
| DEV: Create Azure Function – [SOURCE_NAME] | Objective Build an Azure Function in the Development environment to extract data from [SOURCE_NAME] for the [USE_CASE]. Scope
Deliverables
Definition of Done
| Dev | DevOps |
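A skeleton of the Dev extraction function, using the Azure Functions Python v2 programming model with a timer trigger; the schedule and the `fetch_from_source`/`publish` helpers are placeholders for the real [SOURCE_NAME] logic, not a finished implementation.

```python
import logging
import azure.functions as func

app = func.FunctionApp()

@app.timer_trigger(schedule="0 0 6 * * *", arg_name="timer", run_on_startup=False)
def extract_source(timer: func.TimerRequest) -> None:
    """Timer-triggered extraction from [SOURCE_NAME] (daily at 06:00 UTC in this sketch)."""
    logging.info("Starting [SOURCE_NAME] extraction run")
    records = fetch_from_source()   # placeholder: call the source API / DB / SFTP
    publish(records)                # placeholder: hand records to the ingestion layer
    logging.info("Extracted %d records from [SOURCE_NAME]", len(records))

def fetch_from_source() -> list[dict]:
    """Placeholder for the real extraction logic."""
    return []

def publish(records: list[dict]) -> None:
    """Placeholder for writing to Kafka / the intermediate layer."""
    logging.info("Publishing %d records", len(records))
```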
| Implement Data Ingestion from [SOURCE_NAME] to Kafka | Objective Implement an ingestion pipeline to publish data from [SOURCE_NAME] to Kafka. Scope
Deliverables
Definition of Done
| Dev | Data Engineer |
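A minimal producer sketch using confluent-kafka; the bootstrap servers, topic name, key field, and serialization choice are assumptions to be replaced by the agreed contract for [SOURCE_NAME].

```python
import json
import os
from confluent_kafka import Producer

# Placeholder configuration; real broker addresses and security settings come from the environment.
producer = Producer({"bootstrap.servers": os.environ["KAFKA_BOOTSTRAP_SERVERS"]})
TOPIC = os.environ.get("KAFKA_TOPIC", "source_name.raw")   # hypothetical topic name

def delivery_report(err, msg) -> None:
    """Log the delivery result for each produced message."""
    if err is not None:
        print(f"Delivery failed for key={msg.key()}: {err}")

def publish_records(records: list[dict]) -> None:
    """Serialize each record as JSON and publish it to the topic."""
    for record in records:
        producer.produce(
            TOPIC,
            key=str(record.get("id", "")),
            value=json.dumps(record),
            callback=delivery_report,
        )
    producer.flush()   # block until all buffered messages are delivered

if __name__ == "__main__":
    publish_records([{"id": 1, "name": "example"}])
```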
| Implement Delta/Incremental Logic for [SOURCE_NAME] | Objective Implement logic to ingest only new or updated records from the source. Scope
Deliverables
Definition of Done
| Dev | Data Engineer |
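One common pattern for the delta logic is a persisted watermark: read the last successfully loaded timestamp, extract only rows newer than it, and advance the watermark after a successful run. The sketch below stores the watermark in a small state file and assumes a timestamp column such as `updated_at`; both are placeholders for the agreed mechanism.

```python
import json
from pathlib import Path

WATERMARK_FILE = Path("state/source_name_watermark.json")   # placeholder location
DEFAULT_WATERMARK = "1970-01-01T00:00:00+00:00"

def read_watermark() -> str:
    """Return the last successfully loaded timestamp (ISO 8601)."""
    if WATERMARK_FILE.exists():
        return json.loads(WATERMARK_FILE.read_text())["last_loaded_at"]
    return DEFAULT_WATERMARK

def write_watermark(value: str) -> None:
    """Persist the new watermark only after the load has succeeded."""
    WATERMARK_FILE.parent.mkdir(parents=True, exist_ok=True)
    WATERMARK_FILE.write_text(json.dumps({"last_loaded_at": value}))

def incremental_load(fetch_since) -> None:
    """Fetch only records changed after the watermark, then advance it."""
    watermark = read_watermark()
    records = fetch_since(watermark)   # placeholder: query WHERE updated_at > watermark
    if records:
        # ...load the records downstream (Kafka / intermediate layer) here...
        newest = max(r["updated_at"] for r in records)
        write_watermark(newest)
    print(f"Loaded {len(records)} records newer than {watermark}")

if __name__ == "__main__":
    incremental_load(lambda since: [])   # stub fetch function for illustration
```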
| Setup Github Repo and Create CI/CD Pipeline | Objective Set up an automated deployment pipeline for [SOURCE_NAME] so that code can be built, tested and deployed to Dev automatically, and to Prod with manual approval. Scope: 1. Create GitHub repository • Create a new GitHub repository for the [SOURCE_NAME] codebase. 2. Define branching strategy • Use two main branches: main – for production-ready code dev – for development and testing Document how and when code is merged between dev and main (e.g. via pull requests). 3. Set up GitHub Actions pipeline • Create a GitHub Actions workflow that runs on relevant events (e.g. pull request, push to dev or main). 4. Implement pipeline steps • The pipeline should include at least: a. Code linting – run static code analysis / style checks. b. Unit tests – run the automated unit test suite and fail the build if tests fail. c. Build validation – build/pack the application to ensure it compiles/builds successfully. d. Dev deployment – automatically deploy successful builds from dev (or a chosen branch) to the Dev environment. e. Prod deployment with approval – deploy to the Prod environment only after a manual approval step (e.g. environment protection rule or manual approval job). 5. Configure environment variables per environment • Define and store configuration values separately for: Dev environment Prod environment Ensure secrets are stored securely (e.g. GitHub Secrets) and are correctly used by the pipeline. Deliverables: • A fully working CI/CD pipeline in GitHub Actions for [SOURCE_NAME]. • Automatic deployment to Dev on successful pipeline runs (as defined in the branching strategy). • Manual approval-based deployment to Prod with a clear approval step and responsible approvers defined. Definition of Done: • A sample change merged to the Dev branch is: Automatically built, tested, and successfully deployed to the Dev environment via the pipeline. • A sample change promoted to the Main/Prod branch is: Successfully deployed to the Prod environment via the pipeline after passing the manual approval step. • Pipeline status is visible in GitHub Actions, and basic run instructions are documented in the repository (e.g. in README.md). | Dev to Prod | Data Engineer |
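A trimmed-down GitHub Actions workflow matching the steps above (lint, unit tests, build validation, automatic Dev deployment, approval-gated Prod deployment). It assumes a Python codebase with a requirements.txt, uses placeholder deploy commands, and relies on a protected `prod` environment for the manual approval step.

```yaml
name: ci-cd

on:
  pull_request:
  push:
    branches: [dev, main]

jobs:
  build-and-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install -r requirements.txt   # assumes a requirements.txt exists
      - run: pip install ruff pytest
      - run: ruff check .                      # code linting
      - run: pytest                            # unit tests; failures fail the build

  deploy-dev:
    needs: build-and-test
    if: github.ref == 'refs/heads/dev'
    runs-on: ubuntu-latest
    environment: dev
    steps:
      - uses: actions/checkout@v4
      - run: echo "Deploy to Dev here (placeholder deploy step)"

  deploy-prod:
    needs: build-and-test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    environment: prod   # protected environment = manual approval before this job runs
    steps:
      - uses: actions/checkout@v4
      - run: echo "Deploy to Prod here (placeholder deploy step)"
```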
| DEV: Setup Monitoring & Data Validation for [SOURCE_NAME] | Objective Implement monitoring and validation checks for the ingestion pipeline. Scope
Deliverables
Definition of Done
| Dev | Data Engineer |
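The validation checks can start as simple assertions over row counts, required fields, and duplicates, as in this sketch; the thresholds and key columns are assumptions to be replaced with the rules agreed for [SOURCE_NAME].

```python
import logging

logger = logging.getLogger("ingestion.validation")

def validate_batch(records: list[dict], min_rows: int = 1,
                   required_fields: tuple[str, ...] = ("id", "updated_at")) -> bool:
    """Run basic completeness checks on an ingested batch and log the outcome."""
    ok = True
    if len(records) < min_rows:
        logger.error("Row count check failed: got %d rows, expected at least %d",
                     len(records), min_rows)
        ok = False
    for field in required_fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        if missing:
            logger.error("Null check failed: %d rows missing '%s'", missing, field)
            ok = False
    duplicates = len(records) - len({r.get("id") for r in records})
    if duplicates:
        logger.warning("Found %d duplicate ids in the batch", duplicates)
    return ok
```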
| Set Up Alerting and Logging | Objective Implement automated alerting for ingestion pipeline failures and performance degradation. Scope
Deliverables
Definition of Done
| Dev | Data Engineer |
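A lightweight pattern for this task is structured logging plus a webhook notification when a run fails; the webhook URL is a placeholder for whichever channel (Teams, Slack, e-mail bridge) the team agrees on.

```python
import logging
import os
import requests

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s %(message)s",
)
logger = logging.getLogger("ingestion.source_name")

def send_alert(message: str) -> None:
    """Post a failure alert to the agreed webhook (placeholder URL from the environment)."""
    webhook_url = os.environ.get("ALERT_WEBHOOK_URL")
    if not webhook_url:
        logger.warning("ALERT_WEBHOOK_URL not set; alert not sent: %s", message)
        return
    requests.post(webhook_url, json={"text": message}, timeout=10)

def run_pipeline(run) -> None:
    """Run the ingestion callable and alert on any unhandled failure."""
    try:
        run()
        logger.info("Ingestion run completed successfully")
    except Exception:
        logger.exception("Ingestion run failed")
        send_alert("[SOURCE_NAME] ingestion pipeline failed - check the logs")
        raise
```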
| Deploy [SOURCE_NAME] Data to Production | Objective Deploy the finalized Python code for the [SOURCE_NAME] system to the Production environment, following deployment and security best practices. --------------------------------------------------------------------------- Description This task covers preparing, validating, and deploying the final Python code for [SOURCE_NAME] into Production. It ensures that the code is production-ready, uses proper configuration and secrets management, and that deployment is done in a controlled and auditable way. --------------------------------------------------------------------------- Scope Set up a pull request and have it reviewed.
--------------------------------------------------------------------------- Deliverables
--------------------------------------------------------------------------- Definition of Done
| Prod | Support Engineer |
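For the "proper configuration and secrets management" requirement, one option in an Azure setup is reading secrets from Key Vault at runtime via managed identity, as sketched below; the vault URL and secret name are placeholders, not the confirmed Prod configuration.

```python
import os
from azure.identity import DefaultAzureCredential
from azure.keyvault.secrets import SecretClient

# Placeholder vault URL; in Prod the Function/App authenticates with its managed
# identity, so no credentials need to live in code or config files.
vault_url = os.environ["KEY_VAULT_URL"]   # e.g. https://<vault-name>.vault.azure.net
client = SecretClient(vault_url=vault_url, credential=DefaultAzureCredential())

# Hypothetical secret name holding the [SOURCE_NAME] API token.
source_api_token = client.get_secret("source-api-token").value
```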
| Implement Security Policy | | Prod | Data Engineer |
| Ingest [SOURCE_NAME] Data to [INTERMEDIATE_LAYER_NAME] for Fabric Processing | Objective Extract data from [SOURCE_NAME] and store it in the [INTERMEDIATE_LAYER_NAME] (e.g. ADLS Gen2, Blob Storage, File Share). This intermediate layer will be the trusted source for downstream loading into Microsoft Fabric. Direct ingestion into Fabric is not feasible because of: [Network restriction / API limitation / Security constraint / Performance limitation / Architectural decision]. ________________________________________ Scope • Build or configure a process to extract data from [SOURCE_NAME]. • Write the extracted data to the defined intermediate layer ([INTERMEDIATE_LAYER_NAME]). • Ensure the data in the intermediate layer is complete, consistent, and ready for ingestion into Microsoft Fabric. ________________________________________ Deliverables • Data written to intermediate layer o Data from [SOURCE_NAME] is successfully stored in [INTERMEDIATE_LAYER_NAME]. • Folder / path structure created o Clear and consistent folder or path structure in the intermediate layer (e.g. /source_name/entity/date=YYYY-MM-DD/). • Incremental load logic implemented o Logic in place to load only new or changed data (e.g. based on timestamp, watermark, or change flag). • Metadata updated o Relevant metadata is captured and maintained (e.g. load time, source system, record counts, watermark value). • Monitoring configured o Monitoring and logging set up for: Job status (success/failure) Data volume checks Error handling ________________________________________ Definition of Done (DoD) • Successful test execution in [ENV] o End-to-end run in the target environment ([ENV], e.g. Dev/UAT/Prod) completes successfully. • No duplicate files on re-run o Re-running the job does not create duplicate files or duplicate data in the intermediate layer. • Watermark updated correctly o Watermark or equivalent mechanism is updated after each run and used correctly for incremental loads. • Logs visible in monitoring system o Execution logs (success, failures, metrics) are visible in the agreed monitoring/logging tool. • Alert tested successfully o At least one failure/alert scenario has been tested and notifications are received by the right team. • Documentation updated o Technical documentation (process flow, paths, incremental logic, watermark strategy, monitoring) is updated in the central repository (e.g. Confluence/SharePoint). • Sign-off received from [Team/Owner] o Formal sign-off from [Team/Owner] confirming the solution meets requirements and is ready for use by downstream consumers. | Dev | Data Engineer |
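A sketch of the write step using azure-storage-file-datalake, following the /source_name/entity/date=YYYY-MM-DD/ convention from this task; the account URL, container, and JSON Lines format are assumptions, and the batch timestamp is used in the file name so a re-run overwrites rather than duplicates.

```python
import json
import os
from datetime import datetime
from azure.identity import DefaultAzureCredential
from azure.storage.filedatalake import DataLakeServiceClient

ACCOUNT_URL = os.environ["ADLS_ACCOUNT_URL"]        # e.g. https://<storage_account_name>.dfs.core.windows.net
CONTAINER = os.environ.get("ADLS_CONTAINER", "raw") # placeholder container name

def write_to_intermediate_layer(records: list[dict], source: str, entity: str,
                                batch_ts: datetime) -> str:
    """Write one batch as JSON Lines under /source/entity/date=YYYY-MM-DD/.

    Using the batch timestamp (e.g. the watermark) rather than "now" keeps the
    file name deterministic, so a re-run overwrites the same file instead of
    creating duplicate files.
    """
    service = DataLakeServiceClient(account_url=ACCOUNT_URL,
                                    credential=DefaultAzureCredential())
    fs = service.get_file_system_client(CONTAINER)
    path = (f"{source}/{entity}/date={batch_ts:%Y-%m-%d}/"
            f"{source}_{entity}_{batch_ts:%Y%m%d%H%M%S}.jsonl")
    payload = "\n".join(json.dumps(r) for r in records).encode("utf-8")
    fs.get_file_client(path).upload_data(payload, overwrite=True)
    return path
```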
| Setup Intermediate Layer | Objective Define and set up the [INTERMEDIATE_LAYER_NAME] (e.g. ADLS Gen2, Blob Storage, File Share) for data coming from [SOURCE], so that data can be loaded into Microsoft Fabric in a controlled and reliable way. Direct ingestion into Fabric is not feasible due to: [Network restriction / API limitation / Security constraint / Performance limitation / Architectural decision]. ________________________________________ Scope of Work 1. Intermediate Layer Setup • Storage account o Validate that the storage account [storage_account_name] exists and is suitable for this use case, o Or create a new storage account [storage_account_name] if one does not exist. • Container o Create or validate the container [container_name] in the storage account to store data from [SOURCE]. • Folder structure o Define and document the folder/path structure for organizing data (for example): /[source]/[entity]/date=YYYY-MM-DD/ o Ensure the structure supports incremental loads and downstream consumption by Fabric. • File naming convention o Define and document a standard naming pattern: "[source]_[entity]_[YYYYMMDDHHMMSS].[format]" o Clarify how each part (source, entity, timestamp, format) will be populated. • Retention policy o Define and document retention rules for data stored in the intermediate layer: Raw retention: [X days] Archive retention: [X days] ________________________________________ Deliverables • Documented intermediate layer design (storage account, container, folder structure, naming convention, retention). • Storage account [storage_account_name] created/validated for use as the intermediate layer. • Container [container_name] created/validated in the storage account. • Agreed and documented folder structure and file naming convention. • Documented retention policy for raw and archived data. ________________________________________ Definition of Done • The storage account [storage_account_name] and container [container_name] are available and ready to use. • Folder structure and file naming convention are clearly documented and approved by the relevant team/owner. • Retention policy (raw and archive) is defined, documented, and agreed. • The intermediate layer design is stored in the central documentation location (e.g. Confluence/SharePoint) and referenced for future ingestion tasks. | Dev | Data Engineer |
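The agreed folder structure and naming convention can be captured as a small helper so every job builds paths the same way; the helper below mirrors the pattern documented in this task, and the source/entity values in the example are illustrative only.

```python
from datetime import datetime

def build_path(source: str, entity: str, ts: datetime, fmt: str = "parquet") -> str:
    """Build '[source]/[entity]/date=YYYY-MM-DD/[source]_[entity]_[YYYYMMDDHHMMSS].[format]'."""
    return (f"{source}/{entity}/date={ts:%Y-%m-%d}/"
            f"{source}_{entity}_{ts:%Y%m%d%H%M%S}.{fmt}")

# Example: build_path("crm", "customers", datetime(2024, 5, 1, 6, 0, 0))
# -> 'crm/customers/date=2024-05-01/crm_customers_20240501060000.parquet'
```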