This module provides backward compatibility for legacy tests and systems, while internally leveraging a new modular architecture for data processing. It acts as a bridge, maintaining the old API while using improved, maintainable, and testable components.
Key Features:
- Internally uses the modular packages src.config, src.utils, src.models, src.services, and src.processors.

The module reads configuration from environment variables for backward compatibility:
| Variable Name | Description | Default Value |
|---|---|---|
| GCP_PROJECT_ID | Google Cloud Project ID | 'your-project-id' |
| GCS_BUCKET_NAME | GCS bucket containing Excel files | 'your-bucket-name' |
| BQ_DATASET_ID | BigQuery dataset ID | 'your-dataset-id' |
| BQ_TABLE_ID | BigQuery table ID | 'almina_data' |
| CUSTOMER_ID | Customer identifier | 'almina_1' |
| GOOGLE_APPLICATION_CREDENTIALS | Path to GCP credentials file | None |
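
As a rough illustration of how the variables in the table could be read with their documented defaults, here is a minimal sketch. The names `LegacyConfig` and `load_config` are hypothetical and are not taken from the module; only the variable names and default values come from the table.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LegacyConfig:
    """Hypothetical container for the settings listed in the table."""
    project_id: str
    bucket_name: str
    dataset_id: str
    table_id: str
    customer_id: str
    credentials_path: Optional[str]


def load_config(env: Mapping[str, str] = os.environ) -> LegacyConfig:
    """Read each variable, falling back to the documented default."""
    return LegacyConfig(
        project_id=env.get("GCP_PROJECT_ID", "your-project-id"),
        bucket_name=env.get("GCS_BUCKET_NAME", "your-bucket-name"),
        dataset_id=env.get("BQ_DATASET_ID", "your-dataset-id"),
        table_id=env.get("BQ_TABLE_ID", "almina_data"),
        customer_id=env.get("CUSTOMER_ID", "almina_1"),
        # No default: None means no explicit credentials file was configured.
        credentials_path=env.get("GOOGLE_APPLICATION_CREDENTIALS"),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the function easy to exercise in tests without touching the real environment.
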
Configures structured logging for both local and GCP environments.
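
The module's actual logging setup is not shown here, so the following is only a sketch of one common approach: emitting one JSON object per log line, which reads fine locally and which Cloud Logging can typically parse into structured entries. The names `JsonFormatter`, `setup_logging`, and the logger name `legacy_bridge` are assumptions made for illustration.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each record as a single JSON line (hypothetical helper)."""

    def format(self, record):
        payload = {
            "severity": record.levelname,
            "message": record.getMessage(),
            "logger": record.name,
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def setup_logging(level=logging.INFO, stream=None):
    """Attach a JSON handler to a dedicated logger and return it."""
    target = stream if stream is not None else sys.stdout
    handler = logging.StreamHandler(target)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("legacy_bridge")  # assumed logger name
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls
    logger.addHandler(handler)
    logger.setLevel(level)
    logger.propagate = False
    return logger
```
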
Decorator for retrying functions with exponential backoff on failure.
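
A retry decorator with exponential backoff can be sketched as follows. The name `retry_with_backoff` and its parameters (`max_attempts`, `base_delay`, `factor`, `exceptions`, `sleep`) are illustrative assumptions; the module's own decorator may use a different signature.

```python
import functools
import time


def retry_with_backoff(max_attempts=3, base_delay=1.0, factor=2.0,
                       exceptions=(Exception,), sleep=time.sleep):
    """Retry the wrapped function, multiplying the delay after each failure."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            delay = base_delay
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except exceptions:
                    if attempt == max_attempts:
                        raise  # out of attempts: surface the last error
                    sleep(delay)
                    delay *= factor
        return wrapper
    return decorator
```

With the defaults, a function that keeps failing is called three times, waiting 1 second and then 2 seconds between attempts, before the final exception propagates. The `sleep` parameter exists only so tests can substitute a fake and avoid real waiting.
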
Converts column headers to snake_case and ensures BigQuery compatibility.
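
The exact rules the module applies are not documented here. As a sketch, a header normalizer might lowercase the text, replace runs of non-alphanumeric characters with underscores, and make sure the result starts with a letter or underscore, as BigQuery column names require. The function name `to_snake_case` and the `column` fallback are assumptions.

```python
import re


def to_snake_case(header: str) -> str:
    """Convert a spreadsheet header into a BigQuery-friendly snake_case name."""
    text = str(header).strip()
    # Insert an underscore at camelCase boundaries: "orderDate" -> "order_Date".
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", text)
    # Collapse every run of non-alphanumeric characters into one underscore.
    text = re.sub(r"[^0-9a-zA-Z]+", "_", text)
    text = text.strip("_").lower()
    if not text:
        text = "column"  # assumed fallback for headers with no usable characters
    if text[0].isdigit():
        text = "_" + text  # names must start with a letter or underscore
    return text[:300]  # BigQuery limits column names to 300 characters
```

For example, `to_snake_case("Order Date")` gives `order_date` and `to_snake_case("2024 Sales")` gives `_2024_sales`. This sketch does not deduplicate headers that normalize to the same name.
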
Returns the explicit BigQuery schema as a list of SchemaField objects.
Maps incoming DataFrame columns to canonical schema names, tolerating minor variations.
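
One simple way to tolerate minor variations is to compare names after stripping case, spaces, and punctuation. The sketch below works on a plain list of column names; `map_columns`, the `aliases` parameter, and the sample column names in the usage note are hypothetical and do not reflect the module's real schema.

```python
import re


def _key(name):
    """Reduce a name to lowercase letters and digits for loose comparison."""
    return re.sub(r"[^0-9a-z]+", "", str(name).lower())


def map_columns(columns, canonical, aliases=None):
    """Return (mapping, unmatched) for incoming column names.

    mapping maps each recognized incoming name to its canonical name;
    unmatched lists incoming names with no canonical counterpart.
    """
    lookup = {_key(c): c for c in canonical}
    for alias, target in (aliases or {}).items():
        lookup[_key(alias)] = target
    mapping, unmatched = {}, []
    for col in columns:
        target = lookup.get(_key(col))
        if target is None:
            unmatched.append(col)
        else:
            mapping[col] = target
    return mapping, unmatched
```

With canonical names `["sample_id", "measured_at"]`, the incoming headers `"Sample ID"` and `"measured-at"` both resolve to their canonical forms, while an unrelated header such as `"Notes"` is reported as unmatched. The resulting mapping can then be applied with pandas via `df.rename(columns=mapping)`.
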
Validates the DataFrame for structure, required columns, data types, and value ranges. Returns a validation report.
Processes an Excel file:
Authenticates with GCP using service account credentials and returns storage and BigQuery clients.
Ensures the BigQuery table exists with the correct schema, partitioning, and clustering.
Uploads the processed DataFrame to BigQuery, ensuring schema compliance.
Processes all .xlsx files in the configured GCS bucket and uploads them to BigQuery.
Entry point for GCP Cloud Function, triggered by file uploads to the bucket.
Main entry point for batch processing. Validates environment, processes all files, and handles errors.
Use process_xlsx_cloud_function for real-time processing on file upload.

Deploy almina-extract-data as a GCP Cloud Function with appropriate triggers and permissions.
This documentation is AI-generated and should be reviewed for accuracy and completeness. Always validate the module’s behavior in your environment and consult your team for best practices.