## Overview

This guide walks you through setting up the Gemini AI Column Description Generator in a new Google Cloud Platform (GCP) project. This tool automatically generates descriptions for database columns using Google's Gemini AI model integrated with BigQuery ML.
## Step 1: Enable Required APIs

The following APIs need to be enabled:

- BigQuery API - For data processing and ML operations
- Vertex AI API - For Gemini AI model access
- Cloud Resource Manager API - For project management

Enable all three from the gcloud CLI:

```bash
gcloud services enable \
  bigquery.googleapis.com \
  aiplatform.googleapis.com \
  cloudresourcemanager.googleapis.com
```
## Step 2: Set Up IAM Roles and Permissions

### Required Roles for Script Execution

The service account or user running the script needs the following roles:

#### BigQuery Roles

- BigQuery Data Editor (`roles/bigquery.dataEditor`)
  - Create and modify datasets and tables
  - Execute queries and manage temporary tables
- BigQuery Job User (`roles/bigquery.jobUser`)
  - Run BigQuery jobs and queries
- BigQuery Connection User (`roles/bigquery.connectionUser`)
  - Use BigQuery connections for ML models

#### Vertex AI Roles

- Vertex AI User (`roles/aiplatform.user`)
  - Access Vertex AI services and models
- ML Developer (`roles/ml.developer`)
  - Use ML models in BigQuery ML
### Assign Roles via Console

- Go to IAM & Admin > IAM
- Find your user account or service account
- Click "Edit principal"
- Add each required role listed above
Assign Roles via gcloud CLI
# Replace [USER_EMAIL] with the actual email
export USER_EMAIL="your-email@domain.com"
export PROJECT_ID="your-project-id"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="user:$USER_EMAIL" \
--role="roles/bigquery.dataEditor"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="user:$USER_EMAIL" \
--role="roles/bigquery.jobUser"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="user:$USER_EMAIL" \
--role="roles/bigquery.connectionUser"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="user:$USER_EMAIL" \
--role="roles/aiplatform.user"
gcloud projects add-iam-policy-binding $PROJECT_ID \
--member="user:$USER_EMAIL" \
--role="roles/ml.developer"
## Step 3: Create BigQuery Datasets

### Create Source Dataset

Create a dataset to hold your source tables (if it does not already exist):

```sql
CREATE SCHEMA IF NOT EXISTS `your-project-id.DM`
OPTIONS (
  description = "Data Mart - Source tables for analysis",
  location = "US"  -- Choose an appropriate location
);
```
### Create Output Dataset

Create a dataset for storing the AI-generated descriptions:

```sql
CREATE SCHEMA IF NOT EXISTS `your-project-id.TEST_GEMINI`
OPTIONS (
  description = "Gemini AI generated column descriptions",
  location = "US"  -- Must match the source dataset location
);
```
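To confirm both datasets exist and share a location (they must, for later steps to work), you can query the regional `INFORMATION_SCHEMA.SCHEMATA` view. Note the `region-` prefix on the qualifier; adjust `region-us` if you chose a different location:

```sql
-- Check that both datasets exist and are in the same location
SELECT schema_name, location
FROM `your-project-id`.`region-us`.INFORMATION_SCHEMA.SCHEMATA
WHERE schema_name IN ('DM', 'TEST_GEMINI');
```

Both rows should report the same `location` value.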
## Step 4: Set Up Vertex AI Connection

### Create Vertex AI Connection in BigQuery (documentation)

Via the BigQuery Console:

- Go to BigQuery in the Google Cloud Console
- Click "+ Add data", then use the data source search bar to search for Vertex AI. Click the Vertex AI result, then click "Vertex AI Models: BigQuery Federation".
- Set Connection type to "Vertex AI remote models, remote functions and BigLake (Cloud Resource)" if not already selected, and set Connection ID to `vertex-ai-connection`.
### Get Connection Service Account

After creating the connection, note the service account email that is automatically created for it. You will need to grant it permissions. You can also retrieve it with `bq show --connection your-project-id.us-central1.vertex-ai-connection`.

```sql
-- Get connection details (note the `region-` prefix on the qualifier)
SELECT *
FROM `your-project-id`.`region-us-central1`.INFORMATION_SCHEMA.CONNECTIONS
WHERE connection_name = 'vertex-ai-connection';
```
### Grant Vertex AI Access to the Connection Service Account

The connection service account needs access to Vertex AI:

```bash
# Set this to the service account email from the connection details above
export CONNECTION_SA="service-account-email@gcp-sa-bigquery-condel.iam.gserviceaccount.com"

gcloud projects add-iam-policy-binding $PROJECT_ID \
  --member="serviceAccount:$CONNECTION_SA" \
  --role="roles/aiplatform.user"
```
## Step 5: Create Gemini AI Model in BigQuery

### Create Remote Model

The connection region must match your dataset location:

```sql
CREATE OR REPLACE MODEL `your-project-id.TEST_GEMINI.gemini_remote_model`
REMOTE WITH CONNECTION `your-project-id.us-central1.vertex-ai-connection`
OPTIONS (
  ENDPOINT = 'gemini-2.0-flash'
);
```
### Test Model Connection

```sql
-- Test the model with a simple query
SELECT ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `your-project-id.TEST_GEMINI.gemini_remote_model`,
  (SELECT "Test connection to Gemini AI" AS prompt),
  STRUCT(TRUE AS flatten_json_output)
);
```
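Once the smoke test works, you can try a prompt closer to what the script does. The column name, type, and sample values below are made-up illustrations; the real prompts are built inside the script itself:

```sql
-- Hypothetical example: ask Gemini to describe a single column
SELECT ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `your-project-id.TEST_GEMINI.gemini_remote_model`,
  (SELECT CONCAT(
     'Write a one-sentence description of a database column named ',
     '"customer_id" of type INT64 with sample values 101, 102, 103.'
   ) AS prompt),
  STRUCT(TRUE AS flatten_json_output)
);
```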
## Step 6: Configure and Deploy the Script

### Update Configuration Variables

In the main script, update these variables with your project details:

```sql
-- Update these variables in the script
DECLARE dataset_name STRING DEFAULT 'your-project-id.DM';
DECLARE output_dataset STRING DEFAULT 'TEST_GEMINI';
DECLARE model_name STRING DEFAULT 'TEST_GEMINI.gemini_remote_model';
DECLARE sample_size INT64 DEFAULT 5;
```
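Before running the full script, it can help to preview which columns it will process. Assuming the script walks the source dataset's metadata (an assumption about its internals), the candidate columns are listed by the dataset's `INFORMATION_SCHEMA.COLUMNS` view:

```sql
-- Preview the columns in the source dataset
SELECT table_name, column_name, data_type
FROM `your-project-id.DM`.INFORMATION_SCHEMA.COLUMNS
ORDER BY table_name, ordinal_position;
```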
## Step 7: Execute the Script

- Open the BigQuery Console
- Paste the complete script into the query editor: https://gitlab.syensqo.com/syensqo-connected-research/toolkit/toolkit/-/blob/master/SQL/gemini_column_description.sql?ref_type=heads
- Update the configuration variables at the top
- Run the script
## Step 8: Verify Installation

### Check Results

```sql
-- Verify the output table was created
SELECT
  table_name,
  COUNT(*) AS column_count
FROM `your-project-id.TEST_GEMINI.column_descriptions_with_ai`
GROUP BY table_name;

-- View sample results
SELECT * FROM `your-project-id.TEST_GEMINI.column_descriptions_with_ai` LIMIT 5;
```
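It can also be worth spot-checking for columns where generation failed or returned empty text. The `description` column name below is an assumption about the output table's schema; adjust it to match what the script actually produces:

```sql
-- Find columns with a missing or empty generated description
SELECT table_name, column_name
FROM `your-project-id.TEST_GEMINI.column_descriptions_with_ai`
WHERE description IS NULL OR TRIM(description) = '';
```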
## Troubleshooting

### Common Issues

#### "Permission denied" errors

- Verify all required APIs are enabled
- Check that IAM roles are correctly assigned
- Ensure the connection service account has Vertex AI access

#### "Model not found" errors

- Verify the Vertex AI connection was created successfully
- Check that the model name matches exactly
- Ensure the connection region matches your dataset location

#### "No tables found" errors

- Verify the source dataset exists and contains tables
- Check the dataset names in the configuration variables
- Ensure the service account has access to the source datasets

#### Quota exceeded errors

- Check Vertex AI quotas in the Cloud Console
- Consider reducing `sample_size` or processing fewer tables at once
## Documentation

For issues specific to: