Overview

This guide walks you through setting up the Gemini AI Column Description Generator in a new Google Cloud Platform (GCP) project. This tool automatically generates descriptions for database columns using Google's Gemini AI model integrated with BigQuery ML.

Step 1: Enable Required APIs

The following APIs need to be activated

  1. BigQuery API - For data processing and ML operations
  2. Vertex AI API - For Gemini AI model access gcloud services enable aiplatform.googleapis.com
  3. Cloud Resource Manager API - For project management

Step 2: Set Up IAM Roles and Permissions

Required Roles for Script Execution

The service account or user running the script needs the following roles:

BigQuery Roles

Vertex AI Roles

Assign Roles via Console

  1. Go to IAM & Admin > IAM
  2. Find your user account or service account
  3. Click "Edit principal"
  4. Add each required role listed above

Assign Roles via gcloud CLI

# Replace [USER_EMAIL] with the actual email
export USER_EMAIL="your-email@domain.com"
export PROJECT_ID="your-project-id"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="user:$USER_EMAIL" \
    --role="roles/bigquery.dataEditor"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="user:$USER_EMAIL" \
    --role="roles/bigquery.jobUser"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="user:$USER_EMAIL" \
    --role="roles/bigquery.connectionUser"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="user:$USER_EMAIL" \
    --role="roles/aiplatform.user"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="user:$USER_EMAIL" \
    --role="roles/ml.developer"

Step 3: Create BigQuery Datasets

Create Source Dataset

Create a dataset to hold your source tables (if not already existing):

CREATE SCHEMA IF NOT EXISTS `your-project-id.DM`
OPTIONS (
  description = "Data Mart - Source tables for analysis",
  location = "US"  -- Choose appropriate location
);

Create Output Dataset

Create a dataset for storing the AI-generated descriptions:

CREATE SCHEMA IF NOT EXISTS `your-project-id.TEST_GEMINI`
OPTIONS (
  description = "Gemini AI generated column descriptions",
  location = "US"  -- Must match source dataset location
);

Step 4: Set Up Vertex AI Connection

Create Vertex AI Connection in BigQuery (documentation)

  1. Via BigQuery Console:


Get Connection Service Account

After creating the connection, note the service account email that's automatically created. You'll need to grant it permissions.

-- Get connection details
SELECT * FROM `your-project-id.us-central1.INFORMATION_SCHEMA.CONNECTIONS`
WHERE connection_name = 'vertex-ai-connection';

Grant Vertex AI Access to Connection Service Account

The connection service account needs access to Vertex AI:

# Get the service account email from the connection details above
export CONNECTION_SA="service-account-email@gcp-sa-bigquery-condel.iam.gserviceaccount.com"

gcloud projects add-iam-policy-binding $PROJECT_ID \
    --member="serviceAccount:$CONNECTION_SA" \
    --role="roles/aiplatform.user"

Step 5: Create Gemini AI Model in BigQuery

Create Remote Model

CREATE OR REPLACE MODEL `your-project-id.YOUR_DATASET.gemini_remote_model`
REMOTE WITH CONNECTION `your-project-id.europe-west1.vertex-ai-connection`
OPTIONS (
  ENDPOINT = 'gemini-2.0-flash'
);

Test Model Connection

-- Test the model with a simple query
SELECT ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `your-project-id.TEST_GEMINI.gemini_remote_model`,
  (SELECT "Test connection to Gemini AI" AS prompt),
  STRUCT(TRUE AS flatten_json_output)
);

Step 6: Configure and Deploy the Script

Update Configuration Variables

In the main script, update these variables with your project details:

-- Update these variables in the script
DECLARE dataset_name STRING DEFAULT 'your-project-id.DM';
DECLARE output_dataset STRING DEFAULT 'TEST_GEMINI';
DECLARE model_name STRING DEFAULT 'TEST_GEMINI.gemini_remote_model';
DECLARE sample_size INT64 DEFAULT 5;

Step 7: Execute the Script

  1. Open BigQuery Console
  2. Navigate to your project
  3. Paste the complete script in the query editor: https://gitlab.syensqo.com/syensqo-connected-research/toolkit/toolkit/-/blob/master/SQL/gemini_column_description.sql?ref_type=heads
  4. Update the configuration variables at the top
  5. Run the script

Step 8: Verify Installation

Check Results

-- Verify the output table was created
SELECT 
  table_name,
  COUNT(*) as column_count
FROM `your-project-id.TEST_GEMINI.column_descriptions_with_ai`
GROUP BY table_name;

-- View sample results
SELECT * FROM `your-project-id.TEST_GEMINI.column_descriptions_with_ai` LIMIT 5;

Troubleshooting

Common Issues

  1. "Permission denied" errors

  2. "Model not found" errors

  3. "No tables found" errors

  4. Quota exceeded errors

Monitoring and Costs

Security Best Practices

  1. Use least privilege: Grant only the minimum required permissions
  2. Service accounts: Consider using dedicated service accounts for production
  3. Data access: Implement row-level security for sensitive datasets
  4. Audit logging: Enable Cloud Audit Logs for compliance
  5. Network security: Use VPC Service Controls for additional isolation

Next Steps

Once installed, you can:

Support

For issues specific to: