...
- Vertex AI User (`roles/aiplatform.user`) - Access Vertex AI services and models
- ML Developer (`roles/ml.developer`) - Use ML models in BigQuery ML
Additional Roles
- Service Usage Consumer (`roles/serviceusage.serviceUsageConsumer`) - Use enabled Google Cloud services
Assign Roles via Console
- Go to IAM & Admin > IAM
- Find your user account or service account
- Click "Edit principal"
- Add each required role listed above (for dataset-level access, a SQL alternative is shown below)
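The console steps above cover the project-level roles, which cannot be granted in SQL. For dataset-level access, such as letting the connection's service account read the source data, BigQuery also offers a DCL GRANT statement. The service-account address below is a placeholder; use the one shown on your connection's details page.

-- Grant dataset-level read access to the connection's service account (placeholder address)
GRANT `roles/bigquery.dataViewer`
ON SCHEMA `your-project-id.DM`
TO "serviceAccount:your-connection-sa@your-project-id.iam.gserviceaccount.com";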
...
CREATE OR REPLACE MODEL `your-project-id.TEST_GEMINI.gemini_remote_model`
REMOTE WITH CONNECTION `your-project-id.us-central1.vertex-ai-connection`  -- region must match your dataset location (e.g. us-central1 or europe-west1)
OPTIONS (
  ENDPOINT = 'gemini-1.5-flash'  -- or 'gemini-1.5-pro' for a more advanced model
);
Test Model Connection
-- Test the model with a simple query
SELECT ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
MODEL `your-project-id.TEST_GEMINI.gemini_remote_model`,
(SELECT "Test connection to Gemini AI" AS prompt),
STRUCT(TRUE AS flatten_json_output)
);
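The same call accepts optional generation parameters in the STRUCT if you want more control before running the full script; the values below are illustrative defaults, not requirements.

-- Optional: tune generation parameters (illustrative values)
SELECT ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `your-project-id.TEST_GEMINI.gemini_remote_model`,
  (SELECT "Test connection to Gemini AI" AS prompt),
  STRUCT(
    TRUE AS flatten_json_output,
    0.2 AS temperature,       -- lower values give more deterministic output
    256 AS max_output_tokens  -- caps the response length
  )
);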
...
-- Update these variables in the script
DECLARE dataset_name STRING DEFAULT 'your-project-id.DM';
DECLARE output_dataset STRING DEFAULT 'TEST_GEMINI';
DECLARE model_name STRING DEFAULT 'TEST_GEMINI.gemini_remote_model';
DECLARE sample_size INT64 DEFAULT 5;
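These are plain string and integer variables, so the script has to assemble its queries dynamically at run time. The snippet below is a hypothetical sketch of that pattern using BigQuery's EXECUTE IMMEDIATE, not an excerpt from the script itself:

-- Hypothetical sketch: building a query from the configuration variables
DECLARE dataset_name STRING DEFAULT 'your-project-id.DM';
DECLARE sample_size INT64 DEFAULT 5;
EXECUTE IMMEDIATE FORMAT("""
  SELECT * FROM `%s.sample_customers` LIMIT %d
""", dataset_name, sample_size);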
Upload Sample Data (Optional)
If you don't have existing tables, create some sample tables for testing:
-- Create a sample table for testing
CREATE OR REPLACE TABLE `your-project-id.DM.sample_customers` AS
SELECT
  n AS customer_id,
  CONCAT("Customer_", CAST(n AS STRING)) AS customer_name,
  ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix"][SAFE_ORDINAL(MOD(n, 5) + 1)] AS city,
  RAND() * 10000 AS purchase_amount,
  DATE_ADD(DATE '2024-01-01', INTERVAL CAST(RAND() * 365 AS INT64) DAY) AS purchase_date
FROM UNNEST(GENERATE_ARRAY(1, 100)) AS n;
Step 7: Execute the Script
- Open the BigQuery console
- Navigate to your project
- Paste the complete script (https://gitlab.syensqo.com/syensqo-connected-research/toolkit/toolkit/-/blob/master/SQL/gemini_column_description.sql?ref_type=heads) into the query editor
- Update the configuration variables at the top
- Run the script (a sketch of the core pattern it applies is shown below)
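For orientation, the core pattern a column-description script like this applies is: list the columns from INFORMATION_SCHEMA.COLUMNS, build one prompt per column, and send them to ML.GENERATE_TEXT. The sketch below is a simplified illustration under those assumptions, not the contents of gemini_column_description.sql:

-- Simplified illustration of the column-description pattern (not the actual script)
SELECT
  table_name,
  column_name,
  ml_generate_text_llm_result AS suggested_description
FROM ML.GENERATE_TEXT(
  MODEL `your-project-id.TEST_GEMINI.gemini_remote_model`,
  (
    SELECT
      table_name,
      column_name,
      CONCAT("Write a one-sentence description of the column ", column_name,
             " (type ", data_type, ") in table ", table_name, ".") AS prompt
    FROM `your-project-id.DM`.INFORMATION_SCHEMA.COLUMNS
  ),
  STRUCT(TRUE AS flatten_json_output)
);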
...
"Permission denied" errors
- Verify all required APIs are enabled
- Check IAM roles are correctly assigned
- Ensure connection service account has Vertex AI access
"Model not found" errors
- Verify the Vertex AI connection was created successfully
- Check the model name matches exactly
- Ensure the connection region matches your dataset location
"No tables found" errors
- Verify source dataset exists and contains tables
- Check dataset names in the configuration variables
- Ensure the service account has access to source datasets
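A quick way to confirm the first two points is to list the tables BigQuery can see in the source dataset (the dataset name here matches the configuration above):

-- List the tables visible in the source dataset
SELECT table_name, table_type
FROM `your-project-id.DM`.INFORMATION_SCHEMA.TABLES
ORDER BY table_name;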
Quota exceeded errors
- Check Vertex AI quotas in the Cloud Console
- Consider reducing sample_size or processing fewer tables at once
Monitoring and Costs
- Monitor usage: Go to Cloud Console > Vertex AI > Quotas to track API usage
- Cost estimation: Gemini AI pricing varies by model and token usage
- BigQuery costs: Query processing and storage costs apply
Security Best Practices
- Use least privilege: Grant only the minimum required permissions
- Service accounts: Consider using dedicated service accounts for production
- Data access: Implement row-level security for sensitive datasets (example after this list)
- Audit logging: Enable Cloud Audit Logs for compliance
- Network security: Use VPC Service Controls for additional isolation
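For the row-level security item above, BigQuery supports row access policies directly in SQL; the policy name, table, grantee, and filter below are placeholders to adapt:

-- Placeholder example: only let one group see selected cities
CREATE ROW ACCESS POLICY analyst_cities
ON `your-project-id.DM.sample_customers`
GRANT TO ("group:data-analysts@example.com")
FILTER USING (city IN ("New York", "Chicago"));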
Next Steps
Once installed, you can:
- Schedule the script to run periodically using Cloud Scheduler
- Integrate with data catalog tools for metadata management
- Customize prompts for domain-specific descriptions (see the sketch below)
- Extend to other Vertex AI models for different use cases
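For prompt customization, the only change needed is the prompt string handed to ML.GENERATE_TEXT; the domain wording below is just an example:

-- Example: domain-specific prompt wording (adjust to your own vocabulary)
SELECT ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `your-project-id.TEST_GEMINI.gemini_remote_model`,
  (SELECT CONCAT(
     "You are documenting a sales data warehouse for business users. ",
     "Describe the column purchase_amount in one short sentence."
   ) AS prompt),
  STRUCT(TRUE AS flatten_json_output)
);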
Support
Documentation
For issues specific to:
- BigQuery: BigQuery Documentation (https://cloud.google.com/bigquery/docs)
- Vertex AI: Vertex AI Documentation (https://cloud.google.com/vertex-ai/docs)
- IAM: IAM Documentation (https://cloud.google.com/iam/docs)
...