...

  • Vertex AI User (roles/aiplatform.user)
    • Access Vertex AI services and models
  • ML Developer (roles/ml.developer)
    • Use ML models in BigQuery ML

Additional Roles

  • Service Usage Consumer (roles/serviceusage.serviceUsageConsumer)
    • Use enabled Google Cloud services

Assign Roles via Console

  1. Go to IAM & Admin > IAM
  2. Find your user account or service account
  3. Click "Edit principal"
  4. Add each required role listed above
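The console steps above can also be scripted with the gcloud CLI. This is a sketch, assuming the Cloud SDK is installed and authenticated; the member shown is a placeholder to replace with your own user or service account:

```shell
# Grant the required roles from the command line (placeholder values below).
PROJECT_ID="your-project-id"
MEMBER="user:you@example.com"   # or serviceAccount:name@project.iam.gserviceaccount.com

for ROLE in roles/aiplatform.user roles/ml.developer roles/serviceusage.serviceUsageConsumer; do
  gcloud projects add-iam-policy-binding "$PROJECT_ID" \
    --member="$MEMBER" \
    --role="$ROLE"
done
```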

...

CREATE OR REPLACE MODEL `your-project-id.TEST_GEMINI.gemini_remote_model`
REMOTE WITH CONNECTION `your-project-id.us-central1.vertex-ai-connection`  -- use the region where your connection was created
OPTIONS (
  ENDPOINT = 'gemini-1.5-flash'  -- or 'gemini-1.5-pro' for the more advanced model
);

Test Model Connection

-- Test the model with a simple query
SELECT ml_generate_text_llm_result
FROM ML.GENERATE_TEXT(
  MODEL `your-project-id.TEST_GEMINI.gemini_remote_model`,
  (SELECT "Test connection to Gemini AI" AS prompt),
  STRUCT(TRUE AS flatten_json_output)
);
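The same test query can also be run from the command line, assuming the bq tool from the Google Cloud SDK is installed and authenticated against your project:

```shell
# Run the connection test query non-interactively with the bq CLI.
bq query --use_legacy_sql=false \
'SELECT ml_generate_text_llm_result
 FROM ML.GENERATE_TEXT(
   MODEL `your-project-id.TEST_GEMINI.gemini_remote_model`,
   (SELECT "Test connection to Gemini AI" AS prompt),
   STRUCT(TRUE AS flatten_json_output))'
```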

...

-- Update these variables in the script
DECLARE dataset_name STRING DEFAULT 'your-project-id.DM';
DECLARE output_dataset STRING DEFAULT 'TEST_GEMINI';
DECLARE model_name STRING DEFAULT 'TEST_GEMINI.gemini_remote_model';
DECLARE sample_size INT64 DEFAULT 5;

Upload Sample Data (Optional)

If you don't have existing tables, create a sample table for testing:

-- Create a sample table for testing
CREATE OR REPLACE TABLE `your-project-id.DM.sample_customers` AS
SELECT 
  ROW_NUMBER() OVER() as customer_id,
  CONCAT("Customer_", CAST(ROW_NUMBER() OVER() AS STRING)) as customer_name,
  ["New York", "Los Angeles", "Chicago", "Houston", "Phoenix"][SAFE_ORDINAL(MOD(ROW_NUMBER() OVER(), 5) + 1)] as city,
  RAND() * 10000 as purchase_amount,
  DATE_ADD("2024-01-01", INTERVAL CAST(RAND() * 365 AS INT64) DAY) as purchase_date
FROM UNNEST(GENERATE_ARRAY(1, 100)) as n;

Step 7: Execute the Script

  1. Open the BigQuery console and navigate to your project
  2. Paste the complete script into the query editor: https://gitlab.syensqo.com/syensqo-connected-research/toolkit/toolkit/-/blob/master/SQL/gemini_column_description.sql?ref_type=heads
  3. Update the configuration variables at the top
  4. Run the script
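As an alternative to pasting into the console, the script can be run non-interactively with bq, assuming it has been downloaded locally from the GitLab path above and the configuration variables have been updated:

```shell
# Run the downloaded script; bq reads the query from stdin when none is given.
bq query --use_legacy_sql=false < gemini_column_description.sql
```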

...

  1. "Permission denied" errors

    • Verify all required APIs are enabled
    • Check IAM roles are correctly assigned
    • Ensure connection service account has Vertex AI access
  2. "Model not found" errors

    • Verify the Vertex AI connection was created successfully
    • Check the model name matches exactly
    • Ensure the connection region matches your dataset location
  3. "No tables found" errors

    • Verify source dataset exists and contains tables
    • Check dataset names in the configuration variables
    • Ensure the service account has access to source datasets
  4. Quota exceeded errors

    • Check Vertex AI quotas in the Cloud Console
    • Consider reducing sample_size or processing fewer tables at once
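When quota errors are transient, wrapping the call in a retry with exponential backoff often resolves them. This is a generic sketch, not part of the script; the helper name and delays are assumptions:

```shell
# Generic retry helper for transient "quota exceeded" failures:
# re-runs a command, doubling the wait between attempts.
retry_with_backoff() {
  local max_retries=5
  local delay="${BASE_DELAY:-1}"   # seconds between attempts; doubles each retry
  local attempt
  for attempt in $(seq 1 "$max_retries"); do
    if "$@"; then
      return 0                     # command succeeded
    fi
    if [ "$attempt" -eq "$max_retries" ]; then
      return 1                     # out of retries; surface the failure
    fi
    sleep "$delay"
    delay=$((delay * 2))
  done
}
```

Invoke it as `retry_with_backoff <command> [args...]`, e.g. around a bq invocation that intermittently hits quota limits.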

Monitoring and Costs

  • Monitor usage: Go to Cloud Console > Vertex AI > Quotas to track API usage
  • Cost estimation: Gemini AI pricing varies by model and token usage
  • BigQuery costs: Query processing and storage costs apply

Security Best Practices

  1. Use least privilege: Grant only the minimum required permissions
  2. Service accounts: Consider using dedicated service accounts for production
  3. Data access: Implement row-level security for sensitive datasets
  4. Audit logging: Enable Cloud Audit Logs for compliance
  5. Network security: Use VPC Service Controls for additional isolation

Next Steps

Once installed, you can:

  • Schedule the script to run periodically using Cloud Scheduler
  • Integrate with data catalog tools for metadata management
  • Customize prompts for domain-specific descriptions
  • Extend to other Vertex AI models for different use cases

Support

Documentation

For issues specific to:

...