AI Providers

The pg_ai_query extension supports multiple AI providers, each with different models, capabilities, and pricing structures. This guide helps you choose the right provider and model for your use case.

Supported Providers

OpenAI

Overview: OpenAI provides the GPT family of models, known for strong natural language understanding and code generation capabilities.

API Endpoint: https://api.openai.com/v1/

Available Models: The extension supports any valid OpenAI model name. The following are common examples:

| Model | Description | Context Length | Best For |
|---|---|---|---|
| gpt-4o | Latest GPT-4 Omni model | 128,000 tokens | Complex queries, best accuracy |
| gpt-4 | Standard GPT-4 model | 8,192 tokens | High-quality SQL generation |
| gpt-3.5-turbo | Fast and efficient model | 4,096 tokens | Simple queries, fast responses |

Anthropic

Overview: Anthropic provides Claude models, known for thoughtful reasoning and safety-conscious outputs.

API Endpoint: https://api.anthropic.com/v1/

Available Models: The extension supports any valid Anthropic model name. The following are common examples:

| Model | Description | Context Length | Best For |
|---|---|---|---|
| claude-sonnet-4-5-20250929 | Latest Claude Sonnet 4.5 model | 200,000 tokens | Complex analysis, detailed reasoning |

Google Gemini

Overview: Google provides the Gemini family of models, known for multimodal capabilities and strong reasoning.

Available Models: The extension supports any valid Gemini model name. The following are common examples:

| Model | Description | Context Length | Best For |
|---|---|---|---|
| gemini-2.5-pro | Latest Gemini Pro model | 1,000,000 tokens | Complex queries, best accuracy |
| gemini-2.5-flash | Fast and efficient model | 1,000,000 tokens | Quick responses, cost-effective |
| gemini-2.0-flash | Previous-generation flash model | 1,000,000 tokens | Simple queries, fast responses |

Provider Selection

Automatic Provider Selection

When you use provider = 'auto' (the default), the extension selects a provider based on:

  1. Configuration Priority: First configured provider with a valid API key
  2. Fallback Logic: If primary provider fails, tries others
  3. Default Order: OpenAI → Anthropic → Gemini

-- Uses automatic provider selection
SELECT generate_query('show recent orders');

Explicit Provider Selection

You can specify which provider to use:

-- Force OpenAI
SELECT generate_query('show users', null, 'openai');

-- Force Anthropic
SELECT generate_query('show users', null, 'anthropic');

-- Force Gemini
SELECT generate_query('show users', null, 'gemini');

Configuration-Based Selection

Set your preferred provider in the configuration file:

[general]
# Provider priority is determined by order in config file

[openai]
api_key = "your-openai-key"
default_model = "gpt-4o"

[anthropic]
api_key = "your-anthropic-key"
default_model = "claude-sonnet-4-5-20250929"

[gemini]
api_key = "your-google-key"
default_model = "gemini-2.5-flash"
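
With this file, provider = 'auto' resolves to OpenAI first, since [openai] is the first configured block with a valid API key:

-- Resolves to OpenAI; falls back to Anthropic, then Gemini, on failure
SELECT generate_query('show recent orders');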

Provider Comparison

Performance Characteristics

| Aspect | OpenAI GPT-4o | OpenAI GPT-3.5 | Anthropic Claude | Google Gemini |
|---|---|---|---|---|
| Response Time | 2-5 seconds | 1-2 seconds | 3-6 seconds | 1-3 seconds |
| Accuracy | Excellent | Good | Excellent | Excellent |
| Complex Queries | Excellent | Fair | Excellent | Very Good |
| SQL Knowledge | Excellent | Good | Very Good | Very Good |
| Cost | High | Low | Medium | Low-Medium |

Use Case Recommendations

For Simple Queries

Recommended: OpenAI GPT-3.5-turbo

-- Simple data retrieval
SELECT generate_query('show all users', null, 'openai');
-- Configuration: default_model = "gpt-3.5-turbo"

Why: Fast, cost-effective, sufficient accuracy for basic queries.

For Complex Analytics

Recommended: OpenAI GPT-4o or Anthropic Claude

-- Complex business intelligence
SELECT generate_query(
    'calculate monthly customer retention rates with cohort analysis',
    null,
    'openai'
);
-- Configuration: default_model = "gpt-4o"

Why: Better reasoning for complex multi-table joins and advanced analytics.

For Data Exploration

Recommended: Anthropic Claude

-- Exploratory analysis
SELECT generate_query(
    'analyze sales trends and identify anomalies in the last quarter',
    null,
    'anthropic'
);

Why: Excellent at understanding context and generating insightful queries.

For Production Systems

Recommended: OpenAI GPT-4 (balanced performance/cost)

-- Production queries
SELECT generate_query('generate daily sales report', null, 'openai');
-- Configuration: default_model = "gpt-4"

Why: Good balance of accuracy, speed, and cost for production workloads.

For Cost-Effective Operations

Recommended: Google Gemini Flash

-- Cost-effective queries with good quality
SELECT generate_query('show recent user activity', null, 'gemini');
-- Configuration: default_model = "gemini-2.5-flash"

Why: Excellent cost-to-performance ratio, fast responses, large context window.

Model-Specific Configurations

OpenAI Configuration

[openai]
api_key = "sk-proj-your-api-key-here"
default_model = "gpt-4o"

# Optional: Model-specific settings (future feature)
# gpt_4o_temperature = 0.7
# gpt_4o_max_tokens = 16384

API Key Format: Must start with sk-proj- or sk-

Getting Started:

  1. Visit platform.openai.com
  2. Create an account and add billing information
  3. Generate an API key
  4. Add to configuration file

Anthropic Configuration

[anthropic]
api_key = "sk-ant-your-api-key-here"
default_model = "claude-sonnet-4-5-20250929"

# Optional: Model-specific settings (future feature)
# claude_temperature = 0.7
# claude_max_tokens = 8192

API Key Format: Must start with sk-ant-

Getting Started:

  1. Visit console.anthropic.com
  2. Create an account and add credits
  3. Generate an API key
  4. Add to configuration file

Google Gemini Configuration

[gemini]
api_key = "AIzaSy-your-api-key-here"
default_model = "gemini-2.5-flash"

# Optional: Model-specific settings (future feature)
# gemini_temperature = 0.7
# gemini_max_tokens = 8192

API Key Format: Must start with AIzaSy

Getting Started:

  1. Visit aistudio.google.com
  2. Create or sign in to your Google account
  3. Generate an API key
  4. Add to configuration file

Cost Considerations

OpenAI Pricing (Approximate)

| Model | Input Cost | Output Cost | Typical Query Cost |
|---|---|---|---|
| GPT-4o | $2.50 / 1M tokens | $10.00 / 1M tokens | $0.02-0.05 |
| GPT-4 | $30.00 / 1M tokens | $60.00 / 1M tokens | $0.10-0.20 |
| GPT-3.5-turbo | $0.50 / 1M tokens | $1.50 / 1M tokens | $0.001-0.005 |

Anthropic Pricing (Approximate)

| Model | Input Cost | Output Cost | Typical Query Cost |
|---|---|---|---|
| Claude Sonnet 4.5 | $3.00 / 1M tokens | $15.00 / 1M tokens | $0.03-0.06 |

Google Gemini Pricing (Approximate)

| Model | Input Cost | Output Cost | Typical Query Cost |
|---|---|---|---|
| Gemini 2.5 Pro | $1.25 / 1M tokens | $5.00 / 1M tokens | $0.01-0.03 |
| Gemini 2.5 Flash | $0.075 / 1M tokens | $0.30 / 1M tokens | $0.001-0.005 |
| Gemini 2.0 Flash | $0.10 / 1M tokens | $0.40 / 1M tokens | $0.001-0.005 |

Prices are approximate and change frequently. Check provider websites for current pricing.
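
For a rough back-of-the-envelope estimate, multiply token counts by the rates above. The token counts below are illustrative, not measured:

-- Hypothetical gpt-4o call: ~2,000 input tokens (prompt + schema), ~500 output tokens
SELECT round(((2000 / 1e6) * 2.50 + (500 / 1e6) * 10.00)::numeric, 4) AS approx_cost_usd;
-- ≈ 0.0100 USD; larger schemas push this toward the $0.02-0.05 range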

Cost Optimization Strategies

  1. Use Appropriate Models

    -- For simple queries, use cheaper models
    SELECT generate_query('count users', null, 'openai');
    -- Set: default_model = "gpt-3.5-turbo"
    
  2. Efficient Query Descriptions

    -- Be specific to reduce back-and-forth
    SELECT generate_query('show users with orders in last 30 days including total order value');
    
  3. Batch Similar Queries

    -- Instead of multiple calls:
    -- SELECT generate_query('show users');
    -- SELECT generate_query('show user count');
    
    -- Use one comprehensive query:
    SELECT generate_query('show user list with total count');
    

Provider Reliability and Error Handling

Automatic Fallback

The extension includes automatic fallback logic:

-- If OpenAI fails, automatically tries Anthropic
SELECT generate_query('show data', null, 'auto');

Manual Retry with Different Provider

-- If one provider fails, try another
SELECT generate_query('complex query', null, 'openai');
-- If that fails:
SELECT generate_query('complex query', null, 'anthropic');

Error Handling Configuration

[general]
max_retries = 3           # Retry failed requests
request_timeout_ms = 30000 # Timeout for each request

Advanced Provider Features

Rate Limiting

Each provider has different rate limits:

OpenAI:

  • GPT-3.5: 3,500 requests/minute
  • GPT-4: 500 requests/minute (varies by tier)

Anthropic:

  • Claude: 1,000 requests/minute (varies by tier)

Google Gemini:

  • Gemini: 1,500 requests/minute (varies by tier)

Context Window Optimization

For large schemas, consider context window sizes:

-- Large schema? Use models with larger context windows
-- GPT-4o: 128k tokens
-- Claude Sonnet 4.5: 200k tokens
-- Gemini 2.5: 1M tokens
SELECT generate_query('analyze relationships across all tables', null, 'gemini');

Regional Considerations

  • OpenAI: Global availability with regional data centers
  • Anthropic: Primarily US-based, with expanding regional support
  • Google Gemini: Global availability through Google Cloud infrastructure

Provider-Specific Best Practices

OpenAI Best Practices

  1. Model Selection:

    • Use GPT-3.5-turbo for development and testing
    • Use GPT-4 for production workloads
    • Use GPT-4o for complex analytical queries
  2. Prompt Engineering:

    • Be specific about desired output format
    • Include relevant business context
    • Specify PostgreSQL version if using advanced features (see the example below)
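
For example, a query description that pins down the output columns and target PostgreSQL version (the column names here are hypothetical):

SELECT generate_query(
    'list the top 10 customers by total order value in the last 90 days; '
    || 'return customer_id, customer_name, and total_value; '
    || 'target PostgreSQL 16',
    null,
    'openai'
);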

Anthropic Best Practices

  1. Leveraging Reasoning:

    • Claude excels at understanding business logic
    • Good for queries requiring domain expertise
    • Effective for complex multi-step analysis (see the sketch below)
  2. Safety Features:

    • Claude has built-in safety considerations
    • Good for production environments with compliance needs
    • Transparent about limitations and assumptions
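
A sketch of the kind of multi-step business question Claude handles well (the 10 percent threshold is illustrative):

SELECT generate_query(
    'compare revenue per product category between this quarter and the previous quarter, '
    || 'flagging categories that declined by more than 10 percent',
    null,
    'anthropic'
);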

Google Gemini Best Practices

  1. Leveraging Large Context:

    • Gemini excels with large context windows (up to 1M tokens)
    • Ideal for complex schemas with many tables
    • Good for batch processing multiple queries
  2. Cost Efficiency:

    • Gemini Flash models offer excellent price-performance
    • Great for high-volume, cost-sensitive applications
    • Consider Gemini for development and testing environments (see the example below)
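
For instance, during development you might iterate on query descriptions against a flash model before switching production to a larger one (assumes the [gemini] configuration shown earlier):

-- Cheap, fast iteration while refining the description
SELECT generate_query('summarize row counts for every table in the public schema', null, 'gemini');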

Custom API Endpoints

The extension supports custom API endpoints, allowing you to use OpenAI-compatible or Anthropic-compatible APIs. This is useful for:

  • Local LLMs: Run models locally using Ollama, vLLM, or LocalAI
  • Privacy: Keep all data on-premises without sending to external APIs
  • Cost optimization: Use self-hosted models or alternative providers
  • Enterprise proxies: Route requests through custom gateways

Supported OpenAI-Compatible APIs

Any API that implements the OpenAI Chat Completions endpoint (/v1/chat/completions) can be used.

Important: The SDK automatically appends /v1/chat/completions to your base URL, so do not include /v1 in your endpoint configuration.

| Provider | Correct api_endpoint | Notes |
|---|---|---|
| Ollama | http://localhost:11434 | Local model inference |
| vLLM | http://localhost:8000 | High-performance inference |
| LiteLLM | http://localhost:8000 | Unified API proxy |
| OpenRouter | https://openrouter.ai/api | Multiple model providers |
| Together AI | https://api.together.xyz | Hosted open-source models |
| Azure OpenAI | https://YOUR_RESOURCE.openai.azure.com | Microsoft Azure hosting |
| LocalAI | http://localhost:8080 | Local model inference |

Configuration Examples

Using Ollama (Local)

[openai]
api_key = "ollama"  # Ollama doesn't require a real key
api_endpoint = "http://localhost:11434"
default_model = "llama3.2"  # Or any model you have pulled
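
With this configuration, existing calls are unchanged; requests routed to the openai provider are served by the local Ollama instance:

-- Handled locally by the llama3.2 model via Ollama
SELECT generate_query('show all users', null, 'openai');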

Using LiteLLM Proxy

[openai]
api_key = "your-litellm-key"
api_endpoint = "http://localhost:8000"
default_model = "gpt-4o"  # LiteLLM handles routing

Using OpenRouter

[openai]
api_key = "sk-or-v1-..."  # Your OpenRouter API key
api_endpoint = "https://openrouter.ai/api"
default_model = "openai/gpt-4o"  # Use provider/model format

Important Considerations

  1. API Compatibility: The custom endpoint must implement the same API format as the original provider
  2. Model Names: Different providers may use different model naming conventions
  3. Authentication: Some local servers don’t require API keys; use a placeholder value
  4. Network Security: Local endpoints typically use HTTP; ensure your network is secure
  5. Performance: Local models may have different latency and quality characteristics

Future Provider Support

The extension is designed to easily support additional providers:

  • Azure OpenAI: Possible future native integration (already usable via custom endpoints, see above)
  • Local Models: Full support via custom endpoints (see above)

Monitoring Provider Performance

Enable Logging

[general]
enable_logging = true
log_level = "INFO"

Track Performance

-- From psql, enable client-side timing to measure end-to-end latency
\timing on
SELECT generate_query('show recent orders');
-- Provider selection and response times are written to the PostgreSQL log
-- when enable_logging = true

Performance Metrics

Key metrics to monitor:

  • Response Time: How long queries take to generate
  • Success Rate: Percentage of successful generations
  • Cost: Monthly API usage costs
  • Quality: Accuracy of generated queries

Choose your AI provider based on your specific needs: cost sensitivity, query complexity, response time requirements, and accuracy needs. The extension’s flexible provider system allows you to optimize for your particular use case.