AI Providers
The pg_ai_query extension supports multiple AI providers, each with different models, capabilities, and pricing structures. This guide helps you choose the right provider and model for your use case.
Supported Providers
OpenAI
Overview: OpenAI provides the GPT family of models, known for strong natural language understanding and code generation capabilities.
API Endpoint: https://api.openai.com/v1/
Available Models: The extension supports any valid OpenAI model name. The following are common examples:
| Model | Description | Context Length | Best For |
|---|---|---|---|
| gpt-4o | Latest GPT-4 Omni model | 128,000 tokens | Complex queries, best accuracy |
| gpt-4 | Standard GPT-4 model | 8,192 tokens | High-quality SQL generation |
| gpt-3.5-turbo | Fast and efficient model | 4,096 tokens | Simple queries, fast responses |
Anthropic
Overview: Anthropic provides Claude models, known for thoughtful reasoning and safety-conscious outputs.
API Endpoint: https://api.anthropic.com/v1/
Available Models: The extension supports any valid Anthropic model name. The following are common examples:
| Model | Description | Context Length | Best For |
|---|---|---|---|
| claude-sonnet-4-5-20250929 | Latest Claude Sonnet model (Sonnet 4.5) | 200,000 tokens | Complex analysis, detailed reasoning |
Google Gemini
Overview: Google provides the Gemini family of models, known for multimodal capabilities and strong reasoning.
Available Models: The extension supports any valid Gemini model name. The following are common examples:
| Model | Description | Context Length | Best For |
|---|---|---|---|
| gemini-2.5-pro | Latest Gemini Pro model | 1,000,000 tokens | Complex queries, best accuracy |
| gemini-2.5-flash | Fast and efficient model | 1,000,000 tokens | Quick responses, cost-effective |
| gemini-2.0-flash | Previous generation flash | 1,000,000 tokens | Simple queries, fast responses |
Provider Selection
Automatic Provider Selection
When you use provider = 'auto' (the default), the extension selects a provider based on:
- Configuration Priority: First configured provider with a valid API key
- Fallback Logic: If the primary provider fails, the extension tries the others in order
- Default Order: OpenAI → Anthropic → Gemini
-- Uses automatic provider selection
SELECT generate_query('show recent orders');
Explicit Provider Selection
You can specify which provider to use:
-- Force OpenAI
SELECT generate_query('show users', null, 'openai');
-- Force Anthropic
SELECT generate_query('show users', null, 'anthropic');
-- Force Gemini
SELECT generate_query('show users', null, 'gemini');
Configuration-Based Selection
Set your preferred provider in the configuration file:
[general]
# Provider priority is determined by order in config file
[openai]
api_key = "your-openai-key"
default_model = "gpt-4o"
[anthropic]
api_key = "your-anthropic-key"
default_model = "claude-sonnet-4-5-20250929"
[gemini]
api_key = "your-google-key"
default_model = "gemini-2.5-flash"
Provider Comparison
Performance Characteristics
| Aspect | OpenAI GPT-4o | OpenAI GPT-3.5 | Anthropic Claude | Google Gemini |
|---|---|---|---|---|
| Response Time | 2-5 seconds | 1-2 seconds | 3-6 seconds | 1-3 seconds |
| Accuracy | Excellent | Good | Excellent | Excellent |
| Complex Queries | Excellent | Fair | Excellent | Very Good |
| SQL Knowledge | Excellent | Good | Very Good | Very Good |
| Cost | High | Low | Medium | Low-Medium |
Use Case Recommendations
For Simple Queries
Recommended: OpenAI GPT-3.5-turbo
-- Simple data retrieval
SELECT generate_query('show all users', null, 'openai');
-- Configuration: default_model = "gpt-3.5-turbo"
Why: Fast, cost-effective, sufficient accuracy for basic queries.
For Complex Analytics
Recommended: OpenAI GPT-4o or Anthropic Claude
-- Complex business intelligence
SELECT generate_query(
'calculate monthly customer retention rates with cohort analysis',
null,
'openai'
);
-- Configuration: default_model = "gpt-4o"
Why: Better reasoning for complex multi-table joins and advanced analytics.
For Data Exploration
Recommended: Anthropic Claude
-- Exploratory analysis
SELECT generate_query(
'analyze sales trends and identify anomalies in the last quarter',
null,
'anthropic'
);
Why: Excellent at understanding context and generating insightful queries.
For Production Systems
Recommended: OpenAI GPT-4 (balanced performance/cost)
-- Production queries
SELECT generate_query('generate daily sales report', null, 'openai');
-- Configuration: default_model = "gpt-4"
Why: Good balance of accuracy, speed, and cost for production workloads.
For Cost-Effective Operations
Recommended: Google Gemini Flash
-- Cost-effective queries with good quality
SELECT generate_query('show recent user activity', null, 'gemini');
-- Configuration: default_model = "gemini-2.5-flash"
Why: Excellent cost-to-performance ratio, fast responses, large context window.
Model-Specific Configurations
OpenAI Configuration
[openai]
api_key = "sk-proj-your-api-key-here"
default_model = "gpt-4o"
# Optional: Model-specific settings (future feature)
# gpt_4o_temperature = 0.7
# gpt_4o_max_tokens = 16384
API Key Format: Must start with sk-proj- or sk-
Getting Started:
- Visit platform.openai.com
- Create an account and add billing information
- Generate an API key
- Add to configuration file
Anthropic Configuration
[anthropic]
api_key = "sk-ant-your-api-key-here"
default_model = "claude-sonnet-4-5-20250929"
# Optional: Model-specific settings (future feature)
# claude_temperature = 0.7
# claude_max_tokens = 8192
API Key Format: Must start with sk-ant-
Getting Started:
- Visit console.anthropic.com
- Create an account and add credits
- Generate an API key
- Add to configuration file
Google Gemini Configuration
[gemini]
api_key = "AIzaSy-your-api-key-here"
default_model = "gemini-2.5-flash"
# Optional: Model-specific settings (future feature)
# gemini_temperature = 0.7
# gemini_max_tokens = 8192
API Key Format: Must start with AIzaSy
Getting Started:
- Visit aistudio.google.com
- Create or sign in to your Google account
- Generate an API key
- Add to configuration file
Cost Considerations
OpenAI Pricing (Approximate)
| Model | Input Cost | Output Cost | Typical Query Cost |
|---|---|---|---|
| GPT-4o | $2.50/1M tokens | $10.00/1M tokens | $0.02-0.05 |
| GPT-4 | $30.00/1M tokens | $60.00/1M tokens | $0.10-0.20 |
| GPT-3.5-turbo | $0.50/1M tokens | $1.50/1M tokens | $0.001-0.005 |
Anthropic Pricing (Approximate)
| Model | Input Cost | Output Cost | Typical Query Cost |
|---|---|---|---|
| Claude Sonnet 4.5 | $3.00/1M tokens | $15.00/1M tokens | $0.03-0.06 |
Google Gemini Pricing (Approximate)
| Model | Input Cost | Output Cost | Typical Query Cost |
|---|---|---|---|
| Gemini 2.5 Pro | $1.25/1M tokens | $5.00/1M tokens | $0.01-0.03 |
| Gemini 2.5 Flash | $0.075/1M tokens | $0.30/1M tokens | $0.001-0.005 |
| Gemini 2.0 Flash | $0.10/1M tokens | $0.40/1M tokens | $0.001-0.005 |
Prices are approximate and change frequently. Check provider websites for current pricing.
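Token costs scale linearly with usage, so the per-query figures above are easy to sanity-check. A minimal sketch, with rates hardcoded from the approximate tables above:

```python
# Rough per-query cost from per-million-token rates (approximate figures above).
def query_cost(input_tokens: int, output_tokens: int,
               input_rate: float, output_rate: float) -> float:
    """Rates are USD per 1M tokens."""
    return input_tokens / 1e6 * input_rate + output_tokens / 1e6 * output_rate

# GPT-4o example: ~10k prompt tokens (schema context) + ~500 output tokens
cost = query_cost(10_000, 500, 2.50, 10.00)
print(f"${cost:.3f}")  # $0.030 -- inside the $0.02-0.05 range above
```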
Cost Optimization Strategies
- Use Appropriate Models
-- For simple queries, use cheaper models
SELECT generate_query('count users', null, 'openai');
-- Set: default_model = "gpt-3.5-turbo"
- Efficient Query Descriptions
-- Be specific to reduce back-and-forth
SELECT generate_query('show users with orders in last 30 days including total order value');
- Batch Similar Queries
-- Instead of multiple calls:
-- SELECT generate_query('show users');
-- SELECT generate_query('show user count');
-- Use one comprehensive query:
SELECT generate_query('show user list with total count');
Provider Reliability and Error Handling
Automatic Fallback
The extension includes automatic fallback logic:
-- If OpenAI fails, automatically tries Anthropic
SELECT generate_query('show data', null, 'auto');
Manual Retry with Different Provider
-- If one provider fails, try another
SELECT generate_query('complex query', null, 'openai');
-- If that fails:
SELECT generate_query('complex query', null, 'anthropic');
Error Handling Configuration
[general]
max_retries = 3 # Retry failed requests
request_timeout_ms = 30000 # Timeout for each request
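Conceptually, `max_retries` bounds how many times a failed request is reattempted before an error is surfaced. A hedged sketch of such a retry loop (this is not the extension's actual code; `flaky` is a stand-in for a provider call):

```python
import time

def with_retries(call, max_retries: int = 3, backoff_s: float = 0.0):
    """Retry a provider call up to max_retries times; re-raise the last error."""
    last_err = None
    for attempt in range(max_retries):
        try:
            return call()
        except Exception as err:  # in practice: timeouts, rate limits, 5xx
            last_err = err
            time.sleep(backoff_s * (2 ** attempt))  # optional exponential backoff
    raise last_err

# Demo with a stand-in "provider" that fails twice, then succeeds.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "SELECT 1;"

print(with_retries(flaky, max_retries=3))  # SELECT 1;
```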
Advanced Provider Features
Rate Limiting
Each provider has different rate limits:
OpenAI:
- GPT-3.5: 3,500 requests/minute
- GPT-4: 500 requests/minute (varies by tier)
Anthropic:
- Claude: 1,000 requests/minute (varies by tier)
Google Gemini:
- Gemini: 1,500 requests/minute (varies by tier)
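When batching many generation calls, a simple client-side throttle helps stay under these limits (which are approximate and tier-dependent). An illustrative sketch, not a feature of the extension:

```python
import time

class Throttle:
    """Enforce a minimum interval between calls from a requests/minute budget."""
    def __init__(self, requests_per_minute: int):
        self.interval = 60.0 / requests_per_minute
        self.last = 0.0

    def wait(self):
        """Sleep just long enough to respect the configured rate."""
        now = time.monotonic()
        sleep_for = self.last + self.interval - now
        if sleep_for > 0:
            time.sleep(sleep_for)
        self.last = time.monotonic()

# e.g. stay well under a 500 requests/minute limit:
t = Throttle(requests_per_minute=400)
t.wait()  # call before each API request
```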
Context Window Optimization
For large schemas, consider context window sizes:
-- Large schema? Use models with larger context windows
-- GPT-4o: 128k tokens
-- Claude Sonnet 4.5: 200k tokens
-- Gemini 2.5: 1M tokens
SELECT generate_query('analyze relationships across all tables', null, 'gemini');
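A quick way to gauge whether a schema description fits a model's window is the common (inexact) heuristic of roughly 4 characters per token. A sketch under that assumption; the model names and reserve size here are illustrative:

```python
# Rough fit check: ~4 characters per token is a common approximation.
CONTEXT_WINDOWS = {"gpt-4o": 128_000, "claude": 200_000, "gemini-2.5": 1_000_000}

def estimate_tokens(text: str) -> int:
    return len(text) // 4

def fits(schema_text: str, model: str, reserve: int = 4_000) -> bool:
    """Leave `reserve` tokens for the prompt template and the response."""
    return estimate_tokens(schema_text) + reserve <= CONTEXT_WINDOWS[model]

big_schema = "CREATE TABLE ..." * 40_000   # ~640k chars -> ~160k tokens
print(fits(big_schema, "gpt-4o"))      # False: exceeds 128k
print(fits(big_schema, "gemini-2.5"))  # True
```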
Regional Considerations
OpenAI: Global availability with regional data centers
Anthropic: Primarily US-based with expanding regional support
Google Gemini: Global availability through Google Cloud infrastructure
Provider-Specific Best Practices
OpenAI Best Practices
- Model Selection:
- Use GPT-3.5-turbo for development and testing
- Use GPT-4 for production workloads
- Use GPT-4o for complex analytical queries
- Prompt Engineering:
- Be specific about desired output format
- Include relevant business context
- Specify PostgreSQL version if using advanced features
Anthropic Best Practices
- Leveraging Reasoning:
- Claude excels at understanding business logic
- Good for queries requiring domain expertise
- Effective for complex multi-step analysis
- Safety Features:
- Claude has built-in safety considerations
- Good for production environments with compliance needs
- Transparent about limitations and assumptions
Google Gemini Best Practices
- Leveraging Large Context:
- Gemini excels with large context windows (up to 1M tokens)
- Ideal for complex schemas with many tables
- Good for batch processing multiple queries
- Cost Efficiency:
- Gemini Flash models offer excellent price-performance
- Great for high-volume, cost-sensitive applications
- Consider Gemini for development and testing environments
Custom API Endpoints
The extension supports custom API endpoints, allowing you to use OpenAI-compatible or Anthropic-compatible APIs. This is useful for:
- Local LLMs: Run models locally using Ollama, vLLM, or LocalAI
- Privacy: Keep all data on-premises without sending to external APIs
- Cost optimization: Use self-hosted models or alternative providers
- Enterprise proxies: Route requests through custom gateways
Supported OpenAI-Compatible APIs
Any API that implements the OpenAI Chat Completions endpoint (/v1/chat/completions) can be used.
Important: The SDK automatically appends /v1/chat/completions to your base URL, so do not include /v1 in your endpoint configuration.
| Provider | Correct api_endpoint | Notes |
|---|---|---|
| Ollama | http://localhost:11434 | Local model inference |
| vLLM | http://localhost:8000 | High-performance inference |
| LiteLLM | http://localhost:8000 | Unified API proxy |
| OpenRouter | https://openrouter.ai/api | Multiple model providers |
| Together AI | https://api.together.xyz | Hosted open-source models |
| Azure OpenAI | https://YOUR_RESOURCE.openai.azure.com | Microsoft Azure hosting |
| LocalAI | http://localhost:8080 | Local model inference |
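The "do not include /v1" rule exists because the client joins your base URL with the chat-completions path. A sketch of that join (the helper name is illustrative, not the SDK's API):

```python
# The SDK appends the chat-completions path to your configured base URL.
CHAT_PATH = "/v1/chat/completions"

def full_endpoint(api_endpoint: str) -> str:
    return api_endpoint.rstrip("/") + CHAT_PATH

print(full_endpoint("http://localhost:11434"))
# -> http://localhost:11434/v1/chat/completions   (correct)

print(full_endpoint("http://localhost:11434/v1"))
# -> http://localhost:11434/v1/v1/chat/completions   (wrong: doubled /v1)
```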
Configuration Examples
Using Ollama (Local)
[openai]
api_key = "ollama" # Ollama doesn't require a real key
api_endpoint = "http://localhost:11434"
default_model = "llama3.2" # Or any model you have pulled
Using LiteLLM Proxy
[openai]
api_key = "your-litellm-key"
api_endpoint = "http://localhost:8000"
default_model = "gpt-4o" # LiteLLM handles routing
Using OpenRouter
[openai]
api_key = "sk-or-v1-..." # Your OpenRouter API key
api_endpoint = "https://openrouter.ai/api"
default_model = "openai/gpt-4o" # Use provider/model format
Important Considerations
- API Compatibility: The custom endpoint must implement the same API format as the original provider
- Model Names: Different providers may use different model naming conventions
- Authentication: Some local servers don’t require API keys; use a placeholder value
- Network Security: Local endpoints typically use HTTP; ensure your network is secure
- Performance: Local models may have different latency and quality characteristics
Future Provider Support
The extension is designed to easily support additional providers:
- Azure OpenAI: Usable today via custom endpoints; deeper native integration is a possible future addition
- Local Models: Full support via custom endpoints (see above)
Monitoring Provider Performance
Enable Logging
[general]
enable_logging = true
log_level = "INFO"
Track Performance
-- Monitor which provider is being used
-- Check logs for provider selection and response times
Performance Metrics
Key metrics to monitor:
- Response Time: How long queries take to generate
- Success Rate: Percentage of successful generations
- Cost: Monthly API usage costs
- Quality: Accuracy of generated queries
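These metrics can be aggregated from whatever logs you collect. A minimal sketch, assuming you extract (provider, latency_seconds, success) records yourself; the record shape is hypothetical:

```python
# Aggregate simple per-provider stats from (provider, latency_s, ok) records.
from collections import defaultdict

def summarize(records):
    """Return success rate and average latency per provider."""
    stats = defaultdict(lambda: {"calls": 0, "ok": 0, "latency": 0.0})
    for provider, latency_s, ok in records:
        s = stats[provider]
        s["calls"] += 1
        s["ok"] += ok
        s["latency"] += latency_s
    return {
        p: {"success_rate": s["ok"] / s["calls"],
            "avg_latency_s": s["latency"] / s["calls"]}
        for p, s in stats.items()
    }

log = [("openai", 2.1, True), ("openai", 3.0, False), ("gemini", 1.2, True)]
print(summarize(log))
```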
Choose your AI provider based on your specific needs: cost sensitivity, query complexity, response time requirements, and accuracy needs. The extension’s flexible provider system allows you to optimize for your particular use case.