PostgreSQL AI Query Extension
The PostgreSQL AI Query Extension (pg_ai_query) is a powerful PostgreSQL extension that allows you to generate SQL queries from natural language descriptions using state-of-the-art AI models from OpenAI and Anthropic.
What is pg_ai_query?
pg_ai_query bridges the gap between natural language and SQL by leveraging large language models to understand your intent and automatically generate optimized PostgreSQL queries. The extension integrates directly into your PostgreSQL database, providing a seamless way to query your data using plain English.
Key Features
- Natural Language to SQL: Convert plain English descriptions into valid PostgreSQL queries
- AI-Powered Query Analysis: Analyze query performance with EXPLAIN ANALYZE and get optimization insights
- Automatic Schema Discovery: The extension automatically analyzes your database schema to understand table structures, relationships, and constraints
- Multiple AI Providers: Support for both OpenAI (GPT-4, GPT-3.5) and Anthropic (Claude) models
- Intelligent Query Generation: Generates optimized queries with appropriate JOINs, WHERE clauses, and LIMIT constraints
- Performance Optimization: Get AI-powered recommendations for query improvements and index suggestions
- Safety First: Built-in protections against dangerous operations and unauthorized access to system tables
- Configurable: Flexible configuration system with support for API keys, model selection, and logging
- PostgreSQL Native: Runs directly within PostgreSQL as a native extension
How It Works
- Schema Analysis: The extension automatically discovers and analyzes your database schema
- Natural Language Processing: Your natural language query is processed and understood by AI models
- SQL Generation: The AI generates an appropriate SQL query based on your schema and request
- Query Validation: The generated query is validated for safety and correctness
- Execution Ready: You receive a ready-to-execute SQL query
Example Usage
Query Generation
-- Simple query
SELECT generate_query('show me all users created in the last 7 days');
-- With custom API key
SELECT generate_query('count orders by status', 'your-api-key-here');
-- With specific provider
SELECT generate_query('find top 10 customers by revenue', 'your-api-key', 'openai');
Query Performance Analysis
-- Analyze query performance
SELECT explain_query('SELECT * FROM users WHERE active = true');
-- Get optimization suggestions for complex queries
SELECT explain_query('
SELECT u.username, COUNT(o.id) as orders
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.username
ORDER BY orders DESC
');
-- Combine generation and analysis
WITH generated AS (
SELECT generate_query('find high-value recent orders') as query
)
SELECT explain_query((SELECT query FROM generated));
Supported AI Models
OpenAI Models
- GPT-4o: Latest and most capable model
- GPT-4: High-quality reasoning and code generation
- GPT-3.5 Turbo: Fast and efficient for simpler queries
Anthropic Models
- Claude 3.5 Sonnet: Advanced reasoning and natural language understanding
Use Cases
- Data Exploration: Quickly explore your data without writing complex SQL
- Query Optimization: Analyze and optimize slow-performing queries with AI insights
- Performance Monitoring: Regular analysis of critical queries for performance regression
- Business Intelligence: Generate reports and analytics queries from natural descriptions
- Learning SQL: Understand how natural language translates to SQL syntax and learn optimization techniques
- Index Planning: Get AI-powered recommendations for database index strategies
- Rapid Prototyping: Quickly generate queries for testing and development
- Documentation: Generate example queries for database documentation
Architecture
The extension consists of several key components:
- Query Generator: Core engine that processes natural language and generates SQL
- Query Analyzer: Performance analysis engine using EXPLAIN ANALYZE and AI insights
- Schema Discovery: Automatically analyzes database structure and relationships
- AI Provider Integration: Handles communication with OpenAI and Anthropic APIs
- Configuration Manager: Manages settings, API keys, and model configurations
- Safety Validator: Ensures generated queries are safe and authorized
Security and Safety
pg_ai_query is designed with security in mind:
- No System Access: Cannot access system catalogs or sensitive PostgreSQL internals
- User Table Focus: Only operates on user-created tables and data
- Query Validation: All generated queries are validated before return
- Configurable Limits: Built-in LIMIT enforcement to prevent large result sets
- API Key Protection: Secure handling of API credentials
Getting Started
Ready to start generating SQL queries from natural language? Head over to the Installation guide to get started, or check out our Quick Start tutorial for a hands-on introduction.
Repository
The source code is available on GitHub. Feel free to contribute, report issues, or explore the implementation.
Installation
This guide will walk you through installing the PostgreSQL AI Query extension on your system.
Prerequisites
Before installing pg_ai_query, ensure you have the following:
System Requirements
- PostgreSQL 12+: The extension requires PostgreSQL version 12 or later
- Operating System: Linux, macOS, or Windows (with proper development tools)
- C++ Compiler: GCC 8+ or Clang 10+ with C++20 support
- CMake: Version 3.15 or later for building
Development Dependencies
- PostgreSQL development headers (postgresql-devel on RHEL/CentOS, postgresql-server-dev-all on Ubuntu/Debian)
- OpenSSL development libraries
- Git (for cloning the repository)
AI Provider Requirements
You’ll need API access to at least one of the following:
- OpenAI API: Create an account at platform.openai.com
- Anthropic API: Create an account at console.anthropic.com
Installation Methods
Method 1: Build from Source (Recommended)
1. Clone the Repository
git clone https://github.com/benodiwal/pg_ai_query.git
cd pg_ai_query
2. Initialize Submodules
git submodule update --init --recursive
3. Build the Extension
make clean
make
4. Install the Extension
# Install to default PostgreSQL location
sudo make install
# Or specify a custom PostgreSQL installation
PG_CONFIG=/path/to/pg_config make install
Method 2: Package Installation (Future)
Package installations will be available in future releases for major Linux distributions and package managers.
Platform-Specific Instructions
Ubuntu/Debian
# Install dependencies
sudo apt-get update
sudo apt-get install postgresql-server-dev-all cmake gcc g++ libssl-dev git
# Clone and build
git clone https://github.com/benodiwal/pg_ai_query.git
cd pg_ai_query
git submodule update --init --recursive
make && sudo make install
CentOS/RHEL/Fedora
# Install dependencies
sudo yum install postgresql-devel cmake gcc-c++ openssl-devel git
# OR on newer systems:
sudo dnf install postgresql-devel cmake gcc-c++ openssl-devel git
# Clone and build
git clone https://github.com/benodiwal/pg_ai_query.git
cd pg_ai_query
git submodule update --init --recursive
make && sudo make install
macOS
# Install dependencies (with Homebrew)
brew install postgresql cmake openssl git
# Clone and build
git clone https://github.com/benodiwal/pg_ai_query.git
cd pg_ai_query
git submodule update --init --recursive
make && sudo make install
Windows
For Windows installation, you’ll need:
- Visual Studio 2019 or later with C++ tools
- PostgreSQL installed from EDB or compiled from source
- CMake for Windows
Detailed Windows installation instructions will be added in a future update.
Verification
After installation, verify that the extension is properly installed:
1. Connect to PostgreSQL
psql -d your_database
2. Create the Extension
CREATE EXTENSION IF NOT EXISTS pg_ai_query;
3. Test the Installation
-- Check if functions are available
\df generate_query
-- Test with a simple query (will fail without API key, which is expected)
SELECT generate_query('show me all tables');
You should see the function listed and get an error about missing API configuration (which is normal - we’ll configure that next).
Troubleshooting Installation
Common Issues
PostgreSQL Development Headers Missing
Error: postgres.h: No such file or directory
Solution: Install PostgreSQL development packages:
# Ubuntu/Debian
sudo apt-get install postgresql-server-dev-all
# CentOS/RHEL
sudo yum install postgresql-devel
CMake Not Found
Error: cmake: command not found
Solution: Install CMake:
# Ubuntu/Debian
sudo apt-get install cmake
# macOS
brew install cmake
# CentOS/RHEL
sudo yum install cmake
Compilation Errors
Error: Various C++ compilation errors
Solutions:
- Ensure you have a C++20-compatible compiler
- Check that all submodules are initialized:
git submodule update --init --recursive
- Clean and rebuild:
make clean && make
Permission Denied During Installation
Error: Permission denied when running make install
Solution: Use sudo for installation:
sudo make install
Wrong PostgreSQL Installation
Error: Extension installs to wrong PostgreSQL version
Solution: Specify the correct pg_config:
PG_CONFIG=/usr/pgsql-14/bin/pg_config make install
Getting Help
If you encounter issues not covered here:
- Check the Troubleshooting guide
- Review the FAQ for common questions
- Open an issue on the GitHub repository
Next Steps
Once installation is complete:
- Configure the extension with your API keys
- Follow the Quick Start Guide to generate your first queries
- Explore Usage Examples for inspiration
Configuration
The pg_ai_query extension supports flexible configuration through a configuration file that allows you to customize AI providers, API keys, logging, and query behavior.
Configuration File Location
The configuration file should be placed at:
~/.pg_ai.config
This file will be automatically detected and loaded when the extension is first used.
Configuration File Format
The configuration uses an INI-style format with sections and key-value pairs:
# PG AI Query Configuration File
[section_name]
key = value
Complete Configuration Example
Here’s a complete example configuration file with all available options:
# PG AI Query Configuration File
# Place this file at ~/.pg_ai.config
[general]
# Logging level: DEBUG, INFO, WARNING, ERROR
log_level = "INFO"
# Enable or disable all logging output (true/false)
enable_logging = false
# Request timeout in milliseconds
request_timeout_ms = 30000
# Maximum number of retries for failed requests
max_retries = 3
[query]
# Always enforce LIMIT clause on SELECT queries
enforce_limit = true
# Default LIMIT value when not specified by user
default_limit = 1000
[response]
# Show detailed explanation of what the query does
show_explanation = true
# Show warnings about performance, security, or data implications
show_warnings = true
# Show suggested visualization type for query results
show_suggested_visualization = true
# Use formatted response (JSON format) instead of plain SQL
# When enabled, returns structured JSON with query, explanation, warnings, etc.
# When disabled, returns plain SQL with optional comments
use_formatted_response = false
[openai]
# Your OpenAI API key
api_key = "sk-your-openai-api-key-here"
# Default model to use (options: gpt-4o, gpt-4, gpt-3.5-turbo)
default_model = "gpt-4o"
[anthropic]
# Your Anthropic API key (if using Claude)
api_key = "sk-ant-your-anthropic-api-key-here"
# Default model to use (options: claude-3-5-sonnet-20241022)
default_model = "claude-3-5-sonnet-20241022"
Configuration Sections
[general] Section
Controls general behavior of the extension.
| Option | Type | Default | Description |
|---|---|---|---|
| log_level | string | "INFO" | Minimum level for log messages: DEBUG, INFO, WARNING, ERROR |
| enable_logging | boolean | false | Enable/disable all logging output |
| request_timeout_ms | integer | 30000 | Timeout for AI API requests in milliseconds |
| max_retries | integer | 3 | Maximum retry attempts for failed API requests |
[query] Section
Controls query generation behavior.
| Option | Type | Default | Description |
|---|---|---|---|
| enforce_limit | boolean | true | Always add LIMIT clause to SELECT queries |
| default_limit | integer | 1000 | Default row limit when none specified |
[response] Section
Controls how query results are formatted and what additional information is included.
| Option | Type | Default | Description |
|---|---|---|---|
| show_explanation | boolean | true | Include detailed explanation of what the query does |
| show_warnings | boolean | true | Include warnings about performance, security, or data implications |
| show_suggested_visualization | boolean | false | Include suggested visualization type for the query results |
| use_formatted_response | boolean | false | Return structured JSON instead of plain SQL |
[openai] Section
OpenAI provider configuration.
| Option | Type | Default | Description |
|---|---|---|---|
| api_key | string | "" | Your OpenAI API key from platform.openai.com |
| default_model | string | "gpt-4o" | Default OpenAI model to use |
Available OpenAI Models:
- gpt-4o - Latest GPT-4 Omni model (recommended)
- gpt-4 - High-quality GPT-4 model
- gpt-3.5-turbo - Fast and efficient model
[anthropic] Section
Anthropic (Claude) provider configuration.
| Option | Type | Default | Description |
|---|---|---|---|
| api_key | string | "" | Your Anthropic API key from console.anthropic.com |
| default_model | string | "claude-3-5-sonnet-20241022" | Default Claude model to use |
Available Anthropic Models:
- claude-3-5-sonnet-20241022 - Latest Claude 3.5 Sonnet model
Setting Up API Keys
Getting an OpenAI API Key
- Visit platform.openai.com
- Create an account or sign in
- Navigate to API Keys section
- Create a new API key
- Copy the key and add it to your config file
Getting an Anthropic API Key
- Visit console.anthropic.com
- Create an account or sign in
- Navigate to API Keys section
- Create a new API key
- Copy the key and add it to your config file
Provider Selection Priority
The extension automatically selects an AI provider based on the following priority:
- Explicit provider parameter in function call
- First configured provider with a valid API key in config file
- Error if no providers are configured
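For example (the API key shown is a placeholder), passing an explicit provider overrides the configured priority, while omitting it falls back to the first provider with a valid key in the config file:
-- Explicit provider: forces Anthropic even if OpenAI is configured first
SELECT generate_query('count orders by status', 'sk-ant-your-key', 'anthropic');
-- No provider given: the first configured provider with a valid API key is used
SELECT generate_query('count orders by status');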
Configuration Validation
The extension validates configuration on startup:
- API Key Format: Checks that API keys follow expected format
- Model Availability: Validates that specified models are available
- Numeric Values: Ensures timeouts and limits are positive integers
- Boolean Values: Validates true/false values
Configuration File Only
Currently, only configuration files are supported. All settings must be specified in ~/.pg_ai.config.
Security Considerations
API Key Security
- Never commit API keys to version control
- Use appropriate file permissions:
chmod 600 ~/.pg_ai.config
- Rotate keys regularly as per your organization’s security policy
- Monitor usage through your AI provider’s dashboard
Network Security
- API requests are made over HTTPS
- Consider firewall rules for outbound connections to AI providers
- Monitor network traffic for unexpected API usage
Database Security
- The extension only accesses user tables (not system catalogs)
- Generated queries respect PostgreSQL’s permission system
- Consider using dedicated database users with limited privileges
Debugging Configuration
Enable Logging
To troubleshoot configuration issues, enable logging:
[general]
enable_logging = true
log_level = "DEBUG"
Check Configuration Loading
You can verify configuration loading by observing log messages when first calling the extension:
-- This will log configuration loading details
SELECT generate_query('test query');
Validate API Keys
Test your API configuration:
-- Test with explicit API key
SELECT generate_query('show tables', 'your-api-key-here', 'openai');
Updating Configuration
Configuration is loaded on first use. To reload after making changes:
- Disconnect from PostgreSQL
- Reconnect to your database
- Call any extension function to trigger reload
Or restart your PostgreSQL session.
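For example, from psql (the database name is a placeholder), reconnecting and calling any extension function triggers the reload:
\c your_database
SELECT generate_query('configuration reload check');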
Next Steps
Once configured:
- Follow the Quick Start Guide to test your setup
- Explore Usage Examples
- Learn about AI Providers to optimize your model selection
Response Formatting
The pg_ai_query extension offers flexible response formatting options that allow you to customize how query results are presented. You can choose to receive just the SQL query, or enhance it with explanations, warnings, and visualization suggestions.
Response Format Options
Basic SQL Response (Default)
By default, the extension returns only the generated SQL query:
SELECT generate_query('show all customers');
-- Returns: SELECT * FROM customers LIMIT 1000;
Enhanced Text Response
When explanation and warnings are enabled, the response includes additional context as SQL comments:
-- Configuration: show_explanation=true, show_warnings=true, use_formatted_response=false
SELECT generate_query('find customers with high order values');
-- Returns:
-- SELECT c.customer_id, c.name, SUM(o.total_amount) as total_spent
-- FROM customers c
-- JOIN orders o ON c.customer_id = o.customer_id
-- GROUP BY c.customer_id, c.name
-- HAVING SUM(o.total_amount) > 1000
-- ORDER BY total_spent DESC
-- LIMIT 1000;
--
-- Explanation:
-- This query finds customers who have spent more than $1000 total across all their orders,
-- showing their total spending ranked from highest to lowest.
--
-- Warnings:
-- 1. Large dataset: This query may be slow on tables with millions of rows
-- 2. Consider indexing: customer_id and total_amount columns should be indexed
JSON Response
When use_formatted_response=true, the extension returns structured JSON:
{
"query": "SELECT c.customer_id, c.name, SUM(o.total_amount) as total_spent FROM customers c JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.name HAVING SUM(o.total_amount) > 1000 ORDER BY total_spent DESC LIMIT 1000;",
"success": true,
"explanation": "This query finds customers who have spent more than $1000 total across all their orders, showing their total spending ranked from highest to lowest.",
"warnings": [
"Large dataset: This query may be slow on tables with millions of rows",
"Consider indexing: customer_id and total_amount columns should be indexed"
],
"suggested_visualization": "bar",
"row_limit_applied": true
}
Configuration Options
Response Settings
Add these options to the [response] section of your ~/.pg_ai.config file:
[response]
# Show detailed explanation of what the query does
show_explanation = true
# Show warnings about performance, security, or data implications
show_warnings = true
# Show suggested visualization type for query results
show_suggested_visualization = true
# Use formatted response (JSON format) instead of plain SQL
use_formatted_response = false
Individual Option Details
show_explanation
- Type: Boolean
- Default: true
- Description: Includes a plain English explanation of what the query does, including business context and logic
show_warnings
- Type: Boolean
- Default: true
- Description: Includes warnings about:
- Performance implications for large datasets
- Security considerations
- Potential data quality issues
- Index recommendations
- Best practice suggestions
show_suggested_visualization
- Type: Boolean
- Default: false
- Description: Suggests the most appropriate visualization type based on the query structure and result type:
- bar - For categorical comparisons
- line - For time series data
- pie - For part-to-whole relationships
- table - For detailed data exploration
- none - When visualization isn’t applicable
use_formatted_response
- Type: Boolean
- Default: false
- Description: When enabled, returns structured JSON with all components. When disabled, returns SQL with optional comment annotations.
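As a minimal sketch (assuming use_formatted_response = true so the return value parses as JSON), the individual fields can be unpacked directly in SQL:
-- Pull the generated SQL and the suggested chart type out of the JSON response
WITH response AS (
    SELECT generate_query('show all customers')::jsonb AS r
)
SELECT
    r->>'query' AS sql_text,
    r->>'suggested_visualization' AS chart_type
FROM response;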
Response Examples
Sales Analysis Query
Input:
SELECT generate_query('monthly sales trend for this year');
Text Response (use_formatted_response=false):
SELECT
DATE_TRUNC('month', order_date) as month,
SUM(total_amount) as monthly_sales,
COUNT(*) as order_count
FROM orders
WHERE EXTRACT(year FROM order_date) = EXTRACT(year FROM CURRENT_DATE)
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month
LIMIT 1000;
-- Explanation:
-- Calculates total sales and order count by month for the current year,
-- useful for identifying seasonal trends and business performance patterns.
-- Suggested Visualization: line
-- Reasoning: Time series data showing trends over months is best displayed as a line chart
JSON Response (use_formatted_response=true):
{
"query": "SELECT DATE_TRUNC('month', order_date) as month, SUM(total_amount) as monthly_sales, COUNT(*) as order_count FROM orders WHERE EXTRACT(year FROM order_date) = EXTRACT(year FROM CURRENT_DATE) GROUP BY DATE_TRUNC('month', order_date) ORDER BY month LIMIT 1000;",
"success": true,
"explanation": "Calculates total sales and order count by month for the current year, useful for identifying seasonal trends and business performance patterns.",
"warnings": [],
"suggested_visualization": "line",
"row_limit_applied": true
}
Complex Join Query
Input:
SELECT generate_query('customers who bought more than 5 products in the electronics category');
Enhanced Response:
SELECT
c.customer_id,
c.name,
COUNT(DISTINCT oi.product_id) as electronics_products_bought
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE p.category = 'electronics'
GROUP BY c.customer_id, c.name
HAVING COUNT(DISTINCT oi.product_id) > 5
ORDER BY electronics_products_bought DESC
LIMIT 1000;
-- Explanation:
-- Finds customers who have purchased more than 5 different electronics products,
-- showing customer details and the count of unique electronics products they bought.
-- Warnings:
-- 1. Multiple JOINs: This query involves 4 table joins which may be slow on large datasets
-- 2. Index recommendation: Ensure indexes exist on customer_id, order_id, product_id, and category columns
-- 3. Category filter: Verify 'electronics' is the exact category name used in your database
-- Suggested Visualization: table
-- Reasoning: Customer list with counts is best displayed in a table format for detailed analysis
Using Responses in Applications
Parsing JSON Responses
When using use_formatted_response=true, you can parse the JSON in your application:
import psycopg2
import json
# Connect to the database (connection string is a placeholder)
conn = psycopg2.connect("dbname=your_database")
cursor = conn.cursor()
# Execute query
cursor.execute("SELECT generate_query('top selling products')")
response_json = cursor.fetchone()[0]
# Parse the JSON response
response = json.loads(response_json)
# Extract components
sql_query = response['query']
explanation = response.get('explanation', '')
warnings = response.get('warnings', [])
visualization = response.get('suggested_visualization', 'table')
# Use the SQL query for actual data retrieval
cursor.execute(sql_query)
results = cursor.fetchall()
Working with Text Responses
For text responses with comments, you can extract just the SQL:
def extract_sql_from_response(response_text):
"""Extract just the SQL query from a text response."""
lines = response_text.split('\n')
sql_lines = []
for line in lines:
# Stop at the first comment line; explanation/warning annotations follow the SQL
if line.strip().startswith('--'):
break
if line.strip():
sql_lines.append(line)
return '\n'.join(sql_lines)
# Usage
response = cursor.fetchone()[0]
clean_sql = extract_sql_from_response(response)
Best Practices
For Development
- Enable all response features during development for better understanding
- Use JSON format for programmatic parsing
- Pay attention to warnings for performance optimization
For Production
- Consider disabling detailed explanations to reduce response size
- Keep warnings enabled for monitoring potential issues
- Use text format for human-readable responses
For Business Intelligence
- Enable visualization suggestions to guide chart creation
- Use explanations to document generated queries
- Monitor warnings for data quality issues
Performance Considerations
Response Size
- JSON responses are larger than text responses
- Explanations and warnings add to response size
- Consider response formatting impact on network transfer
AI Token Usage
- Enhanced responses require more AI processing
- Detailed explanations use additional API tokens
- Balance information value vs. cost
Caching
- Consider caching formatted responses for repeated queries
- JSON responses are easier to cache and manipulate
- Text responses are more human-readable but harder to parse
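One possible caching approach is an ordinary table keyed by the prompt; a minimal sketch (the cache table and its name are illustrative, not part of the extension):
CREATE TABLE IF NOT EXISTS ai_query_cache (
    prompt     text PRIMARY KEY,
    response   text NOT NULL,
    created_at timestamptz DEFAULT CURRENT_TIMESTAMP
);
-- Generate once and store; reuse the stored response for repeated prompts
INSERT INTO ai_query_cache (prompt, response)
SELECT 'top selling products', generate_query('top selling products')
WHERE NOT EXISTS (SELECT 1 FROM ai_query_cache WHERE prompt = 'top selling products');
SELECT response FROM ai_query_cache WHERE prompt = 'top selling products';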
Troubleshooting Response Formatting
Configuration Not Applied
If response formatting isn’t working:
- Check configuration file location:
~/.pg_ai.config
- Verify configuration syntax (no quotes around true/false values)
- Restart PostgreSQL connection to reload configuration
- Check PostgreSQL logs for configuration errors
Incomplete Responses
If responses are missing expected components:
- Verify AI provider supports detailed responses
- Check API rate limits and quotas
- Review request timeout settings
- Enable logging to see full AI responses
JSON Parsing Errors
When using use_formatted_response=true:
- Ensure AI provider returns valid JSON
- Check for truncated responses due to token limits
- Verify your JSON parsing library handles the response format
- Use text format as fallback if JSON parsing fails
Quick Start Guide
This guide will get you up and running with pg_ai_query in just a few minutes. We’ll walk through setting up the extension, configuring it, and generating your first AI-powered SQL queries.
Step 1: Install and Enable the Extension
First, make sure you have pg_ai_query installed (see Installation Guide if needed), then enable it in your database:
-- Connect to your database
psql -d your_database
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS pg_ai_query;
-- Verify installation
\df generate_query
You should see the generate_query function listed.
Step 2: Configure API Access
Create a configuration file at ~/.pg_ai.config:
# Quick start configuration
[general]
enable_logging = true
[openai]
api_key = "your-openai-api-key-here"
default_model = "gpt-4o"
Get your OpenAI API key:
- Visit platform.openai.com
- Sign up or log in
- Create an API key in the API keys section
- Replace your-openai-api-key-here with your actual key
Step 3: Create Sample Data
Let’s create some sample tables to work with:
-- Create a users table
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(150) UNIQUE NOT NULL,
age INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
city VARCHAR(50)
);
-- Create an orders table
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
product_name VARCHAR(200),
amount DECIMAL(10,2),
status VARCHAR(20) DEFAULT 'pending',
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Insert sample data
INSERT INTO users (name, email, age, city) VALUES
('Alice Johnson', 'alice@example.com', 28, 'New York'),
('Bob Smith', 'bob@example.com', 35, 'San Francisco'),
('Carol Davis', 'carol@example.com', 31, 'Chicago'),
('David Wilson', 'david@example.com', 27, 'Seattle'),
('Eva Brown', 'eva@example.com', 33, 'Boston');
INSERT INTO orders (user_id, product_name, amount, status) VALUES
(1, 'Laptop Pro', 1299.99, 'completed'),
(1, 'Wireless Mouse', 79.99, 'completed'),
(2, 'Monitor 4K', 399.99, 'pending'),
(3, 'Keyboard Mechanical', 159.99, 'completed'),
(4, 'Tablet', 599.99, 'shipped'),
(5, 'Phone Case', 29.99, 'completed');
Step 4: Generate Your First Queries
Now let’s use natural language to query our data:
Basic Query
-- Simple user lookup
SELECT generate_query('show me all users');
Result:
SELECT id, name, email, age, created_at, city FROM public.users LIMIT 1000;
Query with Conditions
-- Users from specific city
SELECT generate_query('find all users from New York');
Result:
SELECT id, name, email, age, created_at, city
FROM public.users
WHERE city = 'New York'
LIMIT 1000;
Query with Joins
-- Orders with user information
SELECT generate_query('show me all orders with customer names');
Result:
SELECT o.id, o.product_name, o.amount, o.status, o.order_date, u.name as customer_name
FROM public.orders o
JOIN public.users u ON o.user_id = u.id
LIMIT 1000;
Aggregation Query
-- Revenue analysis
SELECT generate_query('calculate total revenue by order status');
Result:
SELECT status, SUM(amount) as total_revenue
FROM public.orders
GROUP BY status
LIMIT 1000;
Step 5: Execute Generated Queries
The extension generates SQL queries that you can execute immediately:
-- Generate and execute in one step
WITH generated AS (
SELECT generate_query('show users older than 30') as query
)
SELECT query FROM generated;
-- Copy the generated query and run it:
SELECT id, name, email, age, created_at, city
FROM public.users
WHERE age > 30
LIMIT 1000;
Step 6: Explore Advanced Features
Schema Discovery
Check what tables the extension can see:
-- View all tables in your database
SELECT get_database_tables();
-- Get detailed information about a specific table
SELECT get_table_details('users');
Different AI Providers
If you have multiple providers configured:
-- Use specific provider
SELECT generate_query('show recent orders', null, 'openai');
SELECT generate_query('show recent orders', null, 'anthropic');
Complex Queries
Try more complex natural language requests:
-- Complex aggregation
SELECT generate_query('show top 3 customers by total order amount with their contact info');
-- Date-based filtering
SELECT generate_query('find orders placed in the last 7 days');
-- Multiple conditions
SELECT generate_query('show users from California or New York who are older than 25');
Common Patterns
1. Exploratory Data Analysis
-- Understand your data structure
SELECT generate_query('describe the structure of my database');
SELECT generate_query('show me sample data from each table');
SELECT generate_query('count records in all tables');
2. Business Intelligence
-- Sales analysis
SELECT generate_query('monthly revenue trends');
SELECT generate_query('top selling products');
SELECT generate_query('customer acquisition by month');
3. Data Quality Checks
-- Find data issues
SELECT generate_query('find users with missing email addresses');
SELECT generate_query('show duplicate orders');
SELECT generate_query('find orders without valid user references');
Troubleshooting Quick Start
Error: “Extension not found”
-- Solution: Install the extension first
CREATE EXTENSION pg_ai_query;
Error: “API key not configured”
- Check your ~/.pg_ai.config file exists
- Verify the API key is valid
- Restart your PostgreSQL session
Error: “No tables found”
- Make sure you have user tables (not just system tables)
- Check table permissions
- Try creating the sample tables from Step 3
Unexpected Results
- Enable logging to see what’s happening:
[general]
enable_logging = true
log_level = "DEBUG"
- Check the generated query makes sense
- Try simpler natural language descriptions
Next Steps
Now that you’re up and running:
- Learn more about Usage Patterns for advanced query generation
- Explore Examples for inspiration
- Read about AI Providers to optimize your model choice
- Check Best Practices for production usage
Getting Help
If you run into issues:
- Check the Troubleshooting Guide
- Review the FAQ
- Look at more Examples for inspiration
generate_query Function
The generate_query function converts natural language descriptions into valid PostgreSQL queries using AI models from OpenAI and Anthropic.
Function Signature
generate_query(
natural_language_query text,
api_key text DEFAULT NULL,
provider text DEFAULT 'auto'
) RETURNS text
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| natural_language_query | text | required | The natural language description of the desired query |
| api_key | text | NULL | OpenAI or Anthropic API key (uses config if not provided) |
| provider | text | 'auto' | AI provider: 'openai', 'anthropic', or 'auto' |
Basic Usage
Simple Queries
-- Basic data retrieval
SELECT generate_query('show all users');
-- With filtering
SELECT generate_query('find users created in the last week');
-- Counting records
SELECT generate_query('count total orders');
Complex Queries
-- Joins and aggregation
SELECT generate_query('show top 10 customers by total order value with their email addresses');
-- Date-based analysis
SELECT generate_query('monthly revenue trend for the last year');
Configuration
API Key Setup
Create ~/.pg_ai.config:
[openai]
api_key = "sk-your-openai-api-key"
default_model = "gpt-4o"
[anthropic]
api_key = "sk-ant-your-anthropic-key"
default_model = "claude-3-5-sonnet-20241022"
See Also
- explain_query Function - Analyze query performance
- Examples - More usage examples
- Error Codes - Troubleshooting guide
explain_query Function
The explain_query function runs EXPLAIN ANALYZE on PostgreSQL queries and provides AI-powered performance analysis and optimization recommendations.
Overview
This function combines PostgreSQL’s built-in EXPLAIN ANALYZE functionality with advanced AI analysis to provide:
- Detailed execution plan analysis
- Performance bottleneck identification
- Index recommendations
- Query optimization suggestions
- Easy-to-understand explanations in plain English
Function Signature
explain_query(
query_text text,
api_key text DEFAULT NULL,
provider text DEFAULT 'auto'
) RETURNS text
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_text | text | required | The SQL query to analyze |
| api_key | text | NULL | OpenAI or Anthropic API key (uses config if not provided) |
| provider | text | 'auto' | AI provider: 'openai', 'anthropic', or 'auto' |
Basic Usage
Simple Query Analysis
SELECT explain_query('SELECT * FROM users WHERE created_at > NOW() - INTERVAL ''7 days''');
Complex Query Analysis
SELECT explain_query('
SELECT u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > NOW() - INTERVAL ''30 days''
GROUP BY u.id, u.username
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT 100
');
Using Specific AI Provider
-- Use OpenAI specifically
SELECT explain_query(
'SELECT * FROM products WHERE price > 100 ORDER BY price DESC LIMIT 10',
'your-openai-api-key',
'openai'
);
-- Use Anthropic specifically
SELECT explain_query(
'SELECT * FROM products WHERE price > 100 ORDER BY price DESC LIMIT 10',
'your-anthropic-api-key',
'anthropic'
);
Output Format
The function returns a structured text analysis with these sections:
Query Overview
Brief description of what the query accomplishes.
Performance Summary
- Overall execution time
- Total cost estimate (PostgreSQL’s relative measure)
- Number of rows processed
Execution Plan Analysis
- Key operations in the execution plan
- Join strategies and scan methods
- Focus on expensive operations
Performance Issues
- Identified bottlenecks
- Inefficient operations
- Resource usage concerns
Optimization Suggestions
- Specific recommendations for improvement
- Query rewriting suggestions
- Configuration recommendations
Index Recommendations
- Missing indexes that could improve performance
- Specific CREATE INDEX statements
- Partial index suggestions where applicable
Example Output
Query Overview:
This query retrieves users created within the last 7 days along with their order statistics,
focusing on active customers with more than 5 orders.
Performance Summary:
- Overall Execution Time: 45.2 milliseconds
- Total Cost: 1250.75 (PostgreSQL's relative cost estimate)
- Rows Processed: 156 rows returned from 50,000 rows examined
Execution Plan Analysis:
- Hash Join: Efficiently joins users and orders tables
- Index Scan: Uses existing index on users.created_at
- Sequential Scan: Full table scan on orders table (potential bottleneck)
- HashAggregate: Groups results for COUNT calculations
- Sort: Orders results by order count
Performance Issues:
- Sequential scan on orders table indicates missing index on user_id
- Hash join spills to disk due to large orders table size
- HAVING clause applied after aggregation, could be optimized
Optimization Suggestions:
1. Add index on orders.user_id to eliminate sequential scan
2. Consider partitioning orders table by date if very large
3. Move some HAVING conditions to WHERE clause if possible
4. Increase work_mem setting if hash joins frequently spill to disk
Index Recommendations:
-- Primary recommendation
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Optional: Composite index for better performance
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Consider partial index for recent orders
CREATE INDEX idx_orders_recent ON orders(user_id) WHERE created_at > NOW() - INTERVAL '1 year';
Supported Query Types
The function supports analysis of:
- SELECT statements
- WITH (Common Table Expressions) queries
- VALUES clauses
Note: Only read-only queries are supported for security reasons. DDL, DML, and other statement types will return an error.
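For example (table and column names are illustrative), CTE and VALUES queries are analyzed the same way as plain SELECTs:
-- WITH (CTE) query
SELECT explain_query('
WITH recent_orders AS (
    SELECT * FROM orders WHERE order_date > NOW() - INTERVAL ''30 days''
)
SELECT status, COUNT(*) FROM recent_orders GROUP BY status
');
-- VALUES clause
SELECT explain_query('VALUES (1, ''pending''), (2, ''completed'')');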
Configuration
The function uses the same configuration system as other pg_ai_query functions.
API Keys
Configure API keys in ~/.pg_ai.config:
[openai]
api_key = "your-openai-api-key"
default_model = "gpt-4o"
[anthropic]
api_key = "your-anthropic-api-key"
default_model = "claude-3-5-sonnet-20241022"
Provider Selection
- 'auto' (default): Uses the first available API key from configuration
- 'openai': Forces use of OpenAI models
- 'anthropic': Forces use of Anthropic models
Error Handling
Common error scenarios and their solutions:
Invalid Query
SELECT explain_query('SELECT * FROM non_existent_table');
-- Error: relation "non_existent_table" does not exist
Missing API Key
-- When no API key is configured
SELECT explain_query('SELECT * FROM users');
-- Error: API key required. Pass as parameter or configure ~/.pg_ai.config
Syntax Error
SELECT explain_query('SELECT * FORM users'); -- typo in FROM
-- Error: syntax error at or near "FORM"
Unsupported Query Type
SELECT explain_query('DROP TABLE users');
-- Error: Only SELECT, WITH, and VALUES queries are allowed
Performance Considerations
- Query Execution: The function actually executes your query via EXPLAIN ANALYZE
- Execution Time: Query execution time is included in the analysis
- AI Processing: AI analysis typically adds 1-3 seconds of processing time
- Large Queries: Very complex queries may take longer to analyze
Security Notes
- Queries are executed with the same permissions as the calling user
- No privilege escalation occurs
- API keys are handled securely and not logged
- Only read-only query types are permitted
Best Practices
- Use for Optimization: Run on queries that are performing slowly
- Test Variations: Compare different approaches to the same query
- Monitor Trends: Regular analysis helps track performance changes over time
- Validate Recommendations: Always test suggested indexes before implementing in production
- Consider Data Volume: Remember that performance characteristics change with data size
Integration with Other Functions
The explain_query function works well with other pg_ai_query functions:
-- Generate a query, then analyze its performance
WITH generated_query AS (
SELECT generate_query('show recent high-value orders') as sql
)
SELECT explain_query((SELECT sql FROM generated_query));
Common Use Cases
- Query Optimization: Identify and fix slow queries
- Index Planning: Determine what indexes to create
- Performance Monitoring: Regular health checks of critical queries
- Code Reviews: Analyze query performance before deployment
- Learning: Understand how PostgreSQL executes different query patterns
See Also
get_database_tables Function
The get_database_tables function returns metadata about all user tables in the database, providing a comprehensive overview of your database schema.
Function Signature
get_database_tables() RETURNS text
Parameters
This function takes no parameters.
Basic Usage
-- Get all database tables
SELECT get_database_tables();
-- Pretty print with formatting
SELECT jsonb_pretty(get_database_tables()::jsonb);
Output Format
The function returns a JSON array containing metadata for each table:
[
{
"table_name": "users",
"schema_name": "public",
"table_type": "BASE TABLE",
"estimated_rows": 1500
},
{
"table_name": "orders",
"schema_name": "public",
"table_type": "BASE TABLE",
"estimated_rows": 5000
}
]
Extracting Information
Get Table Names Only
SELECT
    t.elem->>'table_name' AS table_name
FROM jsonb_array_elements(get_database_tables()::jsonb) AS t(elem);
Get Tables with Row Counts
SELECT
table_name,
estimated_rows
FROM jsonb_to_recordset(get_database_tables()::jsonb)
AS x(table_name text, estimated_rows int)
ORDER BY estimated_rows DESC;
Use Cases
- Database exploration and documentation
- Schema discovery for applications
- Inventory of available tables
- Integration with generate_query function
See Also
- get_table_details Function - Get detailed table information
- generate_query Function - Generate queries from natural language
get_table_details Function
The get_table_details function returns detailed information about a specific table including columns, constraints, foreign keys, and indexes.
Function Signature
get_table_details(
table_name text,
schema_name text DEFAULT 'public'
) RETURNS text
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| table_name | text | required | Name of the table to analyze |
| schema_name | text | 'public' | Schema containing the table |
Basic Usage
-- Basic usage
SELECT get_table_details('users');
-- Specific schema
SELECT get_table_details('orders', 'sales');
-- Pretty print
SELECT jsonb_pretty(get_table_details('users')::jsonb);
Output Format
The function returns detailed JSON information about the table:
{
"table_name": "users",
"schema_name": "public",
"columns": [
{
"column_name": "id",
"data_type": "integer",
"is_nullable": false,
"column_default": "nextval('users_id_seq'::regclass)",
"is_primary_key": true,
"is_foreign_key": false
},
{
"column_name": "email",
"data_type": "character varying",
"is_nullable": false,
"column_default": null,
"is_primary_key": false,
"is_foreign_key": false
}
],
"indexes": [
"users_pkey",
"users_email_key"
]
}
Extracting Information
Get Column Information
SELECT
column_name,
data_type,
is_nullable,
is_primary_key
FROM jsonb_to_recordset(
(get_table_details('users')::jsonb)->'columns'
) AS x(
column_name text,
data_type text,
is_nullable boolean,
is_primary_key boolean
);
Find Primary Key Columns
SELECT
column_name
FROM jsonb_to_recordset(
(get_table_details('users')::jsonb)->'columns'
) AS x(column_name text, is_primary_key boolean)
WHERE is_primary_key = true;
List All Indexes
SELECT
jsonb_array_elements_text(
(get_table_details('orders')::jsonb)->'indexes'
) as index_name;
Use Cases
- Schema Documentation: Generate comprehensive table documentation
- Database Exploration: Understand table structure before writing queries
- Data Migration Planning: Analyze table structure for migration scripts
- Query Optimization: Understand available indexes and constraints
- Integration Planning: Map foreign key relationships
Common Patterns
Analyzing All Tables
-- Get details for all tables in database
WITH all_tables AS (
SELECT
jsonb_array_elements(get_database_tables()::jsonb)->>'table_name' as table_name
)
SELECT
table_name,
get_table_details(table_name) as details
FROM all_tables;
Finding Foreign Key Relationships
SELECT
column_name,
foreign_table,
foreign_column
FROM jsonb_to_recordset(
(get_table_details('orders')::jsonb)->'columns'
) AS x(
column_name text,
is_foreign_key boolean,
foreign_table text,
foreign_column text
)
WHERE is_foreign_key = true;
Integration with Other Functions
With generate_query
-- First understand the table structure
SELECT get_table_details('users');
-- Then generate queries based on that knowledge
SELECT generate_query('show users with their profile information');
With explain_query
-- Analyze table structure
SELECT get_table_details('orders');
-- Generate and analyze a query
SELECT explain_query('SELECT * FROM orders WHERE user_id = 123');
Error Handling
Common errors and solutions:
| Error | Cause | Solution |
|---|---|---|
"Table 'tablename' does not exist" | Table not found | Check table name and schema |
"Access denied to table" | Insufficient permissions | Grant SELECT permission on table |
"Schema 'schemaname' does not exist" | Schema not found | Verify schema name |
See Also
- get_database_tables Function - List all database tables
- generate_query Function - Generate queries from natural language
- Error Codes - Troubleshooting guide
Basic Usage
This guide covers the fundamental ways to use pg_ai_query for generating SQL queries from natural language. Whether you’re exploring data, creating reports, or learning SQL, this guide will help you get the most out of the extension.
Core Function: generate_query()
The main function you’ll use is generate_query(), which converts natural language into SQL:
generate_query(natural_language_query, [api_key], [provider])
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| natural_language_query | text | Yes | - | Your question in plain English |
| api_key | text | No | null | API key (uses config if not provided) |
| provider | text | No | 'auto' | AI provider: 'openai', 'anthropic', or 'auto' |
Basic Examples
-- Simplest form - uses configured API key and provider
SELECT generate_query('show me all users');
-- With custom API key
SELECT generate_query('count total orders', 'your-api-key-here');
-- With specific provider
SELECT generate_query('find recent customers', null, 'openai');
Query Types
1. Simple Data Retrieval
Goal: Get basic data from tables
-- Show all records
SELECT generate_query('show me all products');
SELECT generate_query('list all customers');
SELECT generate_query('display user information');
-- Show specific columns
SELECT generate_query('show user names and emails');
SELECT generate_query('get product names and prices');
2. Filtered Queries
Goal: Find specific records based on conditions
-- Simple filters
SELECT generate_query('find users older than 25');
SELECT generate_query('show completed orders');
SELECT generate_query('get products under $100');
-- Multiple conditions
SELECT generate_query('find users from New York who are older than 30');
SELECT generate_query('show pending orders from the last month');
-- Date-based filters
SELECT generate_query('show orders from yesterday');
SELECT generate_query('find users created in 2024');
SELECT generate_query('get sales from last quarter');
3. Aggregation and Analysis
Goal: Calculate totals, averages, counts, and other statistics
-- Simple counts
SELECT generate_query('count total users');
SELECT generate_query('how many orders do we have');
-- Sums and totals
SELECT generate_query('calculate total revenue');
SELECT generate_query('sum all order amounts');
-- Averages
SELECT generate_query('average order value');
SELECT generate_query('mean user age');
-- Group by analysis
SELECT generate_query('count orders by status');
SELECT generate_query('total sales by month');
SELECT generate_query('average order value by customer');
4. Joined Data
Goal: Combine information from multiple tables
-- Simple joins
SELECT generate_query('show orders with customer names');
SELECT generate_query('list products with category information');
-- Complex relationships
SELECT generate_query('show customers with their total order amounts');
SELECT generate_query('find users who have never placed an order');
SELECT generate_query('list top customers by number of orders');
5. Ranking and Sorting
Goal: Order results and find top/bottom records
-- Top N queries
SELECT generate_query('top 10 customers by revenue');
SELECT generate_query('5 most expensive products');
SELECT generate_query('latest 20 orders');
-- Sorting
SELECT generate_query('show users ordered by age');
SELECT generate_query('list products by price descending');
Best Practices for Natural Language
Be Specific About What You Want
Good:
SELECT generate_query('show user names and email addresses for active users');
Less Good:
SELECT generate_query('show users');
Use Clear Column References
Good:
SELECT generate_query('find orders with amount greater than 1000');
Better:
SELECT generate_query('find orders where order_amount > 1000');
Specify Time Periods Clearly
Good:
SELECT generate_query('show orders from the last 30 days');
SELECT generate_query('find users created between January 1, 2024 and March 31, 2024');
Use Business Terms
The AI understands common business terminology:
-- These work well
SELECT generate_query('show quarterly revenue');
SELECT generate_query('find churned customers');
SELECT generate_query('calculate customer lifetime value');
SELECT generate_query('show conversion rates by month');
Advanced Usage Patterns
1. Incremental Query Building
Start simple and add complexity:
-- Step 1: Basic query
SELECT generate_query('show orders');
-- Step 2: Add filters
SELECT generate_query('show completed orders');
-- Step 3: Add aggregation
SELECT generate_query('count completed orders by customer');
-- Step 4: Add sorting
SELECT generate_query('count completed orders by customer, ordered by count desc');
2. Data Exploration Workflow
-- 1. Understand data structure
SELECT get_database_tables();
-- 2. Explore individual tables
SELECT generate_query('show sample data from users table');
SELECT generate_query('describe the structure of orders table');
-- 3. Understand relationships
SELECT generate_query('show how users and orders are connected');
-- 4. Analyze patterns
SELECT generate_query('show distribution of users by city');
SELECT generate_query('analyze order patterns by day of week');
3. Report Generation
-- Daily report
SELECT generate_query('daily sales summary for today');
-- Customer analysis
SELECT generate_query('customer segmentation by order frequency');
-- Product performance
SELECT generate_query('product performance metrics for last month');
Working with Results
1. Copying and Executing Queries
-- Method 1: Copy-paste the result
SELECT generate_query('show top customers');
-- Copy the returned SQL and execute it separately
-- Method 2: Store in a variable (if your client supports it)
\set query `SELECT generate_query('show top customers')`
:query
2. Query Refinement
If the generated query isn’t quite right:
-- Try rephrasing
SELECT generate_query('show customers with highest total orders');
-- vs
SELECT generate_query('rank customers by total order value');
-- Be more specific
SELECT generate_query('show customers with more than 5 orders and total value over $1000');
3. Validation
Always review generated queries before using in production:
-- Check the generated query
SELECT generate_query('delete old records');
-- Review before executing - the extension won't generate dangerous queries,
-- but always verify the logic matches your intent
Error Handling
Common Issues and Solutions
“No tables found”
-- Check your database has user tables
SELECT get_database_tables();
“Query too complex”
-- Break down into simpler parts
-- Instead of: "show customers with orders and products and categories"
-- Try: "show customers with their order details"
“Ambiguous column reference”
-- Be more specific about table/column names
-- Instead of: "show orders with customer info"
-- Try: "show orders with customer names from users table"
Performance Considerations
Query Limits
All queries automatically include LIMIT clauses for safety:
-- This automatically gets LIMIT 1000 (configurable)
SELECT generate_query('show all orders');
Large Datasets
For large tables, be specific about what you need:
-- Instead of: "show all user data"
-- Use: "show user summary with names and creation dates"
Indexes
Generated queries work best with proper indexes:
-- Create indexes on commonly filtered columns
CREATE INDEX idx_users_city ON users(city);
CREATE INDEX idx_orders_date ON orders(order_date);
CREATE INDEX idx_orders_status ON orders(status);
Integration Patterns
1. Application Development
# Python example (connection parameters are placeholders)
import psycopg2
conn = psycopg2.connect("dbname=your_database")
cursor = conn.cursor()
cursor.execute("SELECT generate_query(%s)", ['show recent user activity'])
query = cursor.fetchone()[0]
cursor.execute(query)
results = cursor.fetchall()
2. Business Intelligence Tools
Many BI tools can use the extension to generate queries dynamically based on user input.
3. Data Documentation
Use the extension to generate example queries for documentation:
-- Generate examples for each table
SELECT generate_query('show sample data from ' || table_name)
FROM information_schema.tables
WHERE table_schema = 'public';
Next Steps
- Explore the Function Reference for complete API details
- Check out Examples for more complex use cases
- Learn about Schema Discovery to understand how the extension analyzes your database
Query Performance Analysis
Learn how to use the explain_query function to analyze and optimize your PostgreSQL queries with AI-powered insights.
Introduction
Query performance analysis is crucial for maintaining fast, efficient databases. The explain_query function combines PostgreSQL’s EXPLAIN ANALYZE with advanced AI analysis to provide actionable insights for query optimization.
Getting Started
Basic Analysis
Start with a simple query to understand the output format:
SELECT explain_query('SELECT id, email FROM users WHERE active = true LIMIT 100');
This will give you:
- How PostgreSQL executes the query
- Whether indexes are being used
- Performance metrics and timing
- AI-generated optimization suggestions
Analyzing Slow Queries
If you have a query that’s running slowly:
SELECT explain_query('
SELECT u.username, COUNT(o.id) as total_orders
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > ''2024-01-01''
GROUP BY u.username
ORDER BY total_orders DESC
');
Look for:
- Sequential scans on large tables
- Missing indexes
- Inefficient join operations
- Sort operations without indexes
Common Performance Patterns
Index Usage Analysis
Checking if Indexes are Used
-- This should use an index if one exists on user_id
SELECT explain_query('SELECT * FROM orders WHERE user_id = 123');
-- This might not use an index efficiently
SELECT explain_query('SELECT * FROM orders WHERE UPPER(status) = ''PENDING''');
Composite Index Analysis
-- Check if a composite index would help
SELECT explain_query('
SELECT * FROM orders
WHERE status = ''completed''
AND created_at > NOW() - INTERVAL ''30 days''
ORDER BY created_at DESC
');
Look for recommendations about creating indexes like:
CREATE INDEX idx_orders_status_created ON orders(status, created_at);
Join Performance
Nested Loop vs Hash Join
-- Small result set - might use nested loop
SELECT explain_query('
SELECT u.email, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.id = 123
');
-- Large result set - should use hash or merge join
SELECT explain_query('
SELECT u.username, COUNT(o.id)
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.username
');
Optimizing Join Order
-- Multiple joins - check join order efficiency
SELECT explain_query('
SELECT u.username, p.name as product, oi.quantity
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.created_at > NOW() - INTERVAL ''7 days''
');
Aggregation Performance
GROUP BY Analysis
-- Check if grouping is efficient
SELECT explain_query('
SELECT DATE(created_at), COUNT(*)
FROM orders
WHERE created_at >= ''2024-01-01''
GROUP BY DATE(created_at)
ORDER BY DATE(created_at)
');
Window Functions
-- Analyze window function performance
SELECT explain_query('
SELECT
user_id,
created_at,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) as order_number
FROM orders
WHERE created_at > NOW() - INTERVAL ''1 month''
');
Optimization Workflows
Before and After Comparison
1. Analyze Original Query
SELECT explain_query('
SELECT * FROM orders o, users u
WHERE o.user_id = u.id AND o.total > 100
');
2. Apply Optimizations
Based on the AI recommendations, rewrite the query:
SELECT explain_query('
SELECT o.id, o.total, u.username
FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE o.total > 100
');
3. Create Recommended Indexes
-- Example recommendation from AI analysis
CREATE INDEX idx_orders_total ON orders(total) WHERE total > 0;
4. Re-analyze
SELECT explain_query('
SELECT o.id, o.total, u.username
FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE o.total > 100
');
Systematic Query Review
Create a systematic approach to review your application’s queries:
-- Create a function to analyze multiple queries
CREATE OR REPLACE FUNCTION analyze_app_queries()
RETURNS TABLE(query_name text, analysis text) AS $$
BEGIN
RETURN QUERY
SELECT 'user_dashboard'::text,
explain_query('SELECT * FROM users WHERE last_login > NOW() - INTERVAL ''30 days''')
UNION ALL
SELECT 'order_summary'::text,
explain_query('SELECT COUNT(*), SUM(total) FROM orders WHERE created_at >= CURRENT_DATE')
UNION ALL
SELECT 'top_products'::text,
explain_query('
SELECT p.name, SUM(oi.quantity) as sold
FROM products p
JOIN order_items oi ON p.id = oi.product_id
JOIN orders o ON oi.order_id = o.id
WHERE o.created_at >= CURRENT_DATE - INTERVAL ''7 days''
GROUP BY p.id, p.name
ORDER BY sold DESC
LIMIT 10
');
END;
$$ LANGUAGE plpgsql;
-- Run the analysis
SELECT * FROM analyze_app_queries();
Performance Monitoring
Regular Health Checks
Set up regular performance monitoring:
-- Monitor daily active users query
SELECT explain_query('
SELECT COUNT(DISTINCT user_id)
FROM user_sessions
WHERE DATE(created_at) = CURRENT_DATE
');
-- Monitor order processing performance
SELECT explain_query('
SELECT AVG(processing_time_seconds)
FROM orders
WHERE status = ''completed''
AND created_at >= CURRENT_DATE
');
Automated Analysis
Create views for continuous monitoring:
CREATE VIEW query_performance_monitor AS
SELECT
'recent_orders' as query_type,
explain_query('SELECT * FROM orders WHERE created_at > NOW() - INTERVAL ''1 hour''') as analysis
UNION ALL
SELECT
'active_sessions',
explain_query('SELECT COUNT(*) FROM user_sessions WHERE last_activity > NOW() - INTERVAL ''15 minutes''');
Advanced Analysis Techniques
Subquery vs JOIN Performance
-- Analyze EXISTS subquery
SELECT explain_query('
SELECT u.username
FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.user_id = u.id
AND o.total > 1000
)
');
-- Compare with JOIN approach
SELECT explain_query('
SELECT DISTINCT u.username
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.total > 1000
');
CTE vs Subquery Analysis
-- Common Table Expression approach
SELECT explain_query('
WITH high_value_customers AS (
SELECT user_id, SUM(total) as total_spent
FROM orders
WHERE created_at > NOW() - INTERVAL ''1 year''
GROUP BY user_id
HAVING SUM(total) > 5000
)
SELECT u.username, hvc.total_spent
FROM users u
JOIN high_value_customers hvc ON u.id = hvc.user_id
');
-- Subquery approach
SELECT explain_query('
SELECT u.username, sub.total_spent
FROM users u
JOIN (
SELECT user_id, SUM(total) as total_spent
FROM orders
WHERE created_at > NOW() - INTERVAL ''1 year''
GROUP BY user_id
HAVING SUM(total) > 5000
) sub ON u.id = sub.user_id
');
Partition Analysis
For partitioned tables:
-- Check if partition pruning is working
SELECT explain_query('
SELECT COUNT(*)
FROM orders_2024
WHERE created_at BETWEEN ''2024-06-01'' AND ''2024-06-30''
');
Interpreting AI Recommendations
Understanding Cost Estimates
- Total Cost: PostgreSQL’s relative measure of query expense
- Startup Cost: Cost before returning first row
- Execution Time: Actual time taken (from ANALYZE)
- Rows: Estimated vs actual rows processed
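For orientation, here is how these values appear in a single plan line from EXPLAIN ANALYZE (the numbers are illustrative only; your output will differ):
Seq Scan on orders  (cost=0.00..1245.00 rows=50000 width=85) (actual time=0.012..8.341 rows=49876 loops=1)
-- cost=0.00..1245.00: startup cost .. total cost, in PostgreSQL's relative units
-- rows=50000: the planner's estimate; compare it with the actual rows=49876
-- actual time=0.012..8.341: real execution time in milliseconds, reported because ANALYZE was used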
Index Recommendations
The AI will suggest:
- Simple indexes: CREATE INDEX idx_table_column ON table(column);
- Composite indexes: CREATE INDEX idx_table_multi ON table(col1, col2);
- Partial indexes: CREATE INDEX idx_table_partial ON table(column) WHERE condition;
- Expression indexes: CREATE INDEX idx_table_expr ON table(LOWER(column));
Query Rewriting Suggestions
Common recommendations include:
- Rewriting WHERE clauses to be more selective
- Changing join order or join types
- Using LIMIT more effectively
- Replacing correlated subqueries with joins (see the sketch below)
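As a minimal sketch of that last suggestion (using the same sample tables as the earlier examples), compare a correlated subquery with its join rewrite:
-- Correlated subquery: the inner query runs once per user row
SELECT explain_query('
SELECT u.username,
(SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) as order_count
FROM users u
');
-- Join rewrite: a single aggregation over orders
SELECT explain_query('
SELECT u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.username
');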
Troubleshooting Common Issues
No Index Usage
If indexes aren’t being used:
-- Check if the query can use the index
SELECT explain_query('SELECT * FROM users WHERE email = ''user@example.com''');
-- Vs a query that can't use the index efficiently
SELECT explain_query('SELECT * FROM users WHERE email LIKE ''%@example.com''');
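If the filter wraps the column in a function (like the UPPER(status) example earlier) or uses a leading wildcard, a plain b-tree index on the column generally cannot be used. Two possible remedies, sketched here with illustrative index names:
-- Expression index: supports filters on a function of the column
CREATE INDEX idx_orders_status_upper ON orders (UPPER(status));
-- Trigram index (requires the pg_trgm extension): can support LIKE patterns with a leading wildcard
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_email_trgm ON users USING gin (email gin_trgm_ops);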
Slow Aggregations
For slow GROUP BY queries:
-- Analyze aggregation performance
SELECT explain_query('
SELECT product_category, COUNT(*), AVG(price)
FROM products
GROUP BY product_category
');
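If the analysis shows a sequential scan feeding the aggregation, one option worth testing (not a guaranteed win) is a covering index on the grouped and aggregated columns, which can allow an index-only scan:
CREATE INDEX idx_products_category_price ON products (product_category, price);
-- Re-run explain_query on the same statement afterwards to confirm the improvement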
Large Result Sets
When dealing with large results:
-- Use LIMIT to test query efficiency
SELECT explain_query('
SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 100
');
Best Practices
- Start Small: Begin with LIMIT to understand query patterns
- Focus on Bottlenecks: Look for the most expensive operations first
- Test Incrementally: Make one optimization at a time
- Validate Results: Ensure optimizations actually improve performance
- Monitor Regularly: Set up regular analysis of critical queries
- Document Changes: Keep track of optimizations and their impact
Integration with Development Workflow
Code Review Process
-- Add performance analysis to your development process
CREATE OR REPLACE FUNCTION review_new_query(query_text text)
RETURNS text AS $$
BEGIN
RETURN explain_query(query_text);
END;
$$ LANGUAGE plpgsql;
-- Use in code reviews
SELECT review_new_query('SELECT * FROM new_feature_table WHERE complex_condition = true');
CI/CD Integration
Create performance tests:
-- Performance regression test
DO $$
DECLARE
analysis text;
BEGIN
SELECT explain_query('SELECT * FROM critical_table WHERE important_column = 123') INTO analysis;
-- Log or validate the analysis
IF analysis LIKE '%Sequential Scan%' THEN
RAISE WARNING 'Performance regression detected: Sequential scan in critical query';
END IF;
END;
$$;
Next Steps
- Function Reference - Complete API documentation
- Best Practices - Advanced optimization techniques
- Integration Patterns - Using with other tools
- Troubleshooting - Common issues and solutions
Examples
This page provides practical examples of using pg_ai_query for common database tasks and use cases.
Sample Database
For these examples, we’ll use a sample e-commerce database with the following tables:
-- Users table
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(150) UNIQUE NOT NULL,
age INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
city VARCHAR(50),
status VARCHAR(20) DEFAULT 'active'
);
-- Products table
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
category VARCHAR(50),
price DECIMAL(10,2),
stock_quantity INTEGER DEFAULT 0,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Orders table
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
total_amount DECIMAL(10,2),
status VARCHAR(20) DEFAULT 'pending',
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
shipping_city VARCHAR(50)
);
-- Order items table
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INTEGER REFERENCES orders(id),
product_id INTEGER REFERENCES products(id),
quantity INTEGER,
unit_price DECIMAL(10,2)
);
Basic Queries
Simple Data Retrieval
-- Show all users
SELECT generate_query('show me all users');
-- Result: SELECT id, name, email, age, created_at, city, status FROM public.users LIMIT 1000;
-- List products with their prices
SELECT generate_query('show product names and prices');
-- Result: SELECT name, price FROM public.products LIMIT 1000;
-- Get recent orders
SELECT generate_query('show orders from the last 7 days');
-- Result: SELECT * FROM public.orders WHERE order_date >= CURRENT_DATE - INTERVAL '7 days' LIMIT 1000;
Filtered Searches
-- Find active users
SELECT generate_query('find all active users');
-- Result: SELECT * FROM public.users WHERE status = 'active' LIMIT 1000;
-- Search by city
SELECT generate_query('show users from New York');
-- Result: SELECT * FROM public.users WHERE city = 'New York' LIMIT 1000;
-- Price range filtering
SELECT generate_query('find products under $50');
-- Result: SELECT * FROM public.products WHERE price < 50 LIMIT 1000;
-- Age-based filtering
SELECT generate_query('show users between 25 and 35 years old');
-- Result: SELECT * FROM public.users WHERE age BETWEEN 25 AND 35 LIMIT 1000;
Aggregation and Analytics
Counting and Totals
-- Count users by city
SELECT generate_query('count users by city');
-- Result: SELECT city, COUNT(*) as count FROM public.users GROUP BY city LIMIT 1000;
-- Total revenue
SELECT generate_query('calculate total revenue from all orders');
-- Result: SELECT SUM(total_amount) as total_revenue FROM public.orders;
-- Average order value
SELECT generate_query('what is the average order value');
-- Result: SELECT AVG(total_amount) as average_order_value FROM public.orders;
-- Product inventory summary
SELECT generate_query('show total stock quantity by product category');
-- Result: SELECT category, SUM(stock_quantity) as total_stock FROM public.products GROUP BY category LIMIT 1000;
Time-Based Analysis
-- Monthly sales trends
SELECT generate_query('show monthly sales for this year');
-- Result: SELECT DATE_TRUNC('month', order_date) as month, SUM(total_amount) as monthly_sales
-- FROM public.orders WHERE EXTRACT(year FROM order_date) = EXTRACT(year FROM CURRENT_DATE)
-- GROUP BY DATE_TRUNC('month', order_date) ORDER BY month LIMIT 1000;
-- Daily order count
SELECT generate_query('count orders by day for the last 30 days');
-- Result: SELECT DATE(order_date) as order_day, COUNT(*) as order_count
-- FROM public.orders WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
-- GROUP BY DATE(order_date) ORDER BY order_day LIMIT 1000;
-- User registration trends
SELECT generate_query('show user registrations by month');
-- Result: SELECT DATE_TRUNC('month', created_at) as month, COUNT(*) as new_users
-- FROM public.users GROUP BY DATE_TRUNC('month', created_at) ORDER BY month LIMIT 1000;
Complex Joins and Relationships
Customer Order Analysis
-- Orders with customer information
SELECT generate_query('show orders with customer names and emails');
-- Result: SELECT o.id, o.total_amount, o.status, o.order_date, u.name, u.email
-- FROM public.orders o JOIN public.users u ON o.user_id = u.id LIMIT 1000;
-- Customer lifetime value
SELECT generate_query('calculate total spent by each customer');
-- Result: SELECT u.name, u.email, SUM(o.total_amount) as total_spent
-- FROM public.users u LEFT JOIN public.orders o ON u.id = o.user_id
-- GROUP BY u.id, u.name, u.email ORDER BY total_spent DESC LIMIT 1000;
-- Top customers by order count
SELECT generate_query('find customers with the most orders');
-- Result: SELECT u.name, u.email, COUNT(o.id) as order_count
-- FROM public.users u LEFT JOIN public.orders o ON u.id = o.user_id
-- GROUP BY u.id, u.name, u.email ORDER BY order_count DESC LIMIT 1000;
Product and Sales Analysis
-- Best-selling products
SELECT generate_query('show top selling products by quantity');
-- Result: SELECT p.name, SUM(oi.quantity) as total_sold
-- FROM public.products p JOIN public.order_items oi ON p.id = oi.product_id
-- GROUP BY p.id, p.name ORDER BY total_sold DESC LIMIT 1000;
-- Revenue by product category
SELECT generate_query('calculate revenue by product category');
-- Result: SELECT p.category, SUM(oi.quantity * oi.unit_price) as category_revenue
-- FROM public.products p JOIN public.order_items oi ON p.id = oi.product_id
-- GROUP BY p.category ORDER BY category_revenue DESC LIMIT 1000;
-- Detailed order breakdown
SELECT generate_query('show order details with product names and quantities');
-- Result: SELECT o.id, o.order_date, u.name as customer_name, p.name as product_name,
-- oi.quantity, oi.unit_price, (oi.quantity * oi.unit_price) as line_total
-- FROM public.orders o
-- JOIN public.users u ON o.user_id = u.id
-- JOIN public.order_items oi ON o.id = oi.order_id
-- JOIN public.products p ON oi.product_id = p.id LIMIT 1000;
Business Intelligence Queries
Customer Segmentation
-- High-value customers
SELECT generate_query('find customers who have spent more than $1000 total');
-- Result: SELECT u.name, u.email, SUM(o.total_amount) as total_spent
-- FROM public.users u JOIN public.orders o ON u.id = o.user_id
-- GROUP BY u.id, u.name, u.email
-- HAVING SUM(o.total_amount) > 1000 ORDER BY total_spent DESC LIMIT 1000;
-- Customer activity segments
SELECT generate_query('categorize customers by number of orders');
-- Result: SELECT
-- CASE
-- WHEN order_count = 0 THEN 'No Orders'
-- WHEN order_count BETWEEN 1 AND 3 THEN 'Low Activity'
-- WHEN order_count BETWEEN 4 AND 10 THEN 'Medium Activity'
-- ELSE 'High Activity'
-- END as customer_segment,
-- COUNT(*) as customer_count
-- FROM (
-- SELECT u.id, COUNT(o.id) as order_count
-- FROM public.users u LEFT JOIN public.orders o ON u.id = o.user_id
-- GROUP BY u.id
-- ) customer_orders
-- GROUP BY customer_segment LIMIT 1000;
-- Inactive customers
SELECT generate_query('find customers who haven''t ordered in the last 90 days');
-- Result: SELECT u.name, u.email, MAX(o.order_date) as last_order_date
-- FROM public.users u LEFT JOIN public.orders o ON u.id = o.user_id
-- GROUP BY u.id, u.name, u.email
-- HAVING MAX(o.order_date) < CURRENT_DATE - INTERVAL '90 days'
-- OR MAX(o.order_date) IS NULL LIMIT 1000;
Inventory and Product Analysis
-- Low stock alerts
SELECT generate_query('show products with less than 10 items in stock');
-- Result: SELECT name, stock_quantity FROM public.products WHERE stock_quantity < 10 LIMIT 1000;
-- Product performance metrics
SELECT generate_query('show product sales performance with revenue and quantity sold');
-- Result: SELECT p.name, p.category,
-- COUNT(oi.id) as times_ordered,
-- SUM(oi.quantity) as total_quantity_sold,
-- SUM(oi.quantity * oi.unit_price) as total_revenue
-- FROM public.products p
-- LEFT JOIN public.order_items oi ON p.id = oi.product_id
-- GROUP BY p.id, p.name, p.category
-- ORDER BY total_revenue DESC LIMIT 1000;
-- Seasonal trends
SELECT generate_query('show monthly product sales trends');
-- Result: SELECT p.category,
-- DATE_TRUNC('month', o.order_date) as month,
-- SUM(oi.quantity) as quantity_sold
-- FROM public.products p
-- JOIN public.order_items oi ON p.id = oi.product_id
-- JOIN public.orders o ON oi.order_id = o.id
-- GROUP BY p.category, DATE_TRUNC('month', o.order_date)
-- ORDER BY month, category LIMIT 1000;
Advanced Analytics
Cohort Analysis
-- Customer acquisition by month
SELECT generate_query('show new customer registrations by month with their first order date');
-- Result: SELECT DATE_TRUNC('month', u.created_at) as registration_month,
-- COUNT(*) as new_customers,
-- COUNT(first_order.user_id) as customers_with_orders
-- FROM public.users u
-- LEFT JOIN (
-- SELECT user_id, MIN(order_date) as first_order_date
-- FROM public.orders
-- GROUP BY user_id
-- ) first_order ON u.id = first_order.user_id
-- GROUP BY DATE_TRUNC('month', u.created_at)
-- ORDER BY registration_month LIMIT 1000;
Geographic Analysis
-- Sales by shipping location
SELECT generate_query('show total sales by shipping city');
-- Result: SELECT shipping_city,
-- COUNT(*) as order_count,
-- SUM(total_amount) as total_sales
-- FROM public.orders
-- WHERE shipping_city IS NOT NULL
-- GROUP BY shipping_city
-- ORDER BY total_sales DESC LIMIT 1000;
-- Customer distribution
SELECT generate_query('show customer count and average order value by city');
-- Result: SELECT u.city,
-- COUNT(DISTINCT u.id) as customer_count,
-- COUNT(o.id) as total_orders,
-- AVG(o.total_amount) as avg_order_value
-- FROM public.users u
-- LEFT JOIN public.orders o ON u.id = o.user_id
-- GROUP BY u.city
-- ORDER BY customer_count DESC LIMIT 1000;
Data Quality and Validation
Data Integrity Checks
-- Find orphaned records
SELECT generate_query('find orders without valid customer references');
-- Result: SELECT o.* FROM public.orders o
-- LEFT JOIN public.users u ON o.user_id = u.id
-- WHERE u.id IS NULL LIMIT 1000;
-- Missing email addresses
SELECT generate_query('find users with missing or invalid email addresses');
-- Result: SELECT * FROM public.users
-- WHERE email IS NULL OR email = '' OR email NOT LIKE '%@%' LIMIT 1000;
-- Duplicate detection
SELECT generate_query('find duplicate user email addresses');
-- Result: SELECT email, COUNT(*) as count
-- FROM public.users
-- GROUP BY email
-- HAVING COUNT(*) > 1 LIMIT 1000;
Data Range Validation
-- Unusual values
SELECT generate_query('find orders with negative amounts');
-- Result: SELECT * FROM public.orders WHERE total_amount < 0 LIMIT 1000;
-- Age validation
SELECT generate_query('find users with unrealistic ages');
-- Result: SELECT * FROM public.users WHERE age < 0 OR age > 120 LIMIT 1000;
-- Future dates
SELECT generate_query('find orders with future dates');
-- Result: SELECT * FROM public.orders WHERE order_date > CURRENT_TIMESTAMP LIMIT 1000;
Performance Monitoring
Database Statistics
-- Table sizes and row counts
SELECT generate_query('show row count for each table');
-- Result: This will generate appropriate queries to count rows in each table
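-- The exact SQL depends on the model and your schema; one common form (illustrative only)
-- uses PostgreSQL's statistics view for fast approximate counts:
-- SELECT relname AS table_name, n_live_tup AS approximate_row_count
-- FROM pg_stat_user_tables
-- ORDER BY approximate_row_count DESC;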
-- Activity monitoring
SELECT generate_query('show order activity by hour of day');
-- Result: SELECT EXTRACT(hour FROM order_date) as hour_of_day,
-- COUNT(*) as order_count
-- FROM public.orders
-- GROUP BY EXTRACT(hour FROM order_date)
-- ORDER BY hour_of_day LIMIT 1000;
Tips for Better Results
1. Be Specific
-- Instead of: "show sales"
SELECT generate_query('show total sales amount by month for 2024');
-- Instead of: "find customers"
SELECT generate_query('find customers with more than 5 orders and total spent over $500');
2. Use Clear Time References
-- Good time references:
SELECT generate_query('show orders from last 30 days');
SELECT generate_query('find users registered between January 1, 2024 and March 31, 2024');
SELECT generate_query('show sales for Q1 2024');
3. Specify Sorting and Limits
-- Include sorting preferences:
SELECT generate_query('show top 10 customers by total revenue');
SELECT generate_query('list products ordered by price ascending');
SELECT generate_query('show recent orders sorted by date descending');
4. Reference Column Names When Needed
-- When ambiguous, specify column names:
SELECT generate_query('find users where user status is active');
SELECT generate_query('show products where stock quantity is less than reorder level');
These examples demonstrate the flexibility and power of pg_ai_query for various database tasks, from simple queries to complex business intelligence analysis.
Enhanced Response Formatting Examples
The following examples show how different response configuration options affect the output:
Example: Enhanced Text Response
Configuration:
[response]
show_explanation = true
show_warnings = true
show_suggested_visualization = true
use_formatted_response = false
Query:
SELECT generate_query('monthly sales trend for the last year with top performing categories');
Enhanced Response:
SELECT
DATE_TRUNC('month', o.order_date) as month,
p.category,
SUM(o.total_amount) as monthly_sales,
COUNT(o.id) as order_count
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY DATE_TRUNC('month', o.order_date), p.category
ORDER BY month DESC, monthly_sales DESC
LIMIT 1000;
-- Explanation:
-- Analyzes monthly sales trends over the past year, broken down by product category.
-- Shows total sales amount and order count for each category per month,
-- helping identify seasonal patterns and top-performing categories over time.
-- Warnings:
-- 1. Multiple JOINs: Query involves 3 table joins which may impact performance on large datasets
-- 2. Date calculation: Uses CURRENT_DATE which may give different results when run at different times
-- 3. Index recommendation: Consider indexes on order_date, order_id, and product_id for optimal performance
-- Suggested Visualization: line
-- Reasoning: Time series data showing trends over months is best displayed as line charts,
-- with separate lines for each category to compare performance patterns
Example: JSON Response
Configuration:
[response]
show_explanation = true
show_warnings = true
show_suggested_visualization = true
use_formatted_response = true
Query:
SELECT generate_query('top 10 customers by lifetime value');
JSON Response:
{
"query": "SELECT u.id, u.name, u.email, SUM(o.total_amount) as lifetime_value, COUNT(o.id) as total_orders FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.id, u.name, u.email ORDER BY lifetime_value DESC LIMIT 10;",
"success": true,
"explanation": "Identifies the top 10 customers based on their total lifetime purchase value. Shows customer details along with their total spending and number of orders, useful for customer relationship management and targeted marketing.",
"warnings": [
"Customer segmentation: Consider including date ranges for more meaningful lifetime value calculations",
"Performance: Large customer and orders tables may require optimization with proper indexing"
],
"suggested_visualization": "bar",
"row_limit_applied": true
}
Example: Minimal Response
Configuration:
[response]
show_explanation = false
show_warnings = false
show_suggested_visualization = false
use_formatted_response = false
Query:
SELECT generate_query('all products in electronics category');
Minimal Response:
SELECT * FROM products WHERE category = 'electronics' LIMIT 1000;
Example: Warnings Only
Configuration:
[response]
show_explanation = false
show_warnings = true
show_suggested_visualization = false
use_formatted_response = false
Query:
SELECT generate_query('delete old order records from 2020');
Response with Warnings:
DELETE FROM orders WHERE EXTRACT(year FROM order_date) = 2020;
-- Warnings:
-- 1. DESTRUCTIVE OPERATION: This DELETE statement will permanently remove data
-- 2. No transaction control: Consider wrapping in BEGIN/COMMIT for safety
-- 3. Foreign key impact: Deleting orders may affect order_items table if cascade is not set
-- 4. Backup recommendation: Ensure you have recent backups before executing
-- 5. Row count check: Consider SELECT COUNT(*) first to verify the number of rows that will be deleted
Working with JSON Responses in Applications
Python Example:
import psycopg2
import json
# Connect to database
conn = psycopg2.connect("dbname=mydb user=myuser")
cur = conn.cursor()
# Execute query with JSON response
cur.execute("SELECT generate_query('monthly sales by product category')")
response_json = cur.fetchone()[0]
# Parse JSON response
response = json.loads(response_json)
if response['success']:
# Extract query and execute it
data_query = response['query']
cur.execute(data_query)
results = cur.fetchall()
# Use metadata for visualization
viz_type = response.get('suggested_visualization', 'table')
explanation = response.get('explanation', '')
warnings = response.get('warnings', [])
print(f"Query explanation: {explanation}")
print(f"Suggested visualization: {viz_type}")
if warnings:
print("Warnings:")
for warning in warnings:
print(f" - {warning}")
# Process results based on suggested visualization
if viz_type == 'line':
# Create line chart
create_line_chart(results)
elif viz_type == 'bar':
# Create bar chart
create_bar_chart(results)
else:
# Display as table
display_table(results)
These enhanced formatting examples show how the new configuration options provide flexible control over response detail and format, making the extension suitable for both interactive use and application integration.
Schema Discovery
The pg_ai_query extension includes an intelligent schema discovery system that automatically analyzes your PostgreSQL database structure to provide context for AI-powered query generation. This page explains how schema discovery works and how you can optimize it.
Overview
Schema discovery is the process by which the extension examines your database to understand:
- Tables and Views: What data structures exist
- Columns and Data Types: What fields are available and their types
- Relationships: How tables connect through foreign keys
- Constraints: Primary keys, unique constraints, and check constraints
- Indexes: Available indexes for query optimization
- Statistics: Row counts and table sizes for context
This information is then used to generate accurate, well-structured SQL queries from your natural language requests.
How Schema Discovery Works
1. Automatic Triggering
Schema discovery happens automatically when:
- The extension is first used in a PostgreSQL session
- You explicitly call schema discovery functions
- The AI needs context to generate a query
-- These actions trigger schema discovery:
SELECT generate_query('show users'); -- Automatic discovery
SELECT get_database_tables(); -- Explicit discovery
SELECT get_table_details('users'); -- Detailed table analysis
2. Discovery Process
The extension follows this process:
Step 1: Table Enumeration
-- Extension queries information_schema to find user tables
-- Equivalent to:
SELECT table_name, table_schema, table_type
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'pg_catalog', 'pg_toast');
Step 2: Column Analysis
-- For each table, analyze columns
-- Equivalent to:
SELECT column_name, data_type, is_nullable, column_default
FROM information_schema.columns
WHERE table_name = 'your_table';
Step 3: Constraint Discovery
-- Find primary keys, foreign keys, and unique constraints
-- Equivalent to:
SELECT constraint_name, constraint_type, column_name
FROM information_schema.table_constraints tc
JOIN information_schema.constraint_column_usage ccu
ON tc.constraint_name = ccu.constraint_name;
Step 4: Relationship Mapping
-- Discover foreign key relationships
-- Equivalent to:
SELECT
tc.table_name,
kcu.column_name,
ccu.table_name AS foreign_table_name,
ccu.column_name AS foreign_column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
ON ccu.constraint_name = tc.constraint_name
WHERE constraint_type = 'FOREIGN KEY';
Step 5: Statistics Collection
-- Gather table statistics for optimization
-- Equivalent to:
SELECT
schemaname,
tablename,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup
FROM pg_stat_user_tables;
3. Information Processing
The collected information is then:
- Structured into JSON format for AI consumption
- Cached in the PostgreSQL session for performance
- Filtered to exclude system tables and sensitive information
- Contextualized with business-friendly descriptions
Schema Discovery Functions
get_database_tables()
Returns high-level information about all tables in your database.
Usage:
SELECT get_database_tables();
Example Output:
[
{
"table_name": "users",
"schema_name": "public",
"table_type": "BASE TABLE",
"estimated_rows": 1500,
"table_size": "128 kB"
},
{
"table_name": "orders",
"schema_name": "public",
"table_type": "BASE TABLE",
"estimated_rows": 5000,
"table_size": "512 kB"
}
]
Information Provided:
- Table names and schemas
- Table type (BASE TABLE, VIEW, etc.)
- Estimated row counts
- Physical table sizes
get_table_details()
Returns comprehensive information about a specific table.
Usage:
SELECT get_table_details('table_name', 'schema_name');
Example Output:
{
"table_name": "users",
"schema_name": "public",
"columns": [
{
"column_name": "id",
"data_type": "integer",
"is_nullable": false,
"column_default": "nextval('users_id_seq'::regclass)",
"is_primary_key": true,
"is_unique": true
},
{
"column_name": "email",
"data_type": "character varying",
"character_maximum_length": 150,
"is_nullable": false,
"is_unique": true
},
{
"column_name": "created_at",
"data_type": "timestamp without time zone",
"is_nullable": true,
"column_default": "CURRENT_TIMESTAMP"
}
],
"constraints": [
{
"constraint_name": "users_pkey",
"constraint_type": "PRIMARY KEY",
"column_names": ["id"]
}
],
"foreign_keys": [],
"indexes": [
{
"index_name": "users_email_idx",
"columns": ["email"],
"is_unique": true
}
]
}
Optimizing Schema Discovery
1. Table and Column Naming
Use clear, descriptive names that help the AI understand your data:
Good Examples:
-- Clear table names
CREATE TABLE customer_orders (...);
CREATE TABLE product_categories (...);
CREATE TABLE user_preferences (...);
-- Descriptive column names
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
full_name VARCHAR(100),
email_address VARCHAR(150),
registration_date TIMESTAMP,
account_status VARCHAR(20)
);
Less Optimal:
-- Unclear abbreviations
CREATE TABLE usr (...);
CREATE TABLE prod_cat (...);
-- Ambiguous column names
CREATE TABLE users (
id INT,
nm VARCHAR(100),
em VARCHAR(150),
dt TIMESTAMP,
st VARCHAR(20)
);
2. Adding Table Comments
Enhance schema discovery with descriptive comments:
-- Table-level comments
COMMENT ON TABLE users IS 'Customer information and account details';
COMMENT ON TABLE orders IS 'Customer purchase orders and transaction history';
COMMENT ON TABLE products IS 'Product catalog with pricing and inventory';
-- Column-level comments
COMMENT ON COLUMN users.account_status IS 'User account status: active, inactive, suspended';
COMMENT ON COLUMN orders.fulfillment_status IS 'Order processing status: pending, shipped, delivered, cancelled';
3. Proper Constraint Definition
Well-defined constraints help the AI understand data relationships:
-- Primary keys
ALTER TABLE users ADD CONSTRAINT users_pkey PRIMARY KEY (user_id);
-- Foreign keys with descriptive names
ALTER TABLE orders ADD CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id) REFERENCES users(user_id);
-- Meaningful unique constraints
ALTER TABLE users ADD CONSTRAINT unique_user_email
UNIQUE (email_address);
-- Check constraints with business logic
ALTER TABLE orders ADD CONSTRAINT check_order_amount_positive
CHECK (total_amount > 0);
4. Strategic Indexing
Indexes help the AI understand query optimization opportunities:
-- Indexes on frequently queried columns
CREATE INDEX idx_users_registration_date ON users(registration_date);
CREATE INDEX idx_orders_order_date ON orders(order_date);
-- Composite indexes for common query patterns
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
Schema Discovery Best Practices
1. Database Design
- Normalized Structure: Use proper normalization to make relationships clear
- Consistent Naming: Follow consistent naming conventions across tables
- Foreign Key Relationships: Always define foreign key constraints
- Meaningful Defaults: Use sensible default values where appropriate
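A minimal sketch that applies these four principles together (table and column names are illustrative):
CREATE TABLE customer_addresses (
  address_id SERIAL PRIMARY KEY,                                       -- consistent *_id naming
  customer_id INTEGER NOT NULL REFERENCES customers(customer_id),      -- explicit foreign key relationship
  address_type VARCHAR(20) NOT NULL DEFAULT 'shipping',                -- meaningful default
  city VARCHAR(50) NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);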
2. Documentation
-- Document complex business rules
COMMENT ON TABLE order_items IS 'Individual line items within customer orders, linked to product catalog';
COMMENT ON COLUMN products.discontinued_date IS 'Date when product was discontinued; NULL means still active';
3. Data Types
Choose appropriate data types that convey meaning:
-- Good: Specific types
CREATE TABLE events (
event_date DATE, -- Clearly a date
event_time TIME, -- Clearly a time
duration_minutes INTEGER, -- Clearly numeric
event_type VARCHAR(50), -- Clearly textual with reasonable limit
is_public BOOLEAN -- Clearly boolean
);
-- Less clear: Generic types
CREATE TABLE events (
event_date TEXT, -- Could be anything
event_time TEXT, -- Ambiguous format
duration_minutes TEXT, -- Should be numeric
event_type TEXT, -- No length constraint
is_public TEXT -- Should be boolean
);
4. Schema Organization
Organize related tables logically:
-- Group related tables with consistent prefixes
CREATE TABLE user_accounts (...);
CREATE TABLE user_preferences (...);
CREATE TABLE user_sessions (...);
CREATE TABLE product_catalog (...);
CREATE TABLE product_categories (...);
CREATE TABLE product_reviews (...);
CREATE TABLE order_headers (...);
CREATE TABLE order_items (...);
CREATE TABLE order_shipments (...);
Performance Considerations
Schema Discovery Caching
- Session-level Caching: Schema information is cached per PostgreSQL session
- Cache Invalidation: Schema cache is cleared when the session ends
- Refresh Triggers: Schema is re-analyzed if tables are modified
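Because the cache is per session, a simple pattern is to warm it right after connecting so the first generate_query call does not pay the discovery cost:
-- Warm the schema cache once at the start of the session
SELECT get_database_tables();
-- Later calls in the same session reuse the cached schema
SELECT generate_query('show recent orders');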
Optimization Tips
- Minimize Table Count: Large numbers of tables slow down discovery
- Use Views: Create views to present simplified interfaces to complex schemas
- Partition Strategy: Consider table partitioning for very large tables
- Statistics Updates: Keep table statistics current with ANALYZE
-- Update statistics for better schema discovery
ANALYZE users;
ANALYZE orders;
-- Or update all tables
ANALYZE;
Security Considerations
What Schema Discovery Accesses
✅ Accessed Information:
- User table structures
- Column names and types
- Constraint definitions
- Index information
- Table statistics
❌ Not Accessed:
- Actual data content
- System catalog details
- User passwords or sensitive data
- Database configuration
Privacy Protection
The schema discovery system:
- Only reads metadata, never actual data
- Respects PostgreSQL’s permission system
- Excludes system tables and schemas
- Never transmits sensitive information to external APIs
Troubleshooting Schema Discovery
Common Issues
Problem: “No tables found”
-- Check if tables exist
SELECT table_name FROM information_schema.tables
WHERE table_schema = 'public' AND table_type = 'BASE TABLE';
-- Verify permissions
SELECT has_table_privilege('users', 'SELECT');
Problem: Missing relationships
-- Check foreign key constraints
SELECT
tc.table_name,
kcu.column_name,
ccu.table_name AS foreign_table_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY';
Problem: Outdated statistics
-- Update table statistics
ANALYZE your_table_name;
-- Check last analyze time
SELECT schemaname, tablename, last_analyze
FROM pg_stat_user_tables;
Schema discovery is a powerful feature that makes your database “AI-ready” by providing the context needed for intelligent query generation. Following these best practices will help you get the most accurate and useful results from the pg_ai_query extension.
AI Providers
The pg_ai_query extension supports multiple AI providers, each with different models, capabilities, and pricing structures. This guide helps you choose the right provider and model for your use case.
Supported Providers
OpenAI
Overview: OpenAI provides the GPT family of models, known for strong natural language understanding and code generation capabilities.
API Endpoint: https://api.openai.com/v1/
Available Models:
| Model | Description | Context Length | Best For |
|---|---|---|---|
gpt-4o | Latest GPT-4 Omni model | 128,000 tokens | Complex queries, best accuracy |
gpt-4 | Standard GPT-4 model | 8,192 tokens | High-quality SQL generation |
gpt-3.5-turbo | Fast and efficient model | 4,096 tokens | Simple queries, fast responses |
Anthropic
Overview: Anthropic provides Claude models, known for thoughtful reasoning and safety-conscious outputs.
API Endpoint: https://api.anthropic.com/v1/
Available Models:
| Model | Description | Context Length | Best For |
|---|---|---|---|
claude-3-5-sonnet-20241022 | Latest Claude 3.5 Sonnet | 200,000 tokens | Complex analysis, detailed reasoning |
Provider Selection
Automatic Provider Selection
When you use provider = 'auto' (the default), the extension selects a provider based on:
- Configuration Priority: First configured provider with a valid API key
- Fallback Logic: If primary provider fails, tries others
- Default Order: OpenAI → Anthropic
-- Uses automatic provider selection
SELECT generate_query('show recent orders');
Explicit Provider Selection
You can specify which provider to use:
-- Force OpenAI
SELECT generate_query('show users', null, 'openai');
-- Force Anthropic
SELECT generate_query('show users', null, 'anthropic');
Configuration-Based Selection
Set your preferred provider in the configuration file:
[general]
# Provider priority is determined by order in config file
[openai]
api_key = "your-openai-key"
default_model = "gpt-4o"
[anthropic]
api_key = "your-anthropic-key"
default_model = "claude-3-5-sonnet-20241022"
Provider Comparison
Performance Characteristics
| Aspect | OpenAI GPT-4o | OpenAI GPT-3.5 | Anthropic Claude |
|---|---|---|---|
| Response Time | 2-5 seconds | 1-2 seconds | 3-6 seconds |
| Accuracy | Excellent | Good | Excellent |
| Complex Queries | Excellent | Fair | Excellent |
| SQL Knowledge | Excellent | Good | Very Good |
| Cost | High | Low | Medium |
Use Case Recommendations
For Simple Queries
Recommended: OpenAI GPT-3.5-turbo
-- Simple data retrieval
SELECT generate_query('show all users', null, 'openai');
-- Configuration: default_model = "gpt-3.5-turbo"
Why: Fast, cost-effective, sufficient accuracy for basic queries.
For Complex Analytics
Recommended: OpenAI GPT-4o or Anthropic Claude
-- Complex business intelligence
SELECT generate_query(
'calculate monthly customer retention rates with cohort analysis',
null,
'openai'
);
-- Configuration: default_model = "gpt-4o"
Why: Better reasoning for complex multi-table joins and advanced analytics.
For Data Exploration
Recommended: Anthropic Claude
-- Exploratory analysis
SELECT generate_query(
'analyze sales trends and identify anomalies in the last quarter',
null,
'anthropic'
);
Why: Excellent at understanding context and generating insightful queries.
For Production Systems
Recommended: OpenAI GPT-4 (balanced performance/cost)
-- Production queries
SELECT generate_query('generate daily sales report', null, 'openai');
-- Configuration: default_model = "gpt-4"
Why: Good balance of accuracy, speed, and cost for production workloads.
Model-Specific Configurations
OpenAI Configuration
[openai]
api_key = "sk-proj-your-api-key-here"
default_model = "gpt-4o"
# Optional: Model-specific settings (future feature)
# gpt_4o_temperature = 0.7
# gpt_4o_max_tokens = 16384
API Key Format: Must start with sk-proj- or sk-
Getting Started:
- Visit platform.openai.com
- Create an account and add billing information
- Generate an API key
- Add to configuration file
Anthropic Configuration
[anthropic]
api_key = "sk-ant-your-api-key-here"
default_model = "claude-3-5-sonnet-20241022"
# Optional: Model-specific settings (future feature)
# claude_temperature = 0.7
# claude_max_tokens = 8192
API Key Format: Must start with sk-ant-
Getting Started:
- Visit console.anthropic.com
- Create an account and add credits
- Generate an API key
- Add to configuration file
Cost Considerations
OpenAI Pricing (Approximate)
| Model | Input Cost | Output Cost | Typical Query Cost |
|---|---|---|---|
| GPT-4o | $2.50/1M tokens | $10.00/1M tokens | $0.02-0.05 |
| GPT-4 | $30.00/1M tokens | $60.00/1M tokens | $0.10-0.20 |
| GPT-3.5-turbo | $0.50/1M tokens | $1.50/1M tokens | $0.001-0.005 |
Anthropic Pricing (Approximate)
| Model | Input Cost | Output Cost | Typical Query Cost |
|---|---|---|---|
| Claude 3.5 Sonnet | $3.00/1M tokens | $15.00/1M tokens | $0.03-0.06 |
Prices are approximate and change frequently. Check provider websites for current pricing.
Cost Optimization Strategies
1. Use Appropriate Models
-- For simple queries, use cheaper models
SELECT generate_query('count users', null, 'openai');
-- Set: default_model = "gpt-3.5-turbo"
2. Efficient Query Descriptions
-- Be specific to reduce back-and-forth
SELECT generate_query('show users with orders in last 30 days including total order value');
3. Batch Similar Queries
-- Instead of multiple calls:
-- SELECT generate_query('show users');
-- SELECT generate_query('show user count');
-- Use one comprehensive query:
SELECT generate_query('show user list with total count');
Provider Reliability and Error Handling
Automatic Fallback
The extension includes automatic fallback logic:
-- If OpenAI fails, automatically tries Anthropic
SELECT generate_query('show data', null, 'auto');
Manual Retry with Different Provider
-- If one provider fails, try another
SELECT generate_query('complex query', null, 'openai');
-- If that fails:
SELECT generate_query('complex query', null, 'anthropic');
Error Handling Configuration
[general]
max_retries = 3 # Retry failed requests
request_timeout_ms = 30000 # Timeout for each request
Advanced Provider Features
Rate Limiting
Each provider has different rate limits:
OpenAI:
- GPT-3.5: 3,500 requests/minute
- GPT-4: 500 requests/minute (varies by tier)
Anthropic:
- Claude: 1,000 requests/minute (varies by tier)
Context Window Optimization
For large schemas, consider context window sizes:
-- Large schema? Use models with larger context windows
-- GPT-4o: 128k tokens
-- Claude 3.5: 200k tokens
SELECT generate_query('analyze relationships across all tables', null, 'anthropic');
Regional Considerations
OpenAI: Global availability with regional data centers
Anthropic: Primarily US-based with expanding regional support
Provider-Specific Best Practices
OpenAI Best Practices
1. Model Selection:
- Use GPT-3.5-turbo for development and testing
- Use GPT-4 for production workloads
- Use GPT-4o for complex analytical queries
2. Prompt Engineering:
- Be specific about desired output format
- Include relevant business context
- Specify PostgreSQL version if using advanced features
Anthropic Best Practices
1. Leveraging Reasoning:
- Claude excels at understanding business logic
- Good for queries requiring domain expertise
- Effective for complex multi-step analysis
2. Safety Features:
- Claude has built-in safety considerations
- Good for production environments with compliance needs
- Transparent about limitations and assumptions
Future Provider Support
The extension is designed to easily support additional providers:
- Google (Gemini): Planned support
- Azure OpenAI: Possible future integration
- Local Models: Exploring support for self-hosted models
Monitoring Provider Performance
Enable Logging
[general]
enable_logging = true
log_level = "INFO"
Track Performance
-- Monitor which provider is being used
-- Check logs for provider selection and response times
Performance Metrics
Key metrics to monitor:
- Response Time: How long queries take to generate
- Success Rate: Percentage of successful generations
- Cost: Monthly API usage costs
- Quality: Accuracy of generated queries
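If your application records each generation in its own log table (the ai_query_log structure below is an assumption, not something the extension creates), the first two metrics reduce to a simple aggregate:
-- Assumes: ai_query_log(provider text, success boolean, response_time_ms integer, created_at timestamp)
SELECT provider,
  COUNT(*) AS requests,
  AVG(response_time_ms) AS avg_response_ms,
  AVG(CASE WHEN success THEN 1 ELSE 0 END) AS success_rate
FROM ai_query_log
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY provider;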
Choose your AI provider based on your specific needs: cost sensitivity, query complexity, response time requirements, and accuracy needs. The extension’s flexible provider system allows you to optimize for your particular use case.
Performance & Best Practices
This guide provides recommendations for using pg_ai_query effectively in production environments, optimizing performance, and following security best practices.
Performance Optimization
Database Design for AI Query Generation
1. Schema Design
Use Descriptive Naming
-- Good: Clear, business-friendly names
CREATE TABLE customer_orders (
order_id SERIAL PRIMARY KEY,
customer_id INTEGER,
order_date TIMESTAMP,
total_amount DECIMAL(10,2),
order_status VARCHAR(20)
);
-- Avoid: Cryptic abbreviations
CREATE TABLE cust_ord (
id INT,
cid INT,
dt TIMESTAMP,
amt DECIMAL,
st VARCHAR
);
Proper Relationships
-- Always define foreign key relationships
ALTER TABLE customer_orders
ADD CONSTRAINT fk_customer_orders_customer
FOREIGN KEY (customer_id) REFERENCES customers(customer_id);
-- Add meaningful constraints
ALTER TABLE customer_orders
ADD CONSTRAINT check_positive_amount CHECK (total_amount > 0);
Strategic Indexing
-- Index frequently queried columns
CREATE INDEX idx_orders_date ON customer_orders(order_date);
CREATE INDEX idx_orders_customer ON customer_orders(customer_id);
CREATE INDEX idx_orders_status ON customer_orders(order_status);
-- Composite indexes for common query patterns
CREATE INDEX idx_orders_customer_date
ON customer_orders(customer_id, order_date);
2. Data Types and Constraints
Choose Appropriate Data Types
-- Use specific types that convey meaning
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
product_name VARCHAR(200) NOT NULL, -- Limited length
category_id INTEGER, -- Clear numeric ID
price DECIMAL(10,2), -- Exact decimal for money
is_active BOOLEAN DEFAULT true, -- Clear boolean
created_date DATE, -- Date only, not timestamp
last_updated TIMESTAMP DEFAULT NOW() -- Full timestamp for updates
);
Add Table and Column Comments
COMMENT ON TABLE products IS 'Product catalog with pricing and inventory information';
COMMENT ON COLUMN products.category_id IS 'References product_categories.category_id';
COMMENT ON COLUMN products.is_active IS 'false = discontinued product, true = available for sale';
Query Generation Performance
1. Optimize Natural Language Requests
Be Specific and Clear
-- Good: Specific request
SELECT generate_query('show customers who placed orders in the last 30 days with their total order value');
-- Less optimal: Vague request
SELECT generate_query('show customer stuff');
Include Context When Needed
-- For complex schemas, provide business context
SELECT generate_query('show quarterly revenue trends by product category for fiscal year 2024');
-- Specify table relationships when ambiguous
SELECT generate_query('show orders joined with customer information including customer names and emails');
2. Schema Size Optimization
Limit Table Discovery Scope
-- For very large schemas, use specific schemas
-- Configure to focus on business tables only
SELECT generate_query('show sales data from last month');
Use Views for Complex Schemas
-- Create business-friendly views
CREATE VIEW sales_summary AS
SELECT
o.order_id,
c.customer_name,
o.order_date,
o.total_amount,
p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id;
-- AI can then reference the simplified view
SELECT generate_query('show recent sales from sales_summary view');
3. Caching and Session Management
Understand Caching Behavior
- Schema information is cached per PostgreSQL session
- Reconnecting clears the cache and triggers re-analysis
- Keep sessions alive for better performance
Session Management
# Python example: Reuse connections
import psycopg2
from psycopg2 import pool
# Create connection pool
connection_pool = psycopg2.pool.SimpleConnectionPool(1, 20, database="mydb")
def generate_ai_query(natural_query):
conn = connection_pool.getconn()
try:
cursor = conn.cursor()
cursor.execute("SELECT generate_query(%s)", (natural_query,))
return cursor.fetchone()[0]
finally:
connection_pool.putconn(conn)
Configuration Optimization
1. Provider and Model Selection
Choose Appropriate Models
# For production: Balance speed and accuracy
[openai]
api_key = "your-key"
default_model = "gpt-4" # Good balance
# For development: Use faster models
[openai]
default_model = "gpt-3.5-turbo" # Faster, cheaper
Configure Timeouts
[general]
request_timeout_ms = 45000 # Increase for complex schemas
max_retries = 5 # Increase for production reliability
2. Logging Configuration
Production Logging
[general]
enable_logging = false # Disable for performance
enable_postgresql_elog = true # Use PostgreSQL logging only
Development Logging
[general]
enable_logging = true
log_level = "INFO" # or "DEBUG" for troubleshooting
Security Best Practices
1. API Key Security
Secure Storage
# Set proper file permissions
chmod 600 ~/.pg_ai.config
# Ensure correct ownership
chown postgres:postgres ~/.pg_ai.config # For PostgreSQL user
Configuration Files Only
# Use secure configuration files
[openai]
api_key = "your-key-here"
Key Rotation
# Regularly rotate API keys
[openai]
api_key = "new-rotated-key"
2. Database Security
Principle of Least Privilege
-- Create dedicated user for AI queries
CREATE USER ai_query_user WITH PASSWORD 'secure_password';
-- Grant only necessary permissions
GRANT SELECT ON ALL TABLES IN SCHEMA public TO ai_query_user;
GRANT USAGE ON SCHEMA public TO ai_query_user;
-- Revoke dangerous permissions
REVOKE INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public FROM ai_query_user;
Schema Isolation
-- Create separate schema for AI-accessible views
CREATE SCHEMA ai_views;
-- Create simplified, safe views
CREATE VIEW ai_views.customer_summary AS
SELECT
customer_id,
customer_name,
city,
registration_date
FROM customers
WHERE is_active = true; -- Only show active customers
-- Grant access only to AI schema
GRANT USAGE ON SCHEMA ai_views TO ai_query_user;
GRANT SELECT ON ALL TABLES IN SCHEMA ai_views TO ai_query_user;
3. Query Validation
Always Review Generated Queries
def safe_ai_query_execution(natural_query):
# Generate the query
generated_sql = generate_query(natural_query)
# Log for audit
logger.info(f"Generated SQL: {generated_sql}")
# Basic safety checks
if any(keyword in generated_sql.upper() for keyword in ['DROP', 'DELETE', 'TRUNCATE']):
raise SecurityError("Potentially dangerous SQL generated")
# Execute with read-only user
return execute_with_readonly_user(generated_sql)
Implement Query Whitelisting
-- Create function to validate query patterns
CREATE OR REPLACE FUNCTION validate_ai_query(query_text TEXT)
RETURNS BOOLEAN AS $$
BEGIN
-- Only allow SELECT statements
IF NOT query_text ILIKE 'SELECT%' THEN
RETURN FALSE;
END IF;
-- Block system table access
IF query_text ILIKE '%pg_%' OR query_text ILIKE '%information_schema%' THEN
RETURN FALSE;
END IF;
RETURN TRUE;
END;
$$ LANGUAGE plpgsql;
4. Audit and Monitoring
Query Logging
import logging
# Set up audit logging
audit_logger = logging.getLogger('ai_query_audit')
audit_handler = logging.FileHandler('/var/log/ai_queries.log')
audit_logger.addHandler(audit_handler)
def audited_generate_query(natural_query, user_id):
audit_logger.info(f"User {user_id} requested: {natural_query}")
result = generate_query(natural_query)
audit_logger.info(f"Generated for user {user_id}: {result}")
return result
Monitor API Usage
# Enable logging to monitor API costs
[general]
enable_logging = true
log_level = "INFO"
Production Deployment
1. Architecture Patterns
Application-Level Integration
class AIQueryService:
def __init__(self):
self.connection_pool = create_pool()
self.cache = {}
def generate_query(self, natural_query, user_context=None):
# Add user context to improve query generation
contextual_query = f"{natural_query}"
if user_context:
contextual_query += f" for user role {user_context['role']}"
return self._execute_ai_query(contextual_query)
def _execute_ai_query(self, query):
# Implement caching, security checks, etc.
pass
Microservice Pattern
# Docker Compose example
version: '3.8'
services:
ai-query-service:
image: your-app:latest
environment:
- DATABASE_URL=postgresql://ai_user:pass@db:5432/mydb
volumes:
- ./pg_ai_config:/etc/pg_ai
depends_on:
- database
database:
image: postgres:14
environment:
- POSTGRES_DB=mydb
volumes:
- postgres_data:/var/lib/postgresql/data
2. Error Handling
Graceful Degradation
def robust_query_generation(natural_query, fallback_provider=None):
try:
return generate_query(natural_query)
except APIError as e:
if fallback_provider:
logger.warning(f"Primary provider failed: {e}, trying {fallback_provider}")
return generate_query(natural_query, provider=fallback_provider)
else:
logger.error(f"Query generation failed: {e}")
return None
except Exception as e:
logger.error(f"Unexpected error: {e}")
return None
Retry Logic
import time
from functools import wraps
def retry_with_backoff(max_retries=3, base_delay=1):
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
for attempt in range(max_retries):
try:
return func(*args, **kwargs)
except Exception as e:
if attempt == max_retries - 1:
raise
delay = base_delay * (2 ** attempt)
time.sleep(delay)
return None
return wrapper
return decorator
@retry_with_backoff(max_retries=3)
def generate_query_with_retry(natural_query):
return generate_query(natural_query)
3. Monitoring and Alerting
Key Metrics to Monitor
- Query generation success rate
- Average response time
- API cost per query
- Error rates by provider
- Schema discovery performance
Alerting Setup
import prometheus_client
# Prometheus metrics
query_generation_duration = prometheus_client.Histogram(
'ai_query_generation_duration_seconds',
'Time spent generating AI queries'
)
query_generation_errors = prometheus_client.Counter(
'ai_query_generation_errors_total',
'Total number of query generation errors',
['provider', 'error_type']
)
@query_generation_duration.time()
def monitored_generate_query(natural_query):
try:
return generate_query(natural_query)
except Exception as e:
query_generation_errors.labels(
provider='openai',
error_type=type(e).__name__
).inc()
raise
Cost Management
1. Usage Optimization
Request Optimization
def optimize_query_request(natural_query):
# Cache common queries
cache_key = hashlib.md5(natural_query.encode()).hexdigest()
if cache_key in query_cache:
return query_cache[cache_key]
# Use cheaper model for simple queries
if is_simple_query(natural_query):
result = generate_query(natural_query, provider='openai', model='gpt-3.5-turbo')
else:
result = generate_query(natural_query, provider='openai', model='gpt-4')
query_cache[cache_key] = result
return result
def is_simple_query(query):
simple_patterns = ['show', 'list', 'count', 'find']
return any(pattern in query.lower() for pattern in simple_patterns)
2. Budget Controls
API Cost Tracking
class CostTracker:
def __init__(self, monthly_budget=100):
self.monthly_budget = monthly_budget
self.current_spend = 0
def track_request(self, provider, model, tokens_used):
cost = calculate_cost(provider, model, tokens_used)
self.current_spend += cost
if self.current_spend > self.monthly_budget * 0.9:
logger.warning("Approaching monthly budget limit")
if self.current_spend > self.monthly_budget:
raise BudgetExceededException("Monthly budget exceeded")
3. Usage Analytics
Query Pattern Analysis
-- Analyze query patterns for optimization
CREATE TABLE ai_query_log (
id SERIAL PRIMARY KEY,
user_id INTEGER,
natural_query TEXT,
generated_sql TEXT,
provider VARCHAR(20),
model VARCHAR(50),
execution_time_ms INTEGER,
created_at TIMESTAMP DEFAULT NOW()
);
-- Find most common query types
SELECT
LEFT(natural_query, 50) as query_pattern,
COUNT(*) as frequency,
AVG(execution_time_ms) as avg_time
FROM ai_query_log
GROUP BY LEFT(natural_query, 50)
ORDER BY frequency DESC;
Following these best practices will help you deploy pg_ai_query successfully in production environments while maintaining security, performance, and cost efficiency.
Integration Patterns
This guide covers common patterns for integrating pg_ai_query with applications, development workflows, and data analysis tools.
Application Integration
Python Integration
import psycopg2
import json
class NaturalLanguageDB:
def __init__(self, connection_string):
self.conn = psycopg2.connect(connection_string)
def query(self, description, execute=False):
"""Generate and optionally execute a query from natural language"""
cur = self.conn.cursor()
# Generate the query
cur.execute("SELECT generate_query(%s)", (description,))
query = cur.fetchone()[0]
if execute:
# Execute the generated query
cur.execute(query)
results = cur.fetchall()
return query, results
else:
return query, None
def analyze_performance(self, query):
"""Analyze query performance"""
cur = self.conn.cursor()
cur.execute("SELECT explain_query(%s)", (query,))
return cur.fetchone()[0]
# Usage example
db = NaturalLanguageDB("postgresql://user:pass@localhost/dbname")
# Generate a query
query, _ = db.query("show top customers by revenue")
print(f"Generated: {query}")
# Execute and get results
query, results = db.query("count orders by status", execute=True)
print(f"Results: {results}")
# Analyze performance
analysis = db.analyze_performance(query)
print(f"Performance: {analysis}")
Node.js Integration
const { Pool } = require('pg');
class AIQueryClient {
constructor(config) {
this.pool = new Pool(config);
}
async generateQuery(description, apiKey = null, provider = 'auto') {
const client = await this.pool.connect();
try {
const result = await client.query(
'SELECT generate_query($1, $2, $3)',
[description, apiKey, provider]
);
return result.rows[0].generate_query;
} finally {
client.release();
}
}
async executeNaturalQuery(description) {
const query = await this.generateQuery(description);
const client = await this.pool.connect();
try {
const result = await client.query(query);
return {
query: query,
results: result.rows,
rowCount: result.rowCount
};
} finally {
client.release();
}
}
async analyzeQuery(query) {
const client = await this.pool.connect();
try {
const result = await client.query(
'SELECT explain_query($1)',
[query]
);
return result.rows[0].explain_query;
} finally {
client.release();
}
}
}
// Usage
const aiClient = new AIQueryClient({
user: 'postgres',
host: 'localhost',
database: 'myapp',
password: 'password',
port: 5432,
});
// Generate and execute
aiClient.executeNaturalQuery('show recent orders')
.then(result => {
console.log('Query:', result.query);
console.log('Results:', result.results);
});
Development Workflow Integration
Code Review Integration
Create a function to validate new queries during code review:
-- Create a query validation function
CREATE OR REPLACE FUNCTION validate_query_performance(query_text text)
RETURNS TABLE(
performance_score int,
issues text[],
recommendations text[]
) AS $$
DECLARE
analysis text;
score int := 100;
issues_array text[] := '{}';
recommendations_array text[] := '{}';
BEGIN
-- Get AI analysis
SELECT explain_query(query_text) INTO analysis;
-- Simple scoring based on analysis content
IF analysis ILIKE '%sequential scan%' THEN
score := score - 30;
issues_array := issues_array || 'Sequential scan detected';
recommendations_array := recommendations_array || 'Consider adding indexes';
END IF;
IF analysis ILIKE '%high cost%' OR analysis ILIKE '%expensive%' THEN
score := score - 20;
issues_array := issues_array || 'High execution cost';
END IF;
RETURN QUERY SELECT score, issues_array, recommendations_array;
END;
$$ LANGUAGE plpgsql;
-- Usage in CI/CD
SELECT * FROM validate_query_performance('
SELECT * FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.created_at > NOW() - INTERVAL ''30 days''
');
Git Hooks Integration
#!/bin/bash
# .git/hooks/pre-commit
# Extract SQL queries from staged files
git diff --cached --name-only | grep '\.sql$' | while read file; do
echo "Analyzing SQL file: $file"
# Extract queries and analyze with pg_ai_query
psql -d mydb -c "
SELECT explain_query(\$\$$(cat $file)\$\$)
" | grep -i 'warning\|issue\|slow'
if [ $? -eq 0 ]; then
echo "Performance issues detected in $file"
exit 1
fi
done
Business Intelligence Integration
Jupyter Notebook Integration
# Install: pip install psycopg2-binary pandas plotly
import pandas as pd
import psycopg2
import plotly.express as px
def natural_query_to_dataframe(connection_string, description):
"""Convert natural language to SQL and return pandas DataFrame"""
with psycopg2.connect(connection_string) as conn:
# Generate query
query_df = pd.read_sql("SELECT generate_query(%s) as query", conn, params=[description])
query = query_df.iloc[0]['query']
# Execute query
result_df = pd.read_sql(query, conn)
print(f"Generated Query: {query}")
return result_df
# Usage in Jupyter
df = natural_query_to_dataframe(
"postgresql://user:pass@localhost/db",
"monthly revenue trend for the last year"
)
# Create visualization
fig = px.line(df, x='month', y='revenue', title='Monthly Revenue Trend')
fig.show()
Grafana Integration
Create a PostgreSQL data source query that uses pg_ai_query:
-- Grafana query template
SELECT
time_column,
value_column
FROM (
${__query(SELECT generate_query('${natural_description:text}'))}
) generated_query
WHERE $__timeFilter(time_column)
Data Analysis Workflows
Automated Reporting
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
import psycopg2
import pandas as pd
from datetime import datetime, timedelta
class AutomatedReporter:
def __init__(self, db_config, email_config):
self.db_config = db_config
self.email_config = email_config
def generate_report(self, queries_descriptions):
"""Generate automated reports from natural language queries"""
report_data = {}
with psycopg2.connect(**self.db_config) as conn:
for name, description in queries_descriptions.items():
# Generate and execute query
df = pd.read_sql(
"SELECT generate_query(%s) as query",
conn, params=[description]
)
query = df.iloc[0]['query']
# Execute the generated query
result_df = pd.read_sql(query, conn)
report_data[name] = result_df
# Analyze performance
analysis_df = pd.read_sql(
"SELECT explain_query(%s) as analysis",
conn, params=[query]
)
print(f"Performance analysis for {name}:")
print(analysis_df.iloc[0]['analysis'])
return report_data
def send_report(self, report_data):
"""Send the generated report via email"""
# Create email content from report data
# Implementation depends on your email service
pass
# Usage
reporter = AutomatedReporter(
db_config={'host': 'localhost', 'database': 'mydb', ...},
email_config={'smtp_server': 'smtp.gmail.com', ...}
)
daily_queries = {
'user_signups': 'daily user registrations for the last 7 days',
'revenue': 'daily revenue for the last 7 days',
'top_products': 'top 10 products by sales today'
}
report = reporter.generate_report(daily_queries)
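The send_report method above is left as a stub. A minimal sketch of one possible implementation is shown below; it assumes email_config contains smtp_server, smtp_port, username, password, and recipients (hypothetical keys) and renders each result DataFrame as an HTML table:
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def send_report(email_config, report_data):
    """Illustrative sketch: email each result set as an HTML table."""
    msg = MIMEMultipart()
    msg['Subject'] = 'Automated Database Report'
    msg['From'] = email_config['username']
    msg['To'] = ', '.join(email_config['recipients'])
    body = ''.join(
        f"<h3>{name}</h3>{df.to_html(index=False)}"
        for name, df in report_data.items()
    )
    msg.attach(MIMEText(body, 'html'))
    # Send over STARTTLS; adjust for your SMTP provider
    with smtplib.SMTP(email_config['smtp_server'], email_config.get('smtp_port', 587)) as server:
        server.starttls()
        server.login(email_config['username'], email_config['password'])
        server.send_message(msg)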
Query Performance Monitoring
-- Create a monitoring system
CREATE TABLE query_performance_log (
id SERIAL PRIMARY KEY,
query_description text,
generated_query text,
execution_time interval,
analysis_summary text,
created_at timestamp DEFAULT NOW()
);
-- Function to log and analyze queries
CREATE OR REPLACE FUNCTION monitor_query_performance(description text)
RETURNS void AS $$
DECLARE
query text;
start_time timestamp;
end_time timestamp;
exec_time interval;
analysis text;
BEGIN
-- Generate query
SELECT generate_query(description) INTO query;
-- Time the execution
start_time := clock_timestamp();
EXECUTE query;
end_time := clock_timestamp();
exec_time := end_time - start_time;
-- Get performance analysis
SELECT explain_query(query) INTO analysis;
-- Log the results
INSERT INTO query_performance_log
(query_description, generated_query, execution_time, analysis_summary)
VALUES (description, query, exec_time, analysis);
-- Alert on slow queries
IF exec_time > interval '5 seconds' THEN
RAISE WARNING 'Slow query detected: % took %', description, exec_time;
END IF;
END;
$$ LANGUAGE plpgsql;
-- Usage
SELECT monitor_query_performance('daily active users count');
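To run the monitor on a schedule without extra infrastructure, a small Python runner can call the function periodically and surface slow entries from query_performance_log. This is a sketch only; the monitored descriptions, the one-hour window, and the hourly interval are assumptions:
import time
import psycopg2

MONITORED = ['daily active users count', 'orders placed today']  # descriptions to track

def run_monitoring_cycle(conn_str):
    with psycopg2.connect(conn_str) as conn:
        cur = conn.cursor()
        for description in MONITORED:
            cur.execute("SELECT monitor_query_performance(%s)", (description,))
        # Surface anything the log table recorded as slow in the last hour
        cur.execute("""
            SELECT query_description, execution_time
            FROM query_performance_log
            WHERE execution_time > interval '5 seconds'
              AND created_at > NOW() - interval '1 hour'
        """)
        for description, exec_time in cur.fetchall():
            print(f"SLOW: {description} took {exec_time}")

if __name__ == '__main__':
    while True:
        run_monitoring_cycle("postgresql://user:pass@localhost/dbname")
        time.sleep(3600)  # run hourly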
API Integration Patterns
REST API Wrapper
from flask import Flask, request, jsonify
import psycopg2
import os
app = Flask(__name__)
def get_db_connection():
return psycopg2.connect(
host=os.getenv('DB_HOST'),
database=os.getenv('DB_NAME'),
user=os.getenv('DB_USER'),
password=os.getenv('DB_PASSWORD')
)
@app.route('/api/query', methods=['POST'])
def natural_language_query():
"""Generate and optionally execute queries from natural language"""
data = request.json
description = data.get('description')
execute = data.get('execute', False)
api_key = data.get('api_key')
provider = data.get('provider', 'auto')
if not description:
return jsonify({'error': 'Description required'}), 400
try:
conn = get_db_connection()
cur = conn.cursor()
# Generate query
cur.execute(
"SELECT generate_query(%s, %s, %s)",
(description, api_key, provider)
)
query = cur.fetchone()[0]
result = {'query': query}
if execute:
# Execute the generated query
cur.execute(query)
results = cur.fetchall()
columns = [desc[0] for desc in cur.description]
result['results'] = [dict(zip(columns, row)) for row in results]
result['row_count'] = len(results)
conn.close()
return jsonify(result)
except Exception as e:
return jsonify({'error': str(e)}), 500
@app.route('/api/explain', methods=['POST'])
def explain_query():
"""Analyze query performance"""
data = request.json
query = data.get('query')
if not query:
return jsonify({'error': 'Query required'}), 400
try:
conn = get_db_connection()
cur = conn.cursor()
cur.execute("SELECT explain_query(%s)", (query,))
analysis = cur.fetchone()[0]
conn.close()
return jsonify({'analysis': analysis})
except Exception as e:
return jsonify({'error': str(e)}), 500
if __name__ == '__main__':
app.run(debug=True)
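Any HTTP client can call this wrapper. For example, using the requests package (the host and port assume Flask's default development server):
import requests

response = requests.post(
    'http://localhost:5000/api/query',
    json={'description': 'count orders by status', 'execute': True},
    timeout=60,
)
payload = response.json()
print('Generated SQL:', payload['query'])
print('Rows returned:', payload.get('row_count'))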
Testing Integration
Unit Testing Queries
import unittest
import psycopg2
class TestQueryGeneration(unittest.TestCase):
def setUp(self):
self.conn = psycopg2.connect("postgresql://test_user:test_pass@localhost/test_db")
def test_basic_query_generation(self):
"""Test basic query generation functionality"""
cur = self.conn.cursor()
# Test simple query
cur.execute("SELECT generate_query('show all users')")
query = cur.fetchone()[0]
self.assertIn('SELECT', query.upper())
self.assertIn('users', query.lower())
def test_query_performance(self):
"""Test that generated queries perform reasonably well"""
cur = self.conn.cursor()
# Generate a query
cur.execute("SELECT generate_query('count active users')")
query = cur.fetchone()[0]
# Analyze performance
cur.execute("SELECT explain_query(%s)", (query,))
analysis = cur.fetchone()[0]
# Check for performance issues
self.assertNotIn('Sequential Scan', analysis)
def tearDown(self):
self.conn.close()
Production Deployment Patterns
Connection Pooling
from psycopg2 import pool
import threading
class AIQueryPool:
def __init__(self, minconn, maxconn, **kwargs):
self.pool = pool.ThreadedConnectionPool(
minconn, maxconn, **kwargs
)
self.lock = threading.Lock()
def execute_natural_query(self, description):
conn = self.pool.getconn()
try:
cur = conn.cursor()
cur.execute("SELECT generate_query(%s)", (description,))
query = cur.fetchone()[0]
cur.execute(query)
results = cur.fetchall()
return query, results
finally:
self.pool.putconn(conn)
# Usage
query_pool = AIQueryPool(
minconn=5, maxconn=20,
host='localhost', database='prod_db',
user='app_user', password='secure_password'
)
Error Handling and Logging
import logging
from functools import wraps
def handle_ai_query_errors(func):
@wraps(func)
def wrapper(*args, **kwargs):
try:
return func(*args, **kwargs)
except psycopg2.Error as e:
logging.error(f"Database error in {func.__name__}: {e}")
raise
except Exception as e:
logging.error(f"Unexpected error in {func.__name__}: {e}")
raise
return wrapper
@handle_ai_query_errors
def safe_natural_query(connection, description):
"""Safely execute natural language queries with error handling"""
cur = connection.cursor()
cur.execute("SELECT generate_query(%s)", (description,))
return cur.fetchone()[0]
See Also
- Function Reference - Complete API documentation
- Best Practices - Performance and security guidelines
- Examples - Additional usage examples
- Error Codes - Troubleshooting guide
Function Reference
This page provides complete reference documentation for all functions provided by the pg_ai_query extension.
Main Functions
generate_query()
Generates SQL queries from natural language descriptions with automatic schema discovery.
Signature
generate_query(
natural_language_query text,
api_key text DEFAULT NULL,
provider text DEFAULT 'auto'
) RETURNS text
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| natural_language_query | text | ✓ | - | The natural language description of the query you want |
| api_key | text | ✗ | NULL | API key for AI provider (uses config if NULL) |
| provider | text | ✗ | 'auto' | AI provider to use: 'openai', 'anthropic', or 'auto' |
Returns
- Type: text
- Content: A valid PostgreSQL SQL query
Examples
-- Basic usage
SELECT generate_query('show all users');
-- With custom API key
SELECT generate_query(
'find users created in the last week',
'sk-your-api-key-here'
);
-- With specific provider
SELECT generate_query(
'calculate monthly revenue by product category',
NULL,
'openai'
);
-- Complex query
SELECT generate_query('show top 10 customers by total order value with their contact information');
Behavior
- Schema Discovery: Automatically analyzes your database schema to understand table structures and relationships
- Safety Limits: Always adds LIMIT clauses to SELECT queries (configurable)
- Query Validation: Validates generated queries for safety and correctness
- Error Handling: Returns descriptive error messages for invalid requests
Supported Query Types
| Type | Description | Examples |
|---|---|---|
| SELECT | Data retrieval with filtering, joins, aggregation | 'show users', 'count orders by status' |
| INSERT | Data insertion | 'insert a new user with name John' |
| UPDATE | Data modification | 'update user email where id is 5' |
| DELETE | Data removal | 'delete cancelled orders' |
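Because INSERT, UPDATE, and DELETE are supported, generated statements are worth reviewing before they run. A minimal review-then-execute loop using psycopg2 might look like the sketch below; the interactive confirmation step is an assumption, not part of the extension:
import psycopg2

def generate_and_confirm(conn, description):
    cur = conn.cursor()
    cur.execute("SELECT generate_query(%s)", (description,))
    query = cur.fetchone()[0]
    print("Generated query:\n", query)
    # SELECTs run directly; anything else waits for explicit confirmation
    if query.strip().upper().startswith("SELECT") or input("Execute? [y/N] ") == "y":
        cur.execute(query)
        conn.commit()
        return cur.rowcount
    return None

conn = psycopg2.connect("postgresql://user:pass@localhost/dbname")
generate_and_confirm(conn, "delete cancelled orders older than 90 days")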
explain_query()
Analyzes query performance using EXPLAIN ANALYZE and provides AI-powered optimization insights.
Signature
explain_query(
query_text text,
api_key text DEFAULT NULL,
provider text DEFAULT 'auto'
) RETURNS text
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query_text | text | ✓ | - | The SQL query to analyze |
| api_key | text | ✗ | NULL | API key for AI provider (uses config if NULL) |
| provider | text | ✗ | 'auto' | AI provider to use: 'openai', 'anthropic', or 'auto' |
Returns
- Type: text
- Content: Detailed performance analysis and optimization recommendations
Examples
-- Basic usage
SELECT explain_query('SELECT * FROM users WHERE active = true');
-- Complex query analysis
SELECT explain_query('
SELECT u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.username
ORDER BY order_count DESC
LIMIT 100
');
-- With specific provider
SELECT explain_query(
'SELECT * FROM products WHERE price > 100',
'your-api-key',
'anthropic'
);
-- Integration with generate_query
WITH generated AS (
SELECT generate_query('find recent high-value orders') as query
)
SELECT explain_query((SELECT query FROM generated));
Output Structure
The function returns structured text analysis containing:
- Query Overview: Brief description of query purpose
- Performance Summary: Execution time, cost estimates, rows processed
- Execution Plan Analysis: Breakdown of PostgreSQL’s execution strategy
- Performance Issues: Identified bottlenecks and inefficiencies
- Optimization Suggestions: Specific recommendations for improvement
- Index Recommendations: Suggested indexes with CREATE statements
Supported Query Types
| Type | Description | Notes |
|---|---|---|
| SELECT | Data retrieval queries | Fully supported |
| WITH | Common Table Expressions | Fully supported |
| VALUES | Value lists | Fully supported |
Behavior
- Actual Execution: Uses EXPLAIN ANALYZE which executes the query
- Performance Metrics: Provides real execution times and row counts
- AI Analysis: Processes execution plan through AI for insights
- Security: Only allows read-only query types
- Error Handling: Graceful handling of invalid queries
Performance Considerations
- Query Execution: The function actually runs your query
- Analysis Time: AI processing adds 1-3 seconds
- Resource Usage: Same as running the query manually
- Best for: Development, testing, and performance optimization
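Because explain_query runs the statement via EXPLAIN ANALYZE, one way to bound its impact in shared environments is to cap statement time for the session before calling it. A sketch follows; the 30-second cap is an assumption, and note that it also covers the AI analysis step:
import psycopg2
from psycopg2 import errors

conn = psycopg2.connect("postgresql://user:pass@localhost/dbname")
cur = conn.cursor()
# Cap statement time so a pathological query cannot run unbounded
cur.execute("SET statement_timeout = '30s'")
try:
    cur.execute(
        "SELECT explain_query(%s)",
        ("SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '30 days'",),
    )
    print(cur.fetchone()[0])
except errors.QueryCanceled:
    print("Analysis aborted: statement exceeded the 30 second cap")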
get_database_tables()
Returns metadata about all user tables in the database.
Signature
get_database_tables() RETURNS text
Parameters
None.
Returns
- Type: text (JSON format)
- Content: Array of table metadata objects
JSON Structure
[
{
"table_name": "users",
"schema_name": "public",
"table_type": "BASE TABLE",
"estimated_rows": 1500,
"table_size": "128 kB"
},
{
"table_name": "orders",
"schema_name": "public",
"table_type": "BASE TABLE",
"estimated_rows": 5000,
"table_size": "512 kB"
}
]
Example Usage
-- Get all tables
SELECT get_database_tables();
-- Pretty print with formatting
SELECT jsonb_pretty(get_database_tables()::jsonb);
-- Extract specific information
SELECT
table_name,
estimated_rows
FROM jsonb_to_recordset(get_database_tables()::jsonb)
AS x(table_name text, estimated_rows int);
get_table_details()
Returns detailed information about a specific table including columns, constraints, and relationships.
Signature
get_table_details(
table_name text,
schema_name text DEFAULT 'public'
) RETURNS text
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| table_name | text | ✓ | - | Name of the table to analyze |
| schema_name | text | ✗ | 'public' | Schema containing the table |
Returns
- Type: text (JSON format)
- Content: Detailed table information object
JSON Structure
{
"table_name": "users",
"schema_name": "public",
"columns": [
{
"column_name": "id",
"data_type": "integer",
"is_nullable": false,
"column_default": "nextval('users_id_seq'::regclass)",
"is_primary_key": true
},
{
"column_name": "email",
"data_type": "character varying",
"character_maximum_length": 150,
"is_nullable": false,
"column_default": null,
"is_unique": true
}
],
"constraints": [
{
"constraint_name": "users_pkey",
"constraint_type": "PRIMARY KEY",
"column_names": ["id"]
},
{
"constraint_name": "users_email_key",
"constraint_type": "UNIQUE",
"column_names": ["email"]
}
],
"foreign_keys": [],
"indexes": [
{
"index_name": "users_pkey",
"columns": ["id"],
"is_unique": true,
"is_primary": true
}
]
}
Example Usage
-- Basic usage
SELECT get_table_details('users');
-- Specific schema
SELECT get_table_details('orders', 'sales');
-- Extract column information
SELECT
column_name,
data_type,
is_nullable
FROM jsonb_to_recordset(
(get_table_details('users')::jsonb)->'columns'
) AS x(column_name text, data_type text, is_nullable boolean);
-- Find all foreign keys
SELECT
constraint_name,
column_names
FROM jsonb_to_recordset(
(get_table_details('orders')::jsonb)->'foreign_keys'
) AS x(constraint_name text, column_names text[]);
Utility Functions
Schema Discovery Process
The extension uses these internal processes when analyzing your database:
- Table Discovery: Identifies all user tables (excludes system catalogs)
- Column Analysis: Examines data types, constraints, and nullable fields
- Relationship Mapping: Discovers foreign key relationships between tables
- Index Analysis: Identifies indexes that might optimize query performance
- Statistics Gathering: Collects row counts and table sizes for optimization
Internal Validation
All functions perform these validations:
- Permission Checks: Ensures access to only authorized tables
- Schema Validation: Verifies table and schema existence
- SQL Injection Prevention: Sanitizes all inputs
- Query Safety: Prevents generation of harmful operations
Error Codes and Messages
Common Error Conditions
| Error | Cause | Solution |
|---|---|---|
"API key not configured" | No valid API key found | Configure API key in ~/.pg_ai.config or pass as parameter |
"No tables found" | Database has no user tables | Create some tables or check permissions |
"Table does not exist" | Specified table not found | Check table name and schema |
"Query generation failed" | AI service error | Check API key, network connectivity, and service status |
"Invalid provider" | Unknown provider specified | Use ‘openai’, ‘anthropic’, or ‘auto’ |
Debugging Functions
-- Check extension version and status
SELECT extversion FROM pg_extension WHERE extname = 'pg_ai_query';
-- Verify function availability
\df generate_query
\df get_database_tables
\df get_table_details
-- Test basic functionality
SELECT generate_query('SELECT 1');
Performance Characteristics
Query Generation Performance
| Factor | Impact | Recommendations |
|---|---|---|
| Database Size | Minimal | Schema analysis is cached |
| Table Count | Low | Only user tables are analyzed |
| Query Complexity | Medium | More complex requests take longer |
| AI Model | High | GPT-4o is slower but more accurate than GPT-3.5 |
Memory Usage
- Schema Analysis: ~1MB per 100 tables
- Query Generation: ~5-10MB per request
- Caching: Schema information cached in session
Network Requirements
- API Requests: HTTPS to OpenAI/Anthropic APIs
- Bandwidth: ~1-10KB per query generation request
- Latency: Typically 1-5 seconds per request
Function Security
Access Control
- Functions execute with caller’s privileges
- No privilege escalation
- Respects PostgreSQL’s standard permission system
Data Protection
- Never sends actual data to AI providers
- Only schema metadata is transmitted
- Generated queries can be reviewed before execution
API Security
- API keys transmitted over HTTPS only
- Keys stored securely in configuration files
- No logging of API keys
Next Steps
- See Examples for practical usage patterns
- Review Best Practices for production usage
- Check API Reference for complete technical details
API Reference
This page provides complete technical reference for the pg_ai_query extension API.
Extension Information
| Property | Value |
|---|---|
| Name | pg_ai_query |
| Version | 1.0 |
| Schema | public |
| Dependencies | PostgreSQL 12+ |
SQL API
CREATE EXTENSION
CREATE EXTENSION [IF NOT EXISTS] pg_ai_query [WITH] [SCHEMA schema_name];
Creates the extension and installs all functions.
Example:
CREATE EXTENSION IF NOT EXISTS pg_ai_query;
DROP EXTENSION
DROP EXTENSION [IF EXISTS] pg_ai_query [CASCADE | RESTRICT];
Removes the extension and all its functions.
Example:
DROP EXTENSION IF EXISTS pg_ai_query CASCADE;
Core Functions API
generate_query()
Signature:
generate_query(
natural_language_query text,
api_key text DEFAULT NULL,
provider text DEFAULT 'auto'
) RETURNS text
Parameters:
| Parameter | Type | Constraints | Description |
|---|---|---|---|
| natural_language_query | text | NOT NULL, max 4096 chars | Natural language description of desired query |
| api_key | text | NULL allowed | API key for AI provider (optional if configured) |
| provider | text | 'openai', 'anthropic', 'auto' | AI provider selection |
Returns:
- Type: text
- Format: Valid PostgreSQL SQL query
- Constraints: Always includes LIMIT clause for SELECT statements
Exceptions:
EXTERNAL_ROUTINE_EXCEPTION: AI API communication failures
INVALID_PARAMETER_VALUE: Invalid provider or malformed input
CONFIGURATION_FILE_ERROR: Configuration issues
Example:
SELECT generate_query('show top 10 customers by revenue');
get_database_tables()
Signature:
get_database_tables() RETURNS text
Parameters: None
Returns:
- Type: text
- Format: JSON array
- Content: Table metadata objects
JSON Schema:
{
"type": "array",
"items": {
"type": "object",
"properties": {
"table_name": {"type": "string"},
"schema_name": {"type": "string"},
"table_type": {"type": "string"},
"estimated_rows": {"type": "integer"},
"table_size": {"type": "string"}
},
"required": ["table_name", "schema_name", "table_type"]
}
}
Exceptions:
EXTERNAL_ROUTINE_EXCEPTION: Database introspection failures
Example:
SELECT get_database_tables();
get_table_details()
Signature:
get_table_details(
table_name text,
schema_name text DEFAULT 'public'
) RETURNS text
Parameters:
| Parameter | Type | Constraints | Description |
|---|---|---|---|
| table_name | text | NOT NULL, must exist | Name of table to analyze |
| schema_name | text | Valid schema name | Schema containing the table |
Returns:
- Type: text
- Format: JSON object
- Content: Detailed table structure information
JSON Schema:
{
"type": "object",
"properties": {
"table_name": {"type": "string"},
"schema_name": {"type": "string"},
"columns": {
"type": "array",
"items": {
"type": "object",
"properties": {
"column_name": {"type": "string"},
"data_type": {"type": "string"},
"is_nullable": {"type": "boolean"},
"column_default": {"type": ["string", "null"]},
"character_maximum_length": {"type": ["integer", "null"]},
"numeric_precision": {"type": ["integer", "null"]},
"is_primary_key": {"type": "boolean"},
"is_unique": {"type": "boolean"}
}
}
},
"constraints": {"type": "array"},
"foreign_keys": {"type": "array"},
"indexes": {"type": "array"}
}
}
Exceptions:
INVALID_PARAMETER_VALUE: Table does not exist
EXTERNAL_ROUTINE_EXCEPTION: Database introspection failures
Example:
SELECT get_table_details('users', 'public');
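Since the return value is JSON text, application code can parse it directly. A short sketch (the connection string and table name are placeholders) that prints the column list for a table:
import json
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost/dbname")
cur = conn.cursor()
cur.execute("SELECT get_table_details(%s, %s)", ("users", "public"))
details = json.loads(cur.fetchone()[0])

# Print each column with its type and nullability
for column in details["columns"]:
    nullable = "NULL" if column["is_nullable"] else "NOT NULL"
    print(f"{column['column_name']}: {column['data_type']} {nullable}")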
Configuration API
Configuration File Location
The extension reads configuration from:
~/.pg_ai.config
Configuration Structure
[general]
# General settings
log_level = "INFO" | "DEBUG" | "WARNING" | "ERROR"
enable_logging = true | false
enable_postgresql_elog = true | false
request_timeout_ms = <integer>
max_retries = <integer>
[query]
# Query generation settings
enforce_limit = true | false
default_limit = <integer>
[openai]
# OpenAI provider settings
api_key = "<api_key_string>"
default_model = "gpt-4o" | "gpt-4" | "gpt-3.5-turbo"
[anthropic]
# Anthropic provider settings
api_key = "<api_key_string>"
default_model = "claude-3-5-sonnet-20241022"
Configuration Validation Rules
| Setting | Type | Range/Values | Default |
|---|---|---|---|
| log_level | string | DEBUG, INFO, WARNING, ERROR | INFO |
| enable_logging | boolean | true, false | false |
| enable_postgresql_elog | boolean | true, false | true |
| request_timeout_ms | integer | 1000-300000 | 30000 |
| max_retries | integer | 0-10 | 3 |
| enforce_limit | boolean | true, false | true |
| default_limit | integer | 1-1000000 | 1000 |
| api_key | string | Provider-specific format | "" |
| default_model | string | Provider-specific values | Provider default |
Error Codes
Extension Error Codes
| Code | Name | Description |
|---|---|---|
| 22023 | INVALID_PARAMETER_VALUE | Invalid function parameter |
| 38001 | CONTAINING_SQL_NOT_PERMITTED | Unsafe SQL operation attempted |
| 38003 | PROHIBITED_SQL_STATEMENT_ATTEMPTED | Forbidden query type |
| 39001 | EXTERNAL_ROUTINE_EXCEPTION | AI API communication error |
| 58030 | IO_ERROR | File system or network I/O error |
| XX000 | INTERNAL_ERROR | Unexpected internal error |
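Applications can branch on these SQLSTATE values through their database driver; with psycopg2 the code is available as pgcode on the raised exception. The retry/backoff policy below is an assumption, not extension behavior:
import time
import psycopg2

def generate_with_retry(conn, description, attempts=3):
    cur = conn.cursor()
    for attempt in range(attempts):
        try:
            cur.execute("SELECT generate_query(%s)", (description,))
            return cur.fetchone()[0]
        except psycopg2.Error as e:
            conn.rollback()  # clear the failed transaction
            if e.pgcode == '39001':   # EXTERNAL_ROUTINE_EXCEPTION: transient AI API failure
                time.sleep(2 ** attempt)
                continue
            if e.pgcode == '22023':   # INVALID_PARAMETER_VALUE: fix the input instead of retrying
                raise ValueError(f"Bad request: {e}") from e
            raise
    raise RuntimeError("AI provider unavailable after retries")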
AI Provider Error Codes
OpenAI Errors
| HTTP Code | Error Type | Description |
|---|---|---|
| 400 | invalid_request_error | Invalid request parameters |
| 401 | authentication_error | Invalid API key |
| 403 | permission_error | Insufficient permissions |
| 429 | rate_limit_error | Rate limit exceeded |
| 500 | api_error | OpenAI server error |
Anthropic Errors
| HTTP Code | Error Type | Description |
|---|---|---|
| 400 | invalid_request_error | Invalid request format |
| 401 | authentication_error | Invalid API key |
| 403 | permission_error | Access denied |
| 429 | rate_limit_error | Rate limit exceeded |
| 500 | api_error | Anthropic server error |
Internal APIs
Schema Discovery Engine
The extension uses these internal functions for schema analysis:
-- Internal function signatures (not callable by users)
_pg_ai_analyze_schema() RETURNS schema_info;
_pg_ai_get_table_relationships() RETURNS relationship_map;
_pg_ai_validate_query(query text) RETURNS boolean;
Query Generation Pipeline
- Input Validation: Validates natural language input
- Schema Discovery: Analyzes database structure
- Context Building: Creates AI prompt with schema context
- AI Request: Sends request to configured provider
- Response Parsing: Extracts SQL from AI response
- Query Validation: Validates generated SQL for safety
- Result Return: Returns validated SQL query
Security Validation
The extension performs these security checks:
- SQL Injection Prevention: Sanitizes all inputs
- System Table Protection: Blocks access to pg_* and information_schema
- DDL Restriction: Can generate DDL but with warnings
- Privilege Respect: Honors PostgreSQL permission system
Performance Characteristics
Memory Usage
| Operation | Memory Usage | Duration |
|---|---|---|
| Schema Analysis | ~1MB per 100 tables | 100-500ms |
| Query Generation | ~5-10MB per request | 1-5 seconds |
| Configuration Load | ~1KB | <10ms |
Network Requirements
| Provider | Endpoint | Protocol | Bandwidth |
|---|---|---|---|
| OpenAI | api.openai.com | HTTPS | 1-10KB per request |
| Anthropic | api.anthropic.com | HTTPS | 1-10KB per request |
Caching Behavior
- Schema Information: Cached per PostgreSQL session
- Configuration: Cached until session restart
- AI Responses: Not cached (each request is fresh)
Extension Metadata
System Catalog Integration
The extension registers these entries:
-- Extension registration
SELECT * FROM pg_extension WHERE extname = 'pg_ai_query';
-- Function registration
SELECT * FROM pg_proc WHERE proname LIKE '%generate%';
-- Dependencies
SELECT * FROM pg_depend WHERE refobjid IN (
SELECT oid FROM pg_extension WHERE extname = 'pg_ai_query'
);
Version Information
-- Check extension version
SELECT extversion FROM pg_extension WHERE extname = 'pg_ai_query';
-- Function definitions
\df+ generate_query
\df+ get_database_tables
\df+ get_table_details
Compatibility
PostgreSQL Versions
| PostgreSQL Version | Support Status | Notes |
|---|---|---|
| 12.x | ✅ Supported | Minimum required version |
| 13.x | ✅ Supported | Full compatibility |
| 14.x | ✅ Supported | Recommended |
| 15.x | ✅ Supported | Full compatibility |
| 16.x | ✅ Supported | Latest tested |
Operating Systems
| OS | Support Status | Notes |
|---|---|---|
| Linux (RHEL/CentOS) | ✅ Supported | Primary development platform |
| Linux (Ubuntu/Debian) | ✅ Supported | Well tested |
| macOS | ✅ Supported | Development and testing |
| Windows | ⚠️ Experimental | Limited testing |
AI Provider Compatibility
| Provider | API Version | Models Supported |
|---|---|---|
| OpenAI | v1 | GPT-4o, GPT-4, GPT-3.5-turbo |
| Anthropic | v1 | Claude 3.5 Sonnet |
This completes the technical API reference for the pg_ai_query extension.
Configuration File Reference
This page provides a complete reference for all configuration options available in the pg_ai_query extension configuration file.
Configuration File Location
The configuration file must be placed at:
~/.pg_ai.config
File Format
The configuration uses INI format with sections and key-value pairs:
# Comments start with # and are ignored
[section_name]
key = value
key_with_quotes = "value with spaces"
Complete Configuration Template
# PG AI Query Configuration File
# Place this file at ~/.pg_ai.config
[general]
# Logging configuration
log_level = "INFO"
enable_logging = false
enable_postgresql_elog = true
# Request configuration
request_timeout_ms = 30000
max_retries = 3
[query]
# Query generation behavior
enforce_limit = true
default_limit = 1000
[openai]
# OpenAI provider configuration
api_key = ""
default_model = "gpt-4o"
[anthropic]
# Anthropic provider configuration
api_key = ""
default_model = "claude-3-5-sonnet-20241022"
Configuration Sections
[general] Section
Controls general extension behavior and logging.
| Option | Type | Default | Range/Values | Description |
|---|---|---|---|---|
| log_level | string | "INFO" | "DEBUG", "INFO", "WARNING", "ERROR" | Minimum log level for messages |
| enable_logging | boolean | false | true, false | Enable/disable all logging output |
| enable_postgresql_elog | boolean | true | true, false | Use PostgreSQL's elog system for logging |
| request_timeout_ms | integer | 30000 | 1000-300000 | Timeout for AI API requests in milliseconds |
| max_retries | integer | 3 | 0-10 | Maximum retry attempts for failed requests |
log_level
Controls the verbosity of log messages.
Values:
"DEBUG": Most verbose, includes all internal operations"INFO": General information about operations"WARNING": Warnings about potential issues"ERROR": Only error messages
Example:
[general]
log_level = "DEBUG" # Show all log messages
enable_logging
Master switch for all logging output.
Values:
true: Enable logging (respects log_level)
false: Disable all logging output
Example:
[general]
enable_logging = true # Turn on logging
log_level = "INFO" # Show INFO level and above
enable_postgresql_elog
Controls whether to use PostgreSQL’s internal logging system.
Values:
true: Use PostgreSQL's elog system (recommended)
false: Use alternative logging method
Example:
[general]
enable_postgresql_elog = true # Use PostgreSQL logging
request_timeout_ms
Timeout for AI API requests in milliseconds.
Range: 1000-300000 (1 second to 5 minutes)
Recommended: 30000-60000 for most use cases
Example:
[general]
request_timeout_ms = 45000 # 45 second timeout
max_retries
Maximum number of retry attempts for failed API requests.
Range: 0-10
Recommended: 3-5 for production use
Example:
[general]
max_retries = 5 # Retry up to 5 times
[query] Section
Controls query generation behavior and safety features.
| Option | Type | Default | Range/Values | Description |
|---|---|---|---|---|
| enforce_limit | boolean | true | true, false | Always add LIMIT clause to SELECT queries |
| default_limit | integer | 1000 | 1-1000000 | Default row limit when none specified |
enforce_limit
Controls whether SELECT queries automatically include LIMIT clauses.
Values:
true: Always add LIMIT to SELECT queries (recommended for safety)
false: Allow unlimited SELECT queries
Example:
[query]
enforce_limit = true # Always limit query results
default_limit = 500 # Default to 500 rows
default_limit
Default number of rows to limit when no explicit limit is requested.
Range: 1-1000000
Recommended: 100-5000 depending on use case
Example:
[query]
default_limit = 2000 # Default to 2000 rows
[openai] Section
Configuration for OpenAI provider.
| Option | Type | Default | Values | Description |
|---|---|---|---|---|
| api_key | string | "" | API key format | Your OpenAI API key |
| default_model | string | "gpt-4o" | Model names | Default OpenAI model to use |
api_key
Your OpenAI API key from platform.openai.com.
Format: Must start with sk- or sk-proj-
Security: Keep this key secure and never commit to version control
Example:
[openai]
api_key = "sk-proj-abc123..." # Your actual API key
default_model
Default OpenAI model to use for query generation.
Available Models:
"gpt-4o": Latest GPT-4 Omni model (recommended)"gpt-4": Standard GPT-4 model"gpt-3.5-turbo": Fast and economical model
Example:
[openai]
default_model = "gpt-4o" # Use latest model
[anthropic] Section
Configuration for Anthropic (Claude) provider.
| Option | Type | Default | Values | Description |
|---|---|---|---|---|
| api_key | string | "" | API key format | Your Anthropic API key |
| default_model | string | "claude-3-5-sonnet-20241022" | Model names | Default Claude model to use |
api_key
Your Anthropic API key from console.anthropic.com.
Format: Must start with sk-ant-
Security: Keep this key secure and never commit to version control
Example:
[anthropic]
api_key = "sk-ant-abc123..." # Your actual API key
default_model
Default Anthropic model to use for query generation.
Available Models:
"claude-3-5-sonnet-20241022": Latest Claude 3.5 Sonnet model
Example:
[anthropic]
default_model = "claude-3-5-sonnet-20241022" # Use Claude 3.5 Sonnet
Configuration Examples
Development Configuration
# Development setup with detailed logging
[general]
log_level = "DEBUG"
enable_logging = true
enable_postgresql_elog = true
request_timeout_ms = 60000 # Longer timeout for debugging
max_retries = 2 # Fewer retries for faster feedback
[query]
enforce_limit = true
default_limit = 100 # Small limit for testing
[openai]
api_key = "sk-proj-dev-key-here"
default_model = "gpt-3.5-turbo" # Cheaper for development
Production Configuration
# Production setup optimized for performance and reliability
[general]
log_level = "WARNING"
enable_logging = false # Disable for performance
enable_postgresql_elog = true
request_timeout_ms = 30000
max_retries = 5 # More retries for reliability
[query]
enforce_limit = true
default_limit = 1000
[openai]
api_key = "sk-proj-prod-key-here"
default_model = "gpt-4" # Good balance of quality and cost
Multi-Provider Configuration
# Setup with both providers for redundancy
[general]
log_level = "INFO"
enable_logging = true
request_timeout_ms = 45000
max_retries = 3
[query]
enforce_limit = true
default_limit = 500
[openai]
api_key = "sk-proj-openai-key-here"
default_model = "gpt-4o"
[anthropic]
api_key = "sk-ant-anthropic-key-here"
default_model = "claude-3-5-sonnet-20241022"
# Extension will use OpenAI first (first configured)
# Falls back to Anthropic if OpenAI fails
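The same fallback idea can be made explicit in application code by passing the provider argument. A sketch follows; the provider order and the decision to retry on any database error are assumptions:
import psycopg2

def generate_with_fallback(conn, description, providers=('openai', 'anthropic')):
    cur = conn.cursor()
    last_error = None
    for provider in providers:
        try:
            cur.execute("SELECT generate_query(%s, NULL, %s)", (description, provider))
            return provider, cur.fetchone()[0]
        except psycopg2.Error as error:
            conn.rollback()  # clear the aborted transaction before trying the next provider
            last_error = error
    raise last_error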
High-Performance Configuration
# Optimized for speed and low cost
[general]
log_level = "ERROR" # Minimal logging
enable_logging = false
request_timeout_ms = 15000 # Shorter timeout
max_retries = 2 # Fewer retries
[query]
enforce_limit = true
default_limit = 200 # Smaller default limit
[openai]
api_key = "sk-proj-key-here"
default_model = "gpt-3.5-turbo" # Fastest model
Configuration Validation
File Format Validation
The extension validates:
- INI Format: Proper section headers and key-value pairs
- Required Quotes: String values with spaces must be quoted
- Section Names: Only valid section names are accepted
- Key Names: Only valid configuration keys are recognized
Value Validation
Each configuration value is validated:
- Type Checking: Strings, integers, and booleans are validated
- Range Checking: Numeric values must be within acceptable ranges
- Format Checking: API keys must match expected formats
- Model Validation: Model names must be supported
Error Handling
Configuration errors are reported with specific messages:
Configuration Error: Invalid log_level 'VERBOSE' in section [general]. Valid values are: DEBUG, INFO, WARNING, ERROR
Configuration Error: request_timeout_ms must be between 1000 and 300000, got 500000
Configuration Error: OpenAI API key must start with 'sk-' or 'sk-proj-'
Configuration File Only
Currently, only configuration files are supported. All settings must be specified in ~/.pg_ai.config.
Security Considerations
File Permissions
Set secure permissions on the configuration file:
# Make file readable only by owner
chmod 600 ~/.pg_ai.config
# Verify permissions
ls -la ~/.pg_ai.config
# Should show: -rw------- 1 user user ...
API Key Security
- Never commit configuration files with API keys to version control
- Use secure configuration files in containerized or CI/CD environments
- Rotate keys regularly as per your organization’s security policy
- Monitor usage through your AI provider’s dashboard
Configuration Templates
Use template files for version control:
# .pg_ai.config.template
[general]
log_level = "INFO"
enable_logging = false
[openai]
api_key = "${OPENAI_API_KEY}" # Replace with actual key
default_model = "gpt-4o"
[anthropic]
api_key = "${ANTHROPIC_API_KEY}" # Replace with actual key
Configuration Loading Order
The extension loads configuration in this order:
- Default values (hardcoded)
- Configuration file (~/.pg_ai.config)
This provides a simple and secure configuration approach.
Error Codes and Messages
This page provides a comprehensive reference for all error codes and messages that you might encounter while using the pg_ai_query extension.
Function-Specific Errors
generate_query Errors
| Error Message | Cause | Solution |
|---|---|---|
"API key required. Pass as parameter or configure ~/.pg_ai.config" | No API key provided and none configured | Add API key to config file or pass as parameter |
"No API key available for [provider] provider" | API key missing for specific provider | Configure API key for the requested provider |
"AI API error: [details]" | AI service returned an error | Check API key validity and service status |
"Natural language query cannot be empty" | Empty input provided | Provide a non-empty query description |
"Query generation failed: [details]" | AI failed to generate query | Check your description clarity and try again |
explain_query Errors
| Error Message | Cause | Solution |
|---|---|---|
"Query text cannot be empty" | Empty query provided | Provide a valid SQL query |
"Failed to prepare EXPLAIN query: syntax error" | Invalid SQL syntax | Fix SQL syntax errors |
"Failed to execute EXPLAIN query" | Query execution failed | Check query validity and permissions |
"Failed to connect to SPI" | Database connection issue | Check database connectivity |
"No output from EXPLAIN query" | EXPLAIN returned no results | Verify query syntax and structure |
get_database_tables Errors
| Error Message | Cause | Solution |
|---|---|---|
"Failed to connect to SPI" | Database connection issue | Check database connectivity |
"Failed to execute query" | Database query failed | Check permissions and database status |
"No tables found" | No user tables in database | Create tables or check schema permissions |
get_table_details Errors
| Error Message | Cause | Solution |
|---|---|---|
"Table '[table_name]' does not exist" | Specified table not found | Check table name and schema |
"Failed to get table details: [details]" | Error retrieving table info | Check permissions and table existence |
Debugging Commands
-- Check extension status
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_ai_query';
-- Test basic functionality
SELECT generate_query('SELECT 1');
-- Check function permissions
\df generate_query
See Also
Troubleshooting Guide
This guide helps you diagnose and fix common issues with the pg_ai_query extension.
Installation Issues
Extension Not Found
Error: extension "pg_ai_query" is not available
Causes & Solutions:
- Extension not installed
# Check if extension files are present
ls /usr/share/postgresql/*/extension/pg_ai_query*
# If missing, reinstall
cd pg_ai_query
sudo make install
- Wrong PostgreSQL version
# Check PostgreSQL version
psql -c "SELECT version();"
# Ensure extension was built for this version
pg_config --version
- Permission issues
# Check file permissions
ls -la /usr/share/postgresql/*/extension/pg_ai_query*
# Fix if needed
sudo chmod 644 /usr/share/postgresql/*/extension/pg_ai_query*
Compilation Errors
Error: Various C++ compilation errors
Common Causes:
- Missing dependencies
# Ubuntu/Debian
sudo apt-get install postgresql-server-dev-all cmake gcc g++ libssl-dev
# CentOS/RHEL
sudo yum install postgresql-devel cmake gcc-c++ openssl-devel
- Wrong PostgreSQL headers
# Check pg_config path
which pg_config
pg_config --includedir
# Use specific version if needed
PG_CONFIG=/usr/pgsql-14/bin/pg_config make
- C++ standard issues
# Ensure C++20 support
gcc --version # Should be 8.0+
Configuration Issues
API Key Problems
Error: "API key not configured" or "Invalid API key"
Diagnosis:
-- Enable logging to see configuration loading
-- In ~/.pg_ai.config:
[general]
enable_logging = true
log_level = "DEBUG"
Solutions:
- Check configuration file exists
ls -la ~/.pg_ai.config
cat ~/.pg_ai.config
- Verify API key format
[openai]
api_key = "sk-proj-abc123..." # Must start with sk- for OpenAI
[anthropic]
api_key = "sk-ant-abc123..." # Must start with sk-ant- for Anthropic
- Test API key directly
# Test OpenAI key
curl -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"test"}]}' \
  https://api.openai.com/v1/chat/completions
- Check file permissions
# Config file should be readable by the PostgreSQL user
chmod 644 ~/.pg_ai.config
Configuration Not Loading
Error: Configuration seems ignored
Solutions:
- Restart PostgreSQL session
-- Disconnect and reconnect to reload configuration
\q
psql -d your_database
- Check file syntax
# Verify the INI format is correct
# String values must be quoted
api_key = "your-key-here"   # Correct
api_key=your-key-here       # Wrong (missing quotes)
- Verify home directory
echo $HOME
# Config should be at $HOME/.pg_ai.config
Runtime Issues
No Tables Found
Error: "No tables found" or empty results
Diagnosis:
-- Check what tables exist
SELECT get_database_tables();
-- Check table permissions
SELECT schemaname, tablename, tableowner
FROM pg_tables
WHERE schemaname NOT IN ('information_schema', 'pg_catalog');
Solutions:
- Create some tables
-- Extension only works with user tables
CREATE TABLE test_table (id SERIAL, name TEXT);
- Check schema focus
-- Extension looks at 'public' schema by default
SET search_path TO public, your_schema;
Query Generation Failures
Error: "Query generation failed" or timeouts
Common Causes & Solutions:
- Network connectivity
# Test connection to AI providers
curl -I https://api.openai.com/v1/models
curl -I https://api.anthropic.com/v1/messages
- API rate limiting
# Increase timeout in config
[general]
request_timeout_ms = 60000 # 60 seconds
max_retries = 5
- Complex database schema
-- For large schemas, try with simpler requests first
SELECT generate_query('SELECT 1'); -- Test basic functionality
- Model-specific issues
-- Try different models
SELECT generate_query('show tables', null, 'openai');
SELECT generate_query('show tables', null, 'anthropic');
Poor Query Quality
Issue: Generated queries are incorrect or suboptimal
Solutions:
- Be more specific in requests
-- Instead of: "show data"
-- Use: "show user names and email addresses from users table"
- Check schema information
-- Ensure tables have good names and structure
SELECT get_table_details('your_table');
- Try different models
-- GPT-4 is generally more accurate than GPT-3.5
SELECT generate_query('complex query here', null, 'openai');
- Break complex requests into parts
-- Instead of one complex request, try multiple simpler ones
SELECT generate_query('show users');
SELECT generate_query('show orders for user id 5');
Performance Issues
Slow Query Generation
Issue: Extension takes a long time to respond
Diagnosis:
-- Enable detailed logging
-- In ~/.pg_ai.config:
[general]
enable_logging = true
log_level = "DEBUG"
Solutions:
- Optimize database schema
-- Reduce number of tables if possible
-- Use clear table and column names
-- Add table comments for context
COMMENT ON TABLE users IS 'Customer information and contact details';
- Adjust timeout settings
[general]
request_timeout_ms = 45000 # Increase from default 30000
- Use faster models
[openai]
default_model = "gpt-3.5-turbo" # Faster than gpt-4
High Memory Usage
Issue: PostgreSQL memory usage increases significantly
Solutions:
- Restart PostgreSQL sessions periodically
# Schema information is cached per session
# Restart to clear cache
- Reduce concurrent users of the extension
- Monitor extension usage
-- Track function calls
SELECT * FROM pg_stat_user_functions WHERE funcname LIKE '%generate%';
Debugging Tools
Enable Debug Logging
# ~/.pg_ai.config
[general]
enable_logging = true
log_level = "DEBUG"
enable_postgresql_elog = true
Check Extension Status
-- Verify extension is loaded
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_ai_query';
-- Check function availability
\df *generate*
\df *database_tables*
\df *table_details*
-- Test basic functionality
SELECT generate_query('SELECT 1 as test');
Monitor System Resources
# Check PostgreSQL logs
tail -f /var/log/postgresql/postgresql-*.log
# Monitor network connections
netstat -an | grep :5432
# Check memory usage
ps aux | grep postgres
Test Configuration
-- Test API connectivity
SELECT generate_query('test query', 'your-test-key', 'openai');
-- Test schema discovery
SELECT get_database_tables();
SELECT get_table_details('your_table');
Common Error Messages
| Error Message | Likely Cause | Solution |
|---|---|---|
extension "pg_ai_query" is not available | Extension not installed | Run make install |
function generate_query does not exist | Extension not created in DB | Run CREATE EXTENSION pg_ai_query |
API key not configured | Missing or invalid API key | Check ~/.pg_ai.config file |
No tables found | No user tables in database | Create some tables or check permissions |
Query generation failed: timeout | Network or API issues | Check connectivity and increase timeout |
Invalid provider: xyz | Wrong provider name | Use ‘openai’, ‘anthropic’, or ‘auto’ |
Table does not exist: xyz | Table name incorrect | Check table name and schema |
Getting Additional Help
Log Analysis
When reporting issues, include:
- Extension version
SELECT extversion FROM pg_extension WHERE extname = 'pg_ai_query';
- PostgreSQL version
SELECT version();
- Error logs with debug enabled
- Configuration file (with API keys redacted)
Testing Isolation
To isolate problems:
- Test with minimal configuration
- Use simple test tables
- Try different AI providers
- Test basic functions first
Community Resources
- GitHub Issues: Report bugs and feature requests
- Documentation: Check for updated troubleshooting info
- PostgreSQL Community: For database-specific issues
Prevention Best Practices
- Regular Updates: Keep extension and dependencies updated
- Monitoring: Set up logging and monitoring for production use
- Testing: Test configuration changes in development first
- Backups: Always backup before major updates
- Documentation: Document your specific configuration and customizations
Frequently Asked Questions (FAQ)
This page answers common questions about the pg_ai_query extension.
General Questions
What is pg_ai_query?
pg_ai_query is a PostgreSQL extension that converts natural language descriptions into SQL queries using AI models from OpenAI and Anthropic. It integrates directly with your PostgreSQL database to provide intelligent query generation.
How does it work?
- You provide a natural language description of what you want to query
- The extension analyzes your database schema automatically
- It sends the schema context and your request to an AI provider
- The AI generates a PostgreSQL-compatible SQL query
- The extension returns the SQL query for you to execute
Is my data sent to AI providers?
No. Only your database schema metadata (table names, column names, relationships) is sent to AI providers, never your actual data. The extension is designed with privacy in mind.
Installation and Setup
What PostgreSQL versions are supported?
PostgreSQL 12 and later are supported. We recommend PostgreSQL 14 or later for the best experience.
Do I need to install anything besides PostgreSQL?
You need:
- PostgreSQL with development headers
- C++ compiler with C++20 support
- CMake for building
- An API key from OpenAI or Anthropic
Can I use this with cloud PostgreSQL services?
It depends. The extension needs to be compiled and installed on the PostgreSQL server. This works with:
- Self-hosted PostgreSQL
- VPS or cloud instances where you have admin access
- Some managed services that support custom extensions
It won’t work with fully managed services like AWS RDS, Google Cloud SQL, or Azure Database for PostgreSQL that don’t allow custom extensions.
How do I get an API key?
For OpenAI:
- Visit platform.openai.com
- Create an account and add billing information
- Go to API Keys section and create a new key
For Anthropic:
- Visit console.anthropic.com
- Create an account and add credits
- Go to API Keys section and create a new key
Usage Questions
Why does it say “No tables found”?
This happens when:
- Your database has no user-created tables (only system tables)
- The extension user doesn’t have permission to access tables
- You’re in the wrong schema
Solution: Create some tables or check permissions:
-- Check if you have tables
SELECT table_name FROM information_schema.tables
WHERE table_schema = 'public' AND table_type = 'BASE TABLE';
-- Check permissions
SELECT has_table_privilege('your_table', 'SELECT');
Can I use this for INSERT, UPDATE, or DELETE queries?
Yes, but with caution. The extension can generate all types of SQL queries, but:
- Always review generated queries before executing
- Consider using a read-only database user for safety
- Be especially careful with data modification queries
How accurate are the generated queries?
Accuracy depends on several factors:
- Schema quality: Well-named tables and columns improve results
- AI model: GPT-4 and Claude generally produce better results than GPT-3.5
- Query complexity: Simple queries are more accurate than complex ones
- Request clarity: Specific requests generate better queries
Can I customize the AI model used?
Yes. You can specify the model in your configuration:
[openai]
default_model = "gpt-4o" # or "gpt-4", "gpt-3.5-turbo"
[anthropic]
default_model = "claude-3-5-sonnet-20241022"
You can also specify the provider per query:
SELECT generate_query('show users', null, 'openai');
SELECT generate_query('show users', null, 'anthropic');
Performance Questions
How fast is query generation?
Typical response times:
- GPT-3.5-turbo: 1-2 seconds
- GPT-4/GPT-4o: 2-5 seconds
- Claude 3.5 Sonnet: 3-6 seconds
Performance depends on:
- AI provider response time
- Database schema complexity
- Network latency
Does it cache results?
Schema information is cached per PostgreSQL session, but AI responses are not cached. Each query generation makes a fresh API request.
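If the same natural language requests recur, an application can cache the generated SQL itself, keyed on the description. A minimal in-process sketch (cache size and connection string are placeholders):
from functools import lru_cache
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost/dbname")

@lru_cache(maxsize=256)
def cached_generate(description):
    cur = conn.cursor()
    cur.execute("SELECT generate_query(%s)", (description,))
    return cur.fetchone()[0]

# Only the first call pays for an AI request; the repeat is served from the cache
print(cached_generate("count orders by status"))
print(cached_generate("count orders by status"))
Cached SQL goes stale when the schema changes, so keep such a cache short-lived or clear it after migrations.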
How much does it cost to use?
Costs depend on your AI provider and usage:
- GPT-3.5-turbo: ~$0.001-0.005 per query
- GPT-4: ~$0.10-0.20 per query
- GPT-4o: ~$0.02-0.05 per query
- Claude 3.5 Sonnet: ~$0.03-0.06 per query
Actual costs vary based on schema complexity and query length.
Configuration Questions
Where should I put the configuration file?
Place it at ~/.pg_ai.config in the home directory of the user running PostgreSQL (often the postgres user).
Can I use environment variables instead of a config file?
Currently, only configuration files are supported. Place your configuration at ~/.pg_ai.config.
How do I enable logging for debugging?
[general]
enable_logging = true
log_level = "DEBUG"
Then restart your PostgreSQL session and check the logs.
Can I configure multiple providers?
Yes. Configure both OpenAI and Anthropic:
[openai]
api_key = "sk-proj-openai-key"
default_model = "gpt-4o"
[anthropic]
api_key = "sk-ant-anthropic-key"
default_model = "claude-3-5-sonnet-20241022"
The extension will use the first configured provider as default.
Troubleshooting
“Extension not found” error
Make sure the extension is properly installed:
# Check if extension files exist
ls /usr/share/postgresql/*/extension/pg_ai_query*
# Reinstall if needed
cd pg_ai_query
sudo make install
“API key not configured” error
Check your configuration:
# Verify config file exists and has correct content
cat ~/.pg_ai.config
# Check file permissions
ls -la ~/.pg_ai.config # Should be readable by PostgreSQL user
“Query generation failed” errors
This usually indicates:
- Network connectivity issues
- Invalid API key
- AI provider service problems
- Rate limiting
Solutions:
- Check your internet connection
- Verify API key is correct and active
- Try a different provider
- Wait a few minutes and retry
Generated queries are wrong
Tips for better results:
- Be more specific in your requests
- Use proper table and column names
- Add table comments to provide context
- Try different AI models
- Break complex requests into simpler parts
Extension crashes PostgreSQL
This shouldn’t happen in normal operation. If it does:
- Check PostgreSQL logs for error messages
- Ensure you have the latest version of the extension
- Report the issue with reproduction steps
Security Questions
Is it safe to use in production?
Yes, with proper precautions:
- Use a read-only database user for query generation
- Always review generated queries before execution
- Monitor API usage and costs
- Keep API keys secure
- Enable logging for audit trails
What data is sent to AI providers?
Only database metadata:
- Table names and schemas
- Column names and data types
- Constraint definitions
- Relationship information
Never sent:
- Actual data content
- Database credentials
- User information
Can it generate harmful queries?
The extension includes safety features:
- Automatic LIMIT clauses on SELECT statements
- Blocks access to system tables
- Validates query structure
However, always review generated queries, especially for data modification operations.
How do I secure API keys?
- Set file permissions: chmod 600 ~/.pg_ai.config
- Never commit keys to version control
- Use secure configuration files in production
- Rotate keys regularly
- Monitor usage through provider dashboards
Business Questions
Can I use this commercially?
Yes. The extension itself is open source, but you need to comply with:
- Your AI provider’s terms of service
- PostgreSQL’s license
- Your organization’s data policies
What about compliance (GDPR, HIPAA, etc.)?
Consider these factors:
- Schema metadata may contain sensitive information
- AI providers have different compliance certifications
- You may need data processing agreements with AI providers
- Consider on-premises AI models for sensitive environments
Is there support available?
- Community Support: GitHub issues and discussions
- Documentation: Comprehensive docs with examples
- Enterprise Support: May be available for commercial deployments
Advanced Usage
Can I extend the extension?
The extension is open source and designed to be extensible. You can:
- Add support for new AI providers
- Customize query generation logic
- Add new schema analysis features
- Contribute improvements back to the project
Can I use local AI models?
Currently, only cloud-based OpenAI and Anthropic models are supported. Local model support is being considered for future versions.
How do I optimize for my specific use case?
- Choose appropriate AI models for your needs
- Design your schema with clear, descriptive names
- Add comments to tables and columns
- Create views for complex business logic
- Monitor usage and costs
- Train your users on effective natural language queries
Can I integrate this with business intelligence tools?
Yes. Many BI tools can call PostgreSQL functions, so you can integrate query generation into dashboards and reports. Consider creating wrapper functions for specific business use cases.
Don’t see your question here? Check the Troubleshooting Guide or open an issue on the GitHub repository.