PostgreSQL AI Query Extension
The PostgreSQL AI Query Extension (pg_ai_query) is a powerful PostgreSQL extension that allows you to generate SQL queries from natural language descriptions using state-of-the-art AI models from OpenAI and Anthropic.
What is pg_ai_query?
pg_ai_query bridges the gap between natural language and SQL by leveraging large language models to understand your intent and automatically generate optimized PostgreSQL queries. The extension integrates directly into your PostgreSQL database, providing a seamless way to query your data using plain English.
Key Features
- Natural Language to SQL: Convert plain English descriptions into valid PostgreSQL queries
- AI-Powered Query Analysis: Analyze query performance with EXPLAIN ANALYZE and get optimization insights
- Automatic Schema Discovery: The extension automatically analyzes your database schema to understand table structures, relationships, and constraints
- Multiple AI Providers: Support for both OpenAI (GPT-4, GPT-3.5) and Anthropic (Claude) models
- Intelligent Query Generation: Generates optimized queries with appropriate JOINs, WHERE clauses, and LIMIT constraints
- Performance Optimization: Get AI-powered recommendations for query improvements and index suggestions
- Safety First: Built-in protections against dangerous operations and unauthorized access to system tables
- Configurable: Flexible configuration system with support for API keys, model selection, and logging
- PostgreSQL Native: Runs directly within PostgreSQL as a native extension
How It Works
- Schema Analysis: The extension automatically discovers and analyzes your database schema
- Natural Language Processing: Your natural language query is processed and understood by AI models
- SQL Generation: The AI generates an appropriate SQL query based on your schema and request
- Query Validation: The generated query is validated for safety and correctness
- Execution Ready: You receive a ready-to-execute SQL query
Example Usage
Query Generation
-- Simple query
SELECT generate_query('show me all users created in the last 7 days');
-- With custom API key
SELECT generate_query('count orders by status', 'your-api-key-here');
-- With specific provider
SELECT generate_query('find top 10 customers by revenue', 'your-api-key', 'openai');
Query Performance Analysis
-- Analyze query performance
SELECT explain_query('SELECT * FROM users WHERE active = true');
-- Get optimization suggestions for complex queries
SELECT explain_query('
SELECT u.username, COUNT(o.id) as orders
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.username
ORDER BY orders DESC
');
-- Combine generation and analysis
WITH generated AS (
SELECT generate_query('find high-value recent orders') as query
)
SELECT explain_query((SELECT query FROM generated));
Supported AI Models
OpenAI Models
- GPT-4o: Latest and most capable model
- GPT-4: High-quality reasoning and code generation
- GPT-3.5 Turbo: Fast and efficient for simpler queries
Anthropic Models
- Claude 3.5 Sonnet: Advanced reasoning and natural language understanding
Use Cases
- Data Exploration: Quickly explore your data without writing complex SQL
- Query Optimization: Analyze and optimize slow-performing queries with AI insights
- Performance Monitoring: Regular analysis of critical queries for performance regression
- Business Intelligence: Generate reports and analytics queries from natural descriptions
- Learning SQL: Understand how natural language translates to SQL syntax and learn optimization techniques
- Index Planning: Get AI-powered recommendations for database index strategies
- Rapid Prototyping: Quickly generate queries for testing and development
- Documentation: Generate example queries for database documentation
Architecture
The extension consists of several key components:
- Query Generator: Core engine that processes natural language and generates SQL
- Query Analyzer: Performance analysis engine using EXPLAIN ANALYZE and AI insights
- Schema Discovery: Automatically analyzes database structure and relationships
- AI Provider Integration: Handles communication with OpenAI and Anthropic APIs
- Configuration Manager: Manages settings, API keys, and model configurations
- Safety Validator: Ensures generated queries are safe and authorized
Security and Safety
pg_ai_query is designed with security in mind:
- No System Access: Cannot access system catalogs or sensitive PostgreSQL internals
- User Table Focus: Only operates on user-created tables and data
- Query Validation: All generated queries are validated before return
- Configurable Limits: Built-in LIMIT enforcement to prevent large result sets
- API Key Protection: Secure handling of API credentials
Getting Started
Ready to start generating SQL queries from natural language? Head over to the Installation guide to get started, or check out our Quick Start tutorial for a hands-on introduction.
Repository
The source code is available on GitHub. Feel free to contribute, report issues, or explore the implementation.
Installation
This guide will walk you through installing the PostgreSQL AI Query extension on your system.
Prerequisites
Before installing pg_ai_query, ensure you have the following:
System Requirements
- PostgreSQL 12+: The extension requires PostgreSQL version 12 or later
- Operating System: Linux, macOS, or Windows (with proper development tools)
- C++ Compiler: GCC 8+ or Clang 10+ with C++20 support
- CMake: Version 3.15 or later for building
Development Dependencies
- PostgreSQL development headers (postgresql-devel on RHEL/CentOS, postgresql-server-dev-all on Ubuntu/Debian)
- OpenSSL development libraries
- Git (for cloning the repository)
AI Provider Requirements
You’ll need API access to at least one of the following:
- OpenAI API: Create an account at platform.openai.com
- Anthropic API: Create an account at console.anthropic.com
Installation Methods
Method 1: Build from Source (Recommended)
1. Clone the Repository
git clone https://github.com/benodiwal/pg_ai_query.git
cd pg_ai_query
2. Initialize Submodules
git submodule update --init --recursive
3. Build the Extension
make clean
make
4. Install the Extension
# Install to default PostgreSQL location
sudo make install
# Or specify a custom PostgreSQL installation
PG_CONFIG=/path/to/pg_config make install
Method 2: Package Installation (Future)
Package installations will be available in future releases for major Linux distributions and package managers.
Platform-Specific Instructions
Ubuntu/Debian
# Install dependencies
sudo apt-get update
sudo apt-get install postgresql-server-dev-all cmake gcc g++ libssl-dev git
# Clone and build
git clone https://github.com/benodiwal/pg_ai_query.git
cd pg_ai_query
git submodule update --init --recursive
make && sudo make install
CentOS/RHEL/Fedora
# Install dependencies
sudo yum install postgresql-devel cmake gcc-c++ openssl-devel git
# OR on newer systems:
sudo dnf install postgresql-devel cmake gcc-c++ openssl-devel git
# Clone and build
git clone https://github.com/benodiwal/pg_ai_query.git
cd pg_ai_query
git submodule update --init --recursive
make && sudo make install
macOS
# Install dependencies (with Homebrew)
brew install postgresql cmake openssl git
# Clone and build
git clone https://github.com/benodiwal/pg_ai_query.git
cd pg_ai_query
git submodule update --init --recursive
make && sudo make install
Windows
For Windows installation, you’ll need:
- Visual Studio 2019 or later with C++ tools
- PostgreSQL installed from EDB or compiled from source
- CMake for Windows
Detailed Windows installation instructions will be added in a future update.
Verification
After installation, verify that the extension is properly installed:
1. Connect to PostgreSQL
psql -d your_database
2. Create the Extension
CREATE EXTENSION IF NOT EXISTS pg_ai_query;
3. Test the Installation
-- Check if functions are available
\df generate_query
-- Test with a simple query (will fail without API key, which is expected)
SELECT generate_query('show me all tables');
You should see the function listed and get an error about missing API configuration (which is normal - we’ll configure that next).
Troubleshooting Installation
Common Issues
PostgreSQL Development Headers Missing
Error: postgres.h: No such file or directory
Solution: Install PostgreSQL development packages:
# Ubuntu/Debian
sudo apt-get install postgresql-server-dev-all
# CentOS/RHEL
sudo yum install postgresql-devel
CMake Not Found
Error: cmake: command not found
Solution: Install CMake:
# Ubuntu/Debian
sudo apt-get install cmake
# macOS
brew install cmake
# CentOS/RHEL
sudo yum install cmake
Compilation Errors
Error: Various C++ compilation errors
Solutions:
- Ensure you have a C++20-compatible compiler
- Check that all submodules are initialized:
git submodule update --init --recursive
- Clean and rebuild:
make clean && make
Permission Denied During Installation
Error: Permission denied when running make install
Solution: Use sudo for installation:
sudo make install
Wrong PostgreSQL Installation
Error: Extension installs to wrong PostgreSQL version
Solution: Specify the correct pg_config:
PG_CONFIG=/usr/pgsql-14/bin/pg_config make install
Getting Help
If you encounter issues not covered here:
- Check the Troubleshooting guide
- Review the FAQ for common questions
- Open an issue on the GitHub repository
Next Steps
Once installation is complete:
- Configure the extension with your API keys
- Follow the Quick Start Guide to generate your first queries
- Explore Usage Examples for inspiration
Configuration
The pg_ai_query extension supports flexible configuration through a configuration file that allows you to customize AI providers, API keys, logging, and query behavior.
Configuration File Location
The configuration file should be placed at:
~/.pg_ai.config
This file will be automatically detected and loaded when the extension is first used.
Configuration File Format
The configuration uses an INI-style format with sections and key-value pairs:
# PG AI Query Configuration File
[section_name]
key = value
Complete Configuration Example
Here’s a complete example configuration file with all available options:
# PG AI Query Configuration File
# Place this file at ~/.pg_ai.config
[general]
# Logging level: DEBUG, INFO, WARNING, ERROR
log_level = "INFO"
# Enable or disable all logging output (true/false)
enable_logging = false
# Request timeout in milliseconds
request_timeout_ms = 30000
# Maximum number of retries for failed requests
max_retries = 3
[query]
# Always enforce LIMIT clause on SELECT queries
enforce_limit = true
# Default LIMIT value when not specified by user
default_limit = 1000
[response]
# Show detailed explanation of what the query does
show_explanation = true
# Show warnings about performance, security, or data implications
show_warnings = true
# Show suggested visualization type for query results
show_suggested_visualization = true
# Use formatted response (JSON format) instead of plain SQL
# When enabled, returns structured JSON with query, explanation, warnings, etc.
# When disabled, returns plain SQL with optional comments
use_formatted_response = false
[openai]
# Your OpenAI API key
api_key = "sk-your-openai-api-key-here"
# Default model to use (options: gpt-4o, gpt-4, gpt-3.5-turbo)
default_model = "gpt-4o"
[anthropic]
# Your Anthropic API key (if using Claude)
api_key = "sk-ant-your-anthropic-api-key-here"
# Default model to use (options: claude-3-5-sonnet-20241022)
default_model = "claude-3-5-sonnet-20241022"
Configuration Sections
[general] Section
Controls general behavior of the extension.
| Option | Type | Default | Description |
|---|---|---|---|
| log_level | string | "INFO" | Minimum level for log messages: DEBUG, INFO, WARNING, ERROR |
| enable_logging | boolean | false | Enable/disable all logging output |
| request_timeout_ms | integer | 30000 | Timeout for AI API requests in milliseconds |
| max_retries | integer | 3 | Maximum retry attempts for failed API requests |
[query] Section
Controls query generation behavior.
| Option | Type | Default | Description |
|---|---|---|---|
| enforce_limit | boolean | true | Always add LIMIT clause to SELECT queries |
| default_limit | integer | 1000 | Default row limit when none specified |
[response] Section
Controls how query results are formatted and what additional information is included.
| Option | Type | Default | Description |
|---|---|---|---|
| show_explanation | boolean | true | Include detailed explanation of what the query does |
| show_warnings | boolean | true | Include warnings about performance, security, or data implications |
| show_suggested_visualization | boolean | false | Include suggested visualization type for the query results |
| use_formatted_response | boolean | false | Return structured JSON instead of plain SQL |
[openai] Section
OpenAI provider configuration.
| Option | Type | Default | Description |
|---|---|---|---|
| api_key | string | "" | Your OpenAI API key from platform.openai.com |
| default_model | string | "gpt-4o" | Default OpenAI model to use |
Available OpenAI Models:
- gpt-4o - Latest GPT-4 Omni model (recommended)
- gpt-4 - High-quality GPT-4 model
- gpt-3.5-turbo - Fast and efficient model
[anthropic] Section
Anthropic (Claude) provider configuration.
| Option | Type | Default | Description |
|---|---|---|---|
| api_key | string | "" | Your Anthropic API key from console.anthropic.com |
| default_model | string | "claude-3-5-sonnet-20241022" | Default Claude model to use |
Available Anthropic Models:
- claude-3-5-sonnet-20241022 - Latest Claude 3.5 Sonnet model
Setting Up API Keys
Getting an OpenAI API Key
- Visit platform.openai.com
- Create an account or sign in
- Navigate to API Keys section
- Create a new API key
- Copy the key and add it to your config file
Getting an Anthropic API Key
- Visit console.anthropic.com
- Create an account or sign in
- Navigate to API Keys section
- Create a new API key
- Copy the key and add it to your config file
Provider Selection Priority
The extension automatically selects an AI provider based on the following priority:
- Explicit provider parameter in function call
- First configured provider with a valid API key in config file
- Error if no providers are configured
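For example (the API key shown is a placeholder), passing an explicit provider overrides the configured priority, while omitting it falls back to the first provider with a valid key in the config file:
-- Explicit provider: forces Anthropic even if OpenAI is configured first
SELECT generate_query('count orders by status', 'sk-ant-your-key', 'anthropic');
-- No provider given: the first configured provider with a valid API key is used
SELECT generate_query('count orders by status');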
Configuration Validation
The extension validates configuration on startup:
- API Key Format: Checks that API keys follow expected format
- Model Availability: Validates that specified models are available
- Numeric Values: Ensures timeouts and limits are positive integers
- Boolean Values: Validates true/false values
Configuration File Only
Currently, only configuration files are supported. All settings must be specified in ~/.pg_ai.config.
Security Considerations
API Key Security
- Never commit API keys to version control
- Use appropriate file permissions:
chmod 600 ~/.pg_ai.config
- Rotate keys regularly as per your organization’s security policy
- Monitor usage through your AI provider’s dashboard
Network Security
- API requests are made over HTTPS
- Consider firewall rules for outbound connections to AI providers
- Monitor network traffic for unexpected API usage
Database Security
- The extension only accesses user tables (not system catalogs)
- Generated queries respect PostgreSQL’s permission system
- Consider using dedicated database users with limited privileges
Debugging Configuration
Enable Logging
To troubleshoot configuration issues, enable logging:
[general]
enable_logging = true
log_level = "DEBUG"
Check Configuration Loading
You can verify configuration loading by observing log messages when first calling the extension:
-- This will log configuration loading details
SELECT generate_query('test query');
Validate API Keys
Test your API configuration:
-- Test with explicit API key
SELECT generate_query('show tables', 'your-api-key-here', 'openai');
Updating Configuration
Configuration is loaded on first use. To reload after making changes:
- Disconnect from PostgreSQL
- Reconnect to your database
- Call any extension function to trigger reload
Or restart your PostgreSQL session.
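For example, from psql (the database name is a placeholder), reconnecting and calling any extension function triggers the reload:
\c your_database
SELECT generate_query('configuration reload check');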
Next Steps
Once configured:
- Follow the Quick Start Guide to test your setup
- Explore Usage Examples
- Learn about AI Providers to optimize your model selection
Response Formatting
The pg_ai_query extension offers flexible response formatting options that allow you to customize how query results are presented. You can choose to receive just the SQL query, or enhance it with explanations, warnings, and visualization suggestions.
Response Format Options
Basic SQL Response (Default)
By default, the extension returns only the generated SQL query:
SELECT generate_query('show all customers');
-- Returns: SELECT * FROM customers LIMIT 1000;
Enhanced Text Response
When explanation and warnings are enabled, the response includes additional context as SQL comments:
-- Configuration: show_explanation=true, show_warnings=true, use_formatted_response=false
SELECT generate_query('find customers with high order values');
-- Returns:
-- SELECT c.customer_id, c.name, SUM(o.total_amount) as total_spent
-- FROM customers c
-- JOIN orders o ON c.customer_id = o.customer_id
-- GROUP BY c.customer_id, c.name
-- HAVING SUM(o.total_amount) > 1000
-- ORDER BY total_spent DESC
-- LIMIT 1000;
--
-- Explanation:
-- This query finds customers who have spent more than $1000 total across all their orders,
-- showing their total spending ranked from highest to lowest.
--
-- Warnings:
-- 1. Large dataset: This query may be slow on tables with millions of rows
-- 2. Consider indexing: customer_id and total_amount columns should be indexed
JSON Response
When use_formatted_response=true, the extension returns structured JSON:
{
"query": "SELECT c.customer_id, c.name, SUM(o.total_amount) as total_spent FROM customers c JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.name HAVING SUM(o.total_amount) > 1000 ORDER BY total_spent DESC LIMIT 1000;",
"success": true,
"explanation": "This query finds customers who have spent more than $1000 total across all their orders, showing their total spending ranked from highest to lowest.",
"warnings": [
"Large dataset: This query may be slow on tables with millions of rows",
"Consider indexing: customer_id and total_amount columns should be indexed"
],
"suggested_visualization": "bar",
"row_limit_applied": true
}
Configuration Options
Response Settings
Add these options to the [response] section of your ~/.pg_ai.config file:
[response]
# Show detailed explanation of what the query does
show_explanation = true
# Show warnings about performance, security, or data implications
show_warnings = true
# Show suggested visualization type for query results
show_suggested_visualization = true
# Use formatted response (JSON format) instead of plain SQL
use_formatted_response = false
Individual Option Details
show_explanation
- Type: Boolean
- Default: true
- Description: Includes a plain English explanation of what the query does, including business context and logic
show_warnings
- Type: Boolean
- Default: true
- Description: Includes warnings about:
- Performance implications for large datasets
- Security considerations
- Potential data quality issues
- Index recommendations
- Best practice suggestions
show_suggested_visualization
- Type: Boolean
- Default: false
- Description: Suggests the most appropriate visualization type based on the query structure and result type:
- bar - For categorical comparisons
- line - For time series data
- pie - For part-to-whole relationships
- table - For detailed data exploration
- none - When visualization isn’t applicable
use_formatted_response
- Type: Boolean
- Default: false
- Description: When enabled, returns structured JSON with all components. When disabled, returns SQL with optional comment annotations.
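As a minimal sketch (assuming use_formatted_response = true so the return value parses as JSON), the individual fields can be unpacked directly in SQL:
-- Pull the generated SQL and the suggested chart type out of the JSON response
WITH response AS (
    SELECT generate_query('show all customers')::jsonb AS r
)
SELECT
    r->>'query' AS sql_text,
    r->>'suggested_visualization' AS chart_type
FROM response;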
Response Examples
Sales Analysis Query
Input:
SELECT generate_query('monthly sales trend for this year');
Text Response (use_formatted_response=false):
SELECT
DATE_TRUNC('month', order_date) as month,
SUM(total_amount) as monthly_sales,
COUNT(*) as order_count
FROM orders
WHERE EXTRACT(year FROM order_date) = EXTRACT(year FROM CURRENT_DATE)
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month
LIMIT 1000;
-- Explanation:
-- Calculates total sales and order count by month for the current year,
-- useful for identifying seasonal trends and business performance patterns.
-- Suggested Visualization: line
-- Reasoning: Time series data showing trends over months is best displayed as a line chart
JSON Response (use_formatted_response=true):
{
"query": "SELECT DATE_TRUNC('month', order_date) as month, SUM(total_amount) as monthly_sales, COUNT(*) as order_count FROM orders WHERE EXTRACT(year FROM order_date) = EXTRACT(year FROM CURRENT_DATE) GROUP BY DATE_TRUNC('month', order_date) ORDER BY month LIMIT 1000;",
"success": true,
"explanation": "Calculates total sales and order count by month for the current year, useful for identifying seasonal trends and business performance patterns.",
"warnings": [],
"suggested_visualization": "line",
"row_limit_applied": true
}
Complex Join Query
Input:
SELECT generate_query('customers who bought more than 5 products in the electronics category');
Enhanced Response:
SELECT
c.customer_id,
c.name,
COUNT(DISTINCT oi.product_id) as electronics_products_bought
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE p.category = 'electronics'
GROUP BY c.customer_id, c.name
HAVING COUNT(DISTINCT oi.product_id) > 5
ORDER BY electronics_products_bought DESC
LIMIT 1000;
-- Explanation:
-- Finds customers who have purchased more than 5 different electronics products,
-- showing customer details and the count of unique electronics products they bought.
-- Warnings:
-- 1. Multiple JOINs: This query involves 4 table joins which may be slow on large datasets
-- 2. Index recommendation: Ensure indexes exist on customer_id, order_id, product_id, and category columns
-- 3. Category filter: Verify 'electronics' is the exact category name used in your database
-- Suggested Visualization: table
-- Reasoning: Customer list with counts is best displayed in a table format for detailed analysis
Using Responses in Applications
Parsing JSON Responses
When using use_formatted_response=true, you can parse the JSON in your application:
import psycopg2
import json
# Connect to the database (connection string is a placeholder)
conn = psycopg2.connect("dbname=your_database")
cursor = conn.cursor()
# Execute query
cursor.execute("SELECT generate_query('top selling products')")
response_json = cursor.fetchone()[0]
# Parse the JSON response
response = json.loads(response_json)
# Extract components
sql_query = response['query']
explanation = response.get('explanation', '')
warnings = response.get('warnings', [])
visualization = response.get('suggested_visualization', 'table')
# Use the SQL query for actual data retrieval
cursor.execute(sql_query)
results = cursor.fetchall()
Working with Text Responses
For text responses with comments, you can extract just the SQL:
def extract_sql_from_response(response_text):
"""Extract just the SQL query from a text response."""
lines = response_text.split('\n')
sql_lines = []
for line in lines:
# Stop at the first comment line; explanation/warning annotations follow the SQL
if line.strip().startswith('--'):
break
if line.strip():
sql_lines.append(line)
return '\n'.join(sql_lines)
# Usage
response = cursor.fetchone()[0]
clean_sql = extract_sql_from_response(response)
Best Practices
For Development
- Enable all response features during development for better understanding
- Use JSON format for programmatic parsing
- Pay attention to warnings for performance optimization
For Production
- Consider disabling detailed explanations to reduce response size
- Keep warnings enabled for monitoring potential issues
- Use text format for human-readable responses
For Business Intelligence
- Enable visualization suggestions to guide chart creation
- Use explanations to document generated queries
- Monitor warnings for data quality issues
Performance Considerations
Response Size
- JSON responses are larger than text responses
- Explanations and warnings add to response size
- Consider response formatting impact on network transfer
AI Token Usage
- Enhanced responses require more AI processing
- Detailed explanations use additional API tokens
- Balance information value vs. cost
Caching
- Consider caching formatted responses for repeated queries
- JSON responses are easier to cache and manipulate
- Text responses are more human-readable but harder to parse
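One possible caching approach is an ordinary table keyed by the prompt; a minimal sketch (the cache table and its name are illustrative, not part of the extension):
CREATE TABLE IF NOT EXISTS ai_query_cache (
    prompt     text PRIMARY KEY,
    response   text NOT NULL,
    created_at timestamptz DEFAULT CURRENT_TIMESTAMP
);
-- Generate once and store; reuse the stored response for repeated prompts
INSERT INTO ai_query_cache (prompt, response)
SELECT 'top selling products', generate_query('top selling products')
WHERE NOT EXISTS (SELECT 1 FROM ai_query_cache WHERE prompt = 'top selling products');
SELECT response FROM ai_query_cache WHERE prompt = 'top selling products';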
Troubleshooting Response Formatting
Configuration Not Applied
If response formatting isn’t working:
- Check configuration file location:
~/.pg_ai.config
- Verify configuration syntax (no quotes around true/false values)
- Restart PostgreSQL connection to reload configuration
- Check PostgreSQL logs for configuration errors
Incomplete Responses
If responses are missing expected components:
- Verify AI provider supports detailed responses
- Check API rate limits and quotas
- Review request timeout settings
- Enable logging to see full AI responses
JSON Parsing Errors
When using use_formatted_response=true:
- Ensure AI provider returns valid JSON
- Check for truncated responses due to token limits
- Verify your JSON parsing library handles the response format
- Use text format as fallback if JSON parsing fails
Quick Start Guide
This guide will get you up and running with pg_ai_query in just a few minutes. We’ll walk through setting up the extension, configuring it, and generating your first AI-powered SQL queries.
Step 1: Install and Enable the Extension
First, make sure you have pg_ai_query installed (see Installation Guide if needed), then enable it in your database:
-- Connect to your database
psql -d your_database
-- Enable the extension
CREATE EXTENSION IF NOT EXISTS pg_ai_query;
-- Verify installation
\df generate_query
You should see the generate_query function listed.
Step 2: Configure API Access
Create a configuration file at ~/.pg_ai.config:
# Quick start configuration
[general]
enable_logging = true
[openai]
api_key = "your-openai-api-key-here"
default_model = "gpt-4o"
Get your OpenAI API key:
- Visit platform.openai.com
- Sign up or log in
- Create an API key in the API keys section
- Replace your-openai-api-key-here with your actual key
Step 3: Create Sample Data
Let’s create some sample tables to work with:
-- Create a users table
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(150) UNIQUE NOT NULL,
age INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
city VARCHAR(50)
);
-- Create an orders table
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
product_name VARCHAR(200),
amount DECIMAL(10,2),
status VARCHAR(20) DEFAULT 'pending',
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Insert sample data
INSERT INTO users (name, email, age, city) VALUES
('Alice Johnson', 'alice@example.com', 28, 'New York'),
('Bob Smith', 'bob@example.com', 35, 'San Francisco'),
('Carol Davis', 'carol@example.com', 31, 'Chicago'),
('David Wilson', 'david@example.com', 27, 'Seattle'),
('Eva Brown', 'eva@example.com', 33, 'Boston');
INSERT INTO orders (user_id, product_name, amount, status) VALUES
(1, 'Laptop Pro', 1299.99, 'completed'),
(1, 'Wireless Mouse', 79.99, 'completed'),
(2, 'Monitor 4K', 399.99, 'pending'),
(3, 'Keyboard Mechanical', 159.99, 'completed'),
(4, 'Tablet', 599.99, 'shipped'),
(5, 'Phone Case', 29.99, 'completed');
Step 4: Generate Your First Queries
Now let’s use natural language to query our data:
Basic Query
-- Simple user lookup
SELECT generate_query('show me all users');
Result:
SELECT id, name, email, age, created_at, city FROM public.users LIMIT 1000;
Query with Conditions
-- Users from specific city
SELECT generate_query('find all users from New York');
Result:
SELECT id, name, email, age, created_at, city
FROM public.users
WHERE city = 'New York'
LIMIT 1000;
Query with Joins
-- Orders with user information
SELECT generate_query('show me all orders with customer names');
Result:
SELECT o.id, o.product_name, o.amount, o.status, o.order_date, u.name as customer_name
FROM public.orders o
JOIN public.users u ON o.user_id = u.id
LIMIT 1000;
Aggregation Query
-- Revenue analysis
SELECT generate_query('calculate total revenue by order status');
Result:
SELECT status, SUM(amount) as total_revenue
FROM public.orders
GROUP BY status
LIMIT 1000;
Step 5: Execute Generated Queries
The extension generates SQL queries that you can execute immediately:
-- Generate and execute in one step
WITH generated AS (
SELECT generate_query('show users older than 30') as query
)
SELECT query FROM generated;
-- Copy the generated query and run it:
SELECT id, name, email, age, created_at, city
FROM public.users
WHERE age > 30
LIMIT 1000;
Step 6: Explore Advanced Features
Schema Discovery
Check what tables the extension can see:
-- View all tables in your database
SELECT get_database_tables();
-- Get detailed information about a specific table
SELECT get_table_details('users');
Different AI Providers
If you have multiple providers configured:
-- Use specific provider
SELECT generate_query('show recent orders', null, 'openai');
SELECT generate_query('show recent orders', null, 'anthropic');
Complex Queries
Try more complex natural language requests:
-- Complex aggregation
SELECT generate_query('show top 3 customers by total order amount with their contact info');
-- Date-based filtering
SELECT generate_query('find orders placed in the last 7 days');
-- Multiple conditions
SELECT generate_query('show users from California or New York who are older than 25');
Common Patterns
1. Exploratory Data Analysis
-- Understand your data structure
SELECT generate_query('describe the structure of my database');
SELECT generate_query('show me sample data from each table');
SELECT generate_query('count records in all tables');
2. Business Intelligence
-- Sales analysis
SELECT generate_query('monthly revenue trends');
SELECT generate_query('top selling products');
SELECT generate_query('customer acquisition by month');
3. Data Quality Checks
-- Find data issues
SELECT generate_query('find users with missing email addresses');
SELECT generate_query('show duplicate orders');
SELECT generate_query('find orders without valid user references');
Troubleshooting Quick Start
Error: “Extension not found”
-- Solution: Install the extension first
CREATE EXTENSION pg_ai_query;
Error: “API key not configured”
- Check your ~/.pg_ai.config file exists
- Verify the API key is valid
- Restart your PostgreSQL session
Error: “No tables found”
- Make sure you have user tables (not just system tables)
- Check table permissions
- Try creating the sample tables from Step 3
Unexpected Results
- Enable logging to see what’s happening:
[general]
enable_logging = true
log_level = "DEBUG"
- Check the generated query makes sense
- Try simpler natural language descriptions
Next Steps
Now that you’re up and running:
- Learn more about Usage Patterns for advanced query generation
- Explore Examples for inspiration
- Read about AI Providers to optimize your model choice
- Check Best Practices for production usage
Getting Help
If you run into issues:
- Check the Troubleshooting Guide
- Review the FAQ
- Look at more Examples for inspiration
generate_query Function
The generate_query function converts natural language descriptions into valid PostgreSQL queries using AI models from OpenAI and Anthropic.
Function Signature
generate_query(
natural_language_query text,
api_key text DEFAULT NULL,
provider text DEFAULT 'auto'
) RETURNS text
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| natural_language_query | text | required | The natural language description of the desired query |
| api_key | text | NULL | OpenAI or Anthropic API key (uses config if not provided) |
| provider | text | 'auto' | AI provider: 'openai', 'anthropic', or 'auto' |
Basic Usage
Simple Queries
-- Basic data retrieval
SELECT generate_query('show all users');
-- With filtering
SELECT generate_query('find users created in the last week');
-- Counting records
SELECT generate_query('count total orders');
Complex Queries
-- Joins and aggregation
SELECT generate_query('show top 10 customers by total order value with their email addresses');
-- Date-based analysis
SELECT generate_query('monthly revenue trend for the last year');
Configuration
API Key Setup
Create ~/.pg_ai.config:
[openai]
api_key = "sk-your-openai-api-key"
default_model = "gpt-4o"
[anthropic]
api_key = "sk-ant-your-anthropic-key"
default_model = "claude-3-5-sonnet-20241022"
See Also
- explain_query Function - Analyze query performance
- Examples - More usage examples
- Error Codes - Troubleshooting guide
explain_query Function
The explain_query function runs EXPLAIN ANALYZE on PostgreSQL queries and provides AI-powered performance analysis and optimization recommendations.
Overview
This function combines PostgreSQL’s built-in EXPLAIN ANALYZE functionality with advanced AI analysis to provide:
- Detailed execution plan analysis
- Performance bottleneck identification
- Index recommendations
- Query optimization suggestions
- Easy-to-understand explanations in plain English
Function Signature
explain_query(
query_text text,
api_key text DEFAULT NULL,
provider text DEFAULT 'auto'
) RETURNS text
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| query_text | text | required | The SQL query to analyze |
| api_key | text | NULL | OpenAI or Anthropic API key (uses config if not provided) |
| provider | text | 'auto' | AI provider: 'openai', 'anthropic', or 'auto' |
Basic Usage
Simple Query Analysis
SELECT explain_query('SELECT * FROM users WHERE created_at > NOW() - INTERVAL ''7 days''');
Complex Query Analysis
SELECT explain_query('
SELECT u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > NOW() - INTERVAL ''30 days''
GROUP BY u.id, u.username
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC
LIMIT 100
');
Using Specific AI Provider
-- Use OpenAI specifically
SELECT explain_query(
'SELECT * FROM products WHERE price > 100 ORDER BY price DESC LIMIT 10',
'your-openai-api-key',
'openai'
);
-- Use Anthropic specifically
SELECT explain_query(
'SELECT * FROM products WHERE price > 100 ORDER BY price DESC LIMIT 10',
'your-anthropic-api-key',
'anthropic'
);
Output Format
The function returns a structured text analysis with these sections:
Query Overview
Brief description of what the query accomplishes.
Performance Summary
- Overall execution time
- Total cost estimate (PostgreSQL’s relative measure)
- Number of rows processed
Execution Plan Analysis
- Key operations in the execution plan
- Join strategies and scan methods
- Focus on expensive operations
Performance Issues
- Identified bottlenecks
- Inefficient operations
- Resource usage concerns
Optimization Suggestions
- Specific recommendations for improvement
- Query rewriting suggestions
- Configuration recommendations
Index Recommendations
- Missing indexes that could improve performance
- Specific CREATE INDEX statements
- Partial index suggestions where applicable
Example Output
Query Overview:
This query retrieves users created within the last 7 days along with their order statistics,
focusing on active customers with more than 5 orders.
Performance Summary:
- Overall Execution Time: 45.2 milliseconds
- Total Cost: 1250.75 (PostgreSQL's relative cost estimate)
- Rows Processed: 156 rows returned from 50,000 rows examined
Execution Plan Analysis:
- Hash Join: Efficiently joins users and orders tables
- Index Scan: Uses existing index on users.created_at
- Sequential Scan: Full table scan on orders table (potential bottleneck)
- HashAggregate: Groups results for COUNT calculations
- Sort: Orders results by order count
Performance Issues:
- Sequential scan on orders table indicates missing index on user_id
- Hash join spills to disk due to large orders table size
- HAVING clause applied after aggregation, could be optimized
Optimization Suggestions:
1. Add index on orders.user_id to eliminate sequential scan
2. Consider partitioning orders table by date if very large
3. Move some HAVING conditions to WHERE clause if possible
4. Increase work_mem setting if hash joins frequently spill to disk
Index Recommendations:
-- Primary recommendation
CREATE INDEX idx_orders_user_id ON orders(user_id);
-- Optional: Composite index for better performance
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);
-- Consider partial index for recent orders
CREATE INDEX idx_orders_recent ON orders(user_id) WHERE created_at > NOW() - INTERVAL '1 year';
Supported Query Types
The function supports analysis of:
- SELECT statements
- WITH (Common Table Expressions) queries
- VALUES clauses
Note: Only read-only queries are supported for security reasons. DDL, DML, and other statement types will return an error.
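For example (table and column names are illustrative), CTE and VALUES queries are analyzed the same way as plain SELECTs:
-- WITH (CTE) query
SELECT explain_query('
WITH recent_orders AS (
    SELECT * FROM orders WHERE order_date > NOW() - INTERVAL ''30 days''
)
SELECT status, COUNT(*) FROM recent_orders GROUP BY status
');
-- VALUES clause
SELECT explain_query('VALUES (1, ''pending''), (2, ''completed'')');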
Configuration
The function uses the same configuration system as other pg_ai_query functions.
API Keys
Configure API keys in ~/.pg_ai.config:
[openai]
api_key = "your-openai-api-key"
default_model = "gpt-4o"
[anthropic]
api_key = "your-anthropic-api-key"
default_model = "claude-3-5-sonnet-20241022"
Provider Selection
- 'auto' (default): Uses the first available API key from configuration
- 'openai': Forces use of OpenAI models
- 'anthropic': Forces use of Anthropic models
Error Handling
Common error scenarios and their solutions:
Invalid Query
SELECT explain_query('SELECT * FROM non_existent_table');
-- Error: relation "non_existent_table" does not exist
Missing API Key
-- When no API key is configured
SELECT explain_query('SELECT * FROM users');
-- Error: API key required. Pass as parameter or configure ~/.pg_ai.config
Syntax Error
SELECT explain_query('SELECT * FORM users'); -- typo in FROM
-- Error: syntax error at or near "FORM"
Unsupported Query Type
SELECT explain_query('DROP TABLE users');
-- Error: Only SELECT, WITH, and VALUES queries are allowed
Performance Considerations
- Query Execution: The function actually executes your query via EXPLAIN ANALYZE
- Execution Time: Query execution time is included in the analysis
- AI Processing: AI analysis typically adds 1-3 seconds of processing time
- Large Queries: Very complex queries may take longer to analyze
Security Notes
- Queries are executed with the same permissions as the calling user
- No privilege escalation occurs
- API keys are handled securely and not logged
- Only read-only query types are permitted
Best Practices
- Use for Optimization: Run on queries that are performing slowly
- Test Variations: Compare different approaches to the same query
- Monitor Trends: Regular analysis helps track performance changes over time
- Validate Recommendations: Always test suggested indexes before implementing in production
- Consider Data Volume: Remember that performance characteristics change with data size
Integration with Other Functions
The explain_query function works well with other pg_ai_query functions:
-- Generate a query, then analyze its performance
WITH generated_query AS (
SELECT generate_query('show recent high-value orders') as sql
)
SELECT explain_query((SELECT sql FROM generated_query));
Common Use Cases
- Query Optimization: Identify and fix slow queries
- Index Planning: Determine what indexes to create
- Performance Monitoring: Regular health checks of critical queries
- Code Reviews: Analyze query performance before deployment
- Learning: Understand how PostgreSQL executes different query patterns
See Also
get_database_tables Function
The get_database_tables function returns metadata about all user tables in the database, providing a comprehensive overview of your database schema.
Function Signature
get_database_tables() RETURNS text
Parameters
This function takes no parameters.
Basic Usage
-- Get all database tables
SELECT get_database_tables();
-- Pretty print with formatting
SELECT jsonb_pretty(get_database_tables()::jsonb);
Output Format
The function returns a JSON array containing metadata for each table:
[
{
"table_name": "users",
"schema_name": "public",
"table_type": "BASE TABLE",
"estimated_rows": 1500
},
{
"table_name": "orders",
"schema_name": "public",
"table_type": "BASE TABLE",
"estimated_rows": 5000
}
]
Extracting Information
Get Table Names Only
SELECT
    t.elem->>'table_name' AS table_name
FROM jsonb_array_elements(get_database_tables()::jsonb) AS t(elem);
Get Tables with Row Counts
SELECT
table_name,
estimated_rows
FROM jsonb_to_recordset(get_database_tables()::jsonb)
AS x(table_name text, estimated_rows int)
ORDER BY estimated_rows DESC;
Use Cases
- Database exploration and documentation
- Schema discovery for applications
- Inventory of available tables
- Integration with generate_query function
See Also
- get_table_details Function - Get detailed table information
- generate_query Function - Generate queries from natural language
get_table_details Function
The get_table_details function returns detailed information about a specific table including columns, constraints, foreign keys, and indexes.
Function Signature
get_table_details(
table_name text,
schema_name text DEFAULT 'public'
) RETURNS text
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| table_name | text | required | Name of the table to analyze |
| schema_name | text | 'public' | Schema containing the table |
Basic Usage
-- Basic usage
SELECT get_table_details('users');
-- Specific schema
SELECT get_table_details('orders', 'sales');
-- Pretty print
SELECT jsonb_pretty(get_table_details('users')::jsonb);
Output Format
The function returns detailed JSON information about the table:
{
"table_name": "users",
"schema_name": "public",
"columns": [
{
"column_name": "id",
"data_type": "integer",
"is_nullable": false,
"column_default": "nextval('users_id_seq'::regclass)",
"is_primary_key": true,
"is_foreign_key": false
},
{
"column_name": "email",
"data_type": "character varying",
"is_nullable": false,
"column_default": null,
"is_primary_key": false,
"is_foreign_key": false
}
],
"indexes": [
"users_pkey",
"users_email_key"
]
}
Extracting Information
Get Column Information
SELECT
column_name,
data_type,
is_nullable,
is_primary_key
FROM jsonb_to_recordset(
(get_table_details('users')::jsonb)->'columns'
) AS x(
column_name text,
data_type text,
is_nullable boolean,
is_primary_key boolean
);
Find Primary Key Columns
SELECT
column_name
FROM jsonb_to_recordset(
(get_table_details('users')::jsonb)->'columns'
) AS x(column_name text, is_primary_key boolean)
WHERE is_primary_key = true;
List All Indexes
SELECT
jsonb_array_elements_text(
(get_table_details('orders')::jsonb)->'indexes'
) as index_name;
Use Cases
- Schema Documentation: Generate comprehensive table documentation
- Database Exploration: Understand table structure before writing queries
- Data Migration Planning: Analyze table structure for migration scripts
- Query Optimization: Understand available indexes and constraints
- Integration Planning: Map foreign key relationships
Common Patterns
Analyzing All Tables
-- Get details for all tables in database
WITH all_tables AS (
SELECT
jsonb_array_elements(get_database_tables()::jsonb)->>'table_name' as table_name
)
SELECT
table_name,
get_table_details(table_name) as details
FROM all_tables;
Finding Foreign Key Relationships
SELECT
column_name,
foreign_table,
foreign_column
FROM jsonb_to_recordset(
(get_table_details('orders')::jsonb)->'columns'
) AS x(
column_name text,
is_foreign_key boolean,
foreign_table text,
foreign_column text
)
WHERE is_foreign_key = true;
Integration with Other Functions
With generate_query
-- First understand the table structure
SELECT get_table_details('users');
-- Then generate queries based on that knowledge
SELECT generate_query('show users with their profile information');
With explain_query
-- Analyze table structure
SELECT get_table_details('orders');
-- Generate and analyze a query
SELECT explain_query('SELECT * FROM orders WHERE user_id = 123');
Error Handling
Common errors and solutions:
| Error | Cause | Solution |
|---|---|---|
"Table 'tablename' does not exist" | Table not found | Check table name and schema |
"Access denied to table" | Insufficient permissions | Grant SELECT permission on table |
"Schema 'schemaname' does not exist" | Schema not found | Verify schema name |
See Also
- get_database_tables Function - List all database tables
- generate_query Function - Generate queries from natural language
- Error Codes - Troubleshooting guide
Basic Usage
This guide covers the fundamental ways to use pg_ai_query for generating SQL queries from natural language. Whether you’re exploring data, creating reports, or learning SQL, this guide will help you get the most out of the extension.
Core Function: generate_query()
The main function you’ll use is generate_query(), which converts natural language into SQL:
generate_query(natural_language_query, [api_key], [provider])
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| natural_language_query | text | Yes | - | Your question in plain English |
| api_key | text | No | null | API key (uses config if not provided) |
| provider | text | No | 'auto' | AI provider: 'openai', 'anthropic', or 'auto' |
Basic Examples
-- Simplest form - uses configured API key and provider
SELECT generate_query('show me all users');
-- With custom API key
SELECT generate_query('count total orders', 'your-api-key-here');
-- With specific provider
SELECT generate_query('find recent customers', null, 'openai');
Query Types
1. Simple Data Retrieval
Goal: Get basic data from tables
-- Show all records
SELECT generate_query('show me all products');
SELECT generate_query('list all customers');
SELECT generate_query('display user information');
-- Show specific columns
SELECT generate_query('show user names and emails');
SELECT generate_query('get product names and prices');
2. Filtered Queries
Goal: Find specific records based on conditions
-- Simple filters
SELECT generate_query('find users older than 25');
SELECT generate_query('show completed orders');
SELECT generate_query('get products under $100');
-- Multiple conditions
SELECT generate_query('find users from New York who are older than 30');
SELECT generate_query('show pending orders from the last month');
-- Date-based filters
SELECT generate_query('show orders from yesterday');
SELECT generate_query('find users created in 2024');
SELECT generate_query('get sales from last quarter');
3. Aggregation and Analysis
Goal: Calculate totals, averages, counts, and other statistics
-- Simple counts
SELECT generate_query('count total users');
SELECT generate_query('how many orders do we have');
-- Sums and totals
SELECT generate_query('calculate total revenue');
SELECT generate_query('sum all order amounts');
-- Averages
SELECT generate_query('average order value');
SELECT generate_query('mean user age');
-- Group by analysis
SELECT generate_query('count orders by status');
SELECT generate_query('total sales by month');
SELECT generate_query('average order value by customer');
4. Joined Data
Goal: Combine information from multiple tables
-- Simple joins
SELECT generate_query('show orders with customer names');
SELECT generate_query('list products with category information');
-- Complex relationships
SELECT generate_query('show customers with their total order amounts');
SELECT generate_query('find users who have never placed an order');
SELECT generate_query('list top customers by number of orders');
5. Ranking and Sorting
Goal: Order results and find top/bottom records
-- Top N queries
SELECT generate_query('top 10 customers by revenue');
SELECT generate_query('5 most expensive products');
SELECT generate_query('latest 20 orders');
-- Sorting
SELECT generate_query('show users ordered by age');
SELECT generate_query('list products by price descending');
Best Practices for Natural Language
Be Specific About What You Want
Good:
SELECT generate_query('show user names and email addresses for active users');
Less Good:
SELECT generate_query('show users');
Use Clear Column References
Good:
SELECT generate_query('find orders with amount greater than 1000');
Better:
SELECT generate_query('find orders where order_amount > 1000');
Specify Time Periods Clearly
Good:
SELECT generate_query('show orders from the last 30 days');
SELECT generate_query('find users created between January 1, 2024 and March 31, 2024');
Use Business Terms
The AI understands common business terminology:
-- These work well
SELECT generate_query('show quarterly revenue');
SELECT generate_query('find churned customers');
SELECT generate_query('calculate customer lifetime value');
SELECT generate_query('show conversion rates by month');
Advanced Usage Patterns
1. Incremental Query Building
Start simple and add complexity:
-- Step 1: Basic query
SELECT generate_query('show orders');
-- Step 2: Add filters
SELECT generate_query('show completed orders');
-- Step 3: Add aggregation
SELECT generate_query('count completed orders by customer');
-- Step 4: Add sorting
SELECT generate_query('count completed orders by customer, ordered by count desc');
2. Data Exploration Workflow
-- 1. Understand data structure
SELECT get_database_tables();
-- 2. Explore individual tables
SELECT generate_query('show sample data from users table');
SELECT generate_query('describe the structure of orders table');
-- 3. Understand relationships
SELECT generate_query('show how users and orders are connected');
-- 4. Analyze patterns
SELECT generate_query('show distribution of users by city');
SELECT generate_query('analyze order patterns by day of week');
3. Report Generation
-- Daily report
SELECT generate_query('daily sales summary for today');
-- Customer analysis
SELECT generate_query('customer segmentation by order frequency');
-- Product performance
SELECT generate_query('product performance metrics for last month');
Working with Results
1. Copying and Executing Queries
-- Method 1: Copy-paste the result
SELECT generate_query('show top customers');
-- Copy the returned SQL and execute it separately
-- Method 2: Store in a variable (if your client supports it)
\set query `SELECT generate_query('show top customers')`
:query
2. Query Refinement
If the generated query isn’t quite right:
-- Try rephrasing
SELECT generate_query('show customers with highest total orders');
-- vs
SELECT generate_query('rank customers by total order value');
-- Be more specific
SELECT generate_query('show customers with more than 5 orders and total value over $1000');
3. Validation
Always review generated queries before using in production:
-- Check the generated query
SELECT generate_query('delete old records');
-- Review before executing - the extension won't generate dangerous queries,
-- but always verify the logic matches your intent
Error Handling
Common Issues and Solutions
“No tables found”
-- Check your database has user tables
SELECT get_database_tables();
“Query too complex”
-- Break down into simpler parts
-- Instead of: "show customers with orders and products and categories"
-- Try: "show customers with their order details"
“Ambiguous column reference”
-- Be more specific about table/column names
-- Instead of: "show orders with customer info"
-- Try: "show orders with customer names from users table"
Performance Considerations
Query Limits
All queries automatically include LIMIT clauses for safety:
-- This automatically gets LIMIT 1000 (configurable)
SELECT generate_query('show all orders');
Large Datasets
For large tables, be specific about what you need:
-- Instead of: "show all user data"
-- Use: "show user summary with names and creation dates"
Indexes
Generated queries work best with proper indexes:
-- Create indexes on commonly filtered columns
CREATE INDEX idx_users_city ON users(city);
CREATE INDEX idx_orders_date ON orders(order_date);
CREATE INDEX idx_orders_status ON orders(status);
Integration Patterns
1. Application Development
# Python example (connection parameters are placeholders)
import psycopg2
conn = psycopg2.connect("dbname=your_database")
cursor = conn.cursor()
cursor.execute("SELECT generate_query(%s)", ['show recent user activity'])
query = cursor.fetchone()[0]
cursor.execute(query)
results = cursor.fetchall()
2. Business Intelligence Tools
Many BI tools can use the extension to generate queries dynamically based on user input.
3. Data Documentation
Use the extension to generate example queries for documentation:
-- Generate examples for each table
SELECT generate_query('show sample data from ' || table_name)
FROM information_schema.tables
WHERE table_schema = 'public';
Next Steps
- Explore the Function Reference for complete API details
- Check out Examples for more complex use cases
- Learn about Schema Discovery to understand how the extension analyzes your database
Query Performance Analysis
Learn how to use the explain_query function to analyze and optimize your PostgreSQL queries with AI-powered insights.
Introduction
Query performance analysis is crucial for maintaining fast, efficient databases. The explain_query function combines PostgreSQL’s EXPLAIN ANALYZE with advanced AI analysis to provide actionable insights for query optimization.
Getting Started
Basic Analysis
Start with a simple query to understand the output format:
SELECT explain_query('SELECT id, email FROM users WHERE active = true LIMIT 100');
This will give you:
- How PostgreSQL executes the query
- Whether indexes are being used
- Performance metrics and timing
- AI-generated optimization suggestions
Analyzing Slow Queries
If you have a query that’s running slowly:
SELECT explain_query('
SELECT u.username, COUNT(o.id) as total_orders
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
WHERE u.created_at > ''2024-01-01''
GROUP BY u.username
ORDER BY total_orders DESC
');
Look for:
- Sequential scans on large tables
- Missing indexes
- Inefficient join operations
- Sort operations without indexes
Common Performance Patterns
Index Usage Analysis
Checking if Indexes are Used
-- This should use an index if one exists on user_id
SELECT explain_query('SELECT * FROM orders WHERE user_id = 123');
-- This might not use an index efficiently
SELECT explain_query('SELECT * FROM orders WHERE UPPER(status) = ''PENDING''');
Composite Index Analysis
-- Check if a composite index would help
SELECT explain_query('
SELECT * FROM orders
WHERE status = ''completed''
AND created_at > NOW() - INTERVAL ''30 days''
ORDER BY created_at DESC
');
Look for recommendations about creating indexes like:
CREATE INDEX idx_orders_status_created ON orders(status, created_at);
Join Performance
Nested Loop vs Hash Join
-- Small result set - might use nested loop
SELECT explain_query('
SELECT u.email, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.id = 123
');
-- Large result set - should use hash or merge join
SELECT explain_query('
SELECT u.username, COUNT(o.id)
FROM users u
JOIN orders o ON u.id = o.user_id
GROUP BY u.username
');
Optimizing Join Order
-- Multiple joins - check join order efficiency
SELECT explain_query('
SELECT u.username, p.name as product, oi.quantity
FROM users u
JOIN orders o ON u.id = o.user_id
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.created_at > NOW() - INTERVAL ''7 days''
');
Aggregation Performance
GROUP BY Analysis
-- Check if grouping is efficient
SELECT explain_query('
SELECT DATE(created_at), COUNT(*)
FROM orders
WHERE created_at >= ''2024-01-01''
GROUP BY DATE(created_at)
ORDER BY DATE(created_at)
');
Window Functions
-- Analyze window function performance
SELECT explain_query('
SELECT
user_id,
created_at,
ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY created_at) as order_number
FROM orders
WHERE created_at > NOW() - INTERVAL ''1 month''
');
Optimization Workflows
Before and After Comparison
1. Analyze Original Query
SELECT explain_query('
SELECT * FROM orders o, users u
WHERE o.user_id = u.id AND o.total > 100
');
2. Apply Optimizations
Based on the AI recommendations, rewrite the query:
SELECT explain_query('
SELECT o.id, o.total, u.username
FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE o.total > 100
');
3. Create Recommended Indexes
-- Example recommendation from AI analysis
CREATE INDEX idx_orders_total ON orders(total) WHERE total > 0;
4. Re-analyze
SELECT explain_query('
SELECT o.id, o.total, u.username
FROM orders o
INNER JOIN users u ON o.user_id = u.id
WHERE o.total > 100
');
Systematic Query Review
Create a systematic approach to review your application’s queries:
-- Create a function to analyze multiple queries
CREATE OR REPLACE FUNCTION analyze_app_queries()
RETURNS TABLE(query_name text, analysis text) AS $$
BEGIN
RETURN QUERY
SELECT 'user_dashboard'::text,
explain_query('SELECT * FROM users WHERE last_login > NOW() - INTERVAL ''30 days''')
UNION ALL
SELECT 'order_summary'::text,
explain_query('SELECT COUNT(*), SUM(total) FROM orders WHERE created_at >= CURRENT_DATE')
UNION ALL
SELECT 'top_products'::text,
explain_query('
SELECT p.name, SUM(oi.quantity) as sold
FROM products p
JOIN order_items oi ON p.id = oi.product_id
JOIN orders o ON oi.order_id = o.id
WHERE o.created_at >= CURRENT_DATE - INTERVAL ''7 days''
GROUP BY p.id, p.name
ORDER BY sold DESC
LIMIT 10
');
END;
$$ LANGUAGE plpgsql;
-- Run the analysis
SELECT * FROM analyze_app_queries();
Performance Monitoring
Regular Health Checks
Set up regular performance monitoring:
-- Monitor daily active users query
SELECT explain_query('
SELECT COUNT(DISTINCT user_id)
FROM user_sessions
WHERE DATE(created_at) = CURRENT_DATE
');
-- Monitor order processing performance
SELECT explain_query('
SELECT AVG(processing_time_seconds)
FROM orders
WHERE status = ''completed''
AND created_at >= CURRENT_DATE
');
Automated Analysis
Create views for continuous monitoring:
CREATE VIEW query_performance_monitor AS
SELECT
'recent_orders' as query_type,
explain_query('SELECT * FROM orders WHERE created_at > NOW() - INTERVAL ''1 hour''') as analysis
UNION ALL
SELECT
'active_sessions',
explain_query('SELECT COUNT(*) FROM user_sessions WHERE last_activity > NOW() - INTERVAL ''15 minutes''');
Advanced Analysis Techniques
Subquery vs JOIN Performance
-- Analyze EXISTS subquery
SELECT explain_query('
SELECT u.username
FROM users u
WHERE EXISTS (
SELECT 1 FROM orders o
WHERE o.user_id = u.id
AND o.total > 1000
)
');
-- Compare with JOIN approach
SELECT explain_query('
SELECT DISTINCT u.username
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE o.total > 1000
');
CTE vs Subquery Analysis
-- Common Table Expression approach
SELECT explain_query('
WITH high_value_customers AS (
SELECT user_id, SUM(total) as total_spent
FROM orders
WHERE created_at > NOW() - INTERVAL ''1 year''
GROUP BY user_id
HAVING SUM(total) > 5000
)
SELECT u.username, hvc.total_spent
FROM users u
JOIN high_value_customers hvc ON u.id = hvc.user_id
');
-- Subquery approach
SELECT explain_query('
SELECT u.username, sub.total_spent
FROM users u
JOIN (
SELECT user_id, SUM(total) as total_spent
FROM orders
WHERE created_at > NOW() - INTERVAL ''1 year''
GROUP BY user_id
HAVING SUM(total) > 5000
) sub ON u.id = sub.user_id
');
Partition Analysis
For partitioned tables:
-- Check if partition pruning is working
SELECT explain_query('
SELECT COUNT(*)
FROM orders_2024
WHERE created_at BETWEEN ''2024-06-01'' AND ''2024-06-30''
');
Interpreting AI Recommendations
Understanding Cost Estimates
- Total Cost: PostgreSQL’s relative measure of query expense
- Startup Cost: Cost before returning first row
- Execution Time: Actual time taken (from ANALYZE)
- Rows: Estimated vs actual rows processed
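For orientation, here is how these values appear in a single plan line from EXPLAIN ANALYZE (the numbers are illustrative only; your output will differ):
Seq Scan on orders  (cost=0.00..1245.00 rows=50000 width=85) (actual time=0.012..8.341 rows=49876 loops=1)
-- cost=0.00..1245.00: startup cost .. total cost, in PostgreSQL's relative units
-- rows=50000: the planner's estimate; compare it with the actual rows=49876
-- actual time=0.012..8.341: real execution time in milliseconds, reported because ANALYZE was used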
Index Recommendations
The AI will suggest:
- Simple indexes: CREATE INDEX idx_table_column ON table(column);
- Composite indexes: CREATE INDEX idx_table_multi ON table(col1, col2);
- Partial indexes: CREATE INDEX idx_table_partial ON table(column) WHERE condition;
- Expression indexes: CREATE INDEX idx_table_expr ON table(LOWER(column));
Query Rewriting Suggestions
Common recommendations include:
- Rewriting WHERE clauses to be more selective
- Changing join order or join types
- Using LIMIT more effectively
- Replacing correlated subqueries with joins (see the sketch below)
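As a minimal sketch of that last suggestion (using the same sample tables as the earlier examples), compare a correlated subquery with its join rewrite:
-- Correlated subquery: the inner query runs once per user row
SELECT explain_query('
SELECT u.username,
(SELECT COUNT(*) FROM orders o WHERE o.user_id = u.id) as order_count
FROM users u
');
-- Join rewrite: a single aggregation over orders
SELECT explain_query('
SELECT u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON o.user_id = u.id
GROUP BY u.username
');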
Troubleshooting Common Issues
No Index Usage
If indexes aren’t being used:
-- Check if the query can use the index
SELECT explain_query('SELECT * FROM users WHERE email = ''user@example.com''');
-- Vs a query that can't use the index efficiently
SELECT explain_query('SELECT * FROM users WHERE email LIKE ''%@example.com''');
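If the filter wraps the column in a function (like the UPPER(status) example earlier) or uses a leading wildcard, a plain b-tree index on the column generally cannot be used. Two possible remedies, sketched here with illustrative index names:
-- Expression index: supports filters on a function of the column
CREATE INDEX idx_orders_status_upper ON orders (UPPER(status));
-- Trigram index (requires the pg_trgm extension): can support LIKE patterns with a leading wildcard
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_users_email_trgm ON users USING gin (email gin_trgm_ops);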
Slow Aggregations
For slow GROUP BY queries:
-- Analyze aggregation performance
SELECT explain_query('
SELECT product_category, COUNT(*), AVG(price)
FROM products
GROUP BY product_category
');
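If the analysis shows a sequential scan feeding the aggregation, one option worth testing (not a guaranteed win) is a covering index on the grouped and aggregated columns, which can allow an index-only scan:
CREATE INDEX idx_products_category_price ON products (product_category, price);
-- Re-run explain_query on the same statement afterwards to confirm the improvement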
Large Result Sets
When dealing with large results:
-- Use LIMIT to test query efficiency
SELECT explain_query('
SELECT * FROM orders
ORDER BY created_at DESC
LIMIT 100
');
Best Practices
- Start Small: Begin with LIMIT to understand query patterns
- Focus on Bottlenecks: Look for the most expensive operations first
- Test Incrementally: Make one optimization at a time
- Validate Results: Ensure optimizations actually improve performance
- Monitor Regularly: Set up regular analysis of critical queries
- Document Changes: Keep track of optimizations and their impact
Integration with Development Workflow
Code Review Process
-- Add performance analysis to your development process
CREATE OR REPLACE FUNCTION review_new_query(query_text text)
RETURNS text AS $$
BEGIN
RETURN explain_query(query_text);
END;
$$ LANGUAGE plpgsql;
-- Use in code reviews
SELECT review_new_query('SELECT * FROM new_feature_table WHERE complex_condition = true');
CI/CD Integration
Create performance tests:
-- Performance regression test
DO $$
DECLARE
analysis text;
BEGIN
SELECT explain_query('SELECT * FROM critical_table WHERE important_column = 123') INTO analysis;
-- Log or validate the analysis
IF analysis LIKE '%Sequential Scan%' THEN
RAISE WARNING 'Performance regression detected: Sequential scan in critical query';
END IF;
END;
$$;
Next Steps
- Function Reference - Complete API documentation
- Best Practices - Advanced optimization techniques
- Integration Patterns - Using with other tools
- Troubleshooting - Common issues and solutions
Examples
This page provides practical examples of using pg_ai_query for common database tasks and use cases.
Sample Database
For these examples, we’ll use a sample e-commerce database with the following tables:
-- Users table
CREATE TABLE users (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL,
email VARCHAR(150) UNIQUE NOT NULL,
age INTEGER,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
city VARCHAR(50),
status VARCHAR(20) DEFAULT 'active'
);
-- Products table
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(200) NOT NULL,
category VARCHAR(50),
price DECIMAL(10,2),
stock_quantity INTEGER DEFAULT 0,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Orders table
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
user_id INTEGER REFERENCES users(id),
total_amount DECIMAL(10,2),
status VARCHAR(20) DEFAULT 'pending',
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
shipping_city VARCHAR(50)
);
-- Order items table
CREATE TABLE order_items (
id SERIAL PRIMARY KEY,
order_id INTEGER REFERENCES orders(id),
product_id INTEGER REFERENCES products(id),
quantity INTEGER,
unit_price DECIMAL(10,2)
);
Basic Queries
Simple Data Retrieval
-- Show all users
SELECT generate_query('show me all users');
-- Result: SELECT id, name, email, age, created_at, city, status FROM public.users LIMIT 1000;
-- List products with their prices
SELECT generate_query('show product names and prices');
-- Result: SELECT name, price FROM public.products LIMIT 1000;
-- Get recent orders
SELECT generate_query('show orders from the last 7 days');
-- Result: SELECT * FROM public.orders WHERE order_date >= CURRENT_DATE - INTERVAL '7 days' LIMIT 1000;
Filtered Searches
-- Find active users
SELECT generate_query('find all active users');
-- Result: SELECT * FROM public.users WHERE status = 'active' LIMIT 1000;
-- Search by city
SELECT generate_query('show users from New York');
-- Result: SELECT * FROM public.users WHERE city = 'New York' LIMIT 1000;
-- Price range filtering
SELECT generate_query('find products under $50');
-- Result: SELECT * FROM public.products WHERE price < 50 LIMIT 1000;
-- Age-based filtering
SELECT generate_query('show users between 25 and 35 years old');
-- Result: SELECT * FROM public.users WHERE age BETWEEN 25 AND 35 LIMIT 1000;
Aggregation and Analytics
Counting and Totals
-- Count users by city
SELECT generate_query('count users by city');
-- Result: SELECT city, COUNT(*) as count FROM public.users GROUP BY city LIMIT 1000;
-- Total revenue
SELECT generate_query('calculate total revenue from all orders');
-- Result: SELECT SUM(total_amount) as total_revenue FROM public.orders;
-- Average order value
SELECT generate_query('what is the average order value');
-- Result: SELECT AVG(total_amount) as average_order_value FROM public.orders;
-- Product inventory summary
SELECT generate_query('show total stock quantity by product category');
-- Result: SELECT category, SUM(stock_quantity) as total_stock FROM public.products GROUP BY category LIMIT 1000;
Time-Based Analysis
-- Monthly sales trends
SELECT generate_query('show monthly sales for this year');
-- Result: SELECT DATE_TRUNC('month', order_date) as month, SUM(total_amount) as monthly_sales
-- FROM public.orders WHERE EXTRACT(year FROM order_date) = EXTRACT(year FROM CURRENT_DATE)
-- GROUP BY DATE_TRUNC('month', order_date) ORDER BY month LIMIT 1000;
-- Daily order count
SELECT generate_query('count orders by day for the last 30 days');
-- Result: SELECT DATE(order_date) as order_day, COUNT(*) as order_count
-- FROM public.orders WHERE order_date >= CURRENT_DATE - INTERVAL '30 days'
-- GROUP BY DATE(order_date) ORDER BY order_day LIMIT 1000;
-- User registration trends
SELECT generate_query('show user registrations by month');
-- Result: SELECT DATE_TRUNC('month', created_at) as month, COUNT(*) as new_users
-- FROM public.users GROUP BY DATE_TRUNC('month', created_at) ORDER BY month LIMIT 1000;
Complex Joins and Relationships
Customer Order Analysis
-- Orders with customer information
SELECT generate_query('show orders with customer names and emails');
-- Result: SELECT o.id, o.total_amount, o.status, o.order_date, u.name, u.email
-- FROM public.orders o JOIN public.users u ON o.user_id = u.id LIMIT 1000;
-- Customer lifetime value
SELECT generate_query('calculate total spent by each customer');
-- Result: SELECT u.name, u.email, SUM(o.total_amount) as total_spent
-- FROM public.users u LEFT JOIN public.orders o ON u.id = o.user_id
-- GROUP BY u.id, u.name, u.email ORDER BY total_spent DESC LIMIT 1000;
-- Top customers by order count
SELECT generate_query('find customers with the most orders');
-- Result: SELECT u.name, u.email, COUNT(o.id) as order_count
-- FROM public.users u LEFT JOIN public.orders o ON u.id = o.user_id
-- GROUP BY u.id, u.name, u.email ORDER BY order_count DESC LIMIT 1000;
Product and Sales Analysis
-- Best-selling products
SELECT generate_query('show top selling products by quantity');
-- Result: SELECT p.name, SUM(oi.quantity) as total_sold
-- FROM public.products p JOIN public.order_items oi ON p.id = oi.product_id
-- GROUP BY p.id, p.name ORDER BY total_sold DESC LIMIT 1000;
-- Revenue by product category
SELECT generate_query('calculate revenue by product category');
-- Result: SELECT p.category, SUM(oi.quantity * oi.unit_price) as category_revenue
-- FROM public.products p JOIN public.order_items oi ON p.id = oi.product_id
-- GROUP BY p.category ORDER BY category_revenue DESC LIMIT 1000;
-- Detailed order breakdown
SELECT generate_query('show order details with product names and quantities');
-- Result: SELECT o.id, o.order_date, u.name as customer_name, p.name as product_name,
-- oi.quantity, oi.unit_price, (oi.quantity * oi.unit_price) as line_total
-- FROM public.orders o
-- JOIN public.users u ON o.user_id = u.id
-- JOIN public.order_items oi ON o.id = oi.order_id
-- JOIN public.products p ON oi.product_id = p.id LIMIT 1000;
Business Intelligence Queries
Customer Segmentation
-- High-value customers
SELECT generate_query('find customers who have spent more than $1000 total');
-- Result: SELECT u.name, u.email, SUM(o.total_amount) as total_spent
-- FROM public.users u JOIN public.orders o ON u.id = o.user_id
-- GROUP BY u.id, u.name, u.email
-- HAVING SUM(o.total_amount) > 1000 ORDER BY total_spent DESC LIMIT 1000;
-- Customer activity segments
SELECT generate_query('categorize customers by number of orders');
-- Result: SELECT
-- CASE
-- WHEN order_count = 0 THEN 'No Orders'
-- WHEN order_count BETWEEN 1 AND 3 THEN 'Low Activity'
-- WHEN order_count BETWEEN 4 AND 10 THEN 'Medium Activity'
-- ELSE 'High Activity'
-- END as customer_segment,
-- COUNT(*) as customer_count
-- FROM (
-- SELECT u.id, COUNT(o.id) as order_count
-- FROM public.users u LEFT JOIN public.orders o ON u.id = o.user_id
-- GROUP BY u.id
-- ) customer_orders
-- GROUP BY customer_segment LIMIT 1000;
-- Inactive customers
SELECT generate_query('find customers who haven''t ordered in the last 90 days');
-- Result: SELECT u.name, u.email, MAX(o.order_date) as last_order_date
-- FROM public.users u LEFT JOIN public.orders o ON u.id = o.user_id
-- GROUP BY u.id, u.name, u.email
-- HAVING MAX(o.order_date) < CURRENT_DATE - INTERVAL '90 days'
-- OR MAX(o.order_date) IS NULL LIMIT 1000;
Inventory and Product Analysis
-- Low stock alerts
SELECT generate_query('show products with less than 10 items in stock');
-- Result: SELECT name, stock_quantity FROM public.products WHERE stock_quantity < 10 LIMIT 1000;
-- Product performance metrics
SELECT generate_query('show product sales performance with revenue and quantity sold');
-- Result: SELECT p.name, p.category,
-- COUNT(oi.id) as times_ordered,
-- SUM(oi.quantity) as total_quantity_sold,
-- SUM(oi.quantity * oi.unit_price) as total_revenue
-- FROM public.products p
-- LEFT JOIN public.order_items oi ON p.id = oi.product_id
-- GROUP BY p.id, p.name, p.category
-- ORDER BY total_revenue DESC LIMIT 1000;
-- Seasonal trends
SELECT generate_query('show monthly product sales trends');
-- Result: SELECT p.category,
-- DATE_TRUNC('month', o.order_date) as month,
-- SUM(oi.quantity) as quantity_sold
-- FROM public.products p
-- JOIN public.order_items oi ON p.id = oi.product_id
-- JOIN public.orders o ON oi.order_id = o.id
-- GROUP BY p.category, DATE_TRUNC('month', o.order_date)
-- ORDER BY month, category LIMIT 1000;
Advanced Analytics
Cohort Analysis
-- Customer acquisition by month
SELECT generate_query('show new customer registrations by month with their first order date');
-- Result: SELECT DATE_TRUNC('month', u.created_at) as registration_month,
-- COUNT(*) as new_customers,
-- COUNT(first_order.user_id) as customers_with_orders
-- FROM public.users u
-- LEFT JOIN (
-- SELECT user_id, MIN(order_date) as first_order_date
-- FROM public.orders
-- GROUP BY user_id
-- ) first_order ON u.id = first_order.user_id
-- GROUP BY DATE_TRUNC('month', u.created_at)
-- ORDER BY registration_month LIMIT 1000;
Geographic Analysis
-- Sales by shipping location
SELECT generate_query('show total sales by shipping city');
-- Result: SELECT shipping_city,
-- COUNT(*) as order_count,
-- SUM(total_amount) as total_sales
-- FROM public.orders
-- WHERE shipping_city IS NOT NULL
-- GROUP BY shipping_city
-- ORDER BY total_sales DESC LIMIT 1000;
-- Customer distribution
SELECT generate_query('show customer count and average order value by city');
-- Result: SELECT u.city,
-- COUNT(DISTINCT u.id) as customer_count,
-- COUNT(o.id) as total_orders,
-- AVG(o.total_amount) as avg_order_value
-- FROM public.users u
-- LEFT JOIN public.orders o ON u.id = o.user_id
-- GROUP BY u.city
-- ORDER BY customer_count DESC LIMIT 1000;
Data Quality and Validation
Data Integrity Checks
-- Find orphaned records
SELECT generate_query('find orders without valid customer references');
-- Result: SELECT o.* FROM public.orders o
-- LEFT JOIN public.users u ON o.user_id = u.id
-- WHERE u.id IS NULL LIMIT 1000;
-- Missing email addresses
SELECT generate_query('find users with missing or invalid email addresses');
-- Result: SELECT * FROM public.users
-- WHERE email IS NULL OR email = '' OR email NOT LIKE '%@%' LIMIT 1000;
-- Duplicate detection
SELECT generate_query('find duplicate user email addresses');
-- Result: SELECT email, COUNT(*) as count
-- FROM public.users
-- GROUP BY email
-- HAVING COUNT(*) > 1 LIMIT 1000;
Data Range Validation
-- Unusual values
SELECT generate_query('find orders with negative amounts');
-- Result: SELECT * FROM public.orders WHERE total_amount < 0 LIMIT 1000;
-- Age validation
SELECT generate_query('find users with unrealistic ages');
-- Result: SELECT * FROM public.users WHERE age < 0 OR age > 120 LIMIT 1000;
-- Future dates
SELECT generate_query('find orders with future dates');
-- Result: SELECT * FROM public.orders WHERE order_date > CURRENT_TIMESTAMP LIMIT 1000;
Performance Monitoring
Database Statistics
-- Table sizes and row counts
SELECT generate_query('show row count for each table');
-- Result: This will generate appropriate queries to count rows in each table
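-- The exact SQL depends on the model and your schema; one common form (illustrative only)
-- uses PostgreSQL's statistics view for fast approximate counts:
-- SELECT relname AS table_name, n_live_tup AS approximate_row_count
-- FROM pg_stat_user_tables
-- ORDER BY approximate_row_count DESC;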
-- Activity monitoring
SELECT generate_query('show order activity by hour of day');
-- Result: SELECT EXTRACT(hour FROM order_date) as hour_of_day,
-- COUNT(*) as order_count
-- FROM public.orders
-- GROUP BY EXTRACT(hour FROM order_date)
-- ORDER BY hour_of_day LIMIT 1000;
Tips for Better Results
1. Be Specific
-- Instead of: "show sales"
SELECT generate_query('show total sales amount by month for 2024');
-- Instead of: "find customers"
SELECT generate_query('find customers with more than 5 orders and total spent over $500');
2. Use Clear Time References
-- Good time references:
SELECT generate_query('show orders from last 30 days');
SELECT generate_query('find users registered between January 1, 2024 and March 31, 2024');
SELECT generate_query('show sales for Q1 2024');
3. Specify Sorting and Limits
-- Include sorting preferences:
SELECT generate_query('show top 10 customers by total revenue');
SELECT generate_query('list products ordered by price ascending');
SELECT generate_query('show recent orders sorted by date descending');
4. Reference Column Names When Needed
-- When ambiguous, specify column names:
SELECT generate_query('find users where user status is active');
SELECT generate_query('show products where stock quantity is less than reorder level');
These examples demonstrate the flexibility and power of pg_ai_query for various database tasks, from simple queries to complex business intelligence analysis.
Enhanced Response Formatting Examples
The following examples show how different response configuration options affect the output:
Example: Enhanced Text Response
Configuration:
[response]
show_explanation = true
show_warnings = true
show_suggested_visualization = true
use_formatted_response = false
Query:
SELECT generate_query('monthly sales trend for the last year with top performing categories');
Enhanced Response:
SELECT
DATE_TRUNC('month', o.order_date) as month,
p.category,
SUM(o.total_amount) as monthly_sales,
COUNT(o.id) as order_count
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.order_date >= CURRENT_DATE - INTERVAL '1 year'
GROUP BY DATE_TRUNC('month', o.order_date), p.category
ORDER BY month DESC, monthly_sales DESC
LIMIT 1000;
-- Explanation:
-- Analyzes monthly sales trends over the past year, broken down by product category.
-- Shows total sales amount and order count for each category per month,
-- helping identify seasonal patterns and top-performing categories over time.
-- Warnings:
-- 1. Multiple JOINs: Query involves 3 table joins which may impact performance on large datasets
-- 2. Date calculation: Uses CURRENT_DATE which may give different results when run at different times
-- 3. Index recommendation: Consider indexes on order_date, order_id, and product_id for optimal performance
-- Suggested Visualization: line
-- Reasoning: Time series data showing trends over months is best displayed as line charts,
-- with separate lines for each category to compare performance patterns
Example: JSON Response
Configuration:
[response]
show_explanation = true
show_warnings = true
show_suggested_visualization = true
use_formatted_response = true
Query:
SELECT generate_query('top 10 customers by lifetime value');
JSON Response:
{
"query": "SELECT u.id, u.name, u.email, SUM(o.total_amount) as lifetime_value, COUNT(o.id) as total_orders FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.id, u.name, u.email ORDER BY lifetime_value DESC LIMIT 10;",
"success": true,
"explanation": "Identifies the top 10 customers based on their total lifetime purchase value. Shows customer details along with their total spending and number of orders, useful for customer relationship management and targeted marketing.",
"warnings": [
"Customer segmentation: Consider including date ranges for more meaningful lifetime value calculations",
"Performance: Large customer and orders tables may require optimization with proper indexing"
],
"suggested_visualization": "bar",
"row_limit_applied": true
}
Example: Minimal Response
Configuration:
[response]
show_explanation = false
show_warnings = false
show_suggested_visualization = false
use_formatted_response = false
Query:
SELECT generate_query('all products in electronics category');
Minimal Response:
SELECT * FROM products WHERE category = 'electronics' LIMIT 1000;
Example: Warnings Only
Configuration:
[response]
show_explanation = false
show_warnings = true
show_suggested_visualization = false
use_formatted_response = false
Query:
SELECT generate_query('delete old order records from 2020');
Response with Warnings:
DELETE FROM orders WHERE EXTRACT(year FROM order_date) = 2020;
-- Warnings:
-- 1. DESTRUCTIVE OPERATION: This DELETE statement will permanently remove data
-- 2. No transaction control: Consider wrapping in BEGIN/COMMIT for safety
-- 3. Foreign key impact: Deleting orders may affect order_items table if cascade is not set
-- 4. Backup recommendation: Ensure you have recent backups before executing
-- 5. Row count check: Consider SELECT COUNT(*) first to verify the number of rows that will be deleted
Working with JSON Responses in Applications
Python Example:
import psycopg2
import json
# Connect to database
conn = psycopg2.connect("dbname=mydb user=myuser")
cur = conn.cursor()
# Execute query with JSON response
cur.execute("SELECT generate_query('monthly sales by product category')")
response_json = cur.fetchone()[0]
# Parse JSON response
response = json.loads(response_json)
if response['success']:
# Extract query and execute it
data_query = response['query']
cur.execute(data_query)
results = cur.fetchall()
# Use metadata for visualization
viz_type = response.get('suggested_visualization', 'table')
explanation = response.get('explanation', '')
warnings = response.get('warnings', [])
print(f"Query explanation: {explanation}")
print(f"Suggested visualization: {viz_type}")
if warnings:
print("Warnings:")
for warning in warnings:
print(f" - {warning}")
# Process results based on suggested visualization
if viz_type == 'line':
# Create line chart
create_line_chart(results)
elif viz_type == 'bar':
# Create bar chart
create_bar_chart(results)
else:
# Display as table
display_table(results)
These enhanced formatting examples show how the new configuration options provide flexible control over response detail and format, making the extension suitable for both interactive use and application integration.
Schema Discovery
The pg_ai_query extension includes an intelligent schema discovery system that automatically analyzes your PostgreSQL database structure to provide context for AI-powered query generation. This page explains how schema discovery works and how you can optimize it.
Overview
Schema discovery is the process by which the extension examines your database to understand:
- Tables and Views: What data structures exist
- Columns and Data Types: What fields are available and their types
- Relationships: How tables connect through foreign keys
- Constraints: Primary keys, unique constraints, and check constraints
- Indexes: Available indexes for query optimization
- Statistics: Row counts and table sizes for context
This information is then used to generate accurate, well-structured SQL queries from your natural language requests.
How Schema Discovery Works
1. Automatic Triggering
Schema discovery happens automatically when:
- The extension is first used in a PostgreSQL session
- You explicitly call schema discovery functions
- The AI needs context to generate a query
-- These actions trigger schema discovery:
SELECT generate_query('show users'); -- Automatic discovery
SELECT get_database_tables(); -- Explicit discovery
SELECT get_table_details('users'); -- Detailed table analysis
2. Discovery Process
The extension follows this process:
Step 1: Table Enumeration
-- Extension queries information_schema to find user tables
-- Equivalent to:
SELECT table_name, table_schema, table_type
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'pg_catalog', 'pg_toast');
Step 2: Column Analysis
-- For each table, analyze columns
-- Equivalent to:
SELECT column_name, data_type, is_nullable, column_default
FROM information_schema.columns
WHERE table_name = 'your_table';
Step 3: Constraint Discovery
-- Find primary keys, foreign keys, and unique constraints
-- Equivalent to:
SELECT constraint_name, constraint_type, column_name
FROM information_schema.table_constraints tc
JOIN information_schema.constraint_column_usage ccu
ON tc.constraint_name = ccu.constraint_name;
Step 4: Relationship Mapping
-- Discover foreign key relationships
-- Equivalent to:
SELECT
tc.table_name,
kcu.column_name,
ccu.table_name AS foreign_table_name,
ccu.column_name AS foreign_column_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
JOIN information_schema.constraint_column_usage ccu
ON ccu.constraint_name = tc.constraint_name
WHERE constraint_type = 'FOREIGN KEY';
Step 5: Statistics Collection
-- Gather table statistics for optimization
-- Equivalent to:
SELECT
schemaname,
tablename,
n_tup_ins,
n_tup_upd,
n_tup_del,
n_live_tup
FROM pg_stat_user_tables;
3. Information Processing
The collected information is then:
- Structured into JSON format for AI consumption
- Cached in the PostgreSQL session for performance
- Filtered to exclude system tables and sensitive information
- Contextualized with business-friendly descriptions
Schema Discovery Functions
get_database_tables()
Returns high-level information about all tables in your database.
Usage:
SELECT get_database_tables();
Example Output:
[
{
"table_name": "users",
"schema_name": "public",
"table_type": "BASE TABLE",
"estimated_rows": 1500,
"table_size": "128 kB"
},
{
"table_name": "orders",
"schema_name": "public",
"table_type": "BASE TABLE",
"estimated_rows": 5000,
"table_size": "512 kB"
}
]
Information Provided:
- Table names and schemas
- Table type (BASE TABLE, VIEW, etc.)
- Estimated row counts
- Physical table sizes
get_table_details()
Returns comprehensive information about a specific table.
Usage:
SELECT get_table_details('table_name', 'schema_name');
Example Output:
{
"table_name": "users",
"schema_name": "public",
"columns": [
{
"column_name": "id",
"data_type": "integer",
"is_nullable": false,
"column_default": "nextval('users_id_seq'::regclass)",
"is_primary_key": true,
"is_unique": true
},
{
"column_name": "email",
"data_type": "character varying",
"character_maximum_length": 150,
"is_nullable": false,
"is_unique": true
},
{
"column_name": "created_at",
"data_type": "timestamp without time zone",
"is_nullable": true,
"column_default": "CURRENT_TIMESTAMP"
}
],
"constraints": [
{
"constraint_name": "users_pkey",
"constraint_type": "PRIMARY KEY",
"column_names": ["id"]
}
],
"foreign_keys": [],
"indexes": [
{
"index_name": "users_email_idx",
"columns": ["email"],
"is_unique": true
}
]
}
Optimizing Schema Discovery
1. Table and Column Naming
Use clear, descriptive names that help the AI understand your data:
Good Examples:
-- Clear table names
CREATE TABLE customer_orders (...);
CREATE TABLE product_categories (...);
CREATE TABLE user_preferences (...);
-- Descriptive column names
CREATE TABLE users (
user_id SERIAL PRIMARY KEY,
full_name VARCHAR(100),
email_address VARCHAR(150),
registration_date TIMESTAMP,
account_status VARCHAR(20)
);
Less Optimal:
-- Unclear abbreviations
CREATE TABLE usr (...);
CREATE TABLE prod_cat (...);
-- Ambiguous column names
CREATE TABLE users (
id INT,
nm VARCHAR(100),
em VARCHAR(150),
dt TIMESTAMP,
st VARCHAR(20)
);
2. Adding Table Comments
Enhance schema discovery with descriptive comments:
-- Table-level comments
COMMENT ON TABLE users IS 'Customer information and account details';
COMMENT ON TABLE orders IS 'Customer purchase orders and transaction history';
COMMENT ON TABLE products IS 'Product catalog with pricing and inventory';
-- Column-level comments
COMMENT ON COLUMN users.account_status IS 'User account status: active, inactive, suspended';
COMMENT ON COLUMN orders.fulfillment_status IS 'Order processing status: pending, shipped, delivered, cancelled';
3. Proper Constraint Definition
Well-defined constraints help the AI understand data relationships:
-- Primary keys
ALTER TABLE users ADD CONSTRAINT users_pkey PRIMARY KEY (user_id);
-- Foreign keys with descriptive names
ALTER TABLE orders ADD CONSTRAINT fk_orders_customer
FOREIGN KEY (customer_id) REFERENCES users(user_id);
-- Meaningful unique constraints
ALTER TABLE users ADD CONSTRAINT unique_user_email
UNIQUE (email_address);
-- Check constraints with business logic
ALTER TABLE orders ADD CONSTRAINT check_order_amount_positive
CHECK (total_amount > 0);
4. Strategic Indexing
Indexes help the AI understand query optimization opportunities:
-- Indexes on frequently queried columns
CREATE INDEX idx_users_registration_date ON users(registration_date);
CREATE INDEX idx_orders_order_date ON orders(order_date);
-- Composite indexes for common query patterns
CREATE INDEX idx_orders_customer_date ON orders(customer_id, order_date);
Schema Discovery Best Practices
1. Database Design
- Normalized Structure: Use proper normalization to make relationships clear
- Consistent Naming: Follow consistent naming conventions across tables
- Foreign Key Relationships: Always define foreign key constraints
- Meaningful Defaults: Use sensible default values where appropriate
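A minimal sketch that applies these four principles together (table and column names are illustrative):
CREATE TABLE customer_addresses (
  address_id SERIAL PRIMARY KEY,                                       -- consistent *_id naming
  customer_id INTEGER NOT NULL REFERENCES customers(customer_id),      -- explicit foreign key relationship
  address_type VARCHAR(20) NOT NULL DEFAULT 'shipping',                -- meaningful default
  city VARCHAR(50) NOT NULL,
  created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);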
2. Documentation
-- Document complex business rules
COMMENT ON TABLE order_items IS 'Individual line items within customer orders, linked to product catalog';
COMMENT ON COLUMN products.discontinued_date IS 'Date when product was discontinued; NULL means still active';
3. Data Types
Choose appropriate data types that convey meaning:
-- Good: Specific types
CREATE TABLE events (
event_date DATE, -- Clearly a date
event_time TIME, -- Clearly a time
duration_minutes INTEGER, -- Clearly numeric
event_type VARCHAR(50), -- Clearly textual with reasonable limit
is_public BOOLEAN -- Clearly boolean
);
-- Less clear: Generic types
CREATE TABLE events (
event_date TEXT, -- Could be anything
event_time TEXT, -- Ambiguous format
duration_minutes TEXT, -- Should be numeric
event_type TEXT, -- No length constraint
is_public TEXT -- Should be boolean
);
4. Schema Organization
Organize related tables logically:
-- Group related tables with consistent prefixes
CREATE TABLE user_accounts (...);
CREATE TABLE user_preferences (...);
CREATE TABLE user_sessions (...);
CREATE TABLE product_catalog (...);
CREATE TABLE product_categories (...);
CREATE TABLE product_reviews (...);
CREATE TABLE order_headers (...);
CREATE TABLE order_items (...);
CREATE TABLE order_shipments (...);
Performance Considerations
Schema Discovery Caching
- Session-level Caching: Schema information is cached per PostgreSQL session
- Cache Invalidation: Schema cache is cleared when the session ends
- Refresh Triggers: Schema is re-analyzed if tables are modified
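Because the cache is per session, a simple pattern is to warm it right after connecting so the first generate_query call does not pay the discovery cost:
-- Warm the schema cache once at the start of the session
SELECT get_database_tables();
-- Later calls in the same session reuse the cached schema
SELECT generate_query('show recent orders');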
Optimization Tips
- Minimize Table Count: Large numbers of tables slow down discovery
- Use Views: Create views to present simplified interfaces to complex schemas
- Partition Strategy: Consider table partitioning for very large tables
- Statistics Updates: Keep table statistics current with ANALYZE
-- Update statistics for better schema discovery
ANALYZE users;
ANALYZE orders;
-- Or update all tables
ANALYZE;
Security Considerations
What Schema Discovery Accesses
✅ Accessed Information:
- User table structures
- Column names and types
- Constraint definitions
- Index information
- Table statistics
❌ Not Accessed:
- Actual data content
- System catalog details
- User passwords or sensitive data
- Database configuration
Privacy Protection
The schema discovery system:
- Only reads metadata, never actual data
- Respects PostgreSQL’s permission system
- Excludes system tables and schemas
- Never transmits sensitive information to external APIs
Troubleshooting Schema Discovery
Common Issues
Problem: “No tables found”
-- Check if tables exist
SELECT table_name FROM information_schema.tables
WHERE table_schema = 'public' AND table_type = 'BASE TABLE';
-- Verify permissions
SELECT has_table_privilege('users', 'SELECT');
Problem: Missing relationships
-- Check foreign key constraints
SELECT
tc.table_name,
kcu.column_name,
ccu.table_name AS foreign_table_name
FROM information_schema.table_constraints tc
JOIN information_schema.key_column_usage kcu
ON tc.constraint_name = kcu.constraint_name
WHERE tc.constraint_type = 'FOREIGN KEY';
Problem: Outdated statistics
-- Update table statistics
ANALYZE your_table_name;
-- Check last analyze time
SELECT schemaname, tablename, last_analyze
FROM pg_stat_user_tables;
Schema discovery is a powerful feature that makes your database “AI-ready” by providing the context needed for intelligent query generation. Following these best practices will help you get the most accurate and useful results from the pg_ai_query extension.
AI Providers
The pg_ai_query extension supports multiple AI providers, each with different models, capabilities, and pricing structures. This guide helps you choose the right provider and model for your use case.
Supported Providers
OpenAI
Overview: OpenAI provides the GPT family of models, known for strong natural language understanding and code generation capabilities.
API Endpoint: https://api.openai.com/v1/
Available Models:
| Model | Description | Context Length | Best For |
|---|---|---|---|
gpt-4o | Latest GPT-4 Omni model | 128,000 tokens | Complex queries, best accuracy |
gpt-4 | Standard GPT-4 model | 8,192 tokens | High-quality SQL generation |
gpt-3.5-turbo | Fast and efficient model | 4,096 tokens | Simple queries, fast responses |
Anthropic
Overview: Anthropic provides Claude models, known for thoughtful reasoning and safety-conscious outputs.
API Endpoint: https://api.anthropic.com/v1/
Available Models:
| Model | Description | Context Length | Best For |
|---|---|---|---|
claude-3-5-sonnet-20241022 | Latest Claude 3.5 Sonnet | 200,000 tokens | Complex analysis, detailed reasoning |
Provider Selection
Automatic Provider Selection
When you use provider = 'auto' (the default), the extension selects a provider based on:
- Configuration Priority: First configured provider with a valid API key
- Fallback Logic: If primary provider fails, tries others
- Default Order: OpenAI → Anthropic
-- Uses automatic provider selection
SELECT generate_query('show recent orders');
Explicit Provider Selection
You can specify which provider to use:
-- Force OpenAI
SELECT generate_query('show users', null, 'openai');
-- Force Anthropic
SELECT generate_query('show users', null, 'anthropic');
Configuration-Based Selection
Set your preferred provider in the configuration file:
[general]
# Provider priority is determined by order in config file
[openai]
api_key = "your-openai-key"
default_model = "gpt-4o"
[anthropic]
api_key = "your-anthropic-key"
default_model = "claude-3-5-sonnet-20241022"
Provider Comparison
Performance Characteristics
| Aspect | OpenAI GPT-4o | OpenAI GPT-3.5 | Anthropic Claude |
|---|---|---|---|
| Response Time | 2-5 seconds | 1-2 seconds | 3-6 seconds |
| Accuracy | Excellent | Good | Excellent |
| Complex Queries | Excellent | Fair | Excellent |
| SQL Knowledge | Excellent | Good | Very Good |
| Cost | High | Low | Medium |
Use Case Recommendations
For Simple Queries
Recommended: OpenAI GPT-3.5-turbo
-- Simple data retrieval
SELECT generate_query('show all users', null, 'openai');
-- Configuration: default_model = "gpt-3.5-turbo"
Why: Fast, cost-effective, sufficient accuracy for basic queries.
For Complex Analytics
Recommended: OpenAI GPT-4o or Anthropic Claude
-- Complex business intelligence
SELECT generate_query(
'calculate monthly customer retention rates with cohort analysis',
null,
'openai'
);
-- Configuration: default_model = "gpt-4o"
Why: Better reasoning for complex multi-table joins and advanced analytics.
For Data Exploration
Recommended: Anthropic Claude
-- Exploratory analysis
SELECT generate_query(
'analyze sales trends and identify anomalies in the last quarter',
null,
'anthropic'
);
Why: Excellent at understanding context and generating insightful queries.
For Production Systems
Recommended: OpenAI GPT-4 (balanced performance/cost)
-- Production queries
SELECT generate_query('generate daily sales report', null, 'openai');
-- Configuration: default_model = "gpt-4"
Why: Good balance of accuracy, speed, and cost for production workloads.
Model-Specific Configurations
OpenAI Configuration
[openai]
api_key = "sk-proj-your-api-key-here"
default_model = "gpt-4o"
# Optional: Model-specific settings (future feature)
# gpt_4o_temperature = 0.7
# gpt_4o_max_tokens = 16384
API Key Format: Must start with sk-proj- or sk-
Getting Started:
- Visit platform.openai.com
- Create an account and add billing information
- Generate an API key
- Add to configuration file
Anthropic Configuration
[anthropic]
api_key = "sk-ant-your-api-key-here"
default_model = "claude-3-5-sonnet-20241022"
# Optional: Model-specific settings (future feature)
# claude_temperature = 0.7
# claude_max_tokens = 8192
API Key Format: Must start with sk-ant-
Getting Started:
- Visit console.anthropic.com
- Create an account and add credits
- Generate an API key
- Add to configuration file
Cost Considerations
OpenAI Pricing (Approximate)
| Model | Input Cost | Output Cost | Typical Query Cost |
|---|---|---|---|
| GPT-4o | $2.50/1M tokens | $10.00/1M tokens | $0.02-0.05 |
| GPT-4 | $30.00/1M tokens | $60.00/1M tokens | $0.10-0.20 |
| GPT-3.5-turbo | $0.50/1M tokens | $1.50/1M tokens | $0.001-0.005 |
Anthropic Pricing (Approximate)
| Model | Input Cost | Output Cost | Typical Query Cost |
|---|---|---|---|
| Claude 3.5 Sonnet | $3.00/1M tokens | $15.00/1M tokens | $0.03-0.06 |
Prices are approximate and change frequently. Check provider websites for current pricing.
Cost Optimization Strategies
1. Use Appropriate Models
-- For simple queries, use cheaper models
SELECT generate_query('count users', null, 'openai');
-- Set: default_model = "gpt-3.5-turbo"
2. Efficient Query Descriptions
-- Be specific to reduce back-and-forth
SELECT generate_query('show users with orders in last 30 days including total order value');
3. Batch Similar Queries
-- Instead of multiple calls:
-- SELECT generate_query('show users');
-- SELECT generate_query('show user count');
-- Use one comprehensive query:
SELECT generate_query('show user list with total count');
Provider Reliability and Error Handling
Automatic Fallback
The extension includes automatic fallback logic:
-- If OpenAI fails, automatically tries Anthropic
SELECT generate_query('show data', null, 'auto');
Manual Retry with Different Provider
-- If one provider fails, try another
SELECT generate_query('complex query', null, 'openai');
-- If that fails:
SELECT generate_query('complex query', null, 'anthropic');
Error Handling Configuration
[general]
max_retries = 3 # Retry failed requests
request_timeout_ms = 30000 # Timeout for each request
Advanced Provider Features
Rate Limiting
Each provider has different rate limits:
OpenAI:
- GPT-3.5: 3,500 requests/minute
- GPT-4: 500 requests/minute (varies by tier)
Anthropic:
- Claude: 1,000 requests/minute (varies by tier)
Context Window Optimization
For large schemas, consider context window sizes:
-- Large schema? Use models with larger context windows
-- GPT-4o: 128k tokens
-- Claude 3.5: 200k tokens
SELECT generate_query('analyze relationships across all tables', null, 'anthropic');
Regional Considerations
OpenAI: Global availability with regional data centers
Anthropic: Primarily US-based with expanding regional support
Provider-Specific Best Practices
OpenAI Best Practices
1. Model Selection:
- Use GPT-3.5-turbo for development and testing
- Use GPT-4 for production workloads
- Use GPT-4o for complex analytical queries
2. Prompt Engineering:
- Be specific about desired output format
- Include relevant business context
- Specify PostgreSQL version if using advanced features
Anthropic Best Practices
1. Leveraging Reasoning:
- Claude excels at understanding business logic
- Good for queries requiring domain expertise
- Effective for complex multi-step analysis
2. Safety Features:
- Claude has built-in safety considerations
- Good for production environments with compliance needs
- Transparent about limitations and assumptions
Future Provider Support
The extension is designed to easily support additional providers:
- Google (Gemini): Planned support
- Azure OpenAI: Possible future integration
- Local Models: Exploring support for self-hosted models
Monitoring Provider Performance
Enable Logging
[general]
enable_logging = true
log_level = "INFO"
Track Performance
-- Monitor which provider is being used
-- Check logs for provider selection and response times
Performance Metrics
Key metrics to monitor:
- Response Time: How long queries take to generate
- Success Rate: Percentage of successful generations
- Cost: Monthly API usage costs
- Quality: Accuracy of generated queries
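If your application records each generation in its own log table (the ai_query_log structure below is an assumption, not something the extension creates), the first two metrics reduce to a simple aggregate:
-- Assumes: ai_query_log(provider text, success boolean, response_time_ms integer, created_at timestamp)
SELECT provider,
  COUNT(*) AS requests,
  AVG(response_time_ms) AS avg_response_ms,
  AVG(CASE WHEN success THEN 1 ELSE 0 END) AS success_rate
FROM ai_query_log
WHERE created_at >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY provider;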
Choose your AI provider based on your specific needs: cost sensitivity, query complexity, response time requirements, and accuracy needs. The extension’s flexible provider system allows you to optimize for your particular use case.
Performance & Best Practices
This guide provides recommendations for using pg_ai_query effectively in production environments, optimizing performance, and following security best practices.
Performance Optimization
Database Design for AI Query Generation
1. Schema Design
Use Descriptive Naming
-- Good: Clear, business-friendly names
CREATE TABLE customer_orders (
order_id SERIAL PRIMARY KEY,
customer_id INTEGER,
order_date TIMESTAMP,
total_amount DECIMAL(10,2),
order_status VARCHAR(20)
);
-- Avoid: Cryptic abbreviations
CREATE TABLE cust_ord (
id INT,
cid INT,
dt TIMESTAMP,
amt DECIMAL,
st VARCHAR
);
Proper Relationships
-- Always define foreign key relationships
ALTER TABLE customer_orders
ADD CONSTRAINT fk_customer_orders_customer
FOREIGN KEY (customer_id) REFERENCES customers(customer_id);
-- Add meaningful constraints
ALTER TABLE customer_orders
ADD CONSTRAINT check_positive_amount CHECK (total_amount > 0);
Strategic Indexing
-- Index frequently queried columns
CREATE INDEX idx_orders_date ON customer_orders(order_date);
CREATE INDEX idx_orders_customer ON customer_orders(customer_id);
CREATE INDEX idx_orders_status ON customer_orders(order_status);
-- Composite indexes for common query patterns
CREATE INDEX idx_orders_customer_date
ON customer_orders(customer_id, order_date);
2. Data Types and Constraints
Choose Appropriate Data Types
-- Use specific types that convey meaning
CREATE TABLE products (
product_id SERIAL PRIMARY KEY,
product_name VARCHAR(200) NOT NULL, -- Limited length
category_id INTEGER, -- Clear numeric ID
price DECIMAL(10,2), -- Exact decimal for money
is_active BOOLEAN DEFAULT true, -- Clear boolean
created_date DATE, -- Date only, not timestamp
last_updated TIMESTAMP DEFAULT NOW() -- Full timestamp for updates
);
Add Table and Column Comments
COMMENT ON TABLE products IS 'Product catalog with pricing and inventory information';
COMMENT ON COLUMN products.category_id IS 'References product_categories.category_id';
COMMENT ON COLUMN products.is_active IS 'false = discontinued product, true = available for sale';
Query Generation Performance
1. Optimize Natural Language Requests
Be Specific and Clear
-- Good: Specific request
SELECT generate_query('show customers who placed orders in the last 30 days with their total order value');
-- Less optimal: Vague request
SELECT generate_query('show customer stuff');
Include Context When Needed
-- For complex schemas, provide business context
SELECT generate_query('show quarterly revenue trends by product category for fiscal year 2024');
-- Specify table relationships when ambiguous
SELECT generate_query('show orders joined with customer information including customer names and emails');
2. Schema Size Optimization
Limit Table Discovery Scope
-- For very large schemas, use specific schemas
-- Configure to focus on business tables only
SELECT generate_query('show sales data from last month');
Use Views for Complex Schemas
-- Create business-friendly views
CREATE VIEW sales_summary AS
SELECT
o.order_id,
c.customer_name,
o.order_date,
o.total_amount,
p.product_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id;
-- AI can then reference the simplified view
SELECT generate_query('show recent sales from sales_summary view');
3. Caching and Session Management
Understand Caching Behavior
- Schema information is cached per PostgreSQL session
- Reconnecting clears the cache and triggers re-analysis
- Keep sessions alive for better performance
Session Management
# Python example: Reuse connections
import psycopg2
from psycopg2 import pool
# Create connection pool
connection_pool = psycopg2.pool.SimpleConnectionPool(1, 20, database="mydb")
def generate_ai_query(natural_query):
conn = connection_pool.getconn()
try:
cursor = conn.cursor()
cursor.execute("SELECT generate_query(%s)", (natural_query,))
return cursor.fetchone()[0]
finally:
connection_pool.putconn(conn)
Configuration Optimization
1. Provider and Model Selection
Choose Appropriate Models
# For production: Balance speed and accuracy
[openai]
api_key = "your-key"
default_model = "gpt-4" # Good balance
# For development: Use faster models
[openai]
default_model = "gpt-3.5-turbo" # Faster, cheaper
Configure Timeouts
[general]
request_timeout_ms = 45000 # Increase for complex schemas
max_retries = 5 # Increase for production reliability
2. Logging Configuration
Production Logging
[general]
enable_logging = false # Disable for performance
enable_postgresql_elog = true # Use PostgreSQL logging only
Development Logging
[general]
enable_logging = true
log_level = "INFO" # or "DEBUG" for troubleshooting
Security Best Practices
1. API Key Security
Secure Storage
# Set proper file permissions
chmod 600 ~/.pg_ai.config
# Ensure correct ownership
chown postgres:postgres ~/.pg_ai.config # For PostgreSQL user
Configuration Files Only
# Use secure configuration files
[openai]
api_key = "your-key-here"
Key Rotation
# Regularly rotate API keys
[openai]
api_key = "new-rotated-key"
2. Database Security
Principle of Least Privilege
-- Create dedicated user for AI queries
CREATE USER ai_query_user WITH PASSWORD 'secure_password';
-- Grant only necessary permissions
GRANT SELECT ON ALL TABLES IN SCHEMA public TO ai_query_user;
GRANT USAGE ON SCHEMA public TO ai_query_user;
-- Revoke dangerous permissions
REVOKE INSERT, UPDATE, DELETE ON ALL TABLES IN SCHEMA public FROM ai_query_user;
Schema Isolation
-- Create separate schema for AI-accessible views
CREATE SCHEMA ai_views;
-- Create simplified, safe views
CREATE VIEW ai_views.customer_summary AS
SELECT
customer_id,
customer_name,
city,
registration_date
FROM customers
WHERE is_active = true; -- Only show active customers
-- Grant access only to AI schema
GRANT USAGE ON SCHEMA ai_views TO ai_query_user;
GRANT SELECT ON ALL TABLES IN SCHEMA ai_views TO ai_query_user;
3. Query Validation
Always Review Generated Queries
def safe_ai_query_execution(natural_query):
# Generate the query
generated_sql = generate_query(natural_query)
# Log for audit
logger.info(f"Generated SQL: {generated_sql}")
# Basic safety checks
if any(keyword in generated_sql.upper() for keyword in ['DROP', 'DELETE', 'TRUNCATE']):
raise SecurityError("Potentially dangerous SQL generated")
# Execute with read-only user
return execute_with_readonly_user(generated_sql)
Implement Query Whitelisting
-- Create function to validate query patterns
CREATE OR REPLACE FUNCTION validate_ai_query(query_text TEXT)
RETURNS BOOLEAN AS $$
BEGIN
-- Only allow SELECT statements
IF NOT query_text ILIKE 'SELECT%' THEN
RETURN FALSE;
END IF;
-- Block system table access
IF query_text ILIKE '%pg_%' OR query_text ILIKE '%information_schema%' THEN
RETURN FALSE;
END IF;
RETURN TRUE;
END;
$$ LANGUAGE plpgsql;
4. Audit and Monitoring
Query Logging
import logging
# Set up audit logging
audit_logger = logging.getLogger('ai_query_audit')
audit_handler = logging.FileHandler('/var/log/ai_queries.log')
audit_logger.addHandler(audit_handler)
def audited_generate_query(natural_query, user_id):
audit_logger.info(f"User {user_id} requested: {natural_query}")
result = generate_query(natural_query)
audit_logger.info(f"Generated for user {user_id}: {result}")
return result
Monitor API Usage
# Enable logging to monitor API costs
[general]
enable_logging = true
log_level = "INFO"
Production Deployment
1. Architecture Patterns
Application-Level Integration
class AIQueryService:
def __init__(self):
self.connection_pool = create_pool()
self.cache = {}
def generate_query(self, natural_query, user_context=None):
# Add user context to improve query generation
contextual_query = f"{natural_query}"
if user_context:
contextual_query += f" for user role {user_context['role']}"
return self._execute_ai_query(contextual_query)
def _execute_ai_query(self, query):
# Implement caching, security checks, etc.
pass
Microservice Pattern
# Docker Compose example
version: '3.8'
services:
ai-query-service:
image: your-app:latest
environment:
- DATABASE_URL=postgresql://ai_user:pass@db:5432/mydb
volumes:
- ./pg_ai_config:/etc/pg_ai
depends_on:
- database
database:
image: postgres:14
environment:
- POSTGRES_DB=mydb
volumes:
- postgres_data:/var/lib/postgresql/data
2. Error Handling
Graceful Degradation
def robust_query_generation(natural_query, fallback_provider=None):
try:
return generate_query(natural_query)
except APIError as e:
if fallback_provider:
logger.warning(f"Primary provider failed: {e}, trying {fallback_provider}")
return generate_query(natural_query, provider=fallback_provider)
else:
logger.error(f"Query generation failed: {e}")
return None
except Exception as e:
logger.error(f"Unexpected error: {e}")
return None
Retry Logic
import time
from functools import wraps
def retry_with_backoff(max_retries=3, base_delay=1):
def decorator(func):
@wraps(func)
def wrapper(*args, **kwargs):
for attempt in range(max_retries):
try:
return func(*args, **kwargs)
except Exception as e:
if attempt == max_retries - 1:
raise
delay = base_delay * (2 ** attempt)
time.sleep(delay)
return None
return wrapper
return decorator
@retry_with_backoff(max_retries=3)
def generate_query_with_retry(natural_query):
return generate_query(natural_query)
3. Monitoring and Alerting
Key Metrics to Monitor
- Query generation success rate
- Average response time
- API cost per query
- Error rates by provider
- Schema discovery performance
Alerting Setup
import prometheus_client
# Prometheus metrics
query_generation_duration = prometheus_client.Histogram(
'ai_query_generation_duration_seconds',
'Time spent generating AI queries'
)
query_generation_errors = prometheus_client.Counter(
'ai_query_generation_errors_total',
'Total number of query generation errors',
['provider', 'error_type']
)
@query_generation_duration.time()
def monitored_generate_query(natural_query):
try:
return generate_query(natural_query)
except Exception as e:
query_generation_errors.labels(
provider='openai',
error_type=type(e).__name__
).inc()
raise
Cost Management
1. Usage Optimization
Request Optimization
def optimize_query_request(natural_query):
# Cache common queries
cache_key = hashlib.md5(natural_query.encode()).hexdigest()
if cache_key in query_cache:
return query_cache[cache_key]
# Use cheaper model for simple queries
if is_simple_query(natural_query):
result = generate_query(natural_query, provider='openai', model='gpt-3.5-turbo')
else:
result = generate_query(natural_query, provider='openai', model='gpt-4')
query_cache[cache_key] = result
return result
def is_simple_query(query):
simple_patterns = ['show', 'list', 'count', 'find']
return any(pattern in query.lower() for pattern in simple_patterns)
2. Budget Controls
API Cost Tracking
class CostTracker:
def __init__(self, monthly_budget=100):
self.monthly_budget = monthly_budget
self.current_spend = 0
def track_request(self, provider, model, tokens_used):
cost = calculate_cost(provider, model, tokens_used)
self.current_spend += cost
if self.current_spend > self.monthly_budget * 0.9:
logger.warning("Approaching monthly budget limit")
if self.current_spend > self.monthly_budget:
raise BudgetExceededException("Monthly budget exceeded")
3. Usage Analytics
Query Pattern Analysis
-- Analyze query patterns for optimization
CREATE TABLE ai_query_log (
id SERIAL PRIMARY KEY,
user_id INTEGER,
natural_query TEXT,
generated_sql TEXT,
provider VARCHAR(20),
model VARCHAR(50),
execution_time_ms INTEGER,
created_at TIMESTAMP DEFAULT NOW()
);
-- Find most common query types
SELECT
LEFT(natural_query, 50) as query_pattern,
COUNT(*) as frequency,
AVG(execution_time_ms) as avg_time
FROM ai_query_log
GROUP BY LEFT(natural_query, 50)
ORDER BY frequency DESC;
Following these best practices will help you deploy pg_ai_query successfully in production environments while maintaining security, performance, and cost efficiency.
Integration Patterns
This guide covers common patterns for integrating pg_ai_query with applications, development workflows, and data analysis tools.
Application Integration
Python Integration
import psycopg2
import json
class NaturalLanguageDB:
def __init__(self, connection_string):
self.conn = psycopg2.connect(connection_string)
def query(self, description, execute=False):
"""Generate and optionally execute a query from natural language"""
cur = self.conn.cursor()
# Generate the query
cur.execute("SELECT generate_query(%s)", (description,))
query = cur.fetchone()[0]
if execute:
# Execute the generated query
cur.execute(query)
results = cur.fetchall()
return query, results
else:
return query, None
def analyze_performance(self, query):
"""Analyze query performance"""
cur = self.conn.cursor()
cur.execute("SELECT explain_query(%s)", (query,))
return cur.fetchone()[0]
# Usage example
db = NaturalLanguageDB("postgresql://user:pass@localhost/dbname")
# Generate a query
query, _ = db.query("show top customers by revenue")
print(f"Generated: {query}")
# Execute and get results
query, results = db.query("count orders by status", execute=True)
print(f"Results: {results}")
# Analyze performance
analysis = db.analyze_performance(query)
print(f"Performance: {analysis}")
Node.js Integration
const { Pool } = require('pg');
class AIQueryClient {
constructor(config) {
this.pool = new Pool(config);
}
async generateQuery(description, apiKey = null, provider = 'auto') {
const client = await this.pool.connect();
try {
const result = await client.query(
'SELECT generate_query($1, $2, $3)',
[description, apiKey, provider]
);
return result.rows[0].generate_query;
} finally {
client.release();
}
}
async executeNaturalQuery(description) {
const query = await this.generateQuery(description);
const client = await this.pool.connect();
try {
const result = await client.query(query);
return {
query: query,
results: result.rows,
rowCount: result.rowCount
};
} finally {
client.release();
}
}
async analyzeQuery(query) {
const client = await this.pool.connect();
try {
const result = await client.query(
'SELECT explain_query($1)',
[query]
);
return result.rows[0].explain_query;
} finally {
client.release();
}
}
}
// Usage
const aiClient = new AIQueryClient({
user: 'postgres',
host: 'localhost',
database: 'myapp',
password: 'password',
port: 5432,
});
// Generate and execute
aiClient.executeNaturalQuery('show recent orders')
.then(result => {
console.log('Query:', result.query);
console.log('Results:', result.results);
});
Development Workflow Integration
Code Review Integration
Create a function to validate new queries during code review:
-- Create a query validation function
CREATE OR REPLACE FUNCTION validate_query_performance(query_text text)
RETURNS TABLE(
performance_score int,
issues text[],
recommendations text[]
) AS $$
DECLARE
analysis text;
score int := 100;
issues_array text[] := '{}';
recommendations_array text[] := '{}';
BEGIN
-- Get AI analysis
SELECT explain_query(query_text) INTO analysis;
-- Simple scoring based on analysis content
IF analysis ILIKE '%sequential scan%' THEN
score := score - 30;
issues_array := issues_array || 'Sequential scan detected';
recommendations_array := recommendations_array || 'Consider adding indexes';
END IF;
IF analysis ILIKE '%high cost%' OR analysis ILIKE '%expensive%' THEN
score := score - 20;
issues_array := issues_array || 'High execution cost';
END IF;
RETURN QUERY SELECT score, issues_array, recommendations_array;
END;
$$ LANGUAGE plpgsql;
-- Usage in CI/CD
SELECT * FROM validate_query_performance('
SELECT * FROM orders o
JOIN users u ON o.user_id = u.id
WHERE o.created_at > NOW() - INTERVAL ''30 days''
');
Git Hooks Integration
#!/bin/bash
# .git/hooks/pre-commit
# Extract SQL queries from staged files
git diff --cached --name-only | grep '\.sql$' | while read file; do
echo "Analyzing SQL file: $file"
# Extract queries and analyze with pg_ai_query
psql -d mydb -c "
SELECT explain_query(\$\$$(cat $file)\$\$)
" | grep -i 'warning\|issue\|slow'
if [ $? -eq 0 ]; then
echo "Performance issues detected in $file"
exit 1
fi
done
Business Intelligence Integration
Jupyter Notebook Integration
# Install: pip install psycopg2-binary pandas plotly
import pandas as pd
import psycopg2
import plotly.express as px
def natural_query_to_dataframe(connection_string, description):
"""Convert natural language to SQL and return pandas DataFrame"""
with psycopg2.connect(connection_string) as conn:
# Generate query
query_df = pd.read_sql("SELECT generate_query(%s) as query", conn, params=[description])
query = query_df.iloc[0]['query']
# Execute query
result_df = pd.read_sql(query, conn)
print(f"Generated Query: {query}")
return result_df
# Usage in Jupyter
df = natural_query_to_dataframe(
"postgresql://user:pass@localhost/db",
"monthly revenue trend for the last year"
)
# Create visualization
fig = px.line(df, x='month', y='revenue', title='Monthly Revenue Trend')
fig.show()
Grafana Integration
Create a PostgreSQL data source query that uses pg_ai_query:
-- Grafana query template
SELECT
time_column,
value_column
FROM (
${__query(SELECT generate_query('${natural_description:text}'))}
) generated_query
WHERE $__timeFilter(time_column)
Data Analysis Workflows
Automated Reporting
import smtplib
from email.mime.text import MIMEText
from email.mime.multipart import MIMEMultipart
import psycopg2
import pandas as pd
from datetime import datetime, timedelta
class AutomatedReporter:
def __init__(self, db_config, email_config):
self.db_config = db_config
self.email_config = email_config
def generate_report(self, queries_descriptions):
"""Generate automated reports from natural language queries"""
report_data = {}
with psycopg2.connect(**self.db_config) as conn:
for name, description in queries_descriptions.items():
# Generate and execute query
df = pd.read_sql(
"SELECT generate_query(%s) as query",
conn, params=[description]
)
query = df.iloc[0]['query']
# Execute the generated query
result_df = pd.read_sql(query, conn)
report_data[name] = result_df
# Analyze performance
analysis_df = pd.read_sql(
"SELECT explain_query(%s) as analysis",
conn, params=[query]
)
print(f"Performance analysis for {name}:")
print(analysis_df.iloc[0]['analysis'])
return report_data
def send_report(self, report_data):
"""Send the generated report via email"""
# Create email content from report data
# Implementation depends on your email service
pass
# Usage
reporter = AutomatedReporter(
db_config={'host': 'localhost', 'database': 'mydb', ...},
email_config={'smtp_server': 'smtp.gmail.com', ...}
)
daily_queries = {
'user_signups': 'daily user registrations for the last 7 days',
'revenue': 'daily revenue for the last 7 days',
'top_products': 'top 10 products by sales today'
}
report = reporter.generate_report(daily_queries)
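The send_report method above is left as a stub. A minimal sketch of one possible implementation is shown below; it assumes email_config contains smtp_server, smtp_port, username, password, and recipients (hypothetical keys) and renders each result DataFrame as an HTML table:
import smtplib
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

def send_report(email_config, report_data):
    """Illustrative sketch: email each result set as an HTML table."""
    msg = MIMEMultipart()
    msg['Subject'] = 'Automated Database Report'
    msg['From'] = email_config['username']
    msg['To'] = ', '.join(email_config['recipients'])
    body = ''.join(
        f"<h3>{name}</h3>{df.to_html(index=False)}"
        for name, df in report_data.items()
    )
    msg.attach(MIMEText(body, 'html'))
    # Send over STARTTLS; adjust for your SMTP provider
    with smtplib.SMTP(email_config['smtp_server'], email_config.get('smtp_port', 587)) as server:
        server.starttls()
        server.login(email_config['username'], email_config['password'])
        server.send_message(msg)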
Query Performance Monitoring
-- Create a monitoring system
CREATE TABLE query_performance_log (
id SERIAL PRIMARY KEY,
query_description text,
generated_query text,
execution_time interval,
analysis_summary text,
created_at timestamp DEFAULT NOW()
);
-- Function to log and analyze queries
CREATE OR REPLACE FUNCTION monitor_query_performance(description text)
RETURNS void AS $$
DECLARE
query text;
start_time timestamp;
end_time timestamp;
exec_time interval;
analysis text;
BEGIN
-- Generate query
SELECT generate_query(description) INTO query;
-- Time the execution
start_time := clock_timestamp();
EXECUTE query;
end_time := clock_timestamp();
exec_time := end_time - start_time;
-- Get performance analysis
SELECT explain_query(query) INTO analysis;
-- Log the results
INSERT INTO query_performance_log
(query_description, generated_query, execution_time, analysis_summary)
VALUES (description, query, exec_time, analysis);
-- Alert on slow queries
IF exec_time > interval '5 seconds' THEN
RAISE WARNING 'Slow query detected: % took %', description, exec_time;
END IF;
END;
$$ LANGUAGE plpgsql;
-- Usage
SELECT monitor_query_performance('daily active users count');
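To run the monitor on a schedule without extra infrastructure, a small Python runner can call the function periodically and surface slow entries from query_performance_log. This is a sketch only; the monitored descriptions, the one-hour window, and the hourly interval are assumptions:
import time
import psycopg2

MONITORED = ['daily active users count', 'orders placed today']  # descriptions to track

def run_monitoring_cycle(conn_str):
    with psycopg2.connect(conn_str) as conn:
        cur = conn.cursor()
        for description in MONITORED:
            cur.execute("SELECT monitor_query_performance(%s)", (description,))
        # Surface anything the log table recorded as slow in the last hour
        cur.execute("""
            SELECT query_description, execution_time
            FROM query_performance_log
            WHERE execution_time > interval '5 seconds'
              AND created_at > NOW() - interval '1 hour'
        """)
        for description, exec_time in cur.fetchall():
            print(f"SLOW: {description} took {exec_time}")

if __name__ == '__main__':
    while True:
        run_monitoring_cycle("postgresql://user:pass@localhost/dbname")
        time.sleep(3600)  # run hourly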
API Integration Patterns
REST API Wrapper
from flask import Flask, request, jsonify
import psycopg2
import os
app = Flask(__name__)
def get_db_connection():
return psycopg2.connect(
host=os.getenv('DB_HOST'),
database=os.getenv('DB_NAME'),
user=os.getenv('DB_USER'),
password=os.getenv('DB_PASSWORD')
)
@app.route('/api/query', methods=['POST'])
def natural_language_query():
"""Generate and optionally execute queries from natural language"""
data = request.json
description = data.get('description')
execute = data.get('execute', False)
api_key = data.get('api_key')
provider = data.get('provider', 'auto')
if not description:
return jsonify({'error': 'Description required'}), 400
try:
conn = get_db_connection()
cur = conn.cursor()
# Generate query
cur.execute(
"SELECT generate_query(%s, %s, %s)",
(description, api_key, provider)
)
query = cur.fetchone()[0]
result = {'query': query}
if execute:
# Execute the generated query
cur.execute(query)
results = cur.fetchall()
columns = [desc[0] for desc in cur.description]
result['results'] = [dict(zip(columns, row)) for row in results]
result['row_count'] = len(results)
conn.close()
return jsonify(result)
except Exception as e:
return jsonify({'error': str(e)}), 500
@app.route('/api/explain', methods=['POST'])
def explain_query():
"""Analyze query performance"""
data = request.json
query = data.get('query')
if not query:
return jsonify({'error': 'Query required'}), 400
try:
conn = get_db_connection()
cur = conn.cursor()
cur.execute("SELECT explain_query(%s)", (query,))
analysis = cur.fetchone()[0]
conn.close()
return jsonify({'analysis': analysis})
except Exception as e:
return jsonify({'error': str(e)}), 500
if __name__ == '__main__':
app.run(debug=True)
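Any HTTP client can call this wrapper. For example, using the requests package (the host and port assume Flask's default development server):
import requests

response = requests.post(
    'http://localhost:5000/api/query',
    json={'description': 'count orders by status', 'execute': True},
    timeout=60,
)
payload = response.json()
print('Generated SQL:', payload['query'])
print('Rows returned:', payload.get('row_count'))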
Testing Integration
Unit Testing Queries
import unittest
import psycopg2
class TestQueryGeneration(unittest.TestCase):
def setUp(self):
self.conn = psycopg2.connect("postgresql://test_user:test_pass@localhost/test_db")
def test_basic_query_generation(self):
"""Test basic query generation functionality"""
cur = self.conn.cursor()
# Test simple query
cur.execute("SELECT generate_query('show all users')")
query = cur.fetchone()[0]
self.assertIn('SELECT', query.upper())
self.assertIn('users', query.lower())
def test_query_performance(self):
"""Test that generated queries perform reasonably well"""
cur = self.conn.cursor()
# Generate a query
cur.execute("SELECT generate_query('count active users')")
query = cur.fetchone()[0]
# Analyze performance
cur.execute("SELECT explain_query(%s)", (query,))
analysis = cur.fetchone()[0]
# Check for performance issues
self.assertNotIn('Sequential Scan', analysis)
def tearDown(self):
self.conn.close()
Production Deployment Patterns
Connection Pooling
from psycopg2 import pool
import threading
class AIQueryPool:
def __init__(self, minconn, maxconn, **kwargs):
self.pool = pool.ThreadedConnectionPool(
minconn, maxconn, **kwargs
)
self.lock = threading.Lock()
def execute_natural_query(self, description):
conn = self.pool.getconn()
try:
cur = conn.cursor()
cur.execute("SELECT generate_query(%s)", (description,))
query = cur.fetchone()[0]
cur.execute(query)
results = cur.fetchall()
return query, results
finally:
self.pool.putconn(conn)
# Usage
query_pool = AIQueryPool(
minconn=5, maxconn=20,
host='localhost', database='prod_db',
user='app_user', password='secure_password'
)
Error Handling and Logging
import logging
from functools import wraps
def handle_ai_query_errors(func):
@wraps(func)
def wrapper(*args, **kwargs):
try:
return func(*args, **kwargs)
except psycopg2.Error as e:
logging.error(f"Database error in {func.__name__}: {e}")
raise
except Exception as e:
logging.error(f"Unexpected error in {func.__name__}: {e}")
raise
return wrapper
@handle_ai_query_errors
def safe_natural_query(connection, description):
"""Safely execute natural language queries with error handling"""
cur = connection.cursor()
cur.execute("SELECT generate_query(%s)", (description,))
return cur.fetchone()[0]
See Also
- Function Reference - Complete API documentation
- Best Practices - Performance and security guidelines
- Examples - Additional usage examples
- Error Codes - Troubleshooting guide
Function Reference
This page provides complete reference documentation for all functions provided by the pg_ai_query extension.
Main Functions
generate_query()
Generates SQL queries from natural language descriptions with automatic schema discovery.
Signature
generate_query(
natural_language_query text,
api_key text DEFAULT NULL,
provider text DEFAULT 'auto'
) RETURNS text
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| natural_language_query | text | ✓ | - | The natural language description of the query you want |
| api_key | text | ✗ | NULL | API key for AI provider (uses config if NULL) |
| provider | text | ✗ | 'auto' | AI provider to use: 'openai', 'anthropic', or 'auto' |
Returns
- Type: text
- Content: A valid PostgreSQL SQL query
Examples
-- Basic usage
SELECT generate_query('show all users');
-- With custom API key
SELECT generate_query(
'find users created in the last week',
'sk-your-api-key-here'
);
-- With specific provider
SELECT generate_query(
'calculate monthly revenue by product category',
NULL,
'openai'
);
-- Complex query
SELECT generate_query('show top 10 customers by total order value with their contact information');
Behavior
- Schema Discovery: Automatically analyzes your database schema to understand table structures and relationships
- Safety Limits: Always adds LIMIT clauses to SELECT queries (configurable)
- Query Validation: Validates generated queries for safety and correctness
- Error Handling: Returns descriptive error messages for invalid requests
Supported Query Types
| Type | Description | Examples |
|---|---|---|
| SELECT | Data retrieval with filtering, joins, aggregation | 'show users', 'count orders by status' |
| INSERT | Data insertion | 'insert a new user with name John' |
| UPDATE | Data modification | 'update user email where id is 5' |
| DELETE | Data removal | 'delete cancelled orders' |
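Because INSERT, UPDATE, and DELETE are supported, generated statements are worth reviewing before they run. A minimal review-then-execute loop using psycopg2 might look like the sketch below; the interactive confirmation step is an assumption, not part of the extension:
import psycopg2

def generate_and_confirm(conn, description):
    cur = conn.cursor()
    cur.execute("SELECT generate_query(%s)", (description,))
    query = cur.fetchone()[0]
    print("Generated query:\n", query)
    # SELECTs run directly; anything else waits for explicit confirmation
    if query.strip().upper().startswith("SELECT") or input("Execute? [y/N] ") == "y":
        cur.execute(query)
        conn.commit()
        return cur.rowcount
    return None

conn = psycopg2.connect("postgresql://user:pass@localhost/dbname")
generate_and_confirm(conn, "delete cancelled orders older than 90 days")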
explain_query()
Analyzes query performance using EXPLAIN ANALYZE and provides AI-powered optimization insights.
Signature
explain_query(
query_text text,
api_key text DEFAULT NULL,
provider text DEFAULT 'auto'
) RETURNS text
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query_text | text | ✓ | - | The SQL query to analyze |
| api_key | text | ✗ | NULL | API key for AI provider (uses config if NULL) |
| provider | text | ✗ | 'auto' | AI provider to use: 'openai', 'anthropic', or 'auto' |
Returns
- Type: text
- Content: Detailed performance analysis and optimization recommendations
Examples
-- Basic usage
SELECT explain_query('SELECT * FROM users WHERE active = true');
-- Complex query analysis
SELECT explain_query('
SELECT u.username, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.user_id
GROUP BY u.username
ORDER BY order_count DESC
LIMIT 100
');
-- With specific provider
SELECT explain_query(
'SELECT * FROM products WHERE price > 100',
'your-api-key',
'anthropic'
);
-- Integration with generate_query
WITH generated AS (
SELECT generate_query('find recent high-value orders') as query
)
SELECT explain_query((SELECT query FROM generated));
Output Structure
The function returns structured text analysis containing:
- Query Overview: Brief description of query purpose
- Performance Summary: Execution time, cost estimates, rows processed
- Execution Plan Analysis: Breakdown of PostgreSQL’s execution strategy
- Performance Issues: Identified bottlenecks and inefficiencies
- Optimization Suggestions: Specific recommendations for improvement
- Index Recommendations: Suggested indexes with CREATE statements
Supported Query Types
| Type | Description | Notes |
|---|---|---|
| SELECT | Data retrieval queries | Fully supported |
| WITH | Common Table Expressions | Fully supported |
| VALUES | Value lists | Fully supported |
Behavior
- Actual Execution: Uses EXPLAIN ANALYZE which executes the query
- Performance Metrics: Provides real execution times and row counts
- AI Analysis: Processes execution plan through AI for insights
- Security: Only allows read-only query types
- Error Handling: Graceful handling of invalid queries
Performance Considerations
- Query Execution: The function actually runs your query
- Analysis Time: AI processing adds 1-3 seconds
- Resource Usage: Same as running the query manually
- Best for: Development, testing, and performance optimization
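Because explain_query runs the statement via EXPLAIN ANALYZE, one way to bound its impact in shared environments is to cap statement time for the session before calling it. A sketch follows; the 30-second cap is an assumption, and note that it also covers the AI analysis step:
import psycopg2
from psycopg2 import errors

conn = psycopg2.connect("postgresql://user:pass@localhost/dbname")
cur = conn.cursor()
# Cap statement time so a pathological query cannot run unbounded
cur.execute("SET statement_timeout = '30s'")
try:
    cur.execute(
        "SELECT explain_query(%s)",
        ("SELECT * FROM orders WHERE created_at > NOW() - INTERVAL '30 days'",),
    )
    print(cur.fetchone()[0])
except errors.QueryCanceled:
    print("Analysis aborted: statement exceeded the 30 second cap")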
get_database_tables()
Returns metadata about all user tables in the database.
Signature
get_database_tables() RETURNS text
Parameters
None.
Returns
- Type: text (JSON format)
- Content: Array of table metadata objects
JSON Structure
[
{
"table_name": "users",
"schema_name": "public",
"table_type": "BASE TABLE",
"estimated_rows": 1500,
"table_size": "128 kB"
},
{
"table_name": "orders",
"schema_name": "public",
"table_type": "BASE TABLE",
"estimated_rows": 5000,
"table_size": "512 kB"
}
]
Example Usage
-- Get all tables
SELECT get_database_tables();
-- Pretty print with formatting
SELECT jsonb_pretty(get_database_tables()::jsonb);
-- Extract specific information
SELECT
table_name,
estimated_rows
FROM jsonb_to_recordset(get_database_tables()::jsonb)
AS x(table_name text, estimated_rows int);
get_table_details()
Returns detailed information about a specific table including columns, constraints, and relationships.
Signature
get_table_details(
table_name text,
schema_name text DEFAULT 'public'
) RETURNS text
Parameters
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| table_name | text | ✓ | - | Name of the table to analyze |
| schema_name | text | ✗ | 'public' | Schema containing the table |
Returns
- Type: text (JSON format)
- Content: Detailed table information object
JSON Structure
{
"table_name": "users",
"schema_name": "public",
"columns": [
{
"column_name": "id",
"data_type": "integer",
"is_nullable": false,
"column_default": "nextval('users_id_seq'::regclass)",
"is_primary_key": true
},
{
"column_name": "email",
"data_type": "character varying",
"character_maximum_length": 150,
"is_nullable": false,
"column_default": null,
"is_unique": true
}
],
"constraints": [
{
"constraint_name": "users_pkey",
"constraint_type": "PRIMARY KEY",
"column_names": ["id"]
},
{
"constraint_name": "users_email_key",
"constraint_type": "UNIQUE",
"column_names": ["email"]
}
],
"foreign_keys": [],
"indexes": [
{
"index_name": "users_pkey",
"columns": ["id"],
"is_unique": true,
"is_primary": true
}
]
}
Example Usage
-- Basic usage
SELECT get_table_details('users');
-- Specific schema
SELECT get_table_details('orders', 'sales');
-- Extract column information
SELECT
column_name,
data_type,
is_nullable
FROM jsonb_to_recordset(
(get_table_details('users')::jsonb)->'columns'
) AS x(column_name text, data_type text, is_nullable boolean);
-- Find all foreign keys
SELECT
constraint_name,
column_names
FROM jsonb_to_recordset(
(get_table_details('orders')::jsonb)->'foreign_keys'
) AS x(constraint_name text, column_names text[]);
Utility Functions
Schema Discovery Process
The extension uses these internal processes when analyzing your database:
- Table Discovery: Identifies all user tables (excludes system catalogs)
- Column Analysis: Examines data types, constraints, and nullable fields
- Relationship Mapping: Discovers foreign key relationships between tables
- Index Analysis: Identifies indexes that might optimize query performance
- Statistics Gathering: Collects row counts and table sizes for optimization
Internal Validation
All functions perform these validations:
- Permission Checks: Ensures access to only authorized tables
- Schema Validation: Verifies table and schema existence
- SQL Injection Prevention: Sanitizes all inputs
- Query Safety: Prevents generation of harmful operations
Error Codes and Messages
Common Error Conditions
| Error | Cause | Solution |
|---|---|---|
"API key not configured" | No valid API key found | Configure API key in ~/.pg_ai.config or pass as parameter |
"No tables found" | Database has no user tables | Create some tables or check permissions |
"Table does not exist" | Specified table not found | Check table name and schema |
"Query generation failed" | AI service error | Check API key, network connectivity, and service status |
"Invalid provider" | Unknown provider specified | Use ‘openai’, ‘anthropic’, or ‘auto’ |
Debugging Functions
-- Check extension version and status
SELECT extversion FROM pg_extension WHERE extname = 'pg_ai_query';
-- Verify function availability
\df generate_query
\df get_database_tables
\df get_table_details
-- Test basic functionality
SELECT generate_query('SELECT 1');
Performance Characteristics
Query Generation Performance
| Factor | Impact | Recommendations |
|---|---|---|
| Database Size | Minimal | Schema analysis is cached |
| Table Count | Low | Only user tables are analyzed |
| Query Complexity | Medium | More complex requests take longer |
| AI Model | High | GPT-4o is slower but more accurate than GPT-3.5 |
Memory Usage
- Schema Analysis: ~1MB per 100 tables
- Query Generation: ~5-10MB per request
- Caching: Schema information cached in session
Network Requirements
- API Requests: HTTPS to OpenAI/Anthropic APIs
- Bandwidth: ~1-10KB per query generation request
- Latency: Typically 1-5 seconds per request
Function Security
Access Control
- Functions execute with caller’s privileges
- No privilege escalation
- Respects PostgreSQL’s standard permission system
Data Protection
- Never sends actual data to AI providers
- Only schema metadata is transmitted
- Generated queries can be reviewed before execution
API Security
- API keys transmitted over HTTPS only
- Keys stored securely in configuration files
- No logging of API keys
Next Steps
- See Examples for practical usage patterns
- Review Best Practices for production usage
- Check API Reference for complete technical details
API Reference
This page provides complete technical reference for the pg_ai_query extension API.
Extension Information
| Property | Value |
|---|---|
| Name | pg_ai_query |
| Version | 1.0 |
| Schema | public |
| Dependencies | PostgreSQL 12+ |
SQL API
CREATE EXTENSION
CREATE EXTENSION [IF NOT EXISTS] pg_ai_query [WITH] [SCHEMA schema_name];
Creates the extension and installs all functions.
Example:
CREATE EXTENSION IF NOT EXISTS pg_ai_query;
DROP EXTENSION
DROP EXTENSION [IF EXISTS] pg_ai_query [CASCADE | RESTRICT];
Removes the extension and all its functions.
Example:
DROP EXTENSION IF EXISTS pg_ai_query CASCADE;
Core Functions API
generate_query()
Signature:
generate_query(
natural_language_query text,
api_key text DEFAULT NULL,
provider text DEFAULT 'auto'
) RETURNS text
Parameters:
| Parameter | Type | Constraints | Description |
|---|---|---|---|
| natural_language_query | text | NOT NULL, max 4096 chars | Natural language description of desired query |
| api_key | text | NULL allowed | API key for AI provider (optional if configured) |
| provider | text | 'openai', 'anthropic', 'auto' | AI provider selection |
Returns:
- Type: text
- Format: Valid PostgreSQL SQL query
- Constraints: Always includes LIMIT clause for SELECT statements
Exceptions:
EXTERNAL_ROUTINE_EXCEPTION: AI API communication failures
INVALID_PARAMETER_VALUE: Invalid provider or malformed input
CONFIGURATION_FILE_ERROR: Configuration issues
Example:
SELECT generate_query('show top 10 customers by revenue');
get_database_tables()
Signature:
get_database_tables() RETURNS text
Parameters: None
Returns:
- Type: text
- Format: JSON array
- Content: Table metadata objects
JSON Schema:
{
"type": "array",
"items": {
"type": "object",
"properties": {
"table_name": {"type": "string"},
"schema_name": {"type": "string"},
"table_type": {"type": "string"},
"estimated_rows": {"type": "integer"},
"table_size": {"type": "string"}
},
"required": ["table_name", "schema_name", "table_type"]
}
}
Exceptions:
EXTERNAL_ROUTINE_EXCEPTION: Database introspection failures
Example:
SELECT get_database_tables();
get_table_details()
Signature:
get_table_details(
table_name text,
schema_name text DEFAULT 'public'
) RETURNS text
Parameters:
| Parameter | Type | Constraints | Description |
|---|---|---|---|
| table_name | text | NOT NULL, must exist | Name of table to analyze |
| schema_name | text | Valid schema name | Schema containing the table |
Returns:
- Type: text
- Format: JSON object
- Content: Detailed table structure information
JSON Schema:
{
"type": "object",
"properties": {
"table_name": {"type": "string"},
"schema_name": {"type": "string"},
"columns": {
"type": "array",
"items": {
"type": "object",
"properties": {
"column_name": {"type": "string"},
"data_type": {"type": "string"},
"is_nullable": {"type": "boolean"},
"column_default": {"type": ["string", "null"]},
"character_maximum_length": {"type": ["integer", "null"]},
"numeric_precision": {"type": ["integer", "null"]},
"is_primary_key": {"type": "boolean"},
"is_unique": {"type": "boolean"}
}
}
},
"constraints": {"type": "array"},
"foreign_keys": {"type": "array"},
"indexes": {"type": "array"}
}
}
Exceptions:
INVALID_PARAMETER_VALUE: Table does not exist
EXTERNAL_ROUTINE_EXCEPTION: Database introspection failures
Example:
SELECT get_table_details('users', 'public');
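Since the return value is JSON text, application code can parse it directly. A short sketch (the connection string and table name are placeholders) that prints the column list for a table:
import json
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost/dbname")
cur = conn.cursor()
cur.execute("SELECT get_table_details(%s, %s)", ("users", "public"))
details = json.loads(cur.fetchone()[0])

# Print each column with its type and nullability
for column in details["columns"]:
    nullable = "NULL" if column["is_nullable"] else "NOT NULL"
    print(f"{column['column_name']}: {column['data_type']} {nullable}")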
Configuration API
Configuration File Location
The extension reads configuration from:
~/.pg_ai.config
Configuration Structure
[general]
# General settings
log_level = "INFO" | "DEBUG" | "WARNING" | "ERROR"
enable_logging = true | false
enable_postgresql_elog = true | false
request_timeout_ms = <integer>
max_retries = <integer>
[query]
# Query generation settings
enforce_limit = true | false
default_limit = <integer>
[openai]
# OpenAI provider settings
api_key = "<api_key_string>"
default_model = "gpt-4o" | "gpt-4" | "gpt-3.5-turbo"
[anthropic]
# Anthropic provider settings
api_key = "<api_key_string>"
default_model = "claude-3-5-sonnet-20241022"
Configuration Validation Rules
| Setting | Type | Range/Values | Default |
|---|---|---|---|
| log_level | string | DEBUG, INFO, WARNING, ERROR | INFO |
| enable_logging | boolean | true, false | false |
| enable_postgresql_elog | boolean | true, false | true |
| request_timeout_ms | integer | 1000-300000 | 30000 |
| max_retries | integer | 0-10 | 3 |
| enforce_limit | boolean | true, false | true |
| default_limit | integer | 1-1000000 | 1000 |
| api_key | string | Provider-specific format | "" |
| default_model | string | Provider-specific values | Provider default |
Error Codes
Extension Error Codes
| Code | Name | Description |
|---|---|---|
| 22023 | INVALID_PARAMETER_VALUE | Invalid function parameter |
| 38001 | CONTAINING_SQL_NOT_PERMITTED | Unsafe SQL operation attempted |
| 38003 | PROHIBITED_SQL_STATEMENT_ATTEMPTED | Forbidden query type |
| 39001 | EXTERNAL_ROUTINE_EXCEPTION | AI API communication error |
| 58030 | IO_ERROR | File system or network I/O error |
| XX000 | INTERNAL_ERROR | Unexpected internal error |
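Applications can branch on these SQLSTATE values through their database driver; with psycopg2 the code is available as pgcode on the raised exception. The retry/backoff policy below is an assumption, not extension behavior:
import time
import psycopg2

def generate_with_retry(conn, description, attempts=3):
    cur = conn.cursor()
    for attempt in range(attempts):
        try:
            cur.execute("SELECT generate_query(%s)", (description,))
            return cur.fetchone()[0]
        except psycopg2.Error as e:
            conn.rollback()  # clear the failed transaction
            if e.pgcode == '39001':   # EXTERNAL_ROUTINE_EXCEPTION: transient AI API failure
                time.sleep(2 ** attempt)
                continue
            if e.pgcode == '22023':   # INVALID_PARAMETER_VALUE: fix the input instead of retrying
                raise ValueError(f"Bad request: {e}") from e
            raise
    raise RuntimeError("AI provider unavailable after retries")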
AI Provider Error Codes
OpenAI Errors
| HTTP Code | Error Type | Description |
|---|---|---|
| 400 | invalid_request_error | Invalid request parameters |
| 401 | authentication_error | Invalid API key |
| 403 | permission_error | Insufficient permissions |
| 429 | rate_limit_error | Rate limit exceeded |
| 500 | api_error | OpenAI server error |
Anthropic Errors
| HTTP Code | Error Type | Description |
|---|---|---|
| 400 | invalid_request_error | Invalid request format |
| 401 | authentication_error | Invalid API key |
| 403 | permission_error | Access denied |
| 429 | rate_limit_error | Rate limit exceeded |
| 500 | api_error | Anthropic server error |
Internal APIs
Schema Discovery Engine
The extension uses these internal functions for schema analysis:
-- Internal function signatures (not callable by users)
_pg_ai_analyze_schema() RETURNS schema_info;
_pg_ai_get_table_relationships() RETURNS relationship_map;
_pg_ai_validate_query(query text) RETURNS boolean;
Query Generation Pipeline
- Input Validation: Validates natural language input
- Schema Discovery: Analyzes database structure
- Context Building: Creates AI prompt with schema context
- AI Request: Sends request to configured provider
- Response Parsing: Extracts SQL from AI response
- Query Validation: Validates generated SQL for safety
- Result Return: Returns validated SQL query
Security Validation
The extension performs these security checks:
- SQL Injection Prevention: Sanitizes all inputs
- System Table Protection: Blocks access to pg_* and information_schema
- DDL Restriction: Can generate DDL but with warnings
- Privilege Respect: Honors PostgreSQL permission system
Performance Characteristics
Memory Usage
| Operation | Memory Usage | Duration |
|---|---|---|
| Schema Analysis | ~1MB per 100 tables | 100-500ms |
| Query Generation | ~5-10MB per request | 1-5 seconds |
| Configuration Load | ~1KB | <10ms |
Network Requirements
| Provider | Endpoint | Protocol | Bandwidth |
|---|---|---|---|
| OpenAI | api.openai.com | HTTPS | 1-10KB per request |
| Anthropic | api.anthropic.com | HTTPS | 1-10KB per request |
Caching Behavior
- Schema Information: Cached per PostgreSQL session
- Configuration: Cached until session restart
- AI Responses: Not cached (each request is fresh)
Extension Metadata
System Catalog Integration
The extension registers these entries:
-- Extension registration
SELECT * FROM pg_extension WHERE extname = 'pg_ai_query';
-- Function registration
SELECT * FROM pg_proc WHERE proname LIKE '%generate%';
-- Dependencies
SELECT * FROM pg_depend WHERE refobjid IN (
SELECT oid FROM pg_extension WHERE extname = 'pg_ai_query'
);
Version Information
-- Check extension version
SELECT extversion FROM pg_extension WHERE extname = 'pg_ai_query';
-- Function definitions
\df+ generate_query
\df+ get_database_tables
\df+ get_table_details
Compatibility
PostgreSQL Versions
| PostgreSQL Version | Support Status | Notes |
|---|---|---|
| 12.x | ✅ Supported | Minimum required version |
| 13.x | ✅ Supported | Full compatibility |
| 14.x | ✅ Supported | Recommended |
| 15.x | ✅ Supported | Full compatibility |
| 16.x | ✅ Supported | Latest tested |
Operating Systems
| OS | Support Status | Notes |
|---|---|---|
| Linux (RHEL/CentOS) | ✅ Supported | Primary development platform |
| Linux (Ubuntu/Debian) | ✅ Supported | Well tested |
| macOS | ✅ Supported | Development and testing |
| Windows | ⚠️ Experimental | Limited testing |
AI Provider Compatibility
| Provider | API Version | Models Supported |
|---|---|---|
| OpenAI | v1 | GPT-4o, GPT-4, GPT-3.5-turbo |
| Anthropic | v1 | Claude 3.5 Sonnet |
This completes the technical API reference for the pg_ai_query extension.
Configuration File Reference
This page provides a complete reference for all configuration options available in the pg_ai_query extension configuration file.
Configuration File Location
The configuration file must be placed at:
~/.pg_ai.config
File Format
The configuration uses INI format with sections and key-value pairs:
# Comments start with # and are ignored
[section_name]
key = value
key_with_quotes = "value with spaces"
Complete Configuration Template
# PG AI Query Configuration File
# Place this file at ~/.pg_ai.config
[general]
# Logging configuration
log_level = "INFO"
enable_logging = false
enable_postgresql_elog = true
# Request configuration
request_timeout_ms = 30000
max_retries = 3
[query]
# Query generation behavior
enforce_limit = true
default_limit = 1000
[openai]
# OpenAI provider configuration
api_key = ""
default_model = "gpt-4o"
[anthropic]
# Anthropic provider configuration
api_key = ""
default_model = "claude-3-5-sonnet-20241022"
Configuration Sections
[general] Section
Controls general extension behavior and logging.
| Option | Type | Default | Range/Values | Description |
|---|---|---|---|---|
| log_level | string | "INFO" | "DEBUG", "INFO", "WARNING", "ERROR" | Minimum log level for messages |
| enable_logging | boolean | false | true, false | Enable/disable all logging output |
| enable_postgresql_elog | boolean | true | true, false | Use PostgreSQL's elog system for logging |
| request_timeout_ms | integer | 30000 | 1000-300000 | Timeout for AI API requests in milliseconds |
| max_retries | integer | 3 | 0-10 | Maximum retry attempts for failed requests |
log_level
Controls the verbosity of log messages.
Values:
"DEBUG": Most verbose, includes all internal operations"INFO": General information about operations"WARNING": Warnings about potential issues"ERROR": Only error messages
Example:
[general]
log_level = "DEBUG" # Show all log messages
enable_logging
Master switch for all logging output.
Values:
true: Enable logging (respects log_level)
false: Disable all logging output
Example:
[general]
enable_logging = true # Turn on logging
log_level = "INFO" # Show INFO level and above
enable_postgresql_elog
Controls whether to use PostgreSQL’s internal logging system.
Values:
true: Use PostgreSQL's elog system (recommended)
false: Use alternative logging method
Example:
[general]
enable_postgresql_elog = true # Use PostgreSQL logging
request_timeout_ms
Timeout for AI API requests in milliseconds.
Range: 1000-300000 (1 second to 5 minutes)
Recommended: 30000-60000 for most use cases
Example:
[general]
request_timeout_ms = 45000 # 45 second timeout
max_retries
Maximum number of retry attempts for failed API requests.
Range: 0-10
Recommended: 3-5 for production use
Example:
[general]
max_retries = 5 # Retry up to 5 times
[query] Section
Controls query generation behavior and safety features.
| Option | Type | Default | Range/Values | Description |
|---|---|---|---|---|
| enforce_limit | boolean | true | true, false | Always add LIMIT clause to SELECT queries |
| default_limit | integer | 1000 | 1-1000000 | Default row limit when none specified |
enforce_limit
Controls whether SELECT queries automatically include LIMIT clauses.
Values:
true: Always add LIMIT to SELECT queries (recommended for safety)
false: Allow unlimited SELECT queries
Example:
[query]
enforce_limit = true # Always limit query results
default_limit = 500 # Default to 500 rows
default_limit
Default number of rows to limit when no explicit limit is requested.
Range: 1-1000000
Recommended: 100-5000 depending on use case
Example:
[query]
default_limit = 2000 # Default to 2000 rows
[openai] Section
Configuration for OpenAI provider.
| Option | Type | Default | Values | Description |
|---|---|---|---|---|
| api_key | string | "" | API key format | Your OpenAI API key |
| default_model | string | "gpt-4o" | Model names | Default OpenAI model to use |
api_key
Your OpenAI API key from platform.openai.com.
Format: Must start with sk- or sk-proj-
Security: Keep this key secure and never commit to version control
Example:
[openai]
api_key = "sk-proj-abc123..." # Your actual API key
default_model
Default OpenAI model to use for query generation.
Available Models:
"gpt-4o": Latest GPT-4 Omni model (recommended)"gpt-4": Standard GPT-4 model"gpt-3.5-turbo": Fast and economical model
Example:
[openai]
default_model = "gpt-4o" # Use latest model
[anthropic] Section
Configuration for Anthropic (Claude) provider.
| Option | Type | Default | Values | Description |
|---|---|---|---|---|
| api_key | string | "" | API key format | Your Anthropic API key |
| default_model | string | "claude-3-5-sonnet-20241022" | Model names | Default Claude model to use |
api_key
Your Anthropic API key from console.anthropic.com.
Format: Must start with sk-ant-
Security: Keep this key secure and never commit to version control
Example:
[anthropic]
api_key = "sk-ant-abc123..." # Your actual API key
default_model
Default Anthropic model to use for query generation.
Available Models:
"claude-3-5-sonnet-20241022": Latest Claude 3.5 Sonnet model
Example:
[anthropic]
default_model = "claude-3-5-sonnet-20241022" # Use Claude 3.5 Sonnet
Configuration Examples
Development Configuration
# Development setup with detailed logging
[general]
log_level = "DEBUG"
enable_logging = true
enable_postgresql_elog = true
request_timeout_ms = 60000 # Longer timeout for debugging
max_retries = 2 # Fewer retries for faster feedback
[query]
enforce_limit = true
default_limit = 100 # Small limit for testing
[openai]
api_key = "sk-proj-dev-key-here"
default_model = "gpt-3.5-turbo" # Cheaper for development
Production Configuration
# Production setup optimized for performance and reliability
[general]
log_level = "WARNING"
enable_logging = false # Disable for performance
enable_postgresql_elog = true
request_timeout_ms = 30000
max_retries = 5 # More retries for reliability
[query]
enforce_limit = true
default_limit = 1000
[openai]
api_key = "sk-proj-prod-key-here"
default_model = "gpt-4" # Good balance of quality and cost
Multi-Provider Configuration
# Setup with both providers for redundancy
[general]
log_level = "INFO"
enable_logging = true
request_timeout_ms = 45000
max_retries = 3
[query]
enforce_limit = true
default_limit = 500
[openai]
api_key = "sk-proj-openai-key-here"
default_model = "gpt-4o"
[anthropic]
api_key = "sk-ant-anthropic-key-here"
default_model = "claude-3-5-sonnet-20241022"
# Extension will use OpenAI first (first configured)
# Falls back to Anthropic if OpenAI fails
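The same fallback idea can be made explicit in application code by passing the provider argument. A sketch follows; the provider order and the decision to retry on any database error are assumptions:
import psycopg2

def generate_with_fallback(conn, description, providers=('openai', 'anthropic')):
    cur = conn.cursor()
    last_error = None
    for provider in providers:
        try:
            cur.execute("SELECT generate_query(%s, NULL, %s)", (description, provider))
            return provider, cur.fetchone()[0]
        except psycopg2.Error as error:
            conn.rollback()  # clear the aborted transaction before trying the next provider
            last_error = error
    raise last_error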
High-Performance Configuration
# Optimized for speed and low cost
[general]
log_level = "ERROR" # Minimal logging
enable_logging = false
request_timeout_ms = 15000 # Shorter timeout
max_retries = 2 # Fewer retries
[query]
enforce_limit = true
default_limit = 200 # Smaller default limit
[openai]
api_key = "sk-proj-key-here"
default_model = "gpt-3.5-turbo" # Fastest model
Configuration Validation
File Format Validation
The extension validates:
- INI Format: Proper section headers and key-value pairs
- Required Quotes: String values with spaces must be quoted
- Section Names: Only valid section names are accepted
- Key Names: Only valid configuration keys are recognized
Value Validation
Each configuration value is validated:
- Type Checking: Strings, integers, and booleans are validated
- Range Checking: Numeric values must be within acceptable ranges
- Format Checking: API keys must match expected formats
- Model Validation: Model names must be supported
Error Handling
Configuration errors are reported with specific messages:
Configuration Error: Invalid log_level 'VERBOSE' in section [general]. Valid values are: DEBUG, INFO, WARNING, ERROR
Configuration Error: request_timeout_ms must be between 1000 and 300000, got 500000
Configuration Error: OpenAI API key must start with 'sk-' or 'sk-proj-'
Configuration File Only
Currently, only configuration files are supported. All settings must be specified in ~/.pg_ai.config.
Security Considerations
File Permissions
Set secure permissions on the configuration file:
# Make file readable only by owner
chmod 600 ~/.pg_ai.config
# Verify permissions
ls -la ~/.pg_ai.config
# Should show: -rw------- 1 user user ...
API Key Security
- Never commit configuration files with API keys to version control
- Use secure configuration files in containerized or CI/CD environments
- Rotate keys regularly as per your organization’s security policy
- Monitor usage through your AI provider’s dashboard
Configuration Templates
Use template files for version control:
# .pg_ai.config.template
[general]
log_level = "INFO"
enable_logging = false
[openai]
api_key = "${OPENAI_API_KEY}" # Replace with actual key
default_model = "gpt-4o"
[anthropic]
api_key = "${ANTHROPIC_API_KEY}" # Replace with actual key
Configuration Loading Order
The extension loads configuration in this order:
- Default values (hardcoded)
- Configuration file (~/.pg_ai.config)
This provides a simple and secure configuration approach.
Error Codes and Messages
This page provides a comprehensive reference for all error codes and messages that you might encounter while using the pg_ai_query extension.
Function-Specific Errors
generate_query Errors
| Error Message | Cause | Solution |
|---|---|---|
"API key required. Pass as parameter or configure ~/.pg_ai.config" | No API key provided and none configured | Add API key to config file or pass as parameter |
"No API key available for [provider] provider" | API key missing for specific provider | Configure API key for the requested provider |
"AI API error: [details]" | AI service returned an error | Check API key validity and service status |
"Natural language query cannot be empty" | Empty input provided | Provide a non-empty query description |
"Query generation failed: [details]" | AI failed to generate query | Check your description clarity and try again |
explain_query Errors
| Error Message | Cause | Solution |
|---|---|---|
"Query text cannot be empty" | Empty query provided | Provide a valid SQL query |
"Failed to prepare EXPLAIN query: syntax error" | Invalid SQL syntax | Fix SQL syntax errors |
"Failed to execute EXPLAIN query" | Query execution failed | Check query validity and permissions |
"Failed to connect to SPI" | Database connection issue | Check database connectivity |
"No output from EXPLAIN query" | EXPLAIN returned no results | Verify query syntax and structure |
get_database_tables Errors
| Error Message | Cause | Solution |
|---|---|---|
"Failed to connect to SPI" | Database connection issue | Check database connectivity |
"Failed to execute query" | Database query failed | Check permissions and database status |
"No tables found" | No user tables in database | Create tables or check schema permissions |
get_table_details Errors
| Error Message | Cause | Solution |
|---|---|---|
"Table '[table_name]' does not exist" | Specified table not found | Check table name and schema |
"Failed to get table details: [details]" | Error retrieving table info | Check permissions and table existence |
Debugging Commands
-- Check extension status
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_ai_query';
-- Test basic functionality
SELECT generate_query('SELECT 1');
-- Check function permissions
\df generate_query
See Also
Troubleshooting Guide
This guide helps you diagnose and fix common issues with the pg_ai_query extension.
Installation Issues
Extension Not Found
Error: extension "pg_ai_query" is not available
Causes & Solutions:
- Extension not installed
# Check if extension files are present
ls /usr/share/postgresql/*/extension/pg_ai_query*
# If missing, reinstall
cd pg_ai_query
sudo make install
- Wrong PostgreSQL version
# Check PostgreSQL version
psql -c "SELECT version();"
# Ensure extension was built for this version
pg_config --version
- Permission issues
# Check file permissions
ls -la /usr/share/postgresql/*/extension/pg_ai_query*
# Fix if needed
sudo chmod 644 /usr/share/postgresql/*/extension/pg_ai_query*
Compilation Errors
Error: Various C++ compilation errors
Common Causes:
- Missing dependencies
# Ubuntu/Debian
sudo apt-get install postgresql-server-dev-all cmake gcc g++ libssl-dev
# CentOS/RHEL
sudo yum install postgresql-devel cmake gcc-c++ openssl-devel
- Wrong PostgreSQL headers
# Check pg_config path
which pg_config
pg_config --includedir
# Use specific version if needed
PG_CONFIG=/usr/pgsql-14/bin/pg_config make
- C++ standard issues
# Ensure C++20 support
gcc --version # Should be 8.0+
Configuration Issues
API Key Problems
Error: "API key not configured" or "Invalid API key"
Diagnosis:
-- Enable logging to see configuration loading
-- In ~/.pg_ai.config:
[general]
enable_logging = true
log_level = "DEBUG"
Solutions:
- Check configuration file exists
ls -la ~/.pg_ai.config
cat ~/.pg_ai.config
- Verify API key format
[openai]
api_key = "sk-proj-abc123..." # Must start with sk- for OpenAI
[anthropic]
api_key = "sk-ant-abc123..." # Must start with sk-ant- for Anthropic
- Test API key directly
# Test OpenAI key
curl -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"gpt-3.5-turbo","messages":[{"role":"user","content":"test"}]}' \
  https://api.openai.com/v1/chat/completions
- Check file permissions
# Config file should be readable by the PostgreSQL user
chmod 644 ~/.pg_ai.config
Configuration Not Loading
Error: Configuration seems ignored
Solutions:
- Restart PostgreSQL session
-- Disconnect and reconnect to reload configuration
\q
psql -d your_database
- Check file syntax
# Verify the INI format is correct
# String values must be quoted
api_key = "your-key-here"   # Correct
api_key=your-key-here       # Wrong (missing quotes)
- Verify home directory
echo $HOME
# Config should be at $HOME/.pg_ai.config
Runtime Issues
No Tables Found
Error: "No tables found" or empty results
Diagnosis:
-- Check what tables exist
SELECT get_database_tables();
-- Check table permissions
SELECT schemaname, tablename, tableowner
FROM pg_tables
WHERE schemaname NOT IN ('information_schema', 'pg_catalog');
Solutions:
- Create some tables
-- Extension only works with user tables
CREATE TABLE test_table (id SERIAL, name TEXT);
- Check schema focus
-- Extension looks at 'public' schema by default
SET search_path TO public, your_schema;
Query Generation Failures
Error: "Query generation failed" or timeouts
Common Causes & Solutions:
- Network connectivity
# Test connection to AI providers
curl -I https://api.openai.com/v1/models
curl -I https://api.anthropic.com/v1/messages
- API rate limiting
# Increase timeout in config
[general]
request_timeout_ms = 60000 # 60 seconds
max_retries = 5
- Complex database schema
-- For large schemas, try with simpler requests first
SELECT generate_query('SELECT 1'); -- Test basic functionality
- Model-specific issues
-- Try different models
SELECT generate_query('show tables', null, 'openai');
SELECT generate_query('show tables', null, 'anthropic');
Poor Query Quality
Issue: Generated queries are incorrect or suboptimal
Solutions:
- Be more specific in requests
-- Instead of: "show data"
-- Use: "show user names and email addresses from users table"
- Check schema information
-- Ensure tables have good names and structure
SELECT get_table_details('your_table');
- Try different models
-- GPT-4 is generally more accurate than GPT-3.5
SELECT generate_query('complex query here', null, 'openai');
- Break complex requests into parts
-- Instead of one complex request, try multiple simpler ones
SELECT generate_query('show users');
SELECT generate_query('show orders for user id 5');
Performance Issues
Slow Query Generation
Issue: Extension takes a long time to respond
Diagnosis:
-- Enable detailed logging
-- In ~/.pg_ai.config:
[general]
enable_logging = true
log_level = "DEBUG"
Solutions:
- Optimize database schema
-- Reduce number of tables if possible
-- Use clear table and column names
-- Add table comments for context
COMMENT ON TABLE users IS 'Customer information and contact details';
- Adjust timeout settings
[general]
request_timeout_ms = 45000 # Increase from default 30000
- Use faster models
[openai]
default_model = "gpt-3.5-turbo" # Faster than gpt-4
High Memory Usage
Issue: PostgreSQL memory usage increases significantly
Solutions:
- Restart PostgreSQL sessions periodically
# Schema information is cached per session
# Restart to clear cache
- Reduce concurrent users of the extension
- Monitor extension usage
-- Track function calls
SELECT * FROM pg_stat_user_functions WHERE funcname LIKE '%generate%';
Debugging Tools
Enable Debug Logging
# ~/.pg_ai.config
[general]
enable_logging = true
log_level = "DEBUG"
enable_postgresql_elog = true
Check Extension Status
-- Verify extension is loaded
SELECT extname, extversion FROM pg_extension WHERE extname = 'pg_ai_query';
-- Check function availability
\df *generate*
\df *database_tables*
\df *table_details*
-- Test basic functionality
SELECT generate_query('SELECT 1 as test');
Monitor System Resources
# Check PostgreSQL logs
tail -f /var/log/postgresql/postgresql-*.log
# Monitor network connections
netstat -an | grep :5432
# Check memory usage
ps aux | grep postgres
Test Configuration
-- Test API connectivity
SELECT generate_query('test query', 'your-test-key', 'openai');
-- Test schema discovery
SELECT get_database_tables();
SELECT get_table_details('your_table');
Common Error Messages
| Error Message | Likely Cause | Solution |
|---|---|---|
extension "pg_ai_query" is not available | Extension not installed | Run make install |
function generate_query does not exist | Extension not created in DB | Run CREATE EXTENSION pg_ai_query |
API key not configured | Missing or invalid API key | Check ~/.pg_ai.config file |
No tables found | No user tables in database | Create some tables or check permissions |
Query generation failed: timeout | Network or API issues | Check connectivity and increase timeout |
Invalid provider: xyz | Wrong provider name | Use ‘openai’, ‘anthropic’, or ‘auto’ |
Table does not exist: xyz | Table name incorrect | Check table name and schema |
Getting Additional Help
Log Analysis
When reporting issues, include:
- Extension version
SELECT extversion FROM pg_extension WHERE extname = 'pg_ai_query';
- PostgreSQL version
SELECT version();
- Error logs with debug enabled
- Configuration file (with API keys redacted)
Testing Isolation
To isolate problems:
- Test with minimal configuration
- Use simple test tables
- Try different AI providers
- Test basic functions first
Community Resources
- GitHub Issues: Report bugs and feature requests
- Documentation: Check for updated troubleshooting info
- PostgreSQL Community: For database-specific issues
Prevention Best Practices
- Regular Updates: Keep extension and dependencies updated
- Monitoring: Set up logging and monitoring for production use
- Testing: Test configuration changes in development first
- Backups: Always backup before major updates
- Documentation: Document your specific configuration and customizations
Frequently Asked Questions (FAQ)
This page answers common questions about the pg_ai_query extension.
General Questions
What is pg_ai_query?
pg_ai_query is a PostgreSQL extension that converts natural language descriptions into SQL queries using AI models from OpenAI and Anthropic. It integrates directly with your PostgreSQL database to provide intelligent query generation.
How does it work?
- You provide a natural language description of what you want to query
- The extension analyzes your database schema automatically
- It sends the schema context and your request to an AI provider
- The AI generates a PostgreSQL-compatible SQL query
- The extension returns the SQL query for you to execute
Is my data sent to AI providers?
No. Only your database schema metadata (table names, column names, relationships) is sent to AI providers, never your actual data. The extension is designed with privacy in mind.
Installation and Setup
What PostgreSQL versions are supported?
PostgreSQL 12 and later are supported. We recommend PostgreSQL 14 or later for the best experience.
Do I need to install anything besides PostgreSQL?
You need:
- PostgreSQL with development headers
- C++ compiler with C++20 support
- CMake for building
- An API key from OpenAI or Anthropic
Can I use this with cloud PostgreSQL services?
It depends. The extension needs to be compiled and installed on the PostgreSQL server. This works with:
- Self-hosted PostgreSQL
- VPS or cloud instances where you have admin access
- Some managed services that support custom extensions
It won’t work with fully managed services like AWS RDS, Google Cloud SQL, or Azure Database for PostgreSQL that don’t allow custom extensions.
How do I get an API key?
For OpenAI:
- Visit platform.openai.com
- Create an account and add billing information
- Go to API Keys section and create a new key
For Anthropic:
- Visit console.anthropic.com
- Create an account and add credits
- Go to API Keys section and create a new key
Usage Questions
Why does it say “No tables found”?
This happens when:
- Your database has no user-created tables (only system tables)
- The extension user doesn’t have permission to access tables
- You’re in the wrong schema
Solution: Create some tables or check permissions:
-- Check if you have tables
SELECT table_name FROM information_schema.tables
WHERE table_schema = 'public' AND table_type = 'BASE TABLE';
-- Check permissions
SELECT has_table_privilege('your_table', 'SELECT');
Can I use this for INSERT, UPDATE, or DELETE queries?
Yes, but with caution. The extension can generate all types of SQL queries, but:
- Always review generated queries before executing
- Consider using a read-only database user for safety
- Be especially careful with data modification queries
How accurate are the generated queries?
Accuracy depends on several factors:
- Schema quality: Well-named tables and columns improve results
- AI model: GPT-4 and Claude generally produce better results than GPT-3.5
- Query complexity: Simple queries are more accurate than complex ones
- Request clarity: Specific requests generate better queries
Can I customize the AI model used?
Yes. You can specify the model in your configuration:
[openai]
default_model = "gpt-4o" # or "gpt-4", "gpt-3.5-turbo"
[anthropic]
default_model = "claude-3-5-sonnet-20241022"
You can also specify the provider per query:
SELECT generate_query('show users', null, 'openai');
SELECT generate_query('show users', null, 'anthropic');
Performance Questions
How fast is query generation?
Typical response times:
- GPT-3.5-turbo: 1-2 seconds
- GPT-4/GPT-4o: 2-5 seconds
- Claude 3.5 Sonnet: 3-6 seconds
Performance depends on:
- AI provider response time
- Database schema complexity
- Network latency
Does it cache results?
Schema information is cached per PostgreSQL session, but AI responses are not cached. Each query generation makes a fresh API request.
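If the same natural language requests recur, an application can cache the generated SQL itself, keyed on the description. A minimal in-process sketch (cache size and connection string are placeholders):
from functools import lru_cache
import psycopg2

conn = psycopg2.connect("postgresql://user:pass@localhost/dbname")

@lru_cache(maxsize=256)
def cached_generate(description):
    cur = conn.cursor()
    cur.execute("SELECT generate_query(%s)", (description,))
    return cur.fetchone()[0]

# Only the first call pays for an AI request; the repeat is served from the cache
print(cached_generate("count orders by status"))
print(cached_generate("count orders by status"))
Cached SQL goes stale when the schema changes, so keep such a cache short-lived or clear it after migrations.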
How much does it cost to use?
Costs depend on your AI provider and usage:
- GPT-3.5-turbo: ~$0.001-0.005 per query
- GPT-4: ~$0.10-0.20 per query
- GPT-4o: ~$0.02-0.05 per query
- Claude 3.5 Sonnet: ~$0.03-0.06 per query
Actual costs vary based on schema complexity and query length.
Configuration Questions
Where should I put the configuration file?
Place it at ~/.pg_ai.config in the home directory of the user running PostgreSQL (often the postgres user).
Can I use environment variables instead of a config file?
Currently, only configuration files are supported. Place your configuration at ~/.pg_ai.config.
How do I enable logging for debugging?
[general]
enable_logging = true
log_level = "DEBUG"
Then restart your PostgreSQL session and check the logs.
Can I configure multiple providers?
Yes. Configure both OpenAI and Anthropic:
[openai]
api_key = "sk-proj-openai-key"
default_model = "gpt-4o"
[anthropic]
api_key = "sk-ant-anthropic-key"
default_model = "claude-3-5-sonnet-20241022"
The extension will use the first configured provider as default.
Troubleshooting
“Extension not found” error
Make sure the extension is properly installed:
# Check if extension files exist
ls /usr/share/postgresql/*/extension/pg_ai_query*
# Reinstall if needed
cd pg_ai_query
sudo make install
“API key not configured” error
Check your configuration:
# Verify config file exists and has correct content
cat ~/.pg_ai.config
# Check file permissions
ls -la ~/.pg_ai.config # Should be readable by PostgreSQL user
“Query generation failed” errors
This usually indicates:
- Network connectivity issues
- Invalid API key
- AI provider service problems
- Rate limiting
Solutions:
- Check your internet connection
- Verify API key is correct and active
- Try a different provider
- Wait a few minutes and retry
Generated queries are wrong
Tips for better results:
- Be more specific in your requests
- Use proper table and column names
- Add table comments to provide context
- Try different AI models
- Break complex requests into simpler parts
Extension crashes PostgreSQL
This shouldn’t happen in normal operation. If it does:
- Check PostgreSQL logs for error messages
- Ensure you have the latest version of the extension
- Report the issue with reproduction steps
Security Questions
Is it safe to use in production?
Yes, with proper precautions:
- Use a read-only database user for query generation
- Always review generated queries before execution
- Monitor API usage and costs
- Keep API keys secure
- Enable logging for audit trails
What data is sent to AI providers?
Only database metadata:
- Table names and schemas
- Column names and data types
- Constraint definitions
- Relationship information
Never sent:
- Actual data content
- Database credentials
- User information
Can it generate harmful queries?
The extension includes safety features:
- Automatic LIMIT clauses on SELECT statements
- Blocks access to system tables
- Validates query structure
However, always review generated queries, especially for data modification operations.
How do I secure API keys?
- Set file permissions: chmod 600 ~/.pg_ai.config
- Never commit keys to version control
- Use secure configuration files in production
- Rotate keys regularly
- Monitor usage through provider dashboards
Business Questions
Can I use this commercially?
Yes. The extension itself is open source, but you need to comply with:
- Your AI provider’s terms of service
- PostgreSQL’s license
- Your organization’s data policies
What about compliance (GDPR, HIPAA, etc.)?
Consider these factors:
- Schema metadata may contain sensitive information
- AI providers have different compliance certifications
- You may need data processing agreements with AI providers
- Consider on-premises AI models for sensitive environments
Is there support available?
- Community Support: GitHub issues and discussions
- Documentation: Comprehensive docs with examples
- Enterprise Support: May be available for commercial deployments
Advanced Usage
Can I extend the extension?
The extension is open source and designed to be extensible. You can:
- Add support for new AI providers
- Customize query generation logic
- Add new schema analysis features
- Contribute improvements back to the project
Can I use local AI models?
Currently, only cloud-based OpenAI and Anthropic models are supported. Local model support is being considered for future versions.
How do I optimize for my specific use case?
- Choose appropriate AI models for your needs
- Design your schema with clear, descriptive names
- Add comments to tables and columns
- Create views for complex business logic
- Monitor usage and costs
- Train your users on effective natural language queries
Can I integrate this with business intelligence tools?
Yes. Many BI tools can call PostgreSQL functions, so you can integrate query generation into dashboards and reports. Consider creating wrapper functions for specific business use cases.
Don’t see your question here? Check the Troubleshooting Guide or open an issue on the GitHub repository.