Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Basic Usage

This guide covers the fundamental ways to use pg_ai_query for generating SQL queries from natural language. Whether you’re exploring data, creating reports, or learning SQL, this guide will help you get the most out of the extension.

Core Function: generate_query()

The main function you’ll use is generate_query(), which converts natural language into SQL:

generate_query(natural_language_query, [api_key], [provider])

Parameters

ParameterTypeRequiredDefaultDescription
natural_language_querytextYes-Your question in plain English
api_keytextNonullAPI key (uses config if not provided)
providertextNo‘auto’AI provider: ‘openai’, ‘anthropic’, or ‘auto’

Basic Examples

-- Simplest form - uses configured API key and provider
SELECT generate_query('show me all users');

-- With custom API key
SELECT generate_query('count total orders', 'your-api-key-here');

-- With specific provider
SELECT generate_query('find recent customers', null, 'openai');

Query Types

1. Simple Data Retrieval

Goal: Get basic data from tables

-- Show all records
SELECT generate_query('show me all products');
SELECT generate_query('list all customers');
SELECT generate_query('display user information');

-- Show specific columns
SELECT generate_query('show user names and emails');
SELECT generate_query('get product names and prices');

2. Filtered Queries

Goal: Find specific records based on conditions

-- Simple filters
SELECT generate_query('find users older than 25');
SELECT generate_query('show completed orders');
SELECT generate_query('get products under $100');

-- Multiple conditions
SELECT generate_query('find users from New York who are older than 30');
SELECT generate_query('show pending orders from the last month');

-- Date-based filters
SELECT generate_query('show orders from yesterday');
SELECT generate_query('find users created in 2024');
SELECT generate_query('get sales from last quarter');

3. Aggregation and Analysis

Goal: Calculate totals, averages, counts, and other statistics

-- Simple counts
SELECT generate_query('count total users');
SELECT generate_query('how many orders do we have');

-- Sums and totals
SELECT generate_query('calculate total revenue');
SELECT generate_query('sum all order amounts');

-- Averages
SELECT generate_query('average order value');
SELECT generate_query('mean user age');

-- Group by analysis
SELECT generate_query('count orders by status');
SELECT generate_query('total sales by month');
SELECT generate_query('average order value by customer');

4. Joined Data

Goal: Combine information from multiple tables

-- Simple joins
SELECT generate_query('show orders with customer names');
SELECT generate_query('list products with category information');

-- Complex relationships
SELECT generate_query('show customers with their total order amounts');
SELECT generate_query('find users who have never placed an order');
SELECT generate_query('list top customers by number of orders');

5. Ranking and Sorting

Goal: Order results and find top/bottom records

-- Top N queries
SELECT generate_query('top 10 customers by revenue');
SELECT generate_query('5 most expensive products');
SELECT generate_query('latest 20 orders');

-- Sorting
SELECT generate_query('show users ordered by age');
SELECT generate_query('list products by price descending');

Best Practices for Natural Language

Be Specific About What You Want

Good:

SELECT generate_query('show user names and email addresses for active users');

Less Good:

SELECT generate_query('show users');

Use Clear Column References

Good:

SELECT generate_query('find orders with amount greater than 1000');

Better:

SELECT generate_query('find orders where order_amount > 1000');

Specify Time Periods Clearly

Good:

SELECT generate_query('show orders from the last 30 days');
SELECT generate_query('find users created between January 1, 2024 and March 31, 2024');

Use Business Terms

The AI understands common business terminology:

-- These work well
SELECT generate_query('show quarterly revenue');
SELECT generate_query('find churned customers');
SELECT generate_query('calculate customer lifetime value');
SELECT generate_query('show conversion rates by month');

Advanced Usage Patterns

1. Incremental Query Building

Start simple and add complexity:

-- Step 1: Basic query
SELECT generate_query('show orders');

-- Step 2: Add filters
SELECT generate_query('show completed orders');

-- Step 3: Add aggregation
SELECT generate_query('count completed orders by customer');

-- Step 4: Add sorting
SELECT generate_query('count completed orders by customer, ordered by count desc');

2. Data Exploration Workflow

-- 1. Understand data structure
SELECT get_database_tables();

-- 2. Explore individual tables
SELECT generate_query('show sample data from users table');
SELECT generate_query('describe the structure of orders table');

-- 3. Understand relationships
SELECT generate_query('show how users and orders are connected');

-- 4. Analyze patterns
SELECT generate_query('show distribution of users by city');
SELECT generate_query('analyze order patterns by day of week');

3. Report Generation

-- Daily report
SELECT generate_query('daily sales summary for today');

-- Customer analysis
SELECT generate_query('customer segmentation by order frequency');

-- Product performance
SELECT generate_query('product performance metrics for last month');

Working with Results

1. Copying and Executing Queries

-- Method 1: Copy-paste the result
SELECT generate_query('show top customers');
-- Copy the returned SQL and execute it separately

-- Method 2: Store in a variable (if your client supports it)
\set query `SELECT generate_query('show top customers')`
:query

2. Query Refinement

If the generated query isn’t quite right:

-- Try rephrasing
SELECT generate_query('show customers with highest total orders');
-- vs
SELECT generate_query('rank customers by total order value');

-- Be more specific
SELECT generate_query('show customers with more than 5 orders and total value over $1000');

3. Validation

Always review generated queries before using in production:

-- Check the generated query
SELECT generate_query('delete old records');
-- Review before executing - the extension won't generate dangerous queries,
-- but always verify the logic matches your intent

Error Handling

Common Issues and Solutions

“No tables found”

-- Check your database has user tables
SELECT get_database_tables();

“Query too complex”

-- Break down into simpler parts
-- Instead of: "show customers with orders and products and categories"
-- Try: "show customers with their order details"

“Ambiguous column reference”

-- Be more specific about table/column names
-- Instead of: "show orders with customer info"
-- Try: "show orders with customer names from users table"

Performance Considerations

Query Limits

All queries automatically include LIMIT clauses for safety:

-- This automatically gets LIMIT 1000 (configurable)
SELECT generate_query('show all orders');

Large Datasets

For large tables, be specific about what you need:

-- Instead of: "show all user data"
-- Use: "show user summary with names and creation dates"

Indexes

Generated queries work best with proper indexes:

-- Create indexes on commonly filtered columns
CREATE INDEX idx_users_city ON users(city);
CREATE INDEX idx_orders_date ON orders(order_date);
CREATE INDEX idx_orders_status ON orders(status);

Integration Patterns

1. Application Development

# Python example
cursor.execute("SELECT generate_query(%s)", ['show recent user activity'])
query = cursor.fetchone()[0]
cursor.execute(query)
results = cursor.fetchall()

2. Business Intelligence Tools

Many BI tools can use the extension to generate queries dynamically based on user input.

3. Data Documentation

Use the extension to generate example queries for documentation:

-- Generate examples for each table
SELECT generate_query('show sample data from ' || table_name)
FROM information_schema.tables
WHERE table_schema = 'public';

Next Steps