Non-technical stakeholders have questions that require data. Getting those answers usually means a ticket, a wait, and a query written by someone else. A natural language to SQL tool cuts that loop short. You describe a table, ask a question in plain English, and get back SQL you can validate and run.
This project builds that tool in a Colab notebook using the OpenAI API. Note: OpenAI requires a paid account with API credits. For a dataset this size the cost is negligible — a few cents — but it is not free. If you want a free alternative, see the Groq-based projects in this series.
Open a new Colab notebook. Install the dependency and set up your API key using Colab Secrets — never paste credentials into a cell:
!pip install openai
# In Colab: open the Secrets panel (🔑 icon, left sidebar)
# Add secret name: OPENAI_API_KEY — paste your key — enable notebook access
from google.colab import userdata
from openai import OpenAI
client = OpenAI(api_key=userdata.get('OPENAI_API_KEY'))
The key to reliable output is giving the model a precise schema description. The more explicit you are about table names, column names, and types, the more accurate the generated SQL will be. Build a schema string and a generation function:
schema = """
Table: orders
Columns:
order_id INTEGER -- unique order identifier
customer_id INTEGER -- foreign key to customers table
product TEXT -- product name
quantity INTEGER -- units ordered
revenue FLOAT -- total revenue for this order
order_date DATE -- date the order was placed
region TEXT -- sales region (North, South, East, West)
"""
def generate_sql(question, schema):
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": f"""Given this database schema:
{schema}
Write a SQL query to answer this question: {question}
Return only the SQL, no explanation."""
        }]
    )
    return response.choices[0].message.content.strip()
Using gpt-4o-mini keeps costs minimal while producing reliable SQL for standard analytical queries. Always validate generated SQL before running it against a production database — inspect the output, check the logic, then execute.
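One practical wrinkle: even when prompted to return only SQL, models sometimes wrap the query in Markdown code fences. A small post-processing step makes the output safe to pass to a database driver. This is a minimal sketch; the helper name `clean_sql` is my own, not part of the OpenAI API.

```python
def clean_sql(text):
    """Strip Markdown code fences the model sometimes adds despite instructions."""
    text = text.strip()
    if text.startswith("```"):
        lines = text.splitlines()
        lines = lines[1:]  # drop opening fence, e.g. "```sql"
        if lines and lines[-1].strip() == "```":
            lines = lines[:-1]  # drop closing fence
        text = "\n".join(lines).strip()
    return text

print(clean_sql("```sql\nSELECT 1;\n```"))  # SELECT 1;
```

Wrapping the call as `clean_sql(generate_sql(q, schema))` keeps the rest of the pipeline unchanged.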
Test it with a few representative questions:
questions = [
    "What is the total revenue by region, sorted highest to lowest?",
    "Which 5 products have the highest average order quantity?",
    "How many orders were placed each month in the last year?",
]
for q in questions:
    print(f"Q: {q}")
    print(generate_sql(q, schema))
    print()
The output won't always be perfect. Models miss implicit filters, use the wrong aggregation, or misread ambiguous column names. That's expected — your job is to review the query and catch what the model got wrong before it reaches your data. The model handles the syntax; you handle the judgment.
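You can automate part of that review. One low-risk check is to parse the generated query against an empty local copy of the schema before it ever touches real data: SQLite's EXPLAIN plans the query without executing it, so syntax errors and bad column names surface immediately. A sketch, assuming the `orders` schema above (the helper name `dry_run` is hypothetical):

```python
import sqlite3

def dry_run(sql):
    """Syntax-check a query against an empty copy of the schema.
    Returns (ok, error_message). EXPLAIN parses and plans the query
    without running it, so no data is needed or touched."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE orders (
        order_id INTEGER, customer_id INTEGER, product TEXT,
        quantity INTEGER, revenue FLOAT, order_date DATE, region TEXT)""")
    try:
        conn.execute(f"EXPLAIN {sql}")
        return True, None
    except sqlite3.Error as e:
        return False, str(e)
    finally:
        conn.close()

print(dry_run("SELECT region, SUM(revenue) FROM orders GROUP BY region"))
```

A passing dry run only proves the query is valid, not that it answers the question — logical review is still on you. Note too that SQLite's dialect differs from Postgres or MySQL in places, so a query that plans here may still need adjustment for your production engine.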
The value here isn't replacing SQL knowledge. It's compressing the time between forming a question and having a query worth reviewing.
Want to go deeper? Browse our full resource library →