SQL is the language databases speak. Whether you're pulling data from a warehouse, exploring a company dataset, or building a reporting query, you're writing SQL. The best way to learn it is to run queries against actual data and see what comes back. This project gives you that — a working database in Colab with no setup at all.

Python ships with sqlite3 in its standard library. No !pip install needed. You create a database, load data into it, and query it — all inside a single notebook.
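You can confirm this before building anything. A minimal round trip, using only the standard library:

```python
import sqlite3

# sqlite3 ships with Python — this runs with no installation.
# sqlite_version reports the version of the bundled SQLite engine.
print(sqlite3.sqlite_version)

# Connect, run a trivial query, read the result back.
conn = sqlite3.connect(':memory:')
result = conn.execute("SELECT 1 + 1").fetchone()[0]
print(result)  # 2
conn.close()
```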


Step 1 — Create the database and load data. An in-memory database (':memory:') exists only for the duration of your session, which is exactly what you want for practice:

import sqlite3
import pandas as pd

# Connect to an in-memory database — no file, no cleanup needed
conn = sqlite3.connect(':memory:')

conn.executescript("""
CREATE TABLE orders (
  order_id   INTEGER PRIMARY KEY,
  rep        TEXT,
  region     TEXT,
  product    TEXT,
  amount     REAL,
  order_date TEXT
);

INSERT INTO orders VALUES
  (1,  'Alice', 'East',  'Pro',     1200, '2026-01-05'),
  (2,  'Bob',   'West',  'Starter',  450, '2026-01-08'),
  (3,  'Alice', 'East',  'Pro',     1800, '2026-01-15'),
  (4,  'Carol', 'North', 'Pro',     1500, '2026-01-20'),
  (5,  'Bob',   'West',  'Starter',  380, '2026-02-02'),
  (6,  'Carol', 'North', 'Pro',     1100, '2026-02-10'),
  (7,  'Alice', 'East',  'Starter',  420, '2026-02-18'),
  (8,  'Bob',   'West',  'Pro',     1650, '2026-03-01'),
  (9,  'Carol', 'North', 'Starter',  390, '2026-03-12'),
  (10, 'Alice', 'East',  'Pro',     1350, '2026-03-22');
""")
conn.commit()
print("Database ready.")
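A quick sanity check is worth running after a load. sqlite_master is SQLite's built-in catalog table, so querying it confirms the table actually exists. This sketch recreates a minimal version of the orders table so it runs on its own:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
# Minimal version of the Step 1 schema, recreated so this snippet is self-contained.
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, rep TEXT, amount REAL)")
conn.execute("INSERT INTO orders VALUES (1, 'Alice', 1200)")

# sqlite_master holds one row per table, index, view, or trigger.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
print(tables)  # ['orders']

# COUNT(*) confirms the rows landed.
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(row_count)  # 1
conn.close()
```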

Step 2 — Run your first queries. pd.read_sql_query() executes SQL and returns a DataFrame — the cleanest way to work with query results in Python:

# All rows, all columns
df = pd.read_sql_query("SELECT * FROM orders", conn)
print(df)
# Filter: only Pro orders, sorted by amount descending
df = pd.read_sql_query("""
  SELECT rep, region, product, amount
  FROM orders
  WHERE product = 'Pro'
  ORDER BY amount DESC
""", conn)
print(df)
# LIMIT: top 3 orders by amount
df = pd.read_sql_query("""
  SELECT rep, amount, order_date
  FROM orders
  ORDER BY amount DESC
  LIMIT 3
""", conn)
print(df)

Notice the pattern: SELECT names the columns, FROM names the table, WHERE filters rows, ORDER BY sorts, LIMIT caps the result count. Most queries you write in practice are variations on exactly this structure.
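One habit worth picking up alongside that pattern: don't splice values into SQL strings by hand. Both sqlite3 and pd.read_sql_query accept a params argument with ? placeholders. A small sketch (it recreates a two-row orders table so it runs standalone):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
# A few rows matching the Step 1 schema, recreated so this runs on its own.
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, rep TEXT, product TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'Alice', 'Pro', 1200), (2, 'Bob', 'Starter', 450);
""")

# '?' placeholders let the driver handle quoting and escaping —
# which also protects you from SQL injection when values come from user input.
df = pd.read_sql_query(
    "SELECT rep, amount FROM orders WHERE product = ? AND amount > ?",
    conn,
    params=('Pro', 1000),
)
print(df)  # one row: Alice, 1200.0
conn.close()
```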


Step 3 — Aggregate. GROUP BY lets you summarize data by category — the operation behind most business reporting:

# Total revenue and order count by rep
df = pd.read_sql_query("""
  SELECT
    rep,
    COUNT(order_id)  AS deals,
    SUM(amount)      AS total_revenue,
    ROUND(AVG(amount), 2) AS avg_deal
  FROM orders
  GROUP BY rep
  ORDER BY total_revenue DESC
""", conn)
print(df)
# Filter aggregated results with HAVING (not WHERE — that runs before grouping)
df = pd.read_sql_query("""
  SELECT region, SUM(amount) AS revenue
  FROM orders
  GROUP BY region
  HAVING SUM(amount) > 2000
  ORDER BY revenue DESC
""", conn)
print(df)
conn.close()

WHERE filters rows before grouping. HAVING filters groups after. That distinction trips up most beginners — and gets asked in interviews constantly.
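The distinction is easiest to see by running both against the same data. In this self-contained sketch, WHERE discards small rows before the sums are computed, while HAVING computes every region's total first and then discards small groups:

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL);
INSERT INTO orders VALUES
  (1, 'East', 1200), (2, 'East', 300), (3, 'West', 450), (4, 'West', 380);
""")

# WHERE runs first: rows of 400 or less are dropped BEFORE grouping,
# so each region's sum only counts the rows that survived the filter.
where_df = pd.read_sql_query("""
  SELECT region, SUM(amount) AS revenue
  FROM orders
  WHERE amount > 400
  GROUP BY region
""", conn)
print(where_df)   # East: 1200 (the 300 row is excluded), West: 450

# HAVING runs after: every row is summed into its group first,
# then whole groups are dropped based on the aggregate.
having_df = pd.read_sql_query("""
  SELECT region, SUM(amount) AS revenue
  FROM orders
  GROUP BY region
  HAVING SUM(amount) > 1000
""", conn)
print(having_df)  # East: 1500 survives; West totals 830 and is dropped
conn.close()
```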

The queries here are small but the patterns are the same ones you'd use on a table with ten million rows. SQL doesn't care about scale — the database engine does. That's the point.
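If you're curious what the engine actually does with a query, SQLite will tell you. EXPLAIN QUERY PLAN is a real SQLite statement that describes the chosen strategy, such as whether the engine scans the whole table or uses an index. The exact wording of the plan text varies by SQLite version, so treat the comments below as typical output rather than guaranteed strings:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, rep TEXT, amount REAL);
INSERT INTO orders VALUES (1, 'Alice', 1200), (2, 'Bob', 450);
""")

# Without an index on rep, filtering by rep forces a full table scan.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE rep = 'Alice'").fetchall()
print(plan)  # the detail column reads something like 'SCAN orders'

# After adding an index, the same query can use it instead.
conn.execute("CREATE INDEX idx_rep ON orders(rep)")
plan_indexed = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE rep = 'Alice'").fetchall()
print(plan_indexed)  # now mentions idx_rep, e.g. 'SEARCH orders USING INDEX idx_rep'
conn.close()
```

On ten rows the difference is invisible; on ten million rows it's the difference between milliseconds and minutes.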

