Free Projects — Real Data Skills

Sales Performance Analysis

Groupby, aggregation, and ranked output

15 min · Beginner · Python

Build a clean sales summary from raw transaction data using pandas — the kind of output you'd produce before a weekly review meeting.
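
A minimal sketch of the groupby, aggregate, and rank pattern this project covers. The transaction rows and the column names (region, amount) are made up for illustration and are not the project's actual dataset.

```python
import pandas as pd

# Hypothetical raw transactions; column names are assumptions for this sketch.
sales = pd.DataFrame({
    "region": ["East", "West", "East", "South", "West", "East"],
    "amount": [120.0, 340.0, 80.0, 200.0, 60.0, 100.0],
})

# One row per region: total, order count, and average order value,
# sorted so the best-performing region comes first.
summary = (
    sales.groupby("region", as_index=False)
    .agg(
        total_sales=("amount", "sum"),
        orders=("amount", "count"),
        avg_order=("amount", "mean"),
    )
    .sort_values("total_sales", ascending=False)
    .reset_index(drop=True)
)

# Dense rank: tied totals share a rank and no rank numbers are skipped.
summary["rank"] = (
    summary["total_sales"].rank(method="dense", ascending=False).astype(int)
)

print(summary)
```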

Clean a Messy Dataset

Nulls, duplicates, types, and dates

20 min · Beginner · Python

A practical reference for the most common cleaning operations in pandas — the work that happens before every real analysis.
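
As a rough illustration, the four operations named above (nulls, duplicates, types, dates) might be chained like this. The sample rows and column names are hypothetical, and dropping incomplete rows is only one possible policy; filling or flagging them may suit other datasets better.

```python
import pandas as pd

# Hypothetical messy input: everything arrives as strings, one row is
# duplicated, one amount is missing, and one date is unparseable.
raw = pd.DataFrame({
    "order_id": ["1", "2", "2", "3", "4"],
    "amount": ["10.5", "20", "20", None, "7.25"],
    "order_date": ["2024-01-05", "2024-01-06", "2024-01-06",
                   "2024-01-07", "not a date"],
})

# Duplicates: drop rows that are exact copies of an earlier row.
clean = raw.drop_duplicates().copy()

# Types: convert strings to numbers; unparseable values become NaN.
clean["order_id"] = clean["order_id"].astype(int)
clean["amount"] = pd.to_numeric(clean["amount"], errors="coerce")

# Dates: parse to datetime; unparseable values become NaT.
clean["order_date"] = pd.to_datetime(clean["order_date"], errors="coerce")

# Nulls: here we simply drop rows missing an amount or a date.
clean = clean.dropna(subset=["amount", "order_date"]).reset_index(drop=True)

print(clean)
print(clean.dtypes)
```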

Classify Text with an LLM API

Sentiment tagging on product reviews

30 min · Beginner · Python / Groq

Call a free LLM API to classify review sentiment and extract themes — turning raw text into a structured, tagged dataset.

Generate SQL from Plain English

Natural language to query with OpenAI

30 min · Intermediate · Python / OpenAI

Translate plain-English questions into runnable SQL using the OpenAI API — a practical tool for analyst workflows.

Score and Prioritize Leads

LLM-powered pipeline qualification

45 min · Intermediate · Python / Groq

Extract qualification signals from raw CRM notes and rank your pipeline automatically using a free LLM API.

Build a Live Data Store

PostgreSQL persistence with Supabase

60 min · Advanced · Python / Supabase

Set up a real PostgreSQL database on Supabase's free tier, write data from Python, and query it back with filters.

Query a Database with SQL

SELECT, WHERE, ORDER BY, and LIMIT

20 min · Beginner · Python / SQLite

Write real SQL against a live in-memory database inside Colab — no credentials, no cloud account, just Python's built-in sqlite3.
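
A small sketch of what that looks like with the standard-library sqlite3 module. The orders table and its rows are invented for this example.

```python
import sqlite3

# An in-memory database: nothing is written to disk and no server is needed.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("Ana", 120.0), ("Ben", 45.0), ("Cy", 300.0), ("Dee", 80.0)],
)

# SELECT + WHERE + ORDER BY + LIMIT: the two largest orders above 50.
# The ? placeholder keeps the threshold out of the SQL string itself.
rows = conn.execute(
    """
    SELECT customer, amount
    FROM orders
    WHERE amount > ?
    ORDER BY amount DESC
    LIMIT 2
    """,
    (50,),
).fetchall()

print(rows)
conn.close()
```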

SQL JOINs and Aggregations

Multi-table queries with GROUP BY

35 min · Intermediate · Python / SQLite

Join a three-table schema and aggregate the results by tier, category, and product — the operations behind most real business reporting.
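
One way such a query could look, sketched against a hypothetical customers / products / orders schema. The table names, columns, and rows are assumptions for illustration and not the project's actual data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical three-table schema: orders links customers to products.
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, tier TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT,
                            category TEXT, price REAL);
    CREATE TABLE orders    (customer_id INTEGER, product_id INTEGER,
                            quantity INTEGER);

    INSERT INTO customers VALUES (1, 'Gold'), (2, 'Silver'), (3, 'Gold');
    INSERT INTO products  VALUES (1, 'Widget', 'Hardware', 10.0),
                                 (2, 'Gadget', 'Hardware', 25.0),
                                 (3, 'Plan',   'Software', 100.0);
    INSERT INTO orders    VALUES (1, 1, 2), (1, 3, 1), (2, 2, 1),
                                 (3, 1, 1), (3, 2, 2);
""")

# Join all three tables, then total revenue per (tier, category) pair.
revenue = conn.execute("""
    SELECT c.tier, p.category, SUM(o.quantity * p.price) AS revenue
    FROM orders AS o
    JOIN customers AS c ON c.id = o.customer_id
    JOIN products  AS p ON p.id = o.product_id
    GROUP BY c.tier, p.category
    ORDER BY revenue DESC
""").fetchall()

for row in revenue:
    print(row)
conn.close()
```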

Window Functions and CTEs

Rank, compare, and compute running totals

50 min · Advanced · Python / SQLite

Use RANK(), LAG(), and SUM() OVER to analyze trends across periods — plus CTEs to keep complex queries readable.
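
A compact sketch combining a CTE with the three window functions named above. The sales table is hypothetical, and window functions require SQLite 3.25 or newer, which recent Python builds generally bundle.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (month TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("2024-01", 60), ("2024-01", 40), ("2024-02", 150),
     ("2024-03", 70), ("2024-03", 50)],
)

# The CTE rolls transactions up to one row per month; the outer query
# then ranks months, compares each to the previous one, and accumulates.
rows = conn.execute("""
    WITH monthly AS (
        SELECT month, SUM(amount) AS revenue
        FROM sales
        GROUP BY month
    )
    SELECT
        month,
        revenue,
        RANK() OVER (ORDER BY revenue DESC)          AS revenue_rank,
        revenue - LAG(revenue) OVER (ORDER BY month) AS change_vs_prev,
        SUM(revenue) OVER (ORDER BY month)           AS running_total
    FROM monthly
    ORDER BY month
""").fetchall()

for row in rows:
    print(row)
conn.close()
```

The first month has no predecessor, so LAG() yields NULL there and the change column comes back as None in Python.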

Query CSV Files with DuckDB

SQL directly on flat files — no load required

25 min · Intermediate · Python / DuckDB

Run SQL queries against a CSV file without loading it into memory first, then filter on window-function results with QUALIFY to rank in one pass.

Auto-Generate a Data Dictionary

Column docs from schema info via Claude API

30 min · Intermediate · Python / Claude API

Feed Claude your column names and sample values — get a full data dictionary back in seconds, ready to publish or hand off.

Build a Cohort Retention Table

Month-over-month retention matrix from transactions

45 min · Intermediate · Python / pandas

Tag customers with their cohort month, compute months-since-acquisition, and pivot into the retention matrix every SaaS analyst needs to know.
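
The three steps above can be sketched roughly as follows. The transactions and column names are invented, and retention is defined here as the share of each cohort with at least one order in a given month, which is one common convention among several.

```python
import pandas as pd

# Hypothetical transactions: one row per order.
tx = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3],
    "order_date": pd.to_datetime([
        "2024-01-10", "2024-02-03", "2024-03-15",
        "2024-01-20", "2024-03-02", "2024-02-11",
    ]),
})

# Step 1: cohort month = month of each customer's first order.
tx["order_month"] = tx["order_date"].dt.to_period("M")
first_order = tx.groupby("customer_id")["order_date"].transform("min")
tx["cohort"] = first_order.dt.to_period("M")

# Step 2: whole months elapsed since the cohort month.
tx["months_since"] = (
    (tx["order_month"].dt.year - tx["cohort"].dt.year) * 12
    + (tx["order_month"].dt.month - tx["cohort"].dt.month)
)

# Step 3: pivot to distinct active customers per cohort and month offset,
# then divide by the cohort's month-0 size to get retention rates.
active = tx.pivot_table(
    index="cohort",
    columns="months_since",
    values="customer_id",
    aggfunc="nunique",
    fill_value=0,
)
retention = active.div(active[0], axis=0)

print(retention)
```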

Walk a Hierarchy with a Recursive CTE

Org charts and trees in a single SQL query

35 min · Advanced · Python / SQLite

Traverse any parent-child structure (org charts, category trees, bills of materials) using the recursive CTE pattern most SQL tutorials skip.
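
A minimal sketch of the pattern on a hypothetical four-person org chart. The employees table and its rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER)"
)
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1, "Ada", None), (2, "Bo", 1), (3, "Cam", 1), (4, "Di", 2)],
)

# Anchor member: the root (no manager). Recursive member: anyone whose
# manager is already in the result, one level deeper each pass.
org = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        SELECT id, name, 0
        FROM employees
        WHERE manager_id IS NULL

        UNION ALL

        SELECT e.id, e.name, c.depth + 1
        FROM employees AS e
        JOIN chain AS c ON e.manager_id = c.id
    )
    SELECT name, depth
    FROM chain
    ORDER BY depth, name
""").fetchall()

for name, depth in org:
    print("  " * depth + name)
conn.close()
```

The recursion stops on its own once a pass adds no new rows, so a tree without cycles needs no explicit depth limit.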

Ask Your Dataset Questions in Plain English

EDA via the Gemini API — no groupby required

20 min · Beginner · Python / Gemini

Load a DataFrame, send it to Gemini, and ask anything — get a first-pass analysis in seconds instead of writing queries from scratch.

Flag Outliers with Z-Scores

Global and per-group anomaly detection

30 min · Intermediate · Python / pandas

Compute z-scores in five lines and flag statistical outliers globally and per-group — find the weird rows before your stakeholders do.
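
A short sketch of the global and per-group variants side by side. The data, the column names, and the cutoff of 2 standard deviations are illustrative assumptions; the right threshold depends on the dataset.

```python
import pandas as pd

# Hypothetical daily sales for two stores on very different scales.
df = pd.DataFrame({
    "store": ["A"] * 8 + ["B"] * 8,
    "sales": [10, 11, 9, 10, 12, 10, 11, 30,
              100, 102, 98, 101, 99, 100, 103, 101],
})

THRESHOLD = 2.0  # assumed cutoff, in standard deviations

# Global z-score: distance from the overall mean in overall std units.
df["z_global"] = (df["sales"] - df["sales"].mean()) / df["sales"].std()

# Per-group z-score: each row is compared only against its own store.
by_store = df.groupby("store")["sales"]
df["z_group"] = (df["sales"] - by_store.transform("mean")) / by_store.transform("std")

df["outlier_global"] = df["z_global"].abs() > THRESHOLD
df["outlier_group"] = df["z_group"].abs() > THRESHOLD

print(df[df["outlier_global"] | df["outlier_group"]])
```

In this toy data the spike of 30 at store A is invisible globally, because store B's values dominate the overall spread, but it stands out once each store is scored against its own history.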