Most text classification tutorials reach for a machine learning model — train it, tune it, evaluate it. That pipeline made sense when the alternative was a rules engine. It doesn't make as much sense now that you can describe what you want in plain language and get a reliable result back in one API call.
This project uses Groq's free API tier and a small open-source model to classify product review sentiment and pull out the main theme of each review. You end up with a clean, structured dataset you can actually do analysis on. Everything runs inside Google Colab — no local setup, no environment configuration.
What you need: a Google account (for Colab) and a free Groq API key from console.groq.com. No paid accounts, no credit card, nothing to install locally.
Open a new Colab notebook at colab.research.google.com. In the first cell, install the dependencies:
!pip install groq pandas
Before writing any code that touches your API key, store it in Colab Secrets — not in a cell. Open the Secrets panel (the 🔑 icon in the left sidebar), add a new secret named GROQ_API_KEY, paste your key as the value, and enable notebook access. Then retrieve it in code like this:
from google.colab import userdata
from groq import Groq
client = Groq(api_key=userdata.get('GROQ_API_KEY'))
This keeps your key out of the notebook entirely. Never paste an API key directly into a cell — Colab notebooks can be shared, and credentials in cells travel with the file.
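If you ever move the notebook out of Colab (a local Jupyter session, for instance), `google.colab` won't be importable. A small helper keeps the same code portable by falling back to an environment variable; this is a sketch, and the function name `get_groq_key` is my own, not part of any library:

```python
import os

def get_groq_key():
    """Fetch the Groq API key from Colab Secrets, or an env var elsewhere."""
    try:
        # Only importable inside a Colab runtime
        from google.colab import userdata
        return userdata.get("GROQ_API_KEY")
    except ImportError:
        # Local Jupyter or plain Python: read from the environment instead
        return os.environ.get("GROQ_API_KEY")
```

Then `client = Groq(api_key=get_groq_key())` works in both environments.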
The core pattern is straightforward. You pass a review to the model with a prompt that specifies the output format you want, then parse the response. This example uses llama3-8b-8192 — fast, accurate on classification, and well within Groq's free tier limits. (Model availability changes over time; check the model list at console.groq.com before running.)
def tag_review(client, text):
    response = client.chat.completions.create(
        model="llama3-8b-8192",
        messages=[{
            "role": "user",
            "content": f"""Classify the following product review.
Respond with exactly two lines:
Sentiment: <positive|negative|mixed>
Theme: <3-5 word phrase summarizing the main point>
Review: {text}"""
        }]
    )
    # Drop blank lines so a stray newline between the two fields can't break parsing
    lines = [l for l in response.choices[0].message.content.strip().split("\n") if l.strip()]
    sentiment = lines[0].replace("Sentiment:", "").strip().lower()
    theme = lines[1].replace("Theme:", "").strip()
    return sentiment, theme
The prompt format matters. Asking the model to respond with "exactly two lines" in a fixed structure makes parsing reliable. If you let the model respond freely, you'll get inconsistent formats and fragile parsing logic. Constrain the output format in the prompt, not in code.
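Even with a constrained prompt, models occasionally add a preamble or reorder the lines. A defensive parser (my own addition, not part of the pipeline above) scans for the labeled lines instead of assuming fixed positions:

```python
def parse_tags(raw):
    """Pull sentiment and theme out of the model's reply, tolerating
    blank lines, preambles, or reordered output."""
    sentiment, theme = "unknown", "unknown"
    for line in raw.strip().splitlines():
        line = line.strip()
        if line.lower().startswith("sentiment:"):
            sentiment = line.split(":", 1)[1].strip().lower()
        elif line.lower().startswith("theme:"):
            theme = line.split(":", 1)[1].strip()
    return sentiment, theme
```

You can swap this in for the two positional lookups in tag_review if you see parsing failures in practice.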
To run this across a dataset, loop through your DataFrame and collect results. Print progress so you can see it working — a silent loop is hard to debug, and Groq's free tier can throttle under sustained load:
import pandas as pd
import time
# Build a small sample dataset
reviews = [
    "Absolutely love these! Sound is incredible and battery lasts all day.",
    "Stopped connecting to my phone after two weeks. Very disappointed.",
    "Great sound but the case feels cheap. Still happy overall.",
    "DO NOT BUY. Broke on first use and support was useless.",
    "Decent for the price. Nothing special but does the job."
]
df = pd.DataFrame({"review_text": reviews})
sentiments, themes = [], []
for i, row in df.iterrows():
    print(f"Tagging {i+1}/{len(df)}...")
    sentiment, theme = tag_review(client, row["review_text"])
    sentiments.append(sentiment)
    themes.append(theme)
    time.sleep(0.3)  # stay within free tier rate limits
df["sentiment"] = sentiments
df["theme"] = themes
print(df)
A note on rate limits: Groq's free tier caps requests per minute, not total usage. The time.sleep(0.3) call keeps you safely under the limit for small datasets. Check console.groq.com for current limits before scaling up — free tiers change.
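For larger datasets, a fixed sleep alone won't save you when a request actually hits the limit. A generic retry wrapper with exponential backoff covers that case; this is a sketch that catches any exception for simplicity, where in practice you would catch the client's rate-limit error specifically:

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Call fn(); on failure, wait with exponential backoff and retry.
    (Catching bare Exception keeps the sketch simple; in real code,
    catch the client's rate-limit error specifically.)"""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the error
            time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
```

In the tagging loop, `sentiment, theme = with_retries(lambda: tag_review(client, row["review_text"]))` replaces the bare call.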
Once tagged, the analysis is standard pandas. A value_counts() on the sentiment column gives you the distribution. Grouping by a product or category column and aggregating sentiment tells you how different segments are perceived. The theme column is messier — the model won't always produce identical phrasing for similar ideas — but a quick scan reveals the patterns worth investigating further.
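That analysis is only a couple of lines on the tagged frame. Here's a sketch against a hand-built result; the labels below are illustrative, not real model output:

```python
import pandas as pd

# Illustrative tagged results, standing in for the model's output
df = pd.DataFrame({
    "sentiment": ["positive", "negative", "mixed", "negative", "mixed"],
    "theme": ["sound quality", "connectivity failure", "cheap case",
              "broke on first use", "value for money"],
})

# Distribution of sentiment labels across all reviews
dist = df["sentiment"].value_counts()
print(dist)

# Same distribution as fractions, handy for reporting
share = df["sentiment"].value_counts(normalize=True)
print(share)
```

With a product or category column, `df.groupby("category")["sentiment"].value_counts()` gives the per-segment breakdown the paragraph above describes.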
That's the part the model can't do for you: deciding which patterns matter to the business, and what to do about them. The tagging is the easy bit.
Want to keep building? Browse our full resource library →