Sales reps write notes. Lots of them. "Interested but budget unclear." "Pain is there, timing is the issue." "Strong fit — get to procurement." These notes contain useful qualification signals, but reading all of them to prioritize a pipeline is slow and inconsistent. An LLM can read them faster and apply the same criteria every time.
This project builds a lead scoring pipeline using the Groq API (free tier). You feed it CRM-style notes, it returns a qualification score and the key signal that drove it.
Install and authenticate. Your API key goes in Colab Secrets — not in a cell:
!pip install groq pandas
# Colab: open Secrets panel (🔑 left sidebar)
# Add secret: GROQ_API_KEY — paste your key — enable notebook access
from google.colab import userdata
from groq import Groq
import pandas as pd
client = Groq(api_key=userdata.get('GROQ_API_KEY'))
Build a dataset of sales notes. These represent the kind of raw text you'd export from a CRM:
leads = [
    {"lead": "Acme Corp", "notes": "VP signed off on budget. Needs to go through procurement but timeline is Q2. Strong product fit confirmed."},
    {"lead": "Globex Ltd", "notes": "Interested but no budget allocated yet. Wants to revisit in 6 months. Low urgency."},
    {"lead": "Initech", "notes": "Pain clearly articulated. Current solution is failing them. Decision maker in the room. Wants a proposal by Friday."},
    {"lead": "Umbrella Inc", "notes": "Exploratory call. Not sure they need us. Competitor is entrenched. No clear champion."},
    {"lead": "Stark LLC", "notes": "Small team, limited budget. Interested in the starter plan. May upgrade later if it works."},
]
df = pd.DataFrame(leads)
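In practice your notes will come from a CRM export rather than a hard-coded list. Assuming an export with the same two columns, `pd.read_csv` gets you to the identical DataFrame shape (the in-memory `csv_export` buffer here just stands in for a file path):

```python
import io

import pandas as pd

# Stand-in for a CRM export file with "lead" and "notes" columns.
csv_export = io.StringIO(
    "lead,notes\n"
    'Acme Corp,"VP signed off on budget. Timeline is Q2."\n'
    'Globex Ltd,"No budget allocated yet. Low urgency."\n'
)
df = pd.read_csv(csv_export)
print(df.shape)  # (2, 2)
```

With a real export you'd pass the file path instead of the `StringIO` buffer; everything downstream stays the same.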
Write the scoring function. The prompt defines your qualification criteria explicitly — the model applies them consistently across every row:
def score_lead(client, notes):
    response = client.chat.completions.create(
        model="llama3-8b-8192",
        messages=[{
            "role": "user",
            "content": f"""Score this sales lead based on their CRM notes.
Criteria: budget confirmed, clear pain, decision maker involved, defined timeline.
Respond with exactly two lines:
Score: <High|Medium|Low>
Signal: <one sentence: the key factor driving this score>
Notes: {notes}"""
        }]
    )
    # The model usually follows the two-line format, but parse defensively:
    # indexing lines[1] blindly raises IndexError if it ever adds extra text.
    score, signal = "Unknown", ""
    for line in response.choices[0].message.content.strip().split("\n"):
        if line.startswith("Score:"):
            score = line.replace("Score:", "").strip()
        elif line.startswith("Signal:"):
            signal = line.replace("Signal:", "").strip()
    return score, signal
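You can sanity-check the parsing step without spending an API call by running the same line-by-line logic on a canned response. The `parse_response` helper below is for illustration only, not part of the pipeline:

```python
def parse_response(text):
    # Same idea as the parsing inside score_lead: pick out the two
    # expected fields by their line prefixes.
    score, signal = "Unknown", ""
    for line in text.strip().split("\n"):
        if line.startswith("Score:"):
            score = line.replace("Score:", "").strip()
        elif line.startswith("Signal:"):
            signal = line.replace("Signal:", "").strip()
    return score, signal

sample = "Score: High\nSignal: Budget confirmed and decision maker engaged."
print(parse_response(sample))
# ('High', 'Budget confirmed and decision maker engaged.')
```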
Run across the dataset and build the ranked output:
import time

scores, signals = [], []
for _, row in df.iterrows():
    print(f"Scoring {row['lead']}...")
    score, signal = score_lead(client, row["notes"])
    scores.append(score)
    signals.append(signal)
    time.sleep(0.3)  # stay under the free tier's requests-per-minute limit
df["score"] = scores
df["signal"] = signals
order = {"High": 0, "Medium": 1, "Low": 2}
df["rank"] = df["score"].map(order).fillna(3)  # unrecognized scores sort last
print(df.sort_values("rank")[["lead", "score", "signal"]].to_string(index=False))
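The helper rank column works fine; pandas can also sort directly on an ordered `Categorical`, which skips the extra column. A sketch on a toy frame:

```python
import pandas as pd

toy = pd.DataFrame({
    "lead": ["A", "B", "C"],
    "score": ["Low", "High", "Medium"],
})
# Declare the ordering once; sort_values then respects it.
toy["score"] = pd.Categorical(
    toy["score"], categories=["High", "Medium", "Low"], ordered=True
)
print(toy.sort_values("score")["lead"].tolist())  # ['B', 'C', 'A']
```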
The output is a ranked pipeline with a one-sentence rationale for each score. The model won't always agree with your judgment — but it will be consistent, and consistency is what makes this useful at scale. Treat the scores as a first pass, not a final answer. A human still owns the call.
The model is applying your criteria. Make sure your criteria are actually right before you trust the output.