Outliers cause two problems. The first is analytical: a single extreme value can skew an average badly enough to make a trend invisible. The second is practical: someone in a meeting will spot the weird row in your chart and spend twenty minutes asking about it instead of looking at the insight you spent an hour building.

A z-score measures how many standard deviations a value sits from the mean of its group. A z-score above 2.5 or below -2.5 is a reasonable flag for "this is unusual." The math is simple — the value is in knowing when to apply it, and in particular when to compute it per group rather than globally.
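To make the formula concrete, here is the calculation by hand on a tiny made-up series (the numbers are illustrative, not from the dataset below). Note how, in a sample this small, even an obviously extreme value can score under 2.5, because it inflates the standard deviation it is measured against:

```python
values = [10, 12, 11, 13, 50]  # 50 is the obvious outlier

# Mean and sample standard deviation (ddof=1, matching pandas' default)
mean = sum(values) / len(values)
std = (sum((v - mean) ** 2 for v in values) / (len(values) - 1)) ** 0.5

# Z-score of the last value: (value - mean) / std
z = (values[-1] - mean) / std
print(round(z, 2))  # ~1.79: extreme, yet below the 2.5 cutoff
```

This is one reason the threshold discussion later matters: the cutoff should fit the sample size you are scoring against.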


Step 1 — Build a dataset with outliers deliberately embedded.

import pandas as pd
import numpy as np

np.random.seed(7)

# Generate realistic order data with a few outliers injected
n = 60
df = pd.DataFrame({
    'order_id': range(1, n + 1),
    'rep':      np.random.choice(['Alice', 'Bob', 'Carol', 'Dave'], n),
    'region':   np.random.choice(['East', 'West', 'North'], n),
    'amount':   np.random.normal(loc=1200, scale=250, size=n).round(2)
})

# Inject obvious outliers
df.loc[14, 'amount'] = 8500   # suspiciously high
df.loc[31, 'amount'] = -200   # negative — likely a data error
df.loc[47, 'amount'] = 12.50  # near-zero, probably a test order

print(df.describe())

Step 2 — Compute global z-scores. This catches values unusual relative to the entire dataset:

# Z-score: (value - mean) / standard deviation
df['z_score'] = (df['amount'] - df['amount'].mean()) / df['amount'].std()
df['z_score'] = df['z_score'].round(2)

# Flag anything beyond 2.5 standard deviations
df['is_outlier'] = df['z_score'].abs() > 2.5

print(f"\nOutliers flagged: {df['is_outlier'].sum()}")
print(df[df['is_outlier']][['order_id', 'rep', 'region', 'amount', 'z_score']])

Step 3 — Per-group z-scores. Global detection misses outliers that are normal globally but extreme within a specific rep or region. Group-level z-scores catch those:

# Compute z-scores within each rep's own distribution
def zscore(series):
    return (series - series.mean()) / series.std()

df['z_score_by_rep'] = df.groupby('rep')['amount'].transform(zscore).round(2)
df['outlier_by_rep'] = df['z_score_by_rep'].abs() > 2.0

print("\nPer-rep outliers:")
print(df[df['outlier_by_rep']][['order_id', 'rep', 'amount', 'z_score_by_rep']])

The threshold matters. 2.5 is a reasonable default — strict enough to flag real anomalies, lenient enough not to flood you with false positives. In a smaller dataset, 2.0 is often more useful. In a very large one, you might go to 3.0.
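One practical way to choose is to count how many rows each candidate threshold flags and pick the level that produces a reviewable number. A minimal sketch, rebuilding a dataset like the one in Step 1 (same seed and injected 8500 outlier, but a standalone example):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(7)
df = pd.DataFrame({'amount': rng.normal(1200, 250, 60)})
df.loc[14, 'amount'] = 8500  # injected outlier, as in Step 1

# Global z-scores, then flag counts at each candidate threshold
z = (df['amount'] - df['amount'].mean()) / df['amount'].std()
for t in (2.0, 2.5, 3.0):
    print(f"threshold {t}: {(z.abs() > t).sum()} row(s) flagged")
```

The counts can only shrink as the threshold rises, so scanning a few levels quickly shows where the flag list stops being noise and starts being a short review queue.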


Step 4 — Produce a clean flagged report.

# Summary: all flagged rows, either method
flagged = df[df['is_outlier'] | df['outlier_by_rep']].copy()
flagged = flagged[['order_id', 'rep', 'region', 'amount',
                   'z_score', 'z_score_by_rep']].sort_values('amount', ascending=False)

print(f"\n{len(flagged)} rows flagged for review:")
print(flagged.to_string(index=False))
# Export to CSV for review
flagged.to_csv('/tmp/flagged_orders.csv', index=False)
print("Saved to /tmp/flagged_orders.csv")

The huge order will dominate the global flag, but the negative amount and near-zero order may not cross the 2.5 threshold at all: the extreme high value inflates the standard deviation enough to mask them (their z-scores land around -1.5). Obvious data errors like negative amounts are better caught with a simple validity rule than with a z-score. The per-rep flag will surface reps whose individual deal sizes are inconsistent — useful for catching pricing errors or unusually large discounts that blend into the overall average.
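A validity rule for those error rows can be a one-line boolean filter. A sketch, where the `min_valid` floor is an illustrative cutoff you would tune to your own business, not a value from the dataset above:

```python
import pandas as pd

orders = pd.DataFrame({
    'order_id': [1, 2, 3, 4],
    'amount':   [1250.00, -200.00, 12.50, 980.00],
})

min_valid = 50.0  # illustrative floor: anything at or below this is suspect
orders['bad_value'] = orders['amount'] <= min_valid
print(orders[orders['bad_value']])  # the -200 and 12.50 rows
```

Running a cheap rule like this before the z-score pass keeps clear data errors from distorting the mean and standard deviation that the statistical flags depend on.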

Z-scores are a starting point, not a verdict. Every flagged row is worth a look — some will be data errors, some will be legitimate edge cases, and occasionally one will be fraud. The point is to look before someone else finds it for you.
