The interquartile range (IQR) measures the spread of the middle 50% of a distribution — the distance between the first quartile (Q1) and the third quartile (Q3). Combined with Tukey’s 1.5×IQR fence rule, it provides a robust method for identifying outliers that is far less sensitive to extreme values than mean ± standard deviation.
MySQL has no built-in quartile or outlier function. Leveraging the VillageSQL Extension Bundle (VEB)
, STATS_IQR(col) computes the full five-number summary and fence values in a single aggregate pass, returning them as JSON. VillageSQL
is a drop-in MySQL 8.4 replacement — no Python or R required.
Returns JSON: { "q1": ..., "median": ..., "q3": ..., "iqr": ...,
"lower_fence": ..., "upper_fence": ... }
| Field | Description |
|---|---|
q1, q3 |
Tukey’s hinges (exclusive-median method) |
median |
Middle value of the sorted group |
iqr |
Q3 − Q1 |
lower_fence |
Q1 − 1.5 × IQR (Tukey inner lower fence) |
upper_fence |
Q3 + 1.5 × IQR (Tukey inner upper fence) |
Values outside the fences are mild Tukey outliers. Use CAST(STATS_IQR(...) AS CHAR) to read results in the mysql CLI, and CAST(col AS DOUBLE) on integer columns to preserve decimal precision in JSON output.
Parkrun Finish Times by Age Group
The first example computes IQR statistics per age group for a community 5 km run, then joins the fence values back to individual finish times to count outliers. IQR grows with age — older runners span a wider range of fitness — and each group contains one walker or injury finish that sits above the upper fence.
WITH iqr_stats AS (
SELECT
age_group,
COUNT(*) AS n,
CAST(STATS_IQR(finish_time) AS CHAR(300)) AS json
FROM race_results
GROUP BY age_group
),
fences AS (
SELECT
age_group,
n,
ROUND(JSON_EXTRACT(json, '$.q1'), 2) AS q1,
ROUND(JSON_EXTRACT(json, '$.median'), 2) AS median_time,
ROUND(JSON_EXTRACT(json, '$.q3'), 2) AS q3,
ROUND(JSON_EXTRACT(json, '$.iqr'), 2) AS iqr,
JSON_EXTRACT(json, '$.lower_fence') AS lower_fence,
JSON_EXTRACT(json, '$.upper_fence') AS upper_fence
FROM iqr_stats
)
SELECT
f.age_group,
f.n,
CONCAT(f.q1, '–', f.q3, ' min') AS q1_q3,
CONCAT(f.median_time, ' min') AS median,
CONCAT(f.iqr, ' min') AS iqr,
CONCAT(ROUND(f.upper_fence, 2), ' min') AS upper_fence,
SUM(CASE
WHEN r.finish_time > f.upper_fence
OR r.finish_time < f.lower_fence THEN 1 ELSE 0
END) AS outlier_count
FROM race_results r
JOIN fences f USING (age_group)
GROUP BY f.age_group, f.n, f.q1, f.q3, f.median_time, f.iqr, f.lower_fence, f.upper_fence
ORDER BY f.age_group;
Results
age_group n q1_q3 median iqr upper_fence outlier_count
18-29 12 19.7–23 min 21.75 min 3.3 min 27.95 min 1
30-44 12 22.75–26.4 min 24.5 min 3.65 min 31.88 min 1
45-59 12 24.85–30.25 min 27.15 min 5.4 min 38.35 min 1
60+ 12 29.25–38 min 32.75 min 8.75 min 51.12 min 1
IQR grows steadily from 3.3 minutes (18–29) to 8.75 minutes (60+), confirming that older age groups have wider performance spread. Each group has exactly one outlier above the upper fence — finish times of 35.8, 43.0, 55.0, and 67.0 minutes respectively — consistent with a walker or injury finish in an otherwise competitive pack. The GROUP BY → join-back pattern is the natural structure for per-group outlier counts.
E-Commerce Order Values — Flag Individual Outlier Rows
The second example applies STATS_IQR to a single group of 25 orders and cross-joins the fence values to every row, tagging each order as normal, low outlier, or high outlier. Normal orders cluster between $45 and $90; a $2.00 test/refund entry and a $200.00 bulk purchase both fall outside the Tukey fences.
WITH fences AS (
SELECT
JSON_EXTRACT(json, '$.q1') AS q1,
JSON_EXTRACT(json, '$.median') AS median_value,
JSON_EXTRACT(json, '$.q3') AS q3,
JSON_EXTRACT(json, '$.lower_fence') AS lower_fence,
JSON_EXTRACT(json, '$.upper_fence') AS upper_fence
FROM (
SELECT CAST(STATS_IQR(order_value) AS CHAR(300)) AS json
FROM orders
) s
)
SELECT
o.order_id,
o.customer,
o.order_value,
CASE
WHEN o.order_value < f.lower_fence THEN 'LOW OUTLIER'
WHEN o.order_value > f.upper_fence THEN 'HIGH OUTLIER'
ELSE 'normal'
END AS flag
FROM orders o, fences f
ORDER BY o.order_value;
Results
order_id customer order_value flag
1 Chen 2 LOW OUTLIER
2 Walsh 45 normal
...
24 Bauer 90 normal
25 Costa 200 HIGH OUTLIER
With n = 25, Q1 = $55, Q3 = $81, and IQR = $26, the fences are $16.00 (lower) and $120.00 (upper). Order #1 ($2.00) is flagged as a low outlier — likely a test transaction or refund. Order #25 ($200.00) is a high outlier — a bulk purchase well outside the normal $45–$90 range. The cross-join pattern works for any single-group outlier scan without needing GROUP BY.
Manufacturing — Cylinder Diameter by Machine Line
The third example combines IQR per machine with engineering tolerance checks. Target diameter is 50.0 mm with a spec of ±2.0 mm (48.0–52.0 mm). Three machines show different process characteristics: Machine A is well-calibrated, Machine B has drifted high, and Machine C has worn tooling with wide spread and multiple violations.
WITH iqr_stats AS (
SELECT
machine,
COUNT(*) AS n,
CAST(STATS_IQR(diameter) AS CHAR(300)) AS json
FROM components
GROUP BY machine
),
fences AS (
SELECT
machine,
n,
ROUND(JSON_EXTRACT(json, '$.q1'), 2) AS q1,
ROUND(JSON_EXTRACT(json, '$.median'), 2) AS median_mm,
ROUND(JSON_EXTRACT(json, '$.q3'), 2) AS q3,
ROUND(JSON_EXTRACT(json, '$.iqr'), 2) AS iqr_mm,
JSON_EXTRACT(json, '$.lower_fence') AS lower_fence,
JSON_EXTRACT(json, '$.upper_fence') AS upper_fence
FROM iqr_stats
)
SELECT
f.machine,
f.median_mm,
f.q1,
f.q3,
f.iqr_mm,
SUM(CASE
WHEN c.diameter < 48.0 OR c.diameter > 52.0 THEN 1 ELSE 0
END) AS spec_violations,
SUM(CASE
WHEN c.diameter < f.lower_fence
OR c.diameter > f.upper_fence THEN 1 ELSE 0
END) AS tukey_outliers
FROM components c
JOIN fences f USING (machine)
GROUP BY f.machine, f.median_mm, f.q1, f.q3, f.iqr_mm, f.lower_fence, f.upper_fence
ORDER BY f.machine;
Results
machine median_mm q1 q3 iqr_mm spec_violations tukey_outliers
A 50.0 49.7 50.3 0.6 0 0
B 50.6 50.3 50.8 0.5 0 0
C 50.2 49.0 52.0 3.0 4 1
Machine A is tight and centred on target (IQR = 0.6 mm, median = 50.0 mm) with no violations. Machine B is equally tight (IQR = 0.5 mm) but the median has drifted 0.6 mm above target to 50.6 mm — a calibration issue visible in the median but not yet causing spec failures. Machine C tells a different story: IQR of 3.0 mm is five times wider than Machine A, with four parts outside the ±2.0 mm engineering spec and one statistical outlier at 57.5 mm. IQR quantifies process consistency; combining it with domain-specific tolerance checks gives both statistical and engineering views in one query.
Three Usage Patterns
| Pattern | When to use | Example |
|---|---|---|
GROUP BY → join fences back |
Per-group summary and outlier counts | Parkrun age groups, machine lines |
Single STATS_IQR → cross-join |
Flag individual rows in one group | Order value review |
GROUP BY + domain thresholds |
Statistical outliers plus engineering specs | Manufacturing QC |
All three patterns extract fence values from the JSON returned by STATS_IQR and apply them to raw rows — the function computes the distribution summary once per group; the join does the row-level classification.
Summary
All three examples use STATS_IQR(), which returns a JSON object with quartiles, median, IQR, and Tukey fence values computed using the exclusive-median (Tukey’s hinges) method.
| Example | Group | Median | IQR | Outliers |
|---|---|---|---|---|
| Parkrun 18–29 | 12 runners | 21.75 min | 3.3 min | 1 (35.8 min walker) |
| Parkrun 30–44 | 12 runners | 24.5 min | 3.65 min | 1 (43.0 min walker) |
| Parkrun 45–59 | 12 runners | 27.15 min | 5.4 min | 1 (55.0 min walker) |
| Parkrun 60+ | 12 runners | 32.75 min | 8.75 min | 1 (67.0 min walker) |
| E-commerce orders | 25 orders | $68 | $26 | 2 ($2 low, $200 high) |
| Machine A | 15 parts | 50.0 mm | 0.6 mm | 0 |
| Machine B | 15 parts | 50.6 mm | 0.5 mm | 0 (drifting high) |
| Machine C | 15 parts | 50.2 mm | 3.0 mm | 4 spec + 1 Tukey |
The extension is available at github.com/ronaldbradford/vsql-statistics and installs with a single statement:
INSTALL EXTENSION vsql_statistics;