The one-sample Z-test determines whether a sample mean differs significantly from a known population mean when the population standard deviation is also known. It is the appropriate test when the population parameters are established — quality control benchmarks, national averages, long-run process measurements — and you want to evaluate whether a new sample is consistent with them. The null hypothesis is that the sample was drawn from the known population.
Leveraging the VillageSQL Extension Bundle (VEB)
, you can run a Z-test directly in SQL with STATS_ZTEST(value, mu, sigma). Pass mu (the known population mean) and sigma (the known population standard deviation) as constants on every row; the function accumulates the sample and returns JSON containing the Z-statistic, the upper one-tail p-value, and the two-tail p-value. VillageSQL
is a drop-in MySQL 8.4 replacement.
Returns JSON: { "z": ..., "p_one_tail": ..., "p_two_tail": ... }
For directional (one-tailed) tests:
- H₁: μ > μ₀ — use
p_one_taildirectly - H₁: μ < μ₀ — use
1 − p_one_tail
Soft Drink Fill Quality Control
A production line targets 500 ml per bottle with a long-run process standard deviation of σ = 8 ml. A sample of 25 bottles is taken from a single run. The question is whether the line is filling correctly (two-tailed) and specifically whether it is systematically underfilling (lower one-tailed).
-- Sample descriptives
SELECT COUNT(*) AS n, ROUND(AVG(volume_ml), 2) AS sample_mean
FROM bottle_fill;
n sample_mean
25 495.60
The sample mean is 4.4 ml below the 500 ml target. Whether that gap is statistically meaningful is what the Z-test determines.
WITH ztest AS (
SELECT CAST(STATS_ZTEST(volume_ml, 500.0, 8.0) AS CHAR(300)) AS json
FROM bottle_fill
)
SELECT
ROUND(JSON_EXTRACT(json, '$.z'), 6) AS z_statistic,
ROUND(JSON_EXTRACT(json, '$.p_one_tail'), 6) AS p_one_tail,
ROUND(1 - JSON_EXTRACT(json, '$.p_one_tail'), 6) AS p_lower_tail,
ROUND(JSON_EXTRACT(json, '$.p_two_tail'), 6) AS p_two_tail,
IF(JSON_EXTRACT(json, '$.p_two_tail') < 0.05,
'Reject H₀ — fill volume differs significantly from 500 ml target',
'Retain H₀ — no significant deviation from 500 ml target') AS two_tail_conclusion,
IF(1 - JSON_EXTRACT(json, '$.p_one_tail') < 0.05,
'Reject H₀ — significant underfill detected (μ < 500)',
'Retain H₀ — no significant underfill') AS lower_tail_conclusion
FROM ztest\G
Results
*************************** 1. row ***************************
z_statistic: -2.75
p_one_tail: 0.99702
p_lower_tail: 0.00298
p_two_tail: 0.00596
two_tail_conclusion: Reject H₀ — fill volume differs significantly from 500 ml target
lower_tail_conclusion: Reject H₀ — significant underfill detected (μ < 500)
Z = −2.75, p (two-tail) = 0.006. We reject the null both ways: the fill volume differs significantly from target, and specifically the line is underfilling. A mean shortfall of 4.4 ml across 25 bottles is not random variation — the process needs adjustment.
Note the directionality: p_one_tail is the upper-tail probability P(Z > z), which at −2.75 is 0.997 — almost certain the true mean is not above 500 ml. The lower-tail probability 1 − p_one_tail = 0.003 is what we use to test for underfill.
School Reading Scores vs National Benchmark
The second example shows STATS_ZTEST used with GROUP BY to test multiple groups against the same population benchmark in a single query — a natural pattern for comparing sites, branches, or cohorts against a known standard.
The national reading benchmark is μ = 72, σ = 11. Three schools each have 30 students.
-- Sample descriptives per school
SELECT school, COUNT(*) AS n, ROUND(AVG(score), 2) AS sample_mean
FROM reading_scores
GROUP BY school
ORDER BY school;
school n sample_mean
Central 30 72.50
Highland 30 64.17
Riverside 30 77.30
WITH ztest AS (
SELECT
school,
ROUND(AVG(score), 2) AS sample_mean,
CAST(STATS_ZTEST(score, 72.0, 11.0) AS CHAR(300)) AS json
FROM reading_scores
GROUP BY school
)
SELECT
school,
sample_mean,
ROUND(JSON_EXTRACT(json, '$.z'), 4) AS z_statistic,
ROUND(JSON_EXTRACT(json, '$.p_two_tail'), 4) AS p_two_tail,
CASE
WHEN JSON_EXTRACT(json, '$.p_two_tail') < 0.05
AND JSON_EXTRACT(json, '$.z') > 0
THEN 'Significantly above national average'
WHEN JSON_EXTRACT(json, '$.p_two_tail') < 0.05
AND JSON_EXTRACT(json, '$.z') < 0
THEN 'Significantly below national average'
ELSE 'No significant difference from national average'
END AS conclusion
FROM ztest
ORDER BY school\G
Results
*************************** 1. row ***************************
school: Central
sample_mean: 72.5
z_statistic: 0.2490
p_two_tail: 0.8034
conclusion: No significant difference from national average
*************************** 2. row ***************************
school: Highland
sample_mean: 64.17
z_statistic: -3.9004
p_two_tail: 0.0001
conclusion: Significantly below national average
*************************** 3. row ***************************
school: Riverside
sample_mean: 77.3
z_statistic: 2.6390
p_two_tail: 0.0083
conclusion: Significantly above national average
Central (Z = 0.25, p = 0.80) is indistinguishable from the national average — its 0.5-point gap above 72 is well within chance variation. Riverside (Z = 2.64, p = 0.008) performs significantly above average. Highland (Z = −3.90, p < 0.001) is significantly below — a 7.8-point deficit that is extremely unlikely to be random. The GROUP BY pattern means all three tests run in a single pass over the table.
Battery Lifetime Quality Control
Battery / IQ Z-Test SQL Example
These two examples, adapted from Statistics By Jim , reinforce the directionality patterns for left-tailed and two-tailed tests.
A manufacturer claims their batteries last μ₀ = 500 hours with a known process σ = 40 hours. A QA engineer samples 30 batteries from a suspect batch and tests whether it falls short of specification — a left-tailed test.
WITH ztest AS (
SELECT CAST(STATS_ZTEST(hours, 500.0, 40.0) AS CHAR(300)) AS json
FROM battery_lifetimes
),
result AS (
SELECT COUNT(*) AS n, ROUND(AVG(hours), 4) AS sample_mean,
500.0 AS claimed_mean, 40.0 AS known_sigma
FROM battery_lifetimes
)
SELECT
r.n, r.sample_mean, r.claimed_mean, r.known_sigma,
ROUND(JSON_EXTRACT(z.json, '$.z'), 4) AS z_statistic,
ROUND(1 - JSON_EXTRACT(z.json, '$.p_one_tail'), 6) AS p_left_tail,
ROUND(JSON_EXTRACT(z.json, '$.p_two_tail'), 6) AS p_two_tail,
IF(1 - JSON_EXTRACT(z.json, '$.p_one_tail') < 0.05,
'Reject H₀ — batch lifetime IS significantly below 500 hrs',
'Retain H₀ — no significant evidence batch falls short') AS conclusion
FROM result r, ztest z\G
Results
*************************** 1. row ***************************
n: 30
sample_mean: 480.3333
claimed_mean: 500.0
known_sigma: 40.0
z_statistic: -2.693
p_left_tail: 0.003541
p_two_tail: 0.007082
conclusion: Reject H₀ — batch lifetime IS significantly below 500 hrs
Z = −2.69, left-tail p = 0.0035. The batch mean of 480.3 hours is significantly below the claimed 500-hour specification. The p_one_tail returned by the function is the upper-tail probability P(Z > z) = 0.997 — near-certain the mean is not above 500. Subtracting from 1 gives the left-tail p-value used for the underfill test.
IQ Enrichment Programme
A researcher tests 40 graduates from an enrichment programme against the standardised population mean of μ₀ = 100, σ = 15. The research question is whether the programme changes IQ in either direction — a two-tailed test.
WITH ztest AS (
SELECT CAST(STATS_ZTEST(iq, 100.0, 15.0) AS CHAR(300)) AS json
FROM iq_scores
),
result AS (
SELECT COUNT(*) AS n, ROUND(AVG(iq), 4) AS sample_mean,
100.0 AS population_mean, 15.0 AS known_sigma
FROM iq_scores
)
SELECT
r.n, r.sample_mean, r.population_mean, r.known_sigma,
ROUND(JSON_EXTRACT(z.json, '$.z'), 4) AS z_statistic,
ROUND(JSON_EXTRACT(z.json, '$.p_one_tail'), 6) AS p_one_tail_upper,
ROUND(JSON_EXTRACT(z.json, '$.p_two_tail'), 6) AS p_two_tail,
IF(JSON_EXTRACT(z.json, '$.p_two_tail') < 0.05,
'Reject H₀ — programme graduates score significantly differently from μ=100',
'Retain H₀ — no significant difference from population mean') AS conclusion
FROM result r, ztest z\G
Results
*************************** 1. row ***************************
n: 40
sample_mean: 106.1
population_mean: 100.0
known_sigma: 15.0
z_statistic: 2.572
p_one_tail_upper: 0.005056
p_two_tail: 0.010112
conclusion: Reject H₀ — programme graduates score significantly differently from μ=100
Z = 2.57, p (two-tail) = 0.010. The programme graduates score 6.1 points above the population mean — a statistically significant difference at α = 0.05. Since the sample mean is above μ₀ the upper-tail probability p_one_tail is already small (0.005); for a two-tailed test p_two_tail is the correct value to compare against α.
Smartphone Battery Life
GeeksForGeeks Z-Test SQL Example
These two examples are adapted from GeeksForGeeks and cover a large-sample one-sample test and a two-sample z-test.
A manufacturer claims their smartphones deliver μ₀ = 12.0 hours battery life, with a known production σ = 0.5 hours. A consumer group tests 100 devices and finds a mean of 11.8 hours — a two-tailed test of whether the claim holds.
WITH ztest AS (
SELECT CAST(STATS_ZTEST(battery_hr, 12.0, 0.5) AS CHAR(300)) AS json
FROM smartphone_battery
),
summary AS (
SELECT COUNT(*) AS n, ROUND(AVG(battery_hr), 4) AS sample_mean
FROM smartphone_battery
)
SELECT
s.n, s.sample_mean,
12.0 AS claimed_mean_mu0, 0.5 AS known_sigma,
ROUND(JSON_EXTRACT(z.json, '$.z'), 4) AS z_statistic,
ROUND(JSON_EXTRACT(z.json, '$.p_two_tail'), 6) AS p_two_tail,
IF(JSON_EXTRACT(z.json, '$.p_two_tail') < 0.05,
'Reject H₀ — actual battery life differs from 12 hrs',
'Retain H₀ — no significant difference from 12 hrs') AS conclusion
FROM summary s, ztest z\G
Results
*************************** 1. row ***************************
n: 100
sample_mean: 11.8
claimed_mean_mu0: 12.0
known_sigma: 0.5
z_statistic: -4
p_two_tail: 0.000063
conclusion: Reject H₀ — actual battery life differs from 12 hrs
Z = −4.0, p (two-tail) < 0.0001. With 100 observations a 0.2-hour shortfall against a σ of 0.5 hours produces a very strong signal — the manufacturer’s claim is not supported by the test data.
Online vs Offline Class Scores (Two-Sample Z-Test)
STATS_ZTEST is a one-sample aggregate. When both population standard deviations are known, a two-sample z-test can be derived analytically in SQL from GROUP BY summary statistics using the formula z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂).
Group A (offline classroom): n = 50, σ₁ = 10 (known). Group B (online course): n = 60, σ₂ = 12 (known). The question is whether teaching mode affects scores — a two-tailed test.
WITH group_stats AS (
SELECT
MAX(CASE WHEN class_type = 'offline' THEN mean_score END) AS xbar_offline,
MAX(CASE WHEN class_type = 'online' THEN mean_score END) AS xbar_online,
MAX(CASE WHEN class_type = 'offline' THEN n END) AS n_offline,
MAX(CASE WHEN class_type = 'online' THEN n END) AS n_online
FROM (
SELECT class_type, COUNT(*) AS n, AVG(score) AS mean_score
FROM class_scores GROUP BY class_type
) sub
),
two_sample AS (
SELECT
xbar_offline, xbar_online, n_offline, n_online,
10.0 AS sigma_offline, 12.0 AS sigma_online,
(xbar_offline - xbar_online) /
SQRT(POW(10.0, 2) / n_offline + POW(12.0, 2) / n_online) AS z
FROM group_stats
)
SELECT
ROUND(xbar_offline, 2) AS mean_offline,
ROUND(xbar_online, 2) AS mean_online,
n_offline, n_online, sigma_offline, sigma_online,
ROUND(z, 4) AS z_statistic,
IF(ABS(z) > 1.96,
'Reject H₀ — online and offline scores differ significantly (α=0.05)',
'Retain H₀ — no significant difference between teaching modes') AS conclusion
FROM two_sample\G
Results
*************************** 1. row ***************************
mean_offline: 75
mean_online: 80
n_offline: 50
n_online: 60
sigma_offline: 10.0
sigma_online: 12.0
z_statistic: -2.3837
conclusion: Reject H₀ — online and offline scores differ significantly (α=0.05)
Z = −2.38, |z| > 1.96 critical value at α = 0.05. The online group’s 5-point mean advantage over the offline group is statistically significant. Note that this analytical two-sample approach requires the population standard deviations to be genuinely known in advance — if they are only estimated from the sample, a two-sample t-test is the correct choice.
Student Exam Scores — Right-Tailed Test
These three examples are adapted from Cuemath and cover all three directional cases: right-tailed, left-tailed, and a two-proportion test.
A teacher claims their class outperforms the school average of μ₀ = 82 with σ = 20 known from historical data. A sample of 81 students has a mean of 90 — tested right-tailed since the teacher’s claim is specifically that scores are above average.
WITH ztest AS (
SELECT CAST(STATS_ZTEST(score, 82.0, 20.0) AS CHAR(300)) AS json
FROM exam_scores
)
SELECT
ROUND(JSON_EXTRACT(json, '$.z'), 4) AS z_statistic,
ROUND(JSON_EXTRACT(json, '$.p_one_tail'), 6) AS p_one_tail,
IF(JSON_EXTRACT(json, '$.p_one_tail') < 0.05,
'Reject H₀ — sufficient evidence that μ > 82',
'Retain H₀ — insufficient evidence that μ > 82') AS conclusion
FROM ztest\G
*************************** 1. row ***************************
z_statistic: 3.6
p_one_tail: 0.000159
conclusion: Reject H₀ — sufficient evidence that μ > 82
Z = 3.6, p (upper tail) < 0.001. When the sample mean is above μ₀ the upper-tail p_one_tail is small and used directly for a right-tailed test. The class does perform significantly above the school average.
Medicine Delivery Times — Left-Tailed Test
An online pharmacy claims deliveries take under 120 minutes on average. σ = 30 minutes is known from long-run data, and 49 orders are sampled with a mean of 100 minutes. The test is left-tailed: we only care whether the true mean is below the claim.
WITH ztest AS (
SELECT CAST(STATS_ZTEST(minutes, 120.0, 30.0) AS CHAR(300)) AS json
FROM delivery_times
)
SELECT
ROUND(JSON_EXTRACT(json, '$.z'), 4) AS z_statistic,
ROUND(JSON_EXTRACT(json, '$.p_one_tail'), 8) AS p_one_tail,
ROUND(1 - JSON_EXTRACT(json, '$.p_one_tail'), 8) AS p_lower_tail,
IF(1 - JSON_EXTRACT(json, '$.p_one_tail') < 0.05,
'Reject H₀ — sufficient evidence that μ < 120',
'Retain H₀ — insufficient evidence that μ < 120') AS conclusion
FROM ztest\G
*************************** 1. row ***************************
z_statistic: -4.6667
p_one_tail: 0.99999847
p_lower_tail: 0.00000153
conclusion: Reject H₀ — sufficient evidence that μ < 120
Z = −4.67, left-tail p < 0.000002. When the sample mean is below μ₀ the upper-tail p_one_tail is close to 1; subtracting from 1 gives the left-tail probability used for this test. The pharmacy’s claim is well supported — deliveries are significantly faster than the 120-minute benchmark.
Assembly Line Defect Rates — Two-Proportion Test
The two-proportion z-test asks whether two independent proportions differ. STATS_ZTEST handles one-sample mean tests; for proportions the z-statistic is derived directly from aggregate counts using the pooled formula z = (p̂A − p̂B) / √[p̂(1−p̂)(1/nA + 1/nB)].
Line A produced 18 defects in 200 units; Line B produced 25 defects in 600 units. Are the defect rates significantly different?
WITH
props AS (SELECT line, defects, n, defects * 1.0 / n AS prop FROM defect_counts),
pooled AS (SELECT SUM(defects) * 1.0 / SUM(n) AS p_pool FROM defect_counts),
z_calc AS (
SELECT
MAX(CASE WHEN p.line = 'A' THEN p.prop END) AS p_a,
MAX(CASE WHEN p.line = 'B' THEN p.prop END) AS p_b,
po.p_pool,
MAX(CASE WHEN p.line = 'A' THEN p.n END) AS n_a,
MAX(CASE WHEN p.line = 'B' THEN p.n END) AS n_b
FROM props p, pooled po GROUP BY po.p_pool
)
SELECT
ROUND(p_a, 4) AS defect_rate_a,
ROUND(p_b, 4) AS defect_rate_b,
ROUND(p_pool, 4) AS pooled_proportion,
ROUND((p_a - p_b) / SQRT(p_pool * (1-p_pool) * (1.0/n_a + 1.0/n_b)), 4) AS z_statistic,
IF(ABS((p_a - p_b) / SQRT(p_pool * (1-p_pool) * (1.0/n_a + 1.0/n_b))) > 1.96,
'Reject H₀ — defect rates differ significantly between lines',
'Retain H₀ — no significant difference in defect rates') AS conclusion
FROM z_calc\G
*************************** 1. row ***************************
defect_rate_a: 0.0900
defect_rate_b: 0.0417
pooled_proportion: 0.0538
z_statistic: 2.6246
conclusion: Reject H₀ — defect rates differ significantly between lines
|z| = 2.62 > 1.96 critical value at α = 0.05. Line A’s defect rate of 9.0% is significantly higher than Line B’s 4.2%. No extension function is needed here — the pooled proportion formula is expressed entirely in SQL aggregates.
When to Use the Z-Test
The Z-test is appropriate when:
- The population standard deviation σ is known (from a long-run process, regulatory standard, or national dataset)
- The sample is reasonably large (n ≥ 30 is the common rule of thumb) or the population is known to be normally distributed
When σ is unknown and must be estimated from the sample, use a t-test instead. The two-sample t-test covered in the earlier post addresses comparisons between two groups; the Z-test here addresses comparisons against a single known benchmark.
The extension is available at github.com/ronaldbradford/vsql-statistics and installs with a single statement:
INSTALL EXTENSION vsql_statistics;