Producing One-Sample Z-Test statistics with SQL

Ronald Bradford
June 17, 2026

The one-sample Z-test determines whether a sample mean differs significantly from a known population mean when the population standard deviation is also known. It is the appropriate test when the population parameters are established — quality control benchmarks, national averages, long-run process measurements — and you want to evaluate whether a new sample is consistent with them. The null hypothesis is that the sample was drawn from the known population.

Leveraging the VillageSQL Extension Bundle (VEB) , you can run a Z-test directly in SQL with STATS_ZTEST(value, mu, sigma). Pass mu (the known population mean) and sigma (the known population standard deviation) as constants on every row; the function accumulates the sample and returns JSON containing the Z-statistic, the upper one-tail p-value, and the two-tail p-value. VillageSQL is a drop-in MySQL 8.4 replacement.

Returns JSON: { "z": ..., "p_one_tail": ..., "p_two_tail": ... }

For directional (one-tailed) tests:

H₁: μ > μ₀ — use p_one_tail directly
H₁: μ < μ₀ — use 1 − p_one_tail

Soft Drink Fill Quality Control

Z-Test SQL Example

A production line targets 500 ml per bottle with a long-run process standard deviation of σ = 8 ml. A sample of 25 bottles is taken from a single run. The question is whether the line is filling correctly (two-tailed) and specifically whether it is systematically underfilling (lower one-tailed).

-- Sample descriptives
SELECT COUNT(*) AS n, ROUND(AVG(volume_ml), 2) AS sample_mean
FROM bottle_fill;

n    sample_mean
25   495.60

The sample mean is 4.4 ml below the 500 ml target. Whether that gap is statistically meaningful is what the Z-test determines.

WITH ztest AS (
    SELECT CAST(STATS_ZTEST(volume_ml, 500.0, 8.0) AS CHAR(300)) AS json
    FROM bottle_fill
)
SELECT
    ROUND(JSON_EXTRACT(json, '$.z'),          6) AS z_statistic,
    ROUND(JSON_EXTRACT(json, '$.p_one_tail'), 6) AS p_one_tail,
    ROUND(1 - JSON_EXTRACT(json, '$.p_one_tail'), 6) AS p_lower_tail,
    ROUND(JSON_EXTRACT(json, '$.p_two_tail'), 6) AS p_two_tail,
    IF(JSON_EXTRACT(json, '$.p_two_tail') < 0.05,
       'Reject H₀ — fill volume differs significantly from 500 ml target',
       'Retain H₀ — no significant deviation from 500 ml target') AS two_tail_conclusion,
    IF(1 - JSON_EXTRACT(json, '$.p_one_tail') < 0.05,
       'Reject H₀ — significant underfill detected (μ < 500)',
       'Retain H₀ — no significant underfill') AS lower_tail_conclusion
FROM ztest\G

Results

*************************** 1. row ***************************
          z_statistic: -2.75
           p_one_tail: 0.99702
         p_lower_tail: 0.00298
           p_two_tail: 0.00596
  two_tail_conclusion: Reject H₀ — fill volume differs significantly from 500 ml target
lower_tail_conclusion: Reject H₀ — significant underfill detected (μ < 500)

Z = −2.75, p (two-tail) = 0.006. We reject the null both ways: the fill volume differs significantly from target, and specifically the line is underfilling. A mean shortfall of 4.4 ml across 25 bottles is not random variation — the process needs adjustment.

Note the directionality: p_one_tail is the upper-tail probability P(Z > z), which at −2.75 is 0.997 — almost certain the true mean is not above 500 ml. The lower-tail probability 1 − p_one_tail = 0.003 is what we use to test for underfill.

School Reading Scores vs National Benchmark

The second example shows STATS_ZTEST used with GROUP BY to test multiple groups against the same population benchmark in a single query — a natural pattern for comparing sites, branches, or cohorts against a known standard.

The national reading benchmark is μ = 72, σ = 11. Three schools each have 30 students.

-- Sample descriptives per school
SELECT school, COUNT(*) AS n, ROUND(AVG(score), 2) AS sample_mean
FROM reading_scores
GROUP BY school
ORDER BY school;

school     n    sample_mean
Central    30   72.50
Highland   30   64.17
Riverside  30   77.30

WITH ztest AS (
    SELECT
        school,
        ROUND(AVG(score), 2)                               AS sample_mean,
        CAST(STATS_ZTEST(score, 72.0, 11.0) AS CHAR(300)) AS json
    FROM reading_scores
    GROUP BY school
)
SELECT
    school,
    sample_mean,
    ROUND(JSON_EXTRACT(json, '$.z'),          4) AS z_statistic,
    ROUND(JSON_EXTRACT(json, '$.p_two_tail'), 4) AS p_two_tail,
    CASE
        WHEN JSON_EXTRACT(json, '$.p_two_tail') < 0.05
         AND JSON_EXTRACT(json, '$.z') > 0
        THEN 'Significantly above national average'
        WHEN JSON_EXTRACT(json, '$.p_two_tail') < 0.05
         AND JSON_EXTRACT(json, '$.z') < 0
        THEN 'Significantly below national average'
        ELSE 'No significant difference from national average'
    END AS conclusion
FROM ztest
ORDER BY school\G

Results

*************************** 1. row ***************************
     school: Central
sample_mean: 72.5
z_statistic: 0.2490
 p_two_tail: 0.8034
 conclusion: No significant difference from national average
*************************** 2. row ***************************
     school: Highland
sample_mean: 64.17
z_statistic: -3.9004
 p_two_tail: 0.0001
 conclusion: Significantly below national average
*************************** 3. row ***************************
     school: Riverside
sample_mean: 77.3
z_statistic: 2.6390
 p_two_tail: 0.0083
 conclusion: Significantly above national average

Central (Z = 0.25, p = 0.80) is indistinguishable from the national average — its 0.5-point gap above 72 is well within chance variation. Riverside (Z = 2.64, p = 0.008) performs significantly above average. Highland (Z = −3.90, p < 0.001) is significantly below — a 7.8-point deficit that is extremely unlikely to be random. The GROUP BY pattern means all three tests run in a single pass over the table.

Battery Lifetime Quality Control

Battery / IQ Z-Test SQL Example

These two examples, adapted from Statistics By Jim , reinforce the directionality patterns for left-tailed and two-tailed tests.

A manufacturer claims their batteries last μ₀ = 500 hours with a known process σ = 40 hours. A QA engineer samples 30 batteries from a suspect batch and tests whether it falls short of specification — a left-tailed test.

WITH ztest AS (
    SELECT CAST(STATS_ZTEST(hours, 500.0, 40.0) AS CHAR(300)) AS json
    FROM battery_lifetimes
),
result AS (
    SELECT COUNT(*) AS n, ROUND(AVG(hours), 4) AS sample_mean,
           500.0 AS claimed_mean, 40.0 AS known_sigma
    FROM battery_lifetimes
)
SELECT
    r.n, r.sample_mean, r.claimed_mean, r.known_sigma,
    ROUND(JSON_EXTRACT(z.json, '$.z'),          4) AS z_statistic,
    ROUND(1 - JSON_EXTRACT(z.json, '$.p_one_tail'), 6) AS p_left_tail,
    ROUND(JSON_EXTRACT(z.json, '$.p_two_tail'), 6) AS p_two_tail,
    IF(1 - JSON_EXTRACT(z.json, '$.p_one_tail') < 0.05,
       'Reject H₀ — batch lifetime IS significantly below 500 hrs',
       'Retain H₀ — no significant evidence batch falls short') AS conclusion
FROM result r, ztest z\G

Results

*************************** 1. row ***************************
           n: 30
 sample_mean: 480.3333
claimed_mean: 500.0
 known_sigma: 40.0
 z_statistic: -2.693
 p_left_tail: 0.003541
  p_two_tail: 0.007082
  conclusion: Reject H₀ — batch lifetime IS significantly below 500 hrs

Z = −2.69, left-tail p = 0.0035. The batch mean of 480.3 hours is significantly below the claimed 500-hour specification. The p_one_tail returned by the function is the upper-tail probability P(Z > z) = 0.997 — near-certain the mean is not above 500. Subtracting from 1 gives the left-tail p-value used for the underfill test.

IQ Enrichment Programme

A researcher tests 40 graduates from an enrichment programme against the standardised population mean of μ₀ = 100, σ = 15. The research question is whether the programme changes IQ in either direction — a two-tailed test.

WITH ztest AS (
    SELECT CAST(STATS_ZTEST(iq, 100.0, 15.0) AS CHAR(300)) AS json
    FROM iq_scores
),
result AS (
    SELECT COUNT(*) AS n, ROUND(AVG(iq), 4) AS sample_mean,
           100.0 AS population_mean, 15.0 AS known_sigma
    FROM iq_scores
)
SELECT
    r.n, r.sample_mean, r.population_mean, r.known_sigma,
    ROUND(JSON_EXTRACT(z.json, '$.z'),          4) AS z_statistic,
    ROUND(JSON_EXTRACT(z.json, '$.p_one_tail'), 6) AS p_one_tail_upper,
    ROUND(JSON_EXTRACT(z.json, '$.p_two_tail'), 6) AS p_two_tail,
    IF(JSON_EXTRACT(z.json, '$.p_two_tail') < 0.05,
       'Reject H₀ — programme graduates score significantly differently from μ=100',
       'Retain H₀ — no significant difference from population mean') AS conclusion
FROM result r, ztest z\G

Results

*************************** 1. row ***************************
               n: 40
     sample_mean: 106.1
 population_mean: 100.0
     known_sigma: 15.0
     z_statistic: 2.572
p_one_tail_upper: 0.005056
      p_two_tail: 0.010112
      conclusion: Reject H₀ — programme graduates score significantly differently from μ=100

Z = 2.57, p (two-tail) = 0.010. The programme graduates score 6.1 points above the population mean — a statistically significant difference at α = 0.05. Since the sample mean is above μ₀ the upper-tail probability p_one_tail is already small (0.005); for a two-tailed test p_two_tail is the correct value to compare against α.

Smartphone Battery Life

GeeksForGeeks Z-Test SQL Example

These two examples are adapted from GeeksForGeeks and cover a large-sample one-sample test and a two-sample z-test.

A manufacturer claims their smartphones deliver μ₀ = 12.0 hours battery life, with a known production σ = 0.5 hours. A consumer group tests 100 devices and finds a mean of 11.8 hours — a two-tailed test of whether the claim holds.

WITH ztest AS (
    SELECT CAST(STATS_ZTEST(battery_hr, 12.0, 0.5) AS CHAR(300)) AS json
    FROM smartphone_battery
),
summary AS (
    SELECT COUNT(*) AS n, ROUND(AVG(battery_hr), 4) AS sample_mean
    FROM smartphone_battery
)
SELECT
    s.n, s.sample_mean,
    12.0 AS claimed_mean_mu0, 0.5 AS known_sigma,
    ROUND(JSON_EXTRACT(z.json, '$.z'),          4) AS z_statistic,
    ROUND(JSON_EXTRACT(z.json, '$.p_two_tail'), 6) AS p_two_tail,
    IF(JSON_EXTRACT(z.json, '$.p_two_tail') < 0.05,
       'Reject H₀ — actual battery life differs from 12 hrs',
       'Retain H₀ — no significant difference from 12 hrs') AS conclusion
FROM summary s, ztest z\G

Results

*************************** 1. row ***************************
               n: 100
     sample_mean: 11.8
claimed_mean_mu0: 12.0
     known_sigma: 0.5
     z_statistic: -4
      p_two_tail: 0.000063
      conclusion: Reject H₀ — actual battery life differs from 12 hrs

Z = −4.0, p (two-tail) < 0.0001. With 100 observations a 0.2-hour shortfall against a σ of 0.5 hours produces a very strong signal — the manufacturer’s claim is not supported by the test data.

Online vs Offline Class Scores (Two-Sample Z-Test)

STATS_ZTEST is a one-sample aggregate. When both population standard deviations are known, a two-sample z-test can be derived analytically in SQL from GROUP BY summary statistics using the formula z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂).

Group A (offline classroom): n = 50, σ₁ = 10 (known). Group B (online course): n = 60, σ₂ = 12 (known). The question is whether teaching mode affects scores — a two-tailed test.

WITH group_stats AS (
    SELECT
        MAX(CASE WHEN class_type = 'offline' THEN mean_score END) AS xbar_offline,
        MAX(CASE WHEN class_type = 'online'  THEN mean_score END) AS xbar_online,
        MAX(CASE WHEN class_type = 'offline' THEN n          END) AS n_offline,
        MAX(CASE WHEN class_type = 'online'  THEN n          END) AS n_online
    FROM (
        SELECT class_type, COUNT(*) AS n, AVG(score) AS mean_score
        FROM class_scores GROUP BY class_type
    ) sub
),
two_sample AS (
    SELECT
        xbar_offline, xbar_online, n_offline, n_online,
        10.0 AS sigma_offline, 12.0 AS sigma_online,
        (xbar_offline - xbar_online) /
            SQRT(POW(10.0, 2) / n_offline + POW(12.0, 2) / n_online) AS z
    FROM group_stats
)
SELECT
    ROUND(xbar_offline, 2) AS mean_offline,
    ROUND(xbar_online,  2) AS mean_online,
    n_offline, n_online, sigma_offline, sigma_online,
    ROUND(z, 4) AS z_statistic,
    IF(ABS(z) > 1.96,
       'Reject H₀ — online and offline scores differ significantly (α=0.05)',
       'Retain H₀ — no significant difference between teaching modes') AS conclusion
FROM two_sample\G

Results

*************************** 1. row ***************************
     mean_offline: 75
      mean_online: 80
        n_offline: 50
         n_online: 60
    sigma_offline: 10.0
     sigma_online: 12.0
      z_statistic: -2.3837
       conclusion: Reject H₀ — online and offline scores differ significantly (α=0.05)

Z = −2.38, |z| > 1.96 critical value at α = 0.05. The online group’s 5-point mean advantage over the offline group is statistically significant. Note that this analytical two-sample approach requires the population standard deviations to be genuinely known in advance — if they are only estimated from the sample, a two-sample t-test is the correct choice.

Student Exam Scores — Right-Tailed Test

Cuemath Z-Test SQL Example

These three examples are adapted from Cuemath and cover all three directional cases: right-tailed, left-tailed, and a two-proportion test.

A teacher claims their class outperforms the school average of μ₀ = 82 with σ = 20 known from historical data. A sample of 81 students has a mean of 90 — tested right-tailed since the teacher’s claim is specifically that scores are above average.

WITH ztest AS (
    SELECT CAST(STATS_ZTEST(score, 82.0, 20.0) AS CHAR(300)) AS json
    FROM exam_scores
)
SELECT
    ROUND(JSON_EXTRACT(json, '$.z'),          4) AS z_statistic,
    ROUND(JSON_EXTRACT(json, '$.p_one_tail'), 6) AS p_one_tail,
    IF(JSON_EXTRACT(json, '$.p_one_tail') < 0.05,
       'Reject H₀ — sufficient evidence that μ > 82',
       'Retain H₀ — insufficient evidence that μ > 82') AS conclusion
FROM ztest\G

*************************** 1. row ***************************
z_statistic: 3.6
 p_one_tail: 0.000159
 conclusion: Reject H₀ — sufficient evidence that μ > 82

Z = 3.6, p (upper tail) < 0.001. When the sample mean is above μ₀ the upper-tail p_one_tail is small and used directly for a right-tailed test. The class does perform significantly above the school average.

Medicine Delivery Times — Left-Tailed Test

An online pharmacy claims deliveries take under 120 minutes on average. σ = 30 minutes is known from long-run data, and 49 orders are sampled with a mean of 100 minutes. The test is left-tailed: we only care whether the true mean is below the claim.

WITH ztest AS (
    SELECT CAST(STATS_ZTEST(minutes, 120.0, 30.0) AS CHAR(300)) AS json
    FROM delivery_times
)
SELECT
    ROUND(JSON_EXTRACT(json, '$.z'),                      4) AS z_statistic,
    ROUND(JSON_EXTRACT(json, '$.p_one_tail'),             8) AS p_one_tail,
    ROUND(1 - JSON_EXTRACT(json, '$.p_one_tail'),         8) AS p_lower_tail,
    IF(1 - JSON_EXTRACT(json, '$.p_one_tail') < 0.05,
       'Reject H₀ — sufficient evidence that μ < 120',
       'Retain H₀ — insufficient evidence that μ < 120') AS conclusion
FROM ztest\G

*************************** 1. row ***************************
 z_statistic: -4.6667
  p_one_tail: 0.99999847
p_lower_tail: 0.00000153
  conclusion: Reject H₀ — sufficient evidence that μ < 120

Z = −4.67, left-tail p < 0.000002. When the sample mean is below μ₀ the upper-tail p_one_tail is close to 1; subtracting from 1 gives the left-tail probability used for this test. The pharmacy’s claim is well supported — deliveries are significantly faster than the 120-minute benchmark.

Assembly Line Defect Rates — Two-Proportion Test

The two-proportion z-test asks whether two independent proportions differ. STATS_ZTEST handles one-sample mean tests; for proportions the z-statistic is derived directly from aggregate counts using the pooled formula z = (p̂A − p̂B) / √[p̂(1−p̂)(1/nA + 1/nB)].

Line A produced 18 defects in 200 units; Line B produced 25 defects in 600 units. Are the defect rates significantly different?

WITH
props  AS (SELECT line, defects, n, defects * 1.0 / n AS prop FROM defect_counts),
pooled AS (SELECT SUM(defects) * 1.0 / SUM(n) AS p_pool FROM defect_counts),
z_calc AS (
    SELECT
        MAX(CASE WHEN p.line = 'A' THEN p.prop END) AS p_a,
        MAX(CASE WHEN p.line = 'B' THEN p.prop END) AS p_b,
        po.p_pool,
        MAX(CASE WHEN p.line = 'A' THEN p.n   END) AS n_a,
        MAX(CASE WHEN p.line = 'B' THEN p.n   END) AS n_b
    FROM props p, pooled po GROUP BY po.p_pool
)
SELECT
    ROUND(p_a,    4) AS defect_rate_a,
    ROUND(p_b,    4) AS defect_rate_b,
    ROUND(p_pool, 4) AS pooled_proportion,
    ROUND((p_a - p_b) / SQRT(p_pool * (1-p_pool) * (1.0/n_a + 1.0/n_b)), 4) AS z_statistic,
    IF(ABS((p_a - p_b) / SQRT(p_pool * (1-p_pool) * (1.0/n_a + 1.0/n_b))) > 1.96,
       'Reject H₀ — defect rates differ significantly between lines',
       'Retain H₀ — no significant difference in defect rates') AS conclusion
FROM z_calc\G

*************************** 1. row ***************************
    defect_rate_a: 0.0900
    defect_rate_b: 0.0417
pooled_proportion: 0.0538
      z_statistic: 2.6246
       conclusion: Reject H₀ — defect rates differ significantly between lines

|z| = 2.62 > 1.96 critical value at α = 0.05. Line A’s defect rate of 9.0% is significantly higher than Line B’s 4.2%. No extension function is needed here — the pooled proportion formula is expressed entirely in SQL aggregates.

When to Use the Z-Test

The Z-test is appropriate when:

The population standard deviation σ is known (from a long-run process, regulatory standard, or national dataset)
The sample is reasonably large (n ≥ 30 is the common rule of thumb) or the population is known to be normally distributed

When σ is unknown and must be estimated from the sample, use a t-test instead. The two-sample t-test covered in the earlier post addresses comparisons between two groups; the Z-test here addresses comparisons against a single known benchmark.

The extension is available at github.com/ronaldbradford/vsql-statistics and installs with a single statement:

INSTALL EXTENSION vsql_statistics;

Tagged with: MySQL VillageSQL Extensions

Producing Skewness statistics with SQL

Ronald Bradford
June 14, 2026

Skewness measures the asymmetry of a distribution. A perfectly symmetric distribution has a skewness of zero. A positive skew (right-skewed) means the tail extends to the right — a small number of high values pull the mean above the median.

Producing Chi-Squared statistics with SQL

Ronald Bradford
June 12, 2026

The Chi-Squared test is one of the most widely used statistical tests for categorical data. It comes in two flavors: the goodness-of-fit test asks whether an observed frequency distribution matches an expected one, while the test of independence asks whether two categorical variables are associated with each other.

Producing Two Sample T-Test statistics with SQL

Ronald Bradford
June 8, 2026

The two sample t-test for equal variance is a statistical test to determine if the means of two groups are different enough that the difference is likely caused by some underlying difference, rather than random chance.