How to check normality assumptions
Most parametric statistical tests — t-tests, ANOVA, Pearson correlation, linear regression — assume that the data (or the residuals) follow a normal distribution. Violating this assumption can lead to inaccurate p-values and unreliable confidence intervals.
This guide covers how to check normality, how to interpret the results, and what to do when your data isn't normal.
Why normality matters
Parametric tests calculate p-values based on the assumption that the sampling distribution of the test statistic is known. When data is normally distributed, these calculations are exact. When data is severely non-normal — especially with small samples — the p-values can be misleading.
That said, normality is often misunderstood. The assumption is about the residuals or sampling distribution, not the raw data. With large samples (roughly n > 30 per group), the Central Limit Theorem means that parametric tests are robust to moderate departures from normality.
Practical rule of thumb: With sample sizes above 30 per group, moderate skewness is usually fine for t-tests and ANOVA. With small samples (n < 15), normality matters more, and you should check carefully.
Method 1: Shapiro-Wilk test
The Shapiro-Wilk test is the most widely recommended formal test for normality. It tests the null hypothesis that the data came from a normal distribution.
- If p > .05: no evidence against normality (proceed with parametric test)
- If p ≤ .05: evidence that the data departs from normality (consider a non-parametric alternative)
How to interpret Shapiro-Wilk results
Example result: W = 0.96, p = .42
A W statistic close to 1.0 indicates the data is approximately normal. Here, p = .42 is well above .05, so there's no evidence against normality.
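In Python, running the test takes one call to SciPy. A minimal sketch with simulated data (the sample values and variable names here are illustrative, not from a real study):

```python
import numpy as np
from scipy import stats

# Simulated sample; replace with your own measurements
rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=10, size=40)

# Shapiro-Wilk: H0 = the sample comes from a normal distribution
result = stats.shapiro(sample)
print(f"W = {result.statistic:.3f}, p = {result.pvalue:.3f}")

if result.pvalue > 0.05:
    print("No evidence against normality -> parametric test is reasonable")
else:
    print("Departure from normality -> consider a non-parametric alternative")
```

The same pattern works for residuals from a regression or ANOVA: pass the residual vector to `stats.shapiro` instead of the raw sample.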
Limitations of Shapiro-Wilk
- Overpowered with large samples. When n > 5,000, Shapiro-Wilk almost always rejects the null hypothesis, even for trivially small departures from normality that won't affect your analysis. In this case, rely on Q-Q plots instead.
- Underpowered with small samples. When n < 10, the test may fail to detect genuine non-normality. Visual inspection (Q-Q plot) is especially important here.
Method 2: Q-Q plots (visual inspection)
A quantile-quantile (Q-Q) plot graphs the quantiles of your data against the theoretical quantiles of a normal distribution. If your data is normal, the points fall approximately along a straight diagonal line.
Reading a Q-Q plot
- Points on the line → Data is approximately normal
- S-shaped curve → Data has heavy tails (leptokurtic) or light tails (platykurtic)
- Banana shape curving upward → Right-skewed data
- Banana shape curving downward → Left-skewed data
- Isolated points far from the line → Outliers
Q-Q plots are often more informative than formal tests because they show how the data departs from normality — which helps you choose the right response.
Method 3: Descriptive indicators
Skewness and kurtosis values provide numerical summaries of distribution shape:
- Skewness: Values between −1 and +1 suggest approximate symmetry
- Kurtosis: Values between −2 and +2 suggest approximate normality (some sources use −1 to +1 for excess kurtosis)
These are supplementary — use them alongside Shapiro-Wilk and Q-Q plots, not as the sole check.
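Both indicators are one-liners in SciPy. Note that `stats.kurtosis` reports *excess* kurtosis by default (a normal distribution scores 0), which matches the −2 to +2 guideline above. A sketch contrasting a symmetric and a right-skewed sample (both simulated):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
symmetric = rng.normal(size=500)
skewed = rng.exponential(size=500)  # right-skewed by construction

# stats.kurtosis returns excess kurtosis (normal distribution = 0)
for name, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(f"{name}: skewness = {stats.skew(data):+.2f}, "
          f"excess kurtosis = {stats.kurtosis(data):+.2f}")
```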
What to do when normality fails
When your data isn't normally distributed, you have several options:
Option 1: Switch to a non-parametric test
This is the most common and safest approach. Non-parametric tests don't assume normality:
| Parametric test | Non-parametric alternative |
|---|---|
| Unpaired t-test | Mann-Whitney U test |
| Paired t-test | Wilcoxon signed-rank test |
| One-way ANOVA | Kruskal-Wallis H test |
| Repeated measures ANOVA | Friedman test |
| Pearson correlation | Spearman correlation |
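In practice, swapping to the non-parametric alternative is usually a one-line change. A hedged sketch for the first row of the table — replacing an unpaired t-test with Mann-Whitney U — using simulated skewed data (e.g., reaction times):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Right-skewed outcome in two independent groups (simulated)
control = rng.lognormal(mean=0.0, sigma=0.5, size=30)
treatment = rng.lognormal(mean=0.3, sigma=0.5, size=30)

# Mann-Whitney U replaces the unpaired t-test when normality fails
u_stat, p_value = stats.mannwhitneyu(control, treatment,
                                     alternative="two-sided")
print(f"U = {u_stat:.1f}, p = {p_value:.4f}")
```

The other rows follow the same pattern: `stats.wilcoxon` for paired data, `stats.kruskal` for three or more groups, `stats.friedmanchisquare` for repeated measures, and `stats.spearmanr` for correlation.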
Option 2: Transform the data
Log, square root, or inverse transformations can sometimes normalize skewed data. However, this changes the scale of your outcome, making interpretation harder. Modern practice increasingly favors non-parametric tests over transformations.
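If you do transform, re-check normality on the transformed values. A sketch with simulated right-skewed, strictly positive data (a log transform requires positive values):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
skewed = rng.lognormal(mean=2.0, sigma=0.6, size=40)  # positive, right-skewed

# Log transform: only valid for strictly positive data
transformed = np.log(skewed)

print(f"before: skew = {stats.skew(skewed):+.2f}, "
      f"Shapiro p = {stats.shapiro(skewed).pvalue:.3f}")
print(f"after:  skew = {stats.skew(transformed):+.2f}, "
      f"Shapiro p = {stats.shapiro(transformed).pvalue:.3f}")
```

Remember that any test you run afterward is about the transformed scale (e.g., mean log-value), which is what makes interpretation harder.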
Option 3: Proceed with caution (large samples)
If your sample is large (roughly n > 30 per group) and the departure from normality is moderate, parametric tests are often robust enough. Report that you checked normality, note the violation, and mention it in your limitations.
When to check normality
Check normality at the right level depending on your test:
- t-tests: Check normality in each group separately (unpaired) or normality of the differences (paired)
- ANOVA: Check normality of the residuals, or normality within each group
- Linear regression: Check normality of the residuals (not the raw variables)
- Pearson correlation: Check normality of both variables
Common mistake: Testing normality on the combined data instead of per group. If you have a treatment and control group, run Shapiro-Wilk on each group separately. The combined distribution may look non-normal even when each group is normal.
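The per-group point is easy to demonstrate: two perfectly normal groups with different means produce a bimodal pooled sample that a normality test will flag. A sketch with simulated groups:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
# Each group is drawn from a normal distribution, but the pooled
# sample is bimodal because the group means differ
control = rng.normal(loc=0, scale=1, size=50)
treatment = rng.normal(loc=4, scale=1, size=50)

for name, group in [("control", control), ("treatment", treatment)]:
    p = stats.shapiro(group).pvalue
    print(f"{name}: Shapiro-Wilk p = {p:.3f}")

pooled_p = stats.shapiro(np.concatenate([control, treatment])).pvalue
print(f"pooled: Shapiro-Wilk p = {pooled_p:.4f}")
```

The pooled p-value comes out far smaller than either per-group p-value, which is exactly the mistake to avoid.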
Reporting normality checks
In your methods section, briefly note how you checked assumptions:
Example (assumption satisfied): Normality was assessed using the Shapiro-Wilk test and visual inspection of Q-Q plots. Both groups satisfied the normality assumption (W > 0.94, p > .05). Homogeneity of variances was confirmed by Levene's test, F(1, 48) = 0.82, p = .370.
Example (assumption violated): The Shapiro-Wilk test indicated departure from normality in the treatment group (W = 0.87, p = .012). Accordingly, a Mann-Whitney U test was used instead of an independent t-test.
Planning your study? Use our free sample size calculator to determine how many participants you need — adequate sample size improves both the reliability of normality tests and the robustness of parametric tests to non-normality.
Join the beta to try this in GraphHelix — upload your data, and the AI will check normality (Shapiro-Wilk + Q-Q plots) automatically before every parametric test.