t-test vs. Mann-Whitney U: when to use each
You have two independent groups and a continuous outcome. Should you use an unpaired t-test or a Mann-Whitney U test? This is one of the most frequent decisions in applied statistics, and the answer depends on whether your data meets the assumptions of the parametric test.
The short answer
| Condition | Use this test |
|---|---|
| Data is approximately normal in each group, similar variances | Unpaired t-test |
| Unequal variances but approximately normal | Welch's t-test (the default in R's t.test; an option in most other software) |
| Data is not normal (skewed, outliers, ordinal) | Mann-Whitney U test |
| Large samples (n > 30 per group), moderate non-normality | Either — t-test is robust here |
What each test does
Unpaired t-test
The unpaired (independent samples) t-test compares the means of two groups. It assumes:
- Independence — observations in the two groups are independent
- Normality — the outcome variable is approximately normally distributed in each group
- Equal variances — the variances in the two groups are similar (relaxed by Welch's correction)
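When these assumptions hold, the t-test is the more powerful of the two tests for detecting a difference in means, meaning it has the best chance of finding a real effect. The margin is small, though: with normal data, Mann-Whitney is asymptotically about 95% as efficient.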
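As a minimal sketch, both variants of the unpaired t-test can be run in Python with SciPy's `scipy.stats.ttest_ind`. The scores below are invented for illustration and do not come from any study; `equal_var=False` is what requests Welch's correction.

```python
from scipy import stats

# Hypothetical scores, invented for illustration only
treatment = [24, 27, 19, 30, 22, 25, 28, 21]
control = [18, 20, 15, 22, 17, 19, 23, 16]

# Student's t-test: pools the two variances (assumes they are similar)
student = stats.ttest_ind(treatment, control, equal_var=True)

# Welch's t-test: does not assume equal variances
welch = stats.ttest_ind(treatment, control, equal_var=False)

print(f"Student: t = {student.statistic:.2f}, p = {student.pvalue:.4f}")
print(f"Welch:   t = {welch.statistic:.2f}, p = {welch.pvalue:.4f}")
```

With equal group sizes, as here, the two t statistics coincide and only the degrees of freedom (and so the p-values) differ. The gap between the two tests grows when group sizes and variances are both unequal.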
Mann-Whitney U test
The Mann-Whitney U test compares the distributions (more precisely, the ranks) of two independent groups. It only assumes:
- Independence — observations are independent
- Ordinal data — the outcome can be ranked (which all continuous data can)
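It does not assume normality, which makes it the safer choice when normality is in doubt. It is not assumption-free, though: a significant result can be read as a difference in medians only if the two distributions have similar shape and spread.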
Key distinction: The t-test compares means. The Mann-Whitney U tests whether one group tends to have larger values than the other. If both distributions have the same shape, these answer the same question. If the distributions differ in shape, they can give different answers — and the Mann-Whitney may be more meaningful.
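To make "tends to have larger values" concrete, the sketch below runs `scipy.stats.mannwhitneyu` and also counts, over all treatment-control pairs, how often the treatment value is the larger one (ties count as half). The data are invented for illustration, and the variable names are hypothetical.

```python
import numpy as np
from scipy import stats

# Hypothetical scores, invented for illustration only
treatment = np.array([24, 27, 19, 30, 22, 25, 28, 21])
control = np.array([18, 20, 15, 22, 17, 19, 23, 16])

result = stats.mannwhitneyu(treatment, control, alternative="two-sided")

# Compare every treatment value with every control value.
diffs = treatment[:, None] - control[None, :]
wins = np.sum(diffs > 0) + 0.5 * np.sum(diffs == 0)  # ties count as half
prob_superiority = wins / diffs.size

print(f"U = {result.statistic}, p = {result.pvalue:.4f}")
print(f"P(treatment > control) ~ {prob_superiority:.3f}")
```

In recent SciPy versions the reported U is the statistic for the first sample, so U divided by the number of pairs should equal this proportion. Older versions used a different convention, so treat that identity as an assumption to check against your installed version.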
When to choose each test
Use the t-test when:
- Your data passes the Shapiro-Wilk normality test in each group (or Q-Q plots look approximately linear)
- Your sample size is large enough (>30 per group) that the Central Limit Theorem provides robustness
- You specifically care about comparing means
- You want maximum statistical power for detecting a mean difference
Use Mann-Whitney U when:
- Shapiro-Wilk rejects normality and your sample is small
- Your data is ordinal (e.g., Likert scale, pain ratings)
- Your data has pronounced outliers that would inflate the mean
- The distributions are heavily skewed, so the median describes a typical value better than the mean
- You want a test that does not rely on the normality assumption
A common misconception
Many researchers believe that if the Shapiro-Wilk test is significant, they must use Mann-Whitney. This is too rigid. Consider:
- With large samples, Shapiro-Wilk is overpowered — it rejects normality for trivial departures that don't affect the t-test
- The t-test is robust to moderate non-normality when sample sizes are equal and reasonably large
- Look at the Q-Q plot: if the points are roughly on the line with minor wobbles, the t-test is likely fine
The decision should be based on the severity of the violation, not just whether Shapiro-Wilk's p-value is below .05.
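One way to look at severity rather than the p-value alone is to put the Shapiro-Wilk result next to the straightness of the Q-Q plot. The sketch below uses `scipy.stats.shapiro` and `scipy.stats.probplot`; the helper name `normality_summary` and the simulated, mildly skewed gamma data are assumptions made for illustration.

```python
import numpy as np
from scipy import stats

def normality_summary(x):
    """Return Shapiro-Wilk results plus the Q-Q plot correlation."""
    w, p = stats.shapiro(x)
    # probplot returns the Q-Q points and a least-squares fit; r near 1
    # means the points lie close to a straight line.
    (_, _), (_, _, r) = stats.probplot(x, dist="norm")
    return {"n": len(x), "W": float(w), "p": float(p), "qq_r": float(r)}

rng = np.random.default_rng(0)  # fixed seed so the sketch is reproducible

# Simulated, mildly skewed data (gamma with a large shape parameter)
for n in (30, 3000):
    sample = rng.gamma(shape=50.0, scale=1.0, size=n)
    s = normality_summary(sample)
    print(f"n = {s['n']:>4}: W = {s['W']:.3f}, p = {s['p']:.4f}, Q-Q r = {s['qq_r']:.3f}")
```

In simulations like this, the larger sample often produces a much smaller Shapiro-Wilk p-value even though the Q-Q correlation stays close to 1. Exact numbers depend on the seed, so read the output as an illustration of the pattern, not a benchmark.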
Effect sizes
Both tests have appropriate effect size measures:
| Test | Effect size | Interpretation |
|---|---|---|
| t-test | Cohen's d | Small: 0.20, Medium: 0.50, Large: 0.80 |
| Mann-Whitney U | Rank-biserial r | Small: 0.10, Medium: 0.30, Large: 0.50 |
Always report an effect size alongside the p-value. A statistically significant result with a tiny effect size may not be practically meaningful. See our effect sizes guide for more detail.
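Both effect sizes in the table can be computed by hand. The sketch below uses only the Python standard library; the function names `cohens_d` and `rank_biserial` are hypothetical helpers, Cohen's d is computed with the pooled standard deviation (one common definition), and the data are invented for illustration.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(x, y):
    """Cohen's d using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / sqrt(pooled_var)

def rank_biserial(x, y):
    """Rank-biserial r: share of pairs where x wins minus share where y wins."""
    wins = sum(xi > yi for xi in x for yi in y)
    losses = sum(xi < yi for xi in x for yi in y)
    return (wins - losses) / (len(x) * len(y))

# Hypothetical scores, invented for illustration only
treatment = [24, 27, 19, 30, 22, 25, 28, 21]
control = [18, 20, 15, 22, 17, 19, 23, 16]

print(f"Cohen's d       = {cohens_d(treatment, control):.2f}")
print(f"Rank-biserial r = {rank_biserial(treatment, control):.2f}")
```

Both measures are signed: swapping the two groups flips the sign, so state which group is the reference when reporting them.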
How to report each test in APA format
Unpaired t-test
An independent-samples t-test indicated that the treatment group (M = 23.4, SD = 5.1) scored significantly higher than the control group (M = 18.7, SD = 4.8), t(48) = 3.45, p = .001, d = 0.97, 95% CI [0.38, 1.56].
Mann-Whitney U test
A Mann-Whitney U test indicated that pain ratings were significantly lower in the treatment group (Mdn = 3) than in the control group (Mdn = 5), U = 156, p = .003, r = .45.
Note: for Mann-Whitney, report medians (not means), since the test is based on ranks.
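If you generate these strings from code, a small formatter keeps the style consistent. The sketch below is one possible approach, and the helper names (`no_leading_zero`, `apa_p`, `apa_t`, `apa_u`) are hypothetical. It drops the leading zero for values that cannot exceed 1 (p and r) and keeps it for t and d, matching the examples above.

```python
def no_leading_zero(x, digits):
    """Format x with fixed decimals and drop a leading zero (0.45 -> .45)."""
    s = f"{x:.{digits}f}"
    if s.startswith("0."):
        return s[1:]
    if s.startswith("-0."):
        return "-" + s[2:]
    return s

def apa_p(p):
    """Report very small p-values as 'p < .001', otherwise to three decimals."""
    if p < 0.001:
        return "p < .001"
    return "p = " + no_leading_zero(p, 3)

def apa_t(t, df, p, d):
    """Statistics string for an unpaired t-test."""
    return f"t({df}) = {t:.2f}, {apa_p(p)}, d = {d:.2f}"

def apa_u(u, p, r):
    """Statistics string for a Mann-Whitney U test."""
    return f"U = {u:g}, {apa_p(p)}, r = {no_leading_zero(r, 2)}"

# Values taken from the two example sentences above
print(apa_t(3.45, 48, 0.001, 0.97))
print(apa_u(156, 0.003, 0.45))
```

The descriptive statistics (M, SD, Mdn) and the confidence interval still have to be written into the sentence by hand.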
Decision checklist
- Check normality in each group (Shapiro-Wilk + Q-Q plot)
- If normal in both groups: unpaired t-test
- If normality fails but n > 30 per group and violation is moderate: t-test is likely fine — report the violation in your methods
- If normality fails with small samples, strong skew, or ordinal data: Mann-Whitney U
- If in doubt: report both tests. If they agree, it strengthens your conclusion
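The checklist can be sketched as a small helper. This is an illustration under stated assumptions, not a definitive rule: the function name `choose_test`, its parameters, and in particular the cutoff used for a "moderate" violation (absolute sample skewness below 1) are choices made for this example. It covers only the numeric part of the checklist and does not replace looking at the Q-Q plot.

```python
from scipy import stats

def choose_test(x, y, ordinal=False, alpha=0.05, large_n=30, max_abs_skew=1.0):
    """Suggest a test following the checklist; thresholds are assumptions."""
    if ordinal:
        return "mann-whitney"

    # Normality check in each group (Shapiro-Wilk)
    normal = all(stats.shapiro(g).pvalue > alpha for g in (x, y))
    if normal:
        return "t-test"

    # Normality rejected: is the sample large and the violation moderate?
    big = min(len(x), len(y)) > large_n
    moderate = all(abs(stats.skew(g)) < max_abs_skew for g in (x, y))
    if big and moderate:
        return "t-test (report the normality violation)"

    return "mann-whitney"

# Hypothetical data: the first group contains one extreme outlier
group_a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 200]
group_b = [3, 5, 4, 6, 5, 7, 4, 6, 5, 5]
print(choose_test(group_a, group_b))
```

When the suggestion is borderline, the last item of the checklist still applies: run both tests and report whether they agree.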
Once you've decided which test to use, calculate the sample size you need with our free power analysis calculator — it supports both t-tests and non-parametric alternatives.
Join the beta to try this in GraphHelix — the AI checks normality and equal variances automatically, and suggests switching to Mann-Whitney U when assumptions are violated.