Which statistical test should I use?
Choosing the right statistical test is one of the most common challenges in research. The decision depends on your research question, your variable types, and how many groups you're comparing.
Use the interactive flowchart below to find the right test, or read the full guide to understand the reasoning behind each recommendation.
Interactive test selector
Answer each question to narrow down to the right test. The flowchart covers the most common research scenarios.
What is your research goal?
What are you trying to find out?
How many groups are you comparing?
One-sample t-test
Compare your sample mean to a hypothetical or population value. If your data is not normally distributed, consider the Wilcoxon signed-rank test as a non-parametric alternative.
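As a minimal sketch with scipy.stats (all numbers are invented for illustration), a one-sample t-test against a hypothesized mean of 100, with the Wilcoxon signed-rank fallback:

```python
import numpy as np
from scipy import stats

# Hypothetical sample compared against a reference value of 100.
sample = np.array([102.1, 98.4, 105.3, 99.7, 101.2, 103.8, 97.5, 104.6])

# One-sample t-test against the hypothesized population mean.
t_stat, p_value = stats.ttest_1samp(sample, popmean=100)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")

# Non-parametric fallback: Wilcoxon signed-rank on deviations from 100.
w_stat, w_p = stats.wilcoxon(sample - 100)
print(f"W = {w_stat:.3f}, p = {w_p:.3f}")
```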
Are the observations paired or independent?
Paired: same subjects measured twice (before/after). Independent: different subjects in each group.
Is the difference between pairs normally distributed?
Run a Shapiro-Wilk test on the differences. See our normality guide.
Paired t-test
Compares the means of two related groups. Reports t, df, p, and Cohen's d. For a Bayesian approach, use a Bayesian paired t-test to get a Bayes factor (BF10).
Wilcoxon signed-rank test
Non-parametric alternative to the paired t-test. Does not assume normality. Reports W, p, and rank-biserial r as effect size.
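The paired branch above can be sketched in scipy.stats, with the normality check on the differences deciding between the two tests (data are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical before/after scores for the same eight subjects.
before = np.array([12.1, 14.3, 11.8, 13.5, 12.9, 15.0, 13.2, 12.4])
after  = np.array([13.0, 15.1, 12.2, 14.8, 13.1, 15.9, 14.0, 13.3])

# Shapiro-Wilk on the pairwise differences, as the flowchart recommends.
shapiro_p = stats.shapiro(after - before).pvalue

if shapiro_p > 0.05:
    # Differences look normal: paired t-test.
    result = stats.ttest_rel(before, after)
else:
    # Otherwise fall back to the Wilcoxon signed-rank test.
    result = stats.wilcoxon(before, after)
print(result)
```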
Is the outcome normally distributed in each group?
Run Shapiro-Wilk on each group separately. Also check Levene's test for equal variances.
Unpaired (independent) t-test
Compares the means of two independent groups. Reports t, df, p, and Cohen's d. If Levene's test indicates unequal variances, use Welch's t-test (the default in R's t.test, and a safe choice in general). For a Bayesian approach, use a Bayesian t-test.
Mann-Whitney U test
Non-parametric alternative to the unpaired t-test. Compares distributions of two independent groups without assuming normality. Reports U, p, and rank-biserial r. See our t-test vs. Mann-Whitney guide for more detail.
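The independent-groups branch can be sketched with scipy.stats, using Levene's test to decide between the pooled and Welch t-test (data are hypothetical):

```python
import numpy as np
from scipy import stats

treatment = np.array([5.1, 6.2, 5.8, 6.5, 5.9, 6.1, 5.4, 6.3])
control   = np.array([4.8, 5.0, 5.2, 4.9, 5.5, 5.1, 4.7, 5.3])

# Levene's test for equal variances guides the t-test variant.
lev_p = stats.levene(treatment, control).pvalue

# equal_var=False is Welch's t-test; use it when Levene is significant.
t_res = stats.ttest_ind(treatment, control, equal_var=(lev_p > 0.05))
print(f"t = {t_res.statistic:.3f}, p = {t_res.pvalue:.4f}")

# Non-parametric alternative: Mann-Whitney U.
u_res = stats.mannwhitneyu(treatment, control, alternative="two-sided")
print(f"U = {u_res.statistic:.1f}, p = {u_res.pvalue:.4f}")
```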
Are the observations repeated measures or independent groups?
Repeated measures: same subjects measured across all conditions. Independent: different subjects in each group.
Is the outcome normally distributed?
Repeated measures ANOVA
Compares means across 3+ related conditions. Reports F, df, p, and partial eta-squared (ηp²). Automatically checks Mauchly's sphericity and applies the Greenhouse-Geisser correction if needed.
Friedman test
Non-parametric alternative to repeated measures ANOVA. Compares distributions across 3+ related conditions without assuming normality.
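A minimal Friedman-test sketch with scipy.stats, for the same subjects measured under three conditions (data are hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical scores for the same six subjects under three conditions.
cond1 = np.array([7.2, 6.8, 7.5, 6.9, 7.1, 7.4])
cond2 = np.array([7.8, 7.1, 7.9, 7.3, 7.6, 7.7])
cond3 = np.array([8.1, 7.5, 8.3, 7.8, 8.0, 8.2])

# Friedman test: rank-based comparison across related conditions.
chi2, p = stats.friedmanchisquare(cond1, cond2, cond3)
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")
```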
Is the outcome normally distributed in each group?
Kruskal-Wallis H test
Non-parametric alternative to one-way ANOVA. Compares distributions across 3+ independent groups. Follow up with Dunn's post-hoc test to identify which groups differ.
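A Kruskal-Wallis sketch with scipy.stats on hypothetical dose groups. (Dunn's post-hoc test is not in SciPy itself; the scikit-posthocs package provides `posthoc_dunn` if you need pairwise follow-ups.)

```python
import numpy as np
from scipy import stats

# Hypothetical outcome under three independent dose groups.
dose_low  = np.array([3.1, 2.8, 3.5, 2.9, 3.2, 3.0])
dose_mid  = np.array([3.8, 4.1, 3.6, 4.3, 3.9, 4.0])
dose_high = np.array([4.9, 5.2, 4.7, 5.5, 5.0, 5.1])

# Kruskal-Wallis H: rank-based comparison of 3+ independent groups.
h, p = stats.kruskal(dose_low, dose_mid, dose_high)
print(f"H = {h:.3f}, p = {p:.4f}")
```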
How many factors (grouping variables)?
One-way ANOVA
Compares means across 3+ independent groups on one factor. Reports F, df, p, and eta-squared (η²). Follow up with Tukey's HSD, Bonferroni, or Holm-Šidák post-hoc tests. For a Bayesian alternative, use Bayesian ANOVA. See our APA ANOVA reporting guide.
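A one-way ANOVA sketch with scipy.stats, including Tukey's HSD follow-up (data are hypothetical; `tukey_hsd` requires SciPy >= 1.8):

```python
import numpy as np
from scipy import stats

# Hypothetical continuous outcome in three independent groups.
group1 = np.array([5.1, 6.2, 5.8, 6.5, 5.9, 6.1])
group2 = np.array([6.8, 7.1, 6.9, 7.4, 7.0, 7.2])
group3 = np.array([5.0, 5.3, 4.9, 5.4, 5.2, 5.1])

# Omnibus one-way ANOVA.
f_stat, p = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.3f}, p = {p:.4f}")

# Post-hoc pairwise comparisons with Tukey's HSD (SciPy >= 1.8).
posthoc = stats.tukey_hsd(group1, group2, group3)
print(posthoc)
```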
Two-way (factorial) ANOVA
Tests main effects of two factors and their interaction. Reports F, df, p, and partial eta-squared for each effect. Check the interaction term first — if significant, interpret simple effects rather than main effects.
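One way to run a factorial ANOVA in Python is via statsmodels' formula interface; this is a sketch on randomly generated data (so no real effects are expected), showing where the interaction row appears in the table:

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(42)
# Hypothetical blood-pressure data: 2 drugs x 2 sexes, 10 subjects per cell.
df = pd.DataFrame({
    "drug": np.repeat(["A", "B"], 20),
    "sex": np.tile(np.repeat(["M", "F"], 10), 2),
    "bp": rng.normal(120, 8, 40),
})

# Fit the factorial model with an interaction term, then build the ANOVA table.
model = ols("bp ~ C(drug) * C(sex)", data=df).fit()
table = anova_lm(model, typ=2)
print(table)
# Inspect the C(drug):C(sex) interaction row first, per the guidance above.
```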
What kind of relationship?
Are both variables continuous and normally distributed?
Pearson correlation
Measures the strength and direction of a linear relationship. Reports r, p, and 95% confidence interval. The effect size is the correlation coefficient: small (r = .10), medium (r = .30), large (r = .50).
Spearman correlation
Rank-based correlation for ordinal data or non-normal continuous data. Measures monotonic (not just linear) relationships. Reports rₛ, p, and 95% confidence interval.
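Both correlations take one line each in scipy.stats (data are hypothetical and deliberately near-linear):

```python
import numpy as np
from scipy import stats

x = np.array([1.2, 2.4, 3.1, 4.8, 5.0, 6.3, 7.1, 8.2])
y = np.array([2.1, 3.9, 4.2, 6.5, 6.8, 8.1, 9.0, 10.4])

# Pearson: linear association; Spearman: monotonic (rank-based) association.
r, p = stats.pearsonr(x, y)
rho, p_s = stats.spearmanr(x, y)
print(f"Pearson r = {r:.3f} (p = {p:.4f}); Spearman rs = {rho:.3f} (p = {p_s:.4f})")
```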
What type is your outcome variable?
Linear regression
Models the relationship between one or more predictors and a continuous outcome. Reports R², adjusted R², F, p, and standardized betas for each predictor. Check residual normality and VIF for multicollinearity.
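For a single predictor, `scipy.stats.linregress` covers the basics; this sketch uses the invented "does age predict recovery time?" scenario from the table below:

```python
import numpy as np
from scipy import stats

# Hypothetical: does age (years) predict recovery time (days)?
age = np.array([25, 32, 41, 50, 58, 63, 70, 45, 37, 55])
recovery = np.array([10, 12, 15, 18, 21, 24, 26, 16, 13, 20])

# Simple linear regression: slope, intercept, r, p, and standard error.
res = stats.linregress(age, recovery)
print(f"slope = {res.slope:.3f}, R^2 = {res.rvalue**2:.3f}, p = {res.pvalue:.4f}")
```

For multiple predictors, a full regression package (e.g. statsmodels OLS) reports the adjusted R² and per-predictor diagnostics mentioned above.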
Logistic regression
Models the probability of a binary outcome from one or more predictors. Reports odds ratios, 95% CI, AUC, and classification accuracy. Use when your outcome is categorical (e.g., responded/not responded, survived/died).
Chi-square test of independence
Tests whether two categorical variables are associated. Reports χ², df, p, and Cramér's V as effect size. Use when both variables are categorical (e.g., treatment group × outcome category).
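A chi-square sketch with scipy.stats on a hypothetical 2×2 contingency table, including Cramér's V computed from the statistic:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: treatment group x outcome category.
#                 improved  not improved
table = np.array([[30, 10],    # treatment
                  [18, 22]])   # control

chi2, p, dof, expected = chi2_contingency(table)

# Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1))).
n = table.sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"chi2({dof}) = {chi2:.3f}, p = {p:.4f}, Cramer's V = {cramers_v:.3f}")
```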
Do you want to adjust for covariates?
Kaplan-Meier survival analysis
Estimates survival probabilities over time. Compare groups with the log-rank test. Reports survival curves with 95% confidence intervals. Best for visualizing time-to-event data and comparing two or more groups.
Cox proportional hazards regression
Models the effect of covariates on survival time. Reports hazard ratios (HR), 95% CI, and concordance index. Use when you need to adjust for confounders (age, sex, treatment) in a survival analysis.
The key questions
Every statistical test decision comes down to a few fundamental questions about your data and research design:
1. What is your research goal?
Comparing groups (Is there a difference between treatment and control?) requires t-tests or ANOVA. Examining relationships (Does X predict Y?) requires correlation or regression. Testing categorical associations (Are smokers more likely to develop disease X?) requires chi-square. Time-to-event (How long until relapse?) requires survival analysis.
2. What types are your variables?
Continuous variables (weight, blood pressure, reaction time) allow parametric tests. Ordinal variables (pain scale 1-10, Likert ratings) generally require non-parametric tests. Categorical variables (treatment/control, male/female) require chi-square or logistic regression.
3. How many groups are you comparing?
Two groups: t-test or Mann-Whitney. Three or more groups: ANOVA or Kruskal-Wallis. Never run multiple t-tests to compare 3+ groups — this inflates your Type I error rate.
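The inflation is easy to see by simulation. This sketch draws three groups from the same distribution (so any significant result is a false positive) and counts how often at least one of the three pairwise t-tests comes up significant:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n_sims, alpha = 2000, 0.05
false_positives = 0

# Simulate 3 groups from the SAME distribution (no real effect), run all
# three pairwise t-tests, and count any significant result as an error.
for _ in range(n_sims):
    a, b, c = (rng.normal(0, 1, 20) for _ in range(3))
    ps = [stats.ttest_ind(x, y).pvalue for x, y in [(a, b), (a, c), (b, c)]]
    if min(ps) < alpha:
        false_positives += 1

# Well above the nominal 0.05 -- this is why ANOVA comes first.
print(f"Family-wise error rate: {false_positives / n_sims:.3f}")
```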
4. Are observations independent or paired?
Independent: Different participants in each group. Paired: Same participants measured at two time points, or matched pairs. Using the wrong test here is one of the most common statistical errors in published research.
5. Are the parametric assumptions met?
Parametric tests (t-test, ANOVA, Pearson correlation) assume normality and, in some cases, equal variances. When these assumptions are violated, non-parametric alternatives are more appropriate. See our normality assumptions guide for how to check.
Not sure where to start? The most common mistake is overthinking the decision. If you can answer the five questions above, the flowchart gives you a defensible answer. The key is checking assumptions after choosing the test family — not before.
Common scenarios
| Scenario | Recommended test |
|---|---|
| Treatment vs. control, continuous outcome | Unpaired t-test (or Mann-Whitney U) |
| Before/after measurement on same subjects | Paired t-test (or Wilcoxon signed-rank) |
| Three drug doses, continuous outcome | One-way ANOVA (or Kruskal-Wallis) |
| Drug × sex on blood pressure | Two-way factorial ANOVA |
| Does age predict recovery time? | Linear regression |
| Does treatment predict survival/death? | Logistic regression |
| Is smoking associated with disease X? | Chi-square test |
| Time to relapse across three treatments | Kaplan-Meier + log-rank test |
Once you know which test you need, use our free sample size calculator to determine how many participants your study requires.
Join the beta to try this in GraphHelix — describe your research question, and the AI will recommend the right test and check assumptions automatically.