16. Interpreting Statistics in Context
Before you start
- Familiarity with descriptive and inferential statistics
- Comfort with effect sizes and confidence intervals
- Awareness that p-values are conditional probabilities, not truth claims
By the end you'll be able to
- Interpret statistical findings without losing context
- Apply contextual inference to hypothesis testing
- Avoid reductionist conclusions from aggregate data
- Recognize ecological fallacy and Simpson's paradox in real studies
- Read statistics as patterns within systems
Statistics describe patterns; they don't escape context
A statistical analysis produces estimates with uncertainty. Those estimates describe patterns in the data we have, conditional on the assumptions we made. They don't deliver objective truth about the world independent of context. Treating them as if they do is the most common interpretive error in quantitative research.
The transdisciplinary discipline is to interpret statistics in context — keeping the social, biological, and methodological context active in the interpretation rather than stripping it away to produce a clean number.
Effect sizes need anchors
An effect size in isolation tells you very little. A Cohen's d of 0.3, an odds ratio of 1.5, a 12% relative risk reduction — none of these means anything without comparison anchors:
- What's a typical effect size in this domain? (A d of 0.3 is small for individual-difference research, modest for clinical trials, possibly substantial for population-level interventions.)
- What's the practical magnitude in real-world units? (A 12% reduction in a baseline rate of 200 per 10,000 means 24 averted cases — small absolute, meaningful for a costly condition.)
- What's the cost of producing this effect? (A small effect at low cost may be policy-relevant; the same effect at high cost may not be.)
- What's the counterfactual? (Against doing nothing? Against current practice? Against an alternative intervention?)
Translating effect sizes into anchored, practical terms is the work of interpretation. Skipping it produces clean numbers and bad decisions.
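A minimal sketch of this anchoring arithmetic, using the illustrative numbers from the list above; the number-needed-to-treat step is an added anchor, not one stated in the text:

```python
# Anchoring a relative effect in absolute, real-world units.
# Baseline rate and reduction are the illustrative numbers from the list above.

baseline_per_10k = 200      # baseline rate: 200 cases per 10,000 people
relative_reduction = 0.12   # the "12% reduction" from the example

averted_per_10k = baseline_per_10k * relative_reduction   # 24 averted cases
new_rate_per_10k = baseline_per_10k - averted_per_10k     # 176 remaining

# Number needed to treat: how many people must receive the intervention
# to avert one case (1 / absolute risk reduction).
arr = averted_per_10k / 10_000
nnt = 1 / arr

print(f"Averted cases per 10,000: {averted_per_10k:.0f}")
print(f"New rate per 10,000:      {new_rate_per_10k:.0f}")
print(f"Number needed to treat:   {nnt:.0f}")   # ~417
```

The same relative effect on a baseline of 20 per 10,000 would avert about 2 cases; anchoring forces that difference into view.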
Confidence intervals matter more than p-values
A p-value is a conditional probability about data, given a null hypothesis. It is not a probability that the null is true, not a measure of effect size, and not a measure of clinical or practical significance.
A confidence interval describes the range of effect sizes consistent with the data. It is far more interpretively useful:
- The center estimates the most likely effect
- The width shows how precise the estimate is
- The bounds show whether trivial and substantial effects are both plausible
A study with p = 0.04 and a confidence interval from 0.01 to 0.40 is statistically significant and compatible with effects ranging from trivial to substantial. A study with p = 0.10 and a confidence interval from -0.02 to 0.22 is "not statistically significant" but estimates the effect more precisely.
Reading CIs rather than p-values is one of the most leverage-rich shifts in quantitative interpretation.
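As a sketch of that shift in practice, the snippet below reconstructs the two hypothetical studies from the paragraph above. The standard errors are back-calculated from the stated CIs under a normal approximation, and the `summarize` helper is illustrative, not a standard API:

```python
# Two studies: one significant but imprecise, one "non-significant" but precise.
from scipy.stats import norm

def summarize(estimate, se, label):
    z = estimate / se
    p = 2 * norm.sf(abs(z))                          # two-sided p-value
    lo, hi = estimate - 1.96 * se, estimate + 1.96 * se
    print(f"{label}: estimate={estimate:.2f}, "
          f"95% CI [{lo:.2f}, {hi:.2f}], p={p:.2f}")

# SEs back-calculated from the CIs quoted in the text: se = width / (2 * 1.96)
summarize(0.205, 0.0995, "Study A")  # p ~ 0.04, CI ~ [0.01, 0.40]: significant, imprecise
summarize(0.100, 0.0612, "Study B")  # p ~ 0.10, CI ~ [-0.02, 0.22]: non-significant, precise
```

Study A clears the significance threshold but is compatible with anything from a trivial to a substantial effect; Study B misses the threshold while pinning the effect down more tightly.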
Subgroup variance and the main-effect trap
A main effect describes the average across subgroups. When subgroups vary substantially, the average can mislead.
Three patterns to watch for:
- Heterogeneous effects — the intervention works well in one subgroup and not at all in another; the main effect averages these to a modest number, suggesting the intervention is "modestly effective" when it's actually highly effective for some and useless for others.
- Simpson's paradox — the aggregate finding reverses when subgroups are examined separately. The classic case: Berkeley graduate admissions in 1973 appeared to disfavor women in aggregate; department-level analysis showed no bias against women (and, if anything, a slight bias in their favor), because women applied disproportionately to more selective departments.
- Suppressor effects — a relationship is null in aggregate because two opposing subgroup patterns cancel each other out.
Reporting only the main effect when these patterns are present hides the actual finding.
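A toy demonstration of the Simpson's-paradox pattern; the admissions numbers below are invented for illustration and are not the Berkeley data:

```python
# Synthetic admissions data: women have the higher admission rate in every
# department, yet the lower rate in aggregate, because they apply
# disproportionately to the more selective department.

data = {
    # department: {group: (admitted, applied)}
    "less selective": {"men": (480, 800), "women": (70, 100)},
    "more selective": {"men": (20, 200),  "women": (180, 900)},
}

totals = {"men": [0, 0], "women": [0, 0]}
for dept, groups in data.items():
    for group, (admitted, applied) in groups.items():
        totals[group][0] += admitted
        totals[group][1] += applied
        print(f"{dept:>14} | {group:>5}: {admitted / applied:.0%} admitted")

for group, (admitted, applied) in totals.items():
    print(f"     aggregate | {group:>5}: {admitted / applied:.0%} admitted")
# Per department: women 70% vs 60%, and 20% vs 10%.
# Aggregate: women 25% vs men 50% (the reversal).
```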
Ecological fallacy
The ecological fallacy is inferring individual-level relationships from group-level data. A famous case: international suicide rates correlate with religious composition at the national level, but the individual-level relationship between religiousness and suicide can differ in direction.
The implication for interpretation: a relationship found at one level of analysis (county-level rates, neighborhood averages) does not automatically hold at another (individuals within those places). The transdisciplinary discipline is to specify the level of inference and stick to it.
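A small simulation, with all numbers hypothetical, in which every within-group relationship is negative while the group-level relationship is strongly positive; reading either correlation as if it held at the other level is exactly the fallacy:

```python
# Within each group, y falls as x rises; across groups, the means rise together.
import numpy as np

rng = np.random.default_rng(0)
groups = []
for mean in [0.0, 2.0, 4.0, 6.0]:
    x = mean + rng.normal(0, 1, 500)
    y = mean - 0.5 * (x - mean) + rng.normal(0, 0.5, 500)  # within-group slope: -0.5
    groups.append((x, y))

within_r = [np.corrcoef(x, y)[0, 1] for x, y in groups]
means = np.array([(x.mean(), y.mean()) for x, y in groups])
group_r = np.corrcoef(means[:, 0], means[:, 1])[0, 1]

print("within-group r:", [round(r, 2) for r in within_r])  # each about -0.7
print("group-level r: ", round(group_r, 2))                # about +1.0
```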
Statistical vs. practical significance
A statistically significant effect at p < 0.001 with an effect size of 0.02 may be trivial for any decision. A non-significant effect with an effect size of 0.4 and wide confidence intervals may suggest a real effect that the study was underpowered to detect.
Translating between statistical and practical significance is interpretive work:
- Magnitude in real-world units
- Cost of achieving the effect
- Comparison to alternative uses of resources
- Variability across stakeholders' definitions of "meaningful"
A finding that's statistically significant but practically trivial should be reported as such, not framed as a policy recommendation.
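A quick simulated illustration of how the two can diverge, assuming normally distributed outcomes; the sample and effect sizes are chosen for contrast, not taken from any study:

```python
# With enough data, a trivial effect (d = 0.02) is "highly significant";
# with little data, a substantial effect (d = 0.4) often is not.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)

a = rng.normal(0.00, 1, 500_000)   # trivial effect, enormous sample
b = rng.normal(0.02, 1, 500_000)
print(f"d = 0.02, n = 500k/arm: p = {ttest_ind(a, b).pvalue:.1e}")  # vanishingly small p

c = rng.normal(0.0, 1, 40)         # substantial effect, small sample
d = rng.normal(0.4, 1, 40)
print(f"d = 0.40, n = 40/arm:   p = {ttest_ind(c, d).pvalue:.2f}")  # underpowered; often > 0.05
```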
Honoring uncertainty
Quantitative studies often present results with false precision. A point estimate of 12.3% reads as a precise fact; the underlying CI may be 5% to 19%. Communicating uncertainty honestly serves both the science and the decisions made from it.
Practical moves:
- Report CIs alongside point estimates as a default
- Round to a precision your CI supports (don't report 12.3% if the CI is 5–19%)
- Use language that signals uncertainty ("estimated at," "consistent with") rather than language that hides it ("is")
- For high-stakes decisions, run sensitivity analyses and report the range
False precision is a form of overclaim. Quantitative rigor includes honesty about what the numbers don't know.
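One way to operationalize the rounding rule is sketched below. The `report` helper encodes one reasonable convention (keep no more decimal places than the scale of the CI width supports); it is an assumption for illustration, not an established standard:

```python
# Round a point estimate to a precision its CI supports, and phrase it
# with uncertainty-signaling language.
import math

def report(estimate, lo, hi, unit="%"):
    width = hi - lo
    # Decimal places justified by the CI width's order of magnitude:
    # width 14 -> 0 decimals, width 0.4 -> 1 decimal, and so on.
    decimals = max(0, -math.floor(math.log10(width)))
    fmt = f"{{:.{decimals}f}}"
    return (f"estimated at {fmt.format(estimate)}{unit} "
            f"(95% CI {fmt.format(lo)}{unit} to {fmt.format(hi)}{unit})")

print(report(12.3, 5.0, 19.0))  # estimated at 12% (95% CI 5% to 19%)
```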
Naturalizing the absence of context
A particular kind of error: presenting findings as if context didn't matter when context shaped every step.
- The sample was drawn from a specific setting; what does that mean for generalizability?
- The intervention happened during a specific period; what was happening contextually that might shape outcomes?
- The outcome was measured with a particular instrument; what does that instrument capture and miss?
- The analytic decisions were made by a research team with particular backgrounds; how did those shape what was tested?
Naming context isn't a hedge; it's part of the interpretation. A study that explicitly locates its findings in context is more useful to decision-makers than one that pretends to context-free objectivity.
A worked vignette
A trial of a workplace stress intervention reports a 30% reduction in self-reported stress at 6 months, p < 0.001, in an N of 500 employees at three large tech companies.
A clean statistical interpretation: the intervention significantly reduced stress.
A contextualized interpretation:
- 30% reduction in self-report scale points — meaningful or trivial? (Anchor against a known minimal clinically important difference.)
- N = 500 from tech companies — generalizable to small-business or low-wage settings? (Likely not; the populations differ on key dimensions.)
- 6-month follow-up — does effect persist at 12 months? 24? (Unknown without longer data.)
- Subgroup variance — did the intervention work equally across job levels, demographics? (Report; if heterogeneous, the main effect oversells.)
The contextualized interpretation is more honest, more useful, and more defensible.
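As a sketch of the first anchoring step only, with a hypothetical 40-point stress scale, baseline mean, and MCID; none of these values appears in the vignette:

```python
# Hypothetical anchors: a 40-point scale, baseline mean 24, MCID of 5 points.
baseline_mean = 24.0
reduction = 0.30 * baseline_mean   # the reported 30% relative reduction
mcid = 5.0                         # minimal clinically important difference

print(f"Absolute reduction: {reduction:.1f} scale points")
print("Exceeds the MCID" if reduction >= mcid else "Falls below the MCID")
```

With these assumed anchors the effect clears the MCID; with a baseline mean of 12 it would not. The conclusion depends on the anchor, which is the point.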
Closing
Statistics describe patterns within data; they don't escape context. Effect sizes need anchors. Confidence intervals beat p-values for interpretation. Subgroup variance can flip main-effect conclusions. Ecological fallacy and Simpson's paradox lurk in level-of-analysis decisions. Statistical and practical significance can diverge in either direction.
Next: qualitative analysis as meaning-making — coding, rigor, and human-centric analysis with software as tool.
Common mistakes
These are the traps learners hit most often on this topic. Knowing them in advance is half the fix.
Reporting effect size without context
A standardized mean difference of 0.3 means different things in different domains. Without comparison anchors and clinical/practical significance, the number is decorative.
Stripping subgroup variance into a main effect
When subgroups vary, the main effect can mislead. Report subgroup estimates and explain why the main effect is or isn't the right summary.
Treating statistical significance as policy significance
A statistically significant effect at p < 0.001 with an effect size of 0.02 may be policy-trivial. Translate magnitude to practical terms before recommending action.
Practice problems
Try each on paper first. Click Show solution only after you've made a real attempt.
- Problem 1: Take a published statistical finding and write one sentence translating its magnitude into practical terms.
Show solution
Example: 'OR = 1.15 (95% CI 1.08–1.22) means about 15% higher odds. In a population of 10,000 with a baseline rate of 200, that's about 30 additional cases per year — meaningful for a chronic condition, marginal for a rare acute one.'
- Problem 2: Identify a real or hypothetical case where the main effect hides a Simpson's-paradox-style reversal in subgroups.
Show solution
The classic teaching case is Berkeley graduate admissions, where aggregate data suggested gender bias against women, but department-level data showed no such bias — women applied disproportionately to more selective departments. The aggregate hid the selection structure.
Practice quiz
- Question 1: What is the ecological fallacy?
- Reflection 2: Why might a statistically significant finding still be the wrong policy answer?
Lesson 16 recap
- Effect size always travels with comparison anchors
- Subgroup variance can flip main-effect interpretation
- Statistical significance ≠ practical significance
- Ecological fallacy and Simpson's paradox are level-of-analysis traps
Coming next: Lesson 17 — Qualitative Analysis as Meaning-Making
- Next: qualitative analysis as meaning-making
- Coding and rigor in interpretive work
- Human-centric analysis with software as tool, not thinker