"

Chapter 8. Probability and Risk

§2 Statistical Generalizations

A statistical generalization is an inductive argument that moves from a premise about a sample (a subset) to a conclusion about a population (the whole). In the philosophy of science, this is known as enumerative induction. Because we cannot poll every person on Earth or test every drop of water in an ocean, we rely on the logic of sampling to make claims about reality.


2.1 The Logic of the Sample: Fisher vs. Neyman

The “Reasonable Person” evaluates a generalization based on the quality of the inference. Modern statistics is built on a debate between two schools of thought regarding how we justify these inferences:

  • Ronald A. Fisher (Null Hypothesis): Fisher argued that we cannot “prove” a generalization is true; we can only show that the observed data would be highly unlikely if the “null hypothesis” (the idea that there is no effect or difference) were true. This is a form of Falsificationism (see the simulation sketch after this list).

  • Jerzy Neyman & Egon Pearson (Decision Theory): They argued that statistical generalizations are about managing two types of errors:

    • Type I Error (False Positive): Claiming the population has a trait when it doesn’t.

    • Type II Error (False Negative): Claiming the population doesn’t have a trait when it does.
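
These two questions can be made concrete with a simulation. The sketch below is not from the original text; it uses a coin-flip example in Python where the null hypothesis says the coin is fair. The Fisherian move estimates how improbable the observed result would be if the null were true, while the Neyman-Pearson move fixes a decision rule in advance and measures the Type I and Type II error rates it produces. The sample size, observed count, and rejection threshold are illustrative assumptions.

    # A minimal Python sketch (not from the text) contrasting the two schools
    # with a coin-flip example; the null hypothesis says the coin is fair.
    import random

    random.seed(0)
    N_FLIPS = 100          # flips per experiment (assumed)
    OBSERVED_HEADS = 62    # hypothetical observed result (assumed)
    TRIALS = 20_000        # Monte Carlo repetitions

    def heads_count(p_heads, n):
        """Simulate n flips of a coin that lands heads with probability p_heads."""
        return sum(random.random() < p_heads for _ in range(n))

    # Fisher-style question: how unlikely is a result at least this extreme
    # if the null hypothesis (a fair coin, p = 0.5) were true?
    p_value = sum(heads_count(0.5, N_FLIPS) >= OBSERVED_HEADS
                  for _ in range(TRIALS)) / TRIALS
    print(f"Approximate one-sided p-value under the null: {p_value:.4f}")

    # Neyman-Pearson-style question: fix a decision rule in advance
    # ("reject the null at 60 or more heads") and measure both error rates.
    REJECT_AT = 60
    type_i = sum(heads_count(0.5, N_FLIPS) >= REJECT_AT
                 for _ in range(TRIALS)) / TRIALS            # false positives
    type_ii = sum(heads_count(0.65, N_FLIPS) < REJECT_AT
                  for _ in range(TRIALS)) / TRIALS           # false negatives
    print(f"Type I error rate (coin truly fair): {type_i:.3f}")
    print(f"Type II error rate (coin truly biased to 0.65): {type_ii:.3f}")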

2.2 Three Criteria for a Strong Generalization

To avoid the Fallacy of Hasty Generalization (Chapter 4), a sample must meet three rigorous philosophical and mathematical standards:

  1. Sample Size (The Law of Large Numbers): As the size of a random sample increases, its mean gets closer to the mean of the whole population. A small sample is “noisy” and easily skewed by outliers (see the sketch after this list).

  2. Representativeness (Stratification): A sample must be a “microcosm” of the population. It must share the same relevant characteristics (age, gender, socioeconomic status) as the group being studied.

  3. Randomness (Eliminating Selection Bias): Every individual in the population must have an equal probability of being selected. Without randomness, we fall victim to Selection Bias, where the method of choosing the sample pre-determines the result.
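
A brief simulation can make the first and third criteria vivid. The sketch below is purely illustrative and assumes a population in which exactly 40% hold some trait: larger random samples cluster ever more tightly around that true value, while a self-selected sample (where people with the trait respond more often) misses it badly no matter how large it grows. The response probabilities are assumptions chosen only for illustration.

    # An illustrative sketch (assumed values, not from the text) of criteria
    # 1 and 3: the Law of Large Numbers and the effect of selection bias.
    import random

    random.seed(1)
    TRUE_SUPPORT = 0.40   # assumed true proportion in the population

    def random_sample_proportion(n):
        """Proportion of supporters in a simple random sample of size n."""
        return sum(random.random() < TRUE_SUPPORT for _ in range(n)) / n

    # Criterion 1 (sample size): larger random samples land closer to 0.40.
    for n in (10, 100, 1_000, 10_000):
        print(f"random sample, n = {n:>6}: {random_sample_proportion(n):.3f}")

    def self_selected_proportion(n):
        """Supporters respond far more often than non-supporters (assumed)."""
        responses = []
        while len(responses) < n:
            supporter = random.random() < TRUE_SUPPORT
            respond_prob = 0.6 if supporter else 0.2   # unequal selection chances
            if random.random() < respond_prob:
                responses.append(supporter)
        return sum(responses) / n

    # Criterion 3 (randomness): a huge but self-selected sample still misleads.
    print(f"self-selected sample, n = 10,000: {self_selected_proportion(10_000):.3f} "
          f"(true value: {TRUE_SUPPORT})")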

2.3 The Margin of Error and Confidence Levels

In the logic of sampling, we never claim “100% certainty.” Instead, we use two parameters to define our Epistemic Modality:

  • The Margin of Error (Precision): This defines the “range” of the truth. If a poll says 40% of people support a policy with a +/- 3% margin, the logical claim is that the true population value lies somewhere between 37% and 43% (see the worked example after this list).

  • The Confidence Level (Reliability): This is how often the sample result would fall within the margin of error of the true population value if the study were repeated many times. In the social sciences, the standard is usually a 95% Confidence Level.
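
For a sample proportion, the margin of error at a given confidence level is conventionally computed as z * sqrt(p * (1 - p) / n), where z is about 1.96 for 95% confidence. The short sketch below, using an assumed poll of 1,000 respondents reporting 40% support, shows how the +/- 3% figure in the example above arises.

    # A worked example (poll values assumed) of the standard margin-of-error
    # formula for a sample proportion: z * sqrt(p * (1 - p) / n).
    import math

    def margin_of_error(p, n, z=1.96):
        """Margin of error for a sample proportion p from n respondents.

        z = 1.96 corresponds to the conventional 95% confidence level.
        """
        return z * math.sqrt(p * (1 - p) / n)

    moe = margin_of_error(p=0.40, n=1_000)
    print(f"Margin of error: +/- {moe * 100:.1f} percentage points")
    # -> about +/- 3 points, i.e. the 37% to 43% range used in the example above.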

2.4 The Fallacy of Anecdotal Evidence

The opposite of a statistical generalization is the Anecdotal Evidence fallacy. This occurs when someone uses a single “vivid” story to refute a statistical trend.

  • The Logic: “My grandfather smoked three packs a day and lived to be 100, so smoking isn’t dangerous.”

  • The Critical Counter: A single data point is not a sample. While the grandfather is a “real” data point, he is a statistical outlier that does not negate the broad, representative trend found in a large-scale population study (see the simulation sketch below).
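
The point can be illustrated with a toy simulation. The numbers below are assumptions chosen only for illustration and are not real health statistics: even when the exposed group's disease rate is double the unexposed group's, most exposed individuals never develop the disease, so “lucky grandfathers” are guaranteed to exist without weakening the population-level trend at all.

    # A toy simulation (assumed numbers, not real health data) showing why a
    # single vivid outlier cannot overturn a population-level trend.
    import random

    random.seed(2)
    POPULATION = 100_000
    BASE_RISK = 0.15      # assumed disease risk without the exposure
    EXPOSED_RISK = 0.30   # assumed disease risk with the exposure

    exposed = [random.random() < EXPOSED_RISK for _ in range(POPULATION)]
    unexposed = [random.random() < BASE_RISK for _ in range(POPULATION)]

    print(f"Disease rate, exposed group:   {sum(exposed) / POPULATION:.3f}")
    print(f"Disease rate, unexposed group: {sum(unexposed) / POPULATION:.3f}")

    # The "lucky grandfather" exists here too: roughly 70% of exposed
    # individuals stay healthy, yet the exposed group's rate is still double.
    print(f"Exposed individuals who stayed healthy: {exposed.count(False)}")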


§2 Summary Table: Evaluating the Strength of Induction

Criterion          | Logical Goal | Failure Mode
Randomness         | Neutrality   | Selection Bias (The “Self-Selection” Trap)
Representativeness | Similarity   | Biased Sample (The “Library” Trap)
Sample Size        | Stability    | Hasty Generalization (The “Small N” Trap)
Margin of Error    | Precision    | False Certainty (Ignoring the “Range”)

License


How to Think For Yourself Copyright © 2023 by Rebeka Ferreira is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, except where otherwise noted.