Chapter 7. Inductive Arguments and Statistics
§2 Generalization and Sampling
The most common form of inductive reasoning is the Inductive Generalization. This involves taking a property found in a small group (the Sample) and attributing it to a larger group (the Target Population). Because inductive arguments are evaluated as strong or weak rather than valid, making this “inductive leap” as strong as possible requires adhering to the mathematical and philosophical principles of sampling theory.
2.1 The Law of Large Numbers (LLN)
In probability theory, the Law of Large Numbers is a theorem that describes the result of performing the same experiment a large number of times.
- The Principle: As the sample size grows, the sample mean converges to the population mean.
- Philosophical Significance: Jakob Bernoulli, who proved the theorem in Ars Conjectandi (1713), argued that even if we cannot know the “hidden” nature of a population, we can arrive at the truth through persistent observation. For the critical thinker, this means that Size Matters: a small sample is statistically “noisy” and prone to the fallacy of Hasty Generalization.
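The convergence Bernoulli proved can be seen in a minimal simulation. The sketch below (the population, coin-flip setup, and function name are illustrative, not from the text) estimates the mean of repeated fair coin flips: with only 10 flips the estimate is noisy, while with 100,000 it sits very close to the true value of 0.5.

```python
import random

random.seed(42)  # fixed seed so the illustration is reproducible

def sample_mean(n, p=0.5):
    """Mean of n Bernoulli(p) trials -- e.g., n flips of a fair coin."""
    return sum(random.random() < p for _ in range(n)) / n

# As the sample grows, the sample mean drifts toward the true mean (0.5):
for n in (10, 1_000, 100_000):
    print(f"n = {n:>7}: sample mean = {sample_mean(n):.4f}")
```

A sample of 10 can easily return 0.3 or 0.7; this is the statistical “noise” behind the Hasty Generalization fallacy.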
2.2 The Problem of Bias: Representative Samples
Even a large sample can fail if it is not representative. A sample is representative only if it possesses all the relevant characteristics of the target population in the same proportions.
- Selection Bias: This occurs when the method of choosing the sample systematically excludes certain members of the population. A classic academic example is the Literary Digest poll of 1936, which predicted Alf Landon would defeat FDR. Despite a massive sample of 2 million people, the poll was biased because it drew from telephone listings and car registrations—at a time when only the wealthy (who tended to vote Republican) owned such things.
- Stratified Random Sampling: To combat bias, statisticians use stratification. If a target population is 60% women and 40% men, the sample must be intentionally structured to mirror those percentages.
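The stratification idea can be sketched in a few lines: split the population into strata, then draw from each stratum in proportion to its share of the whole. The population and function below are hypothetical illustrations using the text's 60/40 example, not a production sampler.

```python
import random

random.seed(0)

def stratified_sample(population, strata_key, size):
    """Draw a sample whose strata mirror their population proportions."""
    # Group the population into strata (e.g., by sex).
    strata = {}
    for member in population:
        strata.setdefault(strata_key(member), []).append(member)
    # Take a proportional random draw from each stratum.
    sample = []
    for group in strata.values():
        k = round(size * len(group) / len(population))
        sample.extend(random.sample(group, k))
    return sample

# Hypothetical population: 60% women ("F"), 40% men ("M").
population = [{"sex": "F"} for _ in range(60)] + [{"sex": "M"} for _ in range(40)]
sample = stratified_sample(population, lambda p: p["sex"], size=10)
print(sum(p["sex"] == "F" for p in sample))  # → 6 (six women, four men)
```

Within each stratum the draw is still random; stratification only guarantees that the sample's proportions match the target population's.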
2.3 Margin of Error and Confidence Levels
In academic statistics, no generalization is presented as an absolute. Instead, it is accompanied by two metrics of “honesty”:
- Margin of Error: The range (e.g., $\pm 3\%$) within which the true population value is expected to fall.
- Confidence Level: The proportion of repeated samples (usually 95%) whose results would fall within the margin of error of the true population value.
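For a sample proportion, the 95% margin of error has a standard textbook formula: $z \sqrt{\hat{p}(1-\hat{p})/n}$, with $z \approx 1.96$ at the 95% confidence level. A minimal sketch (the poll numbers are invented for illustration):

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Margin of error for a sample proportion; z = 1.96 gives ~95% confidence."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# A hypothetical poll of 1,000 people in which 52% favor a candidate:
moe = margin_of_error(0.52, 1000)
print(f"52% +/- {moe:.1%}")  # → 52% +/- 3.1%
```

Note what the formula rewards: quadrupling the sample size only halves the margin of error, which is why the honest course is to report the uncertainty rather than pretend it away.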
Philosophically, these metrics represent Intellectual Humility. They acknowledge that induction provides a “window” into the truth, not a mirror of it.