Berkson's Bias

Berkson's bias (also referred to as Berkson's paradox) is a type of systematic error that can occur when sample data are used to make inferences about a population. It is named after the statistician Joseph Berkson, who first described the phenomenon in a 1946 paper. Berkson's bias is closely related to collider bias, which arises when a sample is selected on a variable that is influenced by both of the traits being studied.

Berkson's paradox often manifests as a perceived negative correlation between two desirable traits, for example the belief that individuals who excel in one area tend to lack skills in another. This perception can be misleading because it is based on a selected sample rather than the full population. In the full population, the two traits may be uncorrelated, or they may even be positively correlated. The paradox occurs when individuals who possess neither of the desired traits are under-represented in, or excluded from, the sample.

As an example, consider a model in which a person's income is determined by only two variables: conscientiousness and intelligence. The income of an individual can be represented as income = conscientiousness + intelligence. Suppose the two traits are unrelated in the population as a whole. If we condition on income and examine the relationship between conscientiousness and intelligence, we find a negative correlation. Among people with similar income, a higher value of one trait must be offset by a lower value of the other, because the two always add up to roughly the same total. For instance, among high-income individuals, those who are more intelligent tend to be less conscientious, and vice versa. The negative correlation arises because we are considering only a subset of the population (those with high income) and not the population as a whole.
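The income model can be checked with a small simulation. The sketch below is only an illustration under assumptions that are not part of the model itself: both traits are drawn as independent standard normal scores, "high income" means the top 10% of earners, and the helper `pearson` and all variable names are made up for this example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20_000

# Assumption: the two traits are independent standard normal scores.
conscientiousness = [rng.gauss(0, 1) for _ in range(n)]
intelligence = [rng.gauss(0, 1) for _ in range(n)]
income = [c + i for c, i in zip(conscientiousness, intelligence)]

# Correlation between the traits in the whole population.
r_all = pearson(conscientiousness, intelligence)

# Condition on income: keep only the top 10% of earners (illustrative cutoff).
cutoff = sorted(income)[int(0.9 * n)]
high = [
    (c, i)
    for c, i, inc in zip(conscientiousness, intelligence, income)
    if inc >= cutoff
]
r_high = pearson([c for c, _ in high], [i for _, i in high])

print(f"whole population: r = {r_all:+.3f}")
print(f"high income only: r = {r_high:+.3f}")
```

With these assumptions, the population correlation should come out close to zero while the correlation among high earners should be clearly negative, even though the traits were generated independently.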

As another example, consider the case of an applicant seeking admission to a prestigious private university in the United States. Suppose, as a simplification, that an applicant must be either extremely intelligent or exceptionally wealthy to gain acceptance. Applicants who are neither are rejected and never appear in the student body. As a result, we would expect to see a negative correlation between intelligence and wealth among the students of the university, even if the two are unrelated among applicants. This is Berkson's paradox.
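The admissions scenario uses an "either/or" selection rule rather than a sum, and it can be sketched in the same way. The code below is a hypothetical illustration: intelligence and wealth are assumed to be independent standard normal scores, and the admission bar `threshold` of 1.5 standard deviations is an arbitrary choice, not a figure from any real university.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)  # fixed seed so the run is repeatable
n = 50_000
threshold = 1.5  # hypothetical admission bar, in standard deviations

# Each applicant is an (intelligence, wealth) pair of independent scores.
applicants = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

# Admit anyone who clears the bar on at least one of the two traits.
admitted = [(iq, w) for iq, w in applicants if iq > threshold or w > threshold]

r_applicants = pearson([iq for iq, _ in applicants], [w for _, w in applicants])
r_admitted = pearson([iq for iq, _ in admitted], [w for _, w in admitted])

print(f"admitted {len(admitted)} of {n} applicants")
print(f"all applicants: r = {r_applicants:+.3f}")
print(f"admitted only:  r = {r_admitted:+.3f}")
```

Most admitted students clear the bar on only one trait and are ordinary on the other, which is what pulls the correlation among admitted students below zero in this sketch.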

More generally, the bias arises when a selection effect makes the sample data unrepresentative of the population as a whole. For example, consider a study that aims to estimate the average income of a population by sampling a subset of individuals and calculating the mean income of the sample. If the sample is not randomly selected from the population but instead consists only of high-income individuals, the resulting estimate of the mean income will be biased upwards.