Healthy sampling: A very brief primer on selection bias for students of #epidemiology

Selection bias can occur when the people chosen to participate in a study are not selected at random. The resulting study sample is not representative of the entire population you want to know about, and this systematic error carries through to your results.

What if your population is segmented, perhaps into people who have diabetes and those who don’t? Or into people who are exposed to an important cause of disease, like cigarette smoking, and those who are not? Can you see that if you pick a sample mostly from the upper left portion of the circle, you will overestimate the amount of smoking in your population? And if you are studying a disease strongly associated with smoking, you will also overestimate the proportion of disease in your population.

 

*According to the CDC, prevalence of cigarette smoking among U.S. adults is highest among people living in the Midwest (25.4%), where TheEvidenceDoc is located. https://www.cdc.gov/tobacco/disparities/geographic/index.htm
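To make the effect concrete, here is a minimal simulation sketch. Every number in it is an assumption chosen for illustration, not real data: a made-up population in which 25% of people live in a high-smoking segment (40% smokers) and everyone else smokes at 10%. It compares a simple random sample with a sample drawn mostly from the high-smoking segment.

```python
import random


def simulate(seed=0, n_people=100_000, sample_size=5_000):
    """Compare a random sample with a sample drawn mostly from one segment.

    All prevalences and proportions below are hypothetical.
    """
    rng = random.Random(seed)

    # Build the population as (in_high_smoking_segment, smokes) pairs.
    population = []
    for _ in range(n_people):
        in_segment = rng.random() < 0.25
        smokes = rng.random() < (0.40 if in_segment else 0.10)
        population.append((in_segment, smokes))

    true_prevalence = sum(smokes for _, smokes in population) / n_people

    # Simple random sample: everyone has the same chance of selection.
    random_sample = rng.sample(population, sample_size)
    random_estimate = sum(smokes for _, smokes in random_sample) / sample_size

    # Biased sample: 80% of participants come from the high-smoking segment.
    segment = [person for person in population if person[0]]
    rest = [person for person in population if not person[0]]
    n_from_segment = int(0.8 * sample_size)
    biased_sample = (rng.sample(segment, n_from_segment)
                     + rng.sample(rest, sample_size - n_from_segment))
    biased_estimate = sum(smokes for _, smokes in biased_sample) / sample_size

    return true_prevalence, random_estimate, biased_estimate


if __name__ == "__main__":
    truth, fair, biased = simulate()
    print(f"true prevalence:        {truth:.3f}")
    print(f"random-sample estimate: {fair:.3f}")
    print(f"biased-sample estimate: {biased:.3f}")
```

Under these assumptions the random sample lands close to the true prevalence, while the biased sample overestimates smoking by a wide margin. Drawing a larger biased sample does not help, because the error is systematic rather than random.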

You can find examples of biased sampling in polls on Twitter. Because Twitter uses hashtags to group tweets and make certain topics easier to follow, some pollsters use hashtags to direct their polls to particular groups of people. If the intent is to accurately measure the opinion of a population, how will this segmented reach affect the poll results and the generalizability of the findings?

A Powerball® ticket and relative vs absolute estimates of disease

This week's Powerball® mania seems a good time to talk about relative and absolute estimates of disease. This epi professor learned from her students years ago (especially from a bartender and a professional gambler) that real-world examples can be useful for explaining epidemiologic and biostatistical methods.

Got yours?

Do you have a lottery ticket for Saturday's draw? I'll admit to an occasional purchase when the payout is high, just for the entertainment value of having a ticket in hand in the company of friends as the numbers are called out. But only the one set of numbers. After all, my absolute chance of winning the big payout, according to Powerball®, is just 1 in 175,223,510, or about 0.000000005707. If I buy 10 tickets with different numbers, it's now 10 in 175,223,510, or about 0.00000005707, a barely detectable increase. But according to news reports, that doesn't stop people from buying 1,000 or more tickets to increase their chance of winning.

The lottery ticket shows how a large change in my relative chance of winning (a tenfold increase from buying 10 tickets instead of 1) barely changes my absolute chance of winning, because winning is such a rare event to begin with.
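The arithmetic can be checked with a short sketch. It uses the 1 in 175,223,510 odds quoted above and assumes every ticket carries a different set of numbers; the function names are only illustrative.

```python
from fractions import Fraction

# Jackpot odds quoted in the post: 1 in 175,223,510.
JACKPOT_COMBINATIONS = 175_223_510


def chance_of_winning(tickets, combinations=JACKPOT_COMBINATIONS):
    """Exact chance of holding the winning numbers.

    Assumes each ticket has a different set of numbers.
    """
    return Fraction(tickets, combinations)


def compare(tickets_a, tickets_b):
    """Return (relative change, absolute change) going from a to b tickets."""
    p_a = chance_of_winning(tickets_a)
    p_b = chance_of_winning(tickets_b)
    return p_b / p_a, p_b - p_a


if __name__ == "__main__":
    relative, absolute = compare(1, 10)
    print(f"relative change: {float(relative):.0f} times the chance")
    print(f"absolute change: {float(absolute):.3e}")
```

The relative change is exactly 10, while the absolute change is 9 in 175,223,510, which is still well under one in ten million.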

So why are epidemiologists like me so enamored with relative disease estimates and seemingly less enamored with absolute disease estimates?

These different measurements serve different purposes.

Relative Disease Estimates

One of the big goals of epidemiology is to study patterns of disease and health and, in so doing, to discover associations that may be causal relationships. So we do research like the studies of Sir Richard Doll that uncovered cigarette smoking as a cause of lung cancer, at a time when the major cause was believed to be industrial pollution. Or like the research that uncovered occupational exposure to vinyl chloride as a cause of angiosarcoma of the liver. These kinds of studies compare the occurrence of disease in persons with exposure to the occurrence in those without. Through these relative comparisons, epidemiologists demonstrate 1) strength of association, by comparing exposed to unexposed people, and 2) dose response, by measuring increasing occurrence of disease in persons with increasing levels of exposure. These are two of Hill's causal criteria, one of the logic frameworks for assessing causality. We'll talk more about Hill in context in a later blog. For now, we are simply explaining the relative importance of relative measures in an epidemiologist's armamentarium. Epidemiologists like to discover causal relationships. After all, Sir Richard Doll is famous, well, at least among us epidemiologists.
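A small sketch may help show what these two relative comparisons look like in numbers. The cohort below is entirely hypothetical (it is not Doll's data or anyone else's), and the function names are made up for illustration.

```python
def risk(cases, total):
    """Proportion of a group that developed the disease."""
    return cases / total


def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return risk(exposed_cases, exposed_total) / risk(unexposed_cases, unexposed_total)


def shows_dose_response(risks_by_level):
    """True if risk rises at every step up in exposure level."""
    return all(lower < higher
               for lower, higher in zip(risks_by_level, risks_by_level[1:]))


# Hypothetical cohort: (exposure level, cases, people followed).
cohort = [
    ("none", 10, 1000),
    ("light", 20, 1000),
    ("moderate", 40, 1000),
    ("heavy", 80, 1000),
]

if __name__ == "__main__":
    _, unexposed_cases, unexposed_total = cohort[0]
    for level, cases, total in cohort[1:]:
        rr = relative_risk(cases, total, unexposed_cases, unexposed_total)
        print(f"{level:>8}: relative risk vs. none = {rr:.1f}")
    risks = [risk(cases, total) for _, cases, total in cohort]
    print("dose response:", shows_dose_response(risks))
```

In this invented cohort the relative risk grows with each exposure level, which is the pattern described as dose response. A real analysis would also need confidence intervals and attention to confounding, which this sketch leaves out.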

Of course, it's pretty obvious that having a ticket is causally related to winning. Your odds of winning with no ticket are zero. There is even a dose response of sorts, since each extra ticket with different numbers raises your chance, but the absolute increase is tiny; the person with the most tickets isn't guaranteed to win.

So a causal path, or analytic framework, for winning the lottery starts with having at least one ticket.

Absolute Disease Estimates

Absolute estimates provide an assessment of the frequency of a disease or condition in a population.  Absolute estimates are an important tool for health policy and planning when determining where to place limited resources.  Likewise, they should be used by people with limited resources when deciding how many tickets to buy or even whether to play the lottery.
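As a rough sketch of why planners look at absolute numbers, the snippet below applies the same relative risk to two hypothetical conditions with very different baseline risks. The baseline risks, the relative risk of 2, and the population size are all assumptions picked for illustration.

```python
def excess_cases(baseline_risk, relative_risk, population):
    """Extra cases expected if everyone in `population` were exposed.

    baseline_risk is the risk among unexposed people; the exposed risk
    is assumed to be baseline_risk * relative_risk.
    """
    exposed_risk = baseline_risk * relative_risk
    return (exposed_risk - baseline_risk) * population


if __name__ == "__main__":
    population = 10_000  # hypothetical community
    for label, baseline in [("common condition", 0.10),
                            ("rare condition", 0.0001)]:
        extra = excess_cases(baseline, relative_risk=2.0, population=population)
        print(f"{label}: about {extra:.0f} extra cases per {population:,} people")
```

With an identical doubling of risk, the common condition produces on the order of a thousand extra cases and the rare one about one, which is the kind of difference that matters when resources are limited.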

Your doctor will balance information from relative and absolute estimates, which is important in direct patient care, particularly in shared decision making when there are care choices. We'll need to spend a whole blog on that topic. But our goal for now is to start with a clear picture of the difference between relative and absolute estimates.  Got it?

TheEvidenceDoc May 2013