Healthy sampling: A very brief primer on selection bias for students of #epidemiology

Selection bias can occur when the people chosen to participate in a study are not selected at random. The resulting sample is not representative of the entire population you want to learn about, and this systematic error carries through to your results.

What if your population is segmented, perhaps into people who have diabetes and those who don't? Or into people who are exposed to an important cause of disease, like cigarette smoking, and those who are not? Can you see that if you pick a sample mostly from the upper left portion of the circle, you will overestimate the prevalence of smoking in your population? And if you are studying a disease strongly associated with smoking, you will also overestimate the proportion of disease in your population.

*According to the CDC, prevalence of cigarette smoking among U.S. adults is highest among people living in the Midwest (25.4%), where TheEvidenceDoc is located. https://www.cdc.gov/tobacco/disparities/geographic/index.htm
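To see the overestimate in numbers, here is a minimal simulation sketch in Python. Everything in it is an assumption made for illustration: the two segments "A" and "B", their sizes (30% and 70% of the population), and their smoking rates (40% and 10%) are hypothetical and are not taken from the CDC or any real data. The function name `simulate_smoking_estimates` is likewise invented for this example.

```python
import random


def simulate_smoking_estimates(seed=0, population_size=100_000, sample_size=1_000):
    """Compare a simple random sample with a sample drawn mostly from one segment.

    All segment sizes and smoking rates below are hypothetical illustration values.
    """
    rng = random.Random(seed)

    # Build a hypothetical population: segment "A" (30% of people) smokes at 40%,
    # segment "B" (70% of people) smokes at 10%.
    population = []
    for _ in range(population_size):
        segment = "A" if rng.random() < 0.30 else "B"
        smoking_rate = 0.40 if segment == "A" else 0.10
        population.append((segment, rng.random() < smoking_rate))

    true_prevalence = sum(smokes for _, smokes in population) / population_size

    # Simple random sample: every person has the same chance of selection.
    random_sample = rng.sample(population, sample_size)
    random_estimate = sum(smokes for _, smokes in random_sample) / sample_size

    # Biased sample: 90% of participants are recruited from segment "A".
    segment_a = [person for person in population if person[0] == "A"]
    segment_b = [person for person in population if person[0] == "B"]
    n_from_a = sample_size * 9 // 10
    biased_sample = rng.sample(segment_a, n_from_a) + rng.sample(
        segment_b, sample_size - n_from_a
    )
    biased_estimate = sum(smokes for _, smokes in biased_sample) / sample_size

    return true_prevalence, random_estimate, biased_estimate


if __name__ == "__main__":
    true_p, random_p, biased_p = simulate_smoking_estimates()
    print(f"True prevalence:        {true_p:.3f}")
    print(f"Random-sample estimate: {random_p:.3f}")
    print(f"Biased-sample estimate: {biased_p:.3f}")
```

Under these assumed numbers the true prevalence is near 19%, the random sample should land close to it, and the sample drawn mostly from segment "A" should come out near 37%. Taking a larger biased sample does not help: the error is systematic, not random.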

You can find examples of biased sampling in polls on Twitter. Because Twitter uses hashtags to group tweets and make topics easier to follow, some pollsters use a hashtag to direct their polls to particular groups of people. If the intent is to accurately measure the opinion of a population, how will this segmented reach affect the poll results and the generalizability of the findings?
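One way to reason about that question is to treat the poll result as a weighted average of each group's opinion, weighted by the share of respondents that comes from each group. The sketch below uses hypothetical numbers throughout: the group names, the support rates, and the respondent shares are assumptions made for illustration, not measurements of any real poll. The helper `expected_poll_result` is also invented for this example.

```python
def expected_poll_result(respondent_share, support_rate):
    """Weighted average of group support, weighted by each group's share of respondents."""
    total_share = sum(respondent_share.values())
    weighted_support = sum(
        respondent_share[group] * support_rate[group] for group in respondent_share
    )
    return weighted_support / total_share


# Hypothetical support for some proposal within each group.
support = {"hashtag_followers": 0.80, "everyone_else": 0.40}

# Hypothetical makeup of the whole population versus the people the poll reaches.
population_share = {"hashtag_followers": 0.10, "everyone_else": 0.90}
poll_share = {"hashtag_followers": 0.90, "everyone_else": 0.10}

population_opinion = expected_poll_result(population_share, support)
poll_opinion = expected_poll_result(poll_share, support)

print(f"Support in the population: {population_opinion:.2f}")
print(f"Support seen in the poll:  {poll_opinion:.2f}")
```

With these assumed values, support is about 0.44 in the population but about 0.76 in the poll. The poll may describe hashtag followers reasonably well, but its findings do not generalize to the wider population.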
