Edgar Bueno, researcher at the Department of Statistics.

The problem with the estimates that a survey makes about a population is that we can never know exactly how accurate they are, unless the survey covers the whole population, which would be expensive. With the method that Edgar Bueno is researching, someone planning a survey will know how to choose a sample that gives a more precise estimate, or that needs a smaller sample to reach a good approximation.

Counting on the unknown

Since a survey's estimates will always be uncertain, it is desirable to be able to calculate how uncertain they are. The method proposed in Edgar Bueno's research makes it possible to quantify this uncertainty, so that one can choose the best sampling design.

The choice of sampling design affects both the required sample size and the precision of the results. If, for instance, you want to conduct a survey to find out how many people would vote for Trump or Biden if the election were held today, choosing the design requires some quantities that are unknown. It is then common to fall back on previous studies for proxy values of those quantities.

Edgar considers this practice risky, because the proxies themselves are uncertain. Instead, he adds one more ingredient to the decision process: a prior distribution that represents how uncertain those proxies are. In this way, the statistician can quantify how risky it is to implement each sampling design and choose the one that minimizes this risk.

How well are the voters represented?

Suppose a pollster wants to select a sample to estimate the proportion of people who would vote for Trump or Biden in the forthcoming election, and is considering two possible sampling designs. The first option is to select a sample of states with probabilities proportional to their population, and then select a simple random sample of potential voters in each selected state. The second option is to stratify the states by population, select a sample of states in each stratum and, finally, select a sample of potential voters in each selected state.
 
It is not possible to know in advance which design will yield better results. If the number of votes for each candidate in each state were strictly proportional to the state's population, the first design would be more efficient. If this relation does not hold, the second design may be preferable. Since the pollster does not know which design is best, Edgar's proposal can be used: the pollster assumes a prior distribution that expresses how certain or uncertain they are about this proportionality. If the pollster is very uncertain about the relation, they should assume a prior with a large variance. The method then computes the risk of each design, and the pollster uses the design with the smaller risk.
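To make the idea of comparing risks more concrete, here is a toy Python sketch. It is not Edgar Bueno's method or software, only an illustration under assumptions chosen for this example: twelve hypothetical states with made-up populations; only the first stage (sampling states) is modeled, with each sampled state's vote total treated as known; and a hypothetical model in which a candidate's share in state i is logistic(beta · log(x_i / mean x)), so that beta = 0 means votes are strictly proportional to population. The prior is beta ~ Normal(0, prior_sd), and a design's "risk" is taken to be its design variance averaged over the prior, approximated by Monte Carlo. Design A uses the Hansen-Hurwitz variance for sampling with probability proportional to size (with replacement). Design B uses the standard variance for stratified simple random sampling without replacement. All names (`POP`, `votes`, `var_pps`, `var_stratified`, `risk`, `choose_design`) are hypothetical.

```python
import math
import random
from statistics import variance

# Hypothetical state populations (millions); purely illustrative numbers.
POP = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0, 40.0, 55.0, 70.0, 89.0]
N_SAMPLE = 4  # number of states sampled under either design


def votes(beta, pop=POP):
    """Hypothetical model: a candidate's share in state i is
    logistic(beta * log(x_i / mean x)). beta = 0 gives a constant share,
    i.e. votes strictly proportional to population."""
    mean_x = sum(pop) / len(pop)
    return [x / (1.0 + math.exp(-beta * math.log(x / mean_x))) for x in pop]


def var_pps(y, pop=POP, n=N_SAMPLE):
    """Design A: n states drawn with replacement, probability proportional
    to population. Variance of the Hansen-Hurwitz estimator of the total."""
    total_x, total_y = sum(pop), sum(y)
    p = [x / total_x for x in pop]
    return sum(pi * (yi / pi - total_y) ** 2 for pi, yi in zip(p, y)) / n


def var_stratified(y, pop=POP, n=N_SAMPLE, strata=2):
    """Design B: states sorted by population into equal-size strata, then a
    simple random sample without replacement of n // strata states in each."""
    assert len(pop) % strata == 0 and n % strata == 0, "toy code assumes equal strata"
    order = sorted(range(len(pop)), key=lambda i: pop[i])
    size, n_h = len(pop) // strata, n // strata
    v = 0.0
    for h in range(strata):
        y_h = [y[i] for i in order[h * size:(h + 1) * size]]
        # N_h^2 * (1 - f_h) * S_h^2 / n_h, with S_h^2 the (N_h - 1)-denominator variance
        v += size ** 2 * (1 - n_h / size) * variance(y_h) / n_h
    return v


def risk(design_var, prior_sd, draws=2000, seed=1):
    """Risk = expected design variance when beta ~ Normal(0, prior_sd),
    approximated by averaging over Monte Carlo draws of beta."""
    rng = random.Random(seed)
    return sum(design_var(votes(rng.gauss(0.0, prior_sd))) for _ in range(draws)) / draws


def choose_design(prior_sd):
    """Return the design with the smaller risk, plus both risks.
    The same seed is used for both designs (common random numbers)."""
    risks = {
        "A (PPS)": risk(var_pps, prior_sd),
        "B (stratified)": risk(var_stratified, prior_sd),
    }
    return min(risks, key=risks.get), risks
```

For example, running `for sd in (0.05, 0.5, 2.0): print(sd, choose_design(sd))` lets you check whether the preferred design changes as the pollster becomes more uncertain about proportionality. With a prior concentrated at beta = 0, design A has essentially zero variance, which mirrors the statement that the first design is more efficient under strict proportionality.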

Edgar defends his thesis on December 4th, 2020, at 1 pm. If you are interested in this research and want to know more, you can contact him at edgar.bueno(a)stat.su.se