Analysing survey data can be tricky. There’s often a mismatch between the characteristics of the survey respondents and those of the general population. If the discrepancies are not accounted for then the survey results can (and generally will!) be misleading.
A common approach to this problem is to weight the individual survey responses so that the marginal proportions of the survey are close to those of the population. Raking (also known as proportional fitting, sample-balancing, or ratio estimation) is a technique for generating the required weights.
A fictional population and an imaginary survey are used to illustrate how survey raking works.
Fictional Population Characteristics
A census gives a breakdown of the fictional population according to three categories: mood, sex and age.
population
mood sex age fraction
1 happy female young 0.0750
2 neutral female young 0.1250
3 grumpy female young 0.0500
4 happy male young 0.0750
5 neutral male young 0.1250
6 grumpy male young 0.0500
7 happy female middle 0.0600
8 neutral female middle 0.1000
9 grumpy female middle 0.0400
10 happy male middle 0.0600
11 neutral male middle 0.1000
12 grumpy male middle 0.0400
13 happy female senior 0.0165
14 neutral female senior 0.0275
15 grumpy female senior 0.0110
16 happy male senior 0.0135
17 neutral male senior 0.0225
18 grumpy male senior 0.0090
To apply raking we only need to know the marginal proportions of these categories, which can be derived from the data above. Use the wpct()
function from the weights
package.
library(weights)
target <- with(population, list(
mood = wpct(mood, fraction),
sex = wpct(sex, fraction),
age = wpct(age, fraction)
))
Store the marginal proportions in a list
, where the elements of the list are named according to variable they’ll match in survey data.
str(target)
List of 3
$ mood: Named num [1:3] 0.3 0.5 0.2
..- attr(*, "names")= chr [1:3] "happy" "neutral" "grumpy"
$ sex : Named num [1:2] 0.505 0.495
..- attr(*, "names")= chr [1:2] "female" "male"
$ age : Named num [1:3] 0.5 0.4 0.1
..- attr(*, "names")= chr [1:3] "young" "middle" "senior"
Imaginary Survey Results
The imaginary survey has ten thousand responses to a binary question. I’m imagining that it’s a survey for an important referendum, gauging the sentiment of the population on a particular proposal.
nrow(survey)
[1] 10000
Here’s a sample of the results.
head(survey)
caseid mood sex age response
1 1 grumpy male young 1
2 2 happy male senior 1
3 3 neutral male senior 1
4 4 happy female young 1
5 5 grumpy male young 1
6 6 happy female young 0
Now let’s take a look at the proportions of the survey characteristics.
wpct(survey$mood)
happy neutral grumpy
0.3308 0.3372 0.3320
wpct(survey$sex)
female male
0.4971 0.5029
wpct(survey$age)
young middle senior
0.3393 0.3341 0.3266
The survey design was such that the levels of each of the survey characteristics were sampled in similar proportion. For sex this works well because the sample proportions are close to those in the population. For mood, however, this means that the proportion of neutral in the survey is less than in the population, while the sample proportion of grumpy is higher than in the population. Similarly, for age the senior category is oversampled in the survey, while the middle and young categories are undersampled.
What’s the result of the survey if we naively just average the responses?
wpct(survey$response)
0 1
0.4277 0.5723
It appears that 57% of the survey respondents were in favour of the proposal. Of course, we know that this is probably not an accurate reflection of the sentiment of the population, because our survey is not an accurate representation of the population demographics.
Let’s use raking to fix that.
Raking
We’ll use the anesrake
package, which implements the ANES (American National Election Study) weighting algorithm. The algorithm, documented by DeBell and Krosnick (Computing Weights for American National Election Study Survey Data, 2009), aims to provide a default approach to survey weighting (there’s no single “right” way to do survey weighting, but this is a decent starting point). It uses an iterative procedure to generate multiplicative weights. The weights are chosen so that the survey marginals agree with the population marginals for a specific set of parameters. In each iteration all parameters are considered in turn. For each parameter weights are adjusted to align the survey marginals with the population marginals. These weights are then used as the starting point for the next parameter.
library(anesrake)
The anesrake()
function has three required parameters:
inputter
- a list of target valuesdataframe
- the survey results andcaseid
- a unique identifier for each respondent in the survey (normally a part of the survey results).
The remaining parameters are optional. The choosemethod
and type
parameters affect the way that parameters are selected for inclusion in the weighting procedure. Ideally you want to choose only those parameters where there is a significant discrepancy between the sample and population proportions.
One important thing to note is that anesrake()
does not play nicely with tibbles. If your data are in a tibble, then convert to a data frame first.
raking <- anesrake(
target,
survey,
survey$caseid,
cap = 5, # Maximum allowed weight per iteration
choosemethod = "total", # How are parameters compared for selection?
type = "pctlim", # What selection criterion is used?
pctlim = 0.05 # Threshold for selection
)
[1] "Raking converged in 7 iterations"
It returns a list with a number of components. Fortunately there’s a summary()
method which gives a quick overview of the results.
raking_summary <- summary(raking)
Which variables were used for weighting?
raking_summary$raking.variables
[1] "mood" "age"
Only mood and age were used. As we observed previously, the sample proportions for sex are already pretty close to those of the population.
Let’s take a look at the specifics for those two variables.
raking_summary$mood
Target Unweighted N Unweighted % Wtd N Wtd % Change in % Resid. Disc. Orig. Disc.
happy 0.3 3308 0.3308 3000 0.3 -0.0308 0 -0.0308
neutral 0.5 3372 0.3372 5000 0.5 0.1628 0 0.1628
grumpy 0.2 3320 0.3320 2000 0.2 -0.1320 0 -0.1320
Total 1.0 10000 1.0000 10000 1.0 0.3256 0 0.3256
The summary data for mood gives the number of cases and proportions for each level in both the unweighted and weighted survey. It also tells us
- how the proportions change between the unweighted and weighted surveys (the
Change in %
column); - the discrepancy between the unweighted and target proportions (the
Orig. Disc.
) — the discrepancy before weighting; and - the discrepancy between the weighted and target proportions (the
Resid. Disc.
) — the residual discrepancy after weighting.
Obviously we want the residual discrepancy to be as small as possible and here we can see that the mood proportions are perfect after weighting.
raking_summary$age
Target Unweighted N Unweighted % Wtd N Wtd % Change in % Resid. Disc. Orig. Disc.
young 0.5 3393 0.3393 5000 0.5 0.1607 1.110223e-16 0.1607
middle 0.4 3341 0.3341 4000 0.4 0.0659 0.000000e+00 0.0659
senior 0.1 3266 0.3266 1000 0.1 -0.2266 -1.387779e-17 -0.2266
Total 1.0 10000 1.0000 10000 1.0 0.4532 1.249001e-16 0.4532
Similarly, for age the proportions are effectively perfect after weighting.
What about sex? Well, although sex was not used in the weighting algorithm, the weightings still have an effect on their proportions.
raking_summary$sex
Target Unweighted N Unweighted % Wtd N Wtd % Change in % Resid. Disc. Orig. Disc.
female 0.505 4971 0.4971 5006.939 0.5006939 0.003593906 0.004306094 0.0079
male 0.495 5029 0.5029 4993.061 0.4993061 -0.003593906 -0.004306094 -0.0079
Total 1.000 10000 1.0000 10000.000 1.0000000 0.007187812 0.008612188 0.0158
The discrepancies in the unweighted results are small and the residual discrepancies after weighting are smaller still (it could have gone the other way too!).
Examining the Weights
It makes sense to inject the weights into the survey data.
survey$weight <- raking$weightvec
What does that look like?
head(survey, 10)
caseid mood sex age response weight
1 1 grumpy male young 1 0.8790030
2 2 happy male senior 1 0.2732412
3 3 neutral male senior 1 0.4531776
4 4 happy female young 1 1.3322952
5 5 grumpy male young 1 0.8790030
6 6 happy female young 0 1.3322952
7 7 neutral male middle 1 1.8035884
8 8 grumpy male senior 1 0.1802753
9 9 grumpy female young 0 0.8790030
10 10 happy female young 0 1.3322952
What is the range of weights?
survey %>% select(mood, age, weight) %>% unique() %>% arrange(weight)
mood age weight
1 grumpy senior 0.1802753
2 happy senior 0.2732412
3 neutral senior 0.4531776
4 grumpy middle 0.7174723
5 grumpy young 0.8790030
6 happy middle 1.0874649
7 happy young 1.3322952
8 neutral middle 1.8035884
9 neutral young 2.2096458
raking_summary$weight.summary
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.1803 0.4532 0.8790 1.0000 1.3323 2.2096
The weights are always adjusted so that the average weight is precisely one. Large weights are likely to be problematic. The cap
parameter to anesrake()
places an upper limit on the weights in each iteration. The largest weight in our survey falls below the default upper limit of 5. It’s always worthwhile checking on the range of weights just so that you are aware of whether they are being capped or not.
Interpreting the Weights
Characteristics that were oversampled in the survey (like grumpy young females) get a weight smaller than one, while those that were undersampled (like middle aged neutral males) get a weight that is larger than one. Note that since sex was not included in the weighting algorithm, it does not influence weight: the weight for a grumpy young female is the same as that for a grumpy young male.
survey %>% group_by(mood) %>% summarise(count = n(), weight = sum(weight))
# A tibble: 3 x 3
mood count weight
<fct> <int> <dbl>
1 happy 3308 3000.
2 neutral 3372 5000.
3 grumpy 3320 2000.
survey %>% group_by(age) %>% summarise(count = n(), weight = sum(weight))
# A tibble: 3 x 3
age count weight
<fct> <int> <dbl>
1 young 3393 5000.
2 middle 3341 4000.
3 senior 3266 1000.
Although the proportions of samples (the count
column) are not consistent with those in the population, the proportions of total weights most definitely are. As long as we interpret the survey results taking into account these weights them we can be reasonably confident that they represent the population and not just the (possibly biased) survey.
Effect on Survey Results
What was the effect of raking on the survey results?
wpct(survey$response, survey$weight)
0 1
0.5340386 0.4659614
Wow! The outcome swings completely the other way. A naive interpretation of the survey results would have been completely misleading. Actually around 53% of the population are opposed to the proposal.
Weighting results in an increase in the variance of means or proportions derived from survey results. We can get an idea of how large this effect is by looking at the design effect.
raking_summary$general.design.effect
[1] 1.422105
This indicates that weighting results in a 42% increase in variance. Unfortunately this means that conclusions are less likely to be statistically significant. However, this is probably a worthwhile trade off: they might not be significant, but at least they will be representative of the population!