-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathMarkdown.Rmd
228 lines (165 loc) · 10.2 KB
/
Markdown.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
---
#title: "Stamped out: In-Person Voting and Covid"
output: html_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(glmnet)
library(dplyr)
library(readr)
library(lfe)
library(radiant.data)
library(data.table)
fips_df <- read_csv("fips_df.csv")
state_subset <- read_csv("state_subset.csv")
fips_df %>%
right_join(state_subset) %>%
mutate(pre_week_rate = pre_week_diff/pop_2018*100000)-> state_subset
df = state_subset
df = data.table(df)
```
## Stamped Out: In-Person Voting and Covid
I begin by presenting the results of the model and then end with a variety of robustness checks for the models.
### 1. Model Results
#### 1.1 Initial Model & Variable Descriptions
Initial model is presented below:
```{r}
model_turnout <- lm(post_election_rate ~ turnout + pre_election_rate + num_prec_pc + state + med_age +
medIncome_2018 + pctPovty_2018 + unplmtRate_2018 + noins_u65_2018 +
pctUrban_2010 + pct_vet + avg_householdsize + pct_snap + pctno_hs +
pctcollege + popDens + pct_black + pct_hisp + pct_asian + pct65plus +
pct4064 + pct1840 + pctTrump_2016 + pre_week_rate,
data = df, weights = pop_2018)
summary(model_turnout)
```
The outcome variable is `post_election_rate`, which is the number of covid cases per 100,000 in the county between November 16th and November 30th. The variable of interest is `turnout`, which is the proportion of a county's residents who cast a vote in-person on Election Day (November 3rd, 2020).
I control at the county-level for:
* race [`pct_black`, `pct_hisp`, `pct_asian`],
* education [`pctno_hs`, `pctcollege`],
* age [`med_age`, `pct65plus`, `pct4064`, `pct1840`]
* socioeconomic status [`medIncome_2018`, `pctPovty_2018`, `unplmtRate_2018`],
* urban [`pctUrban_2010`],
* population density [`popDens`],
* Trump vote in 2016 [`pctTrump_2016`],
* the number of precincts per capita [`num_prec_pc`],
* health insurance enrollment [`noins_u65_2018`],
* state fixed effects [stateSTATENAME],
* and other demographic co-variates like the proportion of veterans [`pct_vet`],
* the average household size [`avg_householdsize`],
* and the proportion of county population using SNAP (food stamps) benefits [`pct_snap`].
In addition, I control for a few covid incidence factors:
* `pre_week_rate`: the relative change in covid rates in the 2 weeks leading up to the election. This accounts for whether covid was increasing or decreasing in a county leading up to election day.
* `pre_election_rate`: the number of covid cases per 100,000 people in a county between October 16 and October 30, before the election
I find significance for `turnout` along with a host of other variables.
These results suggest that a 1% increase in in-person turnout on Election Day in a county is associated with a
5.08 case increase per 100,000 people.
#### 1.2 Simplified Models
I then simplified the model using AIC to obtain a model incorporating the most important predictors.
In addition, I standardize the co-variates.
```{r, echo = FALSE}
for (var in c('turnout', 'pre_election_rate', 'num_prec_pc', 'pctUrban_2010', 'pct_snap',
'pct_hisp', 'pctTrump_2016', 'pre_week_rate')) {
df <- df[!is.na(get(var))]
df[, (paste0('std_', var)) := ((get(var)) - (weighted.mean(get(var), pop_2018))) /
(weighted.sd(get(var), pop_2018))]
}
fixed_fx <- felm(post_election_rate ~ std_turnout + std_pre_election_rate + std_num_prec_pc + std_pctUrban_2010 +
std_pct_snap + std_pct_hisp + std_pctTrump_2016 + std_pre_week_rate | state, df, weights = df$pop_2018)
coefficient_plot <- cbind(fixed_fx$coefficients, fixed_fx$se)
coefficient_plot <- cbind(fixed_fx$coefficients, fixed_fx$coefficients - fixed_fx$se, fixed_fx$coefficients + fixed_fx$se)
colnames(coefficient_plot) <- c('mean_estimate', 'low_estimate', 'high_estimate')
rownames(coefficient_plot) <- c('in_person_voting_rate_above_state_avg',
'pre_election_covid_cases_per_100k', 'precincts_per_capita',
'urban_pct', 'food_stamp_pct', 'hispanic_pct', 'trump_vote_2016', 'pre_election_trend')
```
```{r}
coefficient_plot
```
Above, I present confidence intervals for the change in covid rates for a one-standard deviation
increase in the co-variate. These results suggest that a one-standard deviation increase in the rate of
in-person voting above the state average is associated with 58 additional covid cases
per 100,000 people.
```{r}
model_turnout = felm(post_election_rate ~ turnout + std_pre_election_rate + std_num_prec_pc +
std_pctUrban_2010 + std_pct_snap + std_pct_hisp + std_pctTrump_2016 +
std_pre_week_rate | state, data = df, weights = df$pop_2018)
summary(model_turnout)
```
These results, with turnout un-standardized, are above. Now, with the model simplified to the most predictive co-variates the effect estimate is 4.95 additional covid cases per 100,000 for a 1% increase in the proportion of a county's residents who voted in-person on Election Day. This is consistent with the full model presented above that found an additional 5.08 covid cases per 100,000.
### 2. Robustness
I then perform a variety of robustness checks to ensure turnout is really a significant predictor.
#### 2.1 State Sensitivity Checks
First, I remove each state in my data, one at a time, to guarantee no single state is
driving these results. For each of the ensuing regressions, I incorporate `pre_election_rate`
using an offset term to more closely match traditional difference in differences.
```{r, echo = FALSE}
state_list = as.character(unique(df$state))
for(state_ex in state_list){
df_temp = df %>% filter(state != state_ex)
model = lm(post_election_rate ~ turnout + num_prec_pc + pre_week_rate + state +
med_age + medIncome_2018 + pctPovty_2018 + unplmtRate_2018 + noins_u65_2018 +
pctUrban_2010 + pct_vet + avg_householdsize + pct_snap + pctno_hs + pctcollege +
popDens + pct_black + pct_hisp + pct_asian + pct65plus + pct4064 +
pct1840 + pctTrump_2016 + offset(pre_election_rate), data = df_temp, weights = pop_2018)
print(summary(model)[[5]][2,])
}
```
I find significance for the turnout variable in each of the 20 regressions.
Furthermore, every regression provides an estimate at least as large as 3.75 additional covid cases per 100,000 for each 1% increase in in-person voting on Election Day. Under one specification, the effect is as large as 6.23 additional covid cases per 100,000.
#### 2.2 LASSO Variable Selection
I then use LASSO to select variables with lambda cross-validated.
```{r}
set.seed(07102021)
df = df[complete.cases(df),]
df %>% select(-c(post_covid_cases, pop_2018)) -> X
X = X[,-c(2:10)]
y = df$post_covid_cases
X_test = model.matrix(post_covid_cases ~.-1 - county - FIPS - std_pre_week_rate - state_pre_rate -
state_post_rate - state_inperson_rate - std_pre_election_rate - pre_election_rate -
std_turnout - std_num_prec_pc - std_pctUrban_2010 - std_pct_snap -
std_pct_hisp - std_pctTrump_2016 - std_pre_week_rate + offset(pre_election_rate),data = df)
X_test = X_test[,-c(21:26,47)]
cvfit = cv.glmnet(X_test,y,alpha = 1, weights = X_test[,3])
lasso_mod = glmnet(X_test, y, alpha = 1, weights = df$pop_2018, lambda = cvfit$lambda.min)
coef(lasso_mod)
```
The results when LASSO selects variables are largely the same. These results suggest that there
are 4 additional covid cases per 100,000 people for a 1% increase in the proportion of people who voted
in person on Election Day.
#### 2.3 County Temperature Checks
I then incorporate October and November temperatures in a county to ensure that these results
are not weather-dependent.
```{r, echo = FALSE, message = FALSE}
temp <- read_csv("temperatures.csv")
df %>% left_join(temp) -> df_temp
model_temp <- lm(post_election_rate ~ turnout + num_prec_pc + state + med_age +
medIncome_2018 + pctPovty_2018 + unplmtRate_2018 + noins_u65_2018 +
pctUrban_2010 + pct_vet + avg_householdsize + pct_snap + pctno_hs +
pctcollege + popDens + pct_black + pct_hisp + pct_asian + pct65plus +
pct4064 + pct1840 + pctTrump_2016 + pre_week_rate + temp_oct + temp_nov +
offset(pre_election_rate), data = df_temp, weights = pop_2018)
summary(model_temp)
```
I retain significance and there are now 3.58 additional covid cases for a 1% increase in the proportion of people
who voted in-person on Election Day. When I instead control for the change in temperature,
rather than both October and November temperatures, the coefficient for in-person voting on Election Day is 4.85.
#### 2.4 Placebo Check
Lastly, as a placebo check, I perform this same analysis, but on a different time period that does not include the day of the 2020 General Election. I now define the pre-election period as September 16 to September 30 and the "post-election" period as October 16 to October 30
(rather than October 16 to October 30 and November 16 to November 30, respectively).
With this earlier time period, `turnout` is no longer significant. The direction of the effect is also opposite.
```{r,echo = FALSE, message = FALSE}
##' Placebo check with old date range to confirm turnout is not significant
county_covid_placebo = read_csv("county_covid_placebo.csv")
df %>%
select(-c(pre_covid_cases, post_covid_cases, pre_week_diff)) %>%
left_join(county_covid_placebo, by = c('FIPS' = 'fips')) %>%
mutate(pre_election_rate = pre_covid_cases/pop_2018*100000,
post_election_rate = post_covid_cases/pop_2018*100000,
pre_week_rate = pre_week_diff/pop_2018*100000) -> df_sept
model_sept = lm(post_election_rate ~ turnout + num_prec_pc + pre_week_rate + state +
med_age + medIncome_2018 + pctPovty_2018 + unplmtRate_2018 + noins_u65_2018 +
pctUrban_2010 + pct_vet + avg_householdsize + pct_snap + pctno_hs + pctcollege +
popDens + pct_black + pct_hisp + pct_asian + pct65plus + pct4064 +
pct1840 + pctTrump_2016 + offset(pre_election_rate), data = df_sept, weights = pop_2018)
summary(model_sept)
```