Prediction #5: Demographics and Turnout
October 7, 2024 - We are now officially less than one month until the 2024 Presidential Election. This week, we will investigate the question: how do demographic characteristics and turnout affect vote outcomes?
There has been extensive research on this question. For example, in one of the earliest studies, Wolfinger and Rosenstone (1980) used Census data from the early 1970s to show that education level in particular is an important factor in determining turnout. Similarly, a decade later, Rosenstone and Hansen used American National Election Studies data to show that education is not the only factor: the most participatory demographic groups were white, wealthy, and educated voters.
In recent years, there has been debate over which party benefits from higher voter turnout. A common argument is that higher turnout benefits Democrats, since the most reliable voting blocs (older, whiter, and wealthier Americans) tend to favor the Republican Party. However, Shaw and Petrocik (2020) refute this claim, finding across a large sample of elections that higher turnout does not significantly boost either party. There is also evidence that while demographic variables do matter, they correctly predict whether a person will vote for a Democrat or a Republican only 63% of the time. Therefore, a person's vote choice, and even their decision to vote at all, is not solely a product of their demographic background (Kim and Zilinsky 2023).
Nonetheless, it is still a useful exercise to gauge how demographic variables relate to vote outcomes, especially in an increasingly calcified electorate. Predicting vote outcomes from turnout and demographics requires that we know these input variables for the election we're trying to predict. This poses a problem, since we cannot know the turnout of an election that has not happened yet. To address this, I estimate 2024 voter turnout, measured as voting-eligible population (VEP) turnout for the highest office (i.e., turnout in the presidential race), by averaging each state's VEP highest-office turnout across the 2012, 2016, and 2020 elections. In other words, I assume that each state's 2024 turnout will equal its average over the last three presidential elections.
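The turnout assumption amounts to a simple three-election average per state. Here is a minimal Python sketch, assuming turnout is stored as a nested dictionary keyed by state and then by year; the state names and turnout figures are hypothetical placeholders, not the data used in this post.

```python
# Hypothetical VEP highest-office turnout (%) by state and year;
# these numbers are illustrative placeholders, not real data.
turnout = {
    "StateA": {2012: 60.0, 2016: 62.0, 2020: 67.0},
    "StateB": {2012: 55.0, 2016: 54.0, 2020: 62.0},
}

BASELINE_YEARS = (2012, 2016, 2020)


def project_turnout(history, years=BASELINE_YEARS):
    """Return each state's mean turnout over the baseline election years."""
    return {
        state: sum(by_year[y] for y in years) / len(years)
        for state, by_year in history.items()
    }


turnout_2024 = project_turnout(turnout)
print(turnout_2024)  # prints {'StateA': 63.0, 'StateB': 57.0}
```

Because every state gets an equal-weight average, a state with an unusually high-turnout year (such as 2020 in this toy data) will have that year pull its 2024 estimate upward.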
Similarly, we could estimate each state's 2024 demographic composition from samples of its voter file, but these data contain many missing values, which makes reliable estimates difficult. Instead, I fit a separate regression for each of 20 demographic variables using data from 2000 to 2020 and use these regressions to project each state's demographic composition in 2024. I then ran Monte Carlo simulations around these projections, assuming Gaussian errors, to better understand how much these values could vary.
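A rough Python sketch of the projection-and-simulation step for one demographic variable in one state is below. It assumes a linear time trend in election year and draws Gaussian noise whose standard deviation equals the regression's residual standard error; the post does not specify either choice, so both are assumptions, and the function names and data are hypothetical.

```python
import random
import statistics


def fit_trend(years, values):
    """Ordinary least squares fit of value = intercept + slope * year."""
    x_bar = statistics.fmean(years)
    y_bar = statistics.fmean(values)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual standard error with n - 2 degrees of freedom (assumed noise scale).
    resid = [y - (intercept + slope * x) for x, y in zip(years, values)]
    sigma = (sum(r * r for r in resid) / (len(years) - 2)) ** 0.5
    return intercept, slope, sigma


def simulate_share(years, values, target_year=2024, n_sims=10_000, seed=0):
    """Project one demographic share to target_year, then add Gaussian noise."""
    intercept, slope, sigma = fit_trend(years, values)
    point = intercept + slope * target_year
    rng = random.Random(seed)
    draws = [rng.gauss(point, sigma) for _ in range(n_sims)]
    # Shares are proportions, so clip each draw to the interval [0, 1].
    draws = [min(max(d, 0.0), 1.0) for d in draws]
    return point, draws


# Hypothetical college-educated share of one state's population.
years = [2000, 2004, 2008, 2012, 2016, 2020]
college_share = [0.24, 0.26, 0.27, 0.29, 0.31, 0.32]
point, draws = simulate_share(years, college_share)
```

Repeating this for all 20 variables in every state yields a set of simulated 2024 demographic profiles that can be fed into the vote-share model. This sketch ignores uncertainty in the fitted slope and intercept themselves, so it likely understates the true spread.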
My predictive model uses the 20 demographic variables (including race, age, and education), VEP highest-office turnout, lagged Democratic two-party vote share from the previous election, and a state factor variable to predict Democratic two-party vote share. The training data spans a narrower period, from 2000 to 2020, because not all of these variables are available for earlier elections. For this model, I used a random forest instead of an OLS linear regression. Random forests are often more robust for prediction and can capture interactions between variables automatically, which is useful because demographic groups do not vote uniformly across states. For example, white voters in Southern states tend to vote more Republican than white voters in Minnesota or Massachusetts, and Hispanic voters in California, who are mostly of Mexican descent, vote more Democratic than Cuban Americans, the largest Hispanic group in Florida. The resulting random forest explains 90.47% of the variance in outcomes, with an in-sample RMSE of 1.49 and an out-of-sample RMSE of 6.52 under grouped k-fold cross-validation. The large gap between these two figures suggests that the in-sample fit overstates how well the model will predict a new election. The importance of each variable in the model is shown below.
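Grouped k-fold cross-validation holds out whole groups of rows at once, so the model is never tested on a group it saw during training. The sketch below assumes the groups are election years, so every state in a held-out election is predicted together; the post does not say how its folds were grouped. To keep the example dependency-free, a hypothetical uniform-swing model (lagged share plus the average swing in the training data) stands in for the random forest; any function that takes training rows and returns a predictor can be plugged in instead.

```python
import math


def grouped_kfold(groups, k):
    """Yield (train, test) index lists; all rows of a group share one fold."""
    fold_of = {g: i % k for i, g in enumerate(sorted(set(groups)))}
    for fold in range(k):
        test = [i for i, g in enumerate(groups) if fold_of[g] == fold]
        train = [i for i, g in enumerate(groups) if fold_of[g] != fold]
        yield train, test


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def grouped_cv_rmse(rows, y, groups, fit, k):
    """Out-of-sample RMSE using only predictions made on held-out groups."""
    preds = [None] * len(y)
    for train, test in grouped_kfold(groups, k):
        model = fit([rows[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = model(rows[i])
    return rmse(y, preds)


def fit_uniform_swing(train_rows, train_y):
    """Hypothetical stand-in model: lagged Democratic share plus the mean swing."""
    swing = sum(yi - r["lag_dem"] for r, yi in zip(train_rows, train_y)) / len(train_y)
    return lambda row: row["lag_dem"] + swing


# Hypothetical data: two states across three elections, shares in percent.
rows = [
    {"state": "A", "lag_dem": 50.0}, {"state": "B", "lag_dem": 40.0},  # 2012
    {"state": "A", "lag_dem": 52.0}, {"state": "B", "lag_dem": 42.0},  # 2016
    {"state": "A", "lag_dem": 49.0}, {"state": "B", "lag_dem": 39.0},  # 2020
]
y = [52.0, 42.0, 49.0, 39.0, 51.0, 41.0]
years = [2012, 2012, 2016, 2016, 2020, 2020]

cv_rmse = grouped_cv_rmse(rows, y, years, fit_uniform_swing, k=3)
print(round(cv_rmse, 3))  # prints 3.536
```

With three elections and `k=3`, this reduces to leave-one-election-out cross-validation. Grouping this way prevents the model from learning a given year's national swing from other states in that same year, which is one reason grouped out-of-sample error tends to be much larger than in-sample error.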
Using the simulated and estimated 2024 data, my random forest demographics and turnout model predicts that Democrat Kamala Harris will win the 2024 Presidential Election with 349 electoral votes to Republican Donald Trump's 189. Although the state vote share estimates differ, this is the same outcome as last week's incumbency factors model. State results are shown in the map below, along with the range of plausible vote shares for each state. The 95% confidence intervals for each state's vote share indicate that, under my model, either Trump or Harris could plausibly carry the most competitive states.
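The step from simulated state vote shares to an electoral vote count can be sketched as follows. This sketch assumes winner-take-all allocation in every state (in reality, Maine and Nebraska split some electoral votes by congressional district), uses the mean of the simulations as each state's point prediction, and reports a 95% percentile interval. The state names, electoral vote counts, and simulated shares are all hypothetical.

```python
import random
import statistics


def summarize_state(draws):
    """Return the mean and a 95% percentile interval of simulated shares."""
    cuts = statistics.quantiles(draws, n=40)  # 39 cut points in 2.5% steps
    return statistics.fmean(draws), cuts[0], cuts[-1]


def allocate_electoral_votes(point_share, electoral_votes, threshold=50.0):
    """Winner-take-all: Democrat wins a state's votes if its share exceeds 50%."""
    dem = sum(ev for state, ev in electoral_votes.items()
              if point_share[state] > threshold)
    return dem, sum(electoral_votes.values()) - dem


def toss_ups(summary, threshold=50.0):
    """States whose 95% interval contains 50%, so either candidate could win."""
    return sorted(s for s, (_, lo, hi) in summary.items() if lo < threshold < hi)


# Hypothetical simulated Democratic two-party shares (%), not real model output.
rng = random.Random(1)
sims = {
    "StateA": [rng.gauss(53.0, 2.0) for _ in range(5000)],
    "StateB": [rng.gauss(49.0, 2.0) for _ in range(5000)],
    "StateC": [rng.gauss(58.0, 2.0) for _ in range(5000)],
}
ev = {"StateA": 16, "StateB": 19, "StateC": 10}

summary = {state: summarize_state(draws) for state, draws in sims.items()}
point = {state: mean for state, (mean, _, _) in summary.items()}
dem_ev, rep_ev = allocate_electoral_votes(point, ev)
print(dem_ev, rep_ev, toss_ups(summary))
```

In this toy example, StateA leans Democratic on average, yet its interval still dips below 50%, which is the same kind of result the post describes for the most competitive states.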