Who Votes?: Demographics and Turnout

Nick Dominguez

2024/10/07

Prediction #5: Demographics and Turnout

October 7, 2024 - We are now officially less than one month away from the 2024 Presidential Election. This week, we will investigate the question: how do demographic characteristics and turnout affect vote outcomes?

There has been extensive research on this question. One of the earliest studies, Wolfinger and Rosenstone (1980), used Census data from the early 1970s and found that education level in particular is an important factor in determining turnout. Similarly, a decade later, Rosenstone and Hansen used American National Election Studies data to show that education is not the only factor: the most participatory demographic groups were white, wealthy, and educated voters.

In recent years, there has been debate over whom increased voter turnout benefits. People often posit that higher turnout benefits Democrats, since the most reliable voting blocs of older, whiter, wealthier Americans tend to vote more with the Republican Party. However, Shaw and Petrocik (2020) refute this claim, finding that increased voter turnout across a large sample of elections does not significantly boost one party or the other. There is also evidence that while demographic variables do matter, they correctly predict whether someone will vote for a Democrat or a Republican only 63% of the time. Therefore, a person’s vote choice, and their decision to vote at all, is not solely a product of their demographic background (Kim and Zilinsky 2023).

Nonetheless, it is still useful to get a sense of how demographic variables relate to vote outcomes, especially in an increasingly calcified electorate. Predicting vote outcomes from turnout and demographics requires that we know these input variables for the election we’re trying to predict. This is a problem, since we do not know the turnout of an election that has not happened. To address this, I estimate 2024 voter turnout, measured as voting-eligible population (VEP) turnout for the highest office (that is, turnout in the presidential race), by averaging VEP highest office turnout across the 2012, 2016, and 2020 elections for each state. In other words, I assume that each state’s 2024 turnout will equal its average over the last three presidential elections.
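The averaging step can be sketched as follows. This is a minimal illustration in Python, not the code behind the model: the state names, the turnout values, and the function name `estimate_turnout` are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical VEP highest-office turnout rates (percent).
# Illustrative values only, not real state data.
turnout_records = [
    ("State A", 2012, 60.0),
    ("State A", 2016, 62.0),
    ("State A", 2020, 67.0),
    ("State B", 2012, 55.0),
    ("State B", 2016, 56.0),
    ("State B", 2020, 63.0),
]

def estimate_turnout(records, years=(2012, 2016, 2020)):
    """Average each state's turnout over the given election years."""
    by_state = defaultdict(list)
    for state, year, rate in records:
        if year in years:
            by_state[state].append(rate)
    return {state: mean(rates) for state, rates in by_state.items()}

# Assumed 2024 turnout: the simple mean of the three prior elections.
turnout_2024 = estimate_turnout(turnout_records)
```

A simple mean weights all three elections equally, so an unusually high-turnout year such as 2020 pulls the estimate up by only a third of its gap from the other years.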

Similarly, we could get estimates of state demographic composition in 2024 from samples of each state’s voter file, but these data contain many missing values, which makes reliable estimates difficult to obtain. Instead, I estimate state demographic characteristics by fitting regressions for 20 different demographic variables on data from 2000 to 2020 and using them to predict each state’s demographic composition in 2024. Then, I ran Monte Carlo simulations around these predictions, assuming a Gaussian distribution, to better understand how much these values could vary.
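One way to picture this step for a single demographic variable in a single state is a linear time trend extrapolated to 2024, with Gaussian draws around the point prediction. The sketch below makes several assumptions that the text does not specify: the regression is a simple linear trend on year, the data are observed in election years, and the simulation spread is the residual standard error (which ignores uncertainty in the fitted coefficients). The data values and function names are hypothetical.

```python
import random
from statistics import mean

def fit_linear_trend(years, values):
    """Ordinary least squares fit of values on year; returns (intercept, slope, sigma)."""
    x_bar, y_bar = mean(years), mean(values)
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(years, values)]
    # Residual standard error with n - 2 degrees of freedom.
    sigma = (sum(r ** 2 for r in residuals) / (len(years) - 2)) ** 0.5
    return intercept, slope, sigma

def simulate_2024(years, values, n_sims=1000, seed=0):
    """Extrapolate the trend to 2024 and draw Gaussian simulations around it."""
    intercept, slope, sigma = fit_linear_trend(years, values)
    point = intercept + slope * 2024
    rng = random.Random(seed)  # seeded so the draws are reproducible
    return point, [rng.gauss(point, sigma) for _ in range(n_sims)]

# Hypothetical share (percent) of one demographic group in one state.
years = [2000, 2004, 2008, 2012, 2016, 2020]
share = [20.0, 21.1, 21.9, 23.2, 23.8, 25.0]
point, draws = simulate_2024(years, share)
```

Feeding each simulated draw through the predictive model, rather than only the point estimate, is what lets uncertainty in the 2024 inputs show up as a range of predicted vote shares.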

As for my predictive model, I used the set of 20 demographic variables (including variables for race, age, and education), VEP highest office turnout, lagged Democratic two-party vote share from the previous election, and a state factor variable to predict Democratic two-party vote share. The training data has a narrower time span, from 2000 to 2020, because we do not have values for all of these variables for earlier elections. For this model, I used a Random Forest instead of an OLS linear regression model. Random forest models tend to be more robust for prediction and can automatically handle interaction terms, which is useful since demographic groups do not vote uniformly across states. For example, white voters in Southern states tend to vote more Republican than white voters in Minnesota or Massachusetts, and Hispanic voters in California, who are mostly of Mexican descent, vote more for Democrats than Cuban Americans, who are the largest Hispanic group in Florida. The resulting random forest model explains 90.47% of the variance in outcomes, has an in-sample RMSE of 1.49, and has a grouped k-fold cross-validation out-of-sample RMSE of 6.52. The importance of each variable in the model is shown below.
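To make the gap between in-sample and out-of-sample error concrete, here is a minimal sketch of grouped k-fold cross-validation. It is not the random forest itself: a trivial "lagged share plus average swing" predictor stands in for the model so the example stays self-contained, and grouping by election year is an assumption (the text does not say which grouping was used). All data values and function names are hypothetical.

```python
from math import sqrt
from statistics import mean

def grouped_kfold_rmse(rows, group_key, fit, predict, k=3):
    """Out-of-sample RMSE where whole groups are held out together."""
    groups = sorted({row[group_key] for row in rows})
    folds = [groups[i::k] for i in range(k)]  # assign groups to k folds
    squared_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [r for r in rows if r[group_key] not in held]
        test = [r for r in rows if r[group_key] in held]
        model = fit(train)  # the model never sees the held-out groups
        squared_errors += [(predict(model, r) - r["dem_share"]) ** 2 for r in test]
    return sqrt(mean(squared_errors))

# Stand-in model (an assumption, not a random forest):
# predict lagged vote share plus the mean swing in the training data.
def fit_mean_swing(train):
    return mean(r["dem_share"] - r["lag_share"] for r in train)

def predict_mean_swing(swing, row):
    return row["lag_share"] + swing

# Hypothetical state-year rows (Democratic two-party vote share, percent).
rows = [
    {"year": 2012, "state": "A", "lag_share": 52.0, "dem_share": 51.0},
    {"year": 2012, "state": "B", "lag_share": 45.0, "dem_share": 44.5},
    {"year": 2016, "state": "A", "lag_share": 51.0, "dem_share": 49.0},
    {"year": 2016, "state": "B", "lag_share": 44.5, "dem_share": 42.0},
    {"year": 2020, "state": "A", "lag_share": 49.0, "dem_share": 51.5},
    {"year": 2020, "state": "B", "lag_share": 42.0, "dem_share": 44.0},
]
cv_rmse = grouped_kfold_rmse(rows, "year", fit_mean_swing, predict_mean_swing, k=3)
```

Holding out entire groups keeps observations from the same election out of both the training and test sets at once, which is why the grouped out-of-sample RMSE is a more honest guide to forecasting error than the in-sample figure.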

With the simulated and estimated 2024 data, my Random Forest demographics and turnout model predicts that Democrat Kamala Harris will win the 2024 Presidential Election with 349 electoral votes compared to Republican Donald Trump’s 189 electoral votes. Despite different vote share estimates, this is the same outcome as my incumbency factors model last week. State results are shown in the map below, as well as the range of plausible vote shares for each state. The 95% confidence intervals for each state’s vote share indicate that there are scenarios in which either Trump or Harris could carry the most competitive states.
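As an aside on the arithmetic, the sketch below shows how state-level vote share predictions turn into an electoral vote tally and how a 95% interval can be read off a set of simulated vote shares. It covers only three states for brevity, treats every state as winner-take-all, and uses percentile cut points for the interval, which is an assumption about how the intervals were built. The vote shares and function names are hypothetical; the electoral vote counts are the 2024 allocations.

```python
from statistics import quantiles

# 2024 electoral vote allocations for three states (subset for illustration).
ELECTORAL_VOTES = {"PA": 19, "GA": 16, "AZ": 11}

def tally(dem_shares, electoral_votes):
    """Winner-take-all tally; returns (Democratic EVs, Republican EVs).

    A state goes Democratic only if the two-party share exceeds 50,
    so an exact tie is counted for the Republican in this sketch.
    """
    dem = sum(ev for state, ev in electoral_votes.items() if dem_shares[state] > 50)
    return dem, sum(electoral_votes.values()) - dem

def interval_95(draws):
    """Percentile interval: the 2.5% and 97.5% cut points of the draws."""
    cuts = quantiles(draws, n=40)  # 39 cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]

# Hypothetical predicted Democratic two-party vote shares (percent).
predicted = {"PA": 51.0, "GA": 49.0, "AZ": 50.5}
dem_ev, rep_ev = tally(predicted, ELECTORAL_VOTES)
```

When a state's interval straddles 50, some simulated scenarios hand that state's electoral votes to each candidate, which is what makes it competitive in the model.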

As with each weekly blog post, I must emphasize that these predictions are mere components of a forthcoming final prediction that will integrate all of my individual models. As evidenced by the literature and the fairly large out-of-sample RMSE, the predictive power of demographics and turnout is limited. Additionally, due to data constraints, I use overall VEP highest office turnout for each state, which does not account for turnout differences between groups. In subsequent models, I may use election laws and more specific turnout estimates by demographic group to account for the large disparity in turnout across demographics. You may notice that my model predicts many Southern states to be less Republican in 2024 than in 2020. This could be because the model does not fully account for partisan differences among racial groups in different states: Southern states like Mississippi, Alabama, Georgia, and Louisiana have large Black populations, but their white populations are more Republican than those of other states. This demographic and turnout model could also be improved by including polling crosstab averages to better reflect how each demographic group may vote in 2024, and how that may depart from previous elections. If I am able to implement these changes, I would have more confidence in this model’s predictions for the 2024 election.