Week Five: Turnout

Grant Williams

2024/10/02

Introduction

In this fifth blog post, I am going to discuss two areas of election forecasting: turnout and demographics.

Then, I am going to prepare a baseline model to predict the 2024 election. Over the next 4 weeks until the November 5th election, I will fine-tune this model and ultimately use it to predict the next president of the United States of America.

The code used to produce these visualizations is publicly available in my GitHub repository and draws heavily from the section notes and sample code provided by the Gov 1347 Head Teaching Fellow, Matthew Dardet.

Analysis

Current beliefs in the United States about turnout and demographics are informed, in part, by a longstanding academic literature on the subject.

Two of the most influential publications are Who Votes?, a 1980 book by Professors Wolfinger and Rosenstone, and Mobilization, Participation, and Democracy in America, a 1993 book by Professors Rosenstone and Hansen. Both of these publications popularized theories about the connection between demographics and voter turnout that would permeate US society for decades to come.

Wolfinger and Rosenstone ran OLS regressions on census data between 1972 and 1974 and concluded that education was the key demographic variable influencing turnout; age, marital status, and the restrictiveness of voter registration laws were also strongly correlated with it. In 1993, Rosenstone and Hansen expanded upon these findings, determining that those most likely to vote tended to be white, wealthy, and educated. Using data from the American National Election Studies (ANES), however, they also found that turnout was highly affected by mobilization efforts within social networks. Together, these two studies gave the strong impression that demographics were of significant relevance to turnout.

This prevailing narrative, however, has faced increased scrutiny in recent years.

Professors Shaw and Petrocik, for example, challenged The Turnout Myth in their 2020 book of the same title, finding no evidence in the past 50 years of presidential election data that higher rates of turnout benefit Democrats, as the conventional narrative would suggest. Instead, Shaw and Petrocik argue that “turnout does not consistently help either party” (Shaw & Petrocik 2020).

Applying logistic regressions and random forest models to public opinion survey data from the American National Election Studies (ANES) between 1952 and 2020, Professors Kim and Zilinsky (2021) observed results that similarly pour cold water on assumptions about demographics’ high predictiveness of turnout. They determined that predictions using the demographic “variables of age, gender, race, education, and income” exhibited less than 64% accuracy out-of-sample, regardless of whether the predictions were made with a random forest or a logistic regression model. Including party identification, however, improves the accuracy by between 20 and 30 percentage points. Beyond that point, even adding all of the additional covariates found in a voter file (marital status, homeownership, etc.) yields only a fairly marginal improvement.

For these reasons, in my first electoral college and national popular vote model, I am not going to explicitly include demographic variables. Instead, I will consider polling averages and fundamental economic conditions. In future weeks, I hope to include additional analysis from voter files and more explicitly model turnout and demographics at the state level, but, for this first week, I will start with something simpler.

My Model

Both Sabato’s Crystal Ball of the Center for Politics and the Cook Political Report list the same seven states as “toss-ups.” These are Arizona, Georgia, Michigan, Nevada, North Carolina, Pennsylvania, and Wisconsin.

While it is not inconceivable that other states/districts could unexpectedly flip (Florida, Ohio, Nebraska’s 2nd district, Virginia, Texas, etc.), it is unlikely that one of these states/districts would ‘decide’ the election. If Florida were to go blue, for example, other more competitive states would likely have gone blue as well, clinching the election for Harris. While there exist realities where Texas or Florida or Ohio could be the tipping point of the presidential election, for the purposes of this week’s blog post, I will focus on the seven most commonly cited battleground states.

With this assumption in place, assuming other states and districts vote as they did in 2020, the base electoral map for 2024 looks as follows:

As we can see, this election cycle is incredibly competitive. 93 electoral votes reside in the seven toss-up states. Neither the Democrats nor the Republicans can claim a clear edge in the electoral college.
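The 93-vote figure can be checked with a little arithmetic. The sketch below is in Python and is not taken from the repository behind this post. It uses the 2024 electoral vote allocations for the seven toss-up states; the base totals of 226 and 219 are an assumption of the sketch, obtained by applying the 2020 results to the 2024 allocations for every other state and district.

```python
# Electoral votes under the 2024 apportionment for the seven toss-up states.
TOSS_UP_EV = {
    "Arizona": 11,
    "Georgia": 16,
    "Michigan": 15,
    "Nevada": 6,
    "North Carolina": 16,
    "Pennsylvania": 19,
    "Wisconsin": 10,
}

# Assumed base totals: every other state and district votes as it did in 2020.
BASE_D_EV = 226
BASE_R_EV = 219

TOTAL_EV = 538
MAJORITY = TOTAL_EV // 2 + 1  # 270 electoral votes are needed to win

toss_up_total = sum(TOSS_UP_EV.values())
needed_d = MAJORITY - BASE_D_EV  # votes the Democrats still need from toss-ups
needed_r = MAJORITY - BASE_R_EV  # votes the Republicans still need from toss-ups

print(f"Toss-up electoral votes: {toss_up_total}")
print(f"Democrats need {needed_d} more; Republicans need {needed_r} more")
```

Under these assumptions neither party reaches 270 without winning a substantial share of the toss-up states.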

Preparing My Electoral College Model

Using state-level polling average data since 1980 from FiveThirtyEight and national economic data from the Federal Reserve Bank of St. Louis, I construct an elastic net model that uses the following fundamental and polling features:

There are only 19 states for which we have polling averages for 2024. These 19 states include our 7 most competitive battleground states, a few other somewhat competitive states, and a handful of non-competitive states (California, Montana, New York, Maryland, Missouri, etc.).

We will train a model using all of the state-level polling data that we have access to since 1980, and then apply the model to the 19 states for which we have 2024 polling data. We can then evaluate how sensible the predictions are given what we know about each state.
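To make the idea of an elastic net concrete, here is a minimal Python sketch that fits one by coordinate descent using only the standard library. It is an illustration, not the code behind this post: the objective follows the common convention of squared error plus a blend of L1 and L2 penalties, the two feature names (a poll average and a GDP growth figure) are hypothetical, and every number in the toy data is made up.

```python
def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma (the L1 part of the penalty)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def fit_elastic_net(X, y, lam=0.1, alpha=0.5, n_iter=500):
    """Minimise (1/2n)*RSS + lam*(alpha*L1 + (1 - alpha)/2 * L2) by coordinate descent."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # Centre the data so the (unpenalised) intercept can be recovered at the end.
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    resid = [v - y_mean for v in y]  # residuals for beta = 0
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [Xc[i][j] for i in range(n)]
            denom = sum(c * c for c in col) / n + lam * (1 - alpha)
            if denom == 0.0:
                continue  # constant column with no ridge penalty: leave at zero
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(col[i] * (resid[i] + col[i] * beta[j]) for i in range(n)) / n
            new_b = soft_threshold(rho, lam * alpha) / denom
            delta = new_b - beta[j]
            for i in range(n):
                resid[i] -= col[i] * delta
            beta[j] = new_b
    intercept = y_mean - sum(b * m for b, m in zip(beta, x_mean))
    return intercept, beta


def predict(intercept, beta, row):
    return intercept + sum(b * x for b, x in zip(beta, row))


# Made-up toy data: [hypothetical poll average, hypothetical GDP growth] -> vote share.
X_toy = [[48.0, 2.1], [52.0, -0.5], [45.0, 1.0], [55.0, 3.2], [50.0, 0.3], [47.0, -1.2]]
y_toy = [48.5, 51.0, 46.0, 55.5, 50.0, 46.5]

b0, b = fit_elastic_net(X_toy, y_toy, lam=0.1, alpha=0.5)
print("intercept:", b0, "coefficients:", b)
print("prediction for a new row:", predict(b0, b, [49.0, 1.5]))
```

The `alpha` parameter slides between ridge regression (0) and the lasso (1), and `lam` sets the overall penalty strength; in practice both would be chosen by cross-validation rather than fixed by hand as they are here.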

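| state | predicted_R_pv2p | predicted_D_pv2p | pred_winner |
|---|---|---|---|
| Arizona | 51.02970 | 48.97038 | R |
| California | 35.78881 | 64.21084 | D |
| Florida | 52.64682 | 47.35331 | R |
| Georgia | 50.87362 | 49.12645 | R |
| Maryland | 33.17850 | 66.82108 | D |
| Michigan | 49.20047 | 50.79956 | D |
| Minnesota | 46.97779 | 53.02218 | D |
| Missouri | 57.51158 | 42.48869 | R |
| Montana | 59.79231 | 40.20801 | R |
| Nevada | 49.65888 | 50.34117 | D |
| New Hampshire | 46.29306 | 53.70689 | D |
| New Mexico | 45.49095 | 54.50897 | D |
| North Carolina | 50.73015 | 49.26993 | R |
| Ohio | 55.30958 | 44.69062 | R |
| Pennsylvania | 49.93654 | 50.06351 | D |
| Texas | 53.84344 | 46.15672 | R |
| Virginia | 46.34994 | 53.65000 | D |
| Wisconsin | 49.47596 | 50.52409 | D |
| New York | 44.43216 | 55.56768 | D |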

Here, we can see that, apart from Arizona and Georgia, all of the 19 states on which we have data are projected to vote for the same party they did in 2020. Pennsylvania stays blue, but only by about a tenth of a percentage point. This should give us some confidence in the accuracy of our model, as it is in line with the historical behavior of the states.

I will now use a simulation to estimate how confident we are in these results. I will do this by sampling new state-level polling measurements for each of our 19 states 10,000 times, assuming a normal distribution around the current polling values with a standard deviation of two percentage points.

Doing so yields the following table.

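| State | D Win Percentage |
|---|---|
| Arizona | 29.74 |
| California | 100.00 |
| Florida | 6.94 |
| Georgia | 33.20 |
| Maryland | 100.00 |
| Michigan | 74.75 |
| Minnesota | 97.69 |
| Missouri | 0.00 |
| Montana | 0.00 |
| Nevada | 63.17 |
| New Hampshire | 99.37 |
| New Mexico | 99.83 |
| New York | 99.92 |
| North Carolina | 41.66 |
| Ohio | 0.05 |
| Pennsylvania | 57.53 |
| Texas | 1.27 |
| Virginia | 99.01 |
| Wisconsin | 72.99 |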

As we can see, the seven battleground states exhibit much more uncertainty than the other states. California, for example, does not vote red in a single simulation, and even Florida votes blue less than 7% of the time in our simulations. I will use the Democratic win percentages for the battleground states to estimate whether they will vote blue or red in 2024.
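The simulation procedure behind this table can be sketched in a few lines of Python. This is a simplified illustration rather than the code used for the post: `toy_model` is a hypothetical stand-in that passes the simulated poll straight through as the predicted Democratic two-party share, whereas the real procedure would feed each simulated poll into the fitted elastic net. The polling inputs in the example are made up, and the seed is arbitrary.

```python
import random


def simulate_d_win_pct(current_poll_d, predict_d_share, n_sims=10_000,
                       poll_sd=2.0, seed=1347):
    """Percent of simulations in which the predicted Democratic share exceeds 50."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    wins = 0
    for _ in range(n_sims):
        # Draw a new poll from a normal distribution around the current value.
        simulated_poll = rng.gauss(current_poll_d, poll_sd)
        if predict_d_share(simulated_poll) > 50.0:
            wins += 1
    return 100.0 * wins / n_sims


def toy_model(poll_d):
    """Hypothetical stand-in for the fitted model: identity mapping."""
    return poll_d


# Made-up polling inputs for three hypothetical states.
for label, poll in [("safe D", 60.0), ("toss-up", 50.0), ("safe R", 40.0)]:
    pct = simulate_d_win_pct(poll, toy_model)
    print(f"{label}: D wins {pct:.2f}% of simulations")
```

The same pattern explains why the safe states in the table sit at or near 0% and 100%: a two-point standard deviation almost never moves a ten-point lead across the 50% line.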

Projections

Using this model, our ultimate electoral college would look as follows, with Vice President Kamala Harris narrowly squeaking out a win.
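The tally behind this projection can be reproduced with a short Python sketch. It rests on two assumptions that are mine rather than stated in the post: a toss-up state is called for the Democrats when its simulated win percentage exceeds 50, and the base totals (226 Democratic, 219 Republican) come from applying the 2020 results to the 2024 electoral vote allocations for all other states and districts. The win percentages are copied from the simulation table.

```python
# Democratic win percentages for the seven toss-up states (from the simulation table).
D_WIN_PCT = {
    "Arizona": 29.74,
    "Georgia": 33.20,
    "Michigan": 74.75,
    "Nevada": 63.17,
    "North Carolina": 41.66,
    "Pennsylvania": 57.53,
    "Wisconsin": 72.99,
}

# Electoral votes under the 2024 apportionment.
EV = {
    "Arizona": 11,
    "Georgia": 16,
    "Michigan": 15,
    "Nevada": 6,
    "North Carolina": 16,
    "Pennsylvania": 19,
    "Wisconsin": 10,
}

# Assumed base totals: every other state and district votes as it did in 2020.
d_ev, r_ev = 226, 219

for state, pct in D_WIN_PCT.items():
    if pct > 50.0:  # assumed decision rule: majority of simulations
        d_ev += EV[state]
    else:
        r_ev += EV[state]

print(f"Projected electoral votes: D {d_ev}, R {r_ev}")
```

Under these assumptions the tally comes to 276 electoral votes for the Democrats and 262 for the Republicans, which is consistent with the narrow Harris win described above.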

If we also wanted to model the national popular vote, we could use what we did in Week 3, using an elastic net on both fundamental and polling data, weighting such that the polls closer to November matter more. This was Nate Silver’s approach.

Doing so, we find that the Democrats are projected to have a narrow lead in the two-party popular vote nationally (after scaling so that the estimates sum to 100%).

## Democrat two-party vote share:  50.86 %
## Republican two-party vote share:  49.14 %
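The rescaling step mentioned above is simple enough to show directly. The Python sketch below assumes that the two parties' shares are predicted separately and therefore need not sum to 100; the raw inputs are hypothetical values chosen only for illustration, not outputs of the model in this post.

```python
def to_two_party_shares(raw_d, raw_r):
    """Rescale two separately predicted shares so that they sum to 100."""
    total = raw_d + raw_r
    if total <= 0:
        raise ValueError("predicted shares must sum to a positive number")
    return 100.0 * raw_d / total, 100.0 * raw_r / total


# Hypothetical raw predictions that do not sum to 100.
d_share, r_share = to_two_party_shares(51.9, 50.1)
print(f"Democrat two-party vote share: {d_share:.2f} %")
print(f"Republican two-party vote share: {r_share:.2f} %")
```

Rescaling preserves the ratio between the two predictions, so it never changes which party is ahead.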

Citations:

Kim, Seo-young Silvia, and Jan Zilinsky. 2021. “The Divided (But Not More Predictable) Electorate: A Machine Learning Analysis of Voting in American Presidential Elections.” APSA Preprints. doi: 10.33774/apsa-2021-45w3m-v2. This content is a preprint and has not been peer-reviewed.

Rosenstone, Steven J., and John Mark Hansen. Mobilization, Participation, and Democracy in America. Macmillan, 1993.

Shaw, Daron, and John Petrocik. The Turnout Myth: Voting Rates and Partisan Outcomes in American National Elections. 1st ed., Oxford University Press, 2020, https://doi.org/10.1093/oso/9780190089450.001.0001.

Wolfinger, Raymond E., and Steven J. Rosenstone. Who Votes? Yale University Press, 1980.

Data Sources:

Data are from the US presidential election popular vote results from 1948-2020, state-level polling data for 1980 onwards, and economic data from the St. Louis Fed.