Week 6: Bayesian Approach

Grant Williams

2024/10/12

Introduction

In this sixth blog post, I discuss the role that ads play in elections and then compare a frequentist approach to polling with a Bayesian approach.

I will also be updating my model from last week.

The code used to produce these visualizations is publicly available in my GitHub repository and draws heavily from the section notes and sample code provided by the Gov 1347 Head Teaching Fellow, Matthew Dardet.

Analysis

After reading that the Harris campaign had reached over $1 billion in campaign donations, I did a deep dive into campaign advertising throughout history.

By using data from the Wesleyan Media Project, I was able to visualize the tone of television advertisements for the presidential elections between 2000 and 2012.

As is visible in the graph above, the election years between 2000 and 2012 saw a variety of tones within advertisements. The 2012 election cycle appeared to be pretty heated given the high incidence of “attack[ing]” tones among both candidates’ advertisements (as classified by the Wesleyan Media Project).

I will now prepare another visualization of the content of political advertisements from the same source, this time including 2016. Publicly available data only exists up until 2012, so I am using non-public data to provide the estimates for 2016.

Immediately striking from this graph is the high incidence of “personal” content from the Democratic aisle in 2016. Many have noted this as the Clinton campaign’s most significant mistake: her insistence on criticizing the language, behavior, and character of Trump to voters at the potential expense of clearly articulating and evidencing her policy positions.

The following graph explores the 2012 election and, for a variety of topics, the breakdown of the percentage of ads discussing those topics aired by each party.

Though this election took place in 2012 in a pre-MAGA America, many of the basic dynamics between the Democratic and Republican parties still remain. For example, Republicans remain more likely to air ads on crime and Democrats more likely to air ads on child care, though it is interesting that immigration ads appear evenly split between both parties — a subject that has become much more partisan and racially charged since 2012.

Now, I am going to prepare two more graphs that evaluate campaigns’ election spending.

From these two graphs, we can observe that campaigns spend immense amounts of money on advertising and that this expense only increases as the election date nears. As reported by Open Secrets, TV ads are the single-largest expense of presidential campaigns, and the cost of presidential elections has only ballooned in recent cycles. The cost of the 2020 presidential election was near $5.7 billion (Open Secrets). The bulk of this spending is also concentrated in more competitive swing states.

Given the sheer volume of money that is spent on presidential elections, I am interested in constructing a regression to measure if there is any statistically significant relationship between campaign spending and two-party vote share. I will focus on the Democratic aisle between 2008 and 2020 using campaign spending data from the FEC.

Effect of Campaign Spending on Democratic Vote Share

Dependent variable: D

                              (1)         (2)        (3)
Log(Contribution Amount)    4.659***     1.091      0.343
                           (0.460)      (0.678)    (1.234)
State Fixed Effects           No          Yes        Yes
Year Fixed Effects            No          No         Yes
Observations                  200         200        200
R²                           0.341       0.938      0.959
Adjusted R²                  0.338       0.918      0.944

Note: *p<0.1; **p<0.05; ***p<0.01

While this is admittedly a very rough regression table, it is still telling that, even before controlling for time and entity (year and state) fixed effects, the estimated effect of campaign spending on Democratic vote share is small: read as a level-log coefficient, 4.659 implies that a 1% increase in contributions is associated with only about 0.05 points of additional vote share. Once state fixed effects are added, the coefficient is no longer statistically significant, and it shrinks further when year fixed effects are included. This isn’t to suggest that advertisement spending is not consequential. It more likely evidences how campaign spending is like an arms race where the spending of one party is negated by the spending of the other.
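To make the size of these coefficients concrete, here is a small sketch that converts each one into an implied change in vote share. It assumes the regressor is the natural log of contributions and that the dependent variable is measured in percentage points; neither is stated explicitly in the table, so treat the numbers as illustrative.

```python
import math

# Coefficients on Log(Contribution Amount), copied from the regression table.
coefs = {
    "(1) no fixed effects": 4.659,
    "(2) state fixed effects": 1.091,
    "(3) state and year fixed effects": 0.343,
}


def vote_share_change(beta, pct_increase):
    """Implied change in vote share (points) in a level-log model when
    contributions rise by pct_increase percent. Assumes a natural log."""
    return beta * math.log(1 + pct_increase / 100)


for name, beta in coefs.items():
    ten_pct = vote_share_change(beta, 10)
    doubling = vote_share_change(beta, 100)
    print(f"{name}: +10% -> {ten_pct:.3f} pts, doubling -> {doubling:.3f} pts")
```

Only the column (1) coefficient is statistically distinguishable from zero, so the implied changes for columns (2) and (3) should be read as noisy point estimates.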

Improving My Electoral College Model

Last week, I constructed an elastic net model of the 2024 election using both fundamental and polling data.

This week, I will modify this model by exploring a Bayesian linear model in addition to the frequentist elastic net model. My elastic net model will also be slightly different this week: I will only consider polling data from the past 8 weeks, and I will not simultaneously predict both Republican and Democratic vote share. I restrict the polling window because I believe constructing an “average polling average” for weeks when Biden was the nominee, or before Harris had been cemented as the nominee, could introduce inaccuracies to the projection. The Bayesian linear regression model will assume that the two-party Democratic vote share is normally distributed around a mean calculated as a linear combination of the same variables included in the elastic net. I will then construct a posterior distribution using Markov Chain Monte Carlo before ultimately offering a final prediction.
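For readers unfamiliar with Markov Chain Monte Carlo, the sketch below shows the idea on a toy problem: a Normal likelihood around a linear predictor, weak Normal priors on the coefficients, and a random-walk Metropolis sampler. It is not the code behind my forecast. The data are synthetic, and the fixed noise scale `sigma`, the prior width `prior_sd`, and the step size are all simplifying assumptions chosen for illustration.

```python
import numpy as np


def log_posterior(beta, X, y, sigma=1.0, prior_sd=10.0):
    """Unnormalized log posterior: Normal likelihood around X @ beta
    (sigma assumed known) plus independent Normal(0, prior_sd) priors."""
    resid = y - X @ beta
    log_lik = -0.5 * np.sum(resid ** 2) / sigma ** 2
    log_prior = -0.5 * np.sum(beta ** 2) / prior_sd ** 2
    return log_lik + log_prior


def metropolis(X, y, n_draws=5000, step=0.05, seed=0):
    """Random-walk Metropolis sampler over the regression coefficients."""
    rng = np.random.default_rng(seed)
    beta = np.zeros(X.shape[1])
    current = log_posterior(beta, X, y)
    draws = np.empty((n_draws, X.shape[1]))
    for i in range(n_draws):
        proposal = beta + rng.normal(scale=step, size=beta.shape)
        candidate = log_posterior(proposal, X, y)
        # Accept with probability min(1, posterior ratio).
        if np.log(rng.uniform()) < candidate - current:
            beta, current = proposal, candidate
        draws[i] = beta
    return draws


# Synthetic data with hypothetical "true" coefficients (intercept, slope).
rng = np.random.default_rng(1)
n = 200
X = np.column_stack([np.ones(n), rng.normal(size=n)])
true_beta = np.array([0.5, 2.0])
y = X @ true_beta + rng.normal(scale=1.0, size=n)

draws = metropolis(X, y)
posterior_mean = draws[1000:].mean(axis=0)  # discard the first 1000 draws as burn-in
print("posterior mean:", posterior_mean)
```

The retained draws approximate the posterior distribution, so a point prediction can be taken from their mean while their spread gives a sense of uncertainty.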

As was the case last week, I will use state-level polling average data since 1980 from FiveThirtyEight and national economic data from the Federal Reserve Bank of St. Louis. I will construct an elastic net model that uses the following fundamental and polling features:

There are only 19 states for which we have polling averages for 2024. These 19 states include our 7 most competitive battleground states, a few other more competitive states, and a handful of non-competitive states (California, Montana, New York, Maryland, Missouri, etc.).

We will train a model using all of the state-level polling data that we have access to since 1980, and then apply this model to the 19 states for which we have 2024 polling data. We can then evaluate how sensible the predictions are given what we know about each state.
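The train-then-predict workflow described above can be sketched as follows. This is a from-scratch coordinate descent for the elastic net objective, not the library routine I actually used, and the three feature columns and all of the data are synthetic stand-ins. The penalty settings `alpha` and `l1_ratio` are assumptions picked for illustration.

```python
import numpy as np


def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the lasso proximal step)."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)


def fit_elastic_net(X, y, alpha=0.1, l1_ratio=0.5, n_iter=200):
    """Coordinate descent for the objective
    (1/2n)||y - b0 - Xb||^2 + alpha*l1_ratio*||b||_1
    + 0.5*alpha*(1 - l1_ratio)*||b||^2, with an unpenalized intercept."""
    n, p = X.shape
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean  # centering handles the intercept
    beta = np.zeros(p)
    col_sq = (Xc ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        for j in range(p):
            # Residual with feature j's current contribution added back in.
            partial_resid = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ partial_resid / n
            beta[j] = soft_threshold(rho, alpha * l1_ratio) / (
                col_sq[j] + alpha * (1 - l1_ratio)
            )
    intercept = y_mean - x_mean @ beta
    return intercept, beta


# Synthetic "historical" training rows and 19 synthetic "2024" rows.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(300, 3))
y_train = 50 + X_train @ np.array([3.0, 1.0, 0.0]) + rng.normal(size=300)
X_2024 = rng.normal(size=(19, 3))

intercept, beta = fit_elastic_net(X_train, y_train)
predictions = intercept + X_2024 @ beta
print(np.round(predictions, 2))
```

In practice the penalty strength would be chosen by cross-validation on the historical data rather than fixed by hand.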

Here are the results from our elastic-net model:

state             simp_pred_dem   winner
arizona           49.75443        Republican
california        61.33980        Democrat
florida           47.73111        Republican
georgia           50.29620        Democrat
maryland          64.73352        Democrat
michigan          50.83720        Democrat
minnesota         53.06472        Democrat
missouri          44.17112        Republican
montana           42.55932        Republican
nevada            50.19077        Democrat
new hampshire     53.95817        Democrat
new mexico        53.16381        Democrat
new york          56.56808        Democrat
north carolina    49.65808        Republican
ohio              45.64163        Republican
pennsylvania      50.40565        Democrat
texas             47.30844        Republican
virginia          53.57639        Democrat
wisconsin         50.53415        Democrat

And here are the predictions from our Bayesian linear regression model:

state             bayes_pred_dem   bayes_winner
arizona           49.54392         Republican
california        61.08696         Democrat
florida           47.45501         Republican
georgia           50.11482         Democrat
maryland          64.59866         Democrat
michigan          50.62250         Democrat
minnesota         52.89277         Democrat
missouri          43.95808         Republican
montana           42.35355         Republican
nevada            49.95145         Republican
new hampshire     53.80285         Democrat
new mexico        52.93264         Democrat
new york          56.27732         Democrat
north carolina    49.43626         Republican
ohio              45.38727         Republican
pennsylvania      50.18365         Democrat
texas             47.09131         Republican
virginia          53.40091         Democrat
wisconsin         50.29996         Democrat

Apart from slightly lower predicted Democratic vote shares across the board, the only significant departure of the Bayesian prediction from the frequentist prediction is the winner of Nevada, which, per the Bayesian model, is Trump (49.95% versus 50.19% in the elastic net).
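The comparison between the two tables can be checked mechanically. The sketch below copies the predicted Democratic shares from both tables and assumes the winner is simply whichever party clears 50% of the two-party vote, which matches the winner columns shown.

```python
# Predicted Democratic two-party vote share, copied from the two tables.
elastic_net = {
    "arizona": 49.75443, "california": 61.33980, "florida": 47.73111,
    "georgia": 50.29620, "maryland": 64.73352, "michigan": 50.83720,
    "minnesota": 53.06472, "missouri": 44.17112, "montana": 42.55932,
    "nevada": 50.19077, "new hampshire": 53.95817, "new mexico": 53.16381,
    "new york": 56.56808, "north carolina": 49.65808, "ohio": 45.64163,
    "pennsylvania": 50.40565, "texas": 47.30844, "virginia": 53.57639,
    "wisconsin": 50.53415,
}
bayes = {
    "arizona": 49.54392, "california": 61.08696, "florida": 47.45501,
    "georgia": 50.11482, "maryland": 64.59866, "michigan": 50.62250,
    "minnesota": 52.89277, "missouri": 43.95808, "montana": 42.35355,
    "nevada": 49.95145, "new hampshire": 53.80285, "new mexico": 52.93264,
    "new york": 56.27732, "north carolina": 49.43626, "ohio": 45.38727,
    "pennsylvania": 50.18365, "texas": 47.09131, "virginia": 53.40091,
    "wisconsin": 50.29996,
}


def winner(dem_share):
    """Assumed rule: Democrats win a state when their two-party share exceeds 50."""
    return "Democrat" if dem_share > 50 else "Republican"


# States where the two models disagree on the winner.
flipped = [s for s in elastic_net if winner(elastic_net[s]) != winner(bayes[s])]

# Gap between the models (elastic net minus Bayesian) in each state.
gaps = {s: elastic_net[s] - bayes[s] for s in elastic_net}

print("States with different winners:", flipped)
print("Largest gap:", max(gaps, key=gaps.get), round(max(gaps.values()), 3))
```

Because every gap is well under half a point, only states sitting almost exactly at 50% can change hands between the two models.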

These electoral maps are visible below.

If we also wanted to model the national popular vote, we could use what we did in Week 3, using an elastic net on both fundamental and polling data, weighting such that the polls closer to November matter more. This was Nate Silver’s approach. Again, I will only be considering polls within 8 weeks of the election.
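One simple way to make polls closer to November matter more is to weight each poll by a factor that decays with the number of weeks remaining. The sketch below is only an illustration of that idea: the exponential form, the `decay` value, and the three poll readings are hypothetical, and this is not necessarily how the weights enter my elastic net.

```python
def weighted_poll_average(polls, decay=0.9):
    """Average poll readings with weights decay ** weeks_before_election.

    polls is a list of (weeks_before_election, dem_share) pairs. A decay of 1.0
    reduces to a plain mean; smaller values favor polls closer to election day.
    """
    weights = [decay ** weeks for weeks, _ in polls]
    weighted_sum = sum(w * share for w, (_, share) in zip(weights, polls))
    return weighted_sum / sum(weights)


# Hypothetical readings inside the 8-week window: (weeks out, Democratic share).
polls = [(8, 49.0), (4, 50.0), (1, 51.0)]

print(weighted_poll_average(polls, decay=1.0))  # plain mean of the three polls
print(weighted_poll_average(polls, decay=0.8))  # pulled toward the latest poll
```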

Doing so, we find that the Democrats are projected to have a narrow lead in the two-party popular vote nationally (after scaling so that the estimates sum to 100%).

## Democrat two-party vote share:  50.93 %
## Republican two-party vote share:  49.07 %
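The rescaling step mentioned above is a simple normalization: when the two parties are predicted separately, the raw estimates need not sum to 100, so each is divided by their total. The raw inputs in this sketch are hypothetical, since the unscaled predictions are not reported here.

```python
def two_party_shares(dem_raw, rep_raw):
    """Rescale two raw vote share estimates so that they sum to 100."""
    total = dem_raw + rep_raw
    return 100 * dem_raw / total, 100 * rep_raw / total


# Hypothetical unscaled predictions that do not sum to 100.
dem, rep = two_party_shares(48.9, 47.1)
print(f"Democrat two-party vote share: {dem:.2f} %")
print(f"Republican two-party vote share: {rep:.2f} %")
```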

Citations:

Cavazos, Nidia, et al. “Kamala Harris Campaign Surpasses $1 Billion in Fundraising, Source Says.” CBS News, CBS Interactive, 10 Oct. 2024, www.cbsnews.com/news/kamala-harris-campaign-fundraising-1-billion/.

Evers-Hillstrom, Karl. “Most Expensive Ever: 2020 Election Cost $14.4 Billion.” OpenSecrets News, 11 Feb. 2021, www.opensecrets.org/news/2021/02/2020-cycle-cost-14p4-billion-doubling-16/.

Kamarck, Elaine, et al. “Why Hillary Clinton Lost.” Brookings, 20 Sept. 2017, www.brookings.edu/articles/why-hillary-clinton-lost/.

Data Sources:

Data are from US presidential election popular vote results (1948-2020), polling data from FiveThirtyEight, economic data from the St. Louis Fed, campaign spending data from the FEC between 2008 and 2024, and campaign advertisement data from the Wesleyan Media Project.