Does anyone else care if the movie’s “Fresh?”

A regression discontinuity side project investigating whether Rotten Tomatoes scores have a causal effect on box office revenue in the post-COVID era. Data and code available on GitHub.

Motivation

Fun fact about me: I love reviews. I have a near-religious obsession with online review aggregators: Rotten Tomatoes, Pitchfork, Metacritic, Goodreads, you name it. I write a Google Review for every restaurant I visit. I’m not entirely sure why I value strangers’ opinions so much, but that’s a blog post for another day.

So when I came across a Hollywood Reporter article about the Melania documentary holding the record for the largest critic–audience score gap in Rotten Tomatoes history (6% critics vs. 99% audience), I was intrigued. It got me wondering: do people actually consult Rotten Tomatoes before buying a ticket? And could these scores be shown to have a real, causal effect on revenue?

The “Fresh” vs. “Rotten” label creates a sharp cutoff at 60%, which screamed regression discontinuity to me. A quick Google search confirmed I wasn’t the first to think of this: Nishijima, Rodrigues & Souza (2021) ran an RDD on the Tomatometer using 1,239 films from 1999–2019 and found no effect. Their paper was interesting, but I wanted to see if things looked different in the post-COVID era, when theatrical appetite has arguably shrunk and audiences might be more selective about what’s worth a trip to the theater. I also wanted to extend the analysis to the Audience Score, which their study didn’t cover.

Approach

Rotten Tomatoes labels a film “Fresh” if its score is ≥ 60% and “Rotten” if it’s < 60%. If the binary label itself nudges people to buy tickets, we’d expect a discontinuous jump in box office revenue right at that cutoff, beyond what the continuous score would predict. I test this for both the Tomatometer (critic consensus) and the Audience Score, using two outcomes: log opening weekend gross and log total domestic gross.
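In equation form, the sharp-RDD specification this describes (a sketch; τ is the effect of the label, and f is a polynomial fitted with separate slopes on each side of the cutoff):

```latex
\log(Y_i) = \alpha + \tau \,\mathbf{1}[S_i \ge 60] + f(S_i - 60) + \varepsilon_i
```

where Y_i is opening weekend or total domestic gross and S_i is the film's score.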

The preferred estimates use rdrobust (Calonico, Cattaneo & Titiunik, 2014) with MSE-optimal bandwidth and a triangular kernel. I also run parametric OLS with linear and quadratic polynomials (interacted with treatment) as robustness checks. Standard errors are heteroskedasticity-robust (HC1). Controls include log budget, log opening theaters, MPAA rating dummies, and release-year fixed effects.
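To make the parametric robustness check concrete, here is a minimal sketch of the linear-polynomial-interacted-with-treatment OLS on synthetic data with a known jump of 0.5 log points (plain numpy, not the actual analysis code or the rdrobust package):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: score centered at the 60% cutoff, a known jump of
# 0.5 log points at zero, and different slopes on each side.
s = rng.uniform(-40, 40, 800)          # centered score: S - 60
fresh = (s >= 0).astype(float)         # "Fresh" indicator
y = 10.0 + 0.02 * s + 0.01 * fresh * s + 0.5 * fresh + rng.normal(0, 0.3, 800)

# OLS with a linear polynomial interacted with treatment:
# y = a + tau*fresh + b1*s + b2*(fresh*s) + e
X = np.column_stack([np.ones_like(s), fresh, s, fresh * s])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
tau = beta[1]  # estimated jump at the cutoff
print(f"estimated jump: {tau:.3f} (true 0.5)")
```

The interaction term is what lets the slope differ on each side of the cutoff, so τ is the vertical gap at zero rather than an average level difference.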

Data

I scraped three sources and merged them:

After merging and filtering, the final dataset has 621 films. Films still in theatrical release at the end of the study window are flagged and excluded from the total domestic gross analysis (since their grosses are incomplete).
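The merge-and-filter step looks roughly like this (a toy sketch with pandas; the column names are hypothetical, not the actual scraped schema):

```python
import pandas as pd

# Toy stand-ins for two of the scraped sources.
scores = pd.DataFrame({"title": ["A", "B", "C"], "score": [72, 41, 88]})
grosses = pd.DataFrame({"title": ["A", "B", "D"],
                        "opening_gross": [1.2e7, 3.4e6, 9.9e6],
                        "still_in_theaters": [False, True, False]})

# Keep only films matched across sources.
films = scores.merge(grosses, on="title", how="inner")

# Films still in release have incomplete grosses, so drop them
# from the total-domestic-gross sample only.
total_gross_sample = films[~films["still_in_theaters"]]
print(len(films), len(total_gross_sample))
```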

Score Distributions

Before diving into results, a sanity check: for an RDD to work, the density of scores should be smooth through the 60% cutoff. If a bunch of films were suspiciously clustered just above it, that would suggest some kind of manipulation. Things look clean here.
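A crude version of that check just compares counts in narrow bins on either side of the cutoff (toy data here; the formal version is a McCrary-style density test):

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.integers(5, 100, 600)  # toy stand-in for Tomatometer scores

# Compare counts in 5-point bins just below and just above the 60% cutoff.
# Roughly equal counts suggest no bunching; a pile-up just above 60 would
# hint that scores were being nudged over the line.
below = np.sum((scores >= 55) & (scores < 60))
above = np.sum((scores >= 60) & (scores < 65))
print(below, above)
```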

[Figure: Tomatometer distribution]
[Figure: Audience Score distribution]

Results

Panel A: Critic Score (Tomatometer)

Outcome: Log Opening Weekend Gross

| Method | Controls | Coef. | Std. Err. | p-value | 95% CI | N | BW |
|---|---|---|---|---|---|---|---|
| rdrobust | No | 0.3316 | 0.7560 | 0.6610 | [-1.1502, 1.8133] | 126 | 11.90 |
| rdrobust | Yes | 0.0602 | 0.2599 | 0.8169 | [-0.4493, 0.5697] | 201 | 17.06 |
| OLS Linear | No | 0.0082 | 0.2700 | 0.9756 | [-0.5209, 0.5374] | 538 | — |
| OLS Linear | Yes | -0.0465 | 0.1333 | 0.7270 | [-0.3077, 0.2147] | 538 | — |
| OLS Quadratic | No | 0.0043 | 0.4072 | 0.9915 | [-0.7938, 0.8025] | 538 | — |
| OLS Quadratic | Yes | -0.1321 | 0.1975 | 0.5034 | [-0.5192, 0.2549] | 538 | — |

Outcome: Log Total Domestic Gross (excluding films still in theaters)

| Method | Controls | Coef. | Std. Err. | p-value | 95% CI | N | BW |
|---|---|---|---|---|---|---|---|
| rdrobust | No | -0.1265 | 0.8800 | 0.8857 | [-1.8514, 1.5983] | 108 | 10.96 |
| rdrobust | Yes | -0.2552 | 0.3773 | 0.4989 | [-0.9946, 0.4843] | 133 | 12.60 |
| OLS Linear | No | -0.1073 | 0.2996 | 0.7201 | [-0.6945, 0.4798] | 516 | — |
| OLS Linear | Yes | -0.0434 | 0.1574 | 0.7830 | [-0.3519, 0.2652] | 516 | — |
| OLS Quadratic | No | -0.2545 | 0.4507 | 0.5723 | [-1.1379, 0.6289] | 516 | — |
| OLS Quadratic | Yes | -0.1485 | 0.2332 | 0.5242 | [-0.6055, 0.3085] | 516 | — |

Panel B: Audience Score

Outcome: Log Opening Weekend Gross

| Method | Controls | Coef. | Std. Err. | p-value | 95% CI | N | BW |
|---|---|---|---|---|---|---|---|
| rdrobust | No | -0.4432 | 0.5440 | 0.4152 | [-1.5094, 0.6230] | 148 | 14.51 |
| rdrobust | Yes | -0.0641 | 0.3230 | 0.8426 | [-0.6972, 0.5689] | 148 | 14.19 |
| OLS Linear | No | 0.6049** | 0.2369 | 0.0107 | [0.1405, 1.0692] | 599 | — |
| OLS Linear | Yes | 0.0853 | 0.1371 | 0.5339 | [-0.1834, 0.3539] | 599 | — |
| OLS Quadratic | No | -0.1539 | 0.3497 | 0.6598 | [-0.8394, 0.5315] | 599 | — |
| OLS Quadratic | Yes | -0.0210 | 0.1995 | 0.9163 | [-0.4121, 0.3701] | 599 | — |

Outcome: Log Total Domestic Gross (excluding films still in theaters)

| Method | Controls | Coef. | Std. Err. | p-value | 95% CI | N | BW |
|---|---|---|---|---|---|---|---|
| rdrobust | No | -0.1114 | 0.6547 | 0.8649 | [-1.3946, 1.1717] | 143 | 14.96 |
| rdrobust | Yes | 0.1580 | 0.3543 | 0.6556 | [-0.5363, 0.8523] | 164 | 16.87 |
| OLS Linear | No | 0.8443*** | 0.2667 | 0.0015 | [0.3216, 1.3671] | 573 | — |
| OLS Linear | Yes | 0.2515 | 0.1535 | 0.1014 | [-0.0494, 0.5525] | 573 | — |
| OLS Quadratic | No | 0.0908 | 0.3962 | 0.8188 | [-0.6858, 0.8673] | 573 | — |
| OLS Quadratic | Yes | 0.1801 | 0.2280 | 0.4297 | [-0.2669, 0.6270] | 573 | — |

RDD Plots

Binned scatter plots with quadratic fits on each side of the cutoff. If the “Fresh” label were causing a jump in revenue, you’d see a visible gap at zero. Spoiler: you don’t.

[Figure: Critic - Opening Gross]
[Figure: Critic - Total Gross]
[Figure: Audience - Opening Gross]
[Figure: Audience - Total Gross]
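For reference, the binned points in plots like these are just within-bin means of the outcome; a toy sketch with synthetic data and no jump:

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.uniform(-30, 30, 500)                 # centered score: S - 60
y = 10 + 0.02 * s + rng.normal(0, 0.5, 500)   # toy log revenue, no jump

# Mean outcome within 5-point score bins; these are the dots a binned
# scatter plot displays, with a quadratic then fit on each side of zero.
edges = np.arange(-30, 35, 5)
centers = (edges[:-1] + edges[1:]) / 2
bin_means = [y[(s >= lo) & (s < hi)].mean() for lo, hi in zip(edges[:-1], edges[1:])]
print(dict(zip(centers, np.round(bin_means, 2))))
```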

What I Found

No evidence that the labels matter. In the preferred specification (rdrobust with controls), every estimate is small and statistically insignificant: 0.06 and -0.26 log points for the critic score, and -0.06 and 0.16 for the audience score, all with p-values above 0.4. The only significant estimates are the no-controls OLS audience-score coefficients, and those vanish once budget, theater count, MPAA rating, and release year are held fixed. That pattern is more consistent with well-liked films simply being bigger releases than with any causal effect of the label, and it matches Nishijima, Rodrigues & Souza's (2021) pre-COVID null result for the Tomatometer.

Notes on the tables: The preferred specification is rdrobust with controls. *, **, and *** denote significance at the 10%, 5%, and 1% levels. rdrobust reports robust bias-corrected coefficients and confidence intervals; N is the effective sample within the MSE-optimal bandwidth (BW). OLS uses the full score range. Coefficients are in log points; multiply by 100 for an approximate percentage effect (exactly, exp(β) − 1).

References:
Calonico, S., Cattaneo, M.D. & Titiunik, R. (2014). “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs.” Econometrica, 82(6), 2295–2326.
Nishijima, M., Rodrigues, M. & Souza, T.L.D. (2021). “Is Rotten Tomatoes killing the movie industry? A regression discontinuity approach.” Applied Economics Letters, 29(13), 1187–1192.