Does anyone else care if the movie’s “Fresh?”

A regression discontinuity side project investigating whether Rotten Tomatoes scores have a causal effect on box office revenue in the post-COVID era. Data and code available on GitHub.

Motivation

Fun fact about me: I love reviews. I have a near-religious obsession with online review aggregators: Rotten Tomatoes, Pitchfork, Metacritic, Goodreads, you name it. I write a Google Review for every restaurant I visit. I’m not entirely sure why I value strangers’ opinions so much, but that’s a blog post for another day.

So when I came across a Hollywood Reporter article about the Melania documentary holding the record for the largest critic–audience score gap in Rotten Tomatoes history (6% critics vs. 99% audience), I was intrigued. It got me wondering: do people actually consult Rotten Tomatoes before buying a ticket? And could these scores be shown to have a real, causal effect on revenue?

The “Fresh” vs. “Rotten” label creates a sharp cutoff at 60%, which screamed regression discontinuity to me. A quick Google search confirmed I wasn’t the first to think of this: Nishijima, Rodrigues & Souza (2021) ran an RDD on the Tomatometer using 1,239 films from 1999–2019 and found no effect. Their paper was interesting, but I wanted to see if things looked different in the post-COVID era, when theatrical appetite has arguably shrunk and audiences might be more selective about what’s worth a trip to the theater. I also wanted to extend the analysis to the Audience Score, which their study didn’t cover.

Approach

Rotten Tomatoes labels a film “Fresh” if its score is ≥ 60% and “Rotten” if it’s < 60%. If the binary label itself nudges people to buy tickets, we’d expect a discontinuous jump in box office revenue right at that cutoff, beyond what the continuous score would predict. I test this for both the Tomatometer (critic consensus) and the Audience Score, using two outcomes: log opening weekend gross and log total domestic gross.
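In equation form, the sharp-RDD specification this describes (a sketch; τ is the effect of the label, and f is a polynomial fitted with separate slopes on each side of the cutoff):

```latex
\log(Y_i) = \alpha + \tau \,\mathbf{1}[S_i \ge 60] + f(S_i - 60) + \varepsilon_i
```

where Y_i is opening weekend or total domestic gross and S_i is the film's score.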

The preferred estimates use rdrobust (Calonico, Cattaneo & Titiunik, 2014) with MSE-optimal bandwidth and a triangular kernel. I also run parametric OLS with linear and quadratic polynomials (interacted with treatment) as robustness checks. Standard errors are heteroskedasticity-robust (HC1). Controls include log budget, log opening theaters, MPAA rating dummies, and release-year fixed effects.
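To make the parametric robustness check concrete, here is a minimal sketch of the linear-polynomial-interacted-with-treatment OLS on synthetic data with a known jump of 0.5 log points (plain numpy, not the actual analysis code or the rdrobust package):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: score centered at the 60% cutoff, a known jump of
# 0.5 log points at zero, and different slopes on each side.
s = rng.uniform(-40, 40, 800)          # centered score: S - 60
fresh = (s >= 0).astype(float)         # "Fresh" indicator
y = 10.0 + 0.02 * s + 0.01 * fresh * s + 0.5 * fresh + rng.normal(0, 0.3, 800)

# OLS with a linear polynomial interacted with treatment:
# y = a + tau*fresh + b1*s + b2*(fresh*s) + e
X = np.column_stack([np.ones_like(s), fresh, s, fresh * s])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
tau = beta[1]  # estimated jump at the cutoff
print(f"estimated jump: {tau:.3f} (true 0.5)")
```

The interaction term is what lets the slope differ on each side of the cutoff, so τ is the vertical gap at zero rather than an average level difference.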

Data

I scraped three sources and merged them:

After merging and filtering, the final dataset has 621 films. Films still in theatrical release at the end of the study window are flagged and excluded from the total domestic gross analysis (since their grosses are incomplete).
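The merge-and-filter step looks roughly like this (a toy sketch with pandas; the column names are hypothetical, not the actual scraped schema):

```python
import pandas as pd

# Toy stand-ins for two of the scraped sources.
scores = pd.DataFrame({"title": ["A", "B", "C"], "score": [72, 41, 88]})
grosses = pd.DataFrame({"title": ["A", "B", "D"],
                        "opening_gross": [1.2e7, 3.4e6, 9.9e6],
                        "still_in_theaters": [False, True, False]})

# Keep only films matched across sources.
films = scores.merge(grosses, on="title", how="inner")

# Films still in release have incomplete grosses, so drop them
# from the total-domestic-gross sample only.
total_gross_sample = films[~films["still_in_theaters"]]
print(len(films), len(total_gross_sample))
```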

Score Distributions

Before diving into results, a sanity check: for an RDD to work, the density of scores should be smooth through the 60% cutoff. If a bunch of films were suspiciously clustered just above it, that would suggest some kind of manipulation. Things look clean here.
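A crude version of that check just compares counts in narrow bins on either side of the cutoff (toy data here; the formal version is a McCrary-style density test):

```python
import numpy as np

rng = np.random.default_rng(1)
scores = rng.integers(5, 100, 600)  # toy stand-in for Tomatometer scores

# Compare counts in 5-point bins just below and just above the 60% cutoff.
# Roughly equal counts suggest no bunching; a pile-up just above 60 would
# hint that scores were being nudged over the line.
below = np.sum((scores >= 55) & (scores < 60))
above = np.sum((scores >= 60) & (scores < 65))
print(below, above)
```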

[Figure: Tomatometer distribution]
[Figure: Audience Score distribution]

Results

Panel A: Critic Score (Tomatometer)

Outcome: Log Opening Weekend Gross

| Method | Controls | Coef. | Std. Err. | p-value | 95% CI | N | BW |
|---|---|---|---|---|---|---|---|
| rdrobust | No | 0.3316 | 0.7560 | 0.6610 | [-1.1502, 1.8133] | 126 | 11.90 |
| rdrobust | Yes | 0.0602 | 0.2599 | 0.8169 | [-0.4493, 0.5697] | 201 | 17.06 |
| OLS Linear | No | 0.0082 | 0.2700 | 0.9756 | [-0.5209, 0.5374] | 538 | — |
| OLS Linear | Yes | -0.0465 | 0.1333 | 0.7270 | [-0.3077, 0.2147] | 538 | — |
| OLS Quadratic | No | 0.0043 | 0.4072 | 0.9915 | [-0.7938, 0.8025] | 538 | — |
| OLS Quadratic | Yes | -0.1321 | 0.1975 | 0.5034 | [-0.5192, 0.2549] | 538 | — |

Outcome: Log Total Domestic Gross (excluding films still in theaters)

| Method | Controls | Coef. | Std. Err. | p-value | 95% CI | N | BW |
|---|---|---|---|---|---|---|---|
| rdrobust | No | -0.1265 | 0.8800 | 0.8857 | [-1.8514, 1.5983] | 108 | 10.96 |
| rdrobust | Yes | -0.2552 | 0.3773 | 0.4989 | [-0.9946, 0.4843] | 133 | 12.60 |
| OLS Linear | No | -0.1073 | 0.2996 | 0.7201 | [-0.6945, 0.4798] | 516 | — |
| OLS Linear | Yes | -0.0434 | 0.1574 | 0.7830 | [-0.3519, 0.2652] | 516 | — |
| OLS Quadratic | No | -0.2545 | 0.4507 | 0.5723 | [-1.1379, 0.6289] | 516 | — |
| OLS Quadratic | Yes | -0.1485 | 0.2332 | 0.5242 | [-0.6055, 0.3085] | 516 | — |

Panel B: Audience Score

Outcome: Log Opening Weekend Gross

| Method | Controls | Coef. | Std. Err. | p-value | 95% CI | N | BW |
|---|---|---|---|---|---|---|---|
| rdrobust | No | -0.4432 | 0.5440 | 0.4152 | [-1.5094, 0.6230] | 148 | 14.51 |
| rdrobust | Yes | -0.0641 | 0.3230 | 0.8426 | [-0.6972, 0.5689] | 148 | 14.19 |
| OLS Linear | No | 0.6049** | 0.2369 | 0.0107 | [0.1405, 1.0692] | 599 | — |
| OLS Linear | Yes | 0.0853 | 0.1371 | 0.5339 | [-0.1834, 0.3539] | 599 | — |
| OLS Quadratic | No | -0.1539 | 0.3497 | 0.6598 | [-0.8394, 0.5315] | 599 | — |
| OLS Quadratic | Yes | -0.0210 | 0.1995 | 0.9163 | [-0.4121, 0.3701] | 599 | — |

Outcome: Log Total Domestic Gross (excluding films still in theaters)

| Method | Controls | Coef. | Std. Err. | p-value | 95% CI | N | BW |
|---|---|---|---|---|---|---|---|
| rdrobust | No | -0.1114 | 0.6547 | 0.8649 | [-1.3946, 1.1717] | 143 | 14.96 |
| rdrobust | Yes | 0.1580 | 0.3543 | 0.6556 | [-0.5363, 0.8523] | 164 | 16.87 |
| OLS Linear | No | 0.8443*** | 0.2667 | 0.0015 | [0.3216, 1.3671] | 573 | — |
| OLS Linear | Yes | 0.2515 | 0.1535 | 0.1014 | [-0.0494, 0.5525] | 573 | — |
| OLS Quadratic | No | 0.0908 | 0.3962 | 0.8188 | [-0.6858, 0.8673] | 573 | — |
| OLS Quadratic | Yes | 0.1801 | 0.2280 | 0.4297 | [-0.2669, 0.6270] | 573 | — |

RDD Plots

Binned scatter plots with quadratic fits on each side of the cutoff. If the “Fresh” label were causing a jump in revenue, you’d see a visible gap at zero. Spoiler: you don’t.

[Figure: Critic - Opening Gross]
[Figure: Critic - Total Gross]
[Figure: Audience - Opening Gross]
[Figure: Audience - Total Gross]
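For reference, the binned points in plots like these are just within-bin means of the outcome; a toy sketch with synthetic data and no jump:

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.uniform(-30, 30, 500)                 # centered score: S - 60
y = 10 + 0.02 * s + rng.normal(0, 0.5, 500)   # toy log revenue, no jump

# Mean outcome within 5-point score bins; these are the dots a binned
# scatter plot displays, with a quadratic then fit on each side of zero.
edges = np.arange(-30, 35, 5)
centers = (edges[:-1] + edges[1:]) / 2
bin_means = [y[(s >= lo) & (s < hi)].mean() for lo, hi in zip(edges[:-1], edges[1:])]
print(dict(zip(centers, np.round(bin_means, 2))))
```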

What I Found

No evidence that the labels matter. In the preferred specification (rdrobust with controls), every estimate is small and statistically insignificant: 0.06 and -0.26 log points for the critic score, and -0.06 and 0.16 for the audience score, all with p-values above 0.4. The only significant estimates are the no-controls OLS audience-score coefficients, and those vanish once budget, theater count, MPAA rating, and release year are held fixed. That pattern is more consistent with well-liked films simply being bigger releases than with any causal effect of the label, and it matches Nishijima, Rodrigues & Souza's (2021) pre-COVID null result for the Tomatometer.

Notes on the tables: The preferred specification is rdrobust with controls. *, **, and *** denote significance at the 10%, 5%, and 1% levels. rdrobust reports robust bias-corrected coefficients and confidence intervals; N is the effective sample within the MSE-optimal bandwidth (BW). OLS uses the full score range. Coefficients are in log points; multiply by 100 for an approximate percentage effect (exactly, exp(β) − 1).

References:
Calonico, S., Cattaneo, M.D. & Titiunik, R. (2014). “Robust Nonparametric Confidence Intervals for Regression-Discontinuity Designs.” Econometrica, 82(6), 2295–2326.
Nishijima, M., Rodrigues, M. & Souza, T.L.D. (2021). “Is Rotten Tomatoes killing the movie industry? A regression discontinuity approach.” Applied Economics Letters, 29(13), 1187–1192.