Testing for underpowered literatures (Job Market Paper)
Scheduled to be presented at the BITSS Annual Meeting, March 2024
How many experimental studies would have reached different conclusions had they been run on larger samples? I show how to estimate the expected number of statistically significant results that a set of experiments would have reported had their sample sizes all been counterfactually increased by a chosen factor. The estimator is consistent and asymptotically normal. Unlike existing methods, my approach requires no assumptions about the distribution of true intervention effects beyond continuity, and it adjusts for publication bias in the reported t-scores. An application to randomized controlled trials (RCTs) published in top economics journals finds that doubling every experiment's sample size would increase the power of two-sided t-tests by only 7.8 percentage points on average. This increase is small and comparable to that found for systematic replication projects in laboratory psychology, where earlier studies enabled accurate ex-ante power calculations; both are smaller than the corresponding increase for non-RCTs. This comparison suggests that RCTs are, on average, relatively insensitive to sample size increases. The policy implication is that grant-makers should generally fund more experiments rather than fewer, larger ones.
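The mechanics behind the counterfactual can be illustrated with a minimal sketch. Because a t-score's noncentrality grows with the square root of the sample size, scaling every experiment's n by a factor f scales the true t-score by sqrt(f). The function below (a hypothetical name, not the paper's estimator, which additionally handles the unknown distribution of true effects and publication bias) computes the power of a two-sided test for a known true t-score under such a rescaling:

```python
import math

def _phi(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def counterfactual_power(t_true, factor, crit=1.96):
    """Power of a two-sided z-test at the 5% level when the sample size
    is multiplied by `factor`. The true t-score (noncentrality) scales
    with sqrt(n), so it is multiplied by sqrt(factor)."""
    nc = t_true * math.sqrt(factor)
    # P(reject) = P(Z > crit - nc) + P(Z < -crit - nc)
    return (1.0 - _phi(crit - nc)) + _phi(-crit - nc)
```

For example, a study whose true t-score sits exactly at the critical value has power of roughly one half, and doubling its sample size (`factor=2`) raises that power; averaging such gains over a literature is the quantity the paper estimates.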
We study the problem of estimating the average causal effect of treating every member of a population, rather than none, using an experiment that treats only some. This is the policy-relevant estimand when deciding whether to scale up an intervention based on the results of an RCT, for example, but it differs from the usual average treatment effect in the presence of spillovers. We derive the optimal rate of convergence to the average global effect over all estimators that are linear in the outcomes and all cluster-randomized designs, and we provide estimators and experimental designs that achieve this rate. We also provide an optimized weighting approach that minimizes mean squared error when a linearity assumption holds while remaining consistent and rate-optimal when it does not, as well as methods for inference. Arxiv
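To fix ideas on the estimand, a minimal baseline (not the paper's rate-optimal estimator, and with a hypothetical function name) is the difference in means under cluster randomization: if spillovers stay within clusters, units in a fully treated cluster experience "everyone treated" and units in a control cluster experience "no one treated," so contrasting the two groups targets the average global effect.

```python
def cluster_diff_in_means(outcomes, cluster_ids, treated_clusters):
    """Baseline estimate of the average global effect under cluster
    randomization: mean outcome across units in fully treated clusters
    minus mean outcome across units in control clusters. Valid as a
    simple benchmark when spillovers are contained within clusters."""
    treated = [y for y, g in zip(outcomes, cluster_ids) if g in treated_clusters]
    control = [y for y, g in zip(outcomes, cluster_ids) if g not in treated_clusters]
    return sum(treated) / len(treated) - sum(control) / len(control)

# Toy usage: two clusters, cluster "a" assigned to treatment.
estimate = cluster_diff_in_means(
    outcomes=[1.0, 1.2, 0.1, -0.1],
    cluster_ids=["a", "a", "b", "b"],
    treated_clusters={"a"},
)
```

The paper's contribution is to go beyond this benchmark: characterizing the best achievable convergence rate over all linear estimators and cluster-randomized designs, and constructing estimators and designs that attain it.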
Social Effects, Spillovers, and Scale-up of Teacher Training in Uganda: an RCT (with Vesall Nourani, Moustafa El-Kashlan, and Sara Tamayo)
While nearly half of Ugandan schoolchildren enter secondary school, fewer than 10% complete it. Low teaching quality may be a factor. We study the effects and spillovers of training secondary school teachers in rural Uganda with an RCT. Teachers were randomly assigned to an innovative training program run by Kimanya-Ngeyo beginning in November 2021, and training is ongoing in waves. Our design allows us to study teacher-to-teacher spillovers over time: half of treated schools were randomly assigned to treat teachers in "cliques," in which the treated teachers know each other well, and the other half to treat teachers in "anti-cliques," in which the treated teachers do not know each other well. AEA Registration here.