An investigation of $p$-hacking in e-commerce A/B testing
Mixture Models
Expectation–Maximization
Monte Carlo Simulation
Research Publication
Project Overview:
I designed and executed a study of 2,270 e-commerce A/B tests run by 242 firms to ask a simple but high-stakes question: do practitioners $p$-hack? Answering it effectively required developing a new statistical tool, the asymmetric caliper test, which is a Pareto improvement over off-the-shelf techniques for investigating this question. I used Python to implement the new test and to fit a beta-uniform mixture model via expectation-maximization. The final result is a large-scale, peer-reviewed study suggesting that $p$-hacking may be less common in real-world retail experimentation than previous evidence indicates.
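To give a flavor of the mixture-model step, here is a minimal sketch of expectation-maximization for a beta-uniform mixture of $p$-values. It is illustrative only and is not the code from the paper: it assumes the common parameterization in which true nulls are Uniform(0, 1) and the remaining $p$-values follow a Beta(a, 1) density, and the function name `fit_beta_uniform_em` and its arguments are hypothetical.

```python
import numpy as np


def fit_beta_uniform_em(pvals, n_iter=500, tol=1e-8):
    """Fit f(p) = w + (1 - w) * a * p**(a - 1) by EM (illustrative sketch).

    Assumed parameterization: w is the weight of the Uniform(0, 1) component,
    and a is the shape of the Beta(a, 1) component.
    """
    # Guard against log(0) for p-values that are exactly zero.
    p = np.clip(np.asarray(pvals, dtype=float), 1e-300, 1.0)
    log_p = np.log(p)
    w, a = 0.5, 0.5  # arbitrary starting values

    for _ in range(n_iter):
        # E-step: posterior probability that each p-value came from the beta part.
        beta_dens = a * p ** (a - 1.0)
        resp = (1.0 - w) * beta_dens / (w + (1.0 - w) * beta_dens)

        # M-step: both updates are closed-form for this parameterization.
        new_w = 1.0 - resp.mean()
        new_a = -resp.sum() / (resp * log_p).sum()  # weighted MLE of Beta(a, 1)

        converged = abs(new_w - w) + abs(new_a - a) < tol
        w, a = new_w, new_a
        if converged:
            break

    return w, a
```

Calling `fit_beta_uniform_em(pvals)` on an array of $p$-values returns the estimated uniform weight and beta shape. The closed-form M-step is what makes this parameterization convenient; a more general beta component would need a numerical optimizer in that step.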

Read the article in Information Systems Research
Journal version, published online Jan 2025
Access the publication on INFORMS.org (.html)
Read the full manuscript with supplemental appendix here (.pdf)
Key Contributions & Technical Execution
🎯 Research Leadership
Owned the research agenda end-to-end: framed the question, gathered and processed raw data, and secured funding from two research centers.
🧮 Method Innovation
Developed the asymmetric caliper test, a purpose-built density–discontinuity test, and implemented an EM algorithm to estimate it.
🎲 Simulation & Power Analysis
Ran 100k+ Monte Carlo counterfactuals to show the new test detects even modest $p$-hacking (~3% of experiments) at the current sample size.
📢 Research Communication
Published in ISR (UTD24 & FT50 top outlet); presented findings at top academic and practitioner research conferences.
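The idea behind the Monte Carlo power analysis can be sketched as follows. This is a simplified illustration, not the simulation design from the paper: it uses a standard (symmetric) caliper test rather than the asymmetric one, assumes honest $p$-values are uniform, and assumes a hypothetical hacking mechanism in which a hacker with a $p$-value in [0.05, 0.10) reports one just below 0.05. The names `caliper_pvalue` and `simulate_power` are made up for this example.

```python
from math import comb

import numpy as np


def caliper_pvalue(pvals, threshold=0.05, width=0.005):
    """One-sided binomial p-value for excess mass just below the threshold."""
    below = int(np.sum((pvals >= threshold - width) & (pvals < threshold)))
    above = int(np.sum((pvals >= threshold) & (pvals < threshold + width)))
    n = below + above
    if n == 0:
        return 1.0
    # P(X >= below) for X ~ Binomial(n, 1/2), computed exactly with integers.
    return sum(comb(n, j) for j in range(below, n + 1)) / 2 ** n


def simulate_power(hack_rate, n_tests=2270, n_sims=1000, alpha=0.05,
                   threshold=0.05, width=0.005, seed=0):
    """Share of simulated datasets in which the caliper test rejects."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(n_sims):
        # Assumption: honest p-values are Uniform(0, 1).
        p = rng.uniform(size=n_tests)
        # Assumption: a random share of experiments is run by "hackers", who
        # move any p-value in [threshold, 2 * threshold) to just below it.
        hacker = rng.uniform(size=n_tests) < hack_rate
        movable = hacker & (p >= threshold) & (p < 2 * threshold)
        p[movable] = rng.uniform(threshold - width, threshold,
                                 size=int(movable.sum()))
        if caliper_pvalue(p, threshold, width) < alpha:
            rejections += 1
    return rejections / n_sims
```

Comparing `simulate_power(0.0)` with `simulate_power(0.03)` gives the false-positive rate and the power at a 3% hacking rate under these toy assumptions. The numbers will differ from those in the paper, which relies on the asymmetric test and a more realistic data-generating process.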
Key Research Findings
Null evidence of $p$-hacking
No discontinuity appears at the platform's default 95% significance threshold, either in target-metric $p$-values or in 16k+ $p$-values across eight dashboard metrics.
Novel purpose-built statistical test
The asymmetric caliper test introduced in this paper controls false positives better than a standard caliper test and has significantly higher power than off-the-shelf state-of-the-art techniques like rddensity for the problem at hand.
Results reinforce the importance of culture & incentives
Discussion in the paper highlights how organizational incentives and implementation experience may matter more for sound statistical testing than statistical expertise alone.
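For readers unfamiliar with the baseline mentioned above, a standard caliper test can be sketched in a few lines: count the $p$-values in a narrow window just below and just above the threshold, then ask whether the split is more lopsided than a fair coin would produce. This sketch shows only that baseline, not the asymmetric caliper test from the paper. The function name `caliper_test` and the default window width of 0.005 are assumptions for illustration.

```python
from math import comb


def caliper_test(pvals, threshold=0.05, width=0.005):
    """Standard caliper test (baseline sketch, not the asymmetric variant).

    Returns (count_below, count_above, one_sided_p_value), where the p-value
    is P(X >= count_below) for X ~ Binomial(count_below + count_above, 1/2).
    """
    below = sum(1 for p in pvals if threshold - width <= p < threshold)
    above = sum(1 for p in pvals if threshold <= p < threshold + width)
    n = below + above
    if n == 0:
        # No observations inside the caliper: nothing to test.
        return below, above, 1.0
    tail = sum(comb(n, j) for j in range(below, n + 1))
    return below, above, tail / 2 ** n


# Toy input (made-up numbers): 30 p-values just below 0.05, 10 just above.
below, above, p_value = caliper_test([0.049] * 30 + [0.051] * 10)
print(below, above, p_value)
```

A small `p_value` here indicates bunching just below the threshold, which is the signature this family of tests looks for.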
Recognition & Impact
📔 Journal Publication
Information Systems Research, Articles-in-Advance (Jan 2025)
Top-tier Information Systems journal
🎙️ Conference Presentations
- Conference on Information Systems & Technology – Seattle, WA
- Workshop on Information Systems & Economics – San Francisco, CA
- Conference on Digital Experimentation (CODE@MIT) – Cambridge, MA