An investigation of $p$-hacking in e-commerce A/B testing
Mixture Models
Expectation–Maximization
Monte Carlo Simulation
Research Publication
Project Overview
I designed and executed a study of 2,270 e-commerce A/B tests run by 242 firms to ask a simple but high-stakes question: do practitioners $p$-hack? Answering it required developing a new statistical tool, the asymmetric caliper test, which is a Pareto improvement over off-the-shelf techniques for this question. I implemented the new test in Python and fit a beta-uniform mixture model using expectation-maximization. The result is a large-scale, peer-reviewed study suggesting that $p$-hacking may be less common in real-world retail experimentation than previous evidence indicates.
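
To make the mixture-fitting step concrete, here is a minimal sketch of expectation-maximization for a beta-uniform mixture of $p$-values. It is an illustration under stated assumptions, not the paper's code: the beta component is assumed to be Beta(a, 1), which keeps the M-step in closed form, and the function name `fit_beta_uniform_em`, the starting values, and the simulated parameters are all hypothetical.

```python
import math
import random


def fit_beta_uniform_em(pvals, n_iter=300, tol=1e-9):
    """EM for f(p) = pi_u + (1 - pi_u) * a * p**(a - 1) on p-values in (0, 1].

    Assumption: the beta component is Beta(a, 1), so the M-step is closed form.
    Returns (pi_u, a): the uniform weight and the beta shape parameter.
    """
    log_p = [math.log(p) for p in pvals]
    pi_u, a = 0.5, 0.5  # arbitrary starting values
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibility of the beta component for each p-value.
        resp, ll = [], 0.0
        for lp in log_p:
            beta_part = (1.0 - pi_u) * a * math.exp((a - 1.0) * lp)
            total = pi_u + beta_part
            resp.append(beta_part / total)
            ll += math.log(total)
        # M-step: weighted maximum-likelihood updates.
        w = sum(resp)
        pi_u = 1.0 - w / len(log_p)
        a = -w / sum(r * lp for r, lp in zip(resp, log_p))
        # Stop once the log-likelihood has effectively stopped changing.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi_u, a


# Simulated check with made-up parameters (not values from the study).
rng = random.Random(0)
true_pi_u, true_a = 0.6, 0.3
sample = []
for _ in range(10_000):
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    # If U ~ Uniform(0, 1), then U**(1/a) ~ Beta(a, 1).
    sample.append(u if rng.random() < true_pi_u else u ** (1.0 / true_a))

pi_u_hat, a_hat = fit_beta_uniform_em(sample)
print(f"pi_u = {pi_u_hat:.3f}, a = {a_hat:.3f}")
```

With a general Beta(a, b) component the shape update has no closed form and would need a numerical optimizer inside the M-step; the one-parameter version is used here only to keep the sketch short.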

Read the article in Information Systems Research
Journal version, published online Jan 2025
Key Contributions & Technical Execution
🎯 Research Leadership
Owned the research agenda end-to-end: framed the question, gathered and processed raw data, and secured funding from two research centers.
🧮 Method Innovation
Developed the asymmetric caliper test, a purpose-built density-discontinuity test, and implemented an EM algorithm to fit the accompanying beta-uniform mixture model.
🎲 Simulation & Power Analysis
Ran 100k+ Monte Carlo counterfactual simulations to show that the new test detects even modest $p$-hacking (~3% of experiments) at the current sample size.
📢 Research Communication
Published in ISR (a UTD24 and FT50 journal); presented findings at top academic and practitioner research conferences.
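
The simulation-based power analysis mentioned above can be sketched roughly as follows. This is a toy version under loud assumptions, not the study's procedure: it uses a standard symmetric caliper test (a one-sided binomial test on counts just below versus just above 0.05) rather than the asymmetric caliper test, honest $p$-values are assumed uniform, a hacked experiment is assumed to report a $p$-value just under the threshold, and it runs 200 simulations per scenario instead of 100k+. The caliper width and all function names are hypothetical.

```python
import math
import random


def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def caliper_pvalue(pvals, threshold=0.05, width=0.01):
    """Standard caliper test: is there excess mass just below the threshold?"""
    below = sum(threshold - width <= p < threshold for p in pvals)
    above = sum(threshold <= p < threshold + width for p in pvals)
    n = below + above
    if n == 0:
        return 1.0  # no observations in the window, nothing to test
    return binom_upper_tail(below, n)


def simulate_pvalues(n_tests, hack_rate, rng, threshold=0.05, width=0.01):
    """Assumed data-generating process: hacked tests land just below the threshold."""
    out = []
    for _ in range(n_tests):
        if rng.random() < hack_rate:
            out.append(rng.uniform(threshold - width, threshold))
        else:
            out.append(rng.random())  # honest p-value, assumed uniform
    return out


def rejection_rate(n_tests, hack_rate, n_sims=200, alpha=0.05, seed=0):
    """Share of simulated datasets in which the caliper test rejects at level alpha."""
    rng = random.Random(seed)
    rejections = sum(
        caliper_pvalue(simulate_pvalues(n_tests, hack_rate, rng)) < alpha
        for _ in range(n_sims)
    )
    return rejections / n_sims


size = rejection_rate(2270, hack_rate=0.0)    # false-positive rate with no hacking
power = rejection_rate(2270, hack_rate=0.03)  # detection rate with 3% hacked tests
print(f"size = {size:.3f}, power = {power:.3f}")
```

Running the same loop with `hack_rate=0.0` and with a positive rate gives the two quantities a power analysis compares: how often the test raises a false alarm and how often it catches real manipulation at a given sample size.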
Key Research Findings
Null evidence of $p$-hacking
No discontinuity at the platform's default 95% significance threshold in target-metric $p$-values, or across 16k+ $p$-values on eight dashboard metrics.
Novel purpose-built statistical test
The asymmetric caliper test introduced in this paper controls false positives better than a standard caliper test and has significantly higher power than off-the-shelf, state-of-the-art techniques such as rddensity for the problem at hand.
Results reinforce the importance of culture & incentives
The paper's discussion highlights that organizational incentives and implementation experience may matter more for sound statistical testing than statistical expertise alone.
Recognition & Impact
📔 Journal Publication
Information Systems Research, Articles-in-Advance (Jan 2025)
Top-tier Information Systems journal
🎙️ Conference Presentations
- Conference on Information Systems & Technology – Seattle, WA
- Workshop on Information Systems & Economics – San Francisco, CA
- Conference on Digital Experimentation (CODE@MIT) – Cambridge, MA