Improving recommendation diversity with probabilistic item selection

Recommender Systems · Collaborative Filtering · Agent-Based Simulation · Mathematical Modeling

Project Overview:
With my advisor and collaborator, Kartik Hosanagar, I developed and studied what we call Probabilistic Item Selection (PI), a drop-in replacement for the “top-item” step in $k$-nearest-neighbor collaborative filtering. Instead of always recommending the single most popular item among a user's nearest neighbors, PI samples the recommendation with probability proportional to each item's popularity among those neighbors. A dynamic agent-based simulation and an empirical study on archival LastFM data (1,830 users × 522 artists) show that PI boosts diversity while preserving accuracy.
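The difference between the top-item step and PI can be illustrated with a minimal Python sketch. This is not the paper's code: the function names, the binary user-by-item matrix, and the use of cosine similarity to find neighbors are all assumptions made for illustration.

```python
import numpy as np

def neighbor_item_counts(ratings, user, k):
    """Count how many of the user's k nearest neighbors consumed each item.

    ratings is a binary user-by-item matrix (1 = consumed). Cosine
    similarity is an assumption; the similarity measure is not specified here.
    """
    ratings = np.asarray(ratings, dtype=float)
    norms = np.linalg.norm(ratings, axis=1)
    norms[norms == 0] = 1.0            # avoid division by zero for empty profiles
    sims = ratings @ ratings[user] / (norms * norms[user])
    sims[user] = -np.inf               # a user is not their own neighbor
    neighbors = np.argsort(sims)[::-1][:k]
    counts = ratings[neighbors].sum(axis=0)
    counts[ratings[user] > 0] = 0      # never recommend items already consumed
    return counts

def top_item(counts):
    """Standard step: always pick the most popular item among neighbors."""
    return int(np.argmax(counts))

def probabilistic_item(counts, rng):
    """PI step: sample an item with probability proportional to its count."""
    total = counts.sum()
    if total == 0:
        return None                    # nothing to recommend
    return int(rng.choice(len(counts), p=counts / total))
```

Under these assumptions, an item consumed by two of three neighbors is always chosen by `top_item`, whereas `probabilistic_item` picks it about two-thirds of the time and gives the less popular item the remaining third.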
Read the Research Paper
Latest draft (39 pp.) with theory, simulation, and empirical evaluation
Key Contributions & Technical Execution
🎯 Research Collaboration
Collaborated with my advisor on a mathematical proof of the limiting behavior of the probabilistic item selection algorithm, showing that it converges to the unbiased preference distribution and eliminates popularity bias under mild assumptions.
🎲 Simulation & Empirical Evaluation
Adapted my advisor's MATLAB code to run the probabilistic item selection algorithm in an agent-based simulation. Implemented collaborative filtering recommendation algorithms in Python to conduct an offline empirical test on the LastFM dataset.
☁️ Distributed Computing
Used the university's centralized grid computing infrastructure to run large-scale, dynamic MATLAB-based simulations efficiently.
📢 Research Communication
Presented findings at WITS 2017 (Seoul) and CIST 2017 (Houston), and was offered a fast-track review by an editor at the journal ACM TMIS.
Key Research Findings
Diversity ↑, Accuracy ↔
PI lowers the Gini coefficient of the item sales distribution by ~10% while matching or improving upon forecasted sales/consumption. The result is a Pareto improvement: greater recommendation diversity at no cost in accuracy.
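For readers unfamiliar with the metric, here is a minimal sketch of a Gini coefficient computed over per-item sales counts. It uses the standard rank-based formula; the function name is hypothetical and the paper's exact estimator may differ.

```python
import numpy as np

def gini(sales):
    """Gini coefficient of non-negative per-item sales counts.

    0 means sales are spread evenly across items; values near 1 mean
    sales are concentrated on a few items.
    """
    x = np.sort(np.asarray(sales, dtype=float))   # ascending order
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    return float(2 * (ranks * x).sum() / (n * x.sum()) - (n + 1) / n)
```

A lower value indicates that sales are spread more evenly across items, which is the sense in which a drop in the Gini coefficient signals higher sales diversity.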
Complementarity, Not Redundancy
Combining PI with existing diversity-increasing recommendation techniques yields a super-additive increase in sales diversity, demonstrating that PI complements existing diversity methods rather than replacing them.
Multi-Method Robustness
An analytical model proves that PI converges to true preference shares, avoiding the "rich-get-richer" loop of deterministic recommenders. Agent-based simulation and an empirical evaluation on archival real-world data confirm the theoretical results, showing that PI consistently improves diversity without sacrificing accuracy.
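The intuition behind the "rich-get-richer" loop can be seen in a toy feedback simulation. This is a deliberately simplified sketch, not the paper's analytical model or its agent-based simulation: the single shared sales history, the `accept_prob` parameter, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def simulate(true_shares, select, steps=10000, accept_prob=0.3, seed=0):
    """Toy loop: each consumer either follows the recommendation or their own taste.

    Returns the final sales share of each item.
    """
    rng = np.random.default_rng(seed)
    p = np.asarray(true_shares, dtype=float)
    counts = np.ones(len(p))               # one pseudo-sale per item to start
    for _ in range(steps):
        if rng.random() < accept_prob:
            item = select(counts, rng)      # consumer follows the recommendation
        else:
            item = rng.choice(len(p), p=p)  # consumer picks by true preference
        counts[item] += 1
    return counts / counts.sum()

def select_top(counts, rng):
    """Deterministic recommender: always the current best seller."""
    return int(np.argmax(counts))

def select_proportional(counts, rng):
    """PI-style recommender: sample in proportion to sales so far."""
    return int(rng.choice(len(counts), p=counts / counts.sum()))

true_shares = [0.5, 0.3, 0.2]
top_shares = simulate(true_shares, select_top)
pi_shares = simulate(true_shares, select_proportional)
```

In this toy setting the deterministic recommender inflates the leading item's share well above its true preference share, while proportional sampling keeps the sales shares close to `true_shares`.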
Recognition & Impact
🎙️ Conference Presentations
  • Workshop on Information Technologies and Systems (WITS) – Seoul, Korea
  • Conference on Information Systems & Technology (CIST) – Houston, TX