Improving recommendation diversity with probabilistic item selection

Recommender Systems · Collaborative Filtering · Agent-Based Simulation · Mathematical Modeling

Project Overview:

Together with my advisor and collaborator, Kartik Hosanagar, I developed and studied what we call Probabilistic Item Selection (PI), a drop-in replacement for the “top-item” step in $k$-nearest-neighbor collaborative filtering. Instead of always recommending the single most popular item among a user's nearest neighbors, PI samples recommendations with probabilities proportional to each item's popularity among those neighbors. A dynamic agent-based simulation and an empirical study on archival LastFM data (1,830 users × 522 artists) show that PI boosts diversity while preserving accuracy.
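A minimal Python sketch of the selection step, assuming binary implicit feedback (a user either listened to an artist or not), cosine similarity for finding the $k$ nearest neighbors, and neighbor counts as the popularity measure. These choices, the helper names (`neighbor_item_counts`, `top_item`, `probabilistic_item`), and the toy data are hypothetical illustrations, not the paper's implementation. The two recommenders differ only in the last step: `top_item` takes the argmax of the neighbor counts, while `probabilistic_item` draws an item with probability proportional to them.

```python
import math
import random
from collections import Counter
from typing import Dict, Optional, Set


def cosine(a: Set[str], b: Set[str]) -> float:
    """Cosine similarity between two binary (consumed / not consumed) item vectors."""
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def neighbor_item_counts(user: str, data: Dict[str, Set[str]], k: int) -> Counter:
    """For each item the user has not consumed, count how many of the k nearest neighbors consumed it."""
    others = [u for u in data if u != user]
    neighbors = sorted(others, key=lambda u: cosine(data[user], data[u]), reverse=True)[:k]
    counts: Counter = Counter()
    for u in neighbors:
        counts.update(data[u] - data[user])  # skip items the target user already has
    return counts


def top_item(counts: Counter) -> Optional[str]:
    """Deterministic baseline: recommend the single most popular neighbor item."""
    return counts.most_common(1)[0][0] if counts else None


def probabilistic_item(counts: Counter, rng: random.Random) -> Optional[str]:
    """PI-style step: sample one item with probability proportional to its neighbor count."""
    if not counts:
        return None
    items = list(counts)
    return rng.choices(items, weights=[counts[i] for i in items], k=1)[0]


# Hypothetical toy data: user -> set of consumed artists.
data = {
    "alice": {"a", "b"},
    "bob": {"a", "b", "c"},
    "carol": {"a", "b", "c", "e"},
    "dave": {"a", "d"},
    "erin": {"x"},
}
counts = neighbor_item_counts("alice", data, k=3)  # neighbors: bob, carol, dave
print(top_item(counts))  # -> c (2 of the 3 neighbors have it)
rng = random.Random(42)
print(probabilistic_item(counts, rng))  # c w.p. 1/2, d or e w.p. 1/4 each
```

Because every item with nonzero neighbor support keeps some chance of being recommended, less popular items are not shut out by a single runaway favorite.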

Read the Research Paper

Latest draft (39 pp.) with theory, simulation, and empirical evaluation

Key Contributions & Technical Execution

🎯 Research Collaboration

Collaborated with my advisor on a mathematical proof of the limiting behavior of the PI algorithm, showing that, under mild assumptions, it converges to the unbiased preference distribution and eliminates popularity bias.

🎲 Simulation & Empirical Evaluation

Adapted my advisor's MATLAB agent-based simulation code to run the PI algorithm. Implemented collaborative filtering recommendation algorithms in Python to conduct an offline empirical test on the LastFM dataset.

☁️ Distributed computing

Used the university's centralized grid computing infrastructure to efficiently run large-scale, dynamic MATLAB-based simulations.

📢 Research Communication

Presented findings at WITS 2017 (Seoul) and CIST 2017 (Houston); the work was also offered a fast-track review by an editor at the journal ACM Transactions on Management Information Systems (TMIS).

Key Research Findings

Diversity ↑, Accuracy ↔

PI reduces the Gini coefficient of the item sales distribution by ~10% while matching or improving forecasted sales/consumption: an overall Pareto improvement in recommendation diversity without sacrificing accuracy.
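Sales diversity here is summarized by the Gini coefficient of the item sales distribution. A short sketch of how it can be computed from per-item sales counts, using the standard rank-weighted formula; this is an assumption about the exact estimator, and the paper may normalize differently. `gini` and `relative_reduction` are hypothetical helper names; if the ~10% figure above is read as a relative change, it would correspond to `relative_reduction` of about 0.10 between the baseline and PI runs.

```python
from typing import Sequence


def gini(sales: Sequence[float]) -> float:
    """Gini coefficient of per-item sales: 0 = perfectly even, (n - 1) / n = all sales on one item."""
    x = sorted(sales)
    n, total = len(x), sum(x)
    if n == 0 or total <= 0 or x[0] < 0:
        raise ValueError("sales must be non-negative with a positive total")
    # Rank-weighted form with ascending ranks i = 1..n:
    # G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * xi for i, xi in enumerate(x, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def relative_reduction(g_baseline: float, g_new: float) -> float:
    """Fractional drop in Gini between two runs (0.10 means a 10% reduction)."""
    return (g_baseline - g_new) / g_baseline


print(gini([25, 25, 25, 25]))  # 0.0: every item sells equally
print(gini([0, 0, 0, 100]))    # 0.75: one item takes all sales
print(gini([10, 20, 30, 40]))  # 0.25
```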

Complementarity, Not Redundancy

Combining PI with existing diversity-increasing recommendation techniques yields a super-additive increase in sales diversity, showing that PI complements existing diversity methods rather than replacing them.

Multi-method robustness

An analytical model proves that PI converges to true preference shares, avoiding the "rich-get-richer" feedback loop of deterministic recommenders. An agent-based simulation and an empirical evaluation on archival real-world data confirm the theoretical results, showing that PI consistently improves diversity without sacrificing accuracy.
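The convergence claim can be made intuitive with a deliberately simplified, aggregate toy model (an illustrative assumption, not the paper's agent-based simulation or proof). Each period one consumer buys one item, following the recommendation with probability a (`accept`) and otherwise buying according to true preference shares p. If the recommender picks item i with probability equal to its current sales share s_i, the expected share of new sales going to i is a·s_i + (1 - a)·p_i, which equals s_i only when s = p. A top-item recommender instead hands the current leader ℓ a long-run share of a + (1 - a)·p_ℓ. The function `simulate` and all parameter values are hypothetical.

```python
import random
from typing import List, Sequence


def simulate(prefs: Sequence[float], accept: float, steps: int,
             probabilistic: bool, seed: int = 0) -> List[float]:
    """Toy aggregate feedback loop (hypothetical, not the paper's model).

    Each step one consumer buys one item: with probability `accept` they take the
    recommendation, otherwise they buy according to the true preference shares `prefs`.
    Returns each item's final share of total sales.
    """
    rng = random.Random(seed)
    items = list(range(len(prefs)))
    sales = [1] * len(prefs)  # seed every item with one sale so shares are defined
    for _ in range(steps):
        if rng.random() < accept:
            if probabilistic:
                # PI-style: recommend an item with probability proportional to its sales so far
                item = rng.choices(items, weights=sales, k=1)[0]
            else:
                # Deterministic top-item: always recommend the current best seller
                item = max(items, key=lambda i: sales[i])
        else:
            item = rng.choices(items, weights=prefs, k=1)[0]
        sales[item] += 1
    total = sum(sales)
    return [s / total for s in sales]


prefs = [0.5, 0.3, 0.2]  # hypothetical true preference shares
pi_shares = simulate(prefs, accept=0.3, steps=100_000, probabilistic=True)
top_shares = simulate(prefs, accept=0.3, steps=100_000, probabilistic=False)
print(pi_shares)   # approaches prefs
print(top_shares)  # leader inflated to roughly 0.3 + 0.7 * its preference share
```

In this toy setting the deterministic recommender locks in whichever item leads early, while proportional sampling lets consumers' own choices pull sales shares back toward true preferences.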

Recognition & Impact

🎙️ Conference Presentations

Workshop on Information Technologies & Systems (WITS) – Seoul, Korea
Conference on Information Systems & Technology (CIST) – Houston, TX