If you build on this work in your research, please cite:
@inproceedings{sigir19a,
author = {Ga Wu and Maksims Volkovs and Chee Loong Soon and Scott Sanner and Himanshu Rai},
title = {Noise Contrastive Estimation for One-Class Collaborative Filtering},
booktitle = {Proceedings of the 42nd International {ACM} {SIGIR} Conference on Research and Development in Information Retrieval {(SIGIR-19)}},
address = {Paris, France},
year = {2019}
}
- Noise Contrastive Estimation Projected Linear Recommender (NCE-PLRec)
- Variational Autoencoder for Collaborative Filtering (VAE-CF)
- Collaborative Metric Learning (CML)
- Auto-encoder Recommender (AutoRec)
- Collaborative Denoising Auto-Encoders (CDAE)
- Weighted Regularized Matrix Factorization (WRMF)
- Pure SVD Recommender (PureSVD)
- Bayesian Personalized Ranking (BPR)
- Noise Contrastive Estimation through SVD (NCE-SVD)
- Popularity
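As a quick illustration of the simplest entry in this list, PureSVD scores items by a low-rank truncated SVD reconstruction of the user-item interaction matrix. The sketch below is a minimal stand-alone version, not the repository's implementation:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import svds

def puresvd_scores(R, rank=2):
    """Score all items for all users via a rank-`rank` truncated SVD
    of the implicit-feedback matrix R (users x items)."""
    U, s, Vt = svds(R.astype(np.float64), k=rank)
    # Low-rank reconstruction approximates the preference matrix;
    # each row holds that user's scores over all items.
    return (U * s) @ Vt

# Toy 4-user x 5-item implicit matrix.
R = csr_matrix(np.array([
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
], dtype=np.float64))
scores = puresvd_scores(R, rank=2)
print(scores.shape)  # (4, 5)
```

Recommendation then amounts to ranking each user's row of `scores`, masking out already-observed items.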
- Movielens 1M
- Movielens 20M
- Yahoo 1R
- Netflix
- Amazon Digital Music (2018)
- Amazon Video Games (2018)
Experiment results for the validation datasets are listed in the tables
folder. The experiments were conducted on a Compute Canada cluster, with multiple runs over all valid hyper-parameter combinations. Please use those results as a benchmark reference if needed.
The data is not suitable for hosting on GitHub, so please prepare it yourself. Each dataset should be a NumPy npy file dumped directly from a csr sparse matrix. This should be straightforward.
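A minimal sketch of producing such a file (the file name `Rtrain.npy` here is only an example, not a name the scripts require):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy user-item implicit matrix.
R = csr_matrix(np.array([[1, 0, 1],
                         [0, 1, 0]], dtype=np.float32))

# Dump the csr matrix object directly into an .npy file.
# np.save pickles the object; np.load therefore needs allow_pickle=True.
np.save('Rtrain.npy', R)

# Loading it back: .item() unwraps the 0-d object array.
R_loaded = np.load('Rtrain.npy', allow_pickle=True).item()
print(np.allclose(R_loaded.toarray(), R.toarray()))  # True
```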
The algorithms above can be split into two major categories based on their distance
measure: Euclidean or cosine. CML is a Euclidean distance recommender, while ALS
is a typical cosine distance recommender. When running an evaluation, please select
the similarity measure first, e.g. with --similarity Euclidean
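The distinction matters because the two measures can rank items differently: Euclidean recommenders prefer the smallest distance, cosine recommenders the largest angle-based similarity. A small self-contained sketch (illustrative only, not the repository's evaluation code):

```python
import numpy as np

def rank_items(query, item_embeddings, measure='Cosine'):
    """Return item indices ranked best-first under the chosen measure."""
    if measure == 'Euclidean':
        # Smaller distance = better match.
        d = np.linalg.norm(item_embeddings - query, axis=1)
        return np.argsort(d)
    # Larger cosine similarity = better match.
    norms = np.linalg.norm(item_embeddings, axis=1) * np.linalg.norm(query)
    cos = item_embeddings @ query / norms
    return np.argsort(-cos)

items = np.array([[1.0, 0.0], [0.9, 0.5], [3.0, 0.2]])
q = np.array([1.0, 0.0])
print(rank_items(q, items, 'Euclidean'))  # [0 1 2]
print(rank_items(q, items, 'Cosine'))     # [0 2 1]
```

Note how item 2 is far from the query in Euclidean terms but points in almost the same direction, so the two measures disagree on its rank.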
python main.py -d datax/ -m VAE-CF -i 200 -l 0.0000001 -r 100
Split the data following the experiment setting, and tune hyper-parameters based on the YAML files in the config
folder.
Please note that reproduce_paper_results.py
will load the pretrained model for Movielens 1M from the latent
folder if that folder is not given as a parameter.
python getmovielens.py --implicit -r 0.5,0.2,0.3 -d datax/ -n ml-1m/ratings.csv
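Here `-r 0.5,0.2,0.3` requests a 50/20/30 train/validation/test split of the interactions. As a rough sketch of what such a ratio split can look like, assuming a simple random partition over nonzero entries (the script's actual splitting strategy may differ):

```python
import numpy as np
from scipy.sparse import csr_matrix

def ratio_split(R, ratios=(0.5, 0.2, 0.3), seed=0):
    """Randomly partition the nonzero entries of a csr matrix into
    train/valid/test matrices of the same shape, per `ratios`."""
    rng = np.random.default_rng(seed)
    rows, cols = R.nonzero()
    data = np.asarray(R[rows, cols]).ravel()
    n = len(rows)
    order = rng.permutation(n)
    n_train = int(ratios[0] * n)
    n_valid = int(ratios[1] * n)
    bucket = np.empty(n, dtype=int)
    bucket[order[:n_train]] = 0                    # train
    bucket[order[n_train:n_train + n_valid]] = 1   # valid
    bucket[order[n_train + n_valid:]] = 2          # test
    return [csr_matrix((data[bucket == b],
                        (rows[bucket == b], cols[bucket == b])),
                       shape=R.shape) for b in range(3)]

# Toy matrix with 10 distinct nonzero interactions.
R = csr_matrix((np.ones(10), (np.arange(10) % 4, np.arange(10) % 5)),
               shape=(4, 5))
train, valid, test = ratio_split(R)
print(train.nnz, valid.nnz, test.nnz)  # 5 2 3
```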
python tune_parameters.py -d datax/ -n movielens1m/autorec.csv -y config/autorec.yml -gpu
python tune_parameters.py -d datax/ -n movielens1m/bpr.csv -y config/bpr.yml -gpu
python tune_parameters.py -d datax/ -n movielens1m/cdae.csv -y config/cdae.yml -gpu
python tune_parameters.py -d datax/ -n movielens1m/cml.csv -y config/cml.yml -gpu
python tune_parameters.py -d datax/ -n movielens1m/vae.csv -y config/vae.yml -gpu
python tune_parameters.py -d datax/ -n movielens1m/wrmf.csv -y config/wrmf.yml -gpu
python tune_parameters.py -d datax/ -n movielens1m/puresvd.csv -y config/puresvd.yml -gpu
python tune_parameters.py -d datax/ -n movielens1m/nceplrec.csv -y config/nceplrec.yml -gpu
python tune_parameters.py -d datax/ -n movielens1m/plrec.csv -y config/plrec.yml -gpu
Please check out the cluster_bash
folder for further command details.
Re-split the data into two datasets: one for training and one for testing. Note that this training set includes the validation set from the previous split.
python getmovielens.py --implicit -r 0.7,0.3,0.0 -d datax/ -n ml-1m/ratings.csv
python reproduce_paper_results.py -p tables/movielens1m -d datax/ -v Rvalid.npz -n movielens1m_test_result.csv -gpu
python reproduce_paper_results.py