viberec/ML-100k-SASRec-GRPO

SASRec - ml-100k (Fine-tuned with GRPO)

Model Description

This model is fine-tuned from viberec/ML-100k-SASRec using Group Relative Policy Optimization (GRPO). The objective was to improve serendipity, measured here by a higher tail percentage and a lower average popularity of the recommended items, while maintaining ranking accuracy (NDCG).
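As a rough illustration of the "group relative" part of GRPO, the sketch below normalizes the rewards of one group of sampled recommendation lists by that group's mean and standard deviation. The function name `group_relative_advantages` and the example rewards are hypothetical; the training code behind this model is not shown in this card and may differ in detail.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one sampled group (GRPO-style advantage).

    Each advantage is (reward - group mean) / (group std + eps), so every
    sample is scored relative to its own group instead of a learned critic.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for a group of 4 sampled recommendation lists.
example_rewards = [0.2, 0.5, 0.1, 0.4]
advantages = group_relative_advantages(example_rewards)
```

Samples that beat the group average get a positive advantage and the rest get a negative one, so a larger group (16 in this card) gives a less noisy baseline per update.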

Training Results

Baseline (Before Fine-tuning)

ndcg@10: 0.0514
hit@10: 0.114
averagepopularity@10: 235.6415
giniindex@10: 0.9189
itemcoverage@10: 0.2124
shannonentropy@10: 0.0218
tailpercentage@10: 0.0004
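To make the two serendipity-related metrics more concrete, here is a minimal sketch of how average popularity and tail percentage at a cutoff k could be computed from top-k recommendation lists. It assumes that popularity is an item's interaction count in the training data and that a "tail" item is one whose count is at or below a chosen threshold. The helper names, the threshold, and the toy data are hypothetical, and the exact definitions used to produce the numbers in this card are not stated.

```python
from collections import Counter


def popularity_counts(train_interactions):
    """Count how often each item appears in (user, item) training pairs."""
    return Counter(item for _, item in train_interactions)


def average_popularity_at_k(rec_lists, pop, k=10):
    """Mean training-set popularity over all items in the top-k lists."""
    items = [item for recs in rec_lists for item in recs[:k]]
    return sum(pop[item] for item in items) / len(items)


def tail_percentage_at_k(rec_lists, pop, tail_threshold, k=10):
    """Fraction of top-k items whose popularity is <= tail_threshold (assumed tail rule)."""
    items = [item for recs in rec_lists for item in recs[:k]]
    tail = sum(1 for item in items if pop[item] <= tail_threshold)
    return tail / len(items)


# Toy data: item "a" has 3 interactions, "b" has 2, "c" has 1.
train = [(1, "a"), (2, "a"), (3, "a"), (1, "b"), (2, "b"), (3, "c")]
pop = popularity_counts(train)
recs = [["a", "b"], ["a", "c"]]  # top-2 lists for two users

avg_pop = average_popularity_at_k(recs, pop, k=2)
tail_pct = tail_percentage_at_k(recs, pop, tail_threshold=1, k=2)
```

Under these assumptions, lower average popularity and higher tail percentage both indicate that the lists draw more on rarely seen items.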

Best Valid Results (GRPO)

ndcg@10: 0.0539
hit@10: 0.1171
averagepopularity@10: 217.3145
giniindex@10: 0.9117
itemcoverage@10: 0.2114
shannonentropy@10: 0.0222
tailpercentage@10: 0.0

Test Results (GRPO)

ndcg@10: 0.049
hit@10: 0.1044
averagepopularity@10: 174.0842
giniindex@10: 0.9072
itemcoverage@10: 0.2665
shannonentropy@10: 0.0179
tailpercentage@10: 0.0005
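The relative change of each metric between the baseline and the GRPO test results can be worked out with a few lines of arithmetic. The sketch below uses only the numbers listed in this card; the variable and function names are hypothetical. The card does not state which split the baseline was evaluated on, so the comparison is indicative rather than strictly like-for-like.

```python
# Metric values copied from the lists in this card.
baseline = {
    "ndcg@10": 0.0514,
    "hit@10": 0.114,
    "averagepopularity@10": 235.6415,
    "giniindex@10": 0.9189,
    "itemcoverage@10": 0.2124,
    "shannonentropy@10": 0.0218,
    "tailpercentage@10": 0.0004,
}
grpo_test = {
    "ndcg@10": 0.049,
    "hit@10": 0.1044,
    "averagepopularity@10": 174.0842,
    "giniindex@10": 0.9072,
    "itemcoverage@10": 0.2665,
    "shannonentropy@10": 0.0179,
    "tailpercentage@10": 0.0005,
}


def relative_change(before, after):
    """Signed relative change, e.g. -0.05 means a 5% decrease."""
    return (after - before) / before


changes = {name: relative_change(baseline[name], grpo_test[name]) for name in baseline}

for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
```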

RL Hyperparameters

Alpha: 0.1 (Weight for Useful Reward vs Unexpected Reward)
KL Beta: 0.5
Group Size: 16
Learning Rate: 5e-05
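The description of Alpha suggests that the reward blends a usefulness term with an unexpectedness term. One plausible reading, shown here purely as an assumption, is a convex combination in which Alpha weights the useful term and the remainder weights the unexpected term. The function `combined_reward` and both component scores are hypothetical; the card does not state the actual formula or the direction of the weighting.

```python
def combined_reward(useful, unexpected, alpha=0.1):
    """Blend two reward terms.

    ASSUMPTION: alpha weights the useful term and (1 - alpha) weights the
    unexpected term. The real training code may define this differently.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * useful + (1.0 - alpha) * unexpected


# Hypothetical component scores for one sampled recommendation list.
reward = combined_reward(useful=1.0, unexpected=0.5)
```

Under this reading, Alpha = 0.1 would put most of the weight on the unexpectedness term, which would be consistent with the drop in average popularity reported above.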

Usage
