SASRec - ml-100k (Finetuned with GRPO)
Model Description
This model is fine-tuned from viberec/ML-100k-SASRec using Group Relative Policy Optimization (GRPO). The objective was to improve serendipity, measured by a higher tail percentage and a lower average popularity of the recommended items, while maintaining ranking accuracy (NDCG).
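As a rough illustration of the two serendipity-oriented metrics named above, the sketch below computes an average popularity@k and a tail percentage@k on toy data. The function names, the `tail_ratio` parameter, and the definition of "tail" (the least-popular 10% of items by training interaction count) are assumptions made for illustration; the evaluation code used for this model may define them differently.

```python
def average_popularity_at_k(recs, train_counts):
    """Mean training interaction count of recommended items, averaged over users."""
    per_user = [
        sum(train_counts.get(item, 0) for item in items) / len(items)
        for items in recs
    ]
    return sum(per_user) / len(per_user)


def tail_percentage_at_k(recs, train_counts, tail_ratio=0.1):
    """Share of recommended items that belong to the long tail.

    ASSUMPTION: the tail is the least-popular `tail_ratio` fraction of the
    catalogue, ranked by training interaction count.
    """
    ranked = sorted(train_counts, key=train_counts.get)  # least popular first
    tail = set(ranked[: int(len(ranked) * tail_ratio)])
    per_user = [sum(item in tail for item in items) / len(items) for items in recs]
    return sum(per_user) / len(per_user)


# Toy catalogue: item "i0" has 1 interaction, ..., item "i9" has 10.
train_counts = {f"i{n}": n + 1 for n in range(10)}
# Top-2 recommendation lists for two hypothetical users.
recs = [["i9", "i8"], ["i0", "i9"]]

avg_pop = average_popularity_at_k(recs, train_counts)
tail_pct = tail_percentage_at_k(recs, train_counts)
print(avg_pop, tail_pct)
```

Under these definitions, a lower average popularity and a higher tail percentage both indicate recommendations that lean less on the most-interacted items.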
Training Results
Baseline (Before Fine-tuning)
- ndcg@10: 0.0514
- hit@10: 0.114
- averagepopularity@10: 235.6415
- giniindex@10: 0.9189
- itemcoverage@10: 0.2124
- shannonentropy@10: 0.0218
- tailpercentage@10: 0.0004
Best Validation Results (GRPO)
- ndcg@10: 0.0539
- hit@10: 0.1171
- averagepopularity@10: 217.3145
- giniindex@10: 0.9117
- itemcoverage@10: 0.2114
- shannonentropy@10: 0.0222
- tailpercentage@10: 0.0
Test Results (GRPO)
- ndcg@10: 0.049
- hit@10: 0.1044
- averagepopularity@10: 174.0842
- giniindex@10: 0.9072
- itemcoverage@10: 0.2665
- shannonentropy@10: 0.0179
- tailpercentage@10: 0.0005
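To make the baseline and GRPO test numbers easier to compare, the following sketch computes the relative change of each metric. The values are copied from the lists above; the helper name `relative_change` is hypothetical. This card does not state which split the baseline was measured on, so the comparison assumes the two sets of numbers are comparable.

```python
# Metric values copied from the "Baseline" and "Test Results (GRPO)" lists.
baseline = {
    "ndcg@10": 0.0514,
    "hit@10": 0.114,
    "averagepopularity@10": 235.6415,
    "giniindex@10": 0.9189,
    "itemcoverage@10": 0.2124,
    "shannonentropy@10": 0.0218,
    "tailpercentage@10": 0.0004,
}
grpo_test = {
    "ndcg@10": 0.049,
    "hit@10": 0.1044,
    "averagepopularity@10": 174.0842,
    "giniindex@10": 0.9072,
    "itemcoverage@10": 0.2665,
    "shannonentropy@10": 0.0179,
    "tailpercentage@10": 0.0005,
}


def relative_change(before, after):
    """Return (after - before) / before for every metric in `before`."""
    return {name: (after[name] - before[name]) / before[name] for name in before}


changes = relative_change(baseline, grpo_test)
for name, delta in changes.items():
    print(f"{name}: {delta:+.1%}")
```

With these numbers, average popularity falls and item coverage rises, while NDCG@10 and Hit@10 are somewhat lower than the baseline.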
RL Hyperparameters
- Alpha: 0.1 (weight balancing the useful reward against the unexpected reward)
- KL Beta: 0.5
- Group Size: 16
- Learning Rate: 5e-05
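The hyperparameters above can be read against a minimal sketch of a GRPO-style update signal. This is not the training code for this model: the function names are hypothetical, the direction of the alpha blend (alpha on the useful term, 1 - alpha on the unexpected term) is an assumption, and the objective is simplified (no probability-ratio clipping). It only shows where alpha, the group size, and the KL beta would typically enter.

```python
import numpy as np

ALPHA = 0.1      # reward blend weight (from the list above)
KL_BETA = 0.5    # KL penalty weight (from the list above)
GROUP_SIZE = 16  # sampled candidates per group (from the list above)


def blended_reward(useful, unexpected, alpha=ALPHA):
    # ASSUMPTION: alpha weights the useful term; the card does not say which
    # of the two rewards alpha is applied to.
    return alpha * useful + (1.0 - alpha) * unexpected


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group: subtract the mean, divide by the std."""
    rewards = np.asarray(rewards, dtype=np.float64)
    return (rewards - rewards.mean()) / (rewards.std() + eps)


def simplified_grpo_loss(logp_new, logp_ref, advantages, kl_beta=KL_BETA):
    """Policy-gradient term plus a KL penalty toward the reference model.

    The KL term uses the non-negative estimator exp(d) - d - 1 with
    d = logp_ref - logp_new.
    """
    d = logp_ref - logp_new
    kl = np.exp(d) - d - 1.0
    return float(-(advantages * logp_new).mean() + kl_beta * kl.mean())


# Toy group: random rewards and log-probabilities for GROUP_SIZE samples.
rng = np.random.default_rng(0)
useful = rng.random(GROUP_SIZE)
unexpected = rng.random(GROUP_SIZE)
advantages = group_relative_advantages(blended_reward(useful, unexpected))

logp_ref = np.log(rng.uniform(0.01, 0.2, GROUP_SIZE))
logp_new = logp_ref + rng.normal(0.0, 0.05, GROUP_SIZE)
print(simplified_grpo_loss(logp_new, logp_ref, advantages))
```

Because advantages are normalized within each group, only the relative ranking of the sampled candidates drives the update; a larger KL beta keeps the fine-tuned policy closer to the original SASRec model.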