SASRec - ml-100k (Fine-tuned with GRPO)

Model Description

This model is fine-tuned from viberec/ML-100k-SASRec using Group Relative Policy Optimization (GRPO). The objective was to improve serendipity (a higher tail percentage and a lower average popularity of recommended items) while maintaining ranking accuracy (NDCG).
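The core idea of GRPO is that each sampled output is scored relative to the other samples in its own group, rather than against a learned value function. The following is a minimal sketch of that normalization step only, not the training code used for this model; the function name and the example rewards are hypothetical.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward by the mean and std of its own group.

    `rewards` holds one scalar reward per sampled recommendation list
    for the same user. `eps` guards against a zero std when all
    rewards in the group are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four lists sampled for one user.
rewards = [0.2, 0.5, 0.1, 0.4]
advantages = group_relative_advantages(rewards)
```

Samples that beat their group average receive a positive advantage and are reinforced; samples below it receive a negative advantage.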

Training Results

Baseline (Before Fine-tuning)

  • ndcg@10: 0.0514
  • hit@10: 0.114
  • averagepopularity@10: 235.6415
  • giniindex@10: 0.9189
  • itemcoverage@10: 0.2124
  • shannonentropy@10: 0.0218
  • tailpercentage@10: 0.0004
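For readers unfamiliar with these metrics, the sketch below shows how four of them are commonly computed for a single user. It assumes a leave-one-out setup with one held-out target item per user, and it takes the set of tail items as an input because the card does not state how the tail was defined. All function and parameter names are illustrative, not taken from the evaluation code behind these numbers.

```python
import math


def hit_at_k(ranked, target, k=10):
    """1.0 if the held-out item appears in the top-k list, else 0.0."""
    return 1.0 if target in ranked[:k] else 0.0


def ndcg_at_k(ranked, target, k=10):
    """NDCG with a single relevant item: 1 / log2(rank + 1), rank is 1-based."""
    top = ranked[:k]
    if target not in top:
        return 0.0
    return 1.0 / math.log2(top.index(target) + 2)


def average_popularity(ranked, popularity, k=10):
    """Mean training-interaction count of the recommended items."""
    top = ranked[:k]
    return sum(popularity[i] for i in top) / len(top)


def tail_percentage(ranked, tail_items, k=10):
    """Share of recommended items that belong to the (assumed) tail set."""
    top = ranked[:k]
    return sum(1 for i in top if i in tail_items) / len(top)
```

Reported values are then averaged over all evaluated users. Lower average popularity and higher tail percentage both indicate recommendations that lean less on popular items.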

Best Valid Results (GRPO)

  • ndcg@10: 0.0539
  • hit@10: 0.1171
  • averagepopularity@10: 217.3145
  • giniindex@10: 0.9117
  • itemcoverage@10: 0.2114
  • shannonentropy@10: 0.0222
  • tailpercentage@10: 0.0

Test Results (GRPO)

  • ndcg@10: 0.049
  • hit@10: 0.1044
  • averagepopularity@10: 174.0842
  • giniindex@10: 0.9072
  • itemcoverage@10: 0.2665
  • shannonentropy@10: 0.0179
  • tailpercentage@10: 0.0005
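To read the trade-off at a glance, the relative change from the baseline to the GRPO test results can be computed directly from the figures above. The card does not state which split the baseline was evaluated on, so this comparison is indicative only.

```python
# Values copied from the Baseline and Test Results (GRPO) lists above.
baseline = {
    "ndcg@10": 0.0514,
    "hit@10": 0.114,
    "averagepopularity@10": 235.6415,
    "giniindex@10": 0.9189,
    "itemcoverage@10": 0.2124,
    "shannonentropy@10": 0.0218,
    "tailpercentage@10": 0.0004,
}
grpo_test = {
    "ndcg@10": 0.049,
    "hit@10": 0.1044,
    "averagepopularity@10": 174.0842,
    "giniindex@10": 0.9072,
    "itemcoverage@10": 0.2665,
    "shannonentropy@10": 0.0179,
    "tailpercentage@10": 0.0005,
}


def relative_change(before, after):
    """Fractional change from `before` to `after`."""
    return (after - before) / before


changes = {name: relative_change(baseline[name], grpo_test[name]) for name in baseline}

for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
```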

RL Hyperparameters

  • Alpha: 0.1 (Weight for Useful Reward vs Unexpected Reward)
  • KL Beta: 0.5
  • Group Size: 16
  • Learning Rate: 5e-05
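The sketch below illustrates how hyperparameters like these typically enter a GRPO-style objective. It rests on two assumptions the card does not confirm: that Alpha weights the useful reward and (1 - Alpha) the unexpected reward, and that the KL term uses the non-negative estimator common in GRPO implementations. The function names are hypothetical and the loss is simplified (no probability-ratio clipping).

```python
import math

# Values from the hyperparameter list above.
ALPHA = 0.1
KL_BETA = 0.5
GROUP_SIZE = 16
LEARNING_RATE = 5e-5


def mixed_reward(useful, unexpected, alpha=ALPHA):
    # ASSUMPTION: alpha scales the useful (accuracy) term and
    # (1 - alpha) scales the unexpected (serendipity) term.
    return alpha * useful + (1.0 - alpha) * unexpected


def kl_estimate(logp, ref_logp):
    # ASSUMPTION: per-sample estimator exp(d) - d - 1 with
    # d = ref_logp - logp; it is zero when both policies agree
    # and positive otherwise.
    d = ref_logp - logp
    return math.exp(d) - d - 1.0


def per_sample_loss(advantage, logp, ref_logp, beta=KL_BETA):
    # Simplified policy-gradient term plus the KL penalty that keeps
    # the fine-tuned policy close to the reference SASRec model.
    return -advantage * logp + beta * kl_estimate(logp, ref_logp)
```

Under this reading, a small Alpha puts most of the reward on unexpectedness, while a KL Beta of 0.5 pulls the policy back toward the base model's ranking behavior.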

Usage
