viberec/ML-100k-SASRec-GRPO

@@ -25,26 +25,26 @@ The objective was to improve serendipity (Tail Percentage, Low Popularity) while
 - **tailpercentage@10**: 0.0004
 ### Best Valid Results (GRPO)
-- **ndcg@10**: 0.0532
-- **hit@10**: 0.1118
-- **averagepopularity@10**: 218.9659
-- **giniindex@10**: 0.912
-- **itemcoverage@10**: 0.2144
-- **shannonentropy@10**: 0.0219
+- **ndcg@10**: 0.0539
+- **hit@10**: 0.1171
+- **averagepopularity@10**: 217.3145
+- **giniindex@10**: 0.9117
+- **itemcoverage@10**: 0.2114
+- **shannonentropy@10**: 0.0222
 - **tailpercentage@10**: 0.0
 ### Test Results (GRPO)
-- **ndcg@10**: 0.0489
-- **hit@10**: 0.099
-- **averagepopularity@10**: 180.7492
-- **giniindex@10**: 0.9097
-- **itemcoverage@10**: 0.2714
-- **shannonentropy@10**: 0.0175
+- **ndcg@10**: 0.049
+- **hit@10**: 0.1044
+- **averagepopularity@10**: 174.0842
+- **giniindex@10**: 0.9072
+- **itemcoverage@10**: 0.2665
+- **shannonentropy@10**: 0.0179
 - **tailpercentage@10**: 0.0005
 ## RL Hyperparameters
-- **Alpha**: 0.9 (Weight for Useful Reward vs Unexpected Reward)
-- **KL Beta**: 0.1
+- **Alpha**: 0.1 (Weight for Useful Reward vs Unexpected Reward)
+- **KL Beta**: 0.5
 - **Group Size**: 16
 - **Learning Rate**: 5e-05
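
To make the hyperparameter change easier to read, here is a minimal sketch of how Alpha, KL Beta, and Group Size might enter a GRPO-style update. The README does not state the reward formula, so the convex mix `alpha * useful + (1 - alpha) * unexpected` is an assumption inferred from the label "Weight for Useful Reward vs Unexpected Reward". The function names (`blended_reward`, `group_advantages`, `kl_penalized_objective`) are hypothetical, and the objective omits the ratio clipping that GRPO implementations normally apply.

```python
from statistics import mean, pstdev

# Values on the "+" side of the diff above.
ALPHA = 0.1      # weight on the useful reward (assumed convex mix)
KL_BETA = 0.5    # weight on the KL penalty
GROUP_SIZE = 16  # samples drawn per user/context (unchanged in the diff)


def blended_reward(useful, unexpected, alpha=ALPHA):
    """Assumed form: alpha * useful + (1 - alpha) * unexpected."""
    return alpha * useful + (1.0 - alpha) * unexpected


def group_advantages(rewards, eps=1e-8):
    """Standardize rewards within one sampled group (group-relative baseline)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def kl_penalized_objective(advantages, ratios, kl_values, beta=KL_BETA):
    """Simplified, unclipped surrogate: mean(A * ratio) - beta * mean(KL)."""
    surrogate = mean([a * r for a, r in zip(advantages, ratios)])
    return surrogate - beta * mean(kl_values)


# A sample that is accurate (useful = 1) but not novel (unexpected = 0):
old_reward = blended_reward(1.0, 0.0, alpha=0.9)  # old Alpha
new_reward = blended_reward(1.0, 0.0, alpha=0.1)  # new Alpha
```

Under this assumed form, moving Alpha from 0.9 to 0.1 shifts most of the reward weight onto the unexpectedness term, while raising KL Beta from 0.1 to 0.5 penalizes drift from the reference policy more strongly.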