dmitry315/llm-course-hw2-reward-model

Text Classification

text-generation-inference

dmitry315 committed on Mar 30

Commit 5d386c8 (verified) · 1 parent: 1f75659

update card

Files changed (1)

README.md +2 -2

@@ -11,8 +11,6 @@ tags:
 Reward model trained for PPO for VK NLP course.
-Is likely to be overfitted.
 ## Model Details
 ### Model Description
@@ -20,6 +18,8 @@ Is likely to be overfitted.
 <!-- Provide a longer summary of what this model is. -->
 The model is LLM [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct).
+Trained only last linear layer.
 Model trained with TRL for PPO trainig.
 ### Model Sources
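
The line added in this commit says that only the last linear layer was trained. The card does not show how that was done, so the following is only a minimal PyTorch sketch of the general idea: freeze every parameter except a final linear scoring head, then take one optimizer step on a pairwise preference loss. `TinyRewardModel`, `freeze_all_but_head`, the head name `score`, the toy sizes, and the loss are all assumptions made for illustration, not the author's training code.

```python
import torch
from torch import nn


class TinyRewardModel(nn.Module):
    """Hypothetical stand-in for a reward model: a backbone plus a linear scoring head."""

    def __init__(self, vocab_size=100, hidden=16):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Embedding(vocab_size, hidden),
            nn.Linear(hidden, hidden),
            nn.Tanh(),
        )
        # Final linear layer mapping a hidden state to a scalar reward (assumed name).
        self.score = nn.Linear(hidden, 1, bias=False)

    def forward(self, input_ids):
        hidden_states = self.backbone(input_ids)  # (batch, seq, hidden)
        # Use the last token's hidden state as the sequence summary.
        return self.score(hidden_states[:, -1, :]).squeeze(-1)  # (batch,)


def freeze_all_but_head(model, head_name="score"):
    """Set requires_grad only on the head's parameters; return the trainable ones."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_name + ".")
    return [p for p in model.parameters() if p.requires_grad]


torch.manual_seed(0)
model = TinyRewardModel()
trainable = freeze_all_but_head(model)
optimizer = torch.optim.AdamW(trainable, lr=1e-3)

# Toy "chosen" and "rejected" token batches standing in for preference pairs.
chosen = torch.randint(0, 100, (4, 8))
rejected = torch.randint(0, 100, (4, 8))

backbone_before = model.backbone[1].weight.clone()
head_before = model.score.weight.clone()

# Pairwise preference loss: push the chosen reward above the rejected one.
loss = -nn.functional.logsigmoid(model(chosen) - model(rejected)).mean()
loss.backward()
optimizer.step()
```

After the step, the backbone weights are unchanged and only `score` has moved. With a real checkpoint the same loop over `named_parameters()` would apply, but the head's attribute name should be checked on the loaded model rather than assumed.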