dmitry315 committed on
Commit
5d386c8
·
verified ·
1 Parent(s): 1f75659

update card

  1. README.md +2 -2
README.md CHANGED
@@ -11,8 +11,6 @@ tags:
 
  Reward model trained for PPO for VK NLP course.
 
- Is likely to be overfitted.
-
  ## Model Details
 
  ### Model Description
@@ -20,6 +18,8 @@ Is likely to be overfitted.
  <!-- Provide a longer summary of what this model is. -->
  The model is LLM [HuggingFaceTB/SmolLM-135M-Instruct](https://huggingface.co/HuggingFaceTB/SmolLM-135M-Instruct).
 
+ Trained only last linear layer.
+
  Model trained with TRL for PPO trainig.
 
  ### Model Sources
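
Note on the added line "Trained only last linear layer.": the card does not show how the freezing was done, so the following is only a minimal PyTorch sketch of the general technique, not the author's training code. `TinyRewardModel`, `freeze_all_but_head`, and the layer sizes are hypothetical stand-ins. The head name `score` is an assumption that mirrors how Llama-style sequence-classification models in `transformers` usually name their final linear layer.

```python
import torch
from torch import nn


class TinyRewardModel(nn.Module):
    """Hypothetical stand-in for a reward model: a backbone plus a scalar head."""

    def __init__(self, vocab_size=100, hidden=16):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.body = nn.Linear(hidden, hidden)
        # Final linear layer that maps the pooled hidden state to one reward.
        self.score = nn.Linear(hidden, 1, bias=False)

    def forward(self, input_ids):
        h = torch.tanh(self.body(self.embed(input_ids)))
        # Mean-pool over the sequence dimension, then score each sequence.
        return self.score(h.mean(dim=1)).squeeze(-1)


def freeze_all_but_head(model, head_name="score"):
    """Leave only parameters under `head_name` trainable; return their names."""
    for name, param in model.named_parameters():
        param.requires_grad = name.startswith(head_name + ".")
    return [n for n, p in model.named_parameters() if p.requires_grad]


model = TinyRewardModel()
trainable = freeze_all_but_head(model)

# Pass only the trainable parameters to the optimizer.
optimizer = torch.optim.AdamW(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3
)

# One dummy step: gradients reach the head only.
rewards = model(torch.randint(0, 100, (2, 5)))
rewards.sum().backward()
optimizer.step()
```

With a real checkpoint the same loop over `named_parameters()` would apply, provided the head's parameter prefix is checked first, since it can differ between architectures.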