Commit 114debb (verified) by beyoru · 1 Parent(s): df067df

Update README.md

  1. README.md +5 -6
README.md CHANGED
@@ -10,12 +10,11 @@ language:
  - en
  ---
 
- # Uploaded finetuned model
 
- - **Developed by:** beyoru
- - **License:** apache-2.0
- - **Finetuned from model :** beyoru/EvolLLM-Linh
 
- This qwen3 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
 
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
+ This model is a Qwen model fine-tuned with a custom reinforcement learning (RL) framework that rewards the model for producing solutions that pass automated test cases, similar to how programming tasks are evaluated on LeetCode.
 
+ <p align="center">
+ <img src="https://cdn-uploads.huggingface.co/production/uploads/65905af887944e494e37e09a/s4drmYGEYWZyt2ZUkxIpI.png" width="300">
+ </p>
 
 
+ Instead of relying on labeled ground-truth answers, the model learns from test-case-based rewards, with the aim of promoting generalization and reasoning ability in algorithmic problem solving.
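
The README does not describe the internals of the RL framework. As a rough illustration of what a test-case-based reward could look like, the sketch below scores a generated solution by the fraction of test cases it passes. The names `reward_from_tests`, `entry_point`, and the toy `add` task are hypothetical and are not taken from the model's actual training code.

```python
from typing import Any, Dict, Sequence, Tuple

# One case = (positional arguments, expected return value). Hypothetical format.
Case = Tuple[Tuple[Any, ...], Any]


def reward_from_tests(
    candidate_source: str,
    entry_point: str,
    cases: Sequence[Case],
) -> float:
    """Return the fraction of test cases the candidate passes (0.0 to 1.0)."""
    if not cases:
        return 0.0
    namespace: Dict[str, Any] = {}
    try:
        # WARNING: exec runs arbitrary code. A real pipeline would need a
        # sandbox with time and memory limits; this sketch has none.
        exec(candidate_source, namespace)
    except Exception:
        return 0.0  # code that fails to load (e.g. a syntax error) earns nothing
    func = namespace.get(entry_point)
    if not callable(func):
        return 0.0
    passed = 0
    for args, expected in cases:
        try:
            if func(*args) == expected:
                passed += 1
        except Exception:
            pass  # a runtime error counts as a failed case
    return passed / len(cases)


# Toy usage: no reference solution is needed, only inputs and expected outputs.
cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
good = "def add(a, b):\n    return a + b\n"
buggy = "def add(a, b):\n    return a - b\n"
good_reward = reward_from_tests(good, "add", cases)
buggy_reward = reward_from_tests(buggy, "add", cases)
```

Here the correct candidate passes every case, while the buggy one passes only the `(0, 0)` case, so it receives a partial reward. Whether the actual framework uses partial credit or an all-or-nothing signal is not stated in the README.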