Akshint47 committed · verified
Commit 808df0f · 1 Parent(s): edd71ab

Update README.md
Files changed (1): README.md (+25 −0)
README.md CHANGED
@@ -1,3 +1,28 @@
+ ---
+ license: apache-2.0
+ language:
+ - en
+ base_model:
+ - unsloth/Qwen2.5-3B-Instruct-unsloth-bnb-4bit
+ library_name: adapter-transformers
+ tags:
+ - text-generation-inference
+ - transformers
+ - unsloth
+ - qwen2
+ - trl
+ - grpo
+ ---
+
+ # Uploaded model
+
+ - **Developed by:** Akshint47
+ - **License:** apache-2.0
+ - **Finetuned from model:** unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
+
+ This qwen2 model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's TRL library.
+
+ [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
  # Fine-Tuning Qwen2.5-3B-Instruct with GRPO for GSM8K Dataset

  This notebook demonstrates the process of fine-tuning the **Qwen2.5-3B-Instruct** model using **GRPO (Group Relative Policy Optimization)** on the **GSM8K** dataset. The goal is to improve the model's ability to solve mathematical reasoning problems by leveraging reinforcement learning with custom reward functions.
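The custom reward functions mentioned above can be sketched as a minimal correctness reward for GSM8K-style completions. This is an illustrative assumption, not the notebook's actual code: the `<answer>...</answer>` tag format, the function name `correctness_reward`, and the `answers` parameter are hypothetical; the signature follows TRL's reward-function convention (a list of `completions` plus extra dataset columns as keyword arguments).

```python
import re

# Hypothetical correctness reward for GSM8K-style outputs (assumptions:
# completions wrap the final numeric answer in <answer>...</answer> tags,
# and reference answers arrive as a parallel list of strings).
def correctness_reward(completions, answers, **kwargs):
    """Return 1.0 per completion whose extracted answer matches the reference."""
    rewards = []
    for completion, reference in zip(completions, answers):
        match = re.search(r"<answer>\s*(-?[\d,.]+)\s*</answer>", completion)
        # Strip thousands separators so "1,000" matches "1000".
        predicted = match.group(1).replace(",", "") if match else None
        rewards.append(1.0 if predicted == str(reference) else 0.0)
    return rewards
```

In TRL, a function like this would typically be passed to `GRPOTrainer` through its `reward_funcs` argument alongside a `GRPOConfig`; the exact wiring and reward shaping depend on the notebook.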