Text Generation
Transformers
Safetensors
English

Add pipeline tag and update library_name

#1
by nielsr (HF Staff) - opened
Files changed (1)
  1. README.md +4 -3
README.md CHANGED
@@ -1,15 +1,16 @@
 ---
 base_model: unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit
-library_name: peft
-license: apache-2.0
 datasets:
 - openai/gsm8k
 - HuggingFaceH4/MATH-500
 - HuggingFaceH4/aime_2024
 language:
 - en
+library_name: transformers
+license: apache-2.0
 metrics:
 - accuracy
+pipeline_tag: text-generation
 ---
 
 ## MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
@@ -49,7 +50,7 @@ from transformers import AutoModelForCausalLM
 base_model = AutoModelForCausalLM.from_pretrained("unsloth/qwen2.5-1.5b-instruct-unsloth-bnb-4bit")
 model = PeftModel.from_pretrained(base_model, "purbeshmitra/vanillaGRPO")
 
-SYSTEM_PROMPT = "You are a helpful assistant. When the user asks a question, you first think about the reasoning process in mind and then provide the user with an answer. The reasoning process and the answer are enclosed within <reasoning> </reasoning> and <answer> </answer> tags, respectively. In your answer, you also enclose your final answer in the box: \boxed{}. Therefore, you respond in the following strict format:
+SYSTEM_PROMPT = "You are a helpful assistant. When the user asks a question, you first think about the reasoning process in mind and then provide the user with an answer. The reasoning process and the answer are enclosed within <reasoning> </reasoning> and <answer> </answer> tags, respectively. In your answer, you also enclose your final answer in the box: \\boxed{}. Therefore, you respond in the following strict format:
 <reasoning> reasoning process here </reasoning> <answer> answer here </answer>."
 ```
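The second hunk's `\boxed{}` → `\\boxed{}` change fixes a Python string-escape bug: in a regular string literal, `\b` is the backspace escape, so the single backslash before `boxed` silently disappears from the prompt. A minimal sketch of the difference (the variable names here are illustrative, not from the model card):

```python
# In a regular Python string, "\b" is the backspace character (0x08),
# so the backslash intended for the LaTeX \boxed{} macro is lost.
plain = "\boxed{}"    # first char is backspace, not a backslash
escaped = "\\boxed{}" # double backslash yields a literal backslash
raw = r"\boxed{}"     # a raw string also keeps the backslash

print(len(plain))        # 7: backspace + "oxed{}"
print(len(escaped))      # 8: backslash + "boxed{}"
print(escaped == raw)    # True
```

A raw string (`r"..."`) would have worked just as well here; the PR uses the doubled backslash, which keeps the prompt a plain string literal.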