GBMPO Fine-tuned Model on GSM8K

This model was fine-tuned using GBMPO (Group Bound Multi-Objective Policy Optimization) on the GSM8K dataset.

Training Details

  • Base Model: Qwen/Qwen2.5-1.5B-Instruct
  • Dataset: openai/gsm8k
  • Algorithm: GBMPO
  • Training Steps: Check training logs for details

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("LLMAligned/gbmpo_gsm8k_model")
tokenizer = AutoTokenizer.from_pretrained("LLMAligned/gbmpo_gsm8k_model")

# The base model is instruction-tuned, so format the prompt with the chat template.
messages = [{"role": "user", "content": "Solve: What is 25 * 4?"}]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

inputs = tokenizer(text, return_tensors="pt")
# max_new_tokens bounds only the generated tokens; max_length would also count the prompt.
outputs = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
