---
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- reinforcement-learning
- gbmpo
- gsm8k
- math
---

# GBMPO Fine-tuned Model on GSM8K

This model was fine-tuned using GBMPO (Group Bound Multi-Objective Policy Optimization) on the GSM8K dataset.

## Training Details

- Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Dataset: openai/gsm8k
- Algorithm: GBMPO
- Training Steps: Check training logs for details

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("LLMAligned/gbmpo_gsm8k_model")
tokenizer = AutoTokenizer.from_pretrained("LLMAligned/gbmpo_gsm8k_model")

prompt = "Solve: What is 25 * 4?"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_length=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
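
To check a generated answer against a GSM8K reference, the final number has to be pulled out of both texts. The sketch below is one possible way to do that, not the evaluation used for this model. GSM8K reference solutions end with a `#### <number>` line. The output format of this model is not documented here, so taking the last number in the completion is an assumption, and the helper names (`extract_reference_answer`, `extract_predicted_answer`, `is_correct`) are hypothetical.

```python
import re
from typing import Optional

# Integers or decimals, optionally signed, with optional thousands separators.
_NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_reference_answer(solution: str) -> Optional[str]:
    """Return the number that follows the last '####' marker in a GSM8K solution."""
    _, marker, tail = solution.rpartition("####")
    if not marker:
        return None
    match = _NUMBER.search(tail)
    return match.group().replace(",", "") if match else None


def extract_predicted_answer(completion: str) -> Optional[str]:
    """Heuristic (an assumption): treat the last number in the text as the answer."""
    matches = _NUMBER.findall(completion)
    return matches[-1].replace(",", "") if matches else None


def is_correct(completion: str, solution: str) -> bool:
    """Compare prediction and reference numerically, so '18.00' matches '18'."""
    predicted = extract_predicted_answer(completion)
    reference = extract_reference_answer(solution)
    if predicted is None or reference is None:
        return False
    return float(predicted) == float(reference)
```

In practice, `completion` would be the decoded string from the Usage snippet and `solution` the `answer` field of a GSM8K example. The last-number heuristic can misfire when the model keeps writing after its final answer, so a stricter pattern may be needed depending on how the model phrases its output.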