---
license: apache-2.0
base_model: Qwen/Qwen2.5-1.5B-Instruct
tags:
- reinforcement-learning
- gbmpo
- gsm8k
- math
---
# GBMPO Fine-tuned Model on GSM8K

This model was fine-tuned using GBMPO (Group Bound Multi-Objective Policy Optimization) on the GSM8K grade-school math word-problem dataset.
## Training Details

- Base Model: Qwen/Qwen2.5-1.5B-Instruct
- Dataset: openai/gsm8k
- Algorithm: GBMPO
- Training Steps: check the training logs for details
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("LLMAligned/gbmpo_gsm8k_model")
tokenizer = AutoTokenizer.from_pretrained("LLMAligned/gbmpo_gsm8k_model")

prompt = "Solve: What is 25 * 4?"
inputs = tokenizer(prompt, return_tensors="pt")
# Bound the newly generated tokens; max_length would count the
# prompt tokens toward the limit as well.
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
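
Because the base model is instruction-tuned, prompts can also be wrapped in the tokenizer's chat template, which usually yields better-formatted answers than raw text. A sketch, assuming the same model ID as above; the example question is illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LLMAligned/gbmpo_gsm8k_model"
model = AutoModelForCausalLM.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Format the question in the chat layout the Instruct base model expects.
messages = [
    {"role": "user", "content": "Natalia sold clips to 48 of her friends in "
     "April, and then she sold half as many clips in May. How many clips did "
     "Natalia sell altogether in April and May?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```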