How to use from the
Use from the
llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="morty649/qwen_finetune",
	filename="Qwen2.5-1.5B.Q4_K_M.gguf",
)
llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Qwen Reasoning Model (GRPO Fine-Tuned)

This repository contains a fine-tuned version of Qwen trained using GRPO (Group Relative Policy Optimization) with the Unsloth framework.

The model was trained to improve reasoning ability and structured responses.


Base Model

  • Base model: Qwen2.5
  • Parameter size: ~1.5B parameters
  • Quantization: GGUF Q4_K_M
  • Training framework: Unsloth
  • Optimization method: GRPO (Reinforcement Learning)

Training Details

The model was trained using reinforcement learning techniques to improve reasoning quality.

Training setup:

  • Trainer: GRPOTrainer (Unsloth)

  • Dataset: reasoning style prompts

  • Hardware: Kaggle GPU

  • Training approach:

    • LoRA fine-tuning
    • RL reward optimization
    • Quantized inference format (GGUF)

Files in this Repository

File Description
*.gguf Quantized model weights
config.json Model configuration
README.md Model card

How to Use

Run with llama.cpp

./main -m Qwen2.5-1.5B_Q4_K_M.gguf -p "Explain why the sky is blue."

Python Example

from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-1.5B_Q4_K_M.gguf",
    n_ctx=4096,
)

print(llm("Explain reinforcement learning simply."))

Intended Use

This model is intended for:

  • reasoning experiments
  • reinforcement learning research
  • local LLM experimentation

Limitations

  • Small parameter size (1.5B)
  • Limited training data
  • May produce incorrect reasoning

Author

Maruthi


License

Please follow the license of the original Qwen model.

Downloads last month
13
GGUF
Model size
2B params
Architecture
qwen2
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support