---
base_model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
tags:
- text-generation-inference
- transformers
- unsloth
- qwen2
- gguf
license: apache-2.0
language:
- en
- es
datasets:
- Kukedlc/dpo-orpo-spanish-15k
library_name: transformers
---

[fjmgAI](https://huggingface.co/fjmgAI)

## Fine-Tuned Model

**`fjmgAI/b1-R1-Zero-3B-GGUF`**

## Base Model

**`unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit`**

## Fine-Tuning Method

Fine-tuning was performed with **[`unsloth`](https://github.com/unslothai/unsloth)**, an efficient fine-tuning framework optimized for low-resource environments, together with Hugging Face's [TRL](https://github.com/huggingface/trl) library.

## Dataset

**[`Kukedlc/dpo-orpo-spanish-15k`](https://huggingface.co/datasets/Kukedlc/dpo-orpo-spanish-15k)**

### Description

A Spanish-language dataset of **15,000 examples**, designed for **Direct Preference Optimization (DPO)** or **Odds Ratio Preference Optimization (ORPO)**.

### Adaptation

The dataset was adapted to a reasoning-based format for GRPO, enhancing its ability to guide preference-based decision-making during fine-tuning. This adaptation improves alignment with instruction-following tasks in Spanish.

## Fine-Tuning Details

- The model was trained with the **GRPO algorithm**, leveraging structured preference data to refine its response generation.
- The model was fine-tuned while keeping its **4-bit quantization (`bnb-4bit`)** for memory efficiency, aligning its outputs with the characteristics of the Spanish dataset.
- The focus was on retaining the model's **instruction-following abilities** while improving its **understanding and generation** of Spanish text.

## Purpose

This fine-tuned model is intended for **Spanish-language applications** that require an efficient, instruction-following model with a **lightweight reasoning process**.

- **Developed by:** fjmgAI
- **License:** apache-2.0

[unsloth](https://github.com/unslothai/unsloth) · [TRL](https://github.com/huggingface/trl)
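As a minimal sketch of the dataset adaptation described above: a DPO/ORPO preference record (`prompt` / `chosen` / `rejected`) can be mapped to a GRPO-style reasoning record, i.e. a chat prompt plus a reference answer for the reward function, dropping the `rejected` completion since GRPO scores sampled generations rather than comparing pairs. The system prompt, the `<think>`/`<answer>` tags, and the field names here are illustrative assumptions, not the published training code.

```python
# Hypothetical sketch: template, tags, and field names are assumptions.
SYSTEM_PROMPT = (
    "Responde en español. Razona primero dentro de <think>...</think> "
    "y da la respuesta final dentro de <answer>...</answer>."
)

def to_grpo_format(example: dict) -> dict:
    """Map a DPO/ORPO preference record to a GRPO-style record:
    a chat prompt plus a reference answer for reward scoring.
    The 'rejected' completion is discarded."""
    return {
        "prompt": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": example["prompt"]},
        ],
        "reference": example["chosen"],
    }

record = {
    "prompt": "¿Cuál es la capital de Francia?",
    "chosen": "La capital de Francia es París.",
    "rejected": "Francia no tiene capital.",
}
converted = to_grpo_format(record)
```

A transformation like this can be applied over the whole dataset with `datasets.Dataset.map` before passing it to a GRPO trainer.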