---
license: apache-2.0
base_model: Qwen/Qwen2.5-0.5B
datasets:
- trl-lib/Capybara
language:
- en
tags:
- conversational
- multi-turn
- instruction-following
- reasoning
- sft
- trl
pipeline_tag: text-generation
library_name: transformers
---

# Qwen2.5-0.5B-Capybara

This model is a fine-tuned version of [Qwen/Qwen2.5-0.5B](https://huggingface.co/Qwen/Qwen2.5-0.5B) on the [trl-lib/Capybara](https://huggingface.co/datasets/trl-lib/Capybara) dataset using Supervised Fine-Tuning (SFT).

## Model Description

**Qwen2.5-0.5B-Capybara** is trained on the Capybara dataset, a high-quality multi-turn conversational dataset emphasizing:

- **Reasoning and Logic**: Strong focus on extrapolation and logical thinking
- **Information Diversity**: Wide range of domains including STEM, pop culture, and general knowledge
- **Multi-turn Conversations**: An average of 3+ turns per conversation with 1,000+ tokens of context
- **Natural Prose**: Maintains conversational flow while exploring complex topics

The Capybara dataset was created using the Amplify-Instruct synthesis method, combining techniques from Airoboros, Evol-Instruct (WizardLM), Orca, and other high-performing datasets.

## Training Details

### Training Configuration

| Parameter | Value |
|-----------|-------|
| Base Model | Qwen/Qwen2.5-0.5B |
| Training Method | Supervised Fine-Tuning (SFT) |
| Dataset | trl-lib/Capybara (15,806 samples) |
| Epochs | 3 |
| Per-device Batch Size | 8 |
| Gradient Accumulation Steps | 4 |
| Effective Batch Size | 32 |
| Learning Rate | 2e-5 |
| LR Scheduler | Linear |
| Precision | BF16 |
| Max Sequence Length | 1024 |
| Optimizer | AdamW (fused) |

### Memory Optimizations

- **Liger Kernel**: Enabled for ~60% VRAM reduction
- **Gradient Checkpointing**: Enabled
- **BF16 Mixed Precision**: Enabled

### Training Infrastructure

- Framework: [TRL](https://github.com/huggingface/trl) (Transformer Reinforcement Learning)
- Hardware: Single GPU (8 GB VRAM)
- Time per step: ~5 minutes
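The exact training script for this run has not been published, but the configuration above corresponds roughly to the TRL sketch below. The `output_dir` and `logging_steps` values are illustrative, and some argument names (such as `max_seq_length` and `use_liger_kernel`) vary between `trl` and `transformers` releases, so treat this as a starting point rather than a verbatim reproduction:

```python
# Minimal SFT sketch matching the configuration table above (not the exact script used).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

dataset = load_dataset("trl-lib/Capybara", split="train")

training_args = SFTConfig(
    output_dir="Qwen2.5-0.5B-Capybara",   # illustrative output path
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,        # 8 x 4 = effective batch size 32
    learning_rate=2e-5,
    lr_scheduler_type="linear",
    bf16=True,
    max_seq_length=1024,                  # renamed to max_length in newer TRL releases
    gradient_checkpointing=True,
    optim="adamw_torch_fused",
    use_liger_kernel=True,                # requires liger-kernel and a recent transformers
    logging_steps=1,                      # illustrative
)

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",            # SFTTrainer accepts a model ID string
    args=training_args,
    train_dataset=dataset,
)
trainer.train()
```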
### Training Progress

> **Note**: This is an early checkpoint from an ongoing training run: step 42 of 1,482 total steps (~3% of training).

| Metric | Value |
|--------|-------|
| Steps Completed | 42 / 1,482 |
| Training Progress | ~3% |
| Loss at step 42 (latest) | 1.3058 |
| Loss at step 1 (initial) | 1.9030 |

## Dataset Information

The [trl-lib/Capybara](https://huggingface.co/datasets/trl-lib/Capybara) dataset contains:

- **15,806 training samples** of multi-turn conversations
- **Sources**: GPT4LLM, GOAT, EverythingLM, Know-Logic, SuperCOT, Airoboros, Dove, TheoremQA, TaskSource, General-Instruct
- **Format**: Conversation messages with user/assistant roles
- **Quality**: Aggressively filtered to remove alignment artifacts and common undesirable behaviors

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("BurnyCoder/Qwen2.5-0.5B-Capybara")
tokenizer = AutoTokenizer.from_pretrained("BurnyCoder/Qwen2.5-0.5B-Capybara")

# Build a prompt with the model's chat template
messages = [
    {"role": "user", "content": "Explain the concept of recursion in programming."}
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt")

outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Limitations

- This is an early checkpoint (~3% of the planned training steps) and may not reflect the model's full capabilities
- At 494M parameters, the model is suitable for edge deployment but limited compared to larger models
- Training was conducted on a single GPU with memory optimizations

## Citation

If you use this model, please cite:

```bibtex
@misc{qwen2.5-0.5b-capybara,
  author = {BurnyCoder},
  title = {Qwen2.5-0.5B-Capybara: Multi-turn Conversational Fine-tuning},
  year = {2026},
  publisher = {Hugging Face},
  url = {https://huggingface.co/BurnyCoder/Qwen2.5-0.5B-Capybara}
}
```

## Acknowledgments

- [Qwen Team](https://huggingface.co/Qwen) for the base model
- [TRL Library](https://github.com/huggingface/trl) for the training framework
- [Capybara Dataset](https://huggingface.co/datasets/trl-lib/Capybara) creators