---
base_model: unsloth/Qwen3-4B-Base
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:unsloth/Qwen3-4B-Base
- grpo
- lora
- sft
- transformers
- trl
- unsloth
license: other
datasets:
- open-r1/OpenR1-Math-220k
language:
- pt
- en
---

# Model Card for DogeAI-v2.0-4B-Reasoning-LoRA

This repository contains a LoRA (Low-Rank Adaptation) adapter fine-tuned on top of Qwen3-4B-Base, focused on improving reasoning, chain-of-thought coherence, and analytical responses.

The LoRA was trained on Kaggle using curated thinking-style datasets, with the goal of enhancing logical consistency rather than factual memorization.

## Model Details

### Model Description

This is a reasoning-oriented LoRA adapter designed to be applied to Qwen3-4B-Base. The training emphasizes structured thinking, multi-step reasoning, and clearer internal deliberation in responses.

- **Developed by:** AxionLab-Co
- **Model type:** LoRA adapter (PEFT)
- **Language(s) (NLP):** Primarily English
- **License:** Apache 2.0 (inherits base model license)
- **Finetuned from model:** Qwen3-4B-Base

### Model Sources

- **Base Model:** Qwen3-4B-Base
- **Training Platform:** Kaggle
- **Frameworks:** PyTorch, PEFT, Unsloth

## Uses

### Direct Use

This LoRA is intended to be merged into, or loaded on top of, Qwen3-4B-Base to improve:

- Logical reasoning
- Step-by-step problem solving
- Analytical and structured responses
- “Thinking-style” outputs for research and experimentation

### Downstream Use

- Merging into a full model for a GGUF or standard HF release
- Further fine-tuning on domain-specific reasoning tasks
- Research on symbolic + neural reasoning hybrids

### Out-of-Scope Use

- Safety-critical decision making
- Medical, legal, or financial advice
- Tasks requiring guaranteed factual correctness

## Bias, Risks, and Limitations

- The model may overproduce reasoning steps, even when they are not strictly required
- Reasoning quality depends heavily on the base model (Qwen3-4B-Base)
- No formal safety fine-tuning was applied beyond the base model
- Biases present in the original training data may be amplified

### Recommendations

Users should:

- Apply external safety layers if deploying in production
- Evaluate outputs critically, especially for sensitive topics
- Avoid assuming reasoning chains are always correct

## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit (requires the bitsandbytes package and a CUDA GPU).
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Base",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Base")

# Attach the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(
    base_model,
    "AxionLab-Co/DogeAI-v2.0-4B-Reasoning-LoRA",
)
```

## Training Details

### Training Data

The LoRA was trained on thinking-oriented datasets, focusing on:

- Chain-of-thought style reasoning
- Logical explanations
- Multi-step analytical prompts

The datasets were curated and preprocessed manually for quality and consistency.

### Training Procedure

#### Preprocessing

- Tokenization using the base Qwen tokenizer
- Filtering of low-quality or malformed reasoning examples

#### Training Hyperparameters

- **Training regime:** fp16 mixed precision
- **Fine-tuning method:** LoRA (PEFT)
- **Optimizer:** AdamW
- **Framework:** Unsloth

#### Speeds, Sizes, Times

- Training was performed in a Kaggle GPU environment
- LoRA size was kept intentionally lightweight for fast loading and merging

## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data

- Internal prompt-based reasoning tests
- Synthetic reasoning benchmarks (qualitative)

#### Factors

- Multi-step logic consistency
- Response clarity
- Hallucination tendencies

#### Metrics

- Qualitative human evaluation
- Prompt-level comparison against the base model

### Results

In the qualitative, prompt-level comparisons described above, the LoRA shows greater reasoning depth and clearer structure than the base model, especially on analytical prompts.

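
The prompt-level comparison against the base model listed under Metrics could be organized roughly as follows. This is a minimal sketch, not the evaluation harness actually used for this LoRA: `Comparison`, `compare_on_prompts`, `format_for_review`, `base_generate`, and `lora_generate` are hypothetical names introduced here, and the two callables stand in for whatever code wraps text generation for the base and adapted models.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Comparison:
    """Outputs of the base and LoRA-adapted model for one prompt."""
    prompt: str
    base_output: str
    lora_output: str


def compare_on_prompts(
    prompts: Iterable[str],
    base_generate: Callable[[str], str],  # hypothetical: prompt -> base model text
    lora_generate: Callable[[str], str],  # hypothetical: prompt -> adapted model text
) -> List[Comparison]:
    """Run every prompt through both generators and pair the outputs."""
    return [Comparison(p, base_generate(p), lora_generate(p)) for p in prompts]


def format_for_review(rows: List[Comparison]) -> str:
    """Render the pairs as plain text so a human reviewer can judge them."""
    blocks = []
    for i, row in enumerate(rows, start=1):
        blocks.append(
            f"[{i}] PROMPT: {row.prompt}\n"
            f"    BASE: {row.base_output}\n"
            f"    LORA: {row.lora_output}"
        )
    return "\n\n".join(blocks)


if __name__ == "__main__":
    # Stub generators so the sketch runs without downloading any weights.
    rows = compare_on_prompts(
        ["If x + 2 = 5, what is x?"],
        base_generate=lambda p: "(base model answer)",
        lora_generate=lambda p: "(adapted model answer)",
    )
    print(format_for_review(rows))
```

With PEFT, both callables can wrap the same loaded `PeftModel`: `lora_generate` would call `model.generate` as usual, while `base_generate` would make the same call inside `with model.disable_adapter():`, which temporarily switches the adapter off so that only the base weights are used.
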
## Environmental Impact

- **Hardware Type:** NVIDIA GPU (Kaggle)
- **Hours used:** Few hours (single-session fine-tuning)
- **Cloud Provider:** Kaggle
- **Compute Region:** Unknown
- **Carbon Emitted:** Not formally measured

## Technical Specifications

### Model Architecture and Objective

- Transformer-based decoder-only architecture
- Objective: enhance reasoning behavior via parameter-efficient fine-tuning

### Compute Infrastructure

#### Hardware

- Kaggle-provided NVIDIA GPU

#### Software

- PyTorch
- Transformers
- PEFT 0.18.1
- Unsloth

## Citation

If you use this LoRA in research or derivative works, please cite the base model and this repository.

## Model Card Authors

**AxionLab-Co**

## Model Card Contact

For questions, experiments, or collaboration: **AxionLab-Co on Hugging Face**