---
base_model: unsloth/Qwen3-4B-Base
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:unsloth/Qwen3-4B-Base
- grpo
- lora
- sft
- transformers
- trl
- unsloth
license: apache-2.0
datasets:
- open-r1/OpenR1-Math-220k
language:
- pt
- en
---

# Model Card for DogeAI-v2.0-4B-Reasoning-LoRA

This repository contains a LoRA (Low-Rank Adaptation) adapter fine-tuned on top of Qwen3-4B-Base, focused on improving reasoning, chain-of-thought coherence, and analytical responses.

The LoRA was trained on Kaggle using curated thinking-style datasets, with the goal of enhancing logical consistency rather than factual memorization.

# Model Details

## Model Description

This is a reasoning-oriented LoRA adapter designed to be applied to Qwen3-4B-Base. Training emphasizes structured thinking, multi-step reasoning, and clearer internal deliberation in responses.

- **Developed by:** AxionLab-Co
- **Model type:** LoRA adapter (PEFT)
- **Language(s) (NLP):** English and Portuguese
- **License:** Apache 2.0 (inherits the base model license)
- **Finetuned from model:** Qwen3-4B-Base

## Model Sources

- **Base model:** Qwen3-4B-Base
- **Training platform:** Kaggle
- **Frameworks:** PyTorch, PEFT, Unsloth

# Uses

## Direct Use

This LoRA is intended to be merged or loaded on top of Qwen3-4B-Base to improve:

- Logical reasoning
- Step-by-step problem solving
- Analytical and structured responses
- "Thinking-style" outputs for research and experimentation

## Downstream Use

- Merging into a full model for GGUF or standard Hugging Face release
- Further fine-tuning on domain-specific reasoning tasks
- Research on symbolic + neural reasoning hybrids

## Out-of-Scope Use

- Safety-critical decision making
- Medical, legal, or financial advice
- Tasks requiring guaranteed factual correctness

# Bias, Risks, and Limitations

- The model may overproduce reasoning steps, even when they are not strictly required
- Reasoning quality depends heavily on the base model (Qwen3-4B-Base)
- No formal safety fine-tuning was applied beyond the base model
- Possible amplification of biases present in the original training data

## Recommendations

Users should:

- Apply external safety layers if deploying in production
- Evaluate outputs critically, especially for sensitive topics
- Avoid assuming reasoning chains are always correct

# How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit (requires bitsandbytes);
# drop quantization_config to load in full precision instead
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Base",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Base")

# Attach the LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    "AxionLab-Co/DogeAI-v2.0-4B-Reasoning-LoRA",
)
```


# Training Details

## Training Data

The LoRA was trained on thinking-oriented datasets (including open-r1/OpenR1-Math-220k), focusing on:

- Chain-of-thought-style reasoning
- Logical explanations
- Multi-step analytical prompts

The datasets were curated and preprocessed manually for quality and consistency.

## Training Procedure

### Preprocessing

- Tokenization using the base Qwen tokenizer
- Filtering of low-quality or malformed reasoning examples
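
The filtering step can be illustrated with a minimal heuristic of the kind described above. This is a sketch only: the actual curation was manual, and the field names (`prompt`, `answer`) and thresholds here are hypothetical.

```python
# Illustrative filter: drop examples too short to contain multi-step reasoning.
def keep_example(example: dict) -> bool:
    """Keep an example only if it has a non-trivial prompt and a multi-step answer."""
    prompt = (example.get("prompt") or "").strip()
    answer = (example.get("answer") or "").strip()
    if len(prompt) < 16 or len(answer) < 32:
        return False  # too short to contain real reasoning
    if answer.count("\n") < 1 and ". " not in answer:
        return False  # single-step answers are unlikely to show chain-of-thought
    return True

examples = [
    {"prompt": "2+2?", "answer": "4"},
    {"prompt": "A train travels 60 km in 45 minutes; find its speed.",
     "answer": "Step 1: 45 minutes is 0.75 h.\nStep 2: 60 / 0.75 = 80 km/h."},
]
kept = [e for e in examples if keep_example(e)]  # only the multi-step example survives
```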

### Training Hyperparameters

- **Training regime:** fp16 mixed precision
- **Fine-tuning method:** LoRA (PEFT)
- **Optimizer:** AdamW
- **Framework:** Unsloth

### Speeds, Sizes, Times

- Training was performed in a Kaggle GPU environment
- The LoRA was kept intentionally lightweight for fast loading and merging

# Evaluation

## Testing Data, Factors & Metrics

### Testing Data

- Internal prompt-based reasoning tests
- Synthetic reasoning benchmarks (qualitative)

### Factors

- Multi-step logic consistency
- Response clarity
- Hallucination tendencies

### Metrics

- Qualitative human evaluation
- Prompt-level comparison against the base model

## Results

In qualitative comparisons, the LoRA showed improvements in reasoning depth and structure over the base model, especially on analytical prompts.

# Environmental Impact

- **Hardware Type:** NVIDIA GPU (Kaggle)
- **Hours used:** A few hours (single-session fine-tuning)
- **Cloud Provider:** Kaggle
- **Compute Region:** Unknown
- **Carbon Emitted:** Not formally measured

# Technical Specifications

## Model Architecture and Objective

- Transformer-based decoder-only architecture
- Objective: enhance reasoning behavior via parameter-efficient fine-tuning

## Compute Infrastructure

### Hardware

- Kaggle-provided NVIDIA GPU

### Software

- PyTorch
- Transformers
- PEFT 0.18.1
- Unsloth

# Citation

If you use this LoRA in research or derivative works, please cite the base model and this repository.

# Model Card Authors

**AxionLab-Co**

# Model Card Contact

For questions, experiments, or collaboration: **AxionLab-Co on Hugging Face**