---
base_model: unsloth/Qwen3-4B-Base
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:unsloth/Qwen3-4B-Base
- grpo
- lora
- sft
- transformers
- trl
- unsloth
license: apache-2.0
datasets:
- open-r1/OpenR1-Math-220k
language:
- pt
- en
---
# Model Card for DogeAI-v2.0-4B-Reasoning-LoRA
This repository contains a LoRA (Low-Rank Adaptation) adapter fine-tuned on top of Qwen3-4B-Base, focused on improving reasoning, chain-of-thought coherence, and analytical responses.
The adapter was trained on curated thinking-style datasets on Kaggle, with the goal of enhancing logical consistency rather than factual memorization.
# Model Details
# Model Description
This is a reasoning-oriented LoRA adapter designed to be applied to Qwen3-4B-Base.
The training emphasizes structured thinking, multi-step reasoning, and clearer internal deliberation in responses.
- **Developed by:** AxionLab-Co
- **Model type:** LoRA adapter (PEFT)
- **Language(s) (NLP):** Primarily English
- **License:** Apache 2.0 (inherits the base model license)
- **Finetuned from model:** Qwen3-4B-Base
# Model Sources
- **Base model:** Qwen3-4B-Base
- **Training platform:** Kaggle
- **Frameworks:** PyTorch, PEFT, Unsloth
# Uses
# Direct Use
This LoRA is intended to be merged into, or loaded on top of, Qwen3-4B-Base to improve:
- Logical reasoning
- Step-by-step problem solving
- Analytical and structured responses
- “Thinking-style” outputs for research and experimentation
# Downstream Use
- Merging into a full model for GGUF or standard Hugging Face release
- Further fine-tuning on domain-specific reasoning tasks
- Research on symbolic + neural reasoning hybrids
# Out-of-Scope Use
- Safety-critical decision making
- Medical, legal, or financial advice
- Tasks requiring guaranteed factual correctness
# Bias, Risks, and Limitations
- The model may overproduce reasoning steps, even when they are not strictly required
- Reasoning quality depends heavily on the base model (Qwen3-4B-Base)
- No formal safety fine-tuning was applied beyond the base model
- Possible amplification of biases present in the original training data
# Recommendations
Users should:
- Apply external safety layers if deploying in production
- Evaluate outputs critically, especially for sensitive topics
- Avoid assuming reasoning chains are always correct
# How to Get Started with the Model
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Load the base model (4-bit loading requires the bitsandbytes package;
# drop load_in_4bit=True to load in full precision instead)
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Base",
    device_map="auto",
    load_in_4bit=True,
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Base")

# Attach the reasoning LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    "AxionLab-Co/DogeAI-v2.0-4B-Reasoning-LoRA",
)

# Generate a reasoning-style completion
inputs = tokenizer("Question: What is 17 * 24? Think step by step.\nAnswer:", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
# Training Details
# Training Data
The LoRA was trained on thinking-oriented datasets, focusing on:
- Chain-of-thought style reasoning
- Logical explanations
- Multi-step analytical prompts

The datasets were curated and preprocessed manually for quality and consistency.
# Training Procedure
# Preprocessing
- Tokenization using the base Qwen tokenizer
- Filtering of low-quality or malformed reasoning examples
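The filtering pass is not specified in detail; a minimal sketch of the kind of heuristic it implies (the field names `prompt`/`reasoning` and the thresholds are illustrative assumptions, not the card's actual pipeline) might look like:

```python
def keep_example(example: dict) -> bool:
    """Heuristic filter for malformed reasoning examples.

    Field names and thresholds are illustrative assumptions only.
    """
    prompt = example.get("prompt", "").strip()
    reasoning = example.get("reasoning", "").strip()
    if not prompt or not reasoning:
        return False  # drop empty records
    if len(reasoning.split()) < 10:
        return False  # too short to be a real chain of thought
    if reasoning.endswith(("...", "…")):
        return False  # likely truncated mid-thought
    return True

dataset = [
    {"prompt": "What is 2+2?",
     "reasoning": "We add 2 and 2. Addition of these two numbers gives 4, so the answer is 4."},
    {"prompt": "Why?", "reasoning": "Because..."},
]
cleaned = [ex for ex in dataset if keep_example(ex)]
```

Rule-based passes like this are cheap and transparent; anything borderline can then be escalated to manual review, as the card describes.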
# Training Hyperparameters
- **Training regime:** fp16 mixed precision
- **Fine-tuning method:** LoRA (PEFT)
- **Optimizer:** AdamW
- **Framework:** Unsloth
# Speeds, Sizes, Times
- Training performed in a Kaggle GPU environment
- LoRA size kept intentionally lightweight for fast loading and merging
# Evaluation
# Testing Data, Factors & Metrics
# Testing Data
- Internal prompt-based reasoning tests
- Synthetic reasoning benchmarks (qualitative)
# Factors
- Multi-step logic consistency
- Response clarity
- Hallucination tendencies
# Metrics
- Qualitative human evaluation
- Prompt-level comparison against the base model
# Results
In qualitative testing, the LoRA showed improvements in reasoning depth and structure over the base model, especially on analytical prompts.
# Environmental Impact
- **Hardware type:** NVIDIA GPU (Kaggle)
- **Hours used:** A few hours (single-session fine-tuning)
- **Cloud provider:** Kaggle
- **Compute region:** Unknown
- **Carbon emitted:** Not formally measured
# Technical Specifications
# Model Architecture and Objective
- Transformer-based decoder-only architecture
- Objective: enhance reasoning behavior via parameter-efficient fine-tuning
# Compute Infrastructure
# Hardware
- Kaggle-provided NVIDIA GPU
# Software
- PyTorch
- Transformers
- PEFT 0.18.1
- Unsloth
# Citation
If you use this LoRA in research or derivative works, please cite the base model and this repository.
# Model Card Authors
**AxionLab-Co**
# Model Card Contact
For questions, experiments, or collaboration:
**AxionLab-Co on Hugging Face**