---
base_model: unsloth/Qwen3-4B-Base
library_name: peft
pipeline_tag: text-generation
tags:
- base_model:adapter:unsloth/Qwen3-4B-Base
- grpo
- lora
- sft
- transformers
- trl
- unsloth
license: apache-2.0
datasets:
- open-r1/OpenR1-Math-220k
language:
- pt
- en
---

# Model Card for DogeAI-v2.0-4B-Reasoning-LoRA

This repository contains a LoRA (Low-Rank Adaptation) adapter fine-tuned on top of Qwen3-4B-Base, focused on improving reasoning, chain-of-thought coherence, and analytical responses.

The LoRA was trained on Kaggle using curated thinking-style datasets, with the goal of enhancing logical consistency rather than factual memorization.

# Model Details

## Model Description

This is a reasoning-oriented LoRA adapter designed to be applied to Qwen3-4B-Base. Training emphasizes structured thinking, multi-step reasoning, and clearer internal deliberation in responses.

- **Developed by:** AxionLab-Co
- **Model type:** LoRA adapter (PEFT)
- **Language(s) (NLP):** English and Portuguese
- **License:** Apache 2.0 (inherits the base model license)
- **Finetuned from model:** Qwen3-4B-Base

## Model Sources

- **Base model:** Qwen3-4B-Base
- **Training platform:** Kaggle
- **Frameworks:** PyTorch, PEFT, Unsloth

# Uses

## Direct Use

This LoRA is intended to be merged or loaded on top of Qwen3-4B-Base to improve:

- Logical reasoning
- Step-by-step problem solving
- Analytical and structured responses
- "Thinking-style" outputs for research and experimentation

## Downstream Use

- Merging into a full model for GGUF or standard Hugging Face release
- Further fine-tuning on domain-specific reasoning tasks
- Research on symbolic + neural reasoning hybrids

## Out-of-Scope Use

- Safety-critical decision making
- Medical, legal, or financial advice
- Tasks requiring guaranteed factual correctness

# Bias, Risks, and Limitations

- The model may overproduce reasoning steps, even when they are not strictly required
- Reasoning quality depends heavily on the base model (Qwen3-4B-Base)
- No formal safety fine-tuning was applied beyond the base model
- Possible amplification of biases present in the original training data

## Recommendations

Users should:

- Apply external safety layers if deploying in production
- Evaluate outputs critically, especially for sensitive topics
- Avoid assuming reasoning chains are always correct

# How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the base model in 4-bit (requires bitsandbytes);
# drop quantization_config to load in full precision instead
base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Base",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-4B-Base")

# Attach the LoRA adapter
model = PeftModel.from_pretrained(
    base_model,
    "AxionLab-Co/DogeAI-v2.0-4B-Reasoning-LoRA",
)
```


# Training Details

## Training Data

The LoRA was trained on thinking-oriented datasets (including open-r1/OpenR1-Math-220k), focusing on:

- Chain-of-thought-style reasoning
- Logical explanations
- Multi-step analytical prompts

The datasets were curated and preprocessed manually for quality and consistency.

## Training Procedure

### Preprocessing

- Tokenization using the base Qwen tokenizer
- Filtering of low-quality or malformed reasoning examples
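
The filtering step can be illustrated with a minimal heuristic of the kind described above. This is a sketch only: the actual curation was manual, and the field names (`prompt`, `answer`) and thresholds here are hypothetical.

```python
# Illustrative filter: drop examples too short to contain multi-step reasoning.
def keep_example(example: dict) -> bool:
    """Keep an example only if it has a non-trivial prompt and a multi-step answer."""
    prompt = (example.get("prompt") or "").strip()
    answer = (example.get("answer") or "").strip()
    if len(prompt) < 16 or len(answer) < 32:
        return False  # too short to contain real reasoning
    if answer.count("\n") < 1 and ". " not in answer:
        return False  # single-step answers are unlikely to show chain-of-thought
    return True

examples = [
    {"prompt": "2+2?", "answer": "4"},
    {"prompt": "A train travels 60 km in 45 minutes; find its speed.",
     "answer": "Step 1: 45 minutes is 0.75 h.\nStep 2: 60 / 0.75 = 80 km/h."},
]
kept = [e for e in examples if keep_example(e)]  # only the multi-step example survives
```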

### Training Hyperparameters

- **Training regime:** fp16 mixed precision
- **Fine-tuning method:** LoRA (PEFT)
- **Optimizer:** AdamW
- **Framework:** Unsloth

### Speeds, Sizes, Times

- Training was performed in a Kaggle GPU environment
- The LoRA was kept intentionally lightweight for fast loading and merging

# Evaluation

## Testing Data, Factors & Metrics

### Testing Data

- Internal prompt-based reasoning tests
- Synthetic reasoning benchmarks (qualitative)

### Factors

- Multi-step logic consistency
- Response clarity
- Hallucination tendencies

### Metrics

- Qualitative human evaluation
- Prompt-level comparison against the base model

## Results

In qualitative comparisons, the LoRA showed improvements in reasoning depth and structure over the base model, especially on analytical prompts.

# Environmental Impact

- **Hardware Type:** NVIDIA GPU (Kaggle)
- **Hours used:** A few hours (single-session fine-tuning)
- **Cloud Provider:** Kaggle
- **Compute Region:** Unknown
- **Carbon Emitted:** Not formally measured

# Technical Specifications

## Model Architecture and Objective

- Transformer-based decoder-only architecture
- Objective: enhance reasoning behavior via parameter-efficient fine-tuning

## Compute Infrastructure

### Hardware

- Kaggle-provided NVIDIA GPU

### Software

- PyTorch
- Transformers
- PEFT 0.18.1
- Unsloth

# Citation

If you use this LoRA in research or derivative works, please cite the base model and this repository.

# Model Card Authors

**AxionLab-Co**

# Model Card Contact

For questions, experiments, or collaboration: **AxionLab-Co on Hugging Face**