Uploaded Model
- Developed by: Harsha901
- License: apache-2.0
- Finetuned from model: unsloth/Qwen3-4B-Instruct-2507
This Qwen3 model was trained ~2× faster using Unsloth and Hugging Face’s TRL library.
📌 Model Overview
Qwen3-4B-Inst-Math-Reasoning-SFT is a supervised fine-tuned (SFT) variant of Qwen3-4B-Instruct-2507, optimized for mathematical reasoning and step-by-step problem solving.
The model is trained to follow instructions precisely while producing clear, logically structured reasoning chains, making it suitable for:
- Math problem solving
- Educational assistants
- Reasoning benchmarks
- Downstream alignment (DPO / RLHF)
🧠 Key Capabilities
- Multi-step mathematical reasoning
- Algebra, arithmetic, and word problems
- Chain-of-thought style explanations
- Improved instruction adherence
- More stable reasoning compared to the base model
🏗️ Model Architecture
- Architecture: Decoder-only Transformer (Causal LM)
- Parameters: ~4B
- Base Model: Qwen3-4B-Instruct (Unsloth optimized)
- Tokenization: Qwen tokenizer
- Context Length: Same as base model
📚 Training Data
The model was fine-tuned on a curated dataset consisting of:
- Instruction-style math prompts
- Step-by-step mathematical solutions
- Reasoning-focused explanations
Data was filtered to emphasize:
- Logical consistency
- Clear intermediate steps
- Reduced ambiguity in solutions
While care was taken to ensure quality, the dataset may still contain noise or biases present in public mathematical corpora.
⚙️ Training Details
- Fine-tuning Method: Supervised Fine-Tuning (SFT)
- Frameworks: Hugging Face Transformers + TRL
- Acceleration: Unsloth (memory-efficient & faster training)
- Precision: FP16 / BF16 (hardware dependent)
- Optimizer: AdamW
- Loss Function: Cross-entropy
- Batching: Gradient accumulation for memory efficiency
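The gradient-accumulation setup above can be illustrated with simple arithmetic. The batch sizes below are illustrative placeholders, not the actual training configuration:

```python
# Gradient accumulation trades wall-clock time for memory: small micro-batches
# are run forward/backward, and the optimizer steps only once every N of them.
# All values here are illustrative, not the real training hyperparameters.
per_device_batch_size = 2        # micro-batch that fits in GPU memory
gradient_accumulation_steps = 8  # micro-batches accumulated per optimizer step
num_devices = 1

# The optimizer effectively sees gradients averaged over this many examples:
effective_batch_size = per_device_batch_size * gradient_accumulation_steps * num_devices
print(effective_batch_size)  # → 16
```

This is why gradient accumulation lets a memory-constrained setup match the optimization dynamics of a much larger batch.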
🚀 Usage
Load the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Harsha901/Qwen3-4B-Inst-Math-Reasoning-SFT"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)
```
Example Inference

```python
# Instruct models expect the chat template, not a raw prompt string.
messages = [
    {"role": "user", "content": "Solve step by step: If 5x − 10 = 15, find x."}
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# Greedy decoding; temperature has no effect when do_sample=False, so it is omitted.
outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=False,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
📊 Evaluation
The model was evaluated qualitatively on:
- Math word problems
- Algebraic equations
- Multi-step reasoning tasks
Observed improvements vs base model:
- Better structured reasoning
- More consistent intermediate steps
- Fewer incomplete solutions
Formal benchmark results (e.g., GSM8K, MATH) are planned for future updates.
⚠️ Limitations
- Not guaranteed to be mathematically correct in all cases
- Can be verbose due to reasoning-style outputs
- Not optimized for creative or non-technical writing
- Performance may degrade on extremely long or ambiguous prompts
🔐 Ethical & Responsible Use
- Intended for research and educational purposes
- Outputs should be verified for correctness in critical applications
- Not suitable for high-stakes decision-making without human oversight
📜 License
Released under the Apache 2.0 License, consistent with the base Qwen3 model.
🙌 Acknowledgements
- Qwen Team for the base Qwen3 architecture
- Unsloth for efficient fine-tuning optimizations
- Hugging Face for Transformers and TRL
✉️ Author
Harsha Vardhan Mannem
AI / ML Engineer
Hugging Face & GitHub: Harsha901
🔮 Future Work
- Preference tuning with DPO
- Quantized inference (4-bit / 8-bit)
- Benchmark-based evaluation
- Deployment-optimized variants