Uploaded Model

  • Developed by: Harsha901
  • License: apache-2.0
  • Finetuned from model: unsloth/Qwen3-4B-Instruct-2507

This Qwen3 model was trained ~2× faster using Unsloth and Hugging Face’s TRL library.


📌 Model Overview

Qwen3-4B-Inst-Math-Reasoning-SFT is a supervised fine-tuned (SFT) variant of Qwen3-4B-Instruct, optimized for mathematical reasoning and step-by-step problem solving.

The model is trained to follow instructions precisely while producing clear, logically structured reasoning chains, making it suitable for:

  • Math problem solving
  • Educational assistants
  • Reasoning benchmarks
  • Downstream alignment (DPO / RLHF)

🧠 Key Capabilities

  • Multi-step mathematical reasoning
  • Algebra, arithmetic, and word problems
  • Chain-of-thought style explanations
  • Improved instruction adherence
  • More stable reasoning compared to the base model

🏗️ Model Architecture

  • Architecture: Decoder-only Transformer (Causal LM)
  • Parameters: ~4B
  • Base Model: Qwen3-4B-Instruct (Unsloth optimized)
  • Tokenization: Qwen tokenizer
  • Context Length: Same as base model

📚 Training Data

The model was fine-tuned on a curated dataset consisting of:

  • Instruction-style math prompts
  • Step-by-step mathematical solutions
  • Reasoning-focused explanations

Data was filtered to emphasize:

  • Logical consistency
  • Clear intermediate steps
  • Reduced ambiguity in solutions
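The actual filtering pipeline is not published; as an illustration only, a hypothetical filter of this kind might drop examples whose solutions lack explicit intermediate steps:

```python
def has_clear_steps(solution: str, min_steps: int = 2) -> bool:
    """Hypothetical quality filter: keep solutions that show at least
    `min_steps` intermediate lines before the final answer."""
    lines = [ln.strip() for ln in solution.splitlines() if ln.strip()]
    return len(lines) >= min_steps + 1  # intermediate steps plus a final answer

examples = [
    {"prompt": "Solve 5x - 10 = 15.", "solution": "x = 5"},
    {"prompt": "Solve 5x - 10 = 15.",
     "solution": "5x - 10 = 15\n5x = 25\nx = 5"},
]
# Keep only examples with visible intermediate work.
filtered = [ex for ex in examples if has_clear_steps(ex["solution"])]
print(len(filtered))  # → 1 (the one-line answer is filtered out)
```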

While care was taken to ensure quality, the dataset may still contain noise or biases present in public mathematical corpora.


⚙️ Training Details

  • Fine-tuning Method: Supervised Fine-Tuning (SFT)
  • Frameworks: Hugging Face Transformers + TRL
  • Acceleration: Unsloth (memory-efficient & faster training)
  • Precision: FP16 / BF16 (hardware dependent)
  • Optimizer: AdamW
  • Loss Function: Cross-entropy
  • Batching: Gradient accumulation for memory efficiency
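To illustrate the batching point: gradient accumulation sums gradients from several small micro-batches before taking a single optimizer step, so the effective batch size is `micro_batch_size * accumulation_steps` at the memory cost of one micro-batch. A toy sketch on a one-parameter regression (illustration only, not the actual training code):

```python
def grad(w, batch):
    """Mean gradient of (w*x - y)^2 with respect to w over one micro-batch."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(x, 3.0 * x) for x in range(1, 9)]  # target weight: w = 3
micro_batch_size, accumulation_steps, lr = 2, 4, 0.01
w = 0.0
for _ in range(200):
    acc = 0.0
    for i in range(accumulation_steps):  # accumulate micro-batch gradients
        batch = data[i * micro_batch_size:(i + 1) * micro_batch_size]
        acc += grad(w, batch) / accumulation_steps
    w -= lr * acc  # one optimizer step per accumulation cycle
print(round(w, 3))  # → 3.0
```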

🚀 Usage

Load the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Harsha901/Qwen3-4B-Inst-Math-Reasoning-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)
```

Example Inference

```python
# Instruction-tuned Qwen models expect the chat template rather than a raw prompt.
messages = [
    {"role": "user", "content": "Solve step by step: If 5x − 10 = 15, find x."}
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(
    inputs,
    max_new_tokens=256,
    do_sample=False,  # greedy decoding; temperature has no effect when sampling is off
)

# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

📊 Evaluation

The model was evaluated qualitatively on:

  • Math word problems
  • Algebraic equations
  • Multi-step reasoning tasks

Observed improvements over the base model:

  • Better structured reasoning
  • More consistent intermediate steps
  • Fewer incomplete solutions

Formal benchmark results (e.g., GSM8K, MATH) are planned for future updates.


⚠️ Limitations

  • Not guaranteed to be mathematically correct in all cases
  • Can be verbose due to reasoning-style outputs
  • Not optimized for creative or non-technical writing
  • Performance may degrade on extremely long or ambiguous prompts

🔐 Ethical & Responsible Use

  • Intended for research and educational purposes
  • Outputs should be verified for correctness in critical applications
  • Not suitable for high-stakes decision-making without human oversight

📜 License

Released under the Apache 2.0 License, consistent with the base Qwen3 model.


🙌 Acknowledgements

  • Qwen Team for the base Qwen3 architecture
  • Unsloth for efficient fine-tuning optimizations
  • Hugging Face for Transformers and TRL

✉️ Author

  • Name: Harsha Vardhan Mannem
  • Role: AI / ML Engineer
  • Hugging Face & GitHub: Harsha901


🔮 Future Work

  • Preference tuning with DPO
  • Quantized inference (4-bit / 8-bit)
  • Benchmark-based evaluation
  • Deployment-optimized variants
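Quantized inference is listed above as future work; as a sketch of what 4-bit loading could look like with transformers and bitsandbytes (assuming both are installed and a GPU is available; the NF4 settings shown are common defaults, not settings validated for this model):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Harsha901/Qwen3-4B-Inst-Math-Reasoning-SFT"

# Common NF4 4-bit settings; illustrative, not tuned for this model.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
    bnb_4bit_use_double_quant=True,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```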