Uploaded Model
- Developed by: Harsha901
- License: apache-2.0
- Finetuned from model: unsloth/Qwen3-4B-Instruct-2507
This Qwen3 model was trained ~2× faster using Unsloth and Hugging Face’s TRL library.
📌 Model Overview
Qwen3-4B-Inst-Math-Reasoning-SFT is a supervised fine-tuned (SFT) variant of Qwen3-4B-Instruct-2507, optimized for mathematical reasoning and step-by-step problem solving.
The model is trained to follow instructions precisely while producing clear, logically structured reasoning chains, making it suitable for:
- Math problem solving
- Educational assistants
- Reasoning benchmarks
- Downstream alignment (DPO / RLHF)
🧠 Key Capabilities
- Multi-step mathematical reasoning
- Algebra, arithmetic, and word problems
- Chain-of-thought style explanations
- Improved instruction adherence
- More stable reasoning than the base model (based on qualitative comparison)
🏗️ Model Architecture
- Architecture: Decoder-only Transformer (Causal LM)
- Parameters: ~4B
- Base Model: unsloth/Qwen3-4B-Instruct-2507 (Unsloth's distribution of Qwen/Qwen3-4B-Instruct-2507)
- Tokenization: Qwen tokenizer
- Context Length: Same as base model
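As a rough illustration of what the ~4B parameter count means when loading the model, the weight footprint can be estimated from the parameter count and the number of bits stored per parameter. The sketch below is back-of-the-envelope arithmetic, not a measurement: treating "~4B" as exactly 4e9 parameters is an assumption, and the estimate ignores activations, the KV cache, and framework overhead.

```python
def weight_memory_gib(num_parameters: float, bits_per_parameter: int) -> float:
    """Estimate the memory needed for the model weights alone, in GiB."""
    total_bytes = num_parameters * bits_per_parameter / 8
    return total_bytes / 2**30


# Assumption: "~4B" is approximated as exactly 4e9 parameters.
APPROX_PARAMETERS = 4e9

for label, bits in [("FP16 / BF16", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{label}: ~{weight_memory_gib(APPROX_PARAMETERS, bits):.2f} GiB")
```

Real memory use during generation will be higher than these figures, so they are best read as a lower bound when choosing hardware.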
📚 Training Data
The model was fine-tuned on a curated dataset consisting of:
- Instruction-style math prompts
- Step-by-step mathematical solutions
- Reasoning-focused explanations
Data was filtered to emphasize:
- Logical consistency
- Clear intermediate steps
- Reduced ambiguity in solutions
While care was taken to ensure quality, the dataset may still contain noise or biases present in public mathematical corpora.
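To make the data description more concrete, here is a hypothetical sketch of how one instruction-style math prompt and its step-by-step solution could be stored in the conversational "messages" format that TRL's SFTTrainer accepts. The helper name `to_chat_example`, the "Step N:" and "Final answer:" layout, and the sample problem are illustrative assumptions. The actual dataset schema is not documented in this card.

```python
def to_chat_example(problem, solution_steps, final_answer):
    """Build one conversational training record from a problem and its worked solution."""
    numbered = [f"Step {i}: {step}" for i, step in enumerate(solution_steps, start=1)]
    assistant_text = "\n".join(numbered + [f"Final answer: {final_answer}"])
    return {
        "messages": [
            {"role": "user", "content": f"Solve step by step: {problem}"},
            {"role": "assistant", "content": assistant_text},
        ]
    }


# Hypothetical record, not taken from the real training set.
example = to_chat_example(
    problem="If 5x - 10 = 15, find x.",
    solution_steps=[
        "Add 10 to both sides: 5x = 25.",
        "Divide both sides by 5: x = 5.",
    ],
    final_answer="x = 5",
)
print(example["messages"][1]["content"])
```

Keeping each intermediate step on its own line is one simple way to get the "clear intermediate steps" property described above, though other layouts would work equally well.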
⚙️ Training Details
- Fine-tuning Method: Supervised Fine-Tuning (SFT)
- Frameworks: Hugging Face Transformers + TRL
- Acceleration: Unsloth (memory-efficient & faster training)
- Precision: FP16 / BF16 (hardware dependent)
- Optimizer: AdamW
- Loss Function: Cross-entropy
- Batching: Gradient accumulation for memory efficiency
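The loss and batching entries above can be illustrated with a small pure-Python sketch. It computes token-level cross-entropy and shows that averaging over equally sized micro-batches, each scaled by the number of accumulation steps, gives the same loss as one large batch. The toy logits and the helper names are made up for illustration. The real training run used TRL and Unsloth, whose internals (token masking, padding, mixed precision) are not reproduced here.

```python
import math


def token_cross_entropy(logits, target_index):
    """Cross-entropy for one token: -log(softmax(logits)[target_index])."""
    max_logit = max(logits)  # subtract the max for numerical stability
    log_sum_exp = max_logit + math.log(sum(math.exp(x - max_logit) for x in logits))
    return log_sum_exp - logits[target_index]


def mean_loss(batch):
    """Average cross-entropy over a batch of (logits, target_index) pairs."""
    return sum(token_cross_entropy(logits, target) for logits, target in batch) / len(batch)


def accumulated_loss(micro_batches):
    """Sum the micro-batch losses, each scaled by 1 / number of accumulation steps."""
    return sum(mean_loss(mb) / len(micro_batches) for mb in micro_batches)


# Toy data: four tokens over a three-token vocabulary (values are made up).
full_batch = [
    ([2.0, 0.5, -1.0], 0),
    ([0.1, 0.2, 0.3], 2),
    ([-0.5, 1.5, 0.0], 1),
    ([1.0, 1.0, 1.0], 0),
]
micro_batches = [full_batch[:2], full_batch[2:]]

print(mean_loss(full_batch))
print(accumulated_loss(micro_batches))
```

The two printed values agree up to floating-point rounding, which is why gradient accumulation can stand in for a larger batch when memory is limited. The equivalence relies on the micro-batches being the same size.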
🚀 Usage
Load the Model
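from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Harsha901/Qwen3-4B-Inst-Math-Reasoning-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" places the weights on the available device(s);
# torch_dtype="auto" uses the precision stored in the checkpoint.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto",
)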
Example Inference
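messages = [{"role": "user", "content": "Solve step by step: If 5x − 10 = 15, find x."}]
# Apply the chat template, since this is an instruction-tuned model
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
# Greedy decoding; temperature is omitted because it only applies when do_sample=True
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False,
)
# Decode only the newly generated tokens, not the prompt
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))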
📊 Evaluation
The model was evaluated qualitatively on:
- Math word problems
- Algebraic equations
- Multi-step reasoning tasks
Observed improvements vs base model:
- Better structured reasoning
- More consistent intermediate steps
- Fewer incomplete solutions
Formal benchmark results (e.g., GSM8K, MATH) are planned for future updates.
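Until formal numbers are available, a simple final-answer check gives a rough idea of how a benchmark-style score could be computed. The sketch below is a naive illustration, not the evaluation used for this model: it takes the last number in each response as the predicted answer and compares it with a numeric reference. The function names and sample responses are hypothetical, and real harnesses for GSM8K or MATH use more careful answer extraction.

```python
import re

# Matches integers and decimals, optionally preceded by an ASCII minus sign.
NUMBER_PATTERN = re.compile(r"-?\d+(?:\.\d+)?")


def extract_final_number(text):
    """Return the last number in the text as a float, or None if there is none."""
    matches = NUMBER_PATTERN.findall(text.replace(",", ""))
    return float(matches[-1]) if matches else None


def exact_match_accuracy(responses, references):
    """Fraction of responses whose final number equals the reference value."""
    if len(responses) != len(references):
        raise ValueError("responses and references must have the same length")
    if not references:
        return 0.0
    correct = sum(
        1
        for response, reference in zip(responses, references)
        if extract_final_number(response) == float(reference)
    )
    return correct / len(references)


# Hypothetical model responses and reference answers.
responses = [
    "5x = 25, so x = 5",
    "The total is 1,200 apples.",
    "I am not sure.",
]
references = [5, 1200, 7]
print(exact_match_accuracy(responses, references))
```

A check like this only scores the final answer, so it says nothing about whether the intermediate reasoning steps are valid.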
⚠️ Limitations
- Not guaranteed to be mathematically correct in all cases
- Can be verbose due to reasoning-style outputs
- Not optimized for creative or non-technical writing
- Performance may degrade on extremely long or ambiguous prompts
🔐 Ethical & Responsible Use
- Intended for research and educational purposes
- Outputs should be verified for correctness in critical applications
- Not suitable for high-stakes decision-making without human oversight
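One lightweight way to verify an output is to substitute the model's answer back into the original problem. The sketch below does this for linear equations of the form a·x + b = c, using exact rational arithmetic to avoid floating-point surprises. The function name is hypothetical and the check covers only this one equation shape. Other problem types need their own verification.

```python
from fractions import Fraction


def satisfies_linear_equation(a, b, c, candidate):
    """Return True if candidate solves a*x + b = c exactly."""
    return Fraction(a) * Fraction(candidate) + Fraction(b) == Fraction(c)


# The example from the Usage section: 5x - 10 = 15
print(satisfies_linear_equation(5, -10, 15, 5))  # True
print(satisfies_linear_equation(5, -10, 15, 4))  # False
```

Fractional answers can be passed as strings such as "5/2", which `Fraction` parses exactly.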
📜 License
Released under the Apache 2.0 License, consistent with the base Qwen3 model.
🙌 Acknowledgements
- Qwen Team for the base Qwen3 architecture
- Unsloth for efficient fine-tuning optimizations
- Hugging Face for Transformers and TRL
✉️ Author
Harsha Vardhan Mannem
AI / ML Engineer
Hugging Face & GitHub: Harsha901
🔮 Future Work
- Preference tuning with DPO
- Quantized inference (4-bit / 8-bit)
- Benchmark-based evaluation
- Deployment-optimized variants