Harsha901 committed on
Commit ed23669 · verified · 1 Parent(s): 93e4f62

Update README.md

Updated Model card

Files changed (1): README.md (+177 −5)
---
tags:
- unsloth
- qwen3
- trl
- math-reasoning
- instruction-tuned
- supervised-finetuning
- chain-of-thought
- reasoning
- mathematics
- causal-lm
license: apache-2.0
language:
- en
pipeline_tag: text-generation
---

# Uploaded Model

- **Developed by:** Harsha901
- **License:** apache-2.0
- **Finetuned from model:** unsloth/Qwen3-4B-Instruct-2507

This Qwen3 model was trained **~2× faster** using [Unsloth](https://github.com/unslothai/unsloth) and Hugging Face's **TRL** library.

[<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)

---

## 📌 Model Overview

**Qwen3-4B-Inst-Math-Reasoning-SFT** is a **supervised fine-tuned (SFT)** variant of **Qwen3-4B-Instruct**, optimized for **mathematical reasoning and step-by-step problem solving**.

The model is trained to follow instructions precisely while producing **clear, logically structured reasoning chains**, making it suitable for:

- Math problem solving
- Educational assistants
- Reasoning benchmarks
- Downstream alignment (DPO / RLHF)

---

## 🧠 Key Capabilities

- Multi-step mathematical reasoning
- Algebra, arithmetic, and word problems
- Chain-of-thought style explanations
- Improved instruction adherence
- More stable reasoning compared to the base model

---

## 🏗️ Model Architecture

- **Architecture:** Decoder-only Transformer (Causal LM)
- **Parameters:** ~4B
- **Base Model:** Qwen3-4B-Instruct (Unsloth optimized)
- **Tokenization:** Qwen tokenizer
- **Context Length:** Same as base model

---

## 📚 Training Data

The model was fine-tuned on a curated dataset consisting of:

- Instruction-style math prompts
- Step-by-step mathematical solutions
- Reasoning-focused explanations

Data was filtered to emphasize:

- Logical consistency
- Clear intermediate steps
- Reduced ambiguity in solutions

> While care was taken to ensure quality, the dataset may still contain noise or biases present in public mathematical corpora.

---

## ⚙️ Training Details

- **Fine-tuning Method:** Supervised Fine-Tuning (SFT)
- **Frameworks:** Hugging Face Transformers + TRL
- **Acceleration:** Unsloth (memory-efficient & faster training)
- **Precision:** FP16 / BF16 (hardware dependent)
- **Optimizer:** AdamW
- **Loss Function:** Cross-entropy
- **Batching:** Gradient accumulation for memory efficiency
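
In Hugging Face terms, the bullets above map onto `TrainingArguments` / `trl.SFTConfig` fields. The sketch below is illustrative only; the exact hyperparameters of this run are not published in this card:

```python
# Illustrative SFT settings mirroring the Training Details bullets.
# NOTE: the concrete values are assumptions, not the published recipe.
training_args = {
    "optim": "adamw_torch",            # Optimizer: AdamW
    "bf16": True,                      # Precision: BF16 (use fp16=True on older GPUs)
    "per_device_train_batch_size": 2,  # small micro-batches...
    "gradient_accumulation_steps": 8,  # ...accumulated into a larger effective batch
    "learning_rate": 2e-4,
    "num_train_epochs": 1,
}

# Gradient accumulation trades extra steps for memory: the optimizer
# updates on an effective batch of micro_batch * accumulation_steps.
effective_batch = (training_args["per_device_train_batch_size"]
                   * training_args["gradient_accumulation_steps"])
print(effective_batch)  # 16
```

These keys should be accepted directly by `trl.SFTConfig(**training_args)`, since `SFTConfig` extends `transformers.TrainingArguments`.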

---

## 🚀 Usage

### Load the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Harsha901/Qwen3-4B-Inst-Math-Reasoning-SFT"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype="auto"
)
```

### Example Inference

```python
prompt = "Solve step by step: If 5x − 10 = 15, find x."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Greedy decoding for reproducible step-by-step answers;
# set do_sample=True if you want temperature to take effect.
outputs = model.generate(
    **inputs,
    max_new_tokens=256,
    do_sample=False
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

---

## 📊 Evaluation

The model was evaluated qualitatively on:

* Math word problems
* Algebraic equations
* Multi-step reasoning tasks

**Observed improvements vs. base model:**

* Better-structured reasoning
* More consistent intermediate steps
* Fewer incomplete solutions

Formal benchmark results (e.g., GSM8K, MATH) are planned for future updates.
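
Exact-match scoring on GSM8K-style benchmarks usually keys on the final number in the model's reasoning chain. A minimal extraction helper, as a sketch (this is not part of the model's released code):

```python
import re

def extract_final_number(text):
    """Return the last number in a model response, for exact-match scoring."""
    # Strip thousands separators, then grab all signed ints/decimals.
    matches = re.findall(r"-?\d+(?:\.\d+)?", text.replace(",", ""))
    return matches[-1] if matches else None

response = "5x - 10 = 15, so 5x = 25 and x = 25 / 5 = 5."
print(extract_final_number(response))  # prints "5"
```

The predicted value can then be compared against the reference answer after the same normalization.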

---
149
+
150
+ ## โš ๏ธ Limitations
151
+
152
+ * Not guaranteed to be mathematically correct in all cases
153
+ * Can be verbose due to reasoning-style outputs
154
+ * Not optimized for creative or non-technical writing
155
+ * Performance may degrade on extremely long or ambiguous prompts
156
+
157
+ ---
158
+
159
+ ## ๐Ÿ” Ethical & Responsible Use
160
+
161
+ * Intended for **research and educational purposes**
162
+ * Outputs should be verified for correctness in critical applications
163
+ * Not suitable for high-stakes decision-making without human oversight
164
+
165
+ ---
166
+
167
+ ## ๐Ÿ“œ License
168
+
169
+ Released under the **Apache 2.0 License**, consistent with the base Qwen3 model.
170
+
171
+ ---
172
+
173
+ ## ๐Ÿ™Œ Acknowledgements
174
+
175
+ * **Qwen Team** for the base Qwen3 architecture
176
+ * **Unsloth** for efficient fine-tuning optimizations
177
+ * **Hugging Face** for Transformers and TRL
178
+
179
+ ---
180
+
181
+ ## โœ‰๏ธ Author
182
+
183
+ **Harsha Vardhan Mannem**
184
+ AI / ML Engineer
185
+ Hugging Face & GitHub: **Harsha901**
186
+
187
+ ---
188
+
189
+ ## ๐Ÿ”ฎ Future Work
190
+
191
+ * Preference tuning with DPO
192
+ * Quantized inference (4-bit / 8-bit)
193
+ * Benchmark-based evaluation
194
+ * Deployment-optimized variants
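
For the quantized-inference item, 4-bit loading in `transformers` typically goes through `BitsAndBytesConfig`. A sketch, assuming a CUDA GPU with `bitsandbytes` installed (this repo publishes no quantized weights yet):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit quantization config; roughly quarters weight memory.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Harsha901/Qwen3-4B-Inst-Math-Reasoning-SFT",
    quantization_config=bnb_config,
    device_map="auto",
)
```

Expect a small quality drop versus FP16/BF16 inference; quantized generations should be re-checked on math prompts before relying on them.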