+---
+license: apache-2.0
+language:
+- en
+base_model:
+- ibm-granite/granite-3.1-2b-instruct
+tags:
+- unsloth
+- text-generation
+- granite
+- two-wheeler
+- bikes
+- motorcycles
+- scooters
+- automotive
+- gguf
+- safetensors
+---
+# 🏍️ Granite-3.1-2b-TwoWheeler
+This model is a fine-tuned version of **[IBM Granite 3.1 2B Instruct](https://huggingface.co/ibm-granite/granite-3.1-2b-instruct)**, specialized in **Two-Wheeler** knowledge (Motorcycles, Scooters, Superbikes, and Maintenance).
+It was trained using **[Unsloth](https://github.com/unslothai/unsloth)** on a custom dataset to provide expert advice on bike specifications, comparisons, and troubleshooting.
+## 📂 Included Files
+This repository contains the full merged model and quantized versions:
+| Filename | Type | Description |
+|:--- |:--- |:--- |
+| `model.safetensors` | **Full Model** | The unquantized, merged weights. Use this for Python/Transformers training or inference. |
+| `granite-3.1-2b-instruct.Q4_K_M.gguf` | **GGUF (Q4)** | **Recommended.** 4-bit quantized. Fast and efficient (runs on 4 GB+ RAM). |
+| `granite-3.1-2b-instruct.F16.gguf` | **GGUF (FP16)** | 16-bit, unquantized GGUF. Best quality, but a larger file. |
+---
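To make the size trade-off in the table above concrete, here is a rough back-of-the-envelope sketch. It assumes the 2 billion parameter count quoted in this card and an average of about 4.85 bits per weight for Q4_K_M; that figure is an approximation and not a measured value for these files. Actual file sizes and RAM use will differ, since runtime overhead and the KV cache are not included.

```python
# Rough GGUF size estimate: parameters x bits-per-weight / 8 bytes.
PARAMS = 2_000_000_000  # "2 Billion Parameters", as stated in this card

# Assumed average bits per weight. Q4_K_M mixes block formats, so 4.85 is
# only an approximation; F16 stores every weight in 16 bits.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "F16": 16.0}


def estimated_size_gb(params: int, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes), excluding runtime overhead."""
    return params * bits_per_weight / 8 / 1e9


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name}: ~{estimated_size_gb(PARAMS, bits):.1f} GB")
```

Under these assumptions the Q4 file comes out at roughly a third of the F16 size, which is consistent with the recommendation to use it on machines with limited RAM.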
+## 💻 How to Use (GGUF / llama.cpp / LM Studio)
+You can use the `.gguf` files directly in **LM Studio**, **Ollama**, or **llama.cpp**.
+**CLI Command (llama.cpp):**
+```bash
+./llama-cli -m granite-3.1-2b-instruct.Q4_K_M.gguf -p "User: Which bike is best for daily city commute with high mileage in India?\nAssistant:" -cnv
+```
+**System Prompt (Recommended):**
+> You are an expert automotive assistant specializing in two-wheelers. Provide detailed specifications, comparisons, and maintenance advice for bikes and scooters.
+## 🐍 How to Use (Python / Transformers)
+Since the LoRA adapters are already merged into the weights, you can load the model directly without any adapter files.
+```python
+from transformers import AutoModelForCausalLM, AutoTokenizer
+import torch
+
+model_name = "Prithwiraj731/Granite-3.1-2b-TwoWheeler"
+
+# Load the tokenizer and the merged model
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model = AutoModelForCausalLM.from_pretrained(
+    model_name,
+    device_map="auto",
+    torch_dtype=torch.float16,  # Use torch.bfloat16 if your GPU supports it
+)
+
+# Format the prompt with the Granite chat template
+messages = [
+    {"role": "user", "content": "Compare the Royal Enfield Classic 350 vs Honda CB350. Which one has better vibrations?"}
+]
+input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+
+# Generate. do_sample=True is needed for temperature to take effect.
+# Inputs go to model.device because device_map="auto" chooses the device.
+inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
+outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
+print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+```
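The recommended system prompt can also be used with the Transformers workflow by placing it first in the message list. The sketch below is one possible way to do that; `build_messages` and `SYSTEM_PROMPT` are hypothetical helper names, not part of this repository, and it assumes the tokenizer's chat template accepts a `system` role.

```python
# Recommended system prompt, copied from this model card.
SYSTEM_PROMPT = (
    "You are an expert automotive assistant specializing in two-wheelers. "
    "Provide detailed specifications, comparisons, and maintenance advice "
    "for bikes and scooters."
)


def build_messages(question: str, system_prompt: str = SYSTEM_PROMPT) -> list[dict[str, str]]:
    """Return a chat message list: the system prompt first, then the user turn."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": question.strip()},
    ]


messages = build_messages("How often should I lubricate the chain on a commuter bike?")
print(messages)
```

The resulting list can be passed to `tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)` in place of the single-turn `messages` used earlier.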
+## 🔧 Training Details
+- **Base Architecture:** IBM Granite 3.1 (2 billion parameters)
+- **Framework:** Unsloth (PyTorch)
+- **Quantization:** Q4_K_M and FP16 GGUF
+- **Fine-tuning Method:** LoRA, with the adapters fully merged into the 16-bit weights
+- **Dataset Focus:** Technical specifications, riding comfort, mileage, and maintenance of two-wheelers.
+
+Finetuned with ❤️ using Unsloth.