# 🏍️ Granite-3.1-2b-TwoWheeler
This model is a fine-tuned version of IBM Granite 3.1 2B Instruct, specialized in two-wheeler knowledge (motorcycles, scooters, superbikes, and maintenance).
It was trained with Unsloth on a custom dataset to provide expert advice on bike specifications, comparisons, and troubleshooting.
## 📂 Included Files

This repository contains the full merged model and quantized versions:

| Filename | Type | Description |
|---|---|---|
| `model.safetensors` | Full Model | The unquantized, merged weights. Use this for Python/Transformers training or inference. |
| `granite-3.1-2b-instruct.Q4_K_M.gguf` | GGUF (Q4) | **Recommended.** 4-bit quantized. Fast and efficient (runs on 4 GB+ RAM). |
| `granite-3.1-2b-instruct.F16.gguf` | GGUF (FP16) | High-precision GGUF. Best quality but larger file size. |
## 💻 How to Use (GGUF / llama.cpp / LM Studio)

You can use the `.gguf` files directly in LM Studio, Ollama, or llama.cpp.

**CLI Command (llama.cpp):**

```bash
./llama-cli -m granite-3.1-2b-instruct.Q4_K_M.gguf -p "User: Which bike is best for daily city commute with high mileage in India?\nAssistant:" -cnv
```
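The same GGUF file can also be imported into Ollama through a Modelfile. A minimal sketch (the local file path and the model name you register are your choice, not fixed by this repo):

```
FROM ./granite-3.1-2b-instruct.Q4_K_M.gguf
SYSTEM "You are an expert automotive assistant specializing in two-wheelers."
```

Save this as `Modelfile`, then run `ollama create granite-twowheeler -f Modelfile` followed by `ollama run granite-twowheeler` to chat locally.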
**System Prompt (Recommended):**

```text
You are an expert automotive assistant specializing in two-wheelers. Provide detailed specifications, comparisons, and maintenance advice for bikes and scooters.
```
## 🐍 How to Use (Python / Transformers)

Since the model is merged, you can load it directly without needing LoRA adapters.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Prithwiraj731/Granite-3.1-2b-TwoWheeler"

# Load tokenizer & model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,  # use bfloat16 if your GPU supports it
)

# Format the prompt (Granite chat format)
messages = [
    {"role": "user", "content": "Compare the Royal Enfield Classic 350 vs Honda CB350. Which one has less vibration?"}
]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate (do_sample=True so temperature actually takes effect)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, temperature=0.7, do_sample=True)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
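For reference, `apply_chat_template` renders the messages into Granite's role-tagged prompt format before tokenization. The sketch below mimics that rendering with plain string handling so you can see the shape of the prompt; the exact marker tokens are an assumption based on the Granite 3.x template, so prefer the tokenizer's own template in real code:

```python
# Sketch of the Granite 3.x chat layout (marker tokens are assumptions;
# use tokenizer.apply_chat_template for the authoritative rendering).
def format_granite_prompt(messages, add_generation_prompt=True):
    """Render {role, content} dicts into a Granite-style prompt string."""
    parts = []
    for m in messages:
        parts.append(
            f"<|start_of_role|>{m['role']}<|end_of_role|>{m['content']}<|end_of_text|>\n"
        )
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here
        parts.append("<|start_of_role|>assistant<|end_of_role|>")
    return "".join(parts)

prompt = format_granite_prompt([
    {"role": "system", "content": "You are an expert automotive assistant specializing in two-wheelers."},
    {"role": "user", "content": "Which bike is best for daily city commute?"},
])
print(prompt)
```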
## 🔧 Training Details

- **Base Architecture:** IBM Granite 3.1 (2 billion parameters)
- **Framework:** Unsloth (PyTorch)
- **Quantization:** Q4_K_M & FP16 GGUF
- **Fine-tuning Method:** LoRA, fully merged into 16-bit weights
- **Dataset Focus:** Technical specifications, riding comfort, mileage, and maintenance of two-wheelers.
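"LoRA merge" means the trained low-rank update is folded back into the base weights, so no adapter files are needed at inference time. A toy illustration of that arithmetic (shapes and values here are arbitrary, purely for demonstration):

```python
import torch

# LoRA trains a low-rank update B @ A on top of a frozen weight W.
# Merging folds (alpha / r) * B @ A into W, producing a single weight
# matrix — this is what gets saved to model.safetensors.
d_out, d_in, r, alpha = 8, 8, 2, 16
W = torch.randn(d_out, d_in)   # frozen base weight
A = torch.randn(r, d_in)       # LoRA down-projection
B = torch.randn(d_out, r)      # LoRA up-projection (after training)

W_merged = W + (alpha / r) * (B @ A)

# The merged layer gives the same output as base + adapter applied separately.
x = torch.randn(d_in)
assert torch.allclose(W_merged @ x, W @ x + (alpha / r) * (B @ (A @ x)), atol=1e-5)
```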
Fine-tuned with ❤️ using Unsloth.