🏍️ Granite-3.1-2b-TwoWheeler

This model is a fine-tuned version of IBM Granite 3.1 2B Instruct, specialized in Two-Wheeler knowledge (Motorcycles, Scooters, Superbikes, and Maintenance).

It was trained using Unsloth on a custom dataset to provide expert advice on bike specifications, comparisons, and troubleshooting.

📂 Included Files

This repository contains the full merged model and quantized versions:

| Filename | Type | Description |
|---|---|---|
| `model.safetensors` | Full Model | The unquantized, merged weights. Use this for Python/Transformers training or inference. |
| `granite-3.1-2b-instruct.Q4_K_M.gguf` | GGUF (Q4_K_M) | **Recommended.** 4-bit quantized. Fast and efficient (runs on 4 GB+ RAM). |
| `granite-3.1-2b-instruct.F16.gguf` | GGUF (FP16) | High-precision 16-bit GGUF. Best quality, but a much larger file. |
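If you only need one of the GGUF files, you can download it individually with `huggingface-cli` instead of cloning the whole repository (a sketch, assuming `huggingface_hub` is installed locally):

```
pip install -U huggingface_hub
huggingface-cli download Prithwiraj731/Granite-3.1-2b-TwoWheeler \
  granite-3.1-2b-instruct.Q4_K_M.gguf --local-dir .
```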

💻 How to Use (GGUF / Llama.cpp / LM Studio)

You can use the .gguf files directly in LM Studio, Ollama, or llama.cpp.

CLI Command (llama.cpp):

```
./llama-cli -m granite-3.1-2b-instruct.Q4_K_M.gguf -cnv \
  -p "You are an expert automotive assistant specializing in two-wheelers."
```

In conversation mode (`-cnv`), `-p` sets the system prompt; type your question at the interactive prompt, e.g. "Which bike is best for a daily city commute with high mileage in India?"

System Prompt (Recommended):

```
You are an expert automotive assistant specializing in two-wheelers. Provide detailed specifications, comparisons, and maintenance advice for bikes and scooters.
```
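To use the GGUF with Ollama, you can wrap it in a Modelfile. This is a minimal sketch; the model name `granite-twowheeler` and the parameter value are illustrative, not part of this repository:

```
FROM ./granite-3.1-2b-instruct.Q4_K_M.gguf

SYSTEM """You are an expert automotive assistant specializing in two-wheelers. Provide detailed specifications, comparisons, and maintenance advice for bikes and scooters."""

PARAMETER temperature 0.7
```

Then build and run it:

```
ollama create granite-twowheeler -f Modelfile
ollama run granite-twowheeler "Which bike is best for a daily city commute with high mileage in India?"
```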

🐍 How to Use (Python / Transformers)

Since the model is merged, you can load it directly without needing LoRA adapters.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "Prithwiraj731/Granite-3.1-2b-TwoWheeler"

# Load tokenizer & model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,  # use torch.bfloat16 if your GPU supports it
)

# Format the prompt (Granite chat format)
messages = [
    {"role": "user", "content": "Compare the Royal Enfield Classic 350 vs Honda CB350. Which one has less vibration?"}
]
input_text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Generate (do_sample=True so the temperature setting takes effect;
# model.device works whether the model landed on GPU or CPU)
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.7)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
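For reference, here is a minimal sketch of the prompt string that `apply_chat_template` renders for Granite-style models. The `<|start_of_role|>` / `<|end_of_role|>` / `<|end_of_text|>` markers are an assumption based on the Granite 3.x chat template; verify the exact strings against this repo's `tokenizer_config.json` before relying on them:

```python
def format_granite_prompt(messages, add_generation_prompt=True):
    """Render a list of {role, content} dicts into a Granite-style prompt string.

    Assumed role markers, not read from the tokenizer itself.
    """
    parts = []
    for m in messages:
        parts.append(
            f"<|start_of_role|>{m['role']}<|end_of_role|>{m['content']}<|end_of_text|>\n"
        )
    if add_generation_prompt:
        # Open the assistant turn so generation continues from here
        parts.append("<|start_of_role|>assistant<|end_of_role|>")
    return "".join(parts)

prompt = format_granite_prompt(
    [{"role": "user", "content": "Best 125cc scooter for city commuting?"}]
)
print(prompt)
```

This is handy when driving the GGUF through llama.cpp's raw-prompt mode, where no chat template is applied for you.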

🔧 Training Details

- **Base Architecture:** IBM Granite 3.1 (2 billion parameters)
- **Framework:** Unsloth (PyTorch)
- **Quantization:** Q4_K_M & FP16 GGUF
- **Fine-tuning Method:** Full LoRA merge (16-bit)
- **Dataset Focus:** Technical specifications, riding comfort, mileage, and maintenance of two-wheelers.

Finetuned with ❤️ using Unsloth.