Llama-3.2-1B-Instruct-bnb-4bit-lima - Merged Model
Full-precision (16-bit) merged model with LoRA adapters integrated.
Model Details
- Base Model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
- Format: merged_16bit
- Dataset: GAIR/lima
- Size: ~2.5 GB (≈1.2B parameters at 16-bit precision)
- Usage: transformers
Related Models
LoRA Adapters: fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima-lora - Smaller LoRA-only adapters
GGUF Quantized: fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima-GGUF - GGUF format for llama.cpp/Ollama
Prompt Format
This model uses the Llama 3.2 chat template.
Python Usage
Use the tokenizer's apply_chat_template() method. Pass add_generation_prompt=True so the prompt ends with the assistant header that cues the model to respond:

```python
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Your question here"},
]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)
```
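For reference, the Llama 3.x template expands into special-token headers roughly as sketched below. This is a hand-written approximation for illustration; the authoritative template ships with the tokenizer, so verify against `apply_chat_template(..., tokenize=False)` before relying on the exact string.

```python
def render_llama3_prompt(messages):
    """Sketch of the Llama 3.x chat layout (assumed, not the bundled template)."""
    parts = ["<|begin_of_text|>"]
    for m in messages:
        # Each turn: role header, blank line, content, end-of-turn token.
        parts.append(
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n{m['content']}<|eot_id|>"
        )
    # Trailing assistant header is what add_generation_prompt=True appends:
    # it cues the model to generate its reply.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = render_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Your question here"},
])
```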
Training Details
- LoRA Rank: 64
- Training Steps: 480
- Training Loss: 1.1123
- Max Seq Length: 2048
- Training Scope: 1,278 samples (3 epochs, full dataset)
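These figures are mutually consistent with an effective batch size of about 8. This is a back-of-envelope check derived from the numbers above, not a stated configuration value:

```python
samples, epochs, steps = 1278, 3, 480

# Total training examples seen across all epochs.
total_examples = samples * epochs            # 3834

# Implied effective batch size (per-device batch x gradient accumulation).
effective_batch = total_examples / steps     # 7.9875 -> ~8
print(round(effective_batch))                # -> 8
```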
For complete training configuration, see the LoRA adapters repository/directory.
Benchmark Results
Evaluated: 2025-11-24 03:10
Comparison: Fine-tuned vs Base model
HuggingFace Transformers (16-bit merged model)
IFEval (Instruction Following)
| Model | Strict Prompt | Strict Inst | Loose Prompt | Loose Inst |
|---|---|---|---|---|
| Base | 0.4399 | 0.5731 | 0.4787 | 0.6067 |
| Fine-tuned | 0.3050 | 0.4376 | 0.3327 | 0.4700 |
| Δ | ↓ -0.1349 | ↓ -0.1355 | ↓ -0.1460 | ↓ -0.1367 |
Summary
| Benchmark | What It Tests | Base | Fine-tuned | Improvement |
|---|---|---|---|---|
| IFEval | Tests ability to follow specific instructions | 43.99% | 30.50% | ↓ -13.49% (-30.7%) |
| GSM8K | Tests math reasoning and chain-of-thought | - | - | - |
| HellaSwag | Tests real-world knowledge and common sense | - | - | - |
| MMLU | Tests broad knowledge retention (detects catastrophic forgetting) | - | - | - |
| TruthfulQA | Tests tendency to generate truthful answers | - | - | - |
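The Improvement column reports both an absolute delta (in percentage points) and a relative change. For the IFEval strict-prompt score, the arithmetic works out as:

```python
base, tuned = 0.4399, 0.3050

# Absolute change in accuracy (the -13.49 points in the table).
absolute = tuned - base

# Relative change versus the base score (the -30.7% in parentheses).
relative = absolute / base

print(f"{absolute:+.4f}", f"{relative:+.1%}")  # -> -0.1349 -30.7%
```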
Usage
With Transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "./outputs/Llama-3.2-1B-Instruct-bnb-4bit-lima/merged_16bit",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "./outputs/Llama-3.2-1B-Instruct-bnb-4bit-lima/merged_16bit"
)

messages = [{"role": "user", "content": "Your question here"}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
License
Based on unsloth/Llama-3.2-1B-Instruct-bnb-4bit and trained on GAIR/lima. Please refer to the original model and dataset licenses.
Credits
Trained by: Farhan Syah
Training pipeline:
- unsloth-finetuning by @farhan-syah
- Unsloth - 2x faster LLM fine-tuning
Base components:
- Base model: unsloth/Llama-3.2-1B-Instruct-bnb-4bit
- Training dataset: GAIR/lima by GAIR
Model tree for fs90/Llama-3.2-1B-Instruct-bnb-4bit-lima
- Base model: meta-llama/Llama-3.2-1B-Instruct
- Quantized: unsloth/Llama-3.2-1B-Instruct-bnb-4bit