---
license: unknown
language:
- en
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
---

# Mistral LoRA - BitNet 1.58 Q&A Expert

This is a LoRA fine-tuned adapter for [`mistralai/Mistral-7B-Instruct-v0.2`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), trained on a custom Q&A dataset derived from the paper **"The Era of 1-bit LLMs" (BitNet b1.58)**.

## Model Details

- Base model: `mistralai/Mistral-7B-Instruct-v0.2`
- LoRA fine-tuning: 4-bit quantization (bitsandbytes) + PEFT (see the configuration sketch at the end of this card)
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`
- Rank: 8, Alpha: 16, Dropout: 0.05

## Dataset

Q&A pairs were auto-generated from the BitNet b1.58 paper. Each instruction asked about architectural and performance details of 1-bit LLMs.

## Before vs. After Comparison

| Question | Base Model Output | Fine-tuned Model Output |
|----------|-------------------|-------------------------|
| What is a 1-bit LLM? | ❌ Talks about hardware cache lines | ✅ Correctly defines a quantized LLM |
| How does BitNet b1.58 differ from standard 1-bit models? | ❌ Talks about legacy networking | ✅ Describes ternary weights (-1, 0, 1) |
| At what size does it outperform FP16? | ❌ Refers to the wrong paper | ✅ Refers to the paper's performance table |
| Why is it more memory/latency efficient? | ❌ Talks about DHT routing | ✅ Highlights the absence of floating-point multiplication |
| Edge deployment and hardware design? | ❌ Talks about old protocols | ✅ References the potential of new 1-bit hardware |

## Usage

```python
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the base model and attach the fine-tuned LoRA adapter
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "ogflash/mistral-lora-qa-1bit")
tokenizer = AutoTokenizer.from_pretrained("ogflash/mistral-lora-qa-1bit")

# Prompt follows the instruction/response format used during fine-tuning
prompt = "### Instruction:\nwhat is 1 bit llm\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
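
## Fine-tuning Configuration (Sketch)

The training script itself is not included in this card. Below is a minimal sketch of how the quantization and LoRA settings listed under Model Details could be set up with `transformers` and `peft`; the `nf4` quant type and `bfloat16` compute dtype are assumptions, since the card only states "4-bit quantization (bitsandbytes)".

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization via bitsandbytes; nf4 and bfloat16 compute are assumptions,
# not stated in the card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# LoRA hyperparameters taken from the Model Details section above
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```

With this setup, only the LoRA adapter weights on the attention projections are trained while the 4-bit base model stays frozen, which matches the rank-8 / alpha-16 configuration described above.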