---
license: unknown
language:
- en
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
---
# Mistral LoRA - BitNet 1.58 Q&A Expert
This repository contains a LoRA adapter for [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), fine-tuned on a custom Q&A dataset derived from the paper **"The Era of 1-bit LLMs" (BitNet b1.58)**.
## Model Details
- Base model: `mistralai/Mistral-7B-Instruct-v0.2`
- Fine-tuning: LoRA via PEFT, with the base model loaded in 4-bit (bitsandbytes)
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`
- LoRA rank: 8, alpha: 16, dropout: 0.05 (see the configuration sketch below)
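The exact training arguments are not published with this card, but the hyperparameters above correspond to a PEFT configuration roughly like the following sketch (`task_type` is an assumption for causal-LM tuning):
```python
from peft import LoraConfig

# LoRA hyperparameters as listed above; task_type is assumed, not stated on the card
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```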
## Dataset
Q&A pairs were auto-generated from the BitNet b1.58 paper. Each instruction asks about architectural and performance details of 1-bit LLMs.
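The dataset itself is not published with this card. For illustration only, a hypothetical record in the Alpaca-style instruction format assumed by the Usage section might look like:
```python
# Hypothetical record for illustration; the actual dataset is not included with this card
example = {
    "instruction": "What is a 1-bit LLM?",
    "response": "An LLM whose weights are quantized to extremely low precision; "
                "BitNet b1.58 uses ternary weights (-1, 0, 1).",
}

# Rendered into the prompt template used in the Usage section below
text = f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}"
```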
## Before vs. After Comparison
| Question | Base Model Output | Fine-tuned Model Output |
|---------|------------------|--------------------------|
| What is a 1-bit LLM? | ❌ Talks about hardware cache lines | βœ… Correctly defines quantized LLM |
| How does BitNet b1.58 differ from standard 1-bit models? | ❌ Talks about legacy networking | βœ… Talks about ternary weights (-1, 0, 1) |
| At what size does it outperform FP16? | ❌ Refers to wrong paper | βœ… Refers to performance table |
| Why is it more memory/latency efficient? | ❌ Talks about DHT routing | βœ… Highlights no FP multiplication |
| Edge deployment and hardware design? | ❌ Talks about old protocols | βœ… References new 1-bit hardware potential |
## Usage
```python
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the base model on GPU and attach the LoRA adapter
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2", torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, "ogflash/mistral-lora-qa-1bit")
tokenizer = AutoTokenizer.from_pretrained("ogflash/mistral-lora-qa-1bit")

# The adapter expects the instruction/response prompt format used during training
prompt = "### Instruction:\nWhat is a 1-bit LLM?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
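Since the adapter was trained against a 4-bit quantized base model, you can also load the base in 4-bit for lower-memory inference. A minimal sketch, assuming `bitsandbytes` and `accelerate` are installed (the NF4 quantization type is an assumption; the card only states 4-bit):
```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization config; nf4 and the compute dtype are assumptions, the card only says 4-bit
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "ogflash/mistral-lora-qa-1bit")
```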