---
license: unknown
language:
- en
base_model:
- mistralai/Mistral-7B-Instruct-v0.2
---

# Mistral LoRA - BitNet 1.58 Q&A Expert

This is a LoRA fine-tuned adapter for [`mistralai/Mistral-7B-Instruct-v0.2`](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), trained on a custom Q&A dataset derived from the paper **"The Era of 1-bit LLMs" (BitNet b1.58)**.

## Model Details

- Base model: `mistralai/Mistral-7B-Instruct-v0.2`
- LoRA fine-tuning: 4-bit quantization (bitsandbytes) + PEFT (see the configuration sketch at the end of this card)
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`
- Rank: 8, Alpha: 16, Dropout: 0.05

## Dataset

Q&A pairs were auto-generated from the BitNet b1.58 paper. Each instruction asked about architectural and performance details of 1-bit LLMs.

## Before vs. After Comparison

| Question | Base Model Output | Fine-tuned Model Output |
|----------|-------------------|-------------------------|
| What is a 1-bit LLM? | ❌ Talks about hardware cache lines | ✅ Correctly defines a quantized LLM |
| How does BitNet b1.58 differ from standard 1-bit models? | ❌ Talks about legacy networking | ✅ Describes ternary weights (-1, 0, 1) |
| At what size does it outperform FP16? | ❌ Refers to the wrong paper | ✅ Refers to the paper's performance table |
| Why is it more memory/latency efficient? | ❌ Talks about DHT routing | ✅ Highlights the absence of floating-point multiplication |
| Edge deployment and hardware design? | ❌ Talks about old protocols | ✅ References the potential of new 1-bit hardware |

## Usage

```python
import torch
from peft import PeftModel
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the base model and attach the fine-tuned LoRA adapter
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "ogflash/mistral-lora-qa-1bit")
tokenizer = AutoTokenizer.from_pretrained("ogflash/mistral-lora-qa-1bit")

# Prompt follows the instruction/response format used during fine-tuning
prompt = "### Instruction:\nwhat is 1 bit llm\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
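
## Fine-tuning Configuration (Sketch)

The training script itself is not included in this card. Below is a minimal sketch of how the quantization and LoRA settings listed under Model Details could be set up with `transformers` and `peft`; the `nf4` quant type and `bfloat16` compute dtype are assumptions, since the card only states "4-bit quantization (bitsandbytes)".

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# 4-bit quantization via bitsandbytes; nf4 and bfloat16 compute are assumptions,
# not stated in the card.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    quantization_config=bnb_config,
    device_map="auto",
)
base = prepare_model_for_kbit_training(base)

# LoRA hyperparameters taken from the Model Details section above
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()
```

With this setup, only the LoRA adapter weights on the attention projections are trained while the 4-bit base model stays frozen, which matches the rank-8 / alpha-16 configuration described above.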