ogflash committed
Commit 17ca135 · verified · Parent: e324dea

Update README.md

Files changed (1): README.md (+40 -3)

README.md CHANGED
@@ -1,3 +1,40 @@
---
license: unknown
---
# Mistral-7B BitNet LoRA (4-Bit Merged)

This repository contains a 4-bit quantized, LoRA-merged version of [mistralai/Mistral-7B-Instruct-v0.2](https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2), fine-tuned on a small Q&A dataset about 1-bit LLMs and BitNet B1.58.

The LoRA adapter was merged into the base model so it can be deployed as a single checkpoint on constrained hardware or in Hugging Face Spaces; a sketch of the merge step follows the model details below.

---

## Model Details

- Base model: `mistralai/Mistral-7B-Instruct-v0.2`
- Quantization: 4-bit (NF4 via bitsandbytes)
- Fine-tuning: LoRA (merged into the base weights)
- Adapter repo: [ogflash/mistral-lora-qa-1bit](https://huggingface.co/ogflash/mistral-lora-qa-1bit)
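
The merge step itself is not part of this repo. Below is a minimal sketch of how such a merge is typically done with PEFT; it is an assumption, not the exact script used here, and the output directory name is illustrative:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "mistralai/Mistral-7B-Instruct-v0.2"
adapter_id = "ogflash/mistral-lora-qa-1bit"

# Load the base model in full (fp16) precision; the LoRA deltas are folded
# into the weights exactly, so merging happens before any 4-bit quantization.
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

# Attach the adapter, then fold its low-rank updates into the base weights.
model = PeftModel.from_pretrained(base, adapter_id)
merged = model.merge_and_unload()

# Save a standalone checkpoint (illustrative path); quantization to 4-bit
# happens later, at load time, via BitsAndBytesConfig.
merged.save_pretrained("mistral-merged-1bit-4bit")
AutoTokenizer.from_pretrained(base_id).save_pretrained("mistral-merged-1bit-4bit")
```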

---

## Usage
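
The snippet below loads the merged checkpoint in 4-bit NF4 via `BitsAndBytesConfig` and runs a single instruction-style prompt: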

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig

model_id = "ogflash/mistral-merged-1bit-4bit"

# 4-bit NF4 weight quantization, with fp16 compute for the matmuls
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
)

# Prompt in the instruction/response format used for the fine-tune
prompt = "### Instruction:\nWhat is BitNet B1.58?\n\n### Response:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=300)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
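
With 4-bit NF4 weights, a 7B model occupies roughly 4-5 GB of GPU memory, so the snippet should fit on a single consumer GPU. `generate` defaults to greedy decoding; pass `do_sample=True` (and, e.g., `temperature=0.7`) for more varied answers.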