Upload 3 files

- README.md +12 -13
- app.py +47 -0
- requirements.txt +6 -0
README.md CHANGED
@@ -1,13 +1,12 @@
----
-
-
-
-
-
-
-
-
-
-
-
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+# BitNet QA - LoRA-Fine-Tuned Mistral
+
+This Space provides a chat interface to a LoRA fine-tuned version of `Mistral-7B-Instruct`, trained on Q&A pairs from the BitNet b1.58 (1-bit LLM) paper.
+
+Ask any technical question about:
+- 1-bit vs. FP16 models
+- The BitNet architecture
+- Inference latency
+- Memory and energy savings
+- Edge deployment of LLMs
+
+Model: [ogflash/mistral-lora-qa-1bit](https://huggingface.co/ogflash/mistral-lora-qa-1bit)
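
Programmatic access should also be possible with `gradio_client`. The sketch below is an assumption, not part of this commit: the Space id is a placeholder (it is not stated on this page), and `/predict` is the default endpoint name gradio assigns to a single `gr.Interface`.

```python
from gradio_client import Client

client = Client("USER/SPACE_NAME")  # placeholder id; substitute the real one from the Space URL
answer = client.predict(
    "Why does BitNet b1.58 cut memory and energy use?",
    api_name="/predict",  # gradio's default endpoint for a single gr.Interface
)
print(answer)
```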
app.py ADDED
@@ -0,0 +1,47 @@
+import torch
+from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
+from peft import PeftModel
+import gradio as gr
+
+base_model_id = "mistralai/Mistral-7B-Instruct-v0.2"
+lora_model_id = "ogflash/mistral-lora-qa-1bit"
+
+tokenizer = AutoTokenizer.from_pretrained(lora_model_id)
+
+bnb_config = BitsAndBytesConfig(
+    load_in_4bit=True,
+    bnb_4bit_use_double_quant=True,
+    bnb_4bit_quant_type="nf4",
+    bnb_4bit_compute_dtype=torch.float16,
+)
+
+base_model = AutoModelForCausalLM.from_pretrained(
+    base_model_id,
+    device_map="auto",
+    quantization_config=bnb_config
+)
+
+model = PeftModel.from_pretrained(base_model, lora_model_id)
+
+def generate_response(user_input):
+    prompt = f"### Instruction:\n{user_input}\n\n### Response:\n"
+    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
+    outputs = model.generate(
+        **inputs,
+        max_new_tokens=256,
+        do_sample=True,
+        top_p=0.95,
+        temperature=0.7,
+        pad_token_id=tokenizer.eos_token_id
+    )
+    return tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
+
+demo = gr.Interface(
+    fn=generate_response,
+    inputs=gr.Textbox(lines=2, placeholder="Ask something about 1-bit LLMs..."),
+    outputs="text",
+    title="BitNet QA - Mistral LoRA",
+    description="Ask questions related to 1-bit LLMs (BitNet b1.58)."
+)
+
+demo.launch()
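
One caveat: the 4-bit path above assumes a CUDA GPU, since bitsandbytes' 4-bit kernels target CUDA. A minimal sketch of a CPU-only fallback, assuming enough RAM for the unquantized weights (roughly 28 GB at fp32 for 7B parameters); this is not part of the commit:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Hypothetical CPU fallback (not in this commit): load the base model
# unquantized, since bitsandbytes 4-bit loading requires a CUDA GPU.
base = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",
    torch_dtype=torch.float32,  # ~28 GB of RAM for 7B parameters
    device_map="cpu",
)
tokenizer = AutoTokenizer.from_pretrained("ogflash/mistral-lora-qa-1bit")
model = PeftModel.from_pretrained(base, "ogflash/mistral-lora-qa-1bit")
```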
requirements.txt ADDED
@@ -0,0 +1,6 @@
+transformers
+peft
+accelerate
+torch
+gradio
+bitsandbytes
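
These dependencies are unpinned, so the Space picks up whatever versions pip resolves at build time. For a reproducible build one would typically set minimum versions; the floors below are illustrative guesses, not taken from this commit:

```
transformers>=4.38
peft>=0.10.0
accelerate>=0.28.0
torch>=2.1
gradio>=4.0
bitsandbytes>=0.42.0
```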