---
license: mit
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
pipeline_tag: text-generation
---
# Llama-3.2-1B-Instruct (4-bit Quantized)

This repository contains a **4-bit quantized version** of the Llama-3.2-1B-Instruct model.
It was quantized with **bitsandbytes NF4** for very low VRAM consumption and fast
inference, making it well suited to edge devices, low-resource systems, and fast
evaluation pipelines (e.g., interview Thinker models).

---
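The NF4 scheme mentioned above works by splitting the weight tensor into blocks, scaling each block by its absolute maximum, and snapping every value to one of 16 fixed levels. A toy pure-Python sketch of the idea (the evenly spaced codebook and tiny block size are illustrative simplifications; real NF4 uses levels derived from normal-distribution quantiles and blocks of 64 values):

```python
# Toy block-wise 4-bit "absmax" quantizer, illustrating the idea behind NF4:
# each block is scaled by its absolute maximum and every value is snapped to
# the nearest entry of a fixed 16-level codebook.
# NOTE: the evenly spaced codebook below is illustrative only.

CODEBOOK = [i / 7.5 - 1.0 for i in range(16)]  # 16 levels spanning [-1, 1]
BLOCK_SIZE = 4  # real bitsandbytes blocks hold 64 values; small here for clarity

def quantize(weights):
    """Return (indices, scales): one 4-bit index per weight, one scale per block."""
    indices, scales = [], []
    for start in range(0, len(weights), BLOCK_SIZE):
        block = weights[start:start + BLOCK_SIZE]
        scale = max(abs(w) for w in block) or 1.0  # avoid div-by-zero on all-zero blocks
        scales.append(scale)
        for w in block:
            normed = w / scale  # now in [-1, 1]
            indices.append(min(range(16), key=lambda i: abs(CODEBOOK[i] - normed)))
    return indices, scales

def dequantize(indices, scales):
    """Reconstruct approximate weights from 4-bit indices and per-block scales."""
    return [CODEBOOK[idx] * scales[i // BLOCK_SIZE]
            for i, idx in enumerate(indices)]

weights = [0.12, -0.5, 0.33, 0.9, -0.07, 0.45, -0.88, 0.2]
idx, scales = quantize(weights)
restored = dequantize(idx, scales)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(f"max reconstruction error: {max_err:.4f}")
```

Each weight now costs 4 bits plus a shared per-block scale, which is where the memory savings come from; the block-local scaling keeps the reconstruction error proportional to each block's magnitude.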

## Model Features

- **Base model:** Llama-3.2-1B-Instruct
- **Quantization:** 4-bit (NF4) using `bitsandbytes`
- **VRAM requirement:** ~1.0 GB
- **Perfect for:**
  - Lightweight chatbots
  - Reasoning/evaluation agents
  - Interview Thinker modules
  - Local inference on small GPUs
  - Low-latency systems
- **Compatible with:**
  - LoRA fine-tuning
  - Hugging Face Transformers
  - Text-generation inference engines

---

## Files Included

- `config.json`
- `generation_config.json`
- `model.safetensors` (4-bit quantized weights)
- `tokenizer.json`
- `tokenizer_config.json`
- `special_tokens_map.json`
- `chat_template.jinja`

These files allow you to load the model directly with `load_in_4bit=True`.

---

## How To Load This Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Shlok307/llama-1b-4bit"

tokenizer = AutoTokenizer.from_pretrained(model_name)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_4bit=True,
    device_map="auto",
)
```