mehta committed · Commit 73cc915 · verified · 1 Parent(s): a254fb2

Create README.md

Files changed (1):
  1. README.md +59 -0

README.md ADDED
@@ -0,0 +1,59 @@
---
license: mit
language:
- en
base_model:
- mehta/CooperLM-354M
pipeline_tag: text-generation
library_name: transformers
tags:
- toy-llm
- gpt2
- 4bit
- quantized
- causal-lm
- transformers
- small-llm
---

# 🧠 CooperLM-354M (4-bit Quantized)

This is a 4-bit quantized version of [CooperLM-354M](https://huggingface.co/mehta/CooperLM-354M), a 354M-parameter GPT-2-style language model trained from scratch on a subset of Wikipedia, BookCorpus, and OpenWebText.

The quantized model is intended for faster inference and a smaller memory footprint, which makes it especially useful on CPU-only or limited-GPU setups.
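
As a rough way to check the footprint saving, `transformers` exposes `get_memory_footprint()` on loaded models. A minimal sketch, assuming both checkpoints are reachable and the GPTQ kernels plus `accelerate` are installed:

```python
from transformers import AutoModelForCausalLM

# Compare the in-memory weight size of the full-precision base model
# and the 4-bit checkpoint; get_memory_footprint() reports bytes.
for repo in ("mehta/CooperLM-354M", "mehta/CooperLM-354M-4bit"):
    model = AutoModelForCausalLM.from_pretrained(repo, device_map="auto")
    print(f"{repo}: {model.get_memory_footprint() / 1024**2:.0f} MiB")
```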

---

## 📌 Model Details

- **Base Model**: [mehta/CooperLM-354M](https://huggingface.co/mehta/CooperLM-354M)
- **Architecture**: GPT-2 (24 layers, 16 attention heads, 1024 hidden size)
- **Quantization**: 4-bit integer weights via `AutoGPTQ` (safetensors); a reproduction sketch follows this list
- **Precision**: INT4
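
The exact calibration setup used for this checkpoint isn't published, so the following is only a sketch of how a GPT-2-style model can be GPTQ-quantized to 4 bits through `transformers`' `GPTQConfig`; the `dataset` and `group_size` values here are assumptions, not the settings actually used:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GPTQConfig

base_id = "mehta/CooperLM-354M"
tokenizer = AutoTokenizer.from_pretrained(base_id)

# Assumed calibration settings; the real checkpoint may differ.
gptq_config = GPTQConfig(
    bits=4,           # INT4 weights
    dataset="c4",     # built-in calibration dataset option
    tokenizer=tokenizer,
    group_size=128,   # common GPTQ default
)

# Quantization runs during loading and needs a CUDA GPU with the
# GPTQ kernels installed (e.g. `pip install optimum auto-gptq`).
quantized = AutoModelForCausalLM.from_pretrained(
    base_id,
    device_map="auto",
    quantization_config=gptq_config,
)

# Save the 4-bit weights as safetensors for reuse.
quantized.save_pretrained("CooperLM-354M-4bit")
tokenizer.save_pretrained("CooperLM-354M-4bit")
```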

---

## 🛠️ How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("mehta/CooperLM-354M-4bit")

# Loading a GPTQ checkpoint requires the GPTQ kernels
# (e.g. `pip install optimum auto-gptq`); device_map="auto"
# places the 4-bit weights on the available device.
model = AutoModelForCausalLM.from_pretrained(
    "mehta/CooperLM-354M-4bit",
    device_map="auto",
)

prompt = "In the distant future,"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_length=100,                       # total length, prompt included
    temperature=0.8,
    top_p=0.95,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
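
With `do_sample=True`, the `temperature=0.8` and `top_p=0.95` settings trade determinism for variety; lower the temperature (or set `do_sample=False`) for more repeatable completions.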