---
language:
- en
license: mit
tags:
- bitnet
- 1.58-bit
- ternary
- tinystories
- edge-device
datasets:
- roneneldan/TinyStories
---

# BitNet-TinyStories-V2 (3.9 MB)

This is an ultra-compressed **1.58-bit** language model trained entirely from scratch on the `TinyStories` dataset.

It implements the **BitNet (1.58b)** architecture, in which all internal linear layers are quantized to ternary weights (`-1`, `0`, `1`). A value with three possible states carries log2(3) ≈ 1.58 bits of information, hence the name. This version uses **weight tying** (`tie_weights=True`), which lets it reach a 12-layer depth while keeping the checkpoint under 4 MB.
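
This card does not spell out the exact quantizer used for this checkpoint. As a minimal sketch, assuming the absmean scheme described in the BitNet b1.58 paper (divide a weight matrix by the mean of its absolute values, then round and clip to `{-1, 0, 1}`), ternarizing a matrix and applying it could look like this in plain Python. The function names `absmean_ternary` and `ternary_matvec` are hypothetical and are not part of `bitnet_test`.

```python
from typing import List, Tuple


def absmean_ternary(weights: List[List[float]], eps: float = 1e-5) -> Tuple[List[List[int]], float]:
    """Quantize a weight matrix to {-1, 0, 1} with one per-matrix scale.

    Absmean scheme: gamma = mean(|W|), W_q = clip(round(W / (gamma + eps)), -1, 1).
    Returns the ternary matrix and the scale needed to dequantize it.
    """
    flat = [w for row in weights for w in row]
    scale = sum(abs(w) for w in flat) / len(flat) + eps
    quantized = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return quantized, scale


def ternary_matvec(q: List[List[int]], scale: float, x: List[float]) -> List[float]:
    """Compute scale * (q @ x) using only additions and subtractions per weight."""
    out = []
    for row in q:
        acc = 0.0
        for t, xi in zip(row, x):
            if t == 1:
                acc += xi
            elif t == -1:
                acc -= xi
            # t == 0: the weight is skipped entirely
        out.append(acc * scale)
    return out


# Hypothetical example matrix, not taken from the checkpoint
W = [[0.4, -0.05, -0.9], [0.1, 0.6, -0.3]]
q, scale = absmean_ternary(W)
print(q)  # [[1, 0, -1], [0, 1, -1]]
print(ternary_matvec(q, scale, [1.0, 2.0, 3.0]))
```

Because every weight is `-1`, `0`, or `1`, the matrix-vector product reduces to additions and subtractions plus a single multiplication by the shared scale, which is what makes ternary models attractive for edge hardware. Real BitNet layers also quantize their activations, which this sketch omits.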

## Model Details
- **Architecture:** BitNet (1.58b)
- **Parameters:** ~21 million
- **Layers:** 12 (tied)
- **Precision:** 1.58-bit (ternary) for internal weights
- **File size:** 3.96 MB
- **Tokenizer:** `arnir0/Tiny-LLM` SentencePiece (32,000-token vocabulary)
- **Dataset:** `roneneldan/TinyStories`
- **Validation perplexity:** 23.7
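
Getting close to 1.58 bits per weight on disk requires packing several ternary values into each byte. This card does not document how `bitnet_tied.pt` actually lays out its weights, so the following is only a hypothetical illustration of one common trick: because 3^5 = 243 ≤ 256, five ternary values fit in one byte as base-3 digits, costing 8 / 5 = 1.6 bits per weight. `pack_ternary` and `unpack_ternary` are illustrative names, not functions from this repository.

```python
from typing import List

TRITS_PER_BYTE = 5  # 3**5 = 243 <= 256, so five ternary values fit in one byte


def pack_ternary(trits: List[int]) -> bytes:
    """Pack values from {-1, 0, 1} into bytes as base-3 digits, five per byte."""
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        byte = 0
        # Most significant digit first, so trits[i] ends up as the lowest digit.
        for t in reversed(trits[i:i + TRITS_PER_BYTE]):
            byte = byte * 3 + (t + 1)  # map -1/0/1 to digits 0/1/2
        out.append(byte)
    return bytes(out)


def unpack_ternary(data: bytes, n: int) -> List[int]:
    """Invert pack_ternary; n is the number of values originally packed."""
    trits: List[int] = []
    for byte in data:
        for _ in range(TRITS_PER_BYTE):
            trits.append(byte % 3 - 1)  # lowest base-3 digit back to -1/0/1
            byte //= 3
    return trits[:n]  # drop padding digits from a partially filled last byte


weights = [1, 0, -1, -1, 1, 0, 1]
packed = pack_ternary(weights)
print(len(packed))  # 2
print(unpack_ternary(packed, len(weights)) == weights)  # True
```

A plain 2-bits-per-weight layout is simpler to decode on the fly; base-3 packing trades some unpacking work for roughly 20% less storage.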

## Usage

Because this model uses a custom ternary architecture, it cannot be loaded with the standard Hugging Face `AutoModel` classes. Instead, instantiate the `BitGPT` class from `bitnet_test` with the matching hyperparameters and load the checkpoint into it:

```python
import torch
from transformers import AutoTokenizer
from bitnet_test import BitGPT

# 1. Load Tokenizer
tokenizer = AutoTokenizer.from_pretrained("arnir0/Tiny-LLM")

# 2. Initialize Model
model = BitGPT(vocab_size=len(tokenizer), embed_dim=256, num_layers=12, num_heads=4, tie_weights=True)

# 3. Load 1.58-bit Weights
model.load_state_dict(torch.load("bitnet_tied.pt", map_location="cpu"))
model.eval()

# 4. Generate Text
prompt = "Once upon a time, there was a tiny cat named"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Greedy auto-regressive decoding. Assumes model(input_ids) returns logits of
# shape [batch, seq_len, vocab_size]; adapt this if BitGPT returns a tuple or
# has a fixed maximum context length.
max_new_tokens = 100
with torch.no_grad():
    for _ in range(max_new_tokens):
        logits = model(input_ids)
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # shape [1, 1]
        input_ids = torch.cat([input_ids, next_id], dim=1)
        if next_id.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(input_ids[0], skip_special_tokens=True))
```

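Greedy decoding is deterministic, and small story models can fall into repetitive loops with it. A common alternative for the generation step is temperature plus top-k sampling. Below is a minimal, framework-agnostic sketch; `sample_top_k` and its defaults (`k=40`, `temperature=0.8`) are illustrative choices, not settings taken from this model card.

```python
import math
import random
from typing import List, Optional


def sample_top_k(logits: List[float], k: int = 40, temperature: float = 0.8,
                 rng: Optional[random.Random] = None) -> int:
    """Return a token id sampled from the k highest logits after temperature scaling."""
    if temperature <= 0:
        # Temperature 0 degenerates to greedy decoding (argmax).
        return max(range(len(logits)), key=logits.__getitem__)
    rng = rng or random.Random()
    # Indices of the k largest logits.
    top = sorted(range(len(logits)), key=logits.__getitem__, reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    # Softmax over the kept logits; subtracting the max avoids overflow in exp.
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(top, weights=weights, k=1)[0]


# Hypothetical 4-token vocabulary; with k=2 only ids 1 and 2 can be drawn.
print(sample_top_k([0.0, 5.0, 1.0, -2.0], k=2, rng=random.Random(0)))
```

To use it inside a PyTorch loop, pass `logits[0, -1].tolist()` and wrap the returned id back into a tensor (for example `torch.tensor([[next_id]])`) before concatenating it onto `input_ids`.
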
## Intended Use
This model is intended purely as a research demonstration of the viability of 1.58-bit LLMs on edge devices. Because it was trained exclusively on the TinyStories dataset, it cannot perform complex reasoning, answer factual questions, or follow instructions; it only generates simple children's stories.