nielsr (HF Staff) committed
Commit 07484e2 · verified · 1 Parent(s): 389f302

Improve model card: add pipeline tag, move arxiv id, and link to code


Hi! I'm Niels from the Hugging Face community team. This PR improves the model card for TernaryLM-132M by:
- Adding the `pipeline_tag: text-generation` to ensure the model is correctly categorized on the Hub.
- Moving the ArXiv ID from the YAML metadata to the Markdown content (as a link to the paper).
- Adding a link to the official GitHub repository for easier access to the code.
- Refining the Markdown structure for improved readability.

Files changed (1):
  1. README.md (+28 -25)
README.md CHANGED
@@ -1,7 +1,10 @@
-
 ---
-language: en
+datasets:
+- roneneldan/TinyStories
+language:
+- en
 license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - efficient-llm
 - quantization
@@ -10,47 +13,47 @@ tags:
 - pytorch
 - tinystories
 - language-modeling
-datasets:
-- roneneldan/TinyStories
-arxiv: 2602.07374
 ---
 
 # TernaryLM-132M
 
-TernaryLM-132M is a 132M parameter Transformer trained natively using ternary weights {-1, 0, +1}.
+[TernaryLM](https://huggingface.co/papers/2602.07374) is a 132M-parameter Transformer trained natively using ternary weights {-1, 0, +1} (approximately 1.58-bit effective precision).
 
-Unlike post-training quantization methods, this model learns quantized representations during training.
+Unlike post-training quantization (PTQ) methods that quantize pre-trained full-precision models, TernaryLM learns quantization-aware representations from scratch using straight-through estimators and adaptive per-layer scaling factors.
+
+## Resources
+- **Paper:** [TernaryLM: Memory-Efficient Language Modeling via Native 1.5-Bit Quantization with Adaptive Layer-wise Scaling](https://huggingface.co/papers/2602.07374)
+- **GitHub Repository:** [1nisharg/TernaryLM-Memory-Efficient-Language-Modeling](https://github.com/1nisharg/TernaryLM-Memory-Efficient-Language-Modeling)
 
 ## Architecture
 
-- Parameters: 132M
-- Layers: 12
-- Hidden Size: 768
-- Attention Heads: 12
-- Context Length: 512
-- Quantization: Native Ternary Training
+- **Parameters:** 132M
+- **Layers:** 12
+- **Hidden Size:** 768
+- **Attention Heads:** 12
+- **Context Length:** 512
+- **Quantization:** Native Ternary Training
 
 ## Training
 
-- Dataset: TinyStories (~60k stories)
-- Optimizer: AdamW (betas=(0.9, 0.98))
-- LR: 3e-4
-- Scheduler: OneCycleLR
-- Epochs: 15
-- Hardware: Multi-GPU T4 setup (Kaggle)
+- **Dataset:** [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) (~60k stories)
+- **Optimizer:** AdamW (betas=(0.9, 0.98))
+- **Learning Rate:** 3e-4
+- **Scheduler:** OneCycleLR
+- **Epochs:** 15
+- **Hardware:** Multi-GPU T4 setup (Kaggle)
 
 ## Intended Use
 
 Research on:
-- Efficient Transformers
-- Quantization-aware training
-- Edge deployment
+- Efficient Transformers and architecture design.
+- Quantization-aware training (QAT) paradigms.
+- Deployment of LLMs in resource-constrained or edge environments.
 
 ## Limitations
 
-- Not instruction-tuned
-- Limited dataset scale
-- Research prototype
+- The model is a research prototype and is not instruction-tuned.
+- Pre-training was conducted on a relatively small dataset scale (TinyStories).
 
 ## Citation
 
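As context for the card's "Native Ternary Training" entry, the kind of quantizer it describes can be sketched in a few lines. This is a minimal illustration only: the `0.7 * mean(|w|)` threshold and mean-of-absolutes scale follow common ternary-weight-network heuristics, and the function name and `threshold_factor` parameter are assumptions, not TernaryLM's confirmed implementation.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, threshold_factor: float = 0.7):
    """Project a weight tensor onto {-1, 0, +1} with an adaptive per-layer scale.

    The 0.7 * mean(|w|) threshold is a classic ternary-weight-network
    heuristic; TernaryLM's exact scheme may differ (assumption).
    """
    delta = threshold_factor * np.abs(w).mean()          # sparsity threshold
    t = np.where(w > delta, 1.0, np.where(w < -delta, -1.0, 0.0))
    nz = t != 0
    alpha = np.abs(w[nz]).mean() if nz.any() else 0.0    # adaptive per-layer scale
    return alpha * t, t, alpha

# In quantization-aware training, the forward pass uses the quantized weights,
# while gradients bypass the non-differentiable rounding via the
# straight-through estimator: w_q = w + stop_gradient(quantize(w) - w).
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
w_q, t, alpha = ternary_quantize(w)
print(sorted(set(t.ravel().tolist())), float(alpha) > 0.0)
```

Packed at log2(3) ≈ 1.585 bits per weight, 132M ternary parameters occupy roughly 26 MB, versus about 264 MB in fp16 (ignoring scales, embeddings, and activations).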