---
datasets:
- roneneldan/TinyStories
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- efficient-llm
- quantization
- ternary
- bitnet
- pytorch
- tinystories
- language-modeling
---
# TernaryLM-132M
[TernaryLM](https://huggingface.co/papers/2602.07374) is a 132M-parameter Transformer trained natively with ternary weights {-1, 0, +1} (about 1.58 bits of effective precision per weight).
Unlike post-training quantization (PTQ), which quantizes an already-trained full-precision model, TernaryLM learns quantization-aware representations from scratch using straight-through estimators and adaptive per-layer scaling factors.
## Resources
- **Paper:** [TernaryLM: Memory-Efficient Language Modeling via Native 1.5-Bit Quantization with Adaptive Layer-wise Scaling](https://huggingface.co/papers/2602.07374)
- **GitHub Repository:** [1nisharg/TernaryLM-Memory-Efficient-Language-Modeling](https://github.com/1nisharg/TernaryLM-Memory-Efficient-Language-Modeling)
## Architecture
- **Parameters:** 132M
- **Layers:** 12
- **Hidden Size:** 768
- **Attention Heads:** 12
- **Context Length:** 512
- **Quantization:** Native Ternary Training
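The hyperparameters above can be collected into a small config sketch (field names here are illustrative, not the repository's actual config class):

```python
from dataclasses import dataclass

@dataclass
class TernaryLMConfig:
    """Architecture hyperparameters from the model card."""
    n_layers: int = 12        # Transformer blocks
    hidden_size: int = 768    # model width
    n_heads: int = 12         # attention heads per block
    context_length: int = 512 # maximum sequence length

    @property
    def head_dim(self) -> int:
        # 768 / 12 = 64-dimensional attention heads
        return self.hidden_size // self.n_heads
```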
## Training
- **Dataset:** [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) (~60k stories)
- **Optimizer:** AdamW (betas=(0.9, 0.98))
- **Learning Rate:** 3e-4
- **Scheduler:** OneCycleLR
- **Epochs:** 15
- **Hardware:** Multi-GPU T4 setup (Kaggle)
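The optimizer and schedule settings above translate directly to PyTorch. A minimal sketch, assuming the stated learning rate is OneCycleLR's `max_lr` and using a placeholder module and step count:

```python
import torch

# Placeholder module standing in for the 132M TernaryLM.
model = torch.nn.Linear(768, 768)

epochs = 15
steps_per_epoch = 1000  # placeholder; depends on batch size and dataset size

# Settings from the model card: AdamW with betas=(0.9, 0.98), lr 3e-4.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.98))
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-4, epochs=epochs, steps_per_epoch=steps_per_epoch
)
# scheduler.step() is called once per optimizer step (per batch), so the
# learning rate warms up toward max_lr and then anneals over training.
```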
## Intended Use
Research on:
- Efficient Transformers and architecture design.
- Quantization-aware training (QAT) paradigms.
- Deployment of LLMs in resource-constrained or edge environments.
## Limitations
- The model is a research prototype and is not instruction-tuned.
- Pre-training was conducted on a relatively small dataset scale (TinyStories).
## Citation
```bibtex
@misc{nargund2026ternarylmmemoryefficientlanguagemodeling,
title={TernaryLM: Memory-Efficient Language Modeling via Native 1-Bit Quantization with Adaptive Layer-wise Scaling},
author={Nisharg Nargund and Priyesh Shukla},
year={2026},
eprint={2602.07374},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2602.07374},
}
```