|
|
|
|
|
---
language: en
license: apache-2.0
tags:
- efficient-llm
- quantization
- ternary
- bitnet
- pytorch
- tinystories
- language-modeling
datasets:
- roneneldan/TinyStories
arxiv: 2602.07374
---
|
|
|
|
|
# TernaryLM-132M
|
|
|
|
|
TernaryLM-132M is a 132M-parameter Transformer trained natively with ternary weights {-1, 0, +1}. Unlike post-training quantization methods, it learns quantized representations directly during training.
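
As a minimal illustration of the native-ternary idea (not the exact recipe from the paper, which uses adaptive layer-wise scaling), a quantization-aware linear layer can keep full-precision latent weights for the optimizer, use their ternary projection in the forward pass, and route gradients through a straight-through estimator. The mean-absolute scale used below is an assumption made for this sketch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TernaryLinear(nn.Module):
    """Linear layer whose weights are projected to {-1, 0, +1} on the fly.

    Sketch only: TernaryLM's adaptive layer-wise scaling is described in the
    paper; the simple mean-absolute scale used here is an assumption.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # Latent full-precision weights that the optimizer updates.
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        w = self.weight
        scale = w.abs().mean().clamp(min=1e-8)                      # assumed per-layer scale
        w_q = torch.clamp(torch.round(w / scale), -1, 1) * scale    # ternary projection
        # Straight-through estimator: forward uses w_q, gradients flow to w.
        w_ste = w + (w_q - w).detach()
        return F.linear(x, w_ste)
```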
|
|
|
|
|
## Architecture
|
|
|
|
|
- Parameters: 132M
- Layers: 12
- Hidden Size: 768
- Attention Heads: 12
- Context Length: 512
- Quantization: Native Ternary Training
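
For reference, the hyperparameters above can be collected into a small config object (a hypothetical helper for experimentation, not part of any released code):

```python
from dataclasses import dataclass

@dataclass
class TernaryLMConfig:
    # Values taken from the architecture summary above.
    n_layers: int = 12
    hidden_size: int = 768
    n_heads: int = 12
    context_length: int = 512
    quantization: str = "native ternary"

    @property
    def head_dim(self) -> int:
        # 768 / 12 = 64 dimensions per attention head.
        return self.hidden_size // self.n_heads
```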
|
|
|
|
|
## Training
|
|
|
|
|
- Dataset: TinyStories (~60k stories)
- Optimizer: AdamW (betas=(0.9, 0.98))
- LR: 3e-4
- Scheduler: OneCycleLR
- Epochs: 15
- Hardware: Multi-GPU T4 setup (Kaggle)
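
A minimal sketch of the optimizer and schedule listed above, using standard PyTorch components; the model and data loader below are stand-ins, not the actual TernaryLM training code:

```python
import torch
import torch.nn as nn

# Hyperparameters from the training summary above.
LR, EPOCHS = 3e-4, 15

model = nn.Linear(8, 8)                  # stand-in for the TernaryLM model
train_loader = [torch.randn(4, 8)] * 10  # stand-in for the TinyStories loader

optimizer = torch.optim.AdamW(model.parameters(), lr=LR, betas=(0.9, 0.98))
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=LR, epochs=EPOCHS, steps_per_epoch=len(train_loader)
)

for epoch in range(EPOCHS):
    for batch in train_loader:
        loss = model(batch).pow(2).mean()   # placeholder loss
        loss.backward()
        optimizer.step()
        scheduler.step()                    # OneCycleLR steps once per batch
        optimizer.zero_grad()
```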
|
|
|
|
|
## Intended Use
|
|
|
|
|
Research on:

- Efficient Transformers
- Quantization-aware training
- Edge deployment
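
For experimentation, a generation sketch along these lines may be useful, assuming the checkpoint is published in a `transformers`-compatible format; the repo id below is a placeholder, not a confirmed path:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/TernaryLM-132M"  # placeholder repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)

inputs = tokenizer("Once upon a time", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```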
|
|
|
|
|
## Limitations
|
|
|
|
|
- Not instruction-tuned
- Limited dataset scale
- Research prototype
|
|
|
|
|
## Citation
|
|
|
|
|
```bibtex
@misc{nargund2026ternarylmmemoryefficientlanguagemodeling,
  title={TernaryLM: Memory-Efficient Language Modeling via Native 1-Bit Quantization with Adaptive Layer-wise Scaling},
  author={Nisharg Nargund and Priyesh Shukla},
  year={2026},
  eprint={2602.07374},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.07374},
}
```