---
datasets:
- roneneldan/TinyStories
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- efficient-llm
- quantization
- ternary
- bitnet
- pytorch
- tinystories
- language-modeling
---
# TernaryLM-132M
TernaryLM is a 132M-parameter Transformer language model trained natively with ternary weights {-1, 0, +1} (roughly 1.58 bits of effective precision per weight).
Unlike post-training quantization (PTQ), which quantizes an already-trained full-precision model, TernaryLM learns quantization-aware representations from scratch using straight-through estimators and adaptive per-layer scaling factors.
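The core idea can be sketched in a few lines of PyTorch. The scale choice below (mean absolute weight, as in BitNet b1.58) and the function name are illustrative assumptions; the paper's adaptive layer-wise scaling may be computed differently:

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Map full-precision weights to scale * {-1, 0, +1} with a
    straight-through estimator (STE) so gradients still flow to w."""
    # Per-layer scale; mean |w| is the BitNet-style choice (an assumption here).
    scale = w.abs().mean().clamp(min=eps)
    w_ternary = (w / scale).round().clamp(-1, 1) * scale
    # STE: the forward pass sees w_ternary, the backward pass treats
    # the rounding as the identity and propagates gradients to w.
    return w + (w_ternary - w).detach()
```

During training, a linear layer would apply `ternary_quantize(self.weight)` in its forward pass; at inference the ternary weights and one scale per layer can be stored directly, which is where the memory savings come from.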
## Resources
- Paper: TernaryLM: Memory-Efficient Language Modeling via Native 1.5-Bit Quantization with Adaptive Layer-wise Scaling
- GitHub Repository: 1nisharg/TernaryLM-Memory-Efficient-Language-Modeling
## Architecture
- Parameters: 132M
- Layers: 12
- Hidden Size: 768
- Attention Heads: 12
- Context Length: 512
- Quantization: Native Ternary Training
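For orientation, the architecture above corresponds to a configuration like the following (field names are illustrative and not the repository's actual API; vocabulary size is not stated on this card and is omitted):

```python
from dataclasses import dataclass

@dataclass
class TernaryLMConfig:
    # Values taken from the Architecture section above.
    n_layers: int = 12
    n_heads: int = 12
    hidden_size: int = 768
    context_length: int = 512
    quantization: str = "native-ternary"  # trained in ternary from scratch

cfg = TernaryLMConfig()
head_dim = cfg.hidden_size // cfg.n_heads  # 768 / 12 = 64-dimensional heads
```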
## Training
- Dataset: TinyStories (~60k stories)
- Optimizer: AdamW (betas=(0.9, 0.98))
- Learning Rate: 3e-4
- Scheduler: OneCycleLR
- Epochs: 15
- Hardware: Multi-GPU T4 setup (Kaggle)
## Intended Use
Research on:
- Efficient Transformers and architecture design.
- Quantization-aware training (QAT) paradigms.
- Deployment of LLMs in resource-constrained or edge environments.
## Limitations
- The model is a research prototype and is not instruction-tuned.
- Pre-training used a comparatively small corpus (TinyStories), so the model's knowledge and fluency are limited to simple narrative English.
## Citation
```bibtex
@misc{nargund2026ternarylmmemoryefficientlanguagemodeling,
  title={TernaryLM: Memory-Efficient Language Modeling via Native 1-Bit Quantization with Adaptive Layer-wise Scaling},
  author={Nisharg Nargund and Priyesh Shukla},
  year={2026},
  eprint={2602.07374},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.07374},
}
```