---
language: en
license: apache-2.0
tags:
- efficient-llm
- quantization
- ternary
- bitnet
- pytorch
- tinystories
- language-modeling
datasets:
- roneneldan/TinyStories
arxiv: 2602.07374
---

# TernaryLM-132M

TernaryLM-132M is a 132M-parameter Transformer trained natively with ternary weights {-1, 0, +1}. Unlike post-training quantization methods, this model learns quantized representations during training.

## Architecture

- Parameters: 132M
- Layers: 12
- Hidden Size: 768
- Attention Heads: 12
- Context Length: 512
- Quantization: Native Ternary Training

## Training

- Dataset: TinyStories (~60k stories)
- Optimizer: AdamW (betas=(0.9, 0.98))
- LR: 3e-4
- Scheduler: OneCycleLR
- Epochs: 15
- Hardware: Multi-GPU T4 setup (Kaggle)

Minimal, unofficial sketches of the ternary quantization scheme and this training setup appear after the citation below.

## Intended Use

Research on:

- Efficient Transformers
- Quantization-aware training
- Edge deployment

## Limitations

- Not instruction-tuned
- Limited dataset scale
- Research prototype

## Citation

```bibtex