---
datasets:
- roneneldan/TinyStories
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- efficient-llm
- quantization
- ternary
- bitnet
- pytorch
- tinystories
- language-modeling
---

# TernaryLM-132M

[TernaryLM](https://huggingface.co/papers/2602.07374) is a 132M-parameter Transformer trained natively with ternary weights {-1, 0, +1}, i.e. approximately 1.58 bits (log2 3) of effective precision per weight.

Unlike post-training quantization (PTQ) methods, which quantize an already-trained full-precision model, TernaryLM learns quantization-aware representations from scratch using straight-through estimators and adaptive per-layer scaling factors.
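The training-time ternarization can be sketched as follows. This is a minimal, BitNet-style illustration, not the paper's exact implementation: the mean-absolute-value scale rule and the `ternary_quantize_ste` helper are assumptions.

```python
import torch

def ternary_quantize_ste(w: torch.Tensor) -> torch.Tensor:
    """Ternarize weights to scale * {-1, 0, +1} with a per-layer scale,
    passing gradients straight through to the latent full-precision weights.

    Illustrative sketch only: the scale rule here (mean absolute weight,
    as in BitNet-style schemes) stands in for the paper's adaptive
    layer-wise scaling.
    """
    # Per-layer scale: mean absolute value of the weights.
    scale = w.abs().mean().clamp(min=1e-8)
    # Hard ternarization: round the normalized weights into {-1, 0, +1}.
    w_q = (w / scale).round().clamp(-1, 1)
    # Straight-through estimator: the forward pass sees scale * w_q,
    # while the backward pass treats the whole op as the identity on w.
    return w + (scale * w_q - w).detach()

# The latent weights stay full-precision; only the forward value is ternary.
w = torch.randn(4, 4, requires_grad=True)
w_ternary = ternary_quantize_ste(w)
w_ternary.sum().backward()
```

Because the quantization error is wrapped in `.detach()`, the optimizer updates the full-precision latent weights while every forward pass sees only ternary values.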

## Resources

- **Paper:** [TernaryLM: Memory-Efficient Language Modeling via Native 1.5-Bit Quantization with Adaptive Layer-wise Scaling](https://huggingface.co/papers/2602.07374)
- **GitHub Repository:** [1nisharg/TernaryLM-Memory-Efficient-Language-Modeling](https://github.com/1nisharg/TernaryLM-Memory-Efficient-Language-Modeling)

## Architecture

- **Parameters:** 132M
- **Layers:** 12
- **Hidden Size:** 768
- **Attention Heads:** 12
- **Context Length:** 512
- **Quantization:** Native ternary training
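To see what ternary storage buys at this scale, here is a back-of-envelope comparison of weight-storage footprints (weights only; activations, optimizer state, and packing overhead are ignored, and the figures are illustrative rather than measured):

```python
import math

# Approximate weight-storage footprint of a 132M-parameter model.
# A ternary weight carries log2(3) ~= 1.58 bits of information.
PARAMS = 132_000_000

def weight_mib(bits_per_param: float) -> float:
    """Weight storage in MiB for a given number of bits per parameter."""
    return PARAMS * bits_per_param / 8 / 2**20

fp32_mib = weight_mib(32)               # ~503 MiB
fp16_mib = weight_mib(16)               # ~252 MiB
ternary_mib = weight_mib(math.log2(3))  # ~25 MiB

print(f"fp16: {fp16_mib:.0f} MiB, ternary: {ternary_mib:.0f} MiB")
```

Roughly a 10x reduction in weight memory versus fp16, which is the motivation for the edge-deployment use cases below.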

## Training

- **Dataset:** [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) (~60k stories)
- **Optimizer:** AdamW (betas = (0.9, 0.98))
- **Learning Rate:** 3e-4
- **Scheduler:** OneCycleLR
- **Epochs:** 15
- **Hardware:** Multi-GPU T4 setup (Kaggle)
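The listed settings translate directly into PyTorch. The sketch below uses a stand-in module, and `steps_per_epoch=1000` is a placeholder, since the actual batch count per epoch is not stated here.

```python
import torch

# Stand-in module; the real run optimizes the 132M-parameter model.
model = torch.nn.Linear(768, 768)

# Optimizer settings from the training recipe above.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4, betas=(0.9, 0.98))

# OneCycleLR warms up to max_lr, then anneals over the full run.
# steps_per_epoch=1000 is a placeholder, not a value from this card.
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=3e-4, epochs=15, steps_per_epoch=1000
)

for _ in range(3):  # one optimizer + scheduler step per training batch
    optimizer.step()
    scheduler.step()
```

Note that OneCycleLR steps once per batch, not once per epoch, which is why it needs the total step count up front.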

## Intended Use

This model is intended for research on:
- Efficient Transformers and architecture design.
- Quantization-aware training (QAT) paradigms.
- Deployment of LLMs in resource-constrained or edge environments.

## Limitations

- The model is a research prototype and is not instruction-tuned.
- Pre-training was conducted on a relatively small, narrow-domain dataset (TinyStories), so outputs are limited to simple, story-like text.

## Citation

```bibtex
@misc{nargund2026ternarylmmemoryefficientlanguagemodeling,
  title={TernaryLM: Memory-Efficient Language Modeling via Native 1-Bit Quantization with Adaptive Layer-wise Scaling},
  author={Nisharg Nargund and Priyesh Shukla},
  year={2026},
  eprint={2602.07374},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.07374},
}
```