---
datasets:
- roneneldan/TinyStories
language:
- en
license: apache-2.0
pipeline_tag: text-generation
tags:
- efficient-llm
- quantization
- ternary
- bitnet
- pytorch
- tinystories
- language-modeling
---
# TernaryLM-132M
TernaryLM is a 132M-parameter Transformer language model trained natively with ternary weights {-1, 0, +1} (roughly 1.58 bits of effective precision per weight).
Unlike post-training quantization (PTQ), which quantizes an already-trained full-precision model, TernaryLM learns quantization-aware representations from scratch using straight-through estimators and adaptive per-layer scaling factors.
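The core idea can be sketched in a few lines of PyTorch. The scale choice below (mean absolute weight, as in BitNet b1.58) and the function name are illustrative assumptions; the paper's adaptive layer-wise scaling may be computed differently:

```python
import torch

def ternary_quantize(w: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Map full-precision weights to scale * {-1, 0, +1} with a
    straight-through estimator (STE) so gradients still flow to w."""
    # Per-layer scale; mean |w| is the BitNet-style choice (an assumption here).
    scale = w.abs().mean().clamp(min=eps)
    w_ternary = (w / scale).round().clamp(-1, 1) * scale
    # STE: the forward pass sees w_ternary, the backward pass treats
    # the rounding as the identity and propagates gradients to w.
    return w + (w_ternary - w).detach()
```

During training, a linear layer would apply `ternary_quantize(self.weight)` in its forward pass; at inference the ternary weights and one scale per layer can be stored directly, which is where the memory savings come from.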
## Resources
- Paper: TernaryLM: Memory-Efficient Language Modeling via Native 1.5-Bit Quantization with Adaptive Layer-wise Scaling
- GitHub Repository: 1nisharg/TernaryLM-Memory-Efficient-Language-Modeling
## Architecture
- Parameters: 132M
- Layers: 12
- Hidden Size: 768
- Attention Heads: 12
- Context Length: 512
- Quantization: Native Ternary Training
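For orientation, the architecture above corresponds to a configuration like the following (field names are illustrative and not the repository's actual API; vocabulary size is not stated on this card and is omitted):

```python
from dataclasses import dataclass

@dataclass
class TernaryLMConfig:
    # Values taken from the Architecture section above.
    n_layers: int = 12
    n_heads: int = 12
    hidden_size: int = 768
    context_length: int = 512
    quantization: str = "native-ternary"  # trained in ternary from scratch

cfg = TernaryLMConfig()
head_dim = cfg.hidden_size // cfg.n_heads  # 768 / 12 = 64-dimensional heads
```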
## Training
- Dataset: TinyStories (~60k stories)
- Optimizer: AdamW (betas=(0.9, 0.98))
- Learning Rate: 3e-4
- Scheduler: OneCycleLR
- Epochs: 15
- Hardware: Multi-GPU T4 setup (Kaggle)
## Intended Use
Research on:
- Efficient Transformers and architecture design.
- Quantization-aware training (QAT) paradigms.
- Deployment of LLMs in resource-constrained or edge environments.
## Limitations
- The model is a research prototype and is not instruction-tuned.
- Pre-training used a comparatively small corpus (TinyStories), so the model's knowledge and fluency are limited to simple narrative English.
## Citation
```bibtex
@misc{nargund2026ternarylmmemoryefficientlanguagemodeling,
  title={TernaryLM: Memory-Efficient Language Modeling via Native 1-Bit Quantization with Adaptive Layer-wise Scaling},
  author={Nisharg Nargund and Priyesh Shukla},
  year={2026},
  eprint={2602.07374},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2602.07374},
}
```