nielsr (HF Staff) committed
Commit 07484e2 · verified · 1 Parent(s): 389f302

Improve model card: add pipeline tag, move arxiv id, and link to code


Hi! I'm Niels from the Hugging Face community team. This PR improves the model card for TernaryLM-132M by:
- Adding the `pipeline_tag: text-generation` to ensure the model is correctly categorized on the Hub.
- Moving the ArXiv ID from the YAML metadata to the Markdown content (as a link to the paper).
- Adding a link to the official GitHub repository for easier access to the code.
- Refining the Markdown structure for improved readability.

Files changed (1):
  1. README.md (+28 -25)
README.md CHANGED
@@ -1,7 +1,10 @@
-
 ---
-language: en
+datasets:
+- roneneldan/TinyStories
+language:
+- en
 license: apache-2.0
+pipeline_tag: text-generation
 tags:
 - efficient-llm
 - quantization
@@ -10,47 +13,47 @@ tags:
 - pytorch
 - tinystories
 - language-modeling
-datasets:
-- roneneldan/TinyStories
-arxiv: 2602.07374
 ---
 
 # TernaryLM-132M
 
-TernaryLM-132M is a 132M parameter Transformer trained natively using ternary weights {-1, 0, +1}.
+[TernaryLM](https://huggingface.co/papers/2602.07374) is a 132M-parameter Transformer trained natively using ternary weights {-1, 0, +1} (approximately 1.58-bit effective precision).
 
-Unlike post-training quantization methods, this model learns quantized representations during training.
+Unlike post-training quantization (PTQ) methods that quantize pre-trained full-precision models, TernaryLM learns quantization-aware representations from scratch using straight-through estimators and adaptive per-layer scaling factors.
+
+## Resources
+- **Paper:** [TernaryLM: Memory-Efficient Language Modeling via Native 1.5-Bit Quantization with Adaptive Layer-wise Scaling](https://huggingface.co/papers/2602.07374)
+- **GitHub Repository:** [1nisharg/TernaryLM-Memory-Efficient-Language-Modeling](https://github.com/1nisharg/TernaryLM-Memory-Efficient-Language-Modeling)
 
 ## Architecture
 
-- Parameters: 132M
-- Layers: 12
-- Hidden Size: 768
-- Attention Heads: 12
-- Context Length: 512
-- Quantization: Native Ternary Training
+- **Parameters:** 132M
+- **Layers:** 12
+- **Hidden Size:** 768
+- **Attention Heads:** 12
+- **Context Length:** 512
+- **Quantization:** Native Ternary Training
 
 ## Training
 
-- Dataset: TinyStories (~60k stories)
-- Optimizer: AdamW (betas=(0.9, 0.98))
-- LR: 3e-4
-- Scheduler: OneCycleLR
-- Epochs: 15
-- Hardware: Multi-GPU T4 setup (Kaggle)
+- **Dataset:** [TinyStories](https://huggingface.co/datasets/roneneldan/TinyStories) (~60k stories)
+- **Optimizer:** AdamW (betas=(0.9, 0.98))
+- **Learning Rate:** 3e-4
+- **Scheduler:** OneCycleLR
+- **Epochs:** 15
+- **Hardware:** Multi-GPU T4 setup (Kaggle)
 
 ## Intended Use
 
 Research on:
-- Efficient Transformers
-- Quantization-aware training
-- Edge deployment
+- Efficient Transformers and architecture design.
+- Quantization-aware training (QAT) paradigms.
+- Deployment of LLMs in resource-constrained or edge environments.
 
 ## Limitations
 
-- Not instruction-tuned
-- Limited dataset scale
-- Research prototype
+- The model is a research prototype and is not instruction-tuned.
+- Pre-training was conducted on a relatively small dataset scale (TinyStories).
 
 ## Citation
 
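As context for the card's "Native Ternary Training" entry, the kind of quantizer it describes can be sketched in a few lines. This is a minimal illustration only: the `0.7 * mean(|w|)` threshold and mean-of-absolutes scale follow common ternary-weight-network heuristics, and the function name and `threshold_factor` parameter are assumptions, not TernaryLM's confirmed implementation.

```python
import numpy as np

def ternary_quantize(w: np.ndarray, threshold_factor: float = 0.7):
    """Project a weight tensor onto {-1, 0, +1} with an adaptive per-layer scale.

    The 0.7 * mean(|w|) threshold is a classic ternary-weight-network
    heuristic; TernaryLM's exact scheme may differ (assumption).
    """
    delta = threshold_factor * np.abs(w).mean()          # sparsity threshold
    t = np.where(w > delta, 1.0, np.where(w < -delta, -1.0, 0.0))
    nz = t != 0
    alpha = np.abs(w[nz]).mean() if nz.any() else 0.0    # adaptive per-layer scale
    return alpha * t, t, alpha

# In quantization-aware training, the forward pass uses the quantized weights,
# while gradients bypass the non-differentiable rounding via the
# straight-through estimator: w_q = w + stop_gradient(quantize(w) - w).
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
w_q, t, alpha = ternary_quantize(w)
print(sorted(set(t.ravel().tolist())), float(alpha) > 0.0)
```

Packed at log2(3) ≈ 1.585 bits per weight, 132M ternary parameters occupy roughly 26 MB, versus about 264 MB in fp16 (ignoring scales, embeddings, and activations).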