Improve model card: Add `library_name`, `pipeline_tag`, paper, and GitHub links
This PR improves the model card for `LRC-4B-Base` by:
- Adding `pipeline_tag: text-generation` to properly categorize the model's primary function on the Hub.
- Adding `library_name: transformers` to correctly reflect its compatibility with the Hugging Face Transformers library, which enables the automated "how to use" widget.
- Integrating the official Hugging Face paper link: [A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone](https://huggingface.co/papers/2505.12781) at the top of the model card.
- Adding a direct link to the official GitHub repository (`https://github.com/CURRENTF/LowRankClone`) for easier access to the codebase.
- Removing the redundant `
README.md CHANGED:

```diff
@@ -1,14 +1,21 @@
 ---
-license: mit
 datasets:
 - teknium/OpenHermes-2.5
 - HuggingFaceTB/smollm-corpus
 - mlfoundations/dclm-baseline-1.0
+license: mit
+pipeline_tag: text-generation
+library_name: transformers
 ---
+
 # Model Card for LRC-4B-Base
 
+This repository hosts the **LRC-4B-Base** model, introduced in the paper [A Token is Worth over 1,000 Tokens: Efficient Knowledge Distillation through Low-Rank Clone](https://huggingface.co/papers/2505.12781).
+
 LRC-4B-Base is a Small Language Model (SLM) with approximately 4 billion parameters. It is the base pre-trained version, developed using the **Low-Rank Clone (LRC)** method, before any Supervised Fine-Tuning (SFT). The LRC method is an efficient knowledge distillation technique designed to construct SLMs that aspire to behavioral equivalence with larger, more powerful teacher models. This model was distilled from **Qwen2.5-7B-Instruct**.
 
+The official codebase is available at: [https://github.com/CURRENTF/LowRankClone](https://github.com/CURRENTF/LowRankClone)
+
 The LRC approach trains a set of low-rank projection matrices that enable soft pruning by compressing teacher weights and an "activation clone" mechanism that aligns student activations (including FFN signals) with those of the teacher. LRC-4B-Base was trained on **18 billion tokens**, demonstrating significant training efficiency compared to models trained on trillions of tokens.
 
 ## Uses
```
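For context, the added `pipeline_tag: text-generation` and `library_name: transformers` metadata correspond to standard Transformers text-generation usage. The snippet below is a minimal, untested sketch of that usage, not text from the model card: the repository id is a placeholder to be replaced with this model's actual Hub id, and the prompt and generation settings are arbitrary.

```python
# Minimal usage sketch (assumed, not from the model card).
# "your-org/LRC-4B-Base" is a placeholder repository id; substitute the real Hub repo id.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/LRC-4B-Base"  # placeholder

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# LRC-4B-Base is a base (non-SFT) model, so plain text continuation is the natural usage.
inputs = tokenizer("Low-Rank Clone is a distillation method that", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```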
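The README paragraph above names two components of LRC: trainable low-rank projections that compress teacher weights (soft pruning by compression) and an "activation clone" loss that aligns student activations with the teacher's. The toy PyTorch fragment below only makes those two ideas concrete in isolation; the hidden sizes, projection form, and MSE loss are illustrative assumptions, not the paper's actual formulation (see the linked GitHub repository for the real implementation).

```python
# Toy illustration only (not the official LRC code; shapes and loss chosen for demonstration).
import torch
import torch.nn as nn
import torch.nn.functional as F

d_teacher, d_student = 3584, 2048                      # hypothetical hidden sizes (teacher > student)

# A trainable low-rank projection compresses a frozen teacher weight matrix into a
# student-sized one, i.e. "soft pruning" by compression rather than dropping rows/columns.
proj = nn.Parameter(torch.empty(d_student, d_teacher).normal_(std=0.02))
teacher_W = torch.randn(d_teacher, d_teacher)          # frozen teacher weight
student_W = proj @ teacher_W @ proj.t()                # (d_student, d_student) compressed weight

# The "activation clone" term aligns student activations with the teacher's, here via MSE
# after projecting the teacher's hidden states down to the student's width.
teacher_h = torch.randn(4, 16, d_teacher)              # (batch, seq, d_teacher)
student_h = torch.randn(4, 16, d_student)              # (batch, seq, d_student)
clone_loss = F.mse_loss(student_h, teacher_h @ proj.t())
```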