Improve model card: Add metadata and GitHub link
This PR enhances the model card for the `Bochkov/bvv241-max` tokenizer by:
- Adding `license: apache-2.0`, `library_name: transformers`, and `pipeline_tag: feature-extraction` to the YAML metadata. This improves discoverability on the Hugging Face Hub and ensures the correct "how to use" widget appears for the tokenizer.
- Adding a link to the associated research paper for easy reference.
- Including a direct link to the GitHub repository where the code and research resources are hosted.
These changes give users more comprehensive information and better integrate the model into the Hugging Face ecosystem.
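The metadata fields this PR adds live in the YAML front matter between the leading `---` markers of `README.md`. As a minimal sketch, the block below parses that front matter with only the standard library and checks the three new keys; `parse_front_matter` is a hypothetical helper written for illustration, not part of the repository.

```python
# Hypothetical helper: extract the simple "key: value" pairs from a model
# card's YAML front matter (the block between the two leading '---' lines).
# Stdlib only; handles flat scalar keys, which is all this PR adds.

def parse_front_matter(readme_text):
    """Return a dict of the key: value pairs in the front-matter block."""
    lines = readme_text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("model card must start with '---'")
    meta = {}
    for line in lines[1:]:
        if line.strip() == "---":  # closing marker ends the block
            return meta
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    raise ValueError("unterminated front matter")

# The front matter as added by this PR:
card = """---
license: apache-2.0
library_name: transformers
pipeline_tag: feature-extraction
---

# bvv241-max
"""

meta = parse_front_matter(card)
print(meta)
# {'license': 'apache-2.0', 'library_name': 'transformers', 'pipeline_tag': 'feature-extraction'}
```

The Hub reads exactly these keys to display the license badge, pick the library-specific usage snippet, and choose the inference widget, which is why the PR sets all three.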
README.md (changed):

````diff
--- a/README.md
+++ b/README.md
@@ -1,11 +1,15 @@
 ---
-
-
-
+license: apache-2.0
+library_name: transformers
+pipeline_tag: feature-extraction
 ---
 
 # bvv241-max: Unified Unicode Tokenizer (SOTA Intersection) with Frozen Embeddings
 
+This model was presented in the paper [Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate](https://arxiv.org/abs/2507.07129).
+
+Code: [https://github.com/Bochkov/BVV241](https://github.com/Bochkov/BVV241)
+
 ## Tokenizer Description
 
 <!-- Provide a longer summary of what this model is. -->
@@ -76,4 +80,4 @@ If you use this model or the underlying concepts in your research, please cite o
 }
 ```
 
-This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs — a step toward modular, fusable, multilingual LMs.
+This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs — a step toward modular, fusable, multilingual LMs.
````