Update README.md
README.md CHANGED
@@ -3,16 +3,15 @@ license: apache-2.0
 library_name: transformers
 pipeline_tag: feature-extraction
 ---
-
 # bvv241-max: Unified Unicode Tokenizer (SOTA Intersection) with Frozen Embeddings
 
-
+## Tokenizer Description
 
-
+This repository contains the tokenizer and associated resources from the papers:
 
-
+[📄 Paper (Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations)](https://huggingface.co/papers/2507.04886) -
 
-
+[💻 Code](https://github.com/AVBochkov/Embeddings)
 
 This tokenizer is based on a hybrid vocabulary:
 
@@ -35,23 +34,15 @@ No training or adaptation; suitable for plug-and-play use in research on embeddi
 ## How to Get Started with the Tokenizer
 
 ```python
-
 from transformers import AutoTokenizer
-
 from huggingface_hub import hf_hub_download
-
 import torch
-
 tokenizer = AutoTokenizer.from_pretrained('Bochkov/bvv241-max')
-
-
 emb_path = hf_hub_download(
     repo_id="Bochkov/bvv241-max",
     filename="normalized_embeddings_weights.pt"
 )
-
 embeddings = torch.load(emb_path)
-
 ```
 
 ## 🧑‍🔬 Citation & Concept
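A minimal usage sketch continuing the snippet above (an illustration, not part of the model card): it assumes the loaded file is a single 2-D tensor whose first dimension is indexed by token id, so the frozen vectors for a string can be gathered directly.

```python
# Hypothetical continuation of the model-card snippet (assumption: `embeddings`
# is a 2-D tensor with one row per vocabulary id, aligned with the tokenizer).
ids = tokenizer("Hello, world!", return_tensors="pt")["input_ids"][0]
vectors = embeddings[ids]  # gather frozen embedding rows by token id
print(tokenizer.convert_ids_to_tokens(ids.tolist()))
print(vectors.shape)  # (sequence_length, embedding_dim)
```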
@@ -68,7 +59,6 @@ If you use this model or the underlying concepts in your research, please cite o
       primaryClass={cs.CL},
       url={https://arxiv.org/abs/2507.04886},
 }
-
 @misc{bochkov2025growingtransformersmodularcomposition,
       title={Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate},
       author={A. Bochkov},
@@ -80,4 +70,4 @@ If you use this model or the underlying concepts in your research, please cite o
 }
 ```
 
-
+This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs: a step toward modular, fusable, multilingual LMs.
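Since the weights file is named `normalized_embeddings_weights.pt`, a quick sanity check (an assumption to verify, not a documented guarantee) is that the row norms of the loaded tensor come out close to 1:

```python
# Sanity-check sketch, not from the model card: if the vectors are stored
# unit-normalized, every row norm should be approximately 1.0.
norms = embeddings.float().norm(dim=-1)
print(norms.min().item(), norms.max().item())
```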