---
license: apache-2.0
library_name: transformers
pipeline_tag: feature-extraction
---

# bvv241-max: Unified Unicode Tokenizer (SOTA Intersection) with Frozen Embeddings

## Tokenizer Description

This repository contains the tokenizer and associated resources from the papers:

[📚 Paper (Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations)](https://huggingface.co/papers/2507.04886)

[💻 Code](https://github.com/AVBochkov/Embeddings)

This tokenizer is based on a hybrid vocabulary.
No training or adaptation is needed; the tokenizer is suitable for plug-and-play use in research on embeddings.
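As a quick, hedged illustration of the plug-and-play claim, the snippet below inspects the vocabulary and round-trips a mixed-script string. The sample text is an illustrative assumption, not an example from this card:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('Bochkov/bvv241-max')

# Total number of entries in the hybrid vocabulary.
print(len(tokenizer))

# Mixed Latin / Cyrillic / CJK text to exercise the Unicode coverage.
sample = "Hello мир 你好"
ids = tokenizer.encode(sample)

# Decoding should recover the input when coverage is complete; exact
# equality may still depend on the tokenizer's normalization rules.
print(ids)
print(tokenizer.decode(ids))
```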
## How to Get Started with the Tokenizer

```python
from transformers import AutoTokenizer
from huggingface_hub import hf_hub_download
import torch

# Load the tokenizer from the Hub.
tokenizer = AutoTokenizer.from_pretrained('Bochkov/bvv241-max')

# Download the frozen, pre-normalized embedding matrix shipped with this repo.
emb_path = hf_hub_download(
    repo_id="Bochkov/bvv241-max",
    filename="normalized_embeddings_weights.pt"
)
embeddings = torch.load(emb_path)
```
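Once both pieces are loaded, token vectors can be looked up by id. A minimal sketch continuing from the snippet above, assuming `embeddings` is a `[vocab_size, hidden_dim]` tensor whose rows are indexed by token id (the filename suggests row-normalized vectors, but both the layout and the normalization are assumptions, not documented here):

```python
# Assumption: row i of `embeddings` is the vector for token id i.
ids = tokenizer("frozen substrate", return_tensors="pt")["input_ids"][0]
vectors = embeddings[ids]  # shape: [seq_len, hidden_dim]
print(vectors.shape)

# If rows are L2-normalized, cosine similarity reduces to a dot product.
print(vectors @ vectors.T)
```

Note that `torch.load` deserializes pickled data, so only load weights from repositories you trust.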
## 🧑‍🔬 Citation & Concept
If you use this model or the underlying concepts in your research, please cite our work:

```bibtex
@misc{bochkov2025emergentsemanticstokenembeddings,
      title={Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations},
      author={A. Bochkov},
      year={2025},
      eprint={2507.04886},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2507.04886},
}

@misc{bochkov2025growingtransformersmodularcomposition,
      title={Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate},
      author={A. Bochkov},
      year={2025},
      eprint={2507.07129},
      archivePrefix={arXiv},
      url={https://arxiv.org/abs/2507.07129},
}
```
This work demonstrates that transformer blocks, not token embeddings, carry the semantic burden in LLMs: a step toward modular, fusable, multilingual LMs.
 