---
license: apache-2.0
tags:
  - text-generation
  - causal-lm
  - transformer
  - research
  - interpretability
  - multilingual
  - unicode
  - frozen-embeddings
  - ablation
language:
  - multilingual
library_name: transformers
pipeline_tag: text-generation
---

# Emergent Semantics – Model_1024_FLOAT (335M)

This repository provides **Model_1024_FLOAT (335M)**, an **ablation model** from the papers:

[📚 Paper (Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations)](https://huggingface.co/papers/2507.04886)

[📚 Paper (Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate)](https://huggingface.co/papers/2507.07129)

[📚 Blog Article](https://huggingface.co/blog/Bochkov/emergent-semantics-beyond-token-embeddings)

This checkpoint is designed to isolate the effect of **float-valued / normalized frozen embeddings** versus **binary frozen embeddings**, while keeping the Transformer backbone and training setup the same.

---

## What this ablation is

**Model_1024_FLOAT** uses a frozen embedding table where:

- **`n_embed = 1024`** (embedding dimensionality equals `d_model`)
- Each token embedding is a **float vector**
- The embedding vectors are derived from a **random (non-semantic) codebook** and then **normalized** (e.g., L2 normalization) to control scale
- The embedding weights are **frozen** (`requires_grad=False`) for the entire training run
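
For concreteness, here is a minimal sketch of how such a table can be constructed (PyTorch assumed; the shapes match this card, but the codebook recipe and the choice of L2 normalization are illustrative, not the paper's verbatim code):

```python
import torch
import torch.nn as nn

vocab_size, n_embed = 65_536, 1024  # as in this checkpoint

# Random (non-semantic) float codebook, L2-normalized row-wise
# so every token vector has the same controlled scale.
codebook = torch.randn(vocab_size, n_embed)
codebook = codebook / codebook.norm(dim=-1, keepdim=True)

embedding = nn.Embedding(vocab_size, n_embed)
with torch.no_grad():
    embedding.weight.copy_(codebook)
embedding.weight.requires_grad = False  # frozen for the entire training run
```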

This model is part of an ablation series that tests whether differences in training dynamics and downstream reasoning come from:
- semantic structure in the embeddings (hypothesis: it is not required),
- *or simply* from numeric properties such as dtype, scale, and normalization.

---

## Relation to other models in the collection

- Compared to **Model_1024_BIT (335M)**:
  - Same backbone (`d_model=1024`, 16 layers, 32 heads, RoPE, GELU)
  - Same embedding dimensionality (`n_embed=1024`)
  - Difference is the embedding representation:
    - **1024_BIT:** frozen random **binary** vectors
    - **1024_FLOAT:** frozen random **float** vectors with **normalization**

- Compared to **Model_UNI_GLYPH (335M)**:
  - Same embedding dimensionality and frozen setup
  - UNI_GLYPH embeddings come from glyph-rendering + PCA; here embeddings are random and intended to be non-semantic

- Compared to **Model_unfrozen (335M)**:
  - Same architecture
  - Here embeddings are frozen; in the baseline they are trainable

Because `n_embed=1024`, this model is in the same **parameter-count class (~335M)** as UNI_GLYPH and the unfrozen baseline.

---

## Model summary

- **Architecture:** decoder-only Transformer (GPT-like)
- **Hidden size (`d_model`):** 1024  
- **Layers:** 16  
- **Heads:** 32  
- **Positional encoding:** rotary embeddings  
- **Activation:** GELU  
- **Vocabulary size:** 65,536
- **Tokenizer:** compatible with `Bochkov/bvv241-2-3`
- **Input embeddings:** frozen, random **float**, **normalized**, `n_embed=1024`
- **Output head:** **not tied** to the input embeddings (trained separately)
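
As a rough sanity check on the ~335M figure, the shapes above nearly account for it (a back-of-the-envelope sketch; the 4x MLP expansion is an assumption not stated on this card, and LayerNorm/bias terms are ignored):

```python
d_model, n_layers, vocab = 1024, 16, 65_536

per_layer = 4 * d_model**2 + 8 * d_model**2  # attention Q,K,V,O + assumed 4x MLP
blocks = n_layers * per_layer                # ~201M
embed = vocab * d_model                      # frozen input table, ~67M
head = vocab * d_model                       # untied output head, ~67M

print(f"{(blocks + embed + head) / 1e6:.1f}M")  # 335.5M -> the ~335M class
```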

---

## Tokenizer

The intended tokenizer is **bvv241-2-3**:

- https://huggingface.co/Bochkov/bvv241-2-3

You can load the tokenizer either from this model repo (if included) or from the standalone tokenizer repo. The key requirement is **exact vocab alignment**.
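
One way to check that alignment after loading (this assumes the checkpoint exposes the standard `vocab_size` config field; a custom `trust_remote_code` config could name it differently):

```python
from transformers import AutoConfig, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Bochkov/bvv241-2-3")
config = AutoConfig.from_pretrained(
    "Bochkov/emergent-semantics-model-1024-float-335m", trust_remote_code=True
)

# Both sides should report the same 65,536-entry vocabulary.
assert len(tokenizer) == config.vocab_size == 65_536, "tokenizer/model vocab mismatch"
```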

---

## How to use (Transformers)

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Bochkov/emergent-semantics-model-1024-float-335m")
model = AutoModelForCausalLM.from_pretrained(
    "Bochkov/emergent-semantics-model-1024-float-335m",
    trust_remote_code=True,
).to("cuda")

# Encode the prompt as token IDs (batch of one).
inputs = torch.tensor(
    [tokenizer.encode("Question: What is the capital of Japan?\nAnswer:")],
    dtype=torch.long,
    device="cuda",
)

# Greedy decoding, 10 new tokens.
outputs = model.generate(inputs, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(outputs[0].tolist()))

# Question: What is the capital of Japan?
# Answer:Tokyo Metropolitan
```
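
Reusing `model` from the snippet above, the checkpoint's defining properties can be inspected directly (a sketch; `get_input_embeddings()` is the standard transformers accessor, and note that `requires_grad` after `from_pretrained` reflects loading defaults rather than the training-time freeze):

```python
emb = model.get_input_embeddings().weight

print(tuple(emb.shape))                # expected: (65536, 1024)
print(emb.norm(dim=-1).mean().item())  # roughly constant row norms if normalized
print(emb.norm(dim=-1).std().item())   # near 0 for row-wise L2 normalization
```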

---

## Intended use

Research-only checkpoint intended for:

- Studying **emergent semantics** with a frozen random float codebook
- Isolating the impact of **normalization / vector scale** in frozen embeddings
- Comparisons against **1024_BIT** and **UNI_GLYPH** under identical backbone/training conditions

Not intended for production deployment (no safety/instruction tuning).

---

## Related links

- **Model collection (paper artifacts):**  
  https://huggingface.co/collections/Bochkov/emergent-semantics-beyond-token-embeddings
- **UNI_GLYPH model (frozen visual glyph embeddings):**  
  https://huggingface.co/Bochkov/emergent-semantics-model-uni-glyph-335m
- **1024_BIT model (binary random frozen embeddings):**  
  https://huggingface.co/Bochkov/emergent-semantics-model-1024-bit-335m
- **Tokenizer:**  
  https://huggingface.co/Bochkov/bvv241-2-3
- **Code (GitHub):**  
  https://github.com/AVBochkov/Embeddings

---

## 🧑‍🔬 Citation & Concept
If you use this model or the underlying concepts in your research, please cite our work:
```
@article{bochkov2025emergent,
      title={Emergent Semantics Beyond Token Embeddings: Transformer {LM}s with Frozen Visual Unicode Representations},
      author={Andrey Bochkov},
      journal={Transactions on Machine Learning Research},
      issn={2835-8856},
      year={2025},
      url={https://openreview.net/forum?id=Odh8IynO1o}
}
@misc{bochkov2025growingtransformersmodularcomposition,
      title={Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate}, 
      author={A. Bochkov},
      year={2025},
      eprint={2507.07129},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2507.07129}, 
}
```