Emergent Semantics – Model_256_BIT (285M)
This repository provides Model_256_BIT (285M), an ablation model from the paper *Emergent Semantics Beyond Token Embeddings: Transformer LMs with Frozen Visual Unicode Representations* (TMLR 2025).
This checkpoint is designed to test whether a Transformer can learn semantics when the input embedding layer is frozen and carries no linguistic or visual information, only a stable per-token identifier.
Key idea (what this ablation tests)
Model_256_BIT uses a frozen embedding table with:
- Embedding dimension: n_embed = 256
- Components: binary (0/1)
- Initialization: randomly generated, but collision-free by construction
  - i.e., each token is assigned a unique random 256-bit ID
  - the "unique ID per token" property is guaranteed by the generation procedure, not merely by probability (see the sketch below)
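As a concrete illustration, here is a minimal sketch of how such a collision-free binary table could be generated (a hypothetical procedure; the paper's exact generation code may differ):

```python
import torch

torch.manual_seed(0)  # illustrative seed, not the one behind the released checkpoint

vocab_size, n_embed = 65536, 256
seen, rows = set(), []
# Draw random 256-bit vectors and reject duplicates, so uniqueness is
# guaranteed by construction rather than left to chance (collisions in a
# 2^256 space are astronomically unlikely, but this rules them out entirely).
while len(rows) < vocab_size:
    bits = torch.randint(0, 2, (n_embed,))
    key = tuple(bits.tolist())
    if key not in seen:
        seen.add(key)
        rows.append(bits)
table = torch.stack(rows).float()  # (65536, 256) binary matrix, frozen at training time
```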
The embedding is not trained (requires_grad=False). It provides a stable token identity code but does not encode glyph shape or semantics.
To match the Transformer hidden size, the 256-dim embedding is expanded to 1024 via a non-trainable repetition: repeat_interleave(4), i.e. 256 * 4 = 1024.
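A minimal sketch of the frozen lookup plus the non-trainable expansion (an illustrative module, not the repo's exact implementation; the attribute name token_embeddings follows the sanity check further below):

```python
import torch
import torch.nn as nn

vocab_size, n_embed, d_model = 65536, 256, 1024
scale = d_model // n_embed  # 4

token_embeddings = nn.Embedding(vocab_size, n_embed)
token_embeddings.weight.data = torch.randint(0, 2, (vocab_size, n_embed)).float()
token_embeddings.weight.requires_grad = False  # frozen: identity code only, never trained

ids = torch.tensor([[65]])                     # e.g. the token id of "A"
e256 = token_embeddings(ids)                   # (1, 1, 256) binary code
e1024 = e256.repeat_interleave(scale, dim=-1)  # (1, 1, 1024): each bit repeated 4x
print(e1024.shape)                             # torch.Size([1, 1, 1024])
```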
Important: parameter count difference (vs 335M models)
This checkpoint has ~285M parameters, while models with a standard n_embed=1024 embedding table (e.g. UNI_GLYPH / unfrozen baselines) are ~335M.
The difference is primarily the embedding table size:
- Standard embedding params: vocab_size * 1024 = 65536 * 1024 ≈ 67.1M
- This model's embedding params: vocab_size * 256 = 65536 * 256 ≈ 16.8M
So the Transformer backbone is the same (layers/heads/d_model), but the total parameter count is lower because the embedding matrix is smaller.
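The arithmetic can be checked directly: 67.1M - 16.8M ≈ 50.3M, which accounts for the ~50M gap between the 335M and 285M checkpoints.

```python
vocab_size = 65536
standard = vocab_size * 1024   # 67,108,864  (~67.1M)
ablation = vocab_size * 256    # 16,777,216  (~16.8M)
print(f"difference: {(standard - ablation) / 1e6:.1f}M")  # 50.3M, i.e. ~335M - ~285M
```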
Model summary
- Architecture: decoder-only Transformer (GPT-like)
- Hidden size (d_model): 1024
- Layers: 16
- Heads: 32
- Positional encoding: rotary embeddings
- Activation: GELU
- Tokenizer / vocab size: 65,536 (bvv241-2-3 compatible)
- Input embeddings: frozen, random collision-free 256-bit IDs (n_embed=256), expanded to 1024 by repetition (non-trainable)
- Output head: not tied to the input embeddings (trained separately)
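For quick reference, these values map onto the configuration keys that the checkpoint reports (see the printout in the sanity check below); a plain-Python mirror:

```python
# Values as reported by model.config for this checkpoint.
config = {
    "vocab_size": 65536,  # bvv241-2-3 tokenizer
    "n_embed": 256,       # width of the frozen binary ID
    "d_model": 1024,      # Transformer hidden size
    "n_layer": 16,
    "n_head": 32,
    "scale": 4,           # d_model // n_embed expansion factor
}
```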
Files in this repo (embedding reference)
This repository includes the frozen embedding definition used by the checkpoint, keeping the model+embedding mapping self-contained. As with the 16-bit model, a readable dump of the mapping is provided:
- embeddings.txt (reference / inspectable mapping): https://huggingface.co/Bochkov/emergent-semantics-model-256-bit-285m/blob/main/embeddings.txt
Note: Embeddings are stored in the model repo (even though the tokenizer exists as a separate HF repo) because the embedding tensor is part of the experimental condition and differs across ablations.
Tokenizer
The intended tokenizer is bvv241-2-3 (same vocab size and indexing): https://huggingface.co/Bochkov/bvv241-2-3
You may load the tokenizer either from this model repo (if included) or from the standalone tokenizer repo. The key requirement is exact vocab alignment.
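A quick way to confirm exact vocab alignment between the two sources (a sketch; it assumes the model repo ships its own tokenizer copy):

```python
from transformers import AutoTokenizer

tok_model = AutoTokenizer.from_pretrained("Bochkov/emergent-semantics-model-256-bit-285m")
tok_standalone = AutoTokenizer.from_pretrained("Bochkov/bvv241-2-3")

sample = "Question: What is the capital of Japan?\nAnswer:"
# Identical ids from both tokenizers implies identical vocab indexing.
assert tok_model.encode(sample) == tok_standalone.encode(sample)
print("vocab sizes match:", tok_model.vocab_size == tok_standalone.vocab_size)
```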
How to use (Transformers)
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Bochkov/emergent-semantics-model-256-bit-285m")
model = AutoModelForCausalLM.from_pretrained(
    "Bochkov/emergent-semantics-model-256-bit-285m",
    trust_remote_code=True
).to('cuda')

inputs = torch.tensor(
    [tokenizer.encode("Question: What is the capital of Japan?\nAnswer:")],
    dtype=torch.long, device='cuda'
)
outputs = model.generate(
    inputs,
    max_new_tokens=10,
    do_sample=False
)
print(tokenizer.decode(outputs[0].tolist()))
# Question: What is the capital of Japan?
# Answer:Tokyo (Japan)
```
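Greedy decoding (do_sample=False) is used here for a deterministic sanity check; for exploratory generation, standard transformers arguments such as do_sample=True and temperature can be passed to model.generate (values would be illustrative, not tuned for this checkpoint).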
Verify the 256-bit frozen binary embeddings (sanity check)
The model uses a frozen nn.Embedding of shape (vocab_size=65536, n_embed=256) whose values are strictly binary (0/1). Each 256-dim vector is then deterministically expanded to d_model=1024 via repeat_interleave with scale=4.
```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "Bochkov/emergent-semantics-model-256-bit-285m"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

print("vocab_size:", tokenizer.vocab_size)
print("config:", {k: getattr(model.config, k) for k in ["vocab_size", "n_embed", "d_model", "n_layer", "n_head", "scale"]})

# --- 1) Show embedding matrix shape (should be 65536 x 256) ---
W = model.token_embeddings.weight.detach().cpu()
print("token_embeddings.weight shape:", tuple(W.shape))  # (65536, 256)

# --- 2) Tokenize 'A' and show its token id ---
text = "A"
ids = tokenizer.encode(text, add_special_tokens=False)
tokens = tokenizer.convert_ids_to_tokens(ids)
print(f"text={text!r}")
print("ids:", ids)
print("tokens:", tokens)
tid = ids[0]

# --- 3) Print the 256-dim vector and verify it is binary (0/1) ---
e256 = W[tid]  # shape: (256,)
print("256-dim embedding for token id", tid, ":", e256.tolist())
uniq = torch.unique(e256)
print("unique values in e256", uniq.tolist())
is_binary = torch.all((e256 == 0) | (e256 == 1)).item()
print("is strictly binary (0/1):", is_binary)

# --- 4) Show deterministic expansion to d_model=1024 via repeat_interleave ---
scale = model.config.scale  # should be 1024 // 256 = 4
e1024 = e256.repeat_interleave(scale)  # shape: (1024,)
print("expanded embedding shape:", tuple(e1024.shape))
print("expanded embedding first 512 values:", e1024[:512].tolist())

# --- 5) Global check: all embedding weights are exactly 0/1 ---
is_binary_global = torch.all((W == 0) | (W == 1)).item()
num_non_binary = torch.numel(W) - torch.sum((W == 0) | (W == 1)).item()
print("is binary globally (0/1):", is_binary_global)
print("non-binary entries:", int(num_non_binary))
```
Expected output highlights (example):
- vocab_size: 65536
- config: {'vocab_size': 65536, 'n_embed': 256, 'd_model': 1024, 'n_layer': 16, 'n_head': 32, 'scale': 4}
- token_embeddings.weight shape: (65536, 256)
- text='A'
- ids: [65]
- tokens: ['A']
- 256-dim embedding for token id 65 : [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
- unique values in e256 [0.0, 1.0]
- is strictly binary (0/1): True
- expanded embedding shape: (1024,)
- expanded embedding first 512 values: [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
- is binary globally (0/1): True
- non-binary entries: 0
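Beyond binarity, the collision-free claim itself can be verified by checking that all 65,536 rows are pairwise distinct (a sketch reusing the loading code from above):

```python
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "Bochkov/emergent-semantics-model-256-bit-285m", trust_remote_code=True
)
W = model.token_embeddings.weight.detach().cpu()

# If the number of unique rows equals the vocab size, every token
# carries a distinct 256-bit ID, i.e. there are no collisions.
n_unique = torch.unique(W, dim=0).shape[0]
print("unique rows:", n_unique, "of", W.shape[0])
assert n_unique == W.shape[0], "collision detected"
```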
Intended use
This model is intended for research only, especially for:
- Comparing against Model_UNI_GLYPH (frozen glyph/PCA embeddings) and against trainable-embedding baselines
- Testing whether semantic structure can emerge when the embedding layer is frozen and random (but uniquely identifies tokens)
- Studying optimization behavior when embeddings do not absorb semantic gradients
Not intended for production deployment (no instruction tuning, safety tuning, or factuality guarantees).
Related links
- Model collection (paper artifacts): https://huggingface.co/collections/Bochkov/emergent-semantics-beyond-token-embeddings
- UNI_GLYPH main model (frozen visual glyph embeddings): https://huggingface.co/Bochkov/emergent-semantics-model-uni-glyph-335m
- Tokenizer: https://huggingface.co/Bochkov/bvv241-2-3
- Code (GitHub): https://github.com/AVBochkov/Embeddings
🧑‍🔬 Citation & Concept
If you use this model or the underlying concepts in your research, please cite our work:
```bibtex
@article{bochkov2025emergent,
  title={Emergent Semantics Beyond Token Embeddings: Transformer {LM}s with Frozen Visual Unicode Representations},
  author={Andrey Bochkov},
  journal={Transactions on Machine Learning Research},
  issn={2835-8856},
  year={2025},
  url={https://openreview.net/forum?id=Odh8IynO1o}
}

@misc{bochkov2025growingtransformersmodularcomposition,
  title={Growing Transformers: Modular Composition and Layer-wise Expansion on a Frozen Substrate},
  author={A. Bochkov},
  year={2025},
  eprint={2507.07129},
  archivePrefix={arXiv},
  primaryClass={cs.LG},
  url={https://arxiv.org/abs/2507.07129}
}
```