---
license: apache-2.0
library_name: transformers
tags:
- causal-lm
- text-generation
- transformer
- decoder-only
- fixed-embeddings
- binary-token-codes
- research
language:
- en
---

# Fixed Minimal Binary Code Model

Research checkpoint for the paper:

**Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes**

## Model variant

This repository contains the **fixed minimal binary token-code model**.

Instead of a trainable input embedding table, each token ID is represented by its fixed minimal binary code.

For a vocabulary size of

```text
V = 65,536
```

the minimal injective binary code width is

```text
K = ceil(log2(V)) = 16
```

The 16-dimensional binary code is tiled to the model width of 1024 (64 copies of the 16-bit code).
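
As an illustration, here is a minimal sketch of how such a fixed binary-code embedding can be built with PyTorch. The function name, bit order, scaling, and tiling layout are assumptions made for this sketch; the checkpoint's own remote code may differ.

```python
import torch

V = 65_536      # vocabulary size
K = 16          # ceil(log2(V)): minimal injective code width
D_MODEL = 1024  # model width; 1024 / 16 = 64 copies of the 16-bit code

def fixed_binary_embedding(token_ids: torch.Tensor) -> torch.Tensor:
    """Map integer token IDs to fixed (non-trainable) binary-code vectors."""
    bits = torch.arange(K, device=token_ids.device)
    # Extract the K bits of each token ID -> shape (..., K), values in {0, 1}
    codes = (token_ids.unsqueeze(-1) >> bits) & 1
    # Tile the K-bit code across the model width: (..., K) -> (..., D_MODEL)
    return codes.repeat_interleave(D_MODEL // K, dim=-1).float()

ids = torch.tensor([[0, 1, 65_535]])
print(fixed_binary_embedding(ids).shape)  # torch.Size([1, 3, 1024])
```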

The model therefore uses:

```text
0 trainable input-embedding parameters
```

The output projection remains standard and trainable.
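
For scale, a one-line calculation (using only the numbers stated above) of how many parameters a conventional trainable input embedding table would otherwise contribute; the untied output projection keeps its own trainable weights of a similar order:

```python
V = 65_536       # vocabulary size
D_MODEL = 1024   # model width

# A standard trainable input embedding table would hold one 1024-dim vector per token.
saved_input_embedding_params = V * D_MODEL
print(f"{saved_input_embedding_params:,}")  # 67,108,864
```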

## Architecture

- decoder-only Transformer
- vocabulary size: 65,536
- model width: 1024
- number of layers: 32
- number of attention heads: 32
- context length: 1024
- rotary positional embeddings
- GELU activations
- untied trainable output projection
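
For quick reference, the same hyperparameters collected into a plain Python dictionary; the key names are illustrative and are not the checkpoint's actual config schema:

```python
# Illustrative summary of the architecture listed above.
# Key names are assumptions, not the repository's real config keys.
ARCH = {
    "vocab_size": 65_536,
    "d_model": 1024,
    "n_layers": 32,
    "n_heads": 32,                       # per-head dimension: 1024 / 32 = 32
    "context_length": 1024,
    "positional_encoding": "rotary",
    "activation": "gelu",
    "tie_word_embeddings": False,        # untied, trainable output projection
    "trainable_input_embedding": False,  # fixed 16-bit binary codes, tiled to width 1024
}
```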

## Loading example

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "Bochkov/llm-fix-min-fixed-minimal-binary-code"

# The checkpoint ships custom modeling code, hence trust_remote_code=True.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

prompt = "Question: What is the capital of France?\nAnswer:"
input_ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)

# Greedy decoding of a few tokens; no gradients are needed at inference time.
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=3, do_sample=False)

print(tokenizer.decode(output_ids[0].tolist()))
```

## Intended use

This checkpoint is provided for reproducibility of the paper's main claim: a trainable input embedding table is not necessary for useful language modeling in the studied regime.

## Limitations

This model is a research checkpoint. It is not intended for deployment. It may produce incorrect, biased, unsafe, or nonsensical outputs.

## Training data

The model was trained on the same FineWeb-Edu + Cosmopedia mixture used for the matched comparisons in the paper. Dataset terms and licenses are those of the original datasets.

---

## 🧑‍🔬 Citation & Concept

If you use this model or the underlying concepts in your research, please cite our work:

```bibtex
@misc{bochkov2026languagemodelstrainableinput,
      title={Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes},
      author={A. Bochkov},
      year={2026},
      eprint={2605.09751},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.09751}
}
```