Learned Input Table Model Classic

Research checkpoint for the paper:

Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes

Model variant

This repository contains the learned input table baseline.
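For contrast: the paper's main variant removes this table in favor of fixed minimal binary token codes. The exact code construction is defined in the paper; as an illustration only, a minimal sketch in pure Python, assuming the codes are plain 16-bit binary expansions of token ids (16 = log2(65,536) is the minimal code length for this vocabulary):

```python
import math

VOCAB_SIZE = 65_536
N_BITS = int(math.log2(VOCAB_SIZE))  # 16: minimal bits to index the vocabulary

def binary_code(token_id: int, n_bits: int = N_BITS) -> list[int]:
    """Fixed, parameter-free code for a token id: its n-bit binary expansion.
    Illustrative assumption only -- the paper defines the actual codes."""
    return [(token_id >> i) & 1 for i in reversed(range(n_bits))]

print(binary_code(5))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1]
```

Unlike the trainable table in this baseline, such fixed codes contribute zero trainable input parameters.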

The model is a 32-layer decoder-only Transformer with:

  • vocabulary size: 65,536
  • model width: 1024
  • number of layers: 32
  • number of attention heads: 32
  • context length: 1024
  • rotary positional embeddings
  • GELU activations
  • untied trainable output projection

This baseline uses a standard trainable input embedding table of size:

65,536 × 1,024 = 67,108,864 trainable input parameters
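The count above is simply vocabulary size times model width (one 1024-dimensional vector per token); a one-line sanity check:

```python
# Input-table parameter count: one d_model-dimensional vector per vocab entry.
vocab_size = 65_536  # tokens (from the spec above)
d_model = 1_024      # model width

print(vocab_size * d_model)  # 67108864
```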

Intended use

This checkpoint is provided for reproducibility of the paper's controlled comparison. It is intended for research use only.

Loading example

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "Bochkov/llm-fix-min-baseline-learned-input-table-model-classic"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

prompt = "Question: What is the capital of the United Kingdom?\nAnswer:"
input_ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)

# Greedy decoding (do_sample=False); gradients are not needed at inference.
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=3, do_sample=False)

print(tokenizer.decode(output_ids[0].tolist()))

Limitations

This is a small research language model trained for architectural comparison. It is not instruction-tuned or safety-aligned, and it should not be deployed as a production system.

Training data

The model was trained on the same FineWeb-Edu + Cosmopedia mixture used for the matched comparisons in the paper. Dataset terms and licenses are those of the original datasets.


πŸ§‘β€πŸ”¬ Citation & Concept

If you use this model or the underlying concepts in your research, please cite our work:

@misc{bochkov2026languagemodelstrainableinput,
      title={Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes}, 
      author={A. Bochkov},
      year={2026},
      eprint={2605.09751},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2605.09751}, 
}
Checkpoint format

safetensors, approximately 0.5B parameters, FP32 tensors.