---
license: apache-2.0
library_name: transformers
tags:
- causal-lm
- text-generation
- transformer
- decoder-only
- research
language:
- en
---

# Learned Input Table Model Classic

This is an anonymized research checkpoint for the paper:

**Language Models Without a Trainable Input Embedding Table: Learning from Fixed Minimal Binary Token Codes**

## Model variant

This repository contains the **learned input table baseline**. The model is a 32-layer decoder-only Transformer with:

- vocabulary size: 65,536
- model width: 1024
- number of layers: 32
- number of attention heads: 32
- context length: 1024
- rotary positional embeddings
- GELU activations
- untied trainable output projection

This baseline uses a standard trainable input embedding table of size:

```text
65,536 x 1024 = 67,108,864 trainable input parameters
```

## Intended use

This checkpoint is provided for anonymous review and for reproducibility of the paper's controlled comparison. It is intended for research use only.

## Loading example

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

repo_id = "E6E831728/learned-input-table-model-classic"

# Load the tokenizer and model, allowing the repository's custom modeling code.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(repo_id, trust_remote_code=True)
model.eval()

prompt = "Question: What is the capital of the United Kingdom?\nAnswer:"
input_ids = torch.tensor([tokenizer.encode(prompt)], dtype=torch.long)

# Greedy decoding of a few tokens.
with torch.no_grad():
    output_ids = model.generate(input_ids, max_new_tokens=3, do_sample=False)

print(tokenizer.decode(output_ids[0].tolist()))
```

## Limitations

This is a small research language model trained for an architectural comparison. It is not instruction-tuned or prepared for safe deployment, and it should not be used as a production system.

## Training data

The model was trained on the same FineWeb-Edu + Cosmopedia mixture used for the matched comparisons in the paper. Dataset terms and licenses are those of the original datasets.
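
## Checking the input embedding size

The snippet below is a minimal sketch for confirming the trainable input embedding table size quoted above (vocabulary size x model width). It assumes the checkpoint exposes its input embedding through the standard `get_input_embeddings()` accessor used by `transformers` causal LMs; the repository's custom remote code may organize this differently.

```python
# Minimal sketch: verify that the input embedding table has
# vocab_size x d_model = 65,536 x 1024 = 67,108,864 trainable parameters.
# Assumes `model` was loaded as in the example above and exposes a standard
# nn.Embedding via get_input_embeddings(); the custom remote code may differ.
embedding = model.get_input_embeddings()
print(embedding.weight.shape)    # expected: torch.Size([65536, 1024])
print(embedding.weight.numel())  # expected: 67108864
print(65_536 * 1024)             # 67108864
```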