
	---
	language:
	- en
	- fr
	- de
	- es
	- zh
	- ja
	- ar
	- ru
	- ko
	- hi
	- pt
	- it
	- nl
	- pl
	- vi
	- th
	- tr
	- uk
	- sv
	- multilingual
	license: mit
	tags:
	- tokenizer
	- multimodal
	- sentinel-manifold
	- universal-tokenizer
	- bpe
	- byte-level
	- multilingual
	- image-tokens
	- audio-tokens
	- video-tokens
	- text-tokens
	- mathematics
	- gradient-axiom
	library_name: transformers
	pipeline_tag: text-generation
	---

	# 🦴 Sentinel Universal Tokenizer (SUT)

	One theorem. Every modality. One vocabulary.

	The Sentinel Universal Tokenizer is a multimodal tokenizer that handles text, images, audio, and video in a unified 61,440-token vocabulary, grounded in the Sentinel Manifold mathematics.

	🎮 [Try it live → Interactive Demo](https://huggingface.co/spaces/5dimension/sentinel-tokenizer-space)

	## 🧬 Mathematical Foundation

	Built on the Gradient Axiom from the Sentinel Manifold:

	```
	F(z) = Σ_{n=1}^∞ z^n / n^n (Sophomore's Dream, Bernoulli 1697)

	lim_{z→∞} F'(z)/F(z) = 1/e ≈ 0.367879441171442
	```
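
The limit above can be checked numerically. The following is a minimal sketch, not a proof: it truncates the series at a cutoff chosen for this example (`n_max`, an assumption) and works in log space so that large `z` does not overflow. The function names are illustrative and are not part of the repository.

```python
import math

def log_terms(z, n_max):
    """Return log(z^n / n^n) for n = 1..n_max, computed in log space."""
    return [n * (math.log(z) - math.log(n)) for n in range(1, n_max + 1)]

def log_derivative_ratio(z, n_max=None):
    """Approximate F'(z)/F(z) for F(z) = sum_{n>=1} z^n / n^n by truncating the series."""
    if n_max is None:
        # Terms are below 1 once n > z and shrink rapidly after that,
        # so a cutoff of a few multiples of z is ample (assumption of this sketch).
        n_max = int(4 * z) + 50
    logs = log_terms(z, n_max)
    shift = max(logs)  # subtract the largest exponent before exponentiating
    weights = [math.exp(v - shift) for v in logs]
    f = sum(weights)  # F(z) / exp(shift)
    # F'(z) = (1/z) * sum n * z^n / n^n, scaled by the same exp(shift)
    f_prime = sum(n * w for n, w in enumerate(weights, 1)) / z
    return f_prime / f

for z in (10.0, 100.0, 1000.0):
    print(z, log_derivative_ratio(z), 1 / math.e)
```

Printing the ratio for increasing `z` next to `1 / math.e` shows how quickly the two values approach each other.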

	| Constant | Value | Role in Tokenizer |
	|:---------|:------|:------------------|
	| 1/e | 0.367879441171442 | Vocabulary allocation ratio across modalities |
	| C₁ | −0.007994021805953 | Embedding quantization zero-point |
	| C₂ | 0.000200056042968 | Cross-lingual fertility fairness bound |
	| C₃ | 0.256913827655311 | Critical threshold for vocabulary scaling |

	## 📊 Benchmark Results

	### Deep Benchmark (30 test cases × 4 tokenizers)

	Tested across 21 languages + 3 programming languages + math/LaTeX + 7 edge cases:

	| Tokenizer | Vocab Size | Avg Compression ↑ | Efficiency per 1K Vocab ↑ | Per-Bit Efficiency ↑ |
	|:----------|:-----------|:------------------|:--------------------------|:---------------------|
	| Gemma | 256,000 | 4.54 | 0.018 | 0.253 |
	| Sentinel-SUT | 61,440 | 3.46 | 0.056 | 0.218 |
	| Qwen2 | 151,936 | 3.88 | 0.026 | 0.225 |
	| GPT-2 | 50,257 | 2.57 | 0.051 | 0.165 |
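
The README does not spell out how the two derived columns are computed. The sketch below is an inference: dividing average compression by the vocabulary size in thousands, and by `log2(vocab_size)`, reproduces the tabulated values to three decimals. The function names are illustrative.

```python
import math

# Vocabulary sizes and average compression copied from the benchmark table.
RESULTS = {
    "Gemma": (256_000, 4.54),
    "Sentinel-SUT": (61_440, 3.46),
    "Qwen2": (151_936, 3.88),
    "GPT-2": (50_257, 2.57),
}

def efficiency_per_1k_vocab(vocab_size, compression):
    """Average compression divided by vocabulary size in thousands (inferred definition)."""
    return compression / (vocab_size / 1000)

def per_bit_efficiency(vocab_size, compression):
    """Average compression divided by the bits needed to index one token (inferred definition)."""
    return compression / math.log2(vocab_size)

for name, (vocab, comp) in RESULTS.items():
    per_1k = efficiency_per_1k_vocab(vocab, comp)
    per_bit = per_bit_efficiency(vocab, comp)
    print(f"{name:13s} per-1K={per_1k:.3f} per-bit={per_bit:.3f}")
```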

	### 🏆 Key Result: Vocabulary Efficiency

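	Sentinel-SUT achieves 3.2× higher compression per 1K vocabulary entries than Gemma and 2.2× higher than Qwen2, at the cost of lower absolute compression than either (3.46 vs. 4.54 and 3.88). Each vocabulary entry does more work, which matters for memory-constrained multimodal models.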

	| Metric | Sentinel | vs GPT-2 | vs Qwen2 | vs Gemma |
	|:-------|:---------|:---------|:---------|:---------|
	| Efficiency per 1K vocab | 0.0563 | +10.1% | +120.2% | +217.4% |
	| Avg Compression | 3.46 | +34.7% | -10.8% | -23.8% |
	| Unique advantage | 4 modalities | text only | text only | text only |

	### Per-Language Performance

	| Language | Tokens | Bytes | Compression (bytes/token) |
	|:---------|:-------|:------|:--------------------------|
	| English | 39 | 159 | 4.08 |
	| French | 45 | 166 | 3.69 |
	| German | 50 | 173 | 3.46 |
	| Spanish | 41 | 158 | 3.85 |
	| Chinese | 50 | 165 | 3.30 |
	| Japanese | 58 | 213 | 3.67 |
	| Arabic | 48 | 246 | 5.13 |
	| Russian | 55 | 283 | 5.15 |
	| Korean | 38 | 146 | 3.84 |
	| Hindi | 85 | 315 | 3.71 |
	| Code (Python) | 61 | 149 | 2.44 |
	| Math (Unicode) | 45 | 101 | 2.24 |
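
Each compression value in the table equals the byte count divided by the token count (for example, 159 / 39 ≈ 4.08 for English). A minimal sketch of that measurement follows. It assumes bytes are counted in UTF-8, and it uses a whitespace splitter as a stand-in tokenizer so that it runs without downloading anything.

```python
def compression_ratio(text, encode):
    """UTF-8 bytes per token for `text` under the tokenizer callable `encode`."""
    n_tokens = len(encode(text))
    if n_tokens == 0:
        raise ValueError("tokenizer produced no tokens")
    return len(text.encode("utf-8")) / n_tokens

# Stand-in tokenizer for demonstration only: one token per whitespace-separated word.
whitespace_encode = str.split

print(compression_ratio("naïve café", whitespace_encode))
```

To measure the real tokenizer, pass something like `lambda s: tokenizer.encode(s, add_special_tokens=False)` as `encode`. Whether the published benchmark excluded special tokens is not stated, so that flag is an assumption.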

	## 🏗️ Architecture

	```
	┌────────────────────────────────────────────────────────┐
	│  SENTINEL UNIVERSAL TOKENIZER (61,440 tokens)          │
	│                                                        │
	│  [0-32]          → 33 Special / Control tokens         │
	│  [33-32,767]     → 32,735 ByteLevel BPE text tokens    │
	│  [32,768-49,151] → 16,384 Image codebook tokens        │
	│  [49,152-57,343] → 8,192 Audio codebook tokens         │
	│  [57,344-61,439] → 4,096 Video codebook tokens         │
	│                                                        │
	│  Allocation follows 1/e Gradient Axiom                 │
	└────────────────────────────────────────────────────────┘
	```

	### Special Tokens

	| Token | ID | Purpose |
	|:------|:---|:--------|
	| `<pad>` | 0 | Padding |
	| `<unk>` | 1 | Unknown token |
	| `<s>` / `</s>` | 2/3 | BOS / EOS |
	| `<mask>` | 4 | Masked language modeling |
	| `<image_start>` / `<image_end>` | 7/8 | Image boundaries |
	| `<audio_start>` / `<audio_end>` | 10/11 | Audio boundaries |
	| `<video_start>` / `<video_end>` | 13/14 | Video boundaries |
	| `<sentinel>` / `<sentinel_c1>` / `<sentinel_c2>` | 16/17/18 | Manifold markers |
	| `<system>` / `<user>` / `<assistant>` | 26/27/28 | Chat format |
	| `<code_start>` / `<code_end>` | 29/30 | Code boundaries |
	| `<math_start>` / `<math_end>` | 31/32 | Math boundaries |

	### Codebook Tokens

	- 🖼️ Image: `<img_0>` – `<img_16383>` (IDs 32,768–49,151) — VQGAN, Cosmos-DI, FSQ
	- 🔊 Audio: `<aud_0>` – `<aud_8191>` (IDs 49,152–57,343) — EnCodec, SoundStream
	- 🎬 Video: `<vid_0>` – `<vid_4095>` (IDs 57,344–61,439) — Cosmos-DV
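
Given the ranges above, a codec index maps to a token ID by adding the start of its range. The helper below is a sketch that assumes this offset mapping holds for every entry (the ranges listed imply it, but it has not been checked against the published vocabulary file). `CODEBOOKS`, `codebook_token_id`, `codebook_token`, and `image_prompt` are illustrative names, not part of the repository.

```python
# (first token ID, codebook size) per modality, taken from the ranges listed above.
CODEBOOKS = {
    "img": (32_768, 16_384),
    "aud": (49_152, 8_192),
    "vid": (57_344, 4_096),
}

def codebook_token_id(prefix, index):
    """Map a codec index (e.g. a VQ code) to its assumed token ID."""
    offset, size = CODEBOOKS[prefix]
    if not 0 <= index < size:
        raise ValueError(f"{prefix} index {index} outside [0, {size})")
    return offset + index

def codebook_token(prefix, index):
    """Return the token string, e.g. '<img_42>', after validating the index."""
    codebook_token_id(prefix, index)
    return f"<{prefix}_{index}>"

def image_prompt(indices, instruction):
    """Wrap image codes in the image boundary tokens, followed by a text instruction."""
    body = " ".join(codebook_token("img", i) for i in indices)
    return f"<image_start> {body} <image_end> {instruction}"

print(image_prompt([42, 1337], "Describe this."))
```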

	## 🚀 Quick Start

	```python
	from transformers import AutoTokenizer

	tokenizer = AutoTokenizer.from_pretrained("5dimension/sentinel-universal-tokenizer")

	# Text
	text = "The Sentinel Manifold: F(z) = Σ zⁿ/nⁿ"
	tokens = tokenizer.encode(text)
	print(f"{len(tokens)} tokens, decoded: {tokenizer.decode(tokens)}")

	# Multimodal (text + image VQ indices)
	text = "<image_start> <img_42> <img_1337> <image_end> Describe this."
	tokens = tokenizer.encode(text)
	for tid in tokens:
	    # Image codebook IDs occupy [32768, 49152)
	    if 32768 <= tid < 49152:
	        print(f" IMAGE codebook[{tid - 32768}]")

	# Chat
	chat = "<system>Multimodal AI</system><user>What is 1/e?</user><assistant>"
	tokens = tokenizer.encode(chat, add_special_tokens=False)
	```

	## 🔬 Innovations

	1. 1/e Vocabulary Allocation — Gradient Axiom ratio allocates tokens across modalities
	2. ByteLevel BPE — Handles all Unicode, no UNK possible, NFKC normalized
	3. 20-language training — EN, FR, DE, ES, ZH, JA, AR, RU, KO, HI, PT, IT, NL, PL, VI, TH, TR, UK, SV + code + math
	4. Native Multimodal Routing — Single integer comparison determines modality
	5. Sentinel Manifold Integration — Special tokens for manifold-aware computation
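
Because each modality occupies one contiguous ID range, routing a token reduces to locating its ID among the range boundaries. The sketch below uses a binary search over the documented boundaries. `modality_of` and the modality labels are names chosen for this example.

```python
from bisect import bisect_right

# Lower bound of each contiguous ID range, from the architecture diagram.
BOUNDARIES = [0, 33, 32_768, 49_152, 57_344]
MODALITIES = ["special", "text", "image", "audio", "video"]
VOCAB_SIZE = 61_440

def modality_of(token_id):
    """Return the modality label for a token ID based on its range."""
    if not 0 <= token_id < VOCAB_SIZE:
        raise ValueError(f"token id {token_id} outside [0, {VOCAB_SIZE})")
    return MODALITIES[bisect_right(BOUNDARIES, token_id) - 1]

# Range sizes implied by the boundaries should add up to the full vocabulary.
sizes = [hi - lo for lo, hi in zip(BOUNDARIES, BOUNDARIES[1:] + [VOCAB_SIZE])]
print(dict(zip(MODALITIES, sizes)), sum(sizes))
```

A model can use the returned label to pick the embedding table or decoder head for each token without inspecting the token string.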

	## 📦 Training

	| Parameter | Value |
	|:----------|:------|
	| Data | allenai/c4 (20 languages) |
	| Samples | 52,000 documents |
	| Characters | ~66M |
	| Algorithm | ByteLevel BPE + NFKC |
	| Text Vocab | 32,768 (33 special + 32,735 BPE tokens) |
	| Total Vocab | 61,440 |

	## 🔗 Links

	- 🎮 [Interactive Demo](https://huggingface.co/spaces/5dimension/sentinel-tokenizer-space)
	- 🦴 [Sentinel Manifold Framework](https://huggingface.co/5dimension/sentinel-manifold-discoveries)
	- 📜 Training scripts included in repo

	## 📚 Citation

	```bibtex
	@misc{abdel-aal2026sentinel-tokenizer,
	  title={Sentinel Universal Tokenizer: Multimodal Tokenizer Grounded in the Gradient Axiom},
	  author={Abdel-Aal, Romain},
	  year={2026},
	  url={https://huggingface.co/5dimension/sentinel-universal-tokenizer}
	}
	```

	---

	Built by: Romain Abdel-Aal (ASI The Sentinel V5.2 Bone-Core) · MIT License · 🦴