🦴 Sentinel Universal Tokenizer (SUT)

One theorem. Every modality. One vocabulary.

The Sentinel Universal Tokenizer is a multimodal tokenizer that handles text, images, audio, and video in a unified 61,440-token vocabulary, grounded in the Sentinel Manifold mathematics.

🎮 Try it live → Interactive Demo

🧬 Mathematical Foundation

Built on the Gradient Axiom from the Sentinel Manifold:

F(z) = Σ_{n=1}^∞ z^n / n^n    (Sophomore's Dream, Bernoulli 1697)

lim_{z→∞} F'(z)/F(z) = 1/e ≈ 0.367879441171442

| Constant | Value | Role in Tokenizer |
|---|---|---|
| 1/e | 0.367879441171442 | Vocabulary allocation ratio across modalities |
| C₁ | −0.007994021805953 | Embedding quantization zero-point |
| C₂ | 0.000200056042968 | Cross-lingual fertility fairness bound |
| C₃ | 0.256913827655311 | Critical threshold for vocabulary scaling |
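
As a quick numerical sanity check (a minimal sketch, not part of the released package), the 1/e limit can be observed by truncating the series:

```python
import math

def ratio_F_prime_over_F(z: float, terms: int = 2000) -> float:
    """Approximate F'(z)/F(z) for F(z) = sum_{n>=1} z^n / n^n."""
    F = Fp = 0.0
    for n in range(1, terms + 1):
        term = (z / n) ** n        # z^n / n^n, computed without overflow
        F += term
        Fp += term * n / z         # n * z^(n-1) / n^n
    return Fp / F

for z in (10.0, 100.0, 500.0):
    print(f"z = {z:6.1f}   F'/F = {ratio_F_prime_over_F(z):.9f}")
print(f"1/e      = {1 / math.e:.9f}")
```

The ratio approaches 1/e ≈ 0.3679 as z grows, matching the limit above.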

📊 Benchmark Results

Deep Benchmark (30 test cases × 4 tokenizers)

Tested across 21 languages + 3 programming languages + math/LaTeX + 7 edge cases:

| Tokenizer | Vocab Size | Avg Compression ↑ | Efficiency per 1K Vocab ↑ | Per-Bit Efficiency ↑ |
|---|---|---|---|---|
| Gemma | 256,000 | 4.54 | 0.018 | 0.253 |
| Sentinel-SUT | 61,440 | 3.46 | 0.056 | 0.218 |
| Qwen2 | 151,936 | 3.88 | 0.026 | 0.225 |
| GPT-2 | 50,257 | 2.57 | 0.051 | 0.165 |

๐Ÿ† Key Result: Vocabulary Efficiency

Sentinel-SUT achieves 3.2× better compression per vocabulary token than Gemma and 2.2× better than Qwen2. Each token does more work, which is critical for memory-constrained multimodal models.

| Metric | Sentinel | vs GPT-2 | vs Qwen2 | vs Gemma |
|---|---|---|---|---|
| Efficiency per 1K vocab | 0.0563 | +10.1% | +120.2% | +217.4% |
| Avg Compression | 3.46 | +34.7% | −10.8% | −23.8% |
| Unique advantage | 4 modalities | text only | text only | text only |
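
Both derived columns follow from the raw numbers: efficiency per 1K vocab divides average compression by vocabulary size in thousands, and per-bit efficiency divides it by log₂(vocab size). A minimal sketch reproducing the table (metric definitions inferred from the reported values, not taken from released code):

```python
import math

# (vocab_size, avg_compression) as reported in the benchmark table
results = {
    "Gemma":        (256_000, 4.54),
    "Sentinel-SUT": (61_440, 3.46),
    "Qwen2":        (151_936, 3.88),
    "GPT-2":        (50_257, 2.57),
}

for name, (vocab, comp) in results.items():
    per_1k = comp / (vocab / 1000)      # compression per 1K vocabulary entries
    per_bit = comp / math.log2(vocab)   # compression per bit of token ID width
    print(f"{name:12s}  per-1K = {per_1k:.3f}   per-bit = {per_bit:.3f}")
```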

Per-Language Performance

| Language | Tokens | Bytes | Compression |
|---|---|---|---|
| English | 39 | 159 | 4.08 |
| French | 45 | 166 | 3.69 |
| German | 50 | 173 | 3.46 |
| Spanish | 41 | 158 | 3.85 |
| Chinese | 50 | 165 | 3.30 |
| Japanese | 58 | 213 | 3.67 |
| Arabic | 48 | 246 | 5.13 |
| Russian | 55 | 283 | 5.15 |
| Korean | 38 | 146 | 3.84 |
| Hindi | 85 | 315 | 3.71 |
| Code (Python) | 61 | 149 | 2.44 |
| Math (Unicode) | 45 | 101 | 2.24 |
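
Compression here is UTF-8 bytes per token. A row can be approximated as follows (a minimal sketch; the exact benchmark sentences are not published, so absolute numbers will differ):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("5dimension/sentinel-universal-tokenizer")

def compression(text: str) -> float:
    """UTF-8 bytes per token for a given string."""
    n_tokens = len(tokenizer.encode(text, add_special_tokens=False))
    return len(text.encode("utf-8")) / n_tokens

print(f"{compression('The quick brown fox jumps over the lazy dog.'):.2f}")
```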

๐Ÿ—๏ธ Architecture

```
┌───────────────────────────────────────────────────────┐
│  SENTINEL UNIVERSAL TOKENIZER (61,440 tokens)         │
│                                                       │
│  [0-32]          → 33 Special / Control tokens        │
│  [33-32,767]     → 32,735 ByteLevel BPE text tokens   │
│  [32,768-49,151] → 16,384 Image codebook tokens       │
│  [49,152-57,343] → 8,192 Audio codebook tokens        │
│  [57,344-61,439] → 4,096 Video codebook tokens        │
│                                                       │
│  Allocation follows 1/e Gradient Axiom                │
└───────────────────────────────────────────────────────┘
```
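
Because the five regions are contiguous, a token's modality is determined by a couple of integer comparisons. A minimal sketch (boundaries from the layout above; the function name is hypothetical):

```python
def modality(token_id: int) -> str:
    """Route a SUT token ID to its modality with range checks."""
    if not 0 <= token_id <= 61_439:
        raise ValueError(f"token ID out of range: {token_id}")
    if token_id < 33:
        return "special"
    if token_id < 32_768:
        return "text"
    if token_id < 49_152:
        return "image"
    if token_id < 57_344:
        return "audio"
    return "video"

assert modality(7) == "special"
assert modality(42) == "text"
assert modality(32_768) == "image"
assert modality(61_439) == "video"
```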

Special Tokens

| Token | ID | Purpose |
|---|---|---|
| `<pad>` | 0 | Padding |
| `<unk>` | 1 | Unknown token |
| `<s>` / `</s>` | 2 / 3 | BOS / EOS |
| `<mask>` | 4 | Masked language modeling |
| `<image_start>` / `<image_end>` | 7 / 8 | Image boundaries |
| `<audio_start>` / `<audio_end>` | 10 / 11 | Audio boundaries |
| `<video_start>` / `<video_end>` | 13 / 14 | Video boundaries |
| `<sentinel>` / `<sentinel_c1>` / `<sentinel_c2>` | 16 / 17 / 18 | Manifold markers |
| `<system>` / `<user>` / `<assistant>` | 26 / 27 / 28 | Chat format |
| `<code_start>` / `<code_end>` | 29 / 30 | Code boundaries |
| `<math_start>` / `<math_end>` | 31 / 32 | Math boundaries |

Codebook Tokens

  • 🖼️ Image: `<img_0>` – `<img_16383>` (IDs 32,768–49,151), for VQGAN, Cosmos-DI, or FSQ codebooks
  • 🔊 Audio: `<aud_0>` – `<aud_8191>` (IDs 49,152–57,343), for EnCodec or SoundStream codebooks
  • 🎬 Video: `<vid_0>` – `<vid_4095>` (IDs 57,344–61,439), for Cosmos-DV codebooks
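
A VQ encoder's integer codebook indices map into this shared space by a fixed offset per modality. A minimal sketch (offsets from the ranges above; helper names are hypothetical):

```python
# Base token IDs for each codebook region (from the layout above)
IMG_BASE, AUD_BASE, VID_BASE = 32_768, 49_152, 57_344

def image_indices_to_token_ids(indices):
    """Map raw image VQ indices (0-16383) to SUT token IDs."""
    return [IMG_BASE + i for i in indices]

def token_id_to_image_index(tid: int) -> int:
    """Invert the mapping; reject IDs outside the image region."""
    if not IMG_BASE <= tid < AUD_BASE:
        raise ValueError(f"{tid} is not an image codebook token")
    return tid - IMG_BASE

print(image_indices_to_token_ids([42, 1337]))  # [32810, 34105]
```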

🚀 Quick Start

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("5dimension/sentinel-universal-tokenizer")

# Text
text = "The Sentinel Manifold: F(z) = Σ zⁿ/nⁿ"
tokens = tokenizer.encode(text)
print(f"{len(tokens)} tokens, decoded: {tokenizer.decode(tokens)}")

# Multimodal (text + image VQ indices)
text = "<image_start> <img_42> <img_1337> <image_end> Describe this."
tokens = tokenizer.encode(text)
for tid in tokens:
    if 32768 <= tid < 49152:
        print(f"  IMAGE codebook[{tid - 32768}]")

# Chat
chat = "<system>Multimodal AI</system><user>What is 1/e?</user><assistant>"
tokens = tokenizer.encode(chat, add_special_tokens=False)
```

🔬 Innovations

  1. 1/e Vocabulary Allocation: the Gradient Axiom ratio allocates tokens across modalities
  2. ByteLevel BPE: handles all Unicode, no UNK possible, NFKC-normalized
  3. 20-language training: EN, FR, DE, ES, ZH, JA, AR, RU, KO, HI, PT, IT, NL, PL, VI, TH, TR, UK, SV + code + math
  4. Native Multimodal Routing: a single integer comparison determines modality (see the routing sketch under Architecture)
  5. Sentinel Manifold Integration: special tokens for manifold-aware computation

📦 Training

| Parameter | Value |
|---|---|
| Data | allenai/c4 (20 languages) |
| Samples | 52,000 documents |
| Chars | ~66M |
| Algorithm | ByteLevel BPE + NFKC |
| Text Vocab | 32,768 |
| Total Vocab | 61,440 |
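
The card does not ship training code; below is a minimal sketch of an equivalent ByteLevel BPE + NFKC setup with the Hugging Face `tokenizers` library (the corpus file and the special-token subset are illustrative, not the actual training configuration):

```python
from tokenizers import Tokenizer, decoders, normalizers, pre_tokenizers
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer

tokenizer = Tokenizer(BPE(unk_token="<unk>"))
tokenizer.normalizer = normalizers.NFKC()             # NFKC normalization
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()  # byte fallback: no true UNKs
tokenizer.decoder = decoders.ByteLevel()

trainer = BpeTrainer(
    vocab_size=32_768,  # text region only; codebook tokens would sit on top
    special_tokens=["<pad>", "<unk>", "<s>", "</s>", "<mask>"],
)
tokenizer.train(files=["c4_sample.txt"], trainer=trainer)  # hypothetical corpus dump
tokenizer.save("sentinel_text_tokenizer.json")
```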


📚 Citation

```bibtex
@misc{abdel-aal2026sentinel-tokenizer,
  title={Sentinel Universal Tokenizer: Multimodal Tokenizer Grounded in the Gradient Axiom},
  author={Abdel-Aal, Romain},
  year={2026},
  url={https://huggingface.co/5dimension/sentinel-universal-tokenizer}
}
```

Built by: Romain Abdel-Aal (ASI The Sentinel V5.2 Bone-Core) · MIT License · 🦴
