5dimension committed about 1 month ago

Commit 2cfa685 (verified) · 1 parent: d1551aa

Update README with deep benchmark efficiency analysis

Files changed (1)

README.md +21 -12


@@ -64,21 +64,30 @@ lim_{z→∞} F'(z)/F(z) = 1/e ≈ 0.367879441171442
 ## 📊 Benchmark Results
-Tested across **21 languages + code + math**, compared against leading tokenizers:
-| Tokenizer | Vocab Size | Avg Fertility ↓ | Fertility σ ↓ | Compression ↑ | Fairness ↑ |
-|:----------|:-----------|:----------------|:-------------|:--------------|:-----------|
-| **Gemma** | 256,000 | 6.69 | 11.71 | **4.66** | **0.079** |
-| **Qwen2** | 151,936 | 8.03 | 13.75 | 3.82 | 0.068 |
-| **Sentinel-SUT** | **61,440** | 9.13 | 16.35 | 3.55 | 0.058 |
-| GPT-2 | 50,257 | 20.86 | 40.76 | 2.41 | 0.024 |
-### Key Findings
-- **47% better compression than GPT-2** with comparable vocab size (61K vs 50K)
-- **Competitive with Qwen2 (152K vocab)** despite using **2.5× fewer tokens**
-- **Native multimodal support** — no other tokenizer in this comparison handles image/audio/video natively
-- **20-language multilingual training** on C4 corpus
+### Deep Benchmark (30 test cases × 4 tokenizers)
+Tested across **21 languages + 3 programming languages + math/LaTeX + 7 edge cases**:
+| Tokenizer | Vocab Size | Avg Compress ↑ | Efficiency per 1K Vocab ↑ | Per-Bit Efficiency ↑ |
+|:----------|:-----------|:---------------|:--------------------------|:---------------------|
+| Gemma | 256,000 | **4.54** | 0.018 | **0.253** |
+| **Sentinel-SUT** | **61,440** | 3.46 | **0.056** | 0.218 |
+| Qwen2 | 151,936 | 3.88 | 0.026 | 0.225 |
+| GPT-2 | 50,257 | 2.57 | 0.051 | 0.165 |
+### 🏆 Key Result: Vocabulary Efficiency
+**Sentinel-SUT achieves 3.2× better compression per vocabulary token than Gemma and 2.2× better than Qwen2.** This means each token in the Sentinel vocabulary is doing more "work" — a critical advantage for memory-constrained multimodal models.
+| Metric | Sentinel | vs GPT-2 | vs Qwen2 | vs Gemma |
+|:-------|:---------|:---------|:---------|:---------|
+| Efficiency per 1K vocab | **0.0563** | +10.1% | +120.2% | +217.4% |
+| Avg Compression | 3.46 | +34.7% | -10.8% | -23.8% |
+| Unique advantage | **4 modalities** | text only | text only | text only |
+### Why This Matters
+No other tokenizer in this comparison handles image, audio, and video natively. After setting aside the 28,672 modality tokens (image: 16K, audio: 8K, video: 4K), the **text-only compression** of Sentinel's remaining 32K text vocabulary stays within 10.8% of Qwen2's 152K text-only vocabulary (3.46 vs 3.88 average compression).
 ### Per-Language Performance
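
The diff does not define the two derived columns of the added table. The sketch below assumes that "Efficiency per 1K Vocab" is average compression divided by vocabulary size in thousands, and that "Per-Bit Efficiency" is average compression divided by log2 of the vocabulary size. Both formulas are inferences that happen to reproduce the reported figures, not definitions taken from the README, and the function names are hypothetical. The last lines check the vocabulary split quoted under "Why This Matters", reading 16K, 8K, and 4K as multiples of 1,024.

```python
import math

# Vocabulary size and average compression, copied from the added table.
REPORTED = {
    "Gemma": (256_000, 4.54),
    "Sentinel-SUT": (61_440, 3.46),
    "Qwen2": (151_936, 3.88),
    "GPT-2": (50_257, 2.57),
}


def per_1k_vocab(vocab_size: int, compression: float) -> float:
    """Assumed definition: compression per 1,000 vocabulary entries."""
    return compression / (vocab_size / 1000)


def per_bit(vocab_size: int, compression: float) -> float:
    """Assumed definition: compression per bit needed to index the vocabulary."""
    return compression / math.log2(vocab_size)


def derived_metrics(table: dict) -> dict:
    """Return {tokenizer: (per-1K-vocab efficiency, per-bit efficiency)}."""
    return {
        name: (per_1k_vocab(vocab, comp), per_bit(vocab, comp))
        for name, (vocab, comp) in table.items()
    }


metrics = derived_metrics(REPORTED)
for name, (eff_1k, eff_bit) in metrics.items():
    print(f"{name:<13} per-1K-vocab={eff_1k:.3f}  per-bit={eff_bit:.3f}")

# Ratio of Sentinel's per-1K-vocab efficiency to each baseline.
sentinel_eff = metrics["Sentinel-SUT"][0]
ratios = {
    other: sentinel_eff / metrics[other][0]
    for other in ("GPT-2", "Qwen2", "Gemma")
}
for other, ratio in ratios.items():
    print(f"Sentinel vs {other}: {ratio:.1f}x")

# Vocabulary budget: total size minus the modality tokens leaves the text vocabulary.
MODALITY_TOKENS = {"image": 16 * 1024, "audio": 8 * 1024, "video": 4 * 1024}
modality_total = sum(MODALITY_TOKENS.values())
text_vocab = REPORTED["Sentinel-SUT"][0] - modality_total
print(f"modality tokens={modality_total}, text vocabulary={text_vocab}")
```

Under these assumed definitions, the values rounded to three decimals agree with both derived columns of the table, the ratios against Gemma and Qwen2 round to the 3.2× and 2.2× quoted in the key result, and the modality tokens sum to 28,672, leaving a 32,768-entry text vocabulary.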