Text Generation
Transformers
tokenizer
multimodal
sentinel-manifold
universal-tokenizer
bpe
byte-level
image-tokens
audio-tokens
video-tokens
text-tokens
mathematics
gradient-axiom
Update README with deep benchmark efficiency analysis
README.md
CHANGED
@@ -64,21 +64,30 @@ lim_{z→∞} F'(z)/F(z) = 1/e ≈ 0.367879441171442

## Benchmark Results

-|:----------|:-----------|:----------------|:-------------|:--------------|:-----------|
-| **Gemma** | 256,000 | 6.69 | 11.71 | **4.66** | **0.079** |
-| **Qwen2** | 151,936 | 8.03 | 13.75 | 3.82 | 0.068 |
-| **Sentinel-SUT** | **61,440** | 9.13 | 16.35 | 3.55 | 0.058 |
-| GPT-2 | 50,257 | 20.86 | 40.76 | 2.41 | 0.024 |
+### Deep Benchmark (30 test cases × 4 tokenizers)

+Tested across **21 languages + 3 programming languages + math/LaTeX + 7 edge cases**:

+| Tokenizer | Vocab Size | Avg Compress ↑ | Efficiency per 1K Vocab ↑ | Per-Bit Efficiency ↑ |
+|:----------|:-----------|:---------------|:--------------------------|:---------------------|
+| Gemma | 256,000 | **4.54** | 0.018 | **0.253** |
+| **Sentinel-SUT** | **61,440** | 3.46 | **0.056** | 0.218 |
+| Qwen2 | 151,936 | 3.88 | 0.026 | 0.225 |
+| GPT-2 | 50,257 | 2.57 | 0.051 | 0.165 |
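
The README does not spell out how the two derived columns in the table above are computed. One reading that reproduces every row to the printed precision is: efficiency per 1K vocab = average compression / (vocab size / 1000), and per-bit efficiency = average compression / log2(vocab size). The sketch below checks that reading; both formulas and all function names are assumptions, not taken from the repository.

```python
import math

# (vocab size, average compression) copied from the Deep Benchmark table.
TOKENIZERS = {
    "Gemma": (256_000, 4.54),
    "Sentinel-SUT": (61_440, 3.46),
    "Qwen2": (151_936, 3.88),
    "GPT-2": (50_257, 2.57),
}


def efficiency_per_1k_vocab(vocab_size: int, compression: float) -> float:
    """Assumed definition: compression divided by vocabulary size in thousands."""
    return compression / (vocab_size / 1000)


def per_bit_efficiency(vocab_size: int, compression: float) -> float:
    """Assumed definition: compression divided by bits per token id, log2(vocab)."""
    return compression / math.log2(vocab_size)


def derived_columns() -> dict:
    """Return {name: (efficiency per 1K vocab, per-bit efficiency)}, rounded to 3 places."""
    return {
        name: (
            round(efficiency_per_1k_vocab(vocab, comp), 3),
            round(per_bit_efficiency(vocab, comp), 3),
        )
        for name, (vocab, comp) in TOKENIZERS.items()
    }


if __name__ == "__main__":
    for name, (per_1k, per_bit) in derived_columns().items():
        print(f"{name:<13} per-1K={per_1k:.3f} per-bit={per_bit:.3f}")
```

Under these assumed definitions, per-1K efficiency rewards small vocabularies linearly, while per-bit efficiency only rewards them logarithmically, which would explain why the two columns rank the tokenizers differently.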

+### Key Result: Vocabulary Efficiency
+
+**Sentinel-SUT achieves 3.2× higher compression per 1K vocabulary entries than Gemma and 2.2× higher than Qwen2.** Each entry in the Sentinel vocabulary therefore does more "work", a critical advantage for memory-constrained multimodal models.
+
+| Metric | Sentinel | vs GPT-2 | vs Qwen2 | vs Gemma |
+|:-------|:---------|:---------|:---------|:---------|
+| Efficiency per 1K vocab | **0.0563** | +10.1% | +120.2% | +217.4% |
+| Avg Compression | 3.46 | +34.7% | -10.8% | -23.8% |
+| Unique advantage | **4 modalities** | text only | text only | text only |
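
The percentage columns read as Sentinel's value relative to each baseline, that is (sentinel / baseline - 1) × 100. Recomputing them from the rounded figures in the Deep Benchmark table lands within half a percentage point of the printed values; the small gaps are presumably due to rounding in the inputs. The helper names `relative_gain` and `per_1k` are hypothetical, and the per-1K formula is an assumption that happens to match the table.

```python
# (vocab size, average compression) copied from the Deep Benchmark table.
BASELINES = {
    "GPT-2": (50_257, 2.57),
    "Qwen2": (151_936, 3.88),
    "Gemma": (256_000, 4.54),
}
SENTINEL = (61_440, 3.46)


def per_1k(vocab_size: int, compression: float) -> float:
    """Assumed definition of 'efficiency per 1K vocab'."""
    return compression / (vocab_size / 1000)


def relative_gain(sentinel: float, baseline: float) -> float:
    """Percentage by which `sentinel` exceeds (or trails) `baseline`."""
    return (sentinel / baseline - 1.0) * 100.0


def gains() -> dict:
    """Return {baseline: (efficiency gain %, compression gain %)} for Sentinel."""
    s_vocab, s_comp = SENTINEL
    return {
        name: (
            relative_gain(per_1k(s_vocab, s_comp), per_1k(vocab, comp)),
            relative_gain(s_comp, comp),
        )
        for name, (vocab, comp) in BASELINES.items()
    }


if __name__ == "__main__":
    for name, (eff, comp) in gains().items():
        print(f"vs {name:<6} efficiency {eff:+.1f}%  compression {comp:+.1f}%")
```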
+
+### Why This Matters
+
+No other tokenizer in this comparison handles image, audio, and video natively. Of Sentinel's 61,440 tokens, 28,672 are modality tokens (image: 16K, audio: 8K, video: 4K), which leaves a 32K text vocabulary. The **text-only compression** of that 32K vocabulary (3.46) is within about 11% of Qwen2's (3.88), which relies on a 152K text-only vocabulary.
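
The vocabulary split quoted above can be checked with a few lines of arithmetic. This sketch assumes "K" means 1,024 (which is what makes the modality total come to 28,672) and that everything outside the three modality ranges is text; the dictionary keys and function name are illustrative, not identifiers from the tokenizer.

```python
K = 1024  # assumption: "16K", "8K", "4K" are multiples of 1,024

TOTAL_VOCAB = 61_440  # Sentinel-SUT vocab size from the benchmark table
MODALITY_TOKENS = {
    "image": 16 * K,
    "audio": 8 * K,
    "video": 4 * K,
}


def text_vocab_size(total: int, modality: dict) -> int:
    """Tokens left for text after reserving the modality ranges."""
    return total - sum(modality.values())


if __name__ == "__main__":
    modality_total = sum(MODALITY_TOKENS.values())
    text_total = text_vocab_size(TOTAL_VOCAB, MODALITY_TOKENS)
    print(f"modality tokens: {modality_total:,}")
    print(f"text tokens:     {text_total:,} ({text_total / TOTAL_VOCAB:.1%} of the vocabulary)")
```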

### Per-Language Performance