---
license: apache-2.0
language:
- en
- multilingual
pipeline_tag: token-classification
tags:
- gliner
- ner
- token-classification
- social-media
- username-extraction
- onnx
- int8
- quantized
- cpu
library_name: gliner
base_model: LumeData/HandleAtlas-166m
---
# HandleAtlas-166m-CPU
CPU-optimized ONNX INT8 variant of [LumeData/HandleAtlas-166m](https://huggingface.co/LumeData/HandleAtlas-166m).
~4× smaller and 4–6× faster than the PyTorch float weights, intended for CPU inference.
## What's in this repo
- `model.onnx` — fp32 ONNX export
- `model_quantized.onnx` — INT8 dynamic-quantized ONNX (load this for the fastest path)
- Tokenizer + GLiNER config files
## Usage (quantized + thread-tuned)
```python
import os

# Match physical (not logical) cores. 4–8 is a good default on laptops.
N_THREADS = 8
# Set this before importing torch so the OpenMP runtime picks it up.
os.environ["OMP_NUM_THREADS"] = str(N_THREADS)

import torch
from gliner import GLiNER

torch.set_num_threads(N_THREADS)

# Load the INT8 ONNX graph instead of the PyTorch weights.
model = GLiNER.from_pretrained(
    "LumeData/HandleAtlas-166m-CPU",
    load_onnx_model=True,
    onnx_model_file="model_quantized.onnx",
)

labels = [
    'instagram_username', 'snapchat_username', 'youtube_username', 'twitch_username',
    'tiktok_username', 'discord_username', 'x_username', 'cashapp_username',
    'onlyfans_username', 'tumblr_username', 'github_username', 'kofi_username',
    'patreon_username', 'roblox_username', 'generic_username',
]
text = "Insta: foodgrammer | Snap: chefchef | DC: gamer420 | $cashtag"

# Print each extracted handle with its predicted platform label and score.
for ent in model.predict_entities(text, labels, threshold=0.5):
    print(f"{ent['text']!r} -> {ent['label']} ({ent['score']:.2f})")
```
To use the unquantized fp32 ONNX export instead (smaller accuracy delta, ~2× faster than PyTorch),
set `onnx_model_file="model.onnx"`.
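The speedups quoted here depend on hardware and thread settings, so it can be worth timing each variant on your own machine. The helper below is a minimal sketch for that; `median_latency_ms` is a hypothetical name, not part of GLiNER or ONNX Runtime.

```python
import statistics
import time


def median_latency_ms(fn, *args, warmup=3, runs=20):
    """Return the median wall-clock latency of fn(*args) in milliseconds.

    Hypothetical helper for illustration; not part of any library.
    """
    # Warm-up calls are not timed, so one-off initialization cost is excluded.
    for _ in range(warmup):
        fn(*args)

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

With a model loaded as in the usage snippet, `median_latency_ms(model.predict_entities, text, labels)` gives a number you can compare across `model_quantized.onnx`, `model.onnx`, and the PyTorch weights.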
## Recommended thresholds
- Default: `threshold=0.5`
- For `generic_username`, bump to `0.65` to reduce false positives.
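One way to apply a stricter cutoff to a single label is to predict at the default threshold and filter afterwards. This is a minimal sketch, assuming each entity is a dict with `label` and `score` keys as in the usage snippet; `filter_by_label_threshold` is a hypothetical helper, not a GLiNER function.

```python
def filter_by_label_threshold(entities, default=0.5, overrides=None):
    """Keep entities whose score meets the threshold for their label.

    Hypothetical helper: `overrides` maps a label to its own threshold,
    and every other label falls back to `default`.
    """
    overrides = overrides or {}
    return [
        ent for ent in entities
        if ent["score"] >= overrides.get(ent["label"], default)
    ]


# Hand-written example entities (made-up scores, not real model output).
example = [
    {"text": "foodgrammer", "label": "instagram_username", "score": 0.91},
    {"text": "gamer420", "label": "generic_username", "score": 0.58},
]
kept = filter_by_label_threshold(example, overrides={"generic_username": 0.65})
```

Overrides can only tighten the cutoff: entities scoring below the `threshold` passed to `predict_entities` are already dropped before this filter runs.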
## Notes on quality
INT8 dynamic quantization typically costs <1 F1 point on this kind of task.
For applications that require the absolute best precision, use the float
variant [LumeData/HandleAtlas-166m](https://huggingface.co/LumeData/HandleAtlas-166m).
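To check the quantization cost on your own data, you can compare the quantized model's predictions against the float model's on the same text. The sketch below assumes each entity dict carries `start`, `end`, and `label` keys; `span_f1` is a hypothetical helper. Note that it measures agreement with the float model, not F1 against gold annotations.

```python
def span_f1(reference, candidate):
    """Span-level F1 between two entity lists for the same text.

    Hypothetical helper: two entities match only if start, end, and
    label are all identical.
    """
    ref = {(e["start"], e["end"], e["label"]) for e in reference}
    cand = {(e["start"], e["end"], e["label"]) for e in candidate}

    # Two empty predictions agree perfectly.
    if not ref and not cand:
        return 1.0

    true_positives = len(ref & cand)
    if true_positives == 0:
        return 0.0

    precision = true_positives / len(cand)
    recall = true_positives / len(ref)
    return 2 * precision * recall / (precision + recall)
```

Pass the float model's output as `reference` and the quantized model's output as `candidate`, then average the score over a sample of your texts.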
## Labels
- `instagram_username`
- `snapchat_username`
- `youtube_username`
- `twitch_username`
- `tiktok_username`
- `discord_username`
- `x_username`
- `cashapp_username`
- `onlyfans_username`
- `tumblr_username`
- `github_username`
- `kofi_username`
- `patreon_username`
- `roblox_username`
- `generic_username`