---
license: apache-2.0
language:
- en
- multilingual
pipeline_tag: token-classification
tags:
- gliner
- ner
- token-classification
- social-media
- username-extraction
- onnx
- int8
- quantized
- cpu
library_name: gliner
base_model: LumeData/HandleAtlas-166m
---

# HandleAtlas-166m-CPU

CPU-optimized ONNX INT8 variant of [LumeData/HandleAtlas-166m](https://huggingface.co/LumeData/HandleAtlas-166m).
It is ~4× smaller and 4–6× faster than the PyTorch float model and is intended for CPU inference.

## What's in this repo

- `model.onnx` — fp32 ONNX export
- `model_quantized.onnx` — INT8 dynamic-quantized ONNX (load this for the fastest path)
- Tokenizer + GLiNER config files

## Usage (quantized + thread-tuned)

```python
import os

# Match physical (not logical) cores. 4–8 is a good default on laptops.
N_THREADS = 8
# Set this before importing torch: OpenMP typically reads it once, at load time.
os.environ["OMP_NUM_THREADS"] = str(N_THREADS)

import torch
from gliner import GLiNER

torch.set_num_threads(N_THREADS)

model = GLiNER.from_pretrained(
    "LumeData/HandleAtlas-166m-CPU",
    load_onnx_model=True,
    onnx_model_file="model_quantized.onnx",
)

labels = [
    "instagram_username", "snapchat_username", "youtube_username",
    "twitch_username", "tiktok_username", "discord_username",
    "x_username", "cashapp_username", "onlyfans_username",
    "tumblr_username", "github_username", "kofi_username",
    "patreon_username", "roblox_username", "generic_username",
]

text = "Insta: foodgrammer | Snap: chefchef | DC: gamer420 | $cashtag"
for ent in model.predict_entities(text, labels, threshold=0.5):
    print(f"{ent['text']!r} -> {ent['label']} ({ent['score']:.2f})")
```
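
The usage example hard-codes `N_THREADS = 8` and recommends matching physical rather than logical cores. A minimal sketch for deriving that value automatically is shown below. It assumes the optional `psutil` package may be installed; `pick_thread_count` and its `cap` parameter are hypothetical helpers, not part of GLiNER or ONNX Runtime, and the halving fallback is only a rough guess for machines with two hardware threads per core.

```python
import os


def pick_thread_count(cap: int = 8) -> int:
    """Return a thread count based on physical cores, capped at `cap`."""
    try:
        import psutil  # optional dependency; an assumption, not required by GLiNER

        physical = psutil.cpu_count(logical=False)  # may return None
    except ImportError:
        physical = None

    if not physical:
        # Fallback heuristic: assume 2 logical threads per physical core.
        logical = os.cpu_count() or 1
        physical = max(1, logical // 2)

    return max(1, min(physical, cap))


N_THREADS = pick_thread_count()
```

The result can replace the hard-coded `N_THREADS` in the usage example; measure latency on your own hardware before settling on a value.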

To use the unquantized fp32 ONNX export instead (smaller accuracy delta versus the PyTorch model, ~2× faster than PyTorch),
replace `onnx_model_file="model_quantized.onnx"` with `onnx_model_file="model.onnx"`.

## Recommended thresholds

- Default: `threshold=0.5`
- For `generic_username`, bump to `0.65` to reduce false positives.
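
The usage example passes a single `threshold` to `predict_entities`, so one way to apply a stricter cutoff to `generic_username` only is to predict at the lower threshold and filter afterwards. The sketch below assumes entity dicts with `text`, `label`, and `score` keys, as printed in the usage example; `filter_by_label_threshold` is a hypothetical helper, and the sample scores are made up for illustration.

```python
# Per-label overrides; any label not listed uses DEFAULT_THRESHOLD.
DEFAULT_THRESHOLD = 0.5
LABEL_THRESHOLDS = {"generic_username": 0.65}


def filter_by_label_threshold(entities, default=DEFAULT_THRESHOLD,
                              overrides=LABEL_THRESHOLDS):
    """Keep entities whose score meets the threshold for their label."""
    return [
        ent for ent in entities
        if ent["score"] >= overrides.get(ent["label"], default)
    ]


# Stand-in for model.predict_entities(text, labels, threshold=0.5) output;
# these scores are invented, not real model results.
sample = [
    {"text": "foodgrammer", "label": "instagram_username", "score": 0.91},
    {"text": "gamer420", "label": "generic_username", "score": 0.58},
    {"text": "chefchef", "label": "generic_username", "score": 0.70},
]
kept = filter_by_label_threshold(sample)
print([ent["text"] for ent in kept])
```

Here the `generic_username` hit at 0.58 is dropped, while the one at 0.70 and the platform-specific hit are kept.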

## Notes on quality

INT8 dynamic quantization typically costs <1 F1 point on this kind of task.
For applications that require the highest possible accuracy, use the float
variant [LumeData/HandleAtlas-166m](https://huggingface.co/LumeData/HandleAtlas-166m).
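
To check the quantization cost on your own data, you can score both variants against the same annotations. The following is a minimal sketch of exact-match, entity-level F1; it assumes you have converted each model's output and your gold annotations to `(start, end, label)` tuples. `entity_f1` is a hypothetical helper, and the spans below are illustrative rather than real model output.

```python
def entity_f1(predicted, gold):
    """Micro F1 over sets of (start, end, label) tuples; exact match only."""
    predicted, gold = set(predicted), set(gold)
    if not predicted and not gold:
        return 1.0  # nothing to find, nothing predicted
    true_pos = len(predicted & gold)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for one text; spans are character offsets.
gold = {(7, 18, "instagram_username"), (27, 35, "snapchat_username")}
pred_float = {(7, 18, "instagram_username"), (27, 35, "snapchat_username")}
pred_int8 = {(7, 18, "instagram_username")}  # invented miss, for illustration

f1_float = entity_f1(pred_float, gold)
f1_int8 = entity_f1(pred_int8, gold)
print(f"float F1={f1_float:.3f}  int8 F1={f1_int8:.3f}")
```

A single text says little; aggregate the tuples over a held-out set that resembles your production data before comparing the two numbers.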

## Labels

- `instagram_username`
- `snapchat_username`
- `youtube_username`
- `twitch_username`
- `tiktok_username`
- `discord_username`
- `x_username`
- `cashapp_username`
- `onlyfans_username`
- `tumblr_username`
- `github_username`
- `kofi_username`
- `patreon_username`
- `roblox_username`
- `generic_username`