---
license: apache-2.0
language:
- en
- multilingual
pipeline_tag: token-classification
tags:
- gliner
- ner
- token-classification
- social-media
- username-extraction
- onnx
- int8
- quantized
- cpu
library_name: gliner
base_model: LumeData/HandleAtlas-166m
---

# HandleAtlas-166m-CPU

CPU-optimized ONNX INT8 variant of [LumeData/HandleAtlas-166m](https://huggingface.co/LumeData/HandleAtlas-166m), intended for CPU inference. It is ~4× smaller and 4–6× faster than the PyTorch float weights.

## What's in this repo

- `model.onnx`: fp32 ONNX export
- `model_quantized.onnx`: INT8 dynamic-quantized ONNX (load this for the fastest path)
- Tokenizer and GLiNER config files

## Usage (quantized + thread-tuned)

```python
import os

# Match physical (not logical) cores; 4-8 is a good default on laptops.
# Set OMP_NUM_THREADS before importing torch so its OpenMP runtime picks it up.
N_THREADS = 8
os.environ["OMP_NUM_THREADS"] = str(N_THREADS)

import torch
from gliner import GLiNER

torch.set_num_threads(N_THREADS)

# Load the INT8 ONNX weights instead of the PyTorch checkpoint.
model = GLiNER.from_pretrained(
    "LumeData/HandleAtlas-166m-CPU",
    load_onnx_model=True,
    onnx_model_file="model_quantized.onnx",
)

labels = [
    'instagram_username', 'snapchat_username', 'youtube_username',
    'twitch_username', 'tiktok_username', 'discord_username',
    'x_username', 'cashapp_username', 'onlyfans_username',
    'tumblr_username', 'github_username', 'kofi_username',
    'patreon_username', 'roblox_username', 'generic_username',
]

text = "Insta: foodgrammer | Snap: chefchef | DC: gamer420 | $cashtag"
for ent in model.predict_entities(text, labels, threshold=0.5):
    print(f"{ent['text']!r} -> {ent['label']} ({ent['score']:.2f})")
```

To use the unquantized fp32 ONNX export instead (smaller accuracy delta, ~2× faster than PyTorch), replace `onnx_model_file="model_quantized.onnx"` with `onnx_model_file="model.onnx"`.

## Recommended thresholds

- Default: `threshold=0.5`
- For `generic_username`, raise the threshold to `0.65` to reduce false positives.

## Notes on quality

INT8 dynamic quantization typically costs less than 1 F1 point on this kind of task.
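One way to check that delta on your own data is to run the quantized and float variants on the same texts and score the quantized spans against the float spans (or, better, against hand-labelled gold spans). The sketch below is a minimal, hypothetical helper (`span_f1` and `to_spans` are not part of GLiNER); it assumes entities are dicts with `start`, `end`, and `label` keys, as returned by `predict_entities`, and counts a span as correct only on an exact match of all three. The example inputs are toy values, not real model output.

```python
from typing import Iterable, Mapping, Set, Tuple

Span = Tuple[int, int, str]


def to_spans(entities: Iterable[Mapping]) -> Set[Span]:
    """Convert GLiNER-style entity dicts into a set of (start, end, label) tuples."""
    return {(e["start"], e["end"], e["label"]) for e in entities}


def span_f1(predicted: Iterable[Mapping], reference: Iterable[Mapping]) -> float:
    """Exact-match span F1 of `predicted` against `reference` (hypothetical helper)."""
    pred, ref = to_spans(predicted), to_spans(reference)
    if not pred and not ref:
        return 1.0  # both empty: treat as perfect agreement
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy spans over the usage example's text; in practice these would come from
# calling predict_entities on the float and the INT8 model respectively.
float_preds = [
    {"start": 7, "end": 18, "label": "instagram_username"},
    {"start": 27, "end": 35, "label": "snapchat_username"},
]
int8_preds = [
    {"start": 7, "end": 18, "label": "instagram_username"},
]
print(f"{span_f1(int8_preds, float_preds):.3f}")
```

Scoring per text keeps the sketch short; for a corpus-level number, collect true positives, predicted counts, and reference counts across all texts before computing precision and recall.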
For applications that need the highest possible accuracy, use the float variant [LumeData/HandleAtlas-166m](https://huggingface.co/LumeData/HandleAtlas-166m).

## Labels

- `instagram_username`
- `snapchat_username`
- `youtube_username`
- `twitch_username`
- `tiktok_username`
- `discord_username`
- `x_username`
- `cashapp_username`
- `onlyfans_username`
- `tumblr_username`
- `github_username`
- `kofi_username`
- `patreon_username`
- `roblox_username`
- `generic_username`
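`predict_entities` takes a single `threshold`, so one way to apply the stricter cutoff recommended for `generic_username` is to predict at the default 0.5 and then drop entities that fall below a per-label threshold. This is a minimal sketch: `filter_by_label_threshold` and `LABEL_THRESHOLDS` are hypothetical names, entities are assumed to be dicts with `label` and `score` keys as in the usage example, and the scores below are toy values rather than real model output.

```python
from typing import Dict, Iterable, List, Mapping

DEFAULT_THRESHOLD = 0.5
# Hypothetical per-label overrides, following the recommended thresholds.
LABEL_THRESHOLDS: Dict[str, float] = {"generic_username": 0.65}


def filter_by_label_threshold(
    entities: Iterable[Mapping],
    overrides: Mapping[str, float] = LABEL_THRESHOLDS,
    default: float = DEFAULT_THRESHOLD,
) -> List[Mapping]:
    """Keep only entities whose score meets the threshold for their label."""
    return [e for e in entities if e["score"] >= overrides.get(e["label"], default)]


# Toy entities standing in for model.predict_entities(text, labels, threshold=0.5).
entities = [
    {"text": "foodgrammer", "label": "instagram_username", "score": 0.91},
    {"text": "gamer420", "label": "discord_username", "score": 0.55},
    {"text": "cashtag", "label": "generic_username", "score": 0.58},
]
for ent in filter_by_label_threshold(entities):
    print(f"{ent['text']!r} -> {ent['label']} ({ent['score']:.2f})")
```

Call `predict_entities` with a threshold at or below the smallest per-label value (here 0.5); the filter can only remove entities, not recover ones the model call already dropped. Likewise, because overlapping spans are resolved inside `predict_entities`, removing a `generic_username` span afterwards does not bring back a competing span it may have displaced.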