Experimental ONNX build - Unofficial ONNX export of fastino/gliner2-multi-v1.

gliner2-onnx

GLiNER2 ONNX runtime for Python. Runs GLiNER2 models without PyTorch.

This library is experimental. The API may change between versions.

Features

Zero-shot NER and text classification
Runs with ONNX Runtime (no PyTorch dependency)
FP32 and FP16 precision support
GPU acceleration via CUDA

All other GLiNER2 features such as JSON export are not supported.

Installation

pip install gliner2-onnx

NER

from gliner2_onnx import GLiNER2ONNXRuntime

runtime = GLiNER2ONNXRuntime.from_pretrained("lmo3/gliner2-large-v1-onnx")

entities = runtime.extract_entities(
    "John works at Google in Seattle",
    ["person", "organization", "location"]
)
# [
#   Entity(text='John', label='person', start=0, end=4, score=0.98),
#   Entity(text='Google', label='organization', start=14, end=20, score=0.97),
#   Entity(text='Seattle', label='location', start=24, end=31, score=0.96)
# ]

Classification

from gliner2_onnx import GLiNER2ONNXRuntime

runtime = GLiNER2ONNXRuntime.from_pretrained("lmo3/gliner2-large-v1-onnx")

# Single-label classification
result = runtime.classify(
    "Buy milk from the store",
    ["shopping", "work", "entertainment"]
)
# {'shopping': 0.95}

# Multi-label classification
result = runtime.classify(
    "Buy milk and finish the report",
    ["shopping", "work", "entertainment"],
    threshold=0.3,
    multi_label=True
)
# {'shopping': 0.85, 'work': 0.72}

CUDA

To use CUDA for GPU acceleration:

runtime = GLiNER2ONNXRuntime.from_pretrained(
    "lmo3/gliner2-large-v1-onnx",
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"]
)

Precision

Both FP32 and FP16 models are supported. Only the requested precision is downloaded.

runtime = GLiNER2ONNXRuntime.from_pretrained(
    "lmo3/gliner2-large-v1-onnx",
    precision="fp16"
)

Models

Pre-exported ONNX models:

Model	HuggingFace
gliner2-large-v1	lmo3/gliner2-large-v1-onnx
gliner2-multi-v1	lmo3/gliner2-multi-v1-onnx

Note: gliner2-base-v1 is not supported (uses a different architecture).

Exporting Models

To export your own models, clone the repository and use make:

git clone https://github.com/lmoe/gliner2-onnx
cd gliner2-onnx

# FP32 only
make onnx-export MODEL=fastino/gliner2-large-v1

# FP32 + FP16
make onnx-export MODEL=fastino/gliner2-large-v1 QUANTIZE=fp16

Output is saved to model_out/<model-name>/.

JavaScript/TypeScript

For Node.js, see @lmoe/gliner-onnx.js.

Credits

fastino-ai/GLiNER2 - Original GLiNER2 implementation
fastino/gliner2-large-v1 - Pre-trained models

License

MIT

Downloads last month: 1,575

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for lmo3/gliner2-multi-v1-onnx

Base model

fastino/gliner2-multi-v1

Quantized

(5)

this model