# GLiNER-bi-Encoder: Scalable Zero-Shot Named Entity Recognition

## About
GLiNER-bi-Encoder is a novel architecture for Named Entity Recognition (NER) that combines zero-shot flexibility with industrial-scale efficiency. Unlike the original GLiNER, which uses joint encoding, the bi-encoder design decouples text and entity-type encoding, enabling the recognition of thousands of entity types simultaneously with minimal computational overhead.
## Key Advantages

- Massive Scalability: Handle 1000+ entity types with near-constant inference speed when using pre-computed label embeddings
- 130× Faster: Up to 130× throughput improvement over uni-encoder approaches at 1024 entity types
- State-of-the-Art Performance: Achieves 61.5% Micro-F1 on the CrossNER benchmark in the zero-shot setting
- Efficient Caching: Pre-compute and cache entity type embeddings for instant reuse across millions of documents
## Architecture
The bi-encoder architecture employs two specialized, independent transformers:
- Text Encoder: Processes input sequences using ModernBERT-based encoders (Ettin family)
- Label Encoder: Embeds entity type descriptions using specialized sentence transformers (BGE, MiniLM)
This separation removes the context-window bottleneck and enables:
- Pre-computation of entity type embeddings
- Constant memory usage for text encoding regardless of entity count
- Efficient nearest-neighbor search for entity matching
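Conceptually, once both encoders have run, matching reduces to a single similarity lookup between span vectors and label vectors. A minimal NumPy sketch of that idea (the shapes, names, and cosine scoring here are illustrative assumptions, not GLiNER's actual internals):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-computed label embeddings: one vector per entity type (illustrative)
num_labels, dim = 1024, 256
label_emb = rng.standard_normal((num_labels, dim)).astype(np.float32)
label_emb /= np.linalg.norm(label_emb, axis=1, keepdims=True)

# Span representations produced by the text encoder for one document
num_spans = 50
span_emb = rng.standard_normal((num_spans, dim)).astype(np.float32)
span_emb /= np.linalg.norm(span_emb, axis=1, keepdims=True)

# One matrix multiply scores every span against every label at once;
# the text encoder never sees the labels, so its cost is independent of num_labels.
scores = span_emb @ label_emb.T          # (num_spans, num_labels)
best_label = scores.argmax(axis=1)       # highest-scoring type per span
```

Because the labels only enter through `label_emb`, growing the taxonomy adds rows to one matrix rather than tokens to every input sequence.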
## Model Variants

GLiNER-bi-V2 models:
| Model name | Params | Text Encoder | Label Encoder | Avg. CrossNER | Inference Speed (H100, ex/s) | Inference Speed (pre-computed) |
|---|---|---|---|---|---|---|
| gliner-bi-edge-v2.0 | 60 M | ettin-encoder-32m | all-MiniLM-L6-v2 | 54.0% | 13.64 | 24.62 |
| gliner-bi-small-v2.0 | 108 M | ettin-encoder-68m | all-MiniLM-L12-v2 | 57.2% | 7.99 | 15.22 |
| gliner-bi-base-v2.0 | 194 M | ettin-encoder-150m | bge-small-en-v1.5 | 60.3% | 5.91 | 9.51 |
| gliner-bi-large-v2.0 | 530 M | ettin-encoder-400m | bge-base-en-v1.5 | 61.5% | 2.68 | 3.60 |
Recommendation: The base variant (194M) achieves 98% of large model performance while operating 2.6× faster, making it optimal for most production scenarios.
## Installation & Usage

### Installation

```bash
pip install gliner -U
pip install "transformers>=4.48.0"
```

For flash attention support:

```bash
pip install flash-attn triton
```
### Basic Usage

```python
from gliner import GLiNER

# Load model
model = GLiNER.from_pretrained("knowledgator/gliner-bi-base-v2.0")

text = """
Cristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985) is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards, a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player.
"""

labels = ["person", "award", "date", "competitions", "teams"]

entities = model.predict_entities(text, labels, threshold=0.3)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```
Output:

```
Cristiano Ronaldo dos Santos Aveiro => person
5 February 1985 => date
Al Nassr => teams
Portugal national team => teams
Ballon d'Or => award
UEFA Men's Player of the Year Awards => award
European Golden Shoes => award
```
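Each predicted entity also carries a confidence score under the `score` key, so predictions can be post-filtered more strictly than the decoding threshold. An illustrative filtering step on a hand-written sample output (the scores below are made up, not real model output):

```python
# Sample entities in GLiNER's output format (values here are illustrative)
entities = [
    {"text": "Cristiano Ronaldo dos Santos Aveiro", "label": "person", "score": 0.92},
    {"text": "forward", "label": "person", "score": 0.31},
    {"text": "Ballon d'Or", "label": "award", "score": 0.88},
]

# Keep only high-confidence predictions and group them by label
confident = [e for e in entities if e["score"] >= 0.5]
by_label = {}
for e in confident:
    by_label.setdefault(e["label"], []).append(e["text"])
```

A low decoding threshold (e.g. 0.3) plus a stricter downstream filter is a common way to trade recall for precision without re-running the model.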
### Advanced Usage: Pre-computing Entity Embeddings

For scenarios with large, static entity taxonomies (hundreds to millions of types):

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-bi-base-v2.0")

# Pre-compute embeddings for thousands of entity types
entity_types = ["person", "organization", "location", ...]  # Can be thousands

texts = ["Your documents here", ...]

# Encode entity types once
entity_embeddings = model.encode_labels(entity_types, batch_size=8)

# Use pre-computed embeddings for fast inference
outputs = model.batch_predict_with_embeds(texts, entity_embeddings, entity_types)
```
This approach provides:
- 130× speedup at 1024 entity types
- Constant inference time regardless of entity count
- Efficient caching for repeated use
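Because the label embeddings are ordinary tensors, they can be computed once and persisted for reuse across processes. A sketch of a simple disk cache, assuming only that `model.encode_labels` returns an array-like object (the `cached_label_embeddings` helper and file layout below are hypothetical, not part of the GLiNER API):

```python
import hashlib
import tempfile
from pathlib import Path

import numpy as np

def cached_label_embeddings(labels, encode_fn, cache_dir):
    """Encode `labels` once and serve later calls from disk.

    `encode_fn` stands in for model.encode_labels; the cache key is a hash
    of the label list, so a changed taxonomy triggers re-encoding.
    """
    key = hashlib.sha256("\n".join(labels).encode("utf-8")).hexdigest()[:16]
    path = Path(cache_dir) / f"labels-{key}.npy"
    if path.exists():
        return np.load(path)
    embeddings = np.asarray(encode_fn(labels))
    np.save(path, embeddings)
    return embeddings

# Stand-in encoder so the sketch is self-contained; in practice this would
# be model.encode_labels(labels, batch_size=8) from the snippet above.
calls = {"n": 0}
def fake_encode(labels):
    calls["n"] += 1
    return np.ones((len(labels), 4), dtype=np.float32)

cache_dir = tempfile.mkdtemp()
labels = ["person", "organization", "location"]
first = cached_label_embeddings(labels, fake_encode, cache_dir)   # encodes
second = cached_label_embeddings(labels, fake_encode, cache_dir)  # cache hit
```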
### Flash Attention & Extended Context

```python
model = GLiNER.from_pretrained(
    "knowledgator/gliner-bi-base-v2.0",
    _attn_implementation='flash_attention_2',
    max_len=2048
).to('cuda:0')
```
## Zero-Shot NER Performance
Comprehensive evaluation across 19 diverse NER datasets:
| Dataset | gliner-bi-edge-v2.0 | gliner-bi-small-v2.0 | gliner-bi-base-v2.0 | gliner-bi-large-v2.0 |
|---|---|---|---|---|
| ACE 2004 | 26.4% | 27.5% | 28.9% | 31.9% |
| ACE 2005 | 26.2% | 28.1% | 30.0% | 31.4% |
| AnatEM | 39.1% | 43.6% | 35.4% | 39.5% |
| Broad Tweet Corpus | 70.0% | 71.7% | 72.1% | 70.9% |
| CoNLL 2003 | 61.6% | 64.2% | 65.6% | 66.5% |
| FabNER | 22.4% | 23.2% | 24.3% | 22.7% |
| FindVehicle | 35.6% | 40.3% | 40.6% | 39.1% |
| GENIA_NER | 50.1% | 53.8% | 56.8% | 60.1% |
| HarveyNER | 15.0% | 10.6% | 12.6% | 14.7% |
| MultiNERD | 64.6% | 66.0% | 68.0% | 64.0% |
| Ontonotes | 31.4% | 31.9% | 33.3% | 32.5% |
| PolyglotNER | 45.1% | 46.3% | 46.6% | 46.8% |
| TweetNER7 | 36.9% | 40.9% | 40.4% | 41.7% |
| WikiANN en | 52.3% | 54.0% | 54.9% | 56.3% |
| WikiNeural | 78.0% | 79.9% | 80.0% | 76.6% |
| bc2gm | 58.1% | 59.9% | 62.7% | 61.4% |
| bc4chemd | 45.8% | 49.1% | 53.6% | 50.5% |
| bc5cdr | 68.5% | 71.5% | 73.0% | 71.7% |
| ncbi | 65.9% | 65.4% | 65.2% | 65.9% |
| Average | 47.0% | 48.8% | 49.7% | 49.7% |
## CrossNER Zero-Shot Benchmark
| Dataset | gliner-bi-edge-v2.0 | gliner-bi-small-v2.0 | gliner-bi-base-v2.0 | gliner-bi-large-v2.0 |
|---|---|---|---|---|
| CrossNER_AI | 53.8% | 54.7% | 58.3% | 57.4% |
| CrossNER_literature | 56.2% | 62.6% | 65.2% | 63.2% |
| CrossNER_music | 68.2% | 72.3% | 73.4% | 74.0% |
| CrossNER_politics | 68.7% | 70.0% | 70.8% | 73.0% |
| CrossNER_science | 63.2% | 66.1% | 68.0% | 67.6% |
| mit-movie | 30.5% | 35.2% | 46.2% | 51.0% |
| mit-restaurant | 37.1% | 39.5% | 40.3% | 44.3% |
| Average (Zero-Shot Benchmark) | 54.0% | 57.2% | 60.3% | 61.5% |
## Inference Speed Comparison
Throughput (examples/second) by number of entity types on H100 GPU (batch_size=1):
| Model | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gliner-bi-edge-v2.0 | 17.0 | 27.0 | 5.05 | 22.4 | 17.5 | 13.9 | 15.2 | 12.5 | 10.8 | 5.43 | 3.23 | 13.64 |
| gliner-bi-edge-v2.0 (pre-computed) | 19.3 | 25.0 | 28.2 | 32.6 | 31.0 | 32.6 | 22.2 | 22.7 | 22.2 | 16.9 | 18.3 | 24.62 |
| gliner-bi-small-v2.0 | 12.5 | 12.8 | 5.98 | 11.6 | 10.6 | 9.43 | 6.94 | 7.35 | 5.74 | 3.33 | 1.60 | 7.99 |
| gliner-bi-small-v2.0 (pre-computed) | 14.7 | 15.9 | 14.3 | 15.3 | 15.4 | 15.4 | 15.6 | 15.3 | 15.5 | 15.7 | 14.3 | 15.22 |
| gliner-bi-base-v2.0 | 8.13 | 8.62 | 4.85 | 8.00 | 7.52 | 6.76 | 5.71 | 5.21 | 4.64 | 3.21 | 2.30 | 5.91 |
| gliner-bi-base-v2.0 (pre-computed) | 9.52 | 10.2 | 9.80 | 9.95 | 10.0 | 9.93 | 8.93 | 6.71 | 9.35 | 9.71 | 10.5 | 9.51 |
| gliner-bi-large-v2.0 | 3.52 | 2.53 | 3.87 | 3.50 | 3.66 | 3.19 | 1.90 | 2.46 | 2.39 | 1.62 | 0.87 | 2.68 |
| gliner-bi-large-v2.0 (pre-computed) | 4.37 | 4.07 | 4.53 | 4.54 | 4.47 | 3.46 | 3.85 | 3.04 | 2.82 | 1.84 | 2.64 | 3.60 |
| gliner_small-v2.5 (uni-encoder) | 10.7 | 14.6 | 14.1 | 13.2 | 11.9 | 10.3 | 7.91 | 4.26 | 1.29 | 0.43 | 0.14 | 8.08 |
| gliner_medium-v2.5 (uni-encoder) | 7.81 | 8.51 | 8.39 | 7.58 | 7.12 | 5.62 | 4.18 | 2.19 | 0.68 | 0.23 | 0.07 | 4.76 |
| gliner_large-v2.5 (uni-encoder) | 2.89 | 3.28 | 3.29 | 2.90 | 2.61 | 2.33 | 1.71 | 1.12 | 0.31 | 0.09 | 0.03 | 1.87 |
Key Insight: With pre-computed embeddings, the bi-encoder maintains near-constant speed (e.g., 5.2% degradation from 1 to 1024 labels for the edge variant), while the uni-encoder degrades by 98.7% over the same range.
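The scaling gap follows from where the label count enters each design: the uni-encoder packs every label into the input sequence, so attention cost grows roughly quadratically with the label count, while the bi-encoder encodes the text once and pays only a cheap per-label scoring step. A back-of-envelope cost model (the constants are illustrative assumptions, not measurements):

```python
def uni_cost(num_labels, text_tokens=256, tokens_per_label=3):
    # Labels are concatenated into the prompt, so sequence length grows
    # with the label count and quadratic attention dominates.
    seq_len = text_tokens + tokens_per_label * num_labels
    return seq_len ** 2

def bi_cost(num_labels, text_tokens=256):
    # Fixed-cost text encoding plus a linear similarity lookup per label.
    return text_tokens ** 2 + num_labels

uni_ratio = uni_cost(1024) / uni_cost(1)  # blows up with label count
bi_ratio = bi_cost(1024) / bi_cost(1)     # stays nearly flat
```

Under these toy constants the uni-encoder cost grows by two orders of magnitude from 1 to 1024 labels while the bi-encoder cost barely moves, matching the qualitative shape of the measured table above.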
## Use Cases

### Biomedical Entity Linking

Process millions of documents against UMLS (4M+ concepts), SNOMED CT, or other large medical ontologies with pre-computed embeddings.

### Enterprise Knowledge Extraction

Deploy dynamic taxonomies that evolve without model retraining. Add new entity types instantly by computing their embeddings.

### Scientific Literature Mining

Extract entities across multiple specialized domains (chemistry, biology, physics) with domain-specific label encoders.
## Entity Linking with GLiNKER
GLiNER-bi-Encoder extends naturally to entity linking through the GLiNKER framework—a modular DAG-based pipeline for:
- Mention extraction with GLiNER
- Candidate retrieval from knowledge bases via pre-computed embeddings
- Entity disambiguation using bi-encoder scoring
Learn more: GLiNKER Repository
## Model Details

### Training Data
- Pre-training: 8M samples (Large/Base/Small), 10M samples (Edge) from FineFineWeb, annotated with GPT-4o
- Post-training: 40K high-quality samples with sequences up to 2048 tokens for long-context refinement
### Training Configuration
- Focal Loss: α=0.7 (pre-training), α=0.8 (post-training), γ=2.0
- Optimizer: AdamW with differential learning rates (encoder: 1e-5, other: 3e-5)
- Context Length: 1024 tokens (pre-training), 2048 tokens (post-training)
- Maximum Span Width: 12 tokens
- Dropout: 0.35
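The focal loss settings above follow the standard binary focal loss of Lin et al. (2017); a minimal NumPy sketch with the pre-training values (α = 0.7, γ = 2.0) shows how the (1 − p_t)^γ factor down-weights easy examples:

```python
import numpy as np

def focal_loss(p, y, alpha=0.7, gamma=2.0):
    """Binary focal loss with the pre-training settings above.

    p: predicted probability of the positive class, y: 0/1 target.
    The (1 - p_t)**gamma factor shrinks the loss on well-classified
    examples, focusing training on hard spans and rare entity types.
    """
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(np.clip(p_t, 1e-12, 1.0))

# An easy positive (p=0.95) contributes far less than a hard one (p=0.3)
easy = focal_loss(np.array([0.95]), np.array([1]))
hard = focal_loss(np.array([0.30]), np.array([1]))
```

This matters for span-based NER, where the overwhelming majority of candidate spans are easy negatives.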
## Citation

If you use GLiNER-bi-Encoder in your research, please cite:

```bibtex
@misc{stepanov2024glinermultitask,
      title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks},
      author={Ihor Stepanov and Mykhailo Shtopko},
      year={2024},
      eprint={2406.12925},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```
## Acknowledgments
We sincerely thank Urchade Zaratiana (creator of GLiNER) and Tom Aarsen (maintainer of Sentence Transformers) for their foundational work.
## Join Our Community
Connect with our community on Discord for news, support, and discussions: Join Discord
## Resources
- Paper: arXiv preprint (coming soon)
- GLiNKER Framework: GLiNKER
- Model Collection: HuggingFace Collection
Knowledgator Engineering © 2026