GLiNER-bi-Encoder: Scalable Zero-Shot Named Entity Recognition

About

GLiNER-bi-Encoder is a novel architecture for Named Entity Recognition (NER) that combines zero-shot flexibility with industrial-scale efficiency. Unlike the original GLiNER, which uses joint encoding, the bi-encoder design decouples text and entity-type encoding, enabling the recognition of thousands of entity types simultaneously with minimal computational overhead.

Key Advantages

Massive Scalability: Handle 1000+ entity types with near-constant inference speed when using pre-computed label embeddings

130× Faster: Up to 130× throughput improvement compared to uni-encoder approaches at 1024 entity types

State-of-the-Art Zero-Shot Performance: Up to 61.5% average Micro-F1 on the CrossNER zero-shot benchmark (large variant)

Efficient Caching: Pre-compute and cache entity type embeddings for instant reuse across millions of documents

Architecture

The bi-encoder architecture employs two specialized, independent transformers:

  • Text Encoder: Processes input sequences using ModernBERT-based encoders (Ettin family)
  • Label Encoder: Embeds entity type descriptions using specialized sentence transformers (BGE, MiniLM)

This separation removes the context-window bottleneck and enables:

  • Pre-computation of entity type embeddings
  • Constant memory usage for text encoding regardless of entity count
  • Efficient nearest-neighbor search for entity matching
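
The matching step itself is only a similarity between independently produced embeddings. Below is a minimal sketch of the idea (illustrative only, not the library's internal code; the embedding dimension and the 0.5 threshold are assumptions):

import torch
import torch.nn.functional as F

# Label embeddings come from the label encoder; they can be computed once and cached.
label_embeddings = torch.randn(3, 384)   # 3 entity types, 384-dim (assumed)

# Span embeddings come from the text encoder for each new document.
span_embeddings = torch.randn(5, 384)    # 5 candidate spans in one document

# Every span is scored against every entity type with a simple similarity.
scores = torch.sigmoid(
    F.normalize(span_embeddings, dim=-1) @ F.normalize(label_embeddings, dim=-1).T
)
matches = scores > 0.5                   # (num_spans, num_types) decision matrix

Because the label matrix does not depend on the input text, growing the taxonomy only adds rows to a cached matrix rather than tokens to every forward pass.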

Model Variants

GLiNER-bi-V2 Models:

| Model name | Params | Text Encoder | Label Encoder | Avg. CrossNER | Inference Speed (H100, ex/s) | Inference Speed (pre-computed) |
|---|---|---|---|---|---|---|
| gliner-bi-edge-v2.0 | 60 M | ettin-encoder-32m | all-MiniLM-L6-v2 | 54.0% | 13.64 | 24.62 |
| gliner-bi-small-v2.0 | 108 M | ettin-encoder-68m | all-MiniLM-L12-v2 | 57.2% | 7.99 | 15.22 |
| gliner-bi-base-v2.0 | 194 M | ettin-encoder-150m | bge-small-en-v1.5 | 60.3% | 5.91 | 9.51 |
| gliner-bi-large-v2.0 | 530 M | ettin-encoder-400m | bge-base-en-v1.5 | 61.5% | 2.68 | 3.60 |

Recommendation: The base variant (194M) reaches 98% of the large model's CrossNER score while running roughly 2.6× faster with pre-computed label embeddings, making it the best fit for most production scenarios.

Installation & Usage

Installation

pip install gliner -U
pip install "transformers>=4.48.0"

For flash attention support:

pip install flash-attn triton

Basic Usage

from gliner import GLiNER

# Load model
model = GLiNER.from_pretrained("knowledgator/gliner-bi-base-v2.0")

text = """
Cristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985) is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards, a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player.
"""

labels = ["person", "award", "date", "competitions", "teams"]

entities = model.predict_entities(text, labels, threshold=0.3)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

Output:

Cristiano Ronaldo dos Santos Aveiro => person
5 February 1985 => date
Al Nassr => teams
Portugal national team => teams
Ballon d'Or => award
UEFA Men's Player of the Year Awards => award
European Golden Shoes => award

Advanced Usage: Pre-computing Entity Embeddings

For scenarios with large, static entity taxonomies (hundreds to millions of types):

from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-bi-base-v2.0")

# Pre-compute embeddings for thousands of entity types
entity_types = ["person", "organization", "location", ...] # Can be thousands
texts = ["Your documents here", ...]

# Encode entity types once
entity_embeddings = model.encode_labels(entity_types, batch_size=8)

# Use pre-computed embeddings for fast inference
outputs = model.batch_predict_with_embeds(texts, entity_embeddings, entity_types)

This approach provides:

  • 130× speedup at 1024 entity types
  • Constant inference time regardless of entity count
  • Efficient caching for repeated use
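
Because the label embeddings depend only on the entity-type strings, they can be serialized once and reloaded by later jobs instead of being re-encoded. A minimal sketch, assuming encode_labels returns a tensor that torch.save can handle (verify the exact return type against your gliner version):

import torch
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-bi-base-v2.0")
entity_types = ["person", "organization", "location"]

# Encode once and persist to disk.
entity_embeddings = model.encode_labels(entity_types, batch_size=8)
torch.save({"types": entity_types, "embeddings": entity_embeddings}, "label_cache.pt")

# In a later run: reload the cache instead of re-encoding the taxonomy.
cache = torch.load("label_cache.pt")
outputs = model.batch_predict_with_embeds(
    ["Your documents here"], cache["embeddings"], cache["types"]
)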

Flash Attention & Extended Context

model = GLiNER.from_pretrained(
    "knowledgator/gliner-bi-base-v2.0",
    _attn_implementation='flash_attention_2',
    max_len=2048
).to('cuda:0')

Zero-Shot NER Performance

Comprehensive evaluation across 19 diverse NER datasets:

| Dataset | gliner-bi-edge-v2.0 | gliner-bi-small-v2.0 | gliner-bi-base-v2.0 | gliner-bi-large-v2.0 |
|---|---|---|---|---|
| ACE 2004 | 26.4% | 27.5% | 28.9% | 31.9% |
| ACE 2005 | 26.2% | 28.1% | 30.0% | 31.4% |
| AnatEM | 39.1% | 43.6% | 35.4% | 39.5% |
| Broad Tweet Corpus | 70.0% | 71.7% | 72.1% | 70.9% |
| CoNLL 2003 | 61.6% | 64.2% | 65.6% | 66.5% |
| FabNER | 22.4% | 23.2% | 24.3% | 22.7% |
| FindVehicle | 35.6% | 40.3% | 40.6% | 39.1% |
| GENIA_NER | 50.1% | 53.8% | 56.8% | 60.1% |
| HarveyNER | 15.0% | 10.6% | 12.6% | 14.7% |
| MultiNERD | 64.6% | 66.0% | 68.0% | 64.0% |
| Ontonotes | 31.4% | 31.9% | 33.3% | 32.5% |
| PolyglotNER | 45.1% | 46.3% | 46.6% | 46.8% |
| TweetNER7 | 36.9% | 40.9% | 40.4% | 41.7% |
| WikiANN en | 52.3% | 54.0% | 54.9% | 56.3% |
| WikiNeural | 78.0% | 79.9% | 80.0% | 76.6% |
| bc2gm | 58.1% | 59.9% | 62.7% | 61.4% |
| bc4chemd | 45.8% | 49.1% | 53.6% | 50.5% |
| bc5cdr | 68.5% | 71.5% | 73.0% | 71.7% |
| ncbi | 65.9% | 65.4% | 65.2% | 65.9% |
| Average | 47.0% | 48.8% | 49.7% | 49.7% |

CrossNER Zero-Shot Benchmark

| Dataset | gliner-bi-edge-v2.0 | gliner-bi-small-v2.0 | gliner-bi-base-v2.0 | gliner-bi-large-v2.0 |
|---|---|---|---|---|
| CrossNER_AI | 53.8% | 54.7% | 58.3% | 57.4% |
| CrossNER_literature | 56.2% | 62.6% | 65.2% | 63.2% |
| CrossNER_music | 68.2% | 72.3% | 73.4% | 74.0% |
| CrossNER_politics | 68.7% | 70.0% | 70.8% | 73.0% |
| CrossNER_science | 63.2% | 66.1% | 68.0% | 67.6% |
| mit-movie | 30.5% | 35.2% | 46.2% | 51.0% |
| mit-restaurant | 37.1% | 39.5% | 40.3% | 44.3% |
| Average (Zero-Shot Benchmark) | 54.0% | 57.2% | 60.3% | 61.5% |

Inference Speed Comparison

Throughput (examples/second) by number of entity types on H100 GPU (batch_size=1):

| Model | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gliner-bi-edge-v2.0 | 17.0 | 27.0 | 5.05 | 22.4 | 17.5 | 13.9 | 15.2 | 12.5 | 10.8 | 5.43 | 3.23 | 13.64 |
| gliner-bi-edge-v2.0 (pre-computed) | 19.3 | 25.0 | 28.2 | 32.6 | 31.0 | 32.6 | 22.2 | 22.7 | 22.2 | 16.9 | 18.3 | 24.62 |
| gliner-bi-small-v2.0 | 12.5 | 12.8 | 5.98 | 11.6 | 10.6 | 9.43 | 6.94 | 7.35 | 5.74 | 3.33 | 1.60 | 7.99 |
| gliner-bi-small-v2.0 (pre-computed) | 14.7 | 15.9 | 14.3 | 15.3 | 15.4 | 15.4 | 15.6 | 15.3 | 15.5 | 15.7 | 14.3 | 15.22 |
| gliner-bi-base-v2.0 | 8.13 | 8.62 | 4.85 | 8.00 | 7.52 | 6.76 | 5.71 | 5.21 | 4.64 | 3.21 | 2.30 | 5.91 |
| gliner-bi-base-v2.0 (pre-computed) | 9.52 | 10.2 | 9.80 | 9.95 | 10.0 | 9.93 | 8.93 | 6.71 | 9.35 | 9.71 | 10.5 | 9.51 |
| gliner-bi-large-v2.0 | 3.52 | 2.53 | 3.87 | 3.50 | 3.66 | 3.19 | 1.90 | 2.46 | 2.39 | 1.62 | 0.87 | 2.68 |
| gliner-bi-large-v2.0 (pre-computed) | 4.37 | 4.07 | 4.53 | 4.54 | 4.47 | 3.46 | 3.85 | 3.04 | 2.82 | 1.84 | 2.64 | 3.60 |
| gliner_small-v2.5 (uni-encoder) | 10.7 | 14.6 | 14.1 | 13.2 | 11.9 | 10.3 | 7.91 | 4.26 | 1.29 | 0.43 | 0.14 | 8.08 |
| gliner_medium-v2.5 (uni-encoder) | 7.81 | 8.51 | 8.39 | 7.58 | 7.12 | 5.62 | 4.18 | 2.19 | 0.68 | 0.23 | 0.07 | 4.76 |
| gliner_large-v2.5 (uni-encoder) | 2.89 | 3.28 | 3.29 | 2.90 | 2.61 | 2.33 | 1.71 | 1.12 | 0.31 | 0.09 | 0.03 | 1.87 |

Key Insight: Bi-encoder with pre-computed embeddings maintains near-constant speed (5.2% degradation from 1→1024 labels) while uni-encoder shows 98.7% degradation.

Use Cases

Biomedical Entity Linking

Process millions of documents against UMLS (4M+ concepts), SNOMED CT, or other large medical ontologies with pre-computed embeddings.
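
At ontology scale the label set usually will not fit into a single encode_labels call, so one practical pattern is to embed the concept names in chunks and concatenate the results before caching. A rough sketch under the same assumptions as above (load_concept_names and the chunk size are hypothetical placeholders):

import torch
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-bi-base-v2.0")

concept_names = load_concept_names()  # hypothetical loader for UMLS/SNOMED CT concept strings

# Embed the ontology in chunks to keep memory bounded, then cache the full matrix.
chunks = [
    model.encode_labels(concept_names[i:i + 10_000], batch_size=32)
    for i in range(0, len(concept_names), 10_000)
]
concept_embeddings = torch.cat(chunks, dim=0)
torch.save(concept_embeddings, "ontology_label_embeddings.pt")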

Enterprise Knowledge Extraction

Deploy dynamic taxonomies that evolve without model retraining. Add new entity types instantly by computing their embeddings.

Scientific Literature Mining

Extract entities across multiple specialized domains (chemistry, biology, physics) with domain-specific label encoders.

Entity Linking with GLiNKER

GLiNER-bi-Encoder extends naturally to entity linking through the GLiNKER framework—a modular DAG-based pipeline for:

  • Mention extraction with GLiNER
  • Candidate retrieval from knowledge bases via pre-computed embeddings
  • Entity disambiguation using bi-encoder scoring

Learn more: GLiNKER Repository
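
GLiNKER's own components are documented in its repository; the overall flow can be approximated with just the GLiNER API shown above plus a nearest-neighbor lookup. The snippet below is an illustrative sketch of that flow, not GLiNKER's actual implementation (kb_names, the cosine scoring, and the reuse of encode_labels for mention texts are all assumptions):

import torch
import torch.nn.functional as F
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-bi-base-v2.0")
text = "Cristiano Ronaldo captains Al Nassr and the Portugal national team."

# 1. Mention extraction with GLiNER.
mentions = model.predict_entities(text, ["person", "teams"], threshold=0.3)

# 2. Candidate retrieval: pre-computed embeddings for knowledge-base entries.
kb_names = ["Cristiano Ronaldo", "Al Nassr FC", "Portugal national football team"]
kb_embeddings = F.normalize(model.encode_labels(kb_names, batch_size=8), dim=-1)

# 3. Disambiguation: link each mention to its nearest knowledge-base entry.
for mention in mentions:
    mention_emb = F.normalize(model.encode_labels([mention["text"]], batch_size=1), dim=-1)
    best = int((mention_emb @ kb_embeddings.T).argmax())
    print(mention["text"], "->", kb_names[best])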

Model Details

Training Data

  • Pre-training: 8M samples (Large/Base/Small), 10M samples (Edge) from FineFineWeb, annotated with GPT-4o
  • Post-training: 40K high-quality samples with sequences up to 2048 tokens for long-context refinement

Training Configuration

  • Focal Loss: α=0.7 (pre-training), α=0.8 (post-training), γ=2.0
  • Optimizer: AdamW with differential learning rates (encoder: 1e-5, other: 3e-5)
  • Context Length: 1024 tokens (pre-training), 2048 tokens (post-training)
  • Maximum Span Width: 12 tokens
  • Dropout: 0.35
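
For reference, the focal loss listed above follows the standard binary formulation; a minimal sketch with the pre-training settings α=0.7 and γ=2.0 (the textbook formula, not necessarily the repository's exact implementation):

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.7, gamma=2.0):
    """Binary focal loss: down-weights easy span-label pairs, focuses on hard ones."""
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = torch.exp(-bce)                                  # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

# Example: scores for 4 candidate spans against 3 entity types.
logits = torch.randn(4, 3)
targets = torch.randint(0, 2, (4, 3)).float()
print(focal_loss(logits, targets))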

Citation

If you use GLiNER-bi-Encoder in your research, please cite:

@misc{stepanov2024glinermultitask,
      title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks}, 
      author={Ihor Stepanov and Mykhailo Shtopko},
      year={2024},
      eprint={2406.12925},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Acknowledgments

We sincerely thank Urchade Zaratiana (creator of GLiNER) and Tom Aarsen (maintainer of Sentence Transformers) for their foundational work.

Join Our Community

Connect with our community on Discord for news, support, and discussions: Join Discord
