# GLiNER-bi-Encoder: Scalable Zero-Shot Named Entity Recognition

## About
GLiNER-bi-Encoder is a novel architecture for Named Entity Recognition (NER) that combines zero-shot flexibility with industrial-scale efficiency. Unlike the original GLiNER, which uses joint encoding, the bi-encoder design decouples text and entity-type encoding, enabling the recognition of thousands of entity types simultaneously with minimal computational overhead.
## Key Advantages

- Massive Scalability: Handle 1000+ entity types with near-constant inference speed when using pre-computed label embeddings
- 130× Faster: Up to 130× throughput improvement over uni-encoder approaches at 1024 entity types
- State-of-the-Art Performance: Achieves 61.5% Micro-F1 on the CrossNER benchmark in the zero-shot setting
- Efficient Caching: Pre-compute and cache entity type embeddings for instant reuse across millions of documents
## Architecture
The bi-encoder architecture employs two specialized, independent transformers:
- Text Encoder: Processes input sequences using ModernBERT-based encoders (Ettin family)
- Label Encoder: Embeds entity type descriptions using specialized sentence transformers (BGE, MiniLM)
This separation removes the context-window bottleneck and enables:
- Pre-computation of entity type embeddings
- Constant memory usage for text encoding regardless of entity count
- Efficient nearest-neighbor search for entity matching
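Conceptually, once both encoders have run, matching reduces to a single similarity lookup between span vectors and label vectors. A minimal NumPy sketch of that idea (the shapes, names, and cosine scoring here are illustrative assumptions, not GLiNER's actual internals):

```python
import numpy as np

rng = np.random.default_rng(0)

# Pre-computed label embeddings: one vector per entity type (illustrative)
num_labels, dim = 1024, 256
label_emb = rng.standard_normal((num_labels, dim)).astype(np.float32)
label_emb /= np.linalg.norm(label_emb, axis=1, keepdims=True)

# Span representations produced by the text encoder for one document
num_spans = 50
span_emb = rng.standard_normal((num_spans, dim)).astype(np.float32)
span_emb /= np.linalg.norm(span_emb, axis=1, keepdims=True)

# One matrix multiply scores every span against every label at once;
# the text encoder never sees the labels, so its cost is independent of num_labels.
scores = span_emb @ label_emb.T          # (num_spans, num_labels)
best_label = scores.argmax(axis=1)       # highest-scoring type per span
```

Because the labels only enter through `label_emb`, growing the taxonomy adds rows to one matrix rather than tokens to every input sequence.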
## Model Variants

GLiNER-bi-V2 models:
| Model name | Params | Text Encoder | Label Encoder | Avg. CrossNER | Inference Speed (H100, ex/s) | Inference Speed (pre-computed) |
|---|---|---|---|---|---|---|
| gliner-bi-edge-v2.0 | 60 M | ettin-encoder-32m | all-MiniLM-L6-v2 | 54.0% | 13.64 | 24.62 |
| gliner-bi-small-v2.0 | 108 M | ettin-encoder-68m | all-MiniLM-L12-v2 | 57.2% | 7.99 | 15.22 |
| gliner-bi-base-v2.0 | 194 M | ettin-encoder-150m | bge-small-en-v1.5 | 60.3% | 5.91 | 9.51 |
| gliner-bi-large-v2.0 | 530 M | ettin-encoder-400m | bge-base-en-v1.5 | 61.5% | 2.68 | 3.60 |
Recommendation: The base variant (194M) achieves 98% of large model performance while operating 2.6× faster, making it optimal for most production scenarios.
## Installation & Usage

### Installation

```bash
pip install gliner -U
pip install "transformers>=4.48.0"
```

For flash attention support:

```bash
pip install flash-attn triton
```
### Basic Usage

```python
from gliner import GLiNER

# Load model
model = GLiNER.from_pretrained("knowledgator/gliner-bi-base-v2.0")

text = """
Cristiano Ronaldo dos Santos Aveiro (Portuguese pronunciation: [kɾiʃˈtjɐnu ʁɔˈnaldu]; born 5 February 1985) is a Portuguese professional footballer who plays as a forward for and captains both Saudi Pro League club Al Nassr and the Portugal national team. Widely regarded as one of the greatest players of all time, Ronaldo has won five Ballon d'Or awards, a record three UEFA Men's Player of the Year Awards, and four European Golden Shoes, the most by a European player.
"""

labels = ["person", "award", "date", "competitions", "teams"]

entities = model.predict_entities(text, labels, threshold=0.3)

for entity in entities:
    print(entity["text"], "=>", entity["label"])
```
Output:

```
Cristiano Ronaldo dos Santos Aveiro => person
5 February 1985 => date
Al Nassr => teams
Portugal national team => teams
Ballon d'Or => award
UEFA Men's Player of the Year Awards => award
European Golden Shoes => award
```
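Each predicted entity also carries a confidence score under the `score` key, so predictions can be post-filtered more strictly than the decoding threshold. An illustrative filtering step on a hand-written sample output (the scores below are made up, not real model output):

```python
# Sample entities in GLiNER's output format (values here are illustrative)
entities = [
    {"text": "Cristiano Ronaldo dos Santos Aveiro", "label": "person", "score": 0.92},
    {"text": "forward", "label": "person", "score": 0.31},
    {"text": "Ballon d'Or", "label": "award", "score": 0.88},
]

# Keep only high-confidence predictions and group them by label
confident = [e for e in entities if e["score"] >= 0.5]
by_label = {}
for e in confident:
    by_label.setdefault(e["label"], []).append(e["text"])
```

A low decoding threshold (e.g. 0.3) plus a stricter downstream filter is a common way to trade recall for precision without re-running the model.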
### Advanced Usage: Pre-computing Entity Embeddings

For scenarios with large, static entity taxonomies (hundreds to millions of types):

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("knowledgator/gliner-bi-base-v2.0")

# Pre-compute embeddings for thousands of entity types
entity_types = ["person", "organization", "location", ...]  # Can be thousands

texts = ["Your documents here", ...]

# Encode entity types once
entity_embeddings = model.encode_labels(entity_types, batch_size=8)

# Use pre-computed embeddings for fast inference
outputs = model.batch_predict_with_embeds(texts, entity_embeddings, entity_types)
```
This approach provides:
- 130× speedup at 1024 entity types
- Constant inference time regardless of entity count
- Efficient caching for repeated use
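Because the label embeddings are ordinary tensors, they can be computed once and persisted for reuse across processes. A sketch of a simple disk cache, assuming only that `model.encode_labels` returns an array-like object (the `cached_label_embeddings` helper and file layout below are hypothetical, not part of the GLiNER API):

```python
import hashlib
import tempfile
from pathlib import Path

import numpy as np

def cached_label_embeddings(labels, encode_fn, cache_dir):
    """Encode `labels` once and serve later calls from disk.

    `encode_fn` stands in for model.encode_labels; the cache key is a hash
    of the label list, so a changed taxonomy triggers re-encoding.
    """
    key = hashlib.sha256("\n".join(labels).encode("utf-8")).hexdigest()[:16]
    path = Path(cache_dir) / f"labels-{key}.npy"
    if path.exists():
        return np.load(path)
    embeddings = np.asarray(encode_fn(labels))
    np.save(path, embeddings)
    return embeddings

# Stand-in encoder so the sketch is self-contained; in practice this would
# be model.encode_labels(labels, batch_size=8) from the snippet above.
calls = {"n": 0}
def fake_encode(labels):
    calls["n"] += 1
    return np.ones((len(labels), 4), dtype=np.float32)

cache_dir = tempfile.mkdtemp()
labels = ["person", "organization", "location"]
first = cached_label_embeddings(labels, fake_encode, cache_dir)   # encodes
second = cached_label_embeddings(labels, fake_encode, cache_dir)  # cache hit
```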
### Flash Attention & Extended Context

```python
model = GLiNER.from_pretrained(
    "knowledgator/gliner-bi-base-v2.0",
    _attn_implementation='flash_attention_2',
    max_len=2048
).to('cuda:0')
```
## Zero-Shot NER Performance
Comprehensive evaluation across 19 diverse NER datasets:
| Dataset | gliner-bi-edge-v2.0 | gliner-bi-small-v2.0 | gliner-bi-base-v2.0 | gliner-bi-large-v2.0 |
|---|---|---|---|---|
| ACE 2004 | 26.4% | 27.5% | 28.9% | 31.9% |
| ACE 2005 | 26.2% | 28.1% | 30.0% | 31.4% |
| AnatEM | 39.1% | 43.6% | 35.4% | 39.5% |
| Broad Tweet Corpus | 70.0% | 71.7% | 72.1% | 70.9% |
| CoNLL 2003 | 61.6% | 64.2% | 65.6% | 66.5% |
| FabNER | 22.4% | 23.2% | 24.3% | 22.7% |
| FindVehicle | 35.6% | 40.3% | 40.6% | 39.1% |
| GENIA_NER | 50.1% | 53.8% | 56.8% | 60.1% |
| HarveyNER | 15.0% | 10.6% | 12.6% | 14.7% |
| MultiNERD | 64.6% | 66.0% | 68.0% | 64.0% |
| Ontonotes | 31.4% | 31.9% | 33.3% | 32.5% |
| PolyglotNER | 45.1% | 46.3% | 46.6% | 46.8% |
| TweetNER7 | 36.9% | 40.9% | 40.4% | 41.7% |
| WikiANN en | 52.3% | 54.0% | 54.9% | 56.3% |
| WikiNeural | 78.0% | 79.9% | 80.0% | 76.6% |
| bc2gm | 58.1% | 59.9% | 62.7% | 61.4% |
| bc4chemd | 45.8% | 49.1% | 53.6% | 50.5% |
| bc5cdr | 68.5% | 71.5% | 73.0% | 71.7% |
| ncbi | 65.9% | 65.4% | 65.2% | 65.9% |
| Average | 47.0% | 48.8% | 49.7% | 49.7% |
## CrossNER Zero-Shot Benchmark
| Dataset | gliner-bi-edge-v2.0 | gliner-bi-small-v2.0 | gliner-bi-base-v2.0 | gliner-bi-large-v2.0 |
|---|---|---|---|---|
| CrossNER_AI | 53.8% | 54.7% | 58.3% | 57.4% |
| CrossNER_literature | 56.2% | 62.6% | 65.2% | 63.2% |
| CrossNER_music | 68.2% | 72.3% | 73.4% | 74.0% |
| CrossNER_politics | 68.7% | 70.0% | 70.8% | 73.0% |
| CrossNER_science | 63.2% | 66.1% | 68.0% | 67.6% |
| mit-movie | 30.5% | 35.2% | 46.2% | 51.0% |
| mit-restaurant | 37.1% | 39.5% | 40.3% | 44.3% |
| Average (Zero-Shot Benchmark) | 54.0% | 57.2% | 60.3% | 61.5% |
## Inference Speed Comparison
Throughput (examples/second) by number of entity types on H100 GPU (batch_size=1):
| Model | 1 | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 | Avg |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| gliner-bi-edge-v2.0 | 17.0 | 27.0 | 5.05 | 22.4 | 17.5 | 13.9 | 15.2 | 12.5 | 10.8 | 5.43 | 3.23 | 13.64 |
| gliner-bi-edge-v2.0 (pre-computed) | 19.3 | 25.0 | 28.2 | 32.6 | 31.0 | 32.6 | 22.2 | 22.7 | 22.2 | 16.9 | 18.3 | 24.62 |
| gliner-bi-small-v2.0 | 12.5 | 12.8 | 5.98 | 11.6 | 10.6 | 9.43 | 6.94 | 7.35 | 5.74 | 3.33 | 1.60 | 7.99 |
| gliner-bi-small-v2.0 (pre-computed) | 14.7 | 15.9 | 14.3 | 15.3 | 15.4 | 15.4 | 15.6 | 15.3 | 15.5 | 15.7 | 14.3 | 15.22 |
| gliner-bi-base-v2.0 | 8.13 | 8.62 | 4.85 | 8.00 | 7.52 | 6.76 | 5.71 | 5.21 | 4.64 | 3.21 | 2.30 | 5.91 |
| gliner-bi-base-v2.0 (pre-computed) | 9.52 | 10.2 | 9.80 | 9.95 | 10.0 | 9.93 | 8.93 | 6.71 | 9.35 | 9.71 | 10.5 | 9.51 |
| gliner-bi-large-v2.0 | 3.52 | 2.53 | 3.87 | 3.50 | 3.66 | 3.19 | 1.90 | 2.46 | 2.39 | 1.62 | 0.87 | 2.68 |
| gliner-bi-large-v2.0 (pre-computed) | 4.37 | 4.07 | 4.53 | 4.54 | 4.47 | 3.46 | 3.85 | 3.04 | 2.82 | 1.84 | 2.64 | 3.60 |
| gliner_small-v2.5 (uni-encoder) | 10.7 | 14.6 | 14.1 | 13.2 | 11.9 | 10.3 | 7.91 | 4.26 | 1.29 | 0.43 | 0.14 | 8.08 |
| gliner_medium-v2.5 (uni-encoder) | 7.81 | 8.51 | 8.39 | 7.58 | 7.12 | 5.62 | 4.18 | 2.19 | 0.68 | 0.23 | 0.07 | 4.76 |
| gliner_large-v2.5 (uni-encoder) | 2.89 | 3.28 | 3.29 | 2.90 | 2.61 | 2.33 | 1.71 | 1.12 | 0.31 | 0.09 | 0.03 | 1.87 |
Key Insight: With pre-computed embeddings, the bi-encoder maintains near-constant speed (e.g., 5.2% degradation from 1 to 1024 labels for the edge variant), while the uni-encoder degrades by 98.7% over the same range.
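The scaling gap follows from where the label count enters each design: the uni-encoder packs every label into the input sequence, so attention cost grows roughly quadratically with the label count, while the bi-encoder encodes the text once and pays only a cheap per-label scoring step. A back-of-envelope cost model (the constants are illustrative assumptions, not measurements):

```python
def uni_cost(num_labels, text_tokens=256, tokens_per_label=3):
    # Labels are concatenated into the prompt, so sequence length grows
    # with the label count and quadratic attention dominates.
    seq_len = text_tokens + tokens_per_label * num_labels
    return seq_len ** 2

def bi_cost(num_labels, text_tokens=256):
    # Fixed-cost text encoding plus a linear similarity lookup per label.
    return text_tokens ** 2 + num_labels

uni_ratio = uni_cost(1024) / uni_cost(1)  # blows up with label count
bi_ratio = bi_cost(1024) / bi_cost(1)     # stays nearly flat
```

Under these toy constants the uni-encoder cost grows by two orders of magnitude from 1 to 1024 labels while the bi-encoder cost barely moves, matching the qualitative shape of the measured table above.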
## Use Cases

### Biomedical Entity Linking

Process millions of documents against UMLS (4M+ concepts), SNOMED CT, or other large medical ontologies with pre-computed embeddings.

### Enterprise Knowledge Extraction

Deploy dynamic taxonomies that evolve without model retraining. Add new entity types instantly by computing their embeddings.

### Scientific Literature Mining

Extract entities across multiple specialized domains (chemistry, biology, physics) with domain-specific label encoders.
## Entity Linking with GLiNKER
GLiNER-bi-Encoder extends naturally to entity linking through the GLiNKER framework—a modular DAG-based pipeline for:
- Mention extraction with GLiNER
- Candidate retrieval from knowledge bases via pre-computed embeddings
- Entity disambiguation using bi-encoder scoring
Learn more: GLiNKER Repository
## Model Details

### Training Data
- Pre-training: 8M samples (Large/Base/Small), 10M samples (Edge) from FineFineWeb, annotated with GPT-4o
- Post-training: 40K high-quality samples with sequences up to 2048 tokens for long-context refinement
### Training Configuration
- Focal Loss: α=0.7 (pre-training), α=0.8 (post-training), γ=2.0
- Optimizer: AdamW with differential learning rates (encoder: 1e-5, other: 3e-5)
- Context Length: 1024 tokens (pre-training), 2048 tokens (post-training)
- Maximum Span Width: 12 tokens
- Dropout: 0.35
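The focal loss settings above follow the standard binary focal loss of Lin et al. (2017); a minimal NumPy sketch with the pre-training values (α = 0.7, γ = 2.0) shows how the (1 − p_t)^γ factor down-weights easy examples:

```python
import numpy as np

def focal_loss(p, y, alpha=0.7, gamma=2.0):
    """Binary focal loss with the pre-training settings above.

    p: predicted probability of the positive class, y: 0/1 target.
    The (1 - p_t)**gamma factor shrinks the loss on well-classified
    examples, focusing training on hard spans and rare entity types.
    """
    p_t = np.where(y == 1, p, 1 - p)
    alpha_t = np.where(y == 1, alpha, 1 - alpha)
    return -alpha_t * (1 - p_t) ** gamma * np.log(np.clip(p_t, 1e-12, 1.0))

# An easy positive (p=0.95) contributes far less than a hard one (p=0.3)
easy = focal_loss(np.array([0.95]), np.array([1]))
hard = focal_loss(np.array([0.30]), np.array([1]))
```

This matters for span-based NER, where the overwhelming majority of candidate spans are easy negatives.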
## Citation

If you use GLiNER-bi-Encoder in your research, please cite:

```bibtex
@misc{stepanov2024glinermultitask,
      title={GLiNER multi-task: Generalist Lightweight Model for Various Information Extraction Tasks},
      author={Ihor Stepanov and Mykhailo Shtopko},
      year={2024},
      eprint={2406.12925},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
```
## Acknowledgments
We sincerely thank Urchade Zaratiana (creator of GLiNER) and Tom Aarsen (maintainer of Sentence Transformers) for their foundational work.
## Join Our Community
Connect with our community on Discord for news, support, and discussions: Join Discord
## Resources
- Paper: arXiv preprint (coming soon)
- GLiNKER Framework: GLiNKER
- Model Collection: HuggingFace Collection
Knowledgator Engineering © 2026