# GLiNER Multitask v1.0

A token-level GLiNER model fine-tuned from knowledgator/gliner-multitask-v1.0, using the microsoft/deberta-v2-xlarge backbone.

## Model Details

| Property | Value |
|---|---|
| Backbone | microsoft/deberta-v2-xlarge |
| Hidden size | 1536 (encoder) / 1024 (GLiNER head) |
| Number of layers | 24 |
| Attention heads | 24 |
| Span mode | token_level |
| Max sequence length | 1024 tokens |
| Max entity types per sample | 30 |
| Max span width | 12 tokens |
| Subtoken pooling | first |
| Precision | bf16 |
| Model size | ~3.5 GB |

## Architecture

This model uses the GLiNER architecture with:

- microsoft/deberta-v2-xlarge as the text encoder, with relative-position attention
- Token-level span representation (rather than span-level) for entity extraction
- A single-layer RNN post-fusion layer for contextual refinement
- Special entity tokens (`<<ENT>>`, `<<SEP>>`) added to the vocabulary (size 128,003)
- Focal loss (alpha = 0.75, gamma = 0) for handling class imbalance
- A global masking strategy for negative sampling
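Note that with gamma = 0 the focal term vanishes, so the loss reduces to alpha-weighted binary cross-entropy. A minimal pure-Python sketch of the formula (for illustration only, not the GLiNER implementation):

```python
import math

def focal_loss(p, y, alpha=0.75, gamma=0.0, eps=1e-9):
    """Binary focal loss for a single prediction.

    p: predicted probability of the positive class
    y: gold label (0 or 1)

    With gamma = 0 this is alpha-weighted binary cross-entropy;
    gamma > 0 down-weights easy, well-classified examples.
    """
    p_t = p if y == 1 else 1.0 - p          # probability of the true class
    a_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    return -a_t * (1.0 - p_t) ** gamma * math.log(p_t + eps)
```

With alpha = 0.75, positive (entity) tokens are weighted three times as heavily as negatives, which compensates for the fact that most tokens in a sentence belong to no entity.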

## Training Configuration

| Parameter | Value |
|---|---|
| Base model | knowledgator/gliner-multitask-v1.0 |
| Optimizer | AdamW (beta1 = 0.9, beta2 = 0.999, eps = 1e-8) |
| Encoder LR | 9e-6 |
| Other LR | 5e-5 / 7e-6 |
| Scheduler | Cosine with linear warmup (10%) |
| Batch size | 40 per device |
| Gradient accumulation | 1 |
| Max steps | 1,000 (150,000 total planned) |
| Weight decay | 0.01 (encoder) / 0.001 (other) |
| Max grad norm | 1.0 |
| Gradient checkpointing | Enabled |
| Dropout | 0.35 |
| Seed | 42 |
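The scheduler ramps the learning rate linearly over the first 10% of steps, then decays it along a cosine curve. The exact training code is not shown in this card, so the following is an assumed sketch of that schedule, using the encoder LR of 9e-6 and the planned 150,000-step horizon:

```python
import math

def lr_at_step(step, max_steps=150_000, base_lr=9e-6, warmup_frac=0.10):
    """Linear warmup followed by cosine decay to zero (assumed schedule)."""
    warmup_steps = int(max_steps * warmup_frac)
    if step < warmup_steps:
        return base_lr * step / warmup_steps  # linear ramp from 0 to base_lr
    # cosine decay from base_lr down to 0 over the remaining steps
    progress = (step - warmup_steps) / (max_steps - warmup_steps)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))
```

Note that with training stopped at step 1,000 of a 150,000-step plan, the schedule would still be early in its warmup phase.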

## Evaluation Status

Evaluated across 9 benchmark datasets, with a mean F1 of 0.280.

Full evaluation results are available at arthrod/gliner_review_comparison.

## Usage

```python
from gliner import GLiNER

model = GLiNER.from_pretrained("arthrod/gliner-multitask-v1.0")

text = "Apple Inc. was founded by Steve Jobs in Cupertino, California."
labels = ["company", "person", "location"]

entities = model.predict_entities(text, labels, threshold=0.5)

for entity in entities:
    print(f"{entity['text']} => {entity['label']} (score: {entity['score']:.2f})")
```

## Intended Use

This model is designed for zero-shot and few-shot Named Entity Recognition across arbitrary entity types. It can extract entities from text without requiring fine-tuning for specific entity categories.

## Limitations

- Performance depends on the quality and specificity of the entity type labels provided
- Maximum input length is 1024 tokens
- Span width is limited to 12 tokens
- Primarily trained on English text
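Texts longer than the 1024-token limit can be split into overlapping windows and processed window by window. A hypothetical sketch (note that it counts whitespace-separated words, which only approximates the model's subword token count, so the window size is kept well below 1024):

```python
def sliding_windows(text, max_words=900, overlap=100):
    """Split text into overlapping word windows that each stay under the
    model's sequence limit. Returns (start_word_index, chunk_text) pairs;
    the start index lets predicted offsets be mapped back to the full text.
    """
    words = text.split()
    if len(words) <= max_words:
        return [(0, text)]
    step = max_words - overlap
    return [(i, " ".join(words[i:i + max_words]))
            for i in range(0, len(words), step)]
```

Each window can then be passed to `predict_entities` separately; entities found in the overlap region appear twice and need de-duplication by character span.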

## Citation

If you use this model, please cite the original GLiNER paper:

```bibtex
@article{zaratiana2023gliner,
  title={GLiNER: Generalist Model for Named Entity Recognition using Bidirectional Transformer},
  author={Zaratiana, Urchade and Tomeh, Nadi and Holat, Pierre and Charnois, Thierry},
  journal={arXiv preprint arXiv:2311.08526},
  year={2023}
}
```