GLiNER-Medium for Climate Research NER

This model is a GLiNER model fine-tuned for fine-grained Named Entity Recognition (NER) in the climate change research domain. It identifies 28 distinct entity types using a generalist span-based architecture.

📌 Model Details

  • Model Type: GLiNER
  • Base Model: gliner-community/gliner_medium-v2.5
  • Architecture: Bi-encoder trained with a focal loss objective.
  • Language: English
  • License: cc-by-sa-4.0

🏷️ Entity Typology (28 Classes)

The model is trained to recognize: Asset, Body Part, Body of Water, Chemical, Disease, Ecosystem, Energy Source, Field of Study, Geographical Feature, Intellectual Artefact, Location, Mathematical Expression, Measuring Device, Meteorological Phenomenon, Method, Natural Disaster, Natural Phenomenon, Organism, Organization, Other, Person, Physical Artefact, Physical Phenomenon, Policy, Quantity, Satellite, System, and Time Period.


🚀 Main Results (Selected Checkpoint)

This repository provides the best-performing checkpoint selected from 5 runs with different random seeds. While the internal training logs tracked performance on the validation split of CliReNERsilver, the final model selection and the metrics below are evaluated on the independent, expert-annotated CliReNERgold dataset.

| Metric    | Score |
|-----------|-------|
| Precision | 61.78 |
| Recall    | 62.54 |
| F1        | 62.16 |

This checkpoint corresponds to the seed with the highest strict F1 on the gold evaluation set (Seed 3).


📊 Results Across Seeds

We fine-tuned the model using 5 different random seeds to assess the stability and robustness of the architecture on the domain-specific text.

| Seed | Precision | Recall | Strict F1 |
|------|-----------|--------|-----------|
| 1    | 61.06     | 61.68  | 61.37     |
| 2    | 61.63     | 62.21  | 61.92     |
| 3    | 61.78     | 62.54  | 62.16     |
| 4    | 61.27     | 62.05  | 61.66     |
| 5    | 60.90     | 62.09  | 61.49     |

Summary:

  • F1: mean = 61.72, std = 0.32
  • Precision: mean = 61.33, std = 0.37
  • Recall: mean = 62.12, std = 0.31
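The summary statistics can be reproduced from the per-seed table above; the reported standard deviations correspond to sample standard deviations (values match the summary up to rounding):

```python
from statistics import mean, stdev

# Per-seed scores from the table above
precision = [61.06, 61.63, 61.78, 61.27, 60.90]
recall    = [61.68, 62.21, 62.54, 62.05, 62.09]
f1        = [61.37, 61.92, 62.16, 61.66, 61.49]

for name, scores in [("Precision", precision), ("Recall", recall), ("F1", f1)]:
    # stdev() is the sample standard deviation (ddof = 1)
    print(f"{name}: mean = {mean(scores):.2f}, std = {stdev(scores):.2f}")
```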

📂 Dataset & Evaluation

  • Training Dataset: CliReNERsilver (80% split).
  • Evaluation Dataset: CliReNERgold (expert consensus via Weighted Expert Voting).
  • Metric Details: Strict F1 (Entity-level exact span and label match).
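Strict entity-level F1 counts a prediction as correct only when both the span boundaries and the label match exactly. A minimal sketch of the computation (a hypothetical helper, not the official evaluation script):

```python
def strict_f1(predicted, gold):
    """Entity-level strict F1: a hit requires an exact (start, end, label) match."""
    pred, ref = set(predicted), set(gold)
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: one exact match, one boundary mismatch, one label mismatch
gold = [(0, 12, "Time Period"), (20, 31, "Method"), (40, 51, "Person")]
pred = [(0, 12, "Time Period"), (20, 30, "Method"), (40, 51, "Organization")]
print(strict_f1(pred, gold))  # only the first prediction is a true positive
```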

⚙️ Usage

Direct Use for Inference

Requires the gliner library:

```bash
pip install gliner
```

```python
from gliner import GLiNER

# Load the fine-tuned model
model = GLiNER.from_pretrained(
    "P0L3/CliReNER-gliner_medium-v2.5",
    load_tokenizer=True
)

# Input text
text = """
In recent years, climate NLP has seen its nascent with the introduction of domain-specific models such as ClimateBERT (Webersinke et al. 2022), ClimateGPT (Thulke et al. 2024), and CliReBERT (Poleksić and Martinčić-Ipšić 2025), alongside related efforts (Bhattacharjee et al. 2024; Schimanski et al. 2024)
"""

# The 28 entity labels the model was trained on
labels = [
    "Ecosystem", "Energy Source", "Natural Disaster",
    "Meteorological Phenomenon", "Quantity", "Intellectual Artefact",
    "Body of Water", "Disease", "Location",
    "Physical Phenomenon", "Chemical", "Time Period",
    "Organization", "Natural Phenomenon", "Field of Study",
    "Mathematical Expression", "Measuring Device", "Geographical Feature",
    "System", "Satellite", "Organism",
    "Method", "Other", "Person",
    "Physical Artefact", "Body Part", "Asset",
    "Policy",
]

# Predict entities
entities = model.predict_entities(text, labels)

# Print each prediction in a SpanMarker-style format
for entity in entities:
    span = entity["text"]
    label = entity["label"]
    score = entity["score"]
    print(f"Entity: {span} | Label: {label} | Score: {score:.4f}")
```

Expected output:

```
Entity: recent years | Label: Time Period | Score: 0.8569
Entity: climate NLP | Label: Method | Score: 0.6270
Entity: domain-specific models | Label: Method | Score: 0.7560
Entity: ClimateBERT | Label: Method | Score: 0.8053
Entity: Webersinke et al. | Label: Person | Score: 0.7729
Entity: 2022 | Label: Time Period | Score: 0.8569
Entity: ClimateGPT | Label: Method | Score: 0.7935
Entity: Thulke et al. | Label: Person | Score: 0.7884
Entity: 2024 | Label: Time Period | Score: 0.8953
Entity: CliReBERT | Label: Method | Score: 0.8261
Entity: Poleksić and Martinčić-Ipšić | Label: Person | Score: 0.8301
Entity: 2025 | Label: Time Period | Score: 0.9154
Entity: related efforts | Label: Other | Score: 0.5821
Entity: Bhattacharjee et al. | Label: Person | Score: 0.7508
Entity: 2024 | Label: Time Period | Score: 0.8572
Entity: Schimanski et al. | Label: Person | Score: 0.7806
Entity: 2024 | Label: Time Period | Score: 0.9111
```
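The confidence scores let you trade recall for precision at inference time. Recent versions of gliner also expose a `threshold` argument on `predict_entities`; the same effect can be achieved by post-filtering (the entries below are illustrative, taken from the output above):

```python
# Filter predicted entities by confidence score (post-processing sketch).
# The sample list mirrors the output format shown above.
entities = [
    {"text": "recent years", "label": "Time Period", "score": 0.8569},
    {"text": "climate NLP", "label": "Method", "score": 0.6270},
    {"text": "related efforts", "label": "Other", "score": 0.5821},
]

THRESHOLD = 0.6  # keep only reasonably confident predictions
confident = [e for e in entities if e["score"] >= THRESHOLD]

for e in confident:
    print(f"{e['text']} -> {e['label']} ({e['score']:.2f})")
```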

📉 Training Hyperparameters

  • Learning Rate: 5e-6
  • Seed: 3012
  • Encoder Learning Rate: 1e-5
  • Focal Loss: α=0.75, γ=2
  • Batch Size: 8 (Gradient Accumulation: 2)
  • Epochs: 20
  • Warmup Ratio: 0.1
  • Optimizer: AdamW
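Focal loss down-weights easy, well-classified examples so training focuses on hard spans, with α balancing positive and negative classes. A sketch of the binary form with the hyperparameters above (an illustration, not the actual training code):

```python
import math

def focal_loss(p, y, alpha=0.75, gamma=2.0):
    """Binary focal loss: -alpha_t * (1 - p_t)**gamma * log(p_t).

    p is the predicted probability of the positive class, y is 0 or 1.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)

# With gamma = 2, an easy example contributes far less than a hard one
print(focal_loss(0.95, 1))  # near zero
print(focal_loss(0.40, 1))  # much larger: hard examples dominate the gradient
```

Setting γ=0 recovers α-weighted cross-entropy; γ=2 multiplies the easy example's loss by (1 - p_t)², sharply suppressing it.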

📚 Citation

```bibtex
@misc{poleksic2026named,
  author       = {Poleksić, Andrija and Martinčić-Ipšić, Sanda},
  title        = {Named Entity Recognition for Climate Change Research},
  year         = {2026},
  howpublished = {Research Square},
  note         = {Preprint}
}
```