metadata
language: en
license: cc-by-sa-4.0
tags:
  - span-marker
  - token-classification
  - ner
  - named-entity-recognition
  - generated_from_span_marker_trainer
  - climate-change
  - earth-science
widget:
  - text: >-
      While a significant positive impact of solid-state cultivation using white
      rot fungi on enzymatic digestibility was reported in some studies [ 68 ,
      69 ] , a negative effect of fungal pretreatment on enzymatic hydrolysis
      was noted by investigators like Shi et al . ( 2009 ) [ 33 ] , who reported
      a glucose yield of 55 . 6 mg g − 1 of cotton stalks pretreated with P .
      chrysosporium , which was approximately 17 % lower than the yield of
      untreated cotton stalks after enzymatic hydrolysis in spite of significant
      lignin degradation .
  - text: >-
      We quantify changes in the properties and amount of bottom water entering
      the basin by combining repeat hydrographic observations , direct velocity
      measurements and flow structure derived from a 0 . 1 ° global ocean
      sea-ice model that realistically simulates AABW formation sites and export
      pathways .
  - text: >-
      The impact of these differences on cloud forcing can be signi or more .
      cant and as high as 30 W m In recent years , observations from satellite
      data have been revised considerably after significant development efforts
      , especially after utilizing new high-quality reference measurements from
      active sensors in space , and some datasets have also improved polar cloud
      detection .
  - text: >-
      If the response is significant , how does the solar forcing impact the
      EASM rainfall variability ? In this study , we will address these
      questions based on the simulation results derived from one AD 850 control
      experiment ( CTRL ) and four solar-only forcing experiments [ spectral
      solar irradiance ( SSI ) experiments ] , which were conducted by the
      Community Earth System ( CESM-LME ) Model – Last Millennium Ensemble
      modeling project ( Otto-Bliesner et al . 2016 ) .
  - text: >-
      Measurements from single moorings at each gateway reveal that the speed of
      bottom water flow into the Australian Antarctic Basin varies with location
      , season and density ( Fig . 3a , c , e ) .
pipeline_tag: token-classification
library_name: span-marker
metrics:
  - precision
  - recall
  - f1
datasets:
  - P0L3/CliReNER_v_1_1_28_SILVER
  - P0L3/CliReNER_v_1_1_28_GOLD
base_model: FacebookAI/roberta-base
model-index:
  - name: SpanMarker with FacebookAI/roberta-base
    results:
      - task:
          type: token-classification
          name: Named Entity Recognition
        dataset:
          name: CliReNER_silver
          type: P0L3/CliReNER_v_1_1_28_SILVER
          split: eval
        metrics:
          - type: f1
            value: 0.6300366300366301
            name: F1
          - type: precision
            value: 0.6437125748502994
            name: Precision
          - type: recall
            value: 0.6169296987087518
            name: Recall

SpanMarker-RoBERTa for Climate Research NER

This is a SpanMarker model fine-tuned for fine-grained Named Entity Recognition (NER) in the climate change research domain. It extracts 28 distinct entity types and uses FacebookAI/roberta-base as the underlying encoder.

📌 Model Details

  • Model Type: SpanMarker
  • Encoder: FacebookAI/roberta-base
  • Maximum Sequence Length: 512 tokens
  • Maximum Entity Length: 14 words
  • Language: English
  • License: cc-by-sa-4.0

Model Labels

| Label | Examples |
|---|---|
| Asset | "mental health", "water resources", "raw material" |
| Body Part | "leaves", "plant leaves", "deep tissue compartment" |
| Body of Water | "Dhaleshwari river", "rivers", "peripheral rivers" |
| Chemical | "domoic acid", "cathode materials", "marine algal toxin" |
| Disease | "seizures", "acute neurologic signs", "chronic epileptic syndrome" |
| Ecosystem | "cloud forests", "polluted environment", "Tropical montane cloud forest" |
| Energy Source | "12-cell series battery-pack prototype", "fossil fuels", "battery cells" |
| Field of Study | "veterinary medicine", "reference laboratory", "study" |
| Geographical Feature | "heterogenous topography", "mountainous regions", "low point" |
| Intellectual Artefact | "Daily husbandry records", "data", "Veterinary medical records" |
| Location | "wild", "Westbrook", "beaches" |
| Mathematical Expression | "gradient", "Stepwise machine hour constraints", "difference" |
| Measuring Device | "station", "EEG", "MRI scan" |
| Meteorological Phenomenon | "rainfall", "climate change", "climatic variability" |
| Method | "dosing", "serum monitoring", "clinical efficacy" |
| Natural Disaster | "heavy metal contamination", "seasonal air pollution", "environmental pollution" |
| Natural Phenomenon | "algal blooms", "biochemical changes", "changing ocean conditions" |
| Organism | "Zalophus californianus", "California sea lions", "species" |
| Organization | "reference laboratory", "long-term care facility", "NOAA National Marine Fisheries Service" |
| Other | "marine mammal health", "normal eating", "reports" |
| Person | "staff", "clinicians", "Clinicians" |
| Physical Artefact | "electric vehicle", "paved east – west road", "EVs" |
| Physical Phenomenon | "normal food intake", "structural abnormalities", "seasonal changes" |
| Policy | "energy security", "safety", "pollution" |
| Quantity | "200 mAhg − 1", ">", "energy density" |
| Satellite | "TRMM", "Tropical Rainfall Measuring Mission", "satellites" |
| System | "global overturning circulation", "system structure", "climate" |
| Time Period | "periods of prolonged anorexia", "101 days", "several decades" |

🚀 Main Results (Selected Checkpoint)

This repository provides the best-performing checkpoint selected from 5 runs with different random seeds. The internal training logs tracked performance on the validation split of CliReNER_silver, but the final model selection and the metrics below use the independent, expert-annotated CliReNER_gold dataset.

| Metric | Score (%) |
|---|---|
| Precision | 55.33 |
| Recall | 49.18 |
| F1 | 52.08 |

This checkpoint corresponds to the seed with the highest strict F1 on the gold evaluation set (Seed 4, random seed value 33).


📊 Results Across Seeds

We fine-tuned the model using 5 different random seeds to assess the stability and robustness of the architecture on the domain-specific text.

| Seed | Precision | Recall | Strict F1 |
|---|---|---|---|
| 1 | 55.39 | 48.69 | 51.83 |
| 2 | 58.32 | 44.12 | 50.23 |
| 3 | 54.80 | 45.92 | 49.97 |
| 4 | 55.33 | 49.18 | 52.08 |
| 5 | 51.19 | 43.95 | 47.30 |

Summary:

  • F1: mean = 50.28, std = 1.91
  • Precision: mean = 55.01, std = 2.54
  • Recall: mean = 46.37, std = 2.47
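
The summary figures can be reproduced from the per-seed scores. The sketch below assumes the reported std is the sample standard deviation (n − 1 denominator), which is what matches the numbers above; the helper name `summarize` is illustrative and not part of the training code.

```python
from statistics import mean, stdev

# Gold-set scores per seed, copied from the table above:
# seed -> (precision, recall, strict F1)
SEED_SCORES = {
    1: (55.39, 48.69, 51.83),
    2: (58.32, 44.12, 50.23),
    3: (54.80, 45.92, 49.97),
    4: (55.33, 49.18, 52.08),
    5: (51.19, 43.95, 47.30),
}


def summarize(scores):
    """Return {metric: (mean, sample std)}, both rounded to two decimals."""
    names = ("precision", "recall", "f1")
    columns = zip(*scores.values())  # one tuple of five values per metric
    return {
        name: (round(mean(col), 2), round(stdev(col), 2))
        for name, col in zip(names, columns)
    }


summary = summarize(SEED_SCORES)
# Seed with the highest strict F1, i.e. the selection rule described in this card
best_seed = max(SEED_SCORES, key=lambda seed: SEED_SCORES[seed][2])

for metric, (avg, std) in summary.items():
    print(f"{metric}: mean = {avg:.2f}, std = {std:.2f}")
print("best seed:", best_seed)
```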

Model Selection Strategy: The uploaded checkpoint is the single best seed, chosen by highest strict F1 on the gold dataset. Because the same gold set is used for both selection and reporting, the across-seed mean (F1 = 50.28) is the more conservative estimate of expected performance.


📂 Dataset & Evaluation

  • Training Dataset: CliReNER_silver
    • Splits used: Stratified 80:10:10 ratio (Train/Validation/Test). The 80% split was used for training.
  • Evaluation Dataset: CliReNER_gold
    • Splits used: Evaluated on the combined 192 sentences (expert-annotated via Weighted Expert Voting).
  • Preprocessing:
    • Texts were tokenized using the standard RoBERTa tokenizer.
    • The dataset utilizes a flat NER schema (nested entities are excluded, and overlapping entities are resolved to the most relevant span).
  • Metric Details:
    • F1 type: Strict F1 (Entity-level exact match).
    • Evaluation was performed ensuring entities match both the exact boundary span and the exact semantic label to be considered correct.
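
As a rough illustration of the strict metric described above, the following sketch scores predictions by exact match on both span boundaries and label. It is not the evaluation script used for this model; the tuple layout `(sentence_id, start, end, label)` and the toy entities are assumptions made for the example.

```python
def strict_prf(gold, pred):
    """Entity-level exact-match precision, recall and F1.

    `gold` and `pred` are iterables of (sentence_id, start, end, label)
    tuples. A prediction is correct only if all four fields match a gold
    entity, so a boundary error or a label error each count as a miss.
    """
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (hypothetical, for illustration only)
gold = [
    (0, 0, 2, "Meteorological Phenomenon"),
    (0, 5, 6, "Organism"),
    (1, 3, 4, "Satellite"),
    (1, 8, 10, "Time Period"),
]
pred = [
    (0, 0, 2, "Meteorological Phenomenon"),  # exact match: correct
    (0, 5, 7, "Organism"),                   # right label, wrong boundary: incorrect
    (1, 3, 4, "Measuring Device"),           # right boundary, wrong label: incorrect
]

precision, recall, f1 = strict_prf(gold, pred)
print(f"P = {precision:.4f}, R = {recall:.4f}, F1 = {f1:.4f}")
```

Under this definition partial overlaps earn no credit, which is one reason strict scores are typically lower than token-level accuracy.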

⚖️ Precision vs Recall Behavior

On the gold set the model favors precision over recall: precision exceeds recall for all five seeds (mean 55.01 vs. 46.37), so it misses entities more often than it predicts spurious ones.


⚙️ Usage

Direct Use for Inference

Because this model was trained using the SpanMarker framework, it requires the span_marker library for inference.

pip install span_marker

from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("P0L3/CliReNER-roberta-base")

# Run inference
text = "Anthropogenic climate change is fundamentally altering weather patterns and climate extremes, causing widespread adverse impacts to both nature and human systems (IPCC 2023)."
entities = model.predict(text)

for entity in entities:
    print(f"Entity: {entity['span']} | Label: {entity['label']} | Score: {entity['score']:.4f}")

# Entity: climate change | Label: Meteorological Phenomenon | Score: 0.4065
# Entity: weather patterns | Label: Meteorological Phenomenon | Score: 0.6808
# Entity: climate extremes | Label: Meteorological Phenomenon | Score: 0.7115
# Entity: nature | Label: Other | Score: 0.4608
# Entity: human systems | Label: System | Score: 0.6562
# Entity: IPCC 2023 | Label: Other | Score: 0.4812
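
Several of the scores in the example output are below 0.5, so a downstream application may want to filter low-confidence predictions before using them. The sketch below works on the same dictionary keys the loop above reads (`span`, `label`, `score`), using the printed example output as input data. The helper `group_confident` and the 0.5 threshold are illustrative assumptions, not part of the span_marker library or a recommended setting.

```python
from collections import defaultdict

# Predictions copied from the example output above
entities = [
    {"span": "climate change", "label": "Meteorological Phenomenon", "score": 0.4065},
    {"span": "weather patterns", "label": "Meteorological Phenomenon", "score": 0.6808},
    {"span": "climate extremes", "label": "Meteorological Phenomenon", "score": 0.7115},
    {"span": "nature", "label": "Other", "score": 0.4608},
    {"span": "human systems", "label": "System", "score": 0.6562},
    {"span": "IPCC 2023", "label": "Other", "score": 0.4812},
]


def group_confident(entities, min_score=0.5):
    """Group spans by label, keeping predictions with score >= min_score."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


confident = group_confident(entities)
for label, spans in confident.items():
    print(f"{label}: {spans}")
```

A suitable threshold depends on whether the application values precision or recall more, so it is best tuned on held-out data.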

Downstream Use

You can easily continue fine-tuning this model on your own dataset.

from span_marker import SpanMarkerModel, Trainer
from datasets import load_dataset

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("your-huggingface-username/your-model-name")

# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("your_custom_dataset")

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("span_marker_model_id-finetuned")

📉 Training Details

Training Set Metrics

| Training set | Min | Median | Max |
|---|---|---|---|
| Sentence length | 3 | 31.4819 | 97 |
| Entities per sentence | 1 | 7.0100 | 22 |

Training Hyperparameters

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 33
  • gradient_accumulation_steps: 2
  • total_train_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
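
To make the schedule settings concrete, the sketch below derives the effective batch size and the learning rate at a given optimizer step under linear warmup followed by linear decay. It assumes 62 optimizer steps per epoch (taken from the Step column of the training results table), a schedule planned over all 20 epochs, and a warmup length equal to the ceiling of ratio × total steps. The function `linear_lr` is an illustrative reimplementation, not code from the training run.

```python
import math

BASE_LR = 5e-05          # learning_rate
TRAIN_BATCH_SIZE = 8     # train_batch_size
GRAD_ACCUM_STEPS = 2     # gradient_accumulation_steps
WARMUP_RATIO = 0.1       # lr_scheduler_warmup_ratio
NUM_EPOCHS = 20          # num_epochs
STEPS_PER_EPOCH = 62     # assumption: read from the training results table

# Samples contributing to each optimizer update
EFFECTIVE_BATCH_SIZE = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS

TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def linear_lr(step):
    """Learning rate at an optimizer step: linear warmup, then linear decay to 0."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    remaining = max(0, TOTAL_STEPS - step)
    return BASE_LR * remaining / (TOTAL_STEPS - WARMUP_STEPS)


print("effective batch size:", EFFECTIVE_BATCH_SIZE)
print("warmup steps:", WARMUP_STEPS, "of", TOTAL_STEPS)
for epoch in (1, 2, 9, 20):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch:>2} (step {step:>4}): lr = {linear_lr(step):.3e}")
```

Under these assumptions the warmup ends after two epochs, which is consistent with the near-zero validation F1 in the first two rows of the training results.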

Training Results (CliReNER_silver Validation Split)

| Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
|---|---|---|---|---|---|---|
| 1.0 | 62 | 0.1324 | 0.0 | 0.0 | 0.0 | 0.6075 |
| 2.0 | 124 | 0.0839 | 0.3333 | 0.0273 | 0.0504 | 0.6166 |
| 3.0 | 186 | 0.0530 | 0.5845 | 0.4218 | 0.4900 | 0.7807 |
| 4.0 | 248 | 0.0460 | 0.6913 | 0.4433 | 0.5402 | 0.7971 |
| 5.0 | 310 | 0.0488 | 0.5965 | 0.6298 | 0.6127 | 0.8307 |
| 6.0 | 372 | 0.0447 | 0.6532 | 0.6026 | 0.6269 | 0.8340 |
| 7.0 | 434 | 0.0466 | 0.6365 | 0.6356 | 0.6360 | 0.8486 |
| 8.0 | 496 | 0.0522 | 0.6388 | 0.6370 | 0.6379 | 0.8468 |
| 9.0 | 558 | 0.0520 | 0.6437 | 0.6169 | 0.6300 | 0.8428 |

Framework Versions

  • Python: 3.10.19
  • SpanMarker: 1.7.0
  • Transformers: 4.50.0
  • PyTorch: 2.9.1+cu126
  • Datasets: 3.0.0
  • Tokenizers: 0.21.4

📚 Citation

If you use this model or the CliReNER datasets in your research, please cite the project:

@misc{poleksic2026named,
  author       = {Poleksić, Andrija and Martinčić-Ipšić, Sanda},
  title        = {Named Entity Recognition for Climate Change Research},
  year         = {2026},
  howpublished = {Research Square},
  note         = {Preprint}
}

Please also acknowledge the SpanMarker framework:

@software{Aarsen_SpanMarker,
    author = {Aarsen, Tom},
    license = {Apache-2.0},
    title = {{SpanMarker for Named Entity Recognition}},
    url = {https://github.com/tomaarsen/SpanMarkerNER}
}