---
language: en
license: cc-by-sa-4.0
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
- generated_from_span_marker_trainer
- climate-change
- earth-science
widget:
- text: While a significant positive impact of solid-state cultivation using white
    rot fungi on enzymatic digestibility was reported in some studies [ 68 , 69 ]
    , a negative effect of fungal pretreatment on enzymatic hydrolysis was noted by
    investigators like Shi et al . ( 2009 ) [ 33 ] , who reported a glucose yield
    of 55 . 6 mg g 1 of cotton stalks pretreated with P . chrysosporium , which
    was approximately 17 % lower than the yield of untreated cotton stalks after enzymatic
    hydrolysis in spite of significant lignin degradation .
- text: We quantify changes in the properties and amount of bottom water entering
    the basin by combining repeat hydrographic observations , direct velocity measurements
    and flow structure derived from a 0 . 1 ° global ocean sea-ice model that realistically
    simulates AABW formation sites and export pathways .
- text: The impact of these differences on cloud forcing can be signi or more . cant
    and as high as 30 W m In recent years , observations from satellite data have
    been revised considerably after significant development efforts , especially after
    utilizing new high-quality reference measurements from active sensors in space
    , and some datasets have also improved polar cloud detection .
- text: If the response is significant , how does the solar forcing impact the EASM
    rainfall variability ? In this study , we will address these questions based on
    the simulation results derived from one AD 850 control experiment ( CTRL ) and
    four solar-only forcing experiments [ spectral solar irradiance ( SSI ) experiments
    ] , which were conducted by the Community Earth System ( CESM-LME ) Model Last
    Millennium Ensemble modeling project ( Otto-Bliesner et al . 2016 ) .
- text: Measurements from single moorings at each gateway reveal that the speed of
    bottom water flow into the Australian Antarctic Basin varies with location , season
    and density ( Fig . 3a , c , e ) .
pipeline_tag: token-classification
library_name: span-marker
metrics:
- precision
- recall
- f1
datasets:
- P0L3/CliReNER_v_1_1_28_SILVER
- P0L3/CliReNER_v_1_1_28_GOLD
base_model: FacebookAI/roberta-base
model-index:
- name: SpanMarker with FacebookAI/roberta-base
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: CliReNER_silver
      type: P0L3/CliReNER_v_1_1_28_SILVER
      split: eval
    metrics:
    - type: f1
      value: 0.6300366300366301
      name: F1
    - type: precision
      value: 0.6437125748502994
      name: Precision
    - type: recall
      value: 0.6169296987087518
      name: Recall
---
# SpanMarker-RoBERTa for Climate Research NER
This model is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model fine-tuned for fine-grained Named Entity Recognition (NER) in the climate change research domain, extracting 28 distinct entity types. It uses [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base) as the underlying encoder.
## 📌 Model Details
- **Model Type:** SpanMarker
- **Encoder:** [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base)
- **Maximum Sequence Length:** 512 tokens
- **Maximum Entity Length:** 14 words
- **Language:** English
- **License:** cc-by-sa-4.0
### Model Labels
| Label | Examples |
|:--------------------------|:--------------------------------------------------------------------------------------------|
| Asset | "mental health", "water resources", "raw material" |
| Body Part | "leaves", "plant leaves", "deep tissue compartment" |
| Body of Water | "Dhaleshwari river", "rivers", "peripheral rivers" |
| Chemical | "domoic acid", "cathode materials", "marine algal toxin" |
| Disease | "seizures", "acute neurologic signs", "chronic epileptic syndrome" |
| Ecosystem | "cloud forests", "polluted environment", "Tropical montane cloud forest" |
| Energy Source | "12-cell series battery-pack prototype", "fossil fuels", "battery cells" |
| Field of Study | "veterinary medicine", "reference laboratory", "study" |
| Geographical Feature | "heterogenous topography", "mountainous regions", "low point" |
| Intellectual Artefact | "Daily husbandry records", "data", "Veterinary medical records" |
| Location | "wild", "Westbrook", "beaches" |
| Mathematical Expression | "gradient", "Stepwise machine hour constraints", "difference" |
| Measuring Device | "station", "EEG", "MRI scan" |
| Meteorological Phenomenon | "rainfall", "climate change", "climatic variability" |
| Method | "dosing", "serum monitoring", "clinical efficacy" |
| Natural Disaster | "heavy metal contamination", "seasonal air pollution", "environmental pollution" |
| Natural Phenomenon | "algal blooms", "biochemical changes", "changing ocean conditions" |
| Organism | "Zalophus californianus", "California sea lions", "species" |
| Organization | "reference laboratory", "long-term care facility", "NOAA National Marine Fisheries Service" |
| Other | "marine mammal health", "normal eating", "reports" |
| Person | "staff", "clinicians", "Clinicians" |
| Physical Artefact | "electric vehicle", "paved east – west road", "EVs" |
| Physical Phenomenon | "normal food intake", "structural abnormalities", "seasonal changes" |
| Policy | "energy security", "safety", "pollution" |
| Quantity | "200 mAhg − 1", ">", "energy density" |
| Satellite | "TRMM", "Tropical Rainfall Measuring Mission", "satellites" |
| System | "global overturning circulation", "system structure", "climate" |
| Time Period | "periods of prolonged anorexia", "101 days", "several decades" |
---
## 🚀 Main Results (Selected Checkpoint)
This repository provides the **best-performing checkpoint** from 5 training runs with different random seeds. The training logs track performance on the validation split of **CliReNER<sub>silver</sub>**, while checkpoint selection and the metrics below (in %) use the independent, expert-annotated **CliReNER<sub>gold</sub>** dataset.
| Metric | Score |
|------------|-------|
| Precision | 55.33 |
| Recall | 49.18 |
| F1 | 52.08 |
> This checkpoint corresponds to the **seed with the highest strict F1 on the gold evaluation set** (run 4, random seed 33).
---
## 📊 Results Across Seeds
We fine-tuned the model using 5 different random seeds to assess the stability and robustness of the architecture on the domain-specific text.
| Seed | Precision | Recall | Strict F1 |
|------|-----------|--------|-----------|
| 1 | 55.39 | 48.69 | 51.83 |
| 2 | 58.32 | 44.12 | 50.23 |
| 3 | 54.80 | 45.92 | 49.97 |
| 4 | 55.33 | 49.18 | 52.08 |
| 5 | 51.19 | 43.95 | 47.30 |
**Summary:**
- **F1:** mean = 50.28, std = 1.91
- **Precision:** mean = 55.01, std = 2.54
- **Recall:** mean = 46.37, std = 2.47
**Model Selection Strategy:**
The uploaded checkpoint is the **single best seed** (highest strict F1 on the gold dataset). Because the same gold set is used for both selection and reporting, the mean across seeds is the more conservative estimate of expected performance.
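The summary figures and the selected seed can be reproduced from the per-seed table above using only the Python standard library. This is a minimal sketch, not the authors' evaluation script: it assumes the reported std is the sample standard deviation (n - 1 denominator), which is what reproduces the listed values, and the names `scores` and `summarize` are illustrative.

```python
from statistics import mean, stdev

# Per-seed scores on CliReNER gold, copied from the table above:
# seed index -> (precision, recall, strict F1)
scores = {
    1: (55.39, 48.69, 51.83),
    2: (58.32, 44.12, 50.23),
    3: (54.80, 45.92, 49.97),
    4: (55.33, 49.18, 52.08),
    5: (51.19, 43.95, 47.30),
}


def summarize(values):
    """Return (mean, sample standard deviation), rounded to two decimals."""
    return round(mean(values), 2), round(stdev(values), 2)


# Transpose the rows into one tuple per metric.
precision, recall, f1 = zip(*scores.values())
summary = {
    "precision": summarize(precision),
    "recall": summarize(recall),
    "f1": summarize(f1),
}

# Model selection: the seed index with the highest strict F1.
best_seed = max(scores, key=lambda seed: scores[seed][2])

print(summary)
print("best seed:", best_seed)
```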
---
## 📂 Dataset & Evaluation
- **Training Dataset:** [CliReNER<sub>silver</sub>](https://huggingface.co/datasets/P0L3/CliReNER_v_1_1_28_SILVER)
  - **Splits used:** Stratified 80:10:10 split (train/validation/test). The 80% portion was used for training.
- **Evaluation Dataset:** [CliReNER<sub>gold</sub>](https://huggingface.co/datasets/P0L3/CliReNER_v_1_1_28_GOLD)
  - **Splits used:** All 192 sentences combined (expert-annotated via Weighted Expert Voting).
- **Preprocessing:**
  - Texts were tokenized using the standard RoBERTa tokenizer.
  - The dataset uses a flat NER schema (nested entities are excluded, and overlapping entities are resolved to the most relevant span).
- **Metric Details:**
  - **F1 type:** Strict F1 (entity-level exact match).
  - A predicted entity counts as correct only if both its **exact span boundaries and its exact label** match the gold annotation.
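The strict matching rule can be sketched in a few lines of Python. This is an illustration of entity-level exact matching, not the evaluation code used for the reported numbers: the function name `strict_prf`, the `(start, end, label)` tuple layout, and the micro-averaging over sentences are all assumptions made for the example.

```python
def strict_prf(gold, predicted):
    """Entity-level strict precision, recall and F1 (micro-averaged).

    Both arguments are lists with one item per sentence; each item is a
    set of (start, end, label) tuples. A prediction is a true positive
    only if the identical tuple is in the gold set of the same sentence.
    """
    true_positives = sum(len(g & p) for g, p in zip(gold, predicted))
    n_predicted = sum(len(p) for p in predicted)
    n_gold = sum(len(g) for g in gold)
    precision = true_positives / n_predicted if n_predicted else 0.0
    recall = true_positives / n_gold if n_gold else 0.0
    if true_positives == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy example with hypothetical token offsets (not taken from the dataset).
gold = [{(0, 2, "Meteorological Phenomenon"), (5, 6, "System")}]
predicted = [{
    (0, 2, "Meteorological Phenomenon"),  # same span, same label: correct
    (5, 7, "System"),                     # boundary off by one: incorrect
}]

print(strict_prf(gold, predicted))
```

Under this rule a partially overlapping span earns no credit, so a boundary error costs both precision and recall.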
---
## ⚖️ Precision vs Recall Behavior
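The model favors precision over recall on CliReNER<sub>gold</sub> in every seed: mean precision is 55.01 against a mean recall of 46.37, and the selected checkpoint scores 55.33 precision and 49.18 recall. In practice, it misses entities more often than it predicts spurious ones.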
---
## ⚙️ Usage
### Direct Use for Inference
Because this model was trained using the SpanMarker framework, it requires the `span_marker` library for inference.
```bash
pip install span_marker
```
```python
from span_marker import SpanMarkerModel
# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("P0L3/CliReNER-roberta-base")
# Run inference
text = "Anthropogenic climate change is fundamentally altering weather patterns and climate extremes, causing widespread adverse impacts to both nature and human systems (IPCC 2023)."
entities = model.predict(text)
for entity in entities:
    print(f"Entity: {entity['span']} | Label: {entity['label']} | Score: {entity['score']:.4f}")
# Entity: climate change | Label: Meteorological Phenomenon | Score: 0.4065
# Entity: weather patterns | Label: Meteorological Phenomenon | Score: 0.6808
# Entity: climate extremes | Label: Meteorological Phenomenon | Score: 0.7115
# Entity: nature | Label: Other | Score: 0.4608
# Entity: human systems | Label: System | Score: 0.6562
# Entity: IPCC 2023 | Label: Other | Score: 0.4812
```
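Each prediction carries a confidence score, and several scores in the example output fall below 0.5. A caller who prefers fewer false positives may want to drop low-confidence spans. The sketch below is illustrative post-processing, not part of SpanMarker: the helper `filter_entities` and the 0.5 threshold are hypothetical, and it relies only on the `span`, `label`, and `score` keys shown above. The sample list mirrors the example output so the snippet runs without downloading the model.

```python
from collections import defaultdict


def filter_entities(entities, min_score=0.5):
    """Keep predictions scoring at least min_score, grouped by label."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


# Stand-in for model.predict(text), copied from the example output above.
entities = [
    {"span": "climate change", "label": "Meteorological Phenomenon", "score": 0.4065},
    {"span": "weather patterns", "label": "Meteorological Phenomenon", "score": 0.6808},
    {"span": "climate extremes", "label": "Meteorological Phenomenon", "score": 0.7115},
    {"span": "nature", "label": "Other", "score": 0.4608},
    {"span": "human systems", "label": "System", "score": 0.6562},
    {"span": "IPCC 2023", "label": "Other", "score": 0.4812},
]

print(filter_entities(entities))
```

Raising the threshold trades recall for precision, so it is worth tuning on held-out data before fixing a value.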
### Downstream Use
You can easily continue fine-tuning this model on your own dataset.
<details><summary>Click to expand</summary>
```python
from span_marker import SpanMarkerModel, Trainer
from datasets import load_dataset
# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("P0L3/CliReNER-roberta-base")
# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("your_custom_dataset")
# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("span_marker_model_id-finetuned")
```
</details>
---
## 📉 Training Details
### Training Set Metrics
| Training set          | Min | Mean    | Max |
|:----------------------|:----|:--------|:----|
| Sentence length       | 3   | 31.4819 | 97  |
| Entities per sentence | 1   | 7.0100  | 22  |
### Training Hyperparameters
- **learning_rate:** 5e-05
- **train_batch_size:** 8
- **eval_batch_size:** 8
- **seed:** 33
- **gradient_accumulation_steps:** 2
- **total_train_batch_size:** 16
- **optimizer:** adamw_torch with betas=(0.9,0.999) and epsilon=1e-08
- **lr_scheduler_type:** linear
- **lr_scheduler_warmup_ratio:** 0.1
- **num_epochs:** 20
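The batch-size and scheduler settings above combine as follows. This is a sketch of the arithmetic rather than the training code: it assumes a single training device, the 62 optimizer steps per epoch listed in the training results table, and a standard linear warmup followed by linear decay to zero. `lr_at` is an illustrative helper, not a function from Transformers or SpanMarker.

```python
PER_DEVICE_BATCH = 8
GRAD_ACCUM_STEPS = 2
NUM_DEVICES = 1        # assumption: single training device
STEPS_PER_EPOCH = 62   # taken from the training results table
NUM_EPOCHS = 20
PEAK_LR = 5e-5
WARMUP_RATIO = 0.1

# Examples contributing to each optimizer update.
effective_batch = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * NUM_DEVICES

# Length of the planned schedule and of its warmup phase, in optimizer steps.
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS
warmup_steps = round(WARMUP_RATIO * total_steps)


def lr_at(step):
    """Learning rate at an optimizer step under linear warmup and decay."""
    if step < warmup_steps:
        return PEAK_LR * step / warmup_steps
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / (total_steps - warmup_steps)


print(effective_batch, total_steps, warmup_steps)
for step in (62, 124, 558, 1240):
    print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at the end of epoch 2 (step 124) and is still above half of its peak at step 558, the last step in the logged results.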
### Training Results (CliReNER<sub>silver</sub> Validation Split)
| Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
|:-----:|:----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
| 1.0 | 62 | 0.1324 | 0.0 | 0.0 | 0.0 | 0.6075 |
| 2.0 | 124 | 0.0839 | 0.3333 | 0.0273 | 0.0504 | 0.6166 |
| 3.0 | 186 | 0.0530 | 0.5845 | 0.4218 | 0.4900 | 0.7807 |
| 4.0 | 248 | 0.0460 | 0.6913 | 0.4433 | 0.5402 | 0.7971 |
| 5.0 | 310 | 0.0488 | 0.5965 | 0.6298 | 0.6127 | 0.8307 |
| 6.0 | 372 | 0.0447 | 0.6532 | 0.6026 | 0.6269 | 0.8340 |
| 7.0 | 434 | 0.0466 | 0.6365 | 0.6356 | 0.6360 | 0.8486 |
| 8.0 | 496 | 0.0522 | 0.6388 | 0.6370 | 0.6379 | 0.8468 |
| 9.0 | 558 | 0.0520 | 0.6437 | 0.6169 | 0.6300 | 0.8428 |
### Framework Versions
- **Python:** 3.10.19
- **SpanMarker:** 1.7.0
- **Transformers:** 4.50.0
- **PyTorch:** 2.9.1+cu126
- **Datasets:** 3.0.0
- **Tokenizers:** 0.21.4
---
## 📚 Citation
If you use this model or the CliReNER datasets in your research, please cite the project:
```bibtex
@misc{poleksic2026named,
author = {Poleksić, Andrija and Martinčić-Ipšić, Sanda},
title = {Named Entity Recognition for Climate Change Research},
year = {2026},
howpublished = {Research Square},
note = {Preprint}
}
```
Please also acknowledge the SpanMarker framework:
```bibtex
@software{Aarsen_SpanMarker,
author = {Aarsen, Tom},
license = {Apache-2.0},
title = {{SpanMarker for Named Entity Recognition}},
url = {https://github.com/tomaarsen/SpanMarkerNER}
}
```