---
language: en
license: cc-by-sa-4.0
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
- generated_from_span_marker_trainer
- climate-change
- earth-science
widget:
- text: While a significant positive impact of solid-state cultivation using white
    rot fungi on enzymatic digestibility was reported in some studies [ 68 , 69 ]
    , a negative effect of fungal pretreatment on enzymatic hydrolysis was noted by
    investigators like Shi et al . ( 2009 ) [ 33 ] , who reported a glucose yield
    of 55 . 6 mg g − 1 of cotton stalks pretreated with P . chrysosporium , which
    was approximately 17 % lower than the yield of untreated cotton stalks after enzymatic
    hydrolysis in spite of significant lignin degradation .
- text: We quantify changes in the properties and amount of bottom water entering
    the basin by combining repeat hydrographic observations , direct velocity measurements
    and flow structure derived from a 0 . 1 ° global ocean sea-ice model that realistically
    simulates AABW formation sites and export pathways .
- text: The impact of these differences on cloud forcing can be signi or more . cant
    and as high as 30 W m In recent years , observations from satellite data have
    been revised considerably after significant development efforts , especially after
    utilizing new high-quality reference measurements from active sensors in space
    , and some datasets have also improved polar cloud detection .
- text: If the response is significant , how does the solar forcing impact the EASM
    rainfall variability ? In this study , we will address these questions based on
    the simulation results derived from one AD 850 control experiment ( CTRL ) and
    four solar-only forcing experiments [ spectral solar irradiance ( SSI ) experiments
    ] , which were conducted by the Community Earth System ( CESM-LME ) Model – Last
    Millennium Ensemble modeling project ( Otto-Bliesner et al . 2016 ) .
- text: Measurements from single moorings at each gateway reveal that the speed of
    bottom water flow into the Australian Antarctic Basin varies with location , season
    and density ( Fig . 3a , c , e ) .
pipeline_tag: token-classification
library_name: span-marker
metrics:
- precision
- recall
- f1
datasets:
- P0L3/CliReNER_v_1_1_28_SILVER
- P0L3/CliReNER_v_1_1_28_GOLD
base_model: FacebookAI/roberta-base
model-index:
- name: SpanMarker with FacebookAI/roberta-base
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: CliReNER_silver
      type: P0L3/CliReNER_v_1_1_28_SILVER
      split: eval
    metrics:
    - type: f1
      value: 0.6300366300366301
      name: F1
    - type: precision
      value: 0.6437125748502994
      name: Precision
    - type: recall
      value: 0.6169296987087518
      name: Recall
---
# SpanMarker-RoBERTa for Climate Research NER

This model is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model fine-tuned for fine-grained Named Entity Recognition (NER) in the climate change research domain, extracting 28 distinct entity types. It uses [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base) as the underlying encoder.

## 📌 Model Details

- **Model Type:** SpanMarker
- **Encoder:** [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base)
- **Maximum Sequence Length:** 512 tokens
- **Maximum Entity Length:** 14 words
- **Language:** English
- **License:** cc-by-sa-4.0
### Model Labels

| Label                     | Examples                                                                                    |
|:--------------------------|:--------------------------------------------------------------------------------------------|
| Asset                     | "mental health", "water resources", "raw material"                                          |
| Body Part                 | "leaves", "plant leaves", "deep tissue compartment"                                         |
| Body of Water             | "Dhaleshwari river", "rivers", "peripheral rivers"                                          |
| Chemical                  | "domoic acid", "cathode materials", "marine algal toxin"                                    |
| Disease                   | "seizures", "acute neurologic signs", "chronic epileptic syndrome"                          |
| Ecosystem                 | "cloud forests", "polluted environment", "Tropical montane cloud forest"                    |
| Energy Source             | "12-cell series battery-pack prototype", "fossil fuels", "battery cells"                    |
| Field of Study            | "veterinary medicine", "reference laboratory", "study"                                      |
| Geographical Feature      | "heterogenous topography", "mountainous regions", "low point"                               |
| Intellectual Artefact     | "Daily husbandry records", "data", "Veterinary medical records"                             |
| Location                  | "wild", "Westbrook", "beaches"                                                              |
| Mathematical Expression   | "gradient", "Stepwise machine hour constraints", "difference"                               |
| Measuring Device          | "station", "EEG", "MRI scan"                                                                |
| Meteorological Phenomenon | "rainfall", "climate change", "climatic variability"                                        |
| Method                    | "dosing", "serum monitoring", "clinical efficacy"                                           |
| Natural Disaster          | "heavy metal contamination", "seasonal air pollution", "environmental pollution"            |
| Natural Phenomenon        | "algal blooms", "biochemical changes", "changing ocean conditions"                          |
| Organism                  | "Zalophus californianus", "California sea lions", "species"                                 |
| Organization              | "reference laboratory", "long-term care facility", "NOAA National Marine Fisheries Service" |
| Other                     | "marine mammal health", "normal eating", "reports"                                          |
| Person                    | "staff", "clinicians", "Clinicians"                                                         |
| Physical Artefact         | "electric vehicle", "paved east – west road", "EVs"                                         |
| Physical Phenomenon       | "normal food intake", "structural abnormalities", "seasonal changes"                        |
| Policy                    | "energy security", "safety", "pollution"                                                    |
| Quantity                  | "200 mAhg − 1", ">", "energy density"                                                       |
| Satellite                 | "TRMM", "Tropical Rainfall Measuring Mission", "satellites"                                 |
| System                    | "global overturning circulation", "system structure", "climate"                             |
| Time Period               | "periods of prolonged anorexia", "101 days", "several decades"                              |

---
## 🚀 Main Results (Selected Checkpoint)

This repository provides the **best-performing checkpoint** out of 5 runs with different random seeds. The training logs track performance on the validation split of **CliReNER<sub>silver</sub>**, whereas checkpoint selection and the metrics below use the independent, expert-annotated **CliReNER<sub>gold</sub>** dataset.

| Metric     | Score (%) |
|------------|-----------|
| Precision  | 55.33     |
| Recall     | 49.18     |
| F1         | 52.08     |

> This checkpoint corresponds to the **seed with the highest strict F1 on the gold evaluation set** (run 4 in the per-seed results, trained with random seed 33).

---
## 📊 Results Across Seeds

We fine-tuned the model with 5 different random seeds to assess the stability and robustness of the architecture on domain-specific text. All scores are strict entity-level metrics (%) on CliReNER<sub>gold</sub>.

| Seed | Precision | Recall | Strict F1 |
|------|-----------|--------|-----------|
| 1    | 55.39     | 48.69  | 51.83     |
| 2    | 58.32     | 44.12  | 50.23     |
| 3    | 54.80     | 45.92  | 49.97     |
| 4    | 55.33     | 49.18  | 52.08     |
| 5    | 51.19     | 43.95  | 47.30     |

**Summary** (mean and sample standard deviation over the 5 seeds):
- **F1:** mean = 50.28, std = 1.91
- **Precision:** mean = 55.01, std = 2.54
- **Recall:** mean = 46.37, std = 2.47
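
The summary figures can be reproduced from the per-seed table with a few lines of standard-library Python. This is only a sketch: the scores are copied by hand from the table, and it assumes the reported std is the sample standard deviation (n − 1 in the denominator), which is what matches the numbers above.

```python
from statistics import mean, stdev

# Per-seed scores (%) copied from the table: (precision, recall, strict F1).
SEED_SCORES = {
    1: (55.39, 48.69, 51.83),
    2: (58.32, 44.12, 50.23),
    3: (54.80, 45.92, 49.97),
    4: (55.33, 49.18, 52.08),
    5: (51.19, 43.95, 47.30),
}


def summarize(values):
    """Return (mean, sample standard deviation), both rounded to two decimals."""
    return round(mean(values), 2), round(stdev(values), 2)


# Transpose the rows into one tuple per metric.
precision, recall, f1 = zip(*SEED_SCORES.values())
summary = {
    "precision": summarize(precision),
    "recall": summarize(recall),
    "f1": summarize(f1),
}

# The uploaded checkpoint is the seed with the highest strict F1.
best_seed = max(SEED_SCORES, key=lambda seed: SEED_SCORES[seed][2])

for name, (avg, std) in summary.items():
    print(f"{name}: mean = {avg:.2f}, std = {std:.2f}")
print(f"best seed by strict F1: {best_seed}")
```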
**Model Selection Strategy:**
The uploaded checkpoint is the **single best seed**, chosen by strict F1 on the gold dataset. Because the same gold data is used for both selection and reporting, its scores are the upper end of the five runs; the mean across seeds is the more conservative estimate of expected performance.

---
## 📂 Dataset & Evaluation

- **Training Dataset:** [CliReNER<sub>silver</sub>](https://huggingface.co/datasets/P0L3/CliReNER_v_1_1_28_SILVER)
  - **Splits used:** Stratified 80:10:10 split (train/validation/test). The 80% portion was used for training.
- **Evaluation Dataset:** [CliReNER<sub>gold</sub>](https://huggingface.co/datasets/P0L3/CliReNER_v_1_1_28_GOLD)
  - **Splits used:** All 192 sentences combined (expert-annotated via Weighted Expert Voting).
- **Preprocessing:**
  - Texts were tokenized using the standard RoBERTa tokenizer.
  - The dataset uses a flat NER schema (nested entities are excluded, and overlapping entities are resolved to the most relevant span).
- **Metric Details:**
  - **F1 type:** Strict F1 (entity-level exact match).
  - A predicted entity counts as correct only if both its **exact span boundaries and its exact label** match the gold annotation.
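
As an illustration of the strict matching rule, the sketch below scores predictions as sets of `(sentence, start, end, label)` tuples, so a prediction with a shifted boundary or a different label counts as both a false positive and a false negative. It is not the evaluation script used for this card (the card does not say which tool was used); the `Entity` type, the `strict_prf` helper, and the toy annotations are all hypothetical.

```python
from typing import Iterable, NamedTuple, Tuple


class Entity(NamedTuple):
    sentence_id: int
    start: int  # index of the first token in the span
    end: int    # index one past the last token in the span
    label: str


def strict_prf(gold: Iterable[Entity], predicted: Iterable[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall and F1 under exact span-and-label matching."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Toy gold annotations (hypothetical, for illustration only).
gold = [
    Entity(0, 0, 2, "Meteorological Phenomenon"),
    Entity(0, 5, 7, "Meteorological Phenomenon"),
    Entity(0, 9, 10, "Other"),
    Entity(1, 3, 5, "System"),
]
predicted = [
    Entity(0, 0, 2, "Meteorological Phenomenon"),  # exact match: counts
    Entity(0, 5, 6, "Meteorological Phenomenon"),  # boundary differs: does not count
    Entity(0, 9, 10, "System"),                    # label differs: does not count
    Entity(1, 3, 5, "System"),                     # exact match: counts
]

precision, recall, f1 = strict_prf(gold, predicted)
print(f"P = {precision:.2f}, R = {recall:.2f}, F1 = {f1:.2f}")
```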
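
---
## ⚖️ Precision vs Recall Behavior

On CliReNER<sub>gold</sub> the model favors precision over recall: precision exceeds recall for all five seeds (mean 55.01 vs. 46.37), so the model misses gold entities more often than it predicts spurious ones. On the CliReNER<sub>silver</sub> validation split the two are closer (0.6437 precision vs. 0.6169 recall at the last logged epoch).

---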
## ⚙️ Usage

### Direct Use for Inference

Because this model was trained using the SpanMarker framework, it requires the `span_marker` library for inference.

```bash
pip install span_marker
```

```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("P0L3/CliReNER-roberta-base")

# Run inference
text = "Anthropogenic climate change is fundamentally altering weather patterns and climate extremes, causing widespread adverse impacts to both nature and human systems (IPCC 2023)."
entities = model.predict(text)
for entity in entities:
    print(f"Entity: {entity['span']} | Label: {entity['label']} | Score: {entity['score']:.4f}")
# Entity: climate change | Label: Meteorological Phenomenon | Score: 0.4065
# Entity: weather patterns | Label: Meteorological Phenomenon | Score: 0.6808
# Entity: climate extremes | Label: Meteorological Phenomenon | Score: 0.7115
# Entity: nature | Label: Other | Score: 0.4608
# Entity: human systems | Label: System | Score: 0.6562
# Entity: IPCC 2023 | Label: Other | Score: 0.4812
```
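
Several of the example predictions have scores below 0.5, so downstream code may want to filter on confidence before using them. The sketch below shows one way to do that without loading the model: it hard-codes the six predictions from the example output, in the same dict shape the loop reads (`span`, `label`, `score`). The `group_confident_entities` helper and the 0.5 threshold are hypothetical choices for illustration, not recommendations from this card.

```python
from collections import defaultdict

# Predictions copied from the example output, using the keys the example loop reads.
EXAMPLE_PREDICTIONS = [
    {"span": "climate change", "label": "Meteorological Phenomenon", "score": 0.4065},
    {"span": "weather patterns", "label": "Meteorological Phenomenon", "score": 0.6808},
    {"span": "climate extremes", "label": "Meteorological Phenomenon", "score": 0.7115},
    {"span": "nature", "label": "Other", "score": 0.4608},
    {"span": "human systems", "label": "System", "score": 0.6562},
    {"span": "IPCC 2023", "label": "Other", "score": 0.4812},
]


def group_confident_entities(entities, min_score=0.5):
    """Keep predictions with score >= min_score and group their spans by label."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


confident = group_confident_entities(EXAMPLE_PREDICTIONS, min_score=0.5)
for label, spans in confident.items():
    print(f"{label}: {spans}")
```

Raising the threshold trades recall for precision; a suitable value would have to be tuned on held-out data for the task at hand.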
### Downstream Use

You can continue fine-tuning this model on your own dataset.

<details><summary>Click to expand</summary>

```python
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download this checkpoint from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("P0L3/CliReNER-roberta-base")

# Specify a Dataset with "tokens" and "ner_tags" columns
# ("your_custom_dataset" is a placeholder for your own dataset ID or path)
dataset = load_dataset("your_custom_dataset")

# Initialize a Trainer using the pretrained model & dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()

# Save the fine-tuned model to a local directory
trainer.save_model("CliReNER-roberta-base-finetuned")
```

</details>
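
The fine-tuning snippet expects each example to carry a `tokens` list and a parallel `ner_tags` list of integer tag ids. The sketch below shows one way to build such a row from token-level span annotations. It assumes an IOB2 tag encoding and a two-label subset of the schema, both chosen for illustration: the card does not state how the CliReNER datasets encode their tags, and when continuing from this checkpoint the tag ids would need to follow the label order it was trained with. `LABELS`, `TAG2ID`, and `spans_to_row` are hypothetical names.

```python
# Hypothetical two-label subset of the 28-label schema, used only for illustration.
LABELS = ["Satellite", "Meteorological Phenomenon"]

# IOB2 tag inventory: "O" first, then a B-/I- pair per label.
TAG_NAMES = ["O"] + [f"{prefix}-{label}" for label in LABELS for prefix in ("B", "I")]
TAG2ID = {tag: index for index, tag in enumerate(TAG_NAMES)}


def spans_to_row(tokens, spans, tag2id=TAG2ID):
    """Convert (start, end_exclusive, label) token spans into a tokens/ner_tags row."""
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"span ({start}, {end}) is out of range")
        if any(tag != "O" for tag in tags[start:end]):
            # The flat schema has no nested or overlapping entities.
            raise ValueError("overlapping spans are not allowed in a flat schema")
        tags[start] = f"B-{label}"
        for index in range(start + 1, end):
            tags[index] = f"I-{label}"
    return {"tokens": list(tokens), "ner_tags": [tag2id[tag] for tag in tags]}


tokens = ["The", "Tropical", "Rainfall", "Measuring", "Mission", "observes", "rainfall", "."]
spans = [(1, 5, "Satellite"), (6, 7, "Meteorological Phenomenon")]

row = spans_to_row(tokens, spans)
print(row["tokens"])
print(row["ner_tags"])
```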

---
## 📉 Training Details

### Training Set Metrics

| Training set          | Min | Median  | Max |
|:----------------------|:----|:--------|:----|
| Sentence length       | 3   | 31.4819 | 97  |
| Entities per sentence | 1   | 7.0100  | 22  |

### Training Hyperparameters

- **learning_rate:** 5e-05
- **train_batch_size:** 8
- **eval_batch_size:** 8
- **seed:** 33
- **gradient_accumulation_steps:** 2
- **total_train_batch_size:** 16
- **optimizer:** adamw_torch with betas=(0.9,0.999) and epsilon=1e-08
- **lr_scheduler_type:** linear
- **lr_scheduler_warmup_ratio:** 0.1
- **num_epochs:** 20
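
To make the interaction between these settings concrete, the sketch below derives the effective batch size and the shape of a linear warmup-then-decay learning-rate schedule in plain Python. It rests on two assumptions that the list does not state: 62 optimizer steps per epoch (read off the first row of the training results table in this card) and a schedule planned over the full 20 epochs, even though the logged results stop at epoch 9. The `linear_schedule_lr` helper is a hypothetical re-implementation of the usual linear schedule, not code from the training run.

```python
# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 5e-5
TRAIN_BATCH_SIZE = 8
GRADIENT_ACCUMULATION_STEPS = 2
WARMUP_RATIO = 0.1
NUM_EPOCHS = 20

# Assumption: 62 optimizer steps per epoch (step 62 at epoch 1.0 in the training log).
STEPS_PER_EPOCH = 62


def linear_schedule_lr(step: int, total_steps: int, warmup_steps: int, base_lr: float) -> float:
    """Learning rate after `step` updates: linear warmup, then linear decay to zero."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)


effective_batch_size = TRAIN_BATCH_SIZE * GRADIENT_ACCUMULATION_STEPS
total_steps = STEPS_PER_EPOCH * NUM_EPOCHS
warmup_steps = round(total_steps * WARMUP_RATIO)

print(f"effective batch size: {effective_batch_size}")
print(f"total steps: {total_steps}, warmup steps: {warmup_steps}")
for step in (0, 62, 124, 558, 1240):
    lr = linear_schedule_lr(step, total_steps, warmup_steps, LEARNING_RATE)
    print(f"step {step:>4}: lr = {lr:.2e}")
```

Under these assumptions the warmup ends after two epochs (step 124), and the last logged step (558, epoch 9) still falls in the early part of the decay phase.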
### Training Results (CliReNER<sub>silver</sub> Validation Split)

| Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
|:-----:|:----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
| 1.0   | 62   | 0.1324          | 0.0                  | 0.0               | 0.0           | 0.6075              |
| 2.0   | 124  | 0.0839          | 0.3333               | 0.0273            | 0.0504        | 0.6166              |
| 3.0   | 186  | 0.0530          | 0.5845               | 0.4218            | 0.4900        | 0.7807              |
| 4.0   | 248  | 0.0460          | 0.6913               | 0.4433            | 0.5402        | 0.7971              |
| 5.0   | 310  | 0.0488          | 0.5965               | 0.6298            | 0.6127        | 0.8307              |
| 6.0   | 372  | 0.0447          | 0.6532               | 0.6026            | 0.6269        | 0.8340              |
| 7.0   | 434  | 0.0466          | 0.6365               | 0.6356            | 0.6360        | 0.8486              |
| 8.0   | 496  | 0.0522          | 0.6388               | 0.6370            | 0.6379        | 0.8468              |
| 9.0   | 558  | 0.0520          | 0.6437               | 0.6169            | 0.6300        | 0.8428              |

### Framework Versions

- **Python:** 3.10.19
- **SpanMarker:** 1.7.0
- **Transformers:** 4.50.0
- **PyTorch:** 2.9.1+cu126
- **Datasets:** 3.0.0
- **Tokenizers:** 0.21.4

---
## 📚 Citation

If you use this model or the CliReNER datasets in your research, please cite the project:

```bibtex
@misc{poleksic2026named,
  author       = {Poleksić, Andrija and Martinčić-Ipšić, Sanda},
  title        = {Named Entity Recognition for Climate Change Research},
  year         = {2026},
  howpublished = {Research Square},
  note         = {Preprint}
}
```

Please also acknowledge the SpanMarker framework:

```bibtex
@software{Aarsen_SpanMarker,
  author  = {Aarsen, Tom},
  license = {Apache-2.0},
  title   = {{SpanMarker for Named Entity Recognition}},
  url     = {https://github.com/tomaarsen/SpanMarkerNER}
}
```