---
language: en
license: cc-by-sa-4.0
tags:
- span-marker
- token-classification
- ner
- named-entity-recognition
- generated_from_span_marker_trainer
- climate-change
- earth-science
widget:
- text: While a significant positive impact of solid-state cultivation using white rot fungi on enzymatic digestibility was reported in some studies [ 68 , 69 ] , a negative effect of fungal pretreatment on enzymatic hydrolysis was noted by investigators like Shi et al . ( 2009 ) [ 33 ] , who reported a glucose yield of 55 . 6 mg g − 1 of cotton stalks pretreated with P . chrysosporium , which was approximately 17 % lower than the yield of untreated cotton stalks after enzymatic hydrolysis in spite of significant lignin degradation .
- text: We quantify changes in the properties and amount of bottom water entering the basin by combining repeat hydrographic observations , direct velocity measurements and flow structure derived from a 0 . 1 ° global ocean sea-ice model that realistically simulates AABW formation sites and export pathways .
- text: The impact of these differences on cloud forcing can be signi or more . cant and as high as 30 W m In recent years , observations from satellite data have been revised considerably after significant development efforts , especially after utilizing new high-quality reference measurements from active sensors in space , and some datasets have also improved polar cloud detection .
- text: If the response is significant , how does the solar forcing impact the EASM rainfall variability ? In this study , we will address these questions based on the simulation results derived from one AD 850 control experiment ( CTRL ) and four solar-only forcing experiments [ spectral solar irradiance ( SSI ) experiments ] , which were conducted by the Community Earth System ( CESM-LME ) Model – Last Millennium Ensemble modeling project ( Otto-Bliesner et al . 2016 ) .
- text: Measurements from single moorings at each gateway reveal that the speed of bottom water flow into the Australian Antarctic Basin varies with location , season and density ( Fig . 3a , c , e ) .
pipeline_tag: token-classification
library_name: span-marker
metrics:
- precision
- recall
- f1
datasets:
- P0L3/CliReNER_v_1_1_28_SILVER
- P0L3/CliReNER_v_1_1_28_GOLD
base_model: FacebookAI/roberta-base
model-index:
- name: SpanMarker with FacebookAI/roberta-base
  results:
  - task:
      type: token-classification
      name: Named Entity Recognition
    dataset:
      name: CliReNER_silver
      type: P0L3/CliReNER_v_1_1_28_SILVER
      split: eval
    metrics:
    - type: f1
      value: 0.6300366300366301
      name: F1
    - type: precision
      value: 0.6437125748502994
      name: Precision
    - type: recall
      value: 0.6169296987087518
      name: Recall
---

# SpanMarker-RoBERTa for Climate Research NER

This model is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model fine-tuned for fine-grained Named Entity Recognition (NER) in the climate change research domain, extracting 28 distinct entity types. It uses [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base) as the underlying encoder.
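SpanMarker is span-based: instead of tagging each token independently, it classifies candidate spans, and no candidate can be longer than the maximum entity length (14 words for this model, as listed under Model Details). The plain-Python sketch below illustrates only that candidate-enumeration idea on whitespace-split words. It is a simplification under our own assumptions, and the helper name `enumerate_candidate_spans` is hypothetical; SpanMarker itself works on subword tokens and represents spans with special marker tokens.

```python
from typing import List, Tuple


# Hypothetical helper: illustrates candidate generation in span-based NER,
# not SpanMarker's actual implementation.
def enumerate_candidate_spans(words: List[str], max_entity_length: int = 14) -> List[Tuple[int, int]]:
    """Return (start, end) word offsets, end exclusive, for every contiguous
    span that is at most `max_entity_length` words long."""
    spans = []
    for start in range(len(words)):
        # Stop at the sentence end or at the length cap, whichever comes first
        last_end = min(start + max_entity_length, len(words))
        for end in range(start + 1, last_end + 1):
            spans.append((start, end))
    return spans


words = "climate change is altering weather patterns".split()
candidates = enumerate_candidate_spans(words, max_entity_length=14)
print(len(candidates))        # 21 candidates for a 6-word sentence
print((0, 2) in candidates)   # True: "climate change" is one candidate
```

With a cap of 14 words, an entity longer than that can never be predicted, and for long sentences the number of candidates grows roughly linearly with sentence length (about length × cap) instead of quadratically.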
## 📌 Model Details

- **Model Type:** SpanMarker
- **Encoder:** [FacebookAI/roberta-base](https://huggingface.co/FacebookAI/roberta-base)
- **Maximum Sequence Length:** 512 tokens
- **Maximum Entity Length:** 14 words
- **Language:** English
- **License:** cc-by-sa-4.0

### Model Labels

| Label                     | Examples                                                                                    |
|:--------------------------|:--------------------------------------------------------------------------------------------|
| Asset                     | "mental health", "water resources", "raw material"                                          |
| Body Part                 | "leaves", "plant leaves", "deep tissue compartment"                                         |
| Body of Water             | "Dhaleshwari river", "rivers", "peripheral rivers"                                          |
| Chemical                  | "domoic acid", "cathode materials", "marine algal toxin"                                    |
| Disease                   | "seizures", "acute neurologic signs", "chronic epileptic syndrome"                          |
| Ecosystem                 | "cloud forests", "polluted environment", "Tropical montane cloud forest"                    |
| Energy Source             | "12-cell series battery-pack prototype", "fossil fuels", "battery cells"                    |
| Field of Study            | "veterinary medicine", "reference laboratory", "study"                                      |
| Geographical Feature      | "heterogenous topography", "mountainous regions", "low point"                               |
| Intellectual Artefact     | "Daily husbandry records", "data", "Veterinary medical records"                             |
| Location                  | "wild", "Westbrook", "beaches"                                                              |
| Mathematical Expression   | "gradient", "Stepwise machine hour constraints", "difference"                               |
| Measuring Device          | "station", "EEG", "MRI scan"                                                                |
| Meteorological Phenomenon | "rainfall", "climate change", "climatic variability"                                        |
| Method                    | "dosing", "serum monitoring", "clinical efficacy"                                           |
| Natural Disaster          | "heavy metal contamination", "seasonal air pollution", "environmental pollution"            |
| Natural Phenomenon        | "algal blooms", "biochemical changes", "changing ocean conditions"                          |
| Organism                  | "Zalophus californianus", "California sea lions", "species"                                 |
| Organization              | "reference laboratory", "long-term care facility", "NOAA National Marine Fisheries Service" |
| Other                     | "marine mammal health", "normal eating", "reports"                                          |
| Person                    | "staff", "clinicians", "Clinicians"                                                         |
| Physical Artefact         | "electric vehicle", "paved east – west road", "EVs"                                         |
| Physical Phenomenon       | "normal food intake", "structural abnormalities", "seasonal changes"                        |
| Policy                    | "energy security", "safety", "pollution"                                                    |
| Quantity                  | "200 mAhg − 1", ">", "energy density"                                                       |
| Satellite                 | "TRMM", "Tropical Rainfall Measuring Mission", "satellites"                                 |
| System                    | "global overturning circulation", "system structure", "climate"                             |
| Time Period               | "periods of prolonged anorexia", "101 days", "several decades"                              |

---

## 🚀 Main Results (Selected Checkpoint)

This repository provides the **best-performing checkpoint** selected from 5 runs with different random seeds. While the internal training logs tracked performance on the validation split of **CliReNERsilver**, both the final model selection and the metrics below are based on the independent, expert-annotated **CliReNERgold** dataset.

| Metric    | Score |
|-----------|-------|
| Precision | 55.33 |
| Recall    | 49.18 |
| F1        | 52.08 |

> This checkpoint corresponds to the **seed with the highest strict F1 on the gold evaluation set** (seed run 4, random seed 33).

---

## 📊 Results Across Seeds

We fine-tuned the model with 5 different random seeds to assess how stable the results are across training runs on this domain-specific text.

| Seed | Precision | Recall | Strict F1 |
|------|-----------|--------|-----------|
| 1    | 55.39     | 48.69  | 51.83     |
| 2    | 58.32     | 44.12  | 50.23     |
| 3    | 54.80     | 45.92  | 49.97     |
| 4    | 55.33     | 49.18  | 52.08     |
| 5    | 51.19     | 43.95  | 47.30     |

**Summary:**

- **F1:** mean = 50.28, std = 1.91
- **Precision:** mean = 55.01, std = 2.54
- **Recall:** mean = 46.37, std = 2.47

**Model Selection Strategy:** The uploaded checkpoint is the **single best seed** (highest strict F1 on the gold dataset). Because the gold set was also used to pick this seed, its scores sit at the top of the observed range; the cross-seed mean (F1 = 50.28) is a more representative estimate of expected performance.
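The summary statistics can be reproduced from the per-seed table with Python's standard library. This sketch assumes the reported `std` is the sample standard deviation (n − 1 denominator), which is the formula that reproduces the values shown; the population formula (`statistics.pstdev`) would give smaller figures, for example 1.71 instead of 1.91 for F1.

```python
from statistics import mean, stdev

# Per-seed scores copied from the "Results Across Seeds" table
seed_scores = {
    "F1":        [51.83, 50.23, 49.97, 52.08, 47.30],
    "Precision": [55.39, 58.32, 54.80, 55.33, 51.19],
    "Recall":    [48.69, 44.12, 45.92, 49.18, 43.95],
}

# statistics.stdev divides by n - 1 (sample standard deviation)
summary = {
    metric: (round(mean(values), 2), round(stdev(values), 2))
    for metric, values in seed_scores.items()
}

for metric, (m, s) in summary.items():
    print(f"{metric}: mean = {m}, std = {s}")
# F1: mean = 50.28, std = 1.91
# Precision: mean = 55.01, std = 2.54
# Recall: mean = 46.37, std = 2.47
```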
---

## 📂 Dataset & Evaluation

- **Training Dataset:** [CliReNERsilver](https://huggingface.co/datasets/P0L3/CliReNER_v_1_1_28_SILVER)
  - **Splits used:** Stratified 80:10:10 split (train/validation/test); the 80% training portion was used for training.
- **Evaluation Dataset:** [CliReNERgold](https://huggingface.co/datasets/P0L3/CliReNER_v_1_1_28_GOLD)
  - **Splits used:** All 192 sentences combined (expert-annotated via Weighted Expert Voting).
- **Preprocessing:**
  - Texts were tokenized with the standard RoBERTa tokenizer.
  - The dataset uses a flat NER schema (nested entities are excluded, and overlapping entities are resolved to the most relevant span).
- **Metric Details:**
  - **F1 type:** Strict F1 (entity-level exact match).
  - A predicted entity counts as correct only if it matches both the **exact boundary span and the exact semantic label** of a gold entity.

---

## ⚖️ Precision vs Recall Behavior

Across all five seeds, the model favors precision over recall (mean precision 55.01 vs. mean recall 46.37). Predicted entities are correct somewhat more often than not, but every run recovers fewer than half of the gold-standard entities.

---

## ⚙️ Usage

### Direct Use for Inference

Because this model was trained with the SpanMarker framework, it requires the `span_marker` library for inference.

```bash
pip install span_marker
```

```python
from span_marker import SpanMarkerModel

# Download from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("P0L3/CliReNER-roberta-base")

# Run inference
text = "Anthropogenic climate change is fundamentally altering weather patterns and climate extremes, causing widespread adverse impacts to both nature and human systems (IPCC 2023)."
entities = model.predict(text)

# Each prediction is a dict with "span", "label", "score" and character offsets
for entity in entities:
    print(f"Entity: {entity['span']} | Label: {entity['label']} | Score: {entity['score']:.4f}")

# Entity: climate change | Label: Meteorological Phenomenon | Score: 0.4065
# Entity: weather patterns | Label: Meteorological Phenomenon | Score: 0.6808
# Entity: climate extremes | Label: Meteorological Phenomenon | Score: 0.7115
# Entity: nature | Label: Other | Score: 0.4608
# Entity: human systems | Label: System | Score: 0.6562
# Entity: IPCC 2023 | Label: Other | Score: 0.4812
```

### Downstream Use

You can continue fine-tuning this model on your own dataset.
```python
from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer

# Download this model from the 🤗 Hub
model = SpanMarkerModel.from_pretrained("P0L3/CliReNER-roberta-base")

# Specify a Dataset with "tokens" and "ner_tags" columns
dataset = load_dataset("your_custom_dataset")

# Initialize a Trainer using the pretrained model and dataset
trainer = Trainer(
    model=model,
    train_dataset=dataset["train"],
    eval_dataset=dataset["validation"],
)
trainer.train()
trainer.save_model("span_marker_model_id-finetuned")
```
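To score a fine-tuned model in the same spirit as the results tables above, a prediction should count as correct only when its boundaries and its label both match a gold entity. The sketch below is a minimal, hypothetical implementation of strict entity-level precision, recall, and F1, pooled (micro-averaged) over sentences. It assumes predictions are dicts with `char_start_index`, `char_end_index`, and `label` keys, as returned by `model.predict`, and that gold annotations have been converted to the same form; the helper names are our own, the offsets are illustrative, and this is not necessarily the evaluation code behind the reported scores.

```python
from typing import Dict, Iterable, List, Set, Tuple

Entity = Tuple[int, int, str]  # (char_start, char_end, label)


def to_entity_set(entities: Iterable[Dict]) -> Set[Entity]:
    """Reduce prediction or gold dicts to hashable (start, end, label) triples."""
    return {(e["char_start_index"], e["char_end_index"], e["label"]) for e in entities}


def strict_prf(gold: List[List[Dict]], pred: List[List[Dict]]) -> Tuple[float, float, float]:
    """Strict entity-level precision, recall and F1 (in %), pooled over sentences.

    An entity is a true positive only if start, end AND label all match.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must cover the same sentences")
    tp = n_pred = n_gold = 0
    for gold_sent, pred_sent in zip(gold, pred):
        g, p = to_entity_set(gold_sent), to_entity_set(pred_sent)
        tp += len(g & p)  # exact (start, end, label) matches only
        n_gold += len(g)
        n_pred += len(p)
    precision = 100 * tp / n_pred if n_pred else 0.0
    recall = 100 * tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1


# Illustrative offsets into the inference example sentence above
gold = [[
    {"char_start_index": 14, "char_end_index": 28, "label": "Meteorological Phenomenon"},  # "climate change"
    {"char_start_index": 55, "char_end_index": 71, "label": "Meteorological Phenomenon"},  # "weather patterns"
]]
pred = [[
    {"char_start_index": 14, "char_end_index": 28, "label": "Meteorological Phenomenon"},  # exact match
    {"char_start_index": 55, "char_end_index": 71, "label": "Other"},                      # right span, wrong label
    {"char_start_index": 0, "char_end_index": 28, "label": "Meteorological Phenomenon"},   # wrong boundary
]]
print(strict_prf(gold, pred))  # precision ≈ 33.33, recall = 50.0, F1 ≈ 40.0
```

Under strict matching, the wrong-label and wrong-boundary predictions both count as false positives, and the mislabeled gold entity also counts as a false negative, which is why strict scores run lower than partial-overlap metrics would.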
---

## 📉 Training Details

### Training Set Metrics

| Training set          | Min | Mean    | Max |
|:----------------------|:----|:--------|:----|
| Sentence length       | 3   | 31.4819 | 97  |
| Entities per sentence | 1   | 7.0100  | 22  |

### Training Hyperparameters

- **learning_rate:** 5e-05
- **train_batch_size:** 8
- **eval_batch_size:** 8
- **seed:** 33
- **gradient_accumulation_steps:** 2
- **total_train_batch_size:** 16
- **optimizer:** adamw_torch with betas=(0.9,0.999) and epsilon=1e-08
- **lr_scheduler_type:** linear
- **lr_scheduler_warmup_ratio:** 0.1
- **num_epochs:** 20

### Training Results (CliReNERsilver Validation Split)

| Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
|:-----:|:----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
| 1.0   | 62   | 0.1324          | 0.0                  | 0.0               | 0.0           | 0.6075              |
| 2.0   | 124  | 0.0839          | 0.3333               | 0.0273            | 0.0504        | 0.6166              |
| 3.0   | 186  | 0.0530          | 0.5845               | 0.4218            | 0.4900        | 0.7807              |
| 4.0   | 248  | 0.0460          | 0.6913               | 0.4433            | 0.5402        | 0.7971              |
| 5.0   | 310  | 0.0488          | 0.5965               | 0.6298            | 0.6127        | 0.8307              |
| 6.0   | 372  | 0.0447          | 0.6532               | 0.6026            | 0.6269        | 0.8340              |
| 7.0   | 434  | 0.0466          | 0.6365               | 0.6356            | 0.6360        | 0.8486              |
| 8.0   | 496  | 0.0522          | 0.6388               | 0.6370            | 0.6379        | 0.8468              |
| 9.0   | 558  | 0.0520          | 0.6437               | 0.6169            | 0.6300        | 0.8428              |

### Framework Versions

- **Python:** 3.10.19
- **SpanMarker:** 1.7.0
- **Transformers:** 4.50.0
- **PyTorch:** 2.9.1+cu126
- **Datasets:** 3.0.0
- **Tokenizers:** 0.21.4

---

## 📚 Citation

If you use this model or the CliReNER datasets in your research, please cite the project:

```bibtex
@misc{poleksic2026named,
  author       = {Poleksić, Andrija and Martinčić-Ipšić, Sanda},
  title        = {Named Entity Recognition for Climate Change Research},
  year         = {2026},
  howpublished = {Research Square},
  note         = {Preprint}
}
```

Please also acknowledge the SpanMarker framework:

```bibtex
@software{Aarsen_SpanMarker,
  author  = {Aarsen, Tom},
  license = {Apache-2.0},
  title   = {{SpanMarker for Named Entity Recognition}},
  url     = {https://github.com/tomaarsen/SpanMarkerNER}
}
```