	---
	language: en
	license: cc-by-sa-4.0
	tags:
	- span-marker
	- token-classification
	- ner
	- named-entity-recognition
	- generated_from_span_marker_trainer
	- climate-change
	- earth-science
	widget:
	- text: While a significant positive impact of solid-state cultivation using white
	    rot fungi on enzymatic digestibility was reported in some studies [68, 69], a
	    negative effect of fungal pretreatment on enzymatic hydrolysis was noted by investigators
	    like Shi et al . (2009) [33], who reported a glucose yield of 55 . 6 mg g − 1
	    of cotton stalks pretreated with P . chrysosporium, which was approximately 17%
	    lower than the yield of untreated cotton stalks after enzymatic hydrolysis in
	    spite of significant lignin degradation.
	- text: We quantify changes in the properties and amount of bottom water entering
	    the basin by combining repeat hydrographic observations, direct velocity measurements
	    and flow structure derived from a 0 . 1 ° global ocean sea-ice model that realistically
	    simulates AABW formation sites and export pathways.
	- text: The impact of these differences on cloud forcing can be signi or more . cant
	    and as high as 30 W m In recent years, observations from satellite data have been
	    revised considerably after significant development efforts, especially after utilizing
	    new high-quality reference measurements from active sensors in space, and some
	    datasets have also improved polar cloud detection.
	- text: If the response is significant, how does the solar forcing impact the EASM
	    rainfall variability? In this study, we will address these questions based on
	    the simulation results derived from one AD 850 control experiment (CTRL) and four
	    solar-only forcing experiments [spectral solar irradiance (SSI) experiments],
	    which were conducted by the Community Earth System (CESM-LME) Model – Last Millennium
	    Ensemble modeling project (Otto-Bliesner et al . 2016).
	- text: Measurements from single moorings at each gateway reveal that the speed of
	    bottom water flow into the Australian Antarctic Basin varies with location, season
	    and density (Fig . 3a, c, e).
	pipeline_tag: token-classification
	library_name: span-marker
	metrics:
	- precision
	- recall
	- f1
	datasets:
	- P0L3/CliReNER_v_1_1_28_SILVER
	- P0L3/CliReNER_v_1_1_28_GOLD
	base_model: allenai/scibert_scivocab_uncased
	model-index:
	- name: SpanMarker with allenai/scibert_scivocab_uncased
	  results:
	  - task:
	      type: token-classification
	      name: Named Entity Recognition
	    dataset:
	      name: CliReNER_silver
	      type: P0L3/CliReNER_v_1_1_28_SILVER
	      split: eval
	    metrics:
	    - type: f1
	      value: 0.6542591267000716
	      name: F1
	    - type: precision
	      value: 0.6528571428571428
	      name: Precision
	    - type: recall
	      value: 0.6556671449067432
	      name: Recall
	---

	# SpanMarker-SciBERT for Climate Research NER

	This is a [SpanMarker](https://github.com/tomaarsen/SpanMarkerNER) model fine-tuned for fine-grained Named Entity Recognition (NER) on climate change research text. It extracts 28 distinct entity types and uses the domain-specific [allenai/scibert_scivocab_uncased](https://huggingface.co/allenai/scibert_scivocab_uncased) as its encoder.

	## 📌 Model Details

	- Model Type: SpanMarker
	- Encoder: [allenai/scibert_scivocab_uncased](https://huggingface.co/allenai/scibert_scivocab_uncased)
	- Maximum Sequence Length: 512 tokens
	- Maximum Entity Length: 14 words
	- Language: English
	- License: cc-by-sa-4.0
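
The maximum entity length limits how many candidate spans a sentence can produce. The sketch below counts contiguous word spans of at most 14 words. It illustrates the effect of the limit only and is not SpanMarker's internal code; `count_candidate_spans` is a hypothetical helper name.

```python
def count_candidate_spans(num_words: int, max_entity_length: int = 14) -> int:
    """Count contiguous word spans with 1..max_entity_length words."""
    longest = min(num_words, max_entity_length)
    # A span of a given length can start at (num_words - length + 1) positions.
    return sum(num_words - length + 1 for length in range(1, longest + 1))


# A 31-word sentence yields 343 candidate spans under a 14-word limit.
print(count_candidate_spans(31))
```

Sentences shorter than the limit are bounded by their own length, so a 3-word sentence has only 6 candidate spans.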

	### Model Labels
	| Label | Examples |
	|:--------------------------|:--------------------------------------------------------------------------------------------|
	| Asset | "mental health", "raw material", "water resources" |
	| Body Part | "plant leaves", "deep tissue compartment", "leaves" |
	| Body of Water | "peripheral rivers", "Dhaleshwari river", "rivers" |
	| Chemical | "cathode materials", "domoic acid", "marine algal toxin" |
	| Disease | "seizures", "chronic epileptic syndrome", "acute neurologic signs" |
	| Ecosystem | "polluted environment", "Tropical montane cloud forest", "cloud forests" |
	| Energy Source | "battery cells", "fossil fuels", "12-cell series battery-pack prototype" |
	| Field of Study | "veterinary medicine", "study", "reference laboratory" |
	| Geographical Feature | "mountainous regions", "heterogenous topography", "low point" |
	| Intellectual Artefact | "Veterinary medical records", "Daily husbandry records", "data" |
	| Location | "Westbrook", "beaches", "wild" |
	| Mathematical Expression | "gradient", "Stepwise machine hour constraints", "difference" |
	| Measuring Device | "EEG", "MRI scan", "station" |
	| Meteorological Phenomenon | "climatic variability", "rainfall", "climate change" |
	| Method | "serum monitoring", "clinical efficacy", "dosing" |
	| Natural Disaster | "environmental pollution", "heavy metal contamination", "seasonal air pollution" |
	| Natural Phenomenon | "biochemical changes", "algal blooms", "changing ocean conditions" |
	| Organism | "Zalophus californianus", "species", "California sea lions" |
	| Organization | "long-term care facility", "NOAA National Marine Fisheries Service", "reference laboratory" |
	| Other | "normal eating", "reports", "marine mammal health" |
	| Person | "Clinicians", "staff", "clinicians" |
	| Physical Artefact | "electric vehicle", "paved east – west road", "EVs" |
	| Physical Phenomenon | "seasonal changes", "structural abnormalities", "normal food intake" |
	| Policy | "safety", "pollution", "energy security" |
	| Quantity | "energy density", ">", "200 mAhg − 1" |
	| Satellite | "TRMM", "satellites", "Tropical Rainfall Measuring Mission" |
	| System | "global overturning circulation", "climate", "system structure" |
	| Time Period | "periods of prolonged anorexia", "several decades", "101 days" |

	---

	## 🚀 Main Results (Selected Checkpoint)

	This repository provides the best-performing checkpoint selected from 5 runs with different random seeds. The training logs track performance on the validation split of CliReNER<sub>silver</sub>, whereas checkpoint selection and the metrics below use the independent, expert-annotated CliReNER<sub>gold</sub> dataset.

	| Metric | Score |
	|------------|-------|
	| Precision | XX.XX |
	| Recall | XX.XX |
	| F1 | XX.XX |

	> This checkpoint corresponds to the seed with the highest strict F1 on the gold evaluation set.

	---

	## 📊 Results Across Seeds

	We fine-tuned the model using 5 different random seeds to assess the stability and robustness of the architecture on the domain-specific text.

	| Seed | Precision | Recall | Strict F1 |
	|------|-----------|--------|-----------|
	| 1 | XX.XX | XX.XX | XX.XX |
	| 2 | XX.XX | XX.XX | XX.XX |
	| 3 | XX.XX | XX.XX | XX.XX |
	| 4 | XX.XX | XX.XX | XX.XX |
	| 5 | XX.XX | XX.XX | XX.XX |

	Summary:

	- F1: mean = XX.XX, std = XX.XX
	- Precision: mean = XX.XX, std = XX.XX
	- Recall: mean = XX.XX, std = XX.XX

	Model Selection Strategy:
	The uploaded checkpoint is the single seed with the highest strict F1 on the gold dataset. Because the gold set is used both to select the checkpoint and to report its score, the per-seed mean and standard deviation are the more conservative indicator of expected performance.
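
As a minimal sketch of how the per-seed summary and the selection step could be computed: the scores below are made-up placeholders, not results from this model, and `summarize_seeds` is a hypothetical helper. The sample standard deviation (`statistics.stdev`) is an assumption, since the card does not say which variant is used.

```python
from statistics import mean, stdev

# Hypothetical placeholder values for illustration only (NOT this model's results).
seed_f1 = {1: 0.50, 2: 0.52, 3: 0.51, 4: 0.49, 5: 0.53}


def summarize_seeds(scores: dict[int, float]) -> dict[str, float]:
    """Return mean, sample standard deviation, and the best-scoring seed."""
    values = list(scores.values())
    return {
        "mean": mean(values),
        "std": stdev(values),  # sample std (n - 1); an assumption
        "best_seed": max(scores, key=scores.get),
    }


summary = summarize_seeds(seed_f1)
print(summary)
```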

	---

	## 📂 Dataset & Evaluation

	- Training Dataset: [CliReNER<sub>silver</sub>](https://huggingface.co/datasets/P0L3/CliReNER_v_1_1_28_SILVER)
	  - Splits used: Stratified 80:10:10 split (Train/Validation/Test); the 80% portion was used for training.
	- Evaluation Dataset: [CliReNER<sub>gold</sub>](https://huggingface.co/datasets/P0L3/CliReNER_v_1_1_28_GOLD)
	  - Splits used: Evaluated on all 192 sentences combined (expert-annotated via Weighted Expert Voting).
	- Preprocessing:
	  - Texts were tokenized using the standard SciBERT WordPiece tokenizer (`scivocab`).
	  - The dataset uses a flat NER schema (nested entities are excluded, and overlapping entities are resolved to the most relevant span).
	- Metric Details:
	  - F1 type: Strict F1 (entity-level exact match).
	  - A predicted entity counts as correct only if both its exact span boundaries and its label match the gold annotation.
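
The strict matching rule above can be sketched in a few lines. This is an illustrative implementation, not the evaluation script used for this model; the tuple layout `(sentence_id, start, end, label)` and the function name `strict_prf` are assumptions.

```python
def strict_prf(gold, pred):
    """Entity-level precision, recall, and F1 with exact span and label match.

    Each entity is a tuple (sentence_id, start, end, label).
    """
    gold_set, pred_set = set(gold), set(pred)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = [(0, 0, 2, "Organism"), (0, 5, 6, "Location"), (0, 8, 10, "Chemical")]
pred = [
    (0, 0, 2, "Organism"),      # exact match: counts as correct
    (0, 5, 6, "Organization"),  # right span, wrong label: incorrect
    (0, 8, 9, "Chemical"),      # right label, wrong boundary: incorrect
]
precision, recall, f1 = strict_prf(gold, pred)
print(precision, recall, f1)
```

Only one of the three predictions matches on both boundary and label, so precision, recall, and F1 all equal one third in this toy example.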

	---

	## ⚖️ Precision vs Recall Behavior

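	On the CliReNER<sub>silver</sub> validation split, precision and recall are nearly balanced at the last logged epoch (0.6529 vs. 0.6557). Earlier epochs lean toward precision, for example 0.7086 vs. 0.5409 at epoch 3. The corresponding behavior on the gold set is not reported in this card.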

	---

	## ⚙️ Usage

	### Direct Use for Inference

	Because this model was trained using the SpanMarker framework, it requires the `span_marker` library for inference.

	```bash
	pip install span_marker
	```

	```python
	from span_marker import SpanMarkerModel

	# Download from the 🤗 Hub
	model = SpanMarkerModel.from_pretrained("P0L3/CliReNER-scibert_scivocab_uncased")

	# Run inference
	text = "At the same time, terrestrial systems are shifting as warming drives rapid changes in frozen soils. These soils, which cover 20% of the Earth’s land surface, are degrading, with cascading effects on water, energy, and carbon cycles (Zhao et al. 2026)."
	entities = model.predict(text)

	for entity in entities:
	    print(f"Entity: {entity['span']} | Label: {entity['label']} | Score: {entity['score']:.4f}")

	# Example output:
	# Entity: terrestrial systems | Label: System | Score: 0.6377
	# Entity: warming | Label: Physical Phenomenon | Score: 0.4777
	# Entity: frozen soils | Label: Geographical Feature | Score: 0.3317
	# Entity: soils | Label: Body of Water | Score: 0.3320
	# Entity: 20% | Label: Quantity | Score: 0.9814
	# Entity: Earth | Label: Location | Score: 0.9857
	# Entity: land surface | Label: Geographical Feature | Score: 0.5472
	# Entity: degrading | Label: Other | Score: 0.4863
	# Entity: energy | Label: Chemical | Score: 0.4364
	# Entity: water | Label: Chemical | Score: 0.6675
	# Entity: carbon cycles | Label: Physical Phenomenon | Score: 0.6238
	# Entity: Zhao et al. | Label: Person | Score: 0.8318
	# Entity: 2026 | Label: Time Period | Score: 0.9117

	```
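
Several predictions in the example output have low scores (for instance "soils" as Body of Water at 0.3320), so it can help to filter by confidence before using the entities. The sketch below works on plain dictionaries shaped like the prediction output; the 0.5 threshold and the helper name `group_confident` are assumptions, not recommended settings.

```python
from collections import defaultdict

# A few entries copied from the example output above.
predictions = [
    {"span": "20%", "label": "Quantity", "score": 0.9814},
    {"span": "Earth", "label": "Location", "score": 0.9857},
    {"span": "soils", "label": "Body of Water", "score": 0.3320},
    {"span": "water", "label": "Chemical", "score": 0.6675},
]


def group_confident(entities, min_score=0.5):
    """Keep entities scoring at least min_score and group their spans by label."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["label"]].append(entity["span"])
    return dict(grouped)


confident = group_confident(predictions)
print(confident)
```

A suitable threshold depends on whether your application prefers precision or recall, and is best tuned on held-out data.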

	### Downstream Use

	You can easily continue fine-tuning this model on your own dataset.

	<details><summary>Click to expand</summary>

	```python
	from span_marker import SpanMarkerModel, Trainer
	from datasets import load_dataset

	# Download from the 🤗 Hub
	model = SpanMarkerModel.from_pretrained("P0L3/CliReNER-scibert_scivocab_uncased")

	# Specify a Dataset with "tokens" and "ner_tags" columns
	dataset = load_dataset("your_custom_dataset")

	# Initialize a Trainer using the pretrained model & dataset
	trainer = Trainer(
	    model=model,
	    train_dataset=dataset["train"],
	    eval_dataset=dataset["validation"],
	)
	trainer.train()
	trainer.save_model("span_marker_model_id-finetuned")
	```
	</details>
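
The fine-tuning example expects word-level `tokens` and `ner_tags` columns. As a rough sketch of how span annotations could be turned into that shape, the hypothetical helper `to_iob2` below emits IOB2 string tags. String tags are shown for readability; mapping them to integer class ids depends on your dataset setup and is left out.

```python
def to_iob2(tokens, spans):
    """Build a {"tokens", "ner_tags"} record from word-level spans.

    Each span is (start_word, end_word_exclusive, label); spans must not overlap.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"
        for index in range(start + 1, end):
            tags[index] = f"I-{label}"
    return {"tokens": tokens, "ner_tags": tags}


record = to_iob2(
    ["California", "sea", "lions", "ingest", "domoic", "acid"],
    [(0, 3, "Organism"), (4, 6, "Chemical")],
)
print(record["ner_tags"])
```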

	---

	## 📉 Training Details

	### Training Set Metrics
	| Training set | Min | Median | Max |
	|:----------------------|:----|:--------|:----|
	| Sentence length | 3 | 31.4819 | 97 |
	| Entities per sentence | 1 | 7.0100 | 22 |

	### Training Hyperparameters
	- learning_rate: 5e-05
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 0
	- gradient_accumulation_steps: 2
	- total_train_batch_size: 16
	- optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- lr_scheduler_warmup_ratio: 0.1
	- num_epochs: 20
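
The hyperparameters above imply some simple arithmetic. The sketch below works it through, assuming a single device, 62 optimizer steps per epoch (the step count logged at epoch 1.0 in the training results), and that all 20 configured epochs are run. `linear_lr` is a hypothetical helper that mirrors the usual linear warmup and decay shape; it is not code taken from the training run.

```python
def linear_lr(step, total_steps, warmup_steps, peak_lr=5e-5):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = total_steps - step
    return peak_lr * max(0.0, remaining / max(1, total_steps - warmup_steps))


effective_batch = 8 * 2                  # train_batch_size * gradient_accumulation_steps
steps_per_epoch = 62                     # assumption: taken from the logged step at epoch 1.0
total_steps = steps_per_epoch * 20       # num_epochs = 20
warmup_steps = round(0.1 * total_steps)  # lr_scheduler_warmup_ratio = 0.1

print(effective_batch, total_steps, warmup_steps)
print(linear_lr(62, total_steps, warmup_steps))   # halfway through warmup
print(linear_lr(124, total_steps, warmup_steps))  # peak learning rate
```

Under these assumptions the warmup phase spans the first two epochs.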

	### Training Results (CliReNER<sub>silver</sub> Validation Split)

	| Epoch | Step | Validation Loss | Validation Precision | Validation Recall | Validation F1 | Validation Accuracy |
	|:-----:|:----:|:---------------:|:--------------------:|:-----------------:|:-------------:|:-------------------:|
	| 1.0 | 62 | 0.1052 | 0.0 | 0.0 | 0.0 | 0.6075 |
	| 2.0 | 124 | 0.0534 | 0.5492 | 0.4648 | 0.5035 | 0.7995 |
	| 3.0 | 186 | 0.0392 | 0.7086 | 0.5409 | 0.6135 | 0.8201 |
	| 4.0 | 248 | 0.0415 | 0.6404 | 0.6184 | 0.6292 | 0.8374 |
	| 5.0 | 310 | 0.0382 | 0.6823 | 0.6471 | 0.6642 | 0.8513 |
	| 6.0 | 372 | 0.0449 | 0.6888 | 0.6098 | 0.6469 | 0.8468 |
	| 7.0 | 434 | 0.0483 | 0.6611 | 0.6298 | 0.6451 | 0.8498 |
	| 8.0 | 496 | 0.0497 | 0.6529 | 0.6557 | 0.6543 | 0.8531 |

	### Framework Versions
	- Python: 3.10.19
	- SpanMarker: 1.7.0
	- Transformers: 4.50.0
	- PyTorch: 2.9.1
	- Datasets: 3.0.0
	- Tokenizers: 0.21.4

	---

	## 📚 Citation

	If you use this model or the CliReNER datasets in your research, please cite the project:

	```latex
	@misc{poleksic2026named,
	author = {Poleksić, Andrija and Martinčić-Ipšić, Sanda},
	title = {Named Entity Recognition for Climate Change Research},
	year = {2026},
	howpublished = {Research Square},
	note = {Preprint}
	}
	```

	Please also acknowledge the SpanMarker framework:

	```latex
	@software{Aarsen_SpanMarker,
	author = {Aarsen, Tom},
	license = {Apache-2.0},
	title = {{SpanMarker for Named Entity Recognition}},
	url = {https://github.com/tomaarsen/SpanMarkerNER}
	}
	```