This is a text classification model, fully fine-tuned from [`allenai/scibert_scivocab_uncased`](https://huggingface.co/allenai/scibert_scivocab_uncased). It reuses the main BERT encoder and fits an ordinal regression head on the `[CLS]` token. The model is fine-tuned on the certainty labels collected in [Wührl et al. (2024): _Understanding Fine-Grained Distortions in Reports of Scientific Findings_](https://aclanthology.org/2024.findings-acl.369/). The authors originally collected certainty annotations from humans on a 4-point Likert scale ranging from (1) Uncertain to (4) Certain. Because the resulting dataset suffers from severe class imbalance, we merge the classes (1) Uncertain and (2) Somewhat Uncertain.
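The head architecture is not specified here beyond being an ordinal regression head on the `[CLS]` embedding. A common cumulative-link (CORAL-style) formulation, sketched below in NumPy purely for illustration (the released model's actual head may differ), computes one shared score per example and K−1 ordered cutpoints:

```python
import numpy as np

def ordinal_head(cls_emb, w, b, cutpoints):
    """Cumulative-link ordinal head: P(y > k) for each class boundary k.

    cls_emb:   (batch, hidden) [CLS] embeddings
    w, b:      weights of a single shared linear score
    cutpoints: (K-1,) ordered thresholds between adjacent classes
    """
    score = cls_emb @ w + b                                          # (batch,)
    # Sigmoid of (score - cutpoint) gives P(y > k); monotone in the score.
    return 1.0 / (1.0 + np.exp(-(score[:, None] - cutpoints[None, :])))

rng = np.random.default_rng(0)
probs = ordinal_head(rng.normal(size=(2, 768)),
                     rng.normal(size=768), 0.0,
                     np.array([-1.0, 1.0]))
print(probs.shape)  # (2, 2): one P(y > k) per boundary
```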
|
|
|
|
|
|
### Dataset Statistics
|
|
There are 1,330 examples in the training set and 334 in the test set.
Each example is one sentence long.
Examples are filtered from the [copenlu/spiced](https://huggingface.co/datasets/copenlu/spiced) dataset to those with a final score greater than or equal to 4.
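The filtering step can be sketched in plain Python; the records and the `final_score` field name below are hypothetical, as the actual spiced schema may use different keys:

```python
# Hypothetical records mimicking the annotated-sentence format.
examples = [
    {"sentence": "The drug reduced symptoms in all trials.", "final_score": 4.5},
    {"sentence": "Results might generalize to other domains.", "final_score": 3.0},
    {"sentence": "We find a strong positive correlation.", "final_score": 4.0},
]

# Keep only examples whose final score is >= 4.
kept = [ex for ex in examples if ex["final_score"] >= 4]
print(len(kept))  # 2
```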
|
|
The original base rates are as follows:
|
|
| Class | Base Rate in Training Set (%) | Base Rate in Test Set (%) |
| ----- | ----------------------------- | ------------------------- |
| 0 - Uncertain | 5.5970 | 7.1856 |
| 1 - Somewhat Uncertain | 15.2985 | 17.6647 |
| 2 - Somewhat Certain | 32.3881 | 33.2335 |
| 3 - Certain | 46.7164 | 41.9162 |
|
|
After combining classes 0 and 1, we obtain the base rates below. Note that this mimics the procedure adopted in the original paper.
|
|
| Class | Base Rate in Training Set (%) | Base Rate in Test Set (%) |
| ----- | ----------------------------- | ------------------------- |
| 0 - Uncertain | 20.8955 | 24.8503 |
| 1 - Somewhat Certain | 32.3881 | 33.2335 |
| 2 - Certain | 46.7164 | 41.9162 |
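The merge amounts to a simple label remapping (a sketch; the released preprocessing code may differ):

```python
from collections import Counter

# Original 4-point labels: 0 Uncertain, 1 Somewhat Uncertain,
# 2 Somewhat Certain, 3 Certain. Classes 0 and 1 collapse into one.
MERGE = {0: 0, 1: 0, 2: 1, 3: 2}

def merge_labels(labels):
    return [MERGE[y] for y in labels]

# Toy labels, not the real dataset distribution.
labels = [0, 1, 1, 2, 2, 3, 3, 3, 3, 3]
merged = merge_labels(labels)
rates = {k: 100 * v / len(merged) for k, v in sorted(Counter(merged).items())}
print(rates)  # {0: 30.0, 1: 20.0, 2: 50.0}
```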
|
|
|
|
|
|
|
|
### Hyperparameter Optimization
|
|
The published model is one of 29 different configurations evaluated during hyperparameter search. The selected configuration maximizes Quadratic Weighted Kappa (QWK, implemented via [`cohen_kappa_score` with quadratic weights](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html)), which is better suited to ordinal scales because it penalizes predictions by their distance from the true class. Under this metric, a random model would score 0. We adopt QWK instead of accuracy or macro F1 to address class imbalance.
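For reference, the metric can be computed directly with scikit-learn:

```python
from sklearn.metrics import cohen_kappa_score

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Quadratic weights penalize errors by the squared distance between classes,
# so predicting 0 for a true 2 costs more than predicting 1.
qwk = cohen_kappa_score(y_true, y_pred, weights="quadratic")
print(round(qwk, 4))  # 0.2857

# Perfect agreement scores 1; chance-level agreement scores 0.
perfect = cohen_kappa_score(y_true, y_true, weights="quadratic")
print(perfect)  # 1.0
```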
|
|
Here are the test set metrics and classification report:
|
|
```
17:44:36 INFO test loss=0.9565 acc=0.578 QWK=0.5004
17:44:36 INFO
              precision    recall  f1-score   support

           0       0.58      0.51      0.54        83
           1       0.47      0.46      0.46       111
           2       0.65      0.71      0.68       140

    accuracy                           0.58       334
   macro avg       0.57      0.56      0.56       334
weighted avg       0.57      0.58      0.57       334
```
|
|
|
|
We conduct a hyperparameter sweep over the following hyperparameters:
|
|
- Freeze / unfreeze the encoder
- Learning rate: 1e-6 through 1e-3
- Batch size: 16, 32
- Hidden size dimensions: 128, 256
- Warmup ratio: 0.05, 0.1, 0.2, 0.3
- Epochs: 30 (with early-stopping patience)
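A full grid over these choices would contain 128 configurations, so the 29 evaluated runs were presumably a subset (e.g. via random sampling). The grid itself can be enumerated as follows; the specific learning-rate points are illustrative samples of the stated 1e-6 to 1e-3 range:

```python
from itertools import product

grid = {
    "freeze_encoder": [True, False],
    "lr": [1e-6, 1e-5, 1e-4, 1e-3],   # illustrative points in the range
    "batch_size": [16, 32],
    "hidden_size": [128, 256],
    "warmup_ratio": [0.05, 0.1, 0.2, 0.3],
}

# Cartesian product of all hyperparameter values, one dict per config.
configs = [dict(zip(grid, combo)) for combo in product(*grid.values())]
print(len(configs))  # 128
```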
|
|
|
|
|
|
## Usage

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("cbelem/scibert-certainty-ordinal", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("cbelem/scibert-certainty-ordinal", trust_remote_code=True)
```
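Assuming the head exposes one logit per merged class in label order (0 Uncertain, 1 Somewhat Certain, 2 Certain) — an assumption, since the `trust_remote_code` head may return a different output format — predictions could be decoded with a small helper like this:

```python
import numpy as np

LABELS = ["Uncertain", "Somewhat Certain", "Certain"]

def decode(logits):
    """Map one row of class logits to its certainty label (assumed order)."""
    return LABELS[int(np.argmax(logits))]

# e.g. logits taken from model(**tokenizer(sentence, return_tensors="pt")).logits
print(decode(np.array([0.2, 1.4, -0.3])))  # Somewhat Certain
```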
|
|
|
|
|
|