bert-base-cased-sci-units-ner

This model is a fine-tuned version of bert-base-cased on the PQA part of the bowenxian/BioProBench dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0175
  • Precision: 0.9873
  • Recall: 0.9867
  • F1: 0.9870
  • Accuracy: 0.9962

Model description

The model performs token classification. It was obtained by fine-tuning bert-base-cased to label the tokens that make up the values and units of scientific measurements.

For example, in the sentence:

"Place the seeds in a refrigerator at 4°C along with a small amount of water for 2-3 days."

the model selects "4°C" and identifies the value as 4 and the unit as °C.

In a second example:

"Centrifuge at 863g for 5 min at room temperature (18–28°C), decant supernatant and resuspend cells in culture medium."

the model identifies two value-unit combinations:

  • VALUE: 863, UNIT: g
  • VALUE: 18–28, UNIT: °C
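
The examples above could be reproduced roughly as follows. This is a minimal sketch, not the author's code: it assumes the checkpoint is published as m1969m/bert-base-cased-sci-units-ner, that it loads with the standard transformers token-classification pipeline, and that the aggregated entity groups are named VALUE and UNIT (the card does not list the label names). The helper names load_tagger, pair_values_with_units and demo are hypothetical.

```python
from typing import Dict, List

# ASSUMPTION: repo id of the published checkpoint.
MODEL_ID = "m1969m/bert-base-cased-sci-units-ner"


def load_tagger():
    """Build a token-classification pipeline (downloads the weights on first call)."""
    # Imported lazily so the pairing helper below can be used without transformers.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline("token-classification", model=MODEL_ID, aggregation_strategy="simple")


def pair_values_with_units(entities: List[Dict]) -> List[Dict[str, str]]:
    """Pair each VALUE span with the next UNIT span that follows it in the text.

    ASSUMPTION: the entity groups are called "VALUE" and "UNIT".
    A VALUE that is not followed by a UNIT (for example a number next to
    an unsupported unit) is dropped.
    """
    pairs = []
    pending_value = None
    for ent in sorted(entities, key=lambda e: e["start"]):
        if ent["entity_group"] == "VALUE":
            pending_value = ent["word"]
        elif ent["entity_group"] == "UNIT" and pending_value is not None:
            pairs.append({"value": pending_value, "unit": ent["word"]})
            pending_value = None
    return pairs


def demo() -> None:
    """Tag one of the example sentences and print the value-unit pairs."""
    tagger = load_tagger()
    text = "Place the seeds in a refrigerator at 4°C along with a small amount of water for 2-3 days."
    print(pair_values_with_units(tagger(text)))
```

Calling demo() requires transformers, a PyTorch install and network access. The exact spans returned depend on how the tokenizer splits symbols such as "°C", so the pairing step may need adjusting to the real output.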

Intended uses & limitations

The model identifies VALUES and scientific UNITS in a sentence.

It is a work in progress and currently recognizes only the following units:

  • Temperature: °C
  • Mass (grams): g, ug, mg
  • Volume (liters): L, uL, mL
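
Because only a few units are covered, downstream code may want to check whether a predicted unit is one of them. The lookup below is a small sketch of that idea; the names SUPPORTED_UNITS and quantity_for are hypothetical and not part of the model.

```python
from typing import Optional

# Units listed in this card, grouped by the quantity they measure.
SUPPORTED_UNITS = {
    "temperature": ("°C",),
    "mass": ("g", "ug", "mg"),
    "volume": ("L", "uL", "mL"),
}


def quantity_for(unit: str) -> Optional[str]:
    """Return the quantity a unit measures, or None if the unit is not listed.

    The comparison is case-sensitive, matching the spellings in the card.
    """
    for quantity, units in SUPPORTED_UNITS.items():
        if unit in units:
            return quantity
    return None
```

For instance, quantity_for("mg") yields "mass", while a unit outside the list, such as "min", yields None.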

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • optimizer: OptimizerNames.ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • num_epochs: 3
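
As a rough guide to reproducing this configuration, the values above map onto transformers.TrainingArguments keywords as sketched below. Several points are assumptions rather than facts from the card: that training ran on a single device (so train_batch_size equals per_device_train_batch_size), that no warmup was used, and the output directory name, which is hypothetical. The linear_lr helper mirrors the shape of a linear schedule and uses the 2046 total steps reported in the training results.

```python
# Values copied from the hyperparameter list; keyword names follow
# transformers.TrainingArguments.
TRAINING_KWARGS = {
    "learning_rate": 2e-5,
    "per_device_train_batch_size": 8,  # ASSUMPTION: single device
    "per_device_eval_batch_size": 8,
    "seed": 42,
    "optim": "adamw_torch_fused",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-8,
    "lr_scheduler_type": "linear",
    "num_train_epochs": 3,
}


def build_training_args(output_dir: str = "sci-units-ner"):
    """Create TrainingArguments; output_dir is a hypothetical placeholder."""
    # Imported lazily so the rest of this sketch runs without transformers.
    from transformers import TrainingArguments

    return TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)


def linear_lr(step: int, total_steps: int = 2046, base_lr: float = 2e-5,
              warmup_steps: int = 0) -> float:
    """Learning rate under linear warmup followed by linear decay to zero.

    ASSUMPTION: warmup_steps=0, since the card lists no warmup setting.
    """
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)
```

Under these assumptions the learning rate starts at 2e-05 and falls linearly to zero at step 2046. The fused AdamW optimizer may need a CUDA-capable PyTorch build.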

Training results

Training Loss  Epoch  Step  Validation Loss  Precision  Recall  F1      Accuracy
0.0684         1.0    682   0.0268           0.9814     0.9765  0.9790  0.9937
0.0194         2.0    1364  0.0195           0.9870     0.9837  0.9853  0.9954
0.0067         3.0    2046  0.0175           0.9873     0.9867  0.9870  0.9962
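
The F1 column is the harmonic mean of precision and recall. The short check below recomputes it from the rounded values in the table. It is only a sanity check under the assumption that the reported F1 uses the standard definition; small differences in the last digit are expected because the reported precision and recall are already rounded to four decimals.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (epoch, precision, recall, reported F1) copied from the results table.
ROWS = [
    (1, 0.9814, 0.9765, 0.9790),
    (2, 0.9870, 0.9837, 0.9853),
    (3, 0.9873, 0.9867, 0.9870),
]


def rows_consistent(tol: float = 2e-4) -> bool:
    """True if every recomputed F1 is within tol of the reported value."""
    return all(abs(f1_score(p, r) - f1) <= tol for _, p, r, f1 in ROWS)
```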

Framework versions

  • Transformers 5.0.0
  • Pytorch 2.10.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.22.2