|
|
--- |
|
|
license: apache-2.0 |
|
|
language: |
|
|
- en |
|
|
metrics: |
|
|
- exact_match |
|
|
- f1 |
|
|
base_model: |
|
|
- google-bert/bert-base-uncased |
|
|
--- |
|
|
# Fine-Tuned BERT Models for Thermoelectric Materials Question Answering |
|
|
|
|
|
## Introduction |
|
|
|
|
|
This repository contains three BERT models fine-tuned for question-answering (QA) tasks related to thermoelectric materials. The models are trained on different datasets to evaluate their performance on specialised QA tasks in the field of materials science. |
|
|
|
|
|
We present a method for auto-generating a large question-answering dataset about thermoelectric materials for language-model applications. The method was used to generate a dataset of sentence-wide contexts from a database of thermoelectric material records. This dataset was compared against SQuAD-v2, as well as against a mixed combination of the two. Hyperparameter optimisation was employed to fine-tune BERT models on each dataset, and the three best-performing models were then compared on a manually annotated test set of thermoelectric material paragraph contexts, with questions spanning material names, five different properties, and the temperatures at which measurements were recorded. The best BERT model fine-tuned on the mixed dataset outperforms the other two models on this test set, indicating that mixing datasets with different semantic and syntactic scopes can be a beneficial approach to improving performance on specialised question-answering tasks.
|
|
|
|
|
## Models Included |
|
|
|
|
|
1. **squad-v2_best** |
|
|
|
|
|
   - **Description:** Fine-tuned on the SQuAD-v2 dataset, a widely used benchmark for QA tasks.
   - **Dataset:** SQuAD-v2
   - **Location:** `squad-v2_best/`
|
|
|
|
|
2. **te-cde_best** |
|
|
|
|
|
   - **Description:** Fine-tuned on a thermoelectric-materials-specific dataset generated using our auto-generation method.
   - **Dataset:** Thermoelectric Materials QA Dataset (TE-CDE)
   - **Location:** `te-cde_best/`
|
|
|
|
|
3. **mixed_best** |
|
|
|
|
|
   - **Description:** Fine-tuned on a mixed dataset combining SQuAD-v2 and the thermoelectric materials dataset to enhance performance on specialised QA tasks.
   - **Dataset:** Combination of SQuAD-v2 and TE-CDE
   - **Location:** `mixed_best/`
|
|
|
|
|
## Dataset Details |
|
|
|
|
|
**SQuAD-v2** |
|
|
|
|
|
A reading comprehension dataset consisting of questions posed by crowdworkers on a set of Wikipedia articles. |
|
|
Some questions are unanswerable, adding complexity to the QA task. |
|
|
|
|
|
**Thermoelectric Materials QA Dataset (TE-CDE)** |
|
|
|
|
|
Auto-generated dataset containing QA pairs about thermoelectric materials. |
|
|
Contexts are sentence-wide excerpts from a database of thermoelectric material records. |
|
|
Questions cover:

- Material names
- Five different properties
- Temperatures during recording
|
|
|
|
|
**Mixed Dataset** |
|
|
|
|
|
A combination of SQuAD-v2 and TE-CDE datasets. |
|
|
Aims to leverage the strengths of both general-purpose and domain-specific data. |
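The exact mixing procedure is not specified here. As a hedged sketch, two SQuAD-format JSON files can be combined by concatenating their `data` arrays and shuffling; the function name and field layout below assume the standard SQuAD-v2 JSON schema:

```python
import json
import random

def mix_squad_datasets(path_a, path_b, out_path, seed=42):
    """Concatenate the `data` arrays of two SQuAD-format JSON files.

    Illustrative helper (not part of this repository); assumes both
    files follow the SQuAD-v2 layout: {"version": ..., "data": [...]}.
    """
    with open(path_a) as f:
        first = json.load(f)
    with open(path_b) as f:
        second = json.load(f)
    mixed = {"version": "mixed", "data": first["data"] + second["data"]}
    random.Random(seed).shuffle(mixed["data"])  # deterministic shuffle
    with open(out_path, "w") as f:
        json.dump(mixed, f)
    return len(mixed["data"])
```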
|
|
|
|
|
## Training Details |
|
|
|
|
|
- **Base Model:** BERT Base Uncased
- **Hyperparameter Optimisation:** Employed to find the best-performing model for each dataset.
- **Training Parameters:**
  - Epochs: adjusted per dataset based on validation loss.
  - Batch size: optimised during training.
  - Learning rate: tuned using grid search.
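The exact search space is not specified in this card. As an illustrative sketch only (the learning rates and batch sizes below are assumptions, not the values used for these models), a grid search enumerates every combination of candidate hyperparameters:

```python
from itertools import product

# Illustrative hyperparameter grid; the actual values searched for
# these models are not specified in this card.
learning_rates = [1e-5, 3e-5, 5e-5]
batch_sizes = [8, 16, 32]

def grid(lrs, bss):
    """Yield every (learning_rate, batch_size) combination to try."""
    yield from product(lrs, bss)

configs = list(grid(learning_rates, batch_sizes))
print(len(configs))  # 9 combinations
```

Each configuration is then used for a fine-tuning run, and the checkpoint with the lowest validation loss is kept.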
|
|
|
|
|
## Evaluation Metrics |
|
|
|
|
|
- **Evaluation Dataset:** A manually annotated test set of thermoelectric material paragraph contexts.
- **Metrics Used:**
  - **Exact Match (EM):** The percentage of predictions that match any one of the ground-truth answers exactly.
  - **F1 Score:** The harmonic mean of precision and recall, based on token overlap between the prediction and the ground-truth answers.
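For reference, the two metrics can be sketched as follows. This is a minimal illustration using whitespace tokenisation and lowercasing only; the official SQuAD evaluation script additionally normalises punctuation and articles:

```python
from collections import Counter

def exact_match(prediction, truths):
    """EM: 1.0 if the prediction equals any ground-truth answer (case-insensitive)."""
    norm = lambda s: " ".join(s.lower().split())
    return float(any(norm(prediction) == norm(t) for t in truths))

def f1_score(prediction, truth):
    """Token-level F1 between a prediction and one ground-truth answer."""
    pred_tokens = prediction.lower().split()
    true_tokens = truth.lower().split()
    common = Counter(pred_tokens) & Counter(true_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(true_tokens)
    return 2 * precision * recall / (precision + recall)

print(f1_score("Bi2Te3 alloy", "Bi2Te3"))  # ≈ 0.667: precision 0.5, recall 1.0
```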
|
|
|
|
|
### Performance Comparison |
|
|
| Model         | Exact Match (EM) | F1 Score |
|---------------|------------------|----------|
| squad-v2_best | 57.60%           | 61.82%   |
| te-cde_best   | 65.39%           | 69.78%   |
| mixed_best    | 67.92%           | 72.29%   |
|
|
|
|
|
## Usage Instructions |
|
|
|
|
|
### Installing Dependencies |
|
|
|
|
|
```bash
pip install transformers torch
```
|
|
|
|
|
### Loading a Model |
|
|
|
|
|
The three models live in subfolders of a single repository, so they are loaded via the `subfolder` argument of `from_pretrained` rather than by appending the subfolder to the repository id. Set `subfolder` to one of the following:

- `squad-v2_best`
- `te-cde_best`
- `mixed_best`

```python
import torch
from transformers import BertForQuestionAnswering, BertTokenizer

repo_id = "odysie/bert-finetuned-qa-datasets"
subfolder = "mixed_best"  # or "squad-v2_best" / "te-cde_best"

tokenizer = BertTokenizer.from_pretrained(repo_id, subfolder=subfolder)
model = BertForQuestionAnswering.from_pretrained(repo_id, subfolder=subfolder)

# Example question and context
question = "What is the chemical formula for water?"
context = "Water is a molecule composed of two hydrogen atoms and one oxygen atom, with the chemical formula H2O."

# Tokenize the question/context pair
inputs = tokenizer(question, context, return_tensors="pt")

# Get model predictions (no gradients needed at inference time)
with torch.no_grad():
    outputs = model(**inputs)
start_scores = outputs.start_logits
end_scores = outputs.end_logits

# Most likely beginning and end of the answer span
start_index = start_scores.argmax()
end_index = end_scores.argmax()

# Convert the answer tokens back to a string
tokens = inputs["input_ids"][0][start_index : end_index + 1]
answer = tokenizer.decode(tokens, skip_special_tokens=True)

print(f"Answer: {answer}")
```
|
|
|
|
|
## License |
|
|
|
|
|
This project is licensed under the Apache License 2.0.
|
|
|
|
|
|
|
|
## Citation |
|
|
|
|
|
If you use these models in your research or application, please cite our work: |
|
|
|
|
|
```bibtex
(PENDING)

@article{
...
}
```
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
We thank the contributors of the SQuAD-v2 dataset and the developers of the Hugging Face Transformers library for providing valuable resources that made this work possible. |