---
library_name: transformers
language: en
license: apache-2.0
datasets:
- CogComp/trec
base_model:
- google-bert/bert-base-uncased
---

# Model Card: BERT-TREC

An in-domain BERT-base model, pre-trained from scratch on the text of the TREC dataset.

## Model Details

### Description

This model is based on the [BERT base (uncased)](https://huggingface.co/google-bert/bert-base-uncased)
architecture and was pre-trained from scratch (in-domain) on the text of the TREC dataset, excluding its test split.
Only the masked language modeling (MLM) objective was used during pre-training.

- **Developed by:** [Cesar Gonzalez-Gutierrez](https://ceguel.es)
- **Funded by:** [ERC](https://erc.europa.eu)
- **Architecture:** BERT-base
- **Language:** English
- **License:** Apache 2.0
- **Base model:** [BERT base model (uncased)](https://huggingface.co/google-bert/bert-base-uncased)
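
Because only the MLM objective was used, the model can be queried directly for masked-token prediction. Below is a minimal usage sketch; the `<model-name>` placeholder stands for this repository's id, and the probe question is purely illustrative.

```python
from transformers import pipeline

# Fill-mask sketch; replace "<model-name>" with this repository's id.
unmasker = pipeline("fill-mask", model="<model-name>")

# An illustrative TREC-style question with one masked token.
for prediction in unmasker("what is the [MASK] city of france ?"):
    print(prediction["token_str"], round(prediction["score"], 3))
```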

### Checkpoints

Intermediate checkpoints from the pre-training process are available and can be accessed using specific tags,
which correspond to training epochs and steps:

| Epoch | Step | Epoch tag | Step tag |
|---|---|---|---|
| 1 | 51 | epoch-1 | step-51 |
| 5 | 256 | epoch-5 | step-256 |
| 10 | 513 | epoch-10 | step-513 |
| 20 | 1026 | epoch-20 | step-1026 |
| 40 | 2053 | epoch-40 | step-2053 |
| 60 | 3080 | epoch-60 | step-3080 |
| 80 | 4106 | epoch-80 | step-4106 |
| 100 | 5133 | epoch-100 | step-5133 |
| 120 | 6160 | epoch-120 | step-6160 |
| 140 | 7186 | epoch-140 | step-7186 |
| 160 | 8213 | epoch-160 | step-8213 |
| 180 | 9240 | epoch-180 | step-9240 |
| 199 | 10200 | epoch-199 | step-10200 |

To load the model from a specific intermediate checkpoint, use the `revision` parameter with the corresponding tag:

```python
from transformers import AutoModelForMaskedLM

model = AutoModelForMaskedLM.from_pretrained("<model-name>", revision="<checkpoint-tag>")
```
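
As an illustration of what the tags enable, the following sketch tracks the model's language-modeling loss on a probe sentence across a few checkpoints. The tags come from the table above; the probe sentence and the loss computation are illustrative assumptions, not a proper masked evaluation.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Illustrative probe: watch the LM loss drop as pre-training progresses.
checkpoint_tags = ["epoch-1", "epoch-20", "epoch-199"]
text = "what is the capital city of france ?"

tokenizer = AutoTokenizer.from_pretrained("<model-name>")
inputs = tokenizer(text, return_tensors="pt")

for tag in checkpoint_tags:
    model = AutoModelForMaskedLM.from_pretrained("<model-name>", revision=tag)
    model.eval()
    with torch.no_grad():
        # Using the unmasked input ids as labels gives the average cross-entropy
        # over all tokens -- a rough progress signal, not a masked-LM benchmark.
        outputs = model(**inputs, labels=inputs["input_ids"])
    print(f"{tag}: loss = {outputs.loss.item():.3f}")
```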

### Sources

- **Paper:** [Information pending]

## Training Details

For more details on the training procedure, please refer to the base model's documentation:
[Training procedure](https://huggingface.co/google-bert/bert-base-uncased#training-procedure).

### Training Data

All texts from the TREC dataset, excluding the test partition.
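
As a rough sketch, the pre-training corpus could be assembled with the `datasets` library along the following lines (this assumes the question text lives in the `text` field of `CogComp/trec`):

```python
from datasets import load_dataset

# Load the TREC question dataset and keep only the training partition.
trec = load_dataset("CogComp/trec")
train_texts = trec["train"]["text"]  # assumes the question text is stored in the "text" field

print(len(train_texts), "training questions")
print(train_texts[0])
```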

#### Training Hyperparameters

- **Precision:** fp16
- **Batch size:** 32
- **Gradient accumulation steps:** 3
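
As a rough illustration only, these settings might map onto a `TrainingArguments` configuration like the sketch below; everything not listed above (output directory, number of epochs, etc.) is an assumption rather than the exact recipe used.

```python
from transformers import TrainingArguments

# Sketch of the reported settings; values marked as assumptions are not from the model card.
training_args = TrainingArguments(
    output_dir="bert-trec-pretraining",  # assumption
    per_device_train_batch_size=32,      # reported batch size
    gradient_accumulation_steps=3,       # reported accumulation steps
    fp16=True,                           # reported precision
    num_train_epochs=200,                # assumption, based on the last checkpoint tag (epoch-199)
)
```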

## Uses

For typical use cases and limitations, please refer to the base model's guidance:
[Intended uses & limitations](https://huggingface.co/google-bert/bert-base-uncased#intended-uses--limitations).
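
A natural downstream use for this in-domain model is question classification on TREC itself. The sketch below is a hypothetical fine-tuning setup, assuming the `text` and `coarse_label` fields of `CogComp/trec` and the `<model-name>` placeholder for this repository's id.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Hypothetical fine-tuning sketch on the six TREC coarse question classes.
trec = load_dataset("CogComp/trec")
tokenizer = AutoTokenizer.from_pretrained("<model-name>")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=64)

encoded = trec.map(tokenize, batched=True)
encoded = encoded.rename_column("coarse_label", "labels")

model = AutoModelForSequenceClassification.from_pretrained("<model-name>", num_labels=6)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-trec-cls", num_train_epochs=3),  # placeholder values
    train_dataset=encoded["train"],
    eval_dataset=encoded["test"],
    tokenizer=tokenizer,  # enables dynamic padding of batches
)
trainer.train()
```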

## Bias, Risks, and Limitations

This model inherits potential risks and limitations from the base model. Refer to:
[Limitations and bias](https://huggingface.co/google-bert/bert-base-uncased#limitations-and-bias).

## Environmental Impact

- **Hardware Type:** NVIDIA Tesla V100 PCIE 32GB
- **Runtime:** 4 h
- **Cluster Provider:** [Artemisa](https://artemisa.ific.uv.es/web/)
- **Compute Region:** EU
- **Carbon Emitted:** 0.74 kg CO2 eq.

## Citation

**BibTeX:**

[More Information Needed]