---
library_name: transformers
language: en
license: apache-2.0
datasets:
- stanfordnlp/sentiment140
base_model:
- google-bert/bert-base-uncased
---
# Model Card: BERT-Sentiment140
An in-domain BERT-base model, pre-trained from scratch on the Sentiment140 dataset text.
## Model Details
### Description
This model is based on the [BERT base (uncased)](https://huggingface.co/google-bert/bert-base-uncased)
architecture and was pre-trained from scratch (in-domain) using the text of the Sentiment140 dataset, excluding its test split.
Only the masked language modeling (MLM) objective was used during pre-training.
- **Developed by:** [Cesar Gonzalez-Gutierrez](https://ceguel.es)
- **Funded by:** [ERC](https://erc.europa.eu)
- **Architecture:** BERT-base
- **Language:** English
- **License:** Apache 2.0
- **Base model:** [BERT base model (uncased)](https://huggingface.co/google-bert/bert-base-uncased)
### Checkpoints
Intermediate checkpoints from the pre-training process are available and can be accessed using specific tags,
which correspond to training epochs and steps:
| Epoch | Step | Epoch tag | Step tag |
|---|---|---|---|
| 1 | 15000 | epoch-1 | step-15000 |
| 2 | 30000 | epoch-2 | step-30000 |
| 3 | 45000 | epoch-3 | step-45000 |
| 5 | 75000 | epoch-5 | step-75000 |
| 10 | 150000 | epoch-10 | step-150000 |
| 15 | 225000 | epoch-15 | step-225000 |
| 20 | 300000 | epoch-20 | step-300000 |
| 25 | 375000 | epoch-25 | step-375000 |
To load a model from a specific intermediate checkpoint, use the `revision` parameter with the corresponding tag:
```python
from transformers import AutoModelForMaskedLM
model = AutoModelForMaskedLM.from_pretrained("<model-name>", revision="<checkpoint-tag>")
```
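The tags in the checkpoint table follow a regular pattern: every listed epoch corresponds to 15,000 steps. The following is a minimal sketch of a helper that derives both tag names from an epoch number. The names `checkpoint_tags`, `STEPS_PER_EPOCH`, and `AVAILABLE_EPOCHS` are hypothetical and introduced only for illustration, and only the epochs listed in the table are assumed to have tags.

```python
# Steps per epoch, inferred from the checkpoint table (epoch 1 ends at step 15000).
STEPS_PER_EPOCH = 15000
# Epochs for which the table lists a checkpoint (assumption: no other tags exist).
AVAILABLE_EPOCHS = (1, 2, 3, 5, 10, 15, 20, 25)


def checkpoint_tags(epoch: int) -> tuple[str, str]:
    """Return the (epoch tag, step tag) pair for a listed checkpoint epoch."""
    if epoch not in AVAILABLE_EPOCHS:
        raise ValueError(f"No checkpoint listed for epoch {epoch}")
    return f"epoch-{epoch}", f"step-{epoch * STEPS_PER_EPOCH}"


epoch_tag, step_tag = checkpoint_tags(10)
print(epoch_tag, step_tag)  # epoch-10 step-150000
```

Either string can then be passed as the `revision` argument; according to the table, both tags in a row refer to the same point in training.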
### Sources
- **Paper:** [Information pending]
## Training Details
For more details on the training procedure, please refer to the base model's documentation:
[Training procedure](https://huggingface.co/google-bert/bert-base-uncased#training-procedure).
### Training Data
All texts from the Sentiment140 dataset, excluding the test partition.
### Training Hyperparameters
- **Precision:** fp16
- **Batch size:** 32
- **Gradient accumulation steps:** 3
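As a quick check on how these settings combine, the effective batch size per optimizer update is the batch size multiplied by the number of gradient accumulation steps. The sketch below assumes that the listed batch size is per device and that a single device was used; the card does not state the device count, so `num_devices` is an assumption and the variable names are illustrative.

```python
per_device_batch_size = 32       # from the card
gradient_accumulation_steps = 3  # from the card
num_devices = 1                  # assumption: not stated in the card

# Number of sequences contributing to each optimizer update.
effective_batch_size = (
    per_device_batch_size * gradient_accumulation_steps * num_devices
)
print(effective_batch_size)  # 96
```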
## Uses
For typical use cases and limitations, please refer to the base model's guidance:
[Intended uses & limitations](https://huggingface.co/google-bert/bert-base-uncased#intended-uses--limitations).
## Bias, Risks, and Limitations
This model inherits potential risks and limitations from the base model. Refer to:
[Limitations and bias](https://huggingface.co/google-bert/bert-base-uncased#limitations-and-bias).
## Environmental Impact
- **Hardware Type:** NVIDIA Tesla V100 PCIE 32GB
- **Runtime:** 36.5 h
- **Cluster Provider:** [Artemisa](https://artemisa.ific.uv.es/web/)
- **Compute Region:** EU
- **Carbon Emitted:** 6.79 kg CO2 eq.
## Citation
**BibTeX:**
[More Information Needed]