---
library_name: transformers
language: en
license: mit
datasets:
- stanfordnlp/imdb
base_model:
- openai-community/gpt2
---

# Model Card: GPT-2-IMDb

An in-domain GPT-2 model, pre-trained from scratch on the text of the IMDb dataset.

## Model Details

### Description

This model is based on the [GPT-2](https://huggingface.co/openai-community/gpt2)
architecture and was pre-trained from scratch (in-domain) on the text of the IMDb dataset, excluding its test split.

- **Developed by:** [Cesar Gonzalez-Gutierrez](https://ceguel.es)
- **Funded by:** [ERC](https://erc.europa.eu)
- **Architecture:** GPT-2
- **Language:** English
- **License:** MIT
- **Base model:** [GPT-2](https://huggingface.co/openai-community/gpt2)

### Checkpoints

Intermediate checkpoints from the pre-training process are available and can be accessed using specific tags,
which correspond to training epochs and steps:

| Epoch | Step | Epoch tag | Step tag |
|---|---|---|---|
| 1 | 703 | epoch-1 | step-703 |
| 5 | 3515 | epoch-5 | step-3515 |
| 10 | 7031 | epoch-10 | step-7031 |
| 20 | 14063 | epoch-20 | step-14063 |
| 30 | 21095 | epoch-30 | step-21095 |
| 40 | 28126 | epoch-40 | step-28126 |
| 50 | 35158 | epoch-50 | step-35158 |
| 60 | 42190 | epoch-60 | step-42190 |
| 70 | 49221 | epoch-70 | step-49221 |
| 80 | 56240 | epoch-80 | step-56240 |

To load a model from a specific intermediate checkpoint, pass the corresponding tag as the `revision` parameter:

```python
from transformers import AutoModelForCausalLM

# GPT-2 is a causal language model, so it is loaded with the causal LM class.
model = AutoModelForCausalLM.from_pretrained("<model-name>", revision="<checkpoint-tag>")
```

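The mapping between epochs and tags can also be resolved programmatically. The following is a minimal sketch, not part of the released code: `CHECKPOINT_STEPS`, `checkpoint_tags`, and `load_checkpoint` are hypothetical names introduced here, the epoch and step values are copied from the table above, and it assumes that both tags in a row point to the same checkpoint.

```python
from typing import Tuple

# Epoch -> step pairs copied from the checkpoint table in this card.
CHECKPOINT_STEPS = {
    1: 703, 5: 3515, 10: 7031, 20: 14063, 30: 21095,
    40: 28126, 50: 35158, 60: 42190, 70: 49221, 80: 56240,
}


def checkpoint_tags(epoch: int) -> Tuple[str, str]:
    """Return the (epoch tag, step tag) pair for a listed checkpoint epoch."""
    if epoch not in CHECKPOINT_STEPS:
        raise ValueError(
            f"No checkpoint listed for epoch {epoch}; "
            f"available epochs: {sorted(CHECKPOINT_STEPS)}"
        )
    return f"epoch-{epoch}", f"step-{CHECKPOINT_STEPS[epoch]}"


def load_checkpoint(model_name: str, epoch: int):
    """Load the checkpoint for `epoch` (hypothetical helper; downloads weights)."""
    # Imported lazily so the tag helper works without transformers installed.
    from transformers import AutoModelForCausalLM

    epoch_tag, _ = checkpoint_tags(epoch)
    return AutoModelForCausalLM.from_pretrained(model_name, revision=epoch_tag)


print(checkpoint_tags(10))  # ('epoch-10', 'step-7031')
```

Calling `load_checkpoint("<model-name>", 10)` would then request the `epoch-10` revision. Asking for an epoch that is not in the table raises a `ValueError` instead of failing later during the download.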
### Sources

- **Paper:** [Information pending]

## Training Details

For more details on the training procedure, please refer to the base model's documentation:
[Training procedure](https://huggingface.co/openai-community/gpt2#training-procedure).

### Training Data

All texts from the IMDb dataset, excluding the test partition.

#### Training Hyperparameters

- **Precision:** fp16
- **Batch size:** 8
- **Gradient accumulation steps:** 12

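The batch size and the gradient accumulation steps together determine how many sequences contribute to each optimizer step. A minimal sketch of that arithmetic follows. It assumes the listed batch size is per device and that training ran on a single device, neither of which this card states explicitly. `effective_batch_size` and `num_devices` are hypothetical names used only for illustration.

```python
def effective_batch_size(
    per_device_batch: int, grad_accum_steps: int, num_devices: int = 1
) -> int:
    """Number of sequences that contribute to one optimizer step."""
    return per_device_batch * grad_accum_steps * num_devices


# Batch size 8 and 12 accumulation steps are taken from this card;
# num_devices=1 is an assumption.
print(effective_batch_size(8, 12))  # 96
```

Under these assumptions, each optimizer step aggregates gradients from 96 sequences.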
## Uses

For typical use cases and limitations, please refer to the base model's guidance:
[Intended uses & limitations](https://huggingface.co/openai-community/gpt2#intended-uses--limitations).

## Bias, Risks, and Limitations

This model inherits potential risks and limitations from the base model. Refer to:
[Limitations and bias](https://huggingface.co/openai-community/gpt2#limitations-and-bias).

## Environmental Impact

- **Hardware Type:** NVIDIA A100 PCIE 40GB
- **Runtime:** 7 h
- **Cluster Provider:** [Artemisa](https://artemisa.ific.uv.es/web/)
- **Compute Region:** EU
- **Carbon Emitted:** 1.08 kg CO2 eq.

## Citation

**BibTeX:**

[More Information Needed]