---
library_name: transformers
language: en
license: mit
datasets:
- CogComp/trec
base_model:
- openai-community/gpt2
---
# Model Card: GPT-2-TREC
An in-domain GPT-2 model, pre-trained from scratch on text from the TREC dataset.
## Model Details
### Description
This model is based on the [GPT-2](https://huggingface.co/openai-community/gpt2)
architecture and was pre-trained from scratch (in-domain) on the text of the TREC dataset, excluding its test split.
- **Developed by:** [Cesar Gonzalez-Gutierrez](https://ceguel.es)
- **Funded by:** [ERC](https://erc.europa.eu)
- **Architecture:** GPT-2
- **Language:** English
- **License:** MIT
- **Base model:** [GPT-2](https://huggingface.co/openai-community/gpt2)
### Checkpoints
Intermediate checkpoints from the pre-training process are available and can be accessed using specific tags,
which correspond to training epochs and steps:
| Epoch | Step | Epoch tag | Step tag |
|---|---|---|---|
| 1 | 51 | epoch-1 | step-51 |
| 5 | 255 | epoch-5 | step-255 |
| 10 | 511 | epoch-10 | step-511 |
| 20 | 1023 | epoch-20 | step-1023 |
| 40 | 2046 | epoch-40 | step-2046 |
| 60 | 3070 | epoch-60 | step-3070 |
| 80 | 4093 | epoch-80 | step-4093 |
| 100 | 5116 | epoch-100 | step-5116 |
| 120 | 6140 | epoch-120 | step-6140 |
| 140 | 7163 | epoch-140 | step-7163 |
| 160 | 8186 | epoch-160 | step-8186 |
| 180 | 9210 | epoch-180 | step-9210 |
| 199 | 10200 | epoch-199 | step-10200 |
To load a model from a specific intermediate checkpoint, use the `revision` parameter with the corresponding tag:
```python
from transformers import AutoModelForCausalLM

# Use a tag from the table above, e.g. revision="epoch-100"
model = AutoModelForCausalLM.from_pretrained("<model-name>", revision="<checkpoint-tag>")
```
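Once loaded, a checkpoint behaves like any GPT-2 causal language model. A minimal generation sketch (`<model-name>` is a placeholder for this repository's id on the Hub):
```python
from transformers import pipeline

# "<model-name>" is a placeholder for the Hub repo id; pass revision=
# to generate with an intermediate checkpoint instead of the final one.
generator = pipeline("text-generation", model="<model-name>")
print(generator("What is the capital of", max_new_tokens=20)[0]["generated_text"])
```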
### Sources
- **Paper:** [Information pending]
## Training Details
For more details on the training procedure, please refer to the base model's documentation:
[Training procedure](https://huggingface.co/openai-community/gpt2#training-procedure).
### Training Data
All text from the [TREC](https://huggingface.co/datasets/CogComp/trec) dataset, excluding its test split.
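For reference, the training corpus can be loaded with the `datasets` library; a sketch, assuming pre-training used the raw `text` field of the train split:
```python
from datasets import load_dataset

# The TREC train split; the test split was held out from pre-training.
trec = load_dataset("CogComp/trec", split="train")
print(trec[0]["text"])  # a single question from the corpus
```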
#### Training Hyperparameters
- **Precision:** fp16
- **Batch size:** 8
- **Gradient accumulation steps:** 12
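Expressed as `transformers` `TrainingArguments`, the reported values look as follows (a sketch: every unlisted argument is an assumption left at the library default):
```python
from transformers import TrainingArguments

# Only precision, batch size, and gradient accumulation are reported in
# this card; all other arguments are assumptions (library defaults).
args = TrainingArguments(
    output_dir="gpt2-trec",        # hypothetical output directory
    fp16=True,                     # mixed-precision training
    per_device_train_batch_size=8,
    gradient_accumulation_steps=12,
)
```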
## Uses
For typical use cases and limitations, please refer to the base model's guidance:
[Intended uses & limitations](https://huggingface.co/openai-community/gpt2#intended-uses--limitations).
## Bias, Risks, and Limitations
This model inherits potential risks and limitations from the base model. Refer to:
[Limitations and bias](https://huggingface.co/openai-community/gpt2#limitations-and-bias).
## Environmental Impact
- **Hardware Type:** NVIDIA A100 PCIE 40GB
- **Runtime:** 28.5 hours
- **Cluster Provider:** [Artemisa](https://artemisa.ific.uv.es/web/)
- **Compute Region:** EU
- **Carbon Emitted:** 4.42 kg CO2 eq.
## Citation
**BibTeX:**
[More Information Needed]