	---
	library_name: transformers
	license: cc0-1.0
	base_model: google-t5/t5-large
	tags:
	- generated_from_trainer
	metrics:
	- rouge
	model-index:
	- name: PreDA_t5-large
	  results: []
	language:
	- en
	---

	![framework](preda_architecture_digram.png)

	# PreDA-large (Prefix-Based Dream Reports Annotation)

	This model is a fine-tuned version of [google-t5/t5-large](https://huggingface.co/google-t5/t5-large) on the annotated [Dreambank.net](https://dreambank.net/) dataset. It achieves the following results on the evaluation set at the final epoch (20):
	- Loss: 1.6073
	- Rouge1: 0.6587
	- Rouge2: 0.4979
	- RougeL: 0.6269
	- RougeLsum: 0.6270

	## Intended uses & limitations

	This model is designed for research purposes. See the disclaimer for more details.

	## Training procedure

	The overall idea of our approach is to disentangle each dream report from its annotation as a whole and to create an augmented set of (dream report, single-feature annotation) pairs.
	To make sure that, given the same report, the model produces the annotation for a specific Hall and Van de Castle (HVDC) feature, we prepend to each report
	a string of the form `HVDC-Feature:`, in a manner that closely mimics T5 task-specific prefix fine-tuning.
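The disentangling step described above can be sketched as follows. This is a minimal illustration, not the authors' preprocessing code: the `disentangle` helper, the dictionary layout, and the placeholder annotation strings are all assumptions, and the `"{} : {}"` prefix format is borrowed from the usage snippet later in this card.

```python
def disentangle(report, annotations):
    """Turn one annotated report into one (input, target) pair per feature.

    `annotations` maps an HVDC feature name to its annotation string
    (a hypothetical layout chosen for this sketch).
    """
    return [
        ("{} : {}".format(feature, report), target)
        for feature, target in annotations.items()
    ]


report = "I was talking with my brother about my birthday dinner. I was feeling sad."
# Placeholder targets: the real HVDC annotation strings are not shown in this card.
annotations = {
    "Characters": "<characters annotation>",
    "Emotion": "<emotion annotation>",
}

pairs = disentangle(report, annotations)
for source, target in pairs:
    print(source, "->", target)
```

Each report thus contributes as many training items as it has annotated features, which is why the augmented set is larger than the original dataset.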

	Applying this procedure to the original dataset (~1.8K) yields approximately 6.6K items. In the present study, we focused on a subset of six HVDC features:
	Characters, Activities, Emotion, Friendliness, Misfortune, and Good Fortune.
	This selection excludes features that represent less than 10% of the total instances. Notably, Good Fortune would have been excluded under this
	criterion, but we intentionally retained it to control against potential
	memorisation effects and to provide a counterbalance to the Misfortune feature. After filtering out instances whose annotation
	feature is not one of the six selected features, we are left with ~5.3K
	dream reports. We then generate a random 80%-20% split into training (4,311 reports) and testing (1,078 reports) sets.
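The filtering and splitting steps can be sketched as below. This is only an illustration under stated assumptions: the item layout (`{"feature": ..., "text": ...}`), the `filter_and_split` helper, the `"Other"` placeholder feature, and the use of seed 42 for the split are hypothetical and not taken from the authors' code.

```python
import random

SELECTED = {
    "Characters", "Activities", "Emotion",
    "Friendliness", "Misfortune", "Good Fortune",
}


def filter_and_split(items, train_fraction=0.8, seed=42):
    """Keep items whose feature is selected, then split them at random."""
    kept = [item for item in items if item["feature"] in SELECTED]
    rng = random.Random(seed)  # assumed seed, for reproducibility of the sketch
    rng.shuffle(kept)
    cut = int(len(kept) * train_fraction)
    return kept[:cut], kept[cut:]


# Toy data: 10 items per feature, including one feature that gets filtered out.
features = sorted(SELECTED) + ["Other"]  # "Other" is a hypothetical placeholder
items = [
    {"feature": features[i % len(features)], "text": "report {}".format(i)}
    for i in range(70)
]
train, test = filter_and_split(items)
print(len(train), len(test))

# The split sizes reported in this card are consistent with an 80/20 cut.
total = 4311 + 1078
print(total, int(total * 0.8), total - int(total * 0.8))
```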

	### Training

	#### Hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.001
	- train_batch_size: 8
	- eval_batch_size: 8
	- seed: 42
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: linear
	- num_epochs: 20
	- label_smoothing_factor: 0.1
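As a sanity check, the hyperparameters above can be related to the step counts in the results table. The sketch below assumes a single device, no gradient accumulation, and a linear decay to zero without warmup; none of these are stated in this card.

```python
import math

train_items = 4311   # training set size reported above
batch_size = 8       # train_batch_size
num_epochs = 20
base_lr = 0.001      # learning_rate

# Assumes one device and no gradient accumulation.
steps_per_epoch = math.ceil(train_items / batch_size)
total_steps = steps_per_epoch * num_epochs


def linear_lr(step):
    """Learning rate under linear decay to zero (assumed: no warmup)."""
    return base_lr * max(0.0, (total_steps - step) / total_steps)


print(steps_per_epoch, total_steps)
print(linear_lr(0), linear_lr(total_steps // 2), linear_lr(total_steps))
```

Under these assumptions the computation gives 539 steps per epoch and 10,780 steps in total, matching the `Step` column of the training results.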

	### Training results

	| Training Loss | Epoch | Step  | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum |
	|:-------------:|:-----:|:-----:|:---------------:|:------:|:------:|:------:|:---------:|
	| 1.9478        | 1.0   | 539   | 1.9524          | 0.3298 | 0.1797 | 0.3121 | 0.3113    |
	| 1.9141        | 2.0   | 1078  | 1.9039          | 0.3665 | 0.1942 | 0.3495 | 0.3489    |
	| 1.914         | 3.0   | 1617  | 1.8993          | 0.4076 | 0.2223 | 0.3873 | 0.3870    |
	| 1.9264        | 4.0   | 2156  | 1.8725          | 0.3454 | 0.1843 | 0.3306 | 0.3302    |
	| 1.9018        | 5.0   | 2695  | 1.8669          | 0.3494 | 0.1814 | 0.3345 | 0.3347    |
	| 1.889         | 6.0   | 3234  | 1.8872          | 0.3387 | 0.1609 | 0.3211 | 0.3208    |
	| 1.8511        | 7.0   | 3773  | 1.8412          | 0.4200 | 0.2403 | 0.4065 | 0.4065    |
	| 1.8756        | 8.0   | 4312  | 1.8191          | 0.4735 | 0.2705 | 0.4467 | 0.4469    |
	| 1.8483        | 9.0   | 4851  | 1.7966          | 0.4915 | 0.2996 | 0.4662 | 0.4665    |
	| 1.8182        | 10.0  | 5390  | 1.7787          | 0.5071 | 0.3169 | 0.4857 | 0.4860    |
	| 1.7715        | 11.0  | 5929  | 1.7709          | 0.5017 | 0.3182 | 0.4767 | 0.4767    |
	| 1.7955        | 12.0  | 6468  | 1.7557          | 0.4772 | 0.3015 | 0.4544 | 0.4549    |
	| 1.7391        | 13.0  | 7007  | 1.7279          | 0.5644 | 0.3693 | 0.5270 | 0.5281    |
	| 1.7013        | 14.0  | 7546  | 1.7054          | 0.5484 | 0.3694 | 0.5222 | 0.5221    |
	| 1.7364        | 15.0  | 8085  | 1.6900          | 0.5607 | 0.3778 | 0.5349 | 0.5350    |
	| 1.6592        | 16.0  | 8624  | 1.6643          | 0.6010 | 0.4191 | 0.5691 | 0.5688    |
	| 1.645         | 17.0  | 9163  | 1.6448          | 0.6160 | 0.4440 | 0.5854 | 0.5863    |
	| 1.6245        | 18.0  | 9702  | 1.6264          | 0.6301 | 0.4640 | 0.6015 | 0.6018    |
	| 1.616         | 19.0  | 10241 | 1.6145          | 0.6578 | 0.4933 | 0.6253 | 0.6251    |
	| 1.5914        | 20.0  | 10780 | 1.6073          | 0.6587 | 0.4979 | 0.6269 | 0.6270    |
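To give an intuition for the Rouge1 column, the sketch below computes a simplified unigram-overlap F1 score between a generated annotation and a reference. It is illustrative only: it uses plain lowercased whitespace tokenization, whereas standard ROUGE packages apply their own tokenization and optional stemming, so it is not expected to reproduce the reported numbers. The example strings are made up.

```python
from collections import Counter


def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap between two strings."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Counter intersection clips each token count to the smaller of the two.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Made-up strings: three of the four unigrams match on each side.
score = rouge1_f1("the dreamer feels sad", "the dreamer is sad")
print(score)
```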


	### Framework versions

	- Transformers 4.44.2
	- PyTorch 2.1.0+cu118
	- Datasets 3.0.1
	- Tokenizers 0.19.1

	# Usage

	```python
	from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

	model_id = "jrc-ai/PreDA-large"
	device = "cpu"
	encoder_max_length = 100
	decoder_max_length = 50

	tokenizer = AutoTokenizer.from_pretrained(model_id)
	# Keep the model on the same device as the inputs.
	model = AutoModelForSeq2SeqLM.from_pretrained(model_id).to(device)

	dream = "I was talking with my brother about my birthday dinner. I was feeling sad."
	# One input per requested HVDC feature, each prefixed with the feature name.
	prefixes = ["Emotion", "Activities", "Characters"]
	text_inputs = ["{} : {}".format(p, dream) for p in prefixes]

	inputs = tokenizer(
	    text_inputs,
	    max_length=encoder_max_length,
	    truncation=True,
	    padding=True,
	    return_tensors="pt"
	)

	# Greedy decoding (no sampling), capped at decoder_max_length tokens.
	output = model.generate(
	    **inputs.to(device),
	    do_sample=False,
	    max_length=decoder_max_length,
	)

	# Print one decoded annotation per prefix, in the order of `prefixes`.
	for decode_dream in output:
	    print(tokenizer.decode(decode_dream, skip_special_tokens=True))
	```

	# Dual-Use Implication
	Upon evaluation, we identified no dual-use implications for the present model. The model parameters, including the weights, are available under the CC0 1.0 Public Domain Dedication.

	# Cite
	Please note that the paper referring to this model, titled "PreDA: Prefix-Based Dream Reports Annotation
	with Generative Language Models", has been accepted for publication at the LOD 2025 conference and will appear in the conference proceedings.