|
|
--- |
|
|
language: |
|
|
- es |
|
|
datasets: |
|
|
- none |
|
|
tags: |
|
|
- deep-narrow |
|
|
- t5 |
|
|
- tiny |
|
|
inference: false |
|
|
|
|
|
license: apache-2.0 |
|
|
--- |
|
|
|
|
|
# T5-Spanish-Efficient-TINY (*NEW* Deep-Narrow Spanish version - March 2024)
|
|
|
|
|
T5-Efficient-TINY is a variant of [Google's original T5](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) that follows the [T5 model architecture](https://huggingface.co/docs/transformers/model_doc/t5).
|
|
It is a variant trained by *Javier Albarracín* of [Quantico AI](https://www.quantico.ai/). The original architecture was introduced in the paper **[Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers](https://arxiv.org/abs/2109.10686)**


by *Yi Tay, Mostafa Dehghani, Jinfeng Rao, William Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler*.
|
|
|
|
|
This version of the model has been trained from scratch on a Spanish dataset. It **REQUIRES FINE-TUNING**: it has not been trained on any downstream task.
|
|
The model's upside is that it is Spanish-native and can serve as a base for training simple tasks. Given its relatively low complexity and its <29 MB footprint, it is well suited to running on CPU.
|
|
|
|
|
It ships with its own Spanish tokenizer (lowercase letters only) with a vocabulary of 5,000 tokens.
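As a quick illustration, here is a minimal sketch of loading the tokenizer and encoding lowercase Spanish text; the checkpoint path is a placeholder, not the actual repo id:

```python
from transformers import AutoTokenizer

# Placeholder path; substitute this model's actual Hugging Face repo id.
tokenizer = AutoTokenizer.from_pretrained("path/to/T5-spanish-efficient-tiny")

# The tokenizer covers lowercase letters only, so lowercase inputs first.
text = "Resumir: el modelo es ligero y funciona bien en CPU.".lower()
ids = tokenizer(text, return_tensors="pt").input_ids

print(tokenizer.vocab_size)  # expected: 5000
print(ids.shape)
```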
|
|
|
|
|
## Model architecture details
|
|
|
|
|
This model, **T5-spanish-efficient-tiny**, is a **Tiny**-type variant with modified embedding dimension and *feed-forward* layer sizes.
|
|
It has approximately **7** million parameters and requires **29 MB** of memory in full precision (*fp32*) or **15 MB** in half precision (*fp16* or *bf16*).
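These figures can be verified directly from the checkpoint; a minimal sketch, assuming the model loads with `T5ForConditionalGeneration` (the path is again a placeholder):

```python
from transformers import T5ForConditionalGeneration

# Placeholder path; substitute this model's actual Hugging Face repo id.
model = T5ForConditionalGeneration.from_pretrained("path/to/T5-spanish-efficient-tiny")

n_params = sum(p.numel() for p in model.parameters())
print(f"{n_params / 1e6:.2f}M parameters")
print(f"{n_params * 4 / 1024**2:.0f} MB in fp32")      # 4 bytes per parameter
print(f"{n_params * 2 / 1024**2:.0f} MB in fp16/bf16")  # 2 bytes per parameter
```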
|
|
|
|
|
This *Spanish model* was built with a lighter configuration than the original Tiny model.
|
|
|
|
|
| Model | nl (el/dl) | ff | dm | kv | nh | #Params|
| ----| ---- | ---- | ---- | ---- | ---- | ----|
| This | 4/3 | 512 | 320 | 64 | 4 | 7M|
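The row above maps directly onto a `transformers` `T5Config`; the following sketch builds an equivalent (randomly initialized) architecture from scratch, useful for inspecting shapes without downloading weights:

```python
from transformers import T5Config, T5ForConditionalGeneration

# Hyperparameters from the table: nl (el/dl) = 4/3, ff = 512, dm = 320,
# kv = 64, nh = 4, plus the 5000-token Spanish vocabulary.
config = T5Config(
    vocab_size=5000,
    d_model=320,           # dm: embedding / hidden dimension
    d_kv=64,               # kv: key/value projection size per head
    d_ff=512,              # ff: feed-forward inner dimension
    num_layers=4,          # el: encoder depth
    num_decoder_layers=3,  # dl: decoder depth
    num_heads=4,           # nh: attention heads
)
model = T5ForConditionalGeneration(config)  # untrained weights
print(sum(p.numel() for p in model.parameters()))
```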
|
|
|
|
|
A summary of the *original* T5 model variants can be seen here:
|
|
|
|
|
| Model | nl (el/dl) | ff | dm | kv | nh | #Params|
| ----| ---- | ---- | ---- | ---- | ---- | ----|
| Tiny | 4/4 | 1024 | 256 | 32 | 4 | 16M|
| Mini | 4/4 | 1536 | 384 | 32 | 8 | 31M|
| Small | 6/6 | 2048 | 512 | 32 | 8 | 60M|
| Base | 12/12 | 3072 | 768 | 64 | 12 | 220M|
| Large | 24/24 | 4096 | 1024 | 64 | 16 | 738M|
| XL | 24/24 | 16384 | 1024 | 128 | 32 | 3B|
| XXL | 24/24 | 65536 | 1024 | 128 | 128 | 11B|
|
|
|
|
|
Abbreviations used:
|
|
|
|
|
| Abbreviation | Definition |
| ----| ---- |
| nl | Number of transformer blocks (depth) |
| dm | Dimension of embedding vector (output vector of transformer block) |
| kv | Dimension of key/value projection matrix |
| nh | Number of attention heads |
| ff | Dimension of intermediate vector within transformer block (size of feed-forward projection matrix) |
| el | Number of transformer blocks in the encoder (encoder depth) |
| dl | Number of transformer blocks in the decoder (decoder depth) |
| sh | Signifies that attention heads are shared |
| skv | Signifies that key-value projection matrices are tied |
|
|
|
|
|
If a model checkpoint does not specify *el* or *dl*, then both the number of encoder layers and the number of decoder layers correspond to *nl*.
|
|
|
|
|
## Pre-Training |
|
|
|
|
|
It was pre-trained on 2 million randomly sampled records from the Spanish-language version of the MSMARCO dataset.
|
|
|
|
|
## Fine-Tuning |
|
|
|
|
|
**Note**: this model **requires** fine-tuning before it can perform any task. Here are some examples of how to do it; a minimal hand-rolled sketch also follows the links below:
|
|
|
|
|
*PyTorch*: |
|
|
|
|
|
- [Summarization](https://github.com/huggingface/transformers/tree/master/examples/pytorch/summarization) |
|
|
- [Question Answering](https://github.com/huggingface/transformers/blob/master/examples/pytorch/question-answering/run_seq2seq_qa.py) |
|
|
- [Text Classification](https://github.com/huggingface/transformers/tree/master/examples/pytorch/text-classification) - *Note*: You will have to slightly adapt the training example here to make it work with an encoder-decoder model. |
|
|
|
|
|
*TensorFlow*:
|
|
|
|
|
- [Summarization](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/summarization) |
|
|
- [Text Classification](https://github.com/huggingface/transformers/tree/master/examples/tensorflow/text-classification) - *Note*: You will have to slightly adapt the training example here to make it work with an encoder-decoder model. |
|
|
|
|
|
*JAX/Flax*: |
|
|
|
|
|
- [Summarization](https://github.com/huggingface/transformers/tree/master/examples/flax/summarization) |
|
|
- [Text Classification](https://github.com/huggingface/transformers/tree/master/examples/flax/text-classification) - *Note*: You will have to slightly adapt the training example here to make it work with an encoder-decoder model. |
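
In addition to the official example scripts linked above, here is a minimal hand-rolled PyTorch fine-tuning step; the checkpoint path and the toy example pair are placeholders, and a real run needs a full dataset, batching, and evaluation:

```python
import torch
from transformers import AutoTokenizer, T5ForConditionalGeneration

# Placeholder path; substitute this model's actual Hugging Face repo id.
name = "path/to/T5-spanish-efficient-tiny"
tokenizer = AutoTokenizer.from_pretrained(name)
model = T5ForConditionalGeneration.from_pretrained(name)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Toy input/target pair; remember the tokenizer is lowercase-only.
inputs = tokenizer("pregunta: ¿cuál es la capital de españa?", return_tensors="pt")
labels = tokenizer("madrid", return_tensors="pt").input_ids

model.train()
optimizer.zero_grad()
loss = model(**inputs, labels=labels).loss  # teacher-forced seq2seq loss
loss.backward()
optimizer.step()
print(loss.item())
```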