---

language: en
license: apache-2.0
model_name: t5-encoder-12.onnx
tags:
- validated
- text
- machine_comprehension
- t5
---

<!--- SPDX-License-Identifier: Apache-2.0 -->

# T5

## Use-cases
Transformer-based language model trained on multiple tasks, including summarization, sentiment analysis, question answering, and translation.
The implementation in this repo is an adaptation of the [onnxt5 repo](https://github.com/abelriboulot/onnxt5), which simplifies exporting and using T5 with ONNX.

## Description
[T5](https://arxiv.org/abs/1910.10683) is a transformer model that aims to provide great flexibility and better semantic
understanding by training on multiple tasks at once.

## Model

| Model | Download  | Download (with sample test data) | ONNX version | Opset version |
| ----------- | ---------- |--------------| -------------- | -------------- |
|T5-encoder       |[650.6 MB](model/t5-encoder-12.onnx) | [205.0 MB](model/t5-encoder-12.tar.gz)| 1.7 | 12 |
|T5-decoder-with-lm-head |[304.9 MB](model/t5-decoder-with-lm-head-12.onnx) | [304.9 MB](model/t5-decoder-with-lm-head-12.tar.gz)| 1.7 | 12 |


### Source
Huggingface PyTorch T5 + script changes ==> ONNX T5-encoder

Huggingface PyTorch T5 + script changes ==> ONNX T5-decoder-with-lm-head

Script changes include:
- reshaping the Huggingface models to combine the lm head with the decoder to allow for a unified model
- reshaping the encoder to output the hidden state directly

## Inference
The script for ONNX model conversion and ONNX Runtime inference is [here](dependencies/T5-export.py).
More complete utilities to export and use the models are maintained in the [onnxt5 repo](https://github.com/abelriboulot/onnxt5).

### Input to model
This implementation takes as input a prompt that begins with the task at hand. Examples of tasks include ```summarize: <PROMPT>```,
```translate English to French: <PROMPT>```, ```cola sentence: <PROMPT>```, etc.
For the full list of tasks, refer to Appendix D of the [original paper](https://arxiv.org/pdf/1910.10683.pdf).
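
Constructing such a prompt is plain string concatenation; here is a minimal sketch (the `make_prompt` helper is my own illustration, not part of the onnxt5 API):

```python
def make_prompt(task: str, text: str) -> str:
    """Prefix the input text with a T5 task string (hypothetical helper)."""
    return f"{task}: {text}"

prompt = make_prompt("translate English to French",
                     "I was a victim of a series of accidents.")
# → 'translate English to French: I was a victim of a series of accidents.'
```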


### Preprocessing steps
The easiest way to use the model is through the onnxt5 utilities (install with ```pip install onnxt5```).

In that case you can use the model with the following piece of code:
```python
from onnxt5 import GenerativeT5
from onnxt5.api import get_encoder_decoder_tokenizer

decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)

prompt = 'translate English to French: I was a victim of a series of accidents.'
output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
# output_text: "J'ai été victime d'une série d'accidents."
```

Or if you wish to produce the embeddings of a sentence:
```python
from onnxt5.api import get_encoder_decoder_tokenizer, run_embeddings_text

decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
prompt = 'Listen, Billy Pilgrim has come unstuck in time.'
encoder_embeddings, decoder_embeddings = run_embeddings_text(encoder_sess, decoder_sess, tokenizer, prompt)
```

Otherwise, you can create the generative model manually:

```python
from onnxruntime import InferenceSession
from transformers import T5Tokenizer

from .dependencies.models import GenerativeT5

tokenizer = T5Tokenizer.from_pretrained('t5-base')

# Starting with ORT 1.10, you must explicitly set the providers parameter when
# instantiating InferenceSession if you want execution providers other than the
# default CPU provider (previously, providers were set/registered by default
# based on the build flags).
# For example, if an NVIDIA GPU is available and the ORT Python package is built
# with CUDA, call the API as follows:
# InferenceSession(path/to/model, providers=['CUDAExecutionProvider'])

# path_t5_decoder and path_t5_encoder point to the downloaded .onnx files
decoder_sess = InferenceSession(str(path_t5_decoder))
encoder_sess = InferenceSession(str(path_t5_encoder))
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
generative_t5('translate English to French: I was a victim of a series of accidents.', 21, temperature=0.)[0]
```

### Output of model
For the T5-encoder model:

**last_hidden_state**: Sequence of hidden states at the last layer of the model. It's a float tensor of size (batch_size, sequence_length, hidden_size).

For the T5-decoder-with-lm-head model:

**logit_predictions**: Prediction scores of the language modeling head. It's a float tensor of size (batch_size, sequence_length, vocab_size).
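
To make the shapes concrete, here is a toy NumPy sketch (the sizes are arbitrary illustrations, not the real T5 dimensions) that turns `logit_predictions` into token ids with a greedy argmax:

```python
import numpy as np

batch_size, sequence_length, vocab_size = 1, 4, 10  # toy sizes for illustration
logit_predictions = np.random.randn(batch_size, sequence_length, vocab_size)

# Greedy choice: the highest-scoring vocabulary entry at each position
token_ids = logit_predictions.argmax(axis=-1)
print(token_ids.shape)  # (1, 4)
```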

### Postprocessing steps
For the T5-encoder model:

```python
# Run the ONNX encoder session to obtain the last hidden state
last_hidden_state = encoder_sess.run(None, {"input_ids": input_ids})[0]
```

For the T5-decoder-with-lm-head model:

```python
# Generate the encoder's last hidden state
encoder_output = encoder_sess.run(None, {"input_ids": input_ids})[0]

# Generate the full model's logits
decoder_output = decoder_sess.run(None, {
    "input_ids": input_ids,
    "encoder_hidden_states": encoder_output,
})[0]
```
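
A generative wrapper essentially repeats these two `run` calls in a loop, feeding each chosen token back in. A minimal sketch of that greedy loop, abstracted over a step function so it runs without the ONNX sessions (the function names are mine, not the onnxt5 API):

```python
import numpy as np

def greedy_decode(step_fn, start_id, eos_id, max_length):
    """Repeatedly append the highest-scoring next token until EOS or max_length.

    step_fn(token_ids) must return a 1-D array of next-token logits, e.g. by
    running the decoder session on the tokens generated so far."""
    tokens = [start_id]
    for _ in range(max_length):
        next_id = int(np.argmax(step_fn(tokens)))
        if next_id == eos_id:
            break
        tokens.append(next_id)
    return tokens[1:]  # drop the start token

# Toy step function: prefers token 3 twice, then emits EOS (id 1)
toy = lambda toks: np.eye(5)[3] if len(toks) < 3 else np.eye(5)[1]
print(greedy_decode(toy, start_id=0, eos_id=1, max_length=10))  # [3, 3]
```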

For the generative model, to generate a translation:
```python
from onnxt5 import GenerativeT5
from onnxt5.api import get_encoder_decoder_tokenizer

decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)

prompt = 'translate English to French: I was a victim of a series of accidents.'
output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
```
<hr>

## Dataset (Train and validation)
The original model from Google Brain is pretrained on the [Colossal Clean Crawled Corpus](https://www.tensorflow.org/datasets/catalog/c4).
The pretrained model is referenced in [huggingface/transformers](https://github.com/huggingface/transformers/blob/master/transformers/modeling_t5.py), trained on the same data.
<hr>

## Validation accuracy
Benchmarking can be run with the following [script](https://github.com/abelriboulot/onnxt5/blob/master/notebooks/benchmark_performance.ipynb) with initial results in this [post](https://kta.io/posts/onnx_t5).
<hr>


## Publication/Attribution
This repo is based on the work of Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang,
Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu from Google, as well as the Hugging Face team's implementation
of T5, the work of the Microsoft ONNX and ONNX Runtime teams (in particular Tianlei Wu), and Thomas Wolf's work on text generation.

[Original T5 Paper](https://arxiv.org/pdf/1910.10683.pdf)
```
@article{2019t5,
  author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
  title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  journal = {arXiv e-prints},
  year = {2019},
  archivePrefix = {arXiv},
  eprint = {1910.10683},
}
```

## References
This model is converted directly from [huggingface/transformers](https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_t5.py).
<hr>

## Contributors
[Abel Riboulot](https://github.com/abelriboulot)
<hr>

## License
Apache 2.0 License
<hr>