tags:
- mt5
- t5
- text-generation-inference
- turkish
widget:
- text: >-
    Bu hafta hasta olduğum için <extra_id_0> gittim. Midem ağrıyordu ondan
    dolayı şu an <extra_id_1>.
  example_title: Turkish Example 1
- text: Bu gece kar yağacakmış. Yarın yollarda <extra_id_0> olabilir.
  example_title: Turkish Example 2
- text: I bought two tickets for NBA match. Do you like <extra_id_0>?
  example_title: English Example 1
---
# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->
Please check the [**google/mt5-base**](https://huggingface.co/google/mt5-base) model card. This model is a pruned version of mt5-base that only supports Turkish and English. For the methodology, see the Russian pruning of mT5-base, [cointegrated/rut5-base](https://huggingface.co/cointegrated/rut5-base).

# Usage

Import the required libraries:
```python
from transformers import T5ForConditionalGeneration, T5Tokenizer
import torch
```

To load the model and tokenizer:
```python
model = T5ForConditionalGeneration.from_pretrained('bonur/t5-base-tr')
tokenizer = T5Tokenizer.from_pretrained('bonur/t5-base-tr')
```

To run inference on a text containing `<extra_id_0>`-style sentinel tokens:
```python
inputs = tokenizer("Bu hafta hasta olduğum için <extra_id_0> gittim.", return_tensors='pt')
with torch.no_grad():
    hypotheses = model.generate(
        **inputs,
        do_sample=True, top_p=0.95,
        num_return_sequences=2,
        repetition_penalty=2.75,
        max_length=32,
    )
for h in hypotheses:
    print(tokenizer.decode(h))
```
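
Beyond span infilling, the encoder alone can produce sentence embeddings. Below is a minimal sketch; mean pooling over the encoder hidden states is an assumption for illustration, not something the card specifies:
```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Hypothetical sketch: sentence embeddings from the mT5 encoder.
# Mean pooling over encoder states is an assumption, not part of this card.
model = T5ForConditionalGeneration.from_pretrained('bonur/t5-base-tr')
tokenizer = T5Tokenizer.from_pretrained('bonur/t5-base-tr')

inputs = tokenizer("Bu gece kar yağacakmış.", return_tensors='pt')
with torch.no_grad():
    encoder_out = model.encoder(**inputs)

hidden = encoder_out.last_hidden_state         # (1, seq_len, d_model)
mask = inputs['attention_mask'].unsqueeze(-1)  # (1, seq_len, 1)
# Mask-aware mean pooling over the token dimension.
embedding = (hidden * mask).sum(dim=1) / mask.sum(dim=1)  # (1, d_model)
```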

You can tune the generation parameters for better results, and the model is ready to be fine-tuned on bilingual (English and Turkish) downstream tasks.
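
As one illustration of tuning the decoding parameters, beam search can replace nucleus sampling when you want deterministic output; the values below are illustrative assumptions, not recommendations from the card:
```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

# Illustrative decoding variant: beam search instead of nucleus sampling.
# Parameter values are assumptions, not recommendations from this card.
model = T5ForConditionalGeneration.from_pretrained('bonur/t5-base-tr')
tokenizer = T5Tokenizer.from_pretrained('bonur/t5-base-tr')

inputs = tokenizer("Bu gece kar yağacakmış. Yarın yollarda <extra_id_0> olabilir.",
                   return_tensors='pt')
with torch.no_grad():
    outputs = model.generate(
        **inputs,
        num_beams=4,       # deterministic beam search
        do_sample=False,
        max_length=32,
        early_stopping=True,
    )
print(tokenizer.decode(outputs[0]))
```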