Translation
Transformers
PyTorch
ONNX
Safetensors
m2m_100
text2text-generation
small100
flores101
gsarti/flores_101
tico19
gmnlp/tico19
tatoeba
Instructions to use alirezamsh/small100 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use alirezamsh/small100 with Transformers:
# Use a pipeline as a high-level helper
# Warning: Pipeline type "translation" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# 'pip install "transformers<5.0.0"'
from transformers import pipeline

pipe = pipeline("translation", model="alirezamsh/small100")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("alirezamsh/small100")
model = AutoModelForSeq2SeqLM.from_pretrained("alirezamsh/small100")
- Inference
- Notebooks
- Google Colab
- Kaggle
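The README text touched by the commit below describes the training input as `source: [tgt_lang_code] + src_tokens + [EOS]` and `target: tgt_tokens + [EOS]`. A minimal sketch of that layout follows. The token IDs, the `LANG_CODE_IDS` table, and the `build_training_pair` helper are hypothetical and chosen only for illustration; the real IDs come from the SMALL100Tokenizer vocabulary.

```python
from typing import List, Tuple

# Hypothetical token IDs, for illustration only; the real values
# come from the SMALL100Tokenizer vocabulary.
EOS_ID = 2
LANG_CODE_IDS = {"fr": 1001, "de": 1002}


def build_training_pair(
    src_tokens: List[int], tgt_tokens: List[int], tgt_lang: str
) -> Tuple[List[int], List[int]]:
    """Lay out one example as the README describes it:
    source = [tgt_lang_code] + src_tokens + [EOS]
    target = tgt_tokens + [EOS]
    """
    source = [LANG_CODE_IDS[tgt_lang]] + src_tokens + [EOS_ID]
    target = tgt_tokens + [EOS_ID]
    return source, target


source_ids, target_ids = build_training_pair([11, 12, 13], [21, 22], "fr")
print(source_ids)  # [1001, 11, 12, 13, 2]
print(target_ids)  # [21, 22, 2]
```

Note that the code prepended to the source is the target language's code, and the target sequence carries no language token of its own.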
Commit 70a4f18 · 1 Parent(s): 3e1147d
Update README.md
README.md CHANGED
@@ -119,11 +119,15 @@ SMaLL-100 is a compact and fast massively multilingual machine translation model
 
 The model architecture and config are the same as [M2M-100](https://huggingface.co/facebook/m2m100_418M/tree/main) implementation, but the tokenizer is modified to adjust language codes. So, you should load the tokenizer locally from [tokenization_small100.py](https://huggingface.co/alirezamsh/small100/blob/main/tokenization_small100.py) file for the moment.
 
-**Note**: SMALL100Tokenizer requires sentencepiece, so make sure to install it by
+**Note**: SMALL100Tokenizer requires sentencepiece, so make sure to install it by:
+
+```pip install sentencepiece```
 
 - **Supervised Training**
 
-SMaLL-100 is a seq-to-seq model for the translation task. The input to the model is ```source:[tgt_lang_code] + src_tokens + [EOS]``` and ```target: tgt_tokens + [EOS]```.
+SMaLL-100 is a seq-to-seq model for the translation task. The input to the model is ```source:[tgt_lang_code] + src_tokens + [EOS]``` and ```target: tgt_tokens + [EOS]```.
+
+An example of supervised training is shown below:
 
 ```
 from transformers import M2M100ForConditionalGeneration