Commit ca8a7fb · 1 Parent(s): 91a0672
Update README.md
README.md CHANGED
@@ -65,7 +65,7 @@ language:
 - my
 - ne
 - nl
-- no
+- 'no'
 - ns
 - oc
 - or
@@ -113,6 +113,7 @@ tags:
 datasets:
 - tico19
 - flores101
+- tatoeba
 ---
 
 # SMALL-100 Model
@@ -121,6 +122,9 @@ SMaLL-100 is a compact and fast massively multilingual machine translation model
 
 The model architecture and config are the same as [M2M-100](https://huggingface.co/facebook/m2m100_418M/tree/main) implementation, but the tokenizer is modified to adjust language codes. So, you should load the tokenizer locally from [tokenization_small100.py](https://huggingface.co/alirezamsh/small100/blob/main/tokenization_small100.py) file for the moment.
 
+- **Generation**
+Demo is available at: https://huggingface.co/spaces/alirezamsh/small100
+
 **Note**: SMALL100Tokenizer requires sentencepiece, so make sure to install it by:
 
 ```pip install sentencepiece```
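The commit message does not say why the `no` entry in the `language:` list was quoted. A likely reason, stated here as an assumption, is that parsers following YAML 1.1 resolve the bare scalar `no` to boolean false, so the Norwegian language code would not survive as a string. The sketch below uses only the Python standard library and a hand-written pattern modeled on the multi-letter boolean literals of the YAML 1.1 bool type (the single-letter forms are left out). The names `needs_quoting` and `quote_if_needed` are illustrative and not part of any library.

```python
import re

# Multi-letter boolean literals from the YAML 1.1 bool type.
# Assumption: the model card metadata is parsed with YAML 1.1 rules.
YAML11_BOOL = re.compile(
    r"yes|Yes|YES|no|No|NO|true|True|TRUE|false|False|FALSE"
    r"|on|On|ON|off|Off|OFF"
)


def needs_quoting(scalar: str) -> bool:
    """Return True if a bare scalar would be read as a boolean."""
    return YAML11_BOOL.fullmatch(scalar) is not None


def quote_if_needed(scalar: str) -> str:
    """Wrap boolean-looking scalars in single quotes, leave others alone."""
    return f"'{scalar}'" if needs_quoting(scalar) else scalar


# Language codes visible in the first hunk of this diff.
codes = ["my", "ne", "nl", "no", "ns", "oc", "or"]
for code in codes:
    print(f"- {quote_if_needed(code)}")
```

Of the seven codes in that hunk, only `no` matches the pattern, which is consistent with it being the single entry the commit changed.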