Upload README.md with huggingface_hub
README.md
ADDED
@@ -0,0 +1,20 @@
# Translator
A model that translates text from English to Russian, built on the Transformer architecture from "Attention Is All You Need".

# How to launch
- Build and start the containers in the background:
```shell
docker-compose up -d --build
```
- Open http://localhost:4000/ in a browser.

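Because `-d` starts the containers in the background, the page may not be reachable right away. The sketch below polls the address until it answers. The function name `wait_for_service` and the timeout values are illustrative and not part of this project, and it assumes the service answers plain HTTP GET requests on the root path.

```python
import time
import urllib.error
import urllib.request


def wait_for_service(url: str, timeout: float = 60.0, interval: float = 2.0) -> bool:
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval):
                return True  # the server answered with a success status
        except urllib.error.HTTPError:
            return True  # the server answered, even if with an error status
        except (urllib.error.URLError, OSError):
            time.sleep(interval)  # not reachable yet, retry after a pause
    return False
```

Calling `wait_for_service("http://localhost:4000/")` after `docker-compose up -d --build` should return `True` once the web interface accepts connections, and `False` if nothing answers within the timeout.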
# Data
This project uses the OpenSubtitles English-Russian parallel corpus from OPUS.

[Link to get from OPUS](https://opus.nlpl.eu/results/en&ru/corpus-result-table)

[Direct download link](https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2024/moses/en-ru.txt.zip)

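A minimal sketch of fetching the archive from the direct link and reading sentence pairs might look like the following. It assumes the archive unpacks into two line-aligned plain-text files, one per language (the exact file names inside the zip are not stated here, so they are left as parameters). The helper names `download_and_extract` and `read_parallel_pairs` are hypothetical and not taken from this repository.

```python
import itertools
import urllib.request
import zipfile
from pathlib import Path
from typing import Iterator, Optional, Tuple

# URL taken from the direct download link above.
DATA_URL = "https://object.pouta.csc.fi/OPUS-OpenSubtitles/v2024/moses/en-ru.txt.zip"


def download_and_extract(url: str, dest_dir: str) -> Path:
    """Download the zip archive into `dest_dir` (if missing) and unpack it there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    archive = dest / "en-ru.txt.zip"
    if not archive.exists():
        urllib.request.urlretrieve(url, archive)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)
    return dest


def read_parallel_pairs(
    en_path: str, ru_path: str, limit: Optional[int] = None
) -> Iterator[Tuple[str, str]]:
    """Yield (english, russian) pairs from two line-aligned text files."""
    with open(en_path, encoding="utf-8") as en_f, open(ru_path, encoding="utf-8") as ru_f:
        pairs = ((en.strip(), ru.strip()) for en, ru in zip(en_f, ru_f))
        # Drop pairs where either side is empty, then cap the count at `limit`.
        non_empty = (p for p in pairs if p[0] and p[1])
        yield from itertools.islice(non_empty, limit)
```

Reading lazily with a `limit` avoids loading the whole corpus into memory when only a subset of pairs is needed.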
# Tokenizer
We use SentencePiece with the following tokenizer parameters:
- Vocabulary size = 10000
- Training pairs = 200000
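The parameters above could be applied roughly as in the sketch below. It rests on several assumptions that this README does not confirm: that one shared model is trained on both languages, that "training pairs" means the number of sentence pairs fed to the tokenizer trainer, and that SentencePiece's default model type is used. The helper names and the `tokenizer` model prefix are hypothetical.

```python
from typing import Iterable, Tuple

VOCAB_SIZE = 10000       # vocabulary size listed above
TRAINING_PAIRS = 200000  # number of training pairs listed above


def write_tokenizer_corpus(
    pairs: Iterable[Tuple[str, str]], out_path: str, max_pairs: int = TRAINING_PAIRS
) -> int:
    """Write both sides of up to `max_pairs` pairs, one sentence per line."""
    written = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for en, ru in pairs:
            if written >= max_pairs:
                break
            out.write(en + "\n")
            out.write(ru + "\n")
            written += 1
    return written


def train_tokenizer(corpus_path: str, model_prefix: str = "tokenizer"):
    """Train a SentencePiece model; writes <model_prefix>.model and <model_prefix>.vocab."""
    # Imported lazily so the corpus helper works without the package installed.
    import sentencepiece as spm

    spm.SentencePieceTrainer.train(
        input=corpus_path,
        model_prefix=model_prefix,
        vocab_size=VOCAB_SIZE,  # assumption: one shared vocabulary for both languages
    )
    return spm.SentencePieceProcessor(model_file=model_prefix + ".model")
```

If the project trains separate English and Russian tokenizers instead, the same two steps would be run once per language on a single-language corpus file.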