# Translator
This is a research project to create a model that can work with text

### How to launch in docker environment

### How to launch in your environment
- Clone repository
- Install dependencies by
```shell
pip install poetry && poetry install
```
- Run code
```python
from Translator import Writer
writer = Writer.from_pretrained() #  .to("cuda")
print(writer(input_seq="One day I saw a ", temperature=2))  # I highly recommend high temperature
```

# Model architecture and training pipeline
Transformer decoder architecture with params:
- decoder blocks = 4
- vocab size = 8192
- embedding_size = 512
- number of heads = 8
- hidden size in FFN = 1024
- max_sequence_length = 128

Trained with params:
- loss = CrossEntropyLoss
- optimizer = Adam
- batch = 400
- accumulation steps = 3
- epochs = 10
- nums of sequences in dataset = 21kk

Total training time: 10 hours

# Sources
- Architecture inspired from [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
- [Dataset](https://huggingface.co/datasets/roneneldan/TinyStories)