# Translator

This is a research project to create a model that can work with text.

### How to launch in docker environment

### How to launch in your environment

- Clone the repository
- Install dependencies:
  ```shell
  pip install poetry && poetry install
  ```
- Run the code:
  ```python
  from Translator import Writer

  writer = Writer.from_pretrained()  # .to("cuda")
  print(writer(input_seq="One day I saw a ", temperature=2))  # I highly recommend a high temperature
  ```

# Model architecture and training pipeline

Transformer decoder architecture with params:
- decoder blocks = 4
- vocab size = 8192
- embedding_size = 512
- number of heads = 8
- hidden size in FFN = 1024
- max_sequence_length = 128

Trained with params:
- loss = CrossEntropyLoss
- optimizer = Adam
- batch = 400
- accumulation steps = 3
- epochs = 10
- number of sequences in dataset = 21M

Total training time: 10 hours

# Sources

- Architecture inspired by [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
- [Dataset](https://huggingface.co/datasets/roneneldan/TinyStories)
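
# Rough size estimate

The architecture and training parameters listed above are enough for a back-of-the-envelope estimate of model size and optimizer steps. The sketch below is not taken from the repository. It assumes a decoder-only block (self-attention plus FFN, no cross-attention), learned positional embeddings, biases on every linear layer, two LayerNorms per block, and an output projection that is not tied to the token embedding. It also assumes the batch size is counted in sequences and that the dataset holds 21 million sequences. The names `DecoderConfig` and `estimate_parameters` are hypothetical. If the real model differs on any of these points, the numbers will shift accordingly.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderConfig:
    # Values copied from the architecture list in this README.
    num_blocks: int = 4
    vocab_size: int = 8192
    embedding_size: int = 512
    num_heads: int = 8
    ffn_hidden_size: int = 1024
    max_sequence_length: int = 128


def estimate_parameters(cfg: DecoderConfig) -> int:
    """Approximate trainable parameter count under the stated assumptions."""
    d = cfg.embedding_size
    h = cfg.ffn_hidden_size

    token_embedding = cfg.vocab_size * d
    # Assumption: learned positional embeddings (sinusoidal ones add no parameters).
    position_embedding = cfg.max_sequence_length * d

    # Q, K, V and output projections, each d x d with a bias vector.
    attention = 4 * (d * d + d)
    # Two linear layers: d -> h and h -> d, each with a bias vector.
    ffn = (d * h + h) + (h * d + d)
    # Assumption: two LayerNorms per block, each with a scale and a shift vector.
    layer_norms = 2 * (2 * d)
    block = attention + ffn + layer_norms

    # Assumption: output projection to the vocabulary is untied and has a bias.
    output_head = d * cfg.vocab_size + cfg.vocab_size

    return token_embedding + position_embedding + cfg.num_blocks * block + output_head


cfg = DecoderConfig()
head_dim = cfg.embedding_size // cfg.num_heads
total_params = estimate_parameters(cfg)

# Training parameters from this README.
batch_size = 400
accumulation_steps = 3
epochs = 10
num_sequences = 21_000_000  # assumption: the listed dataset size means 21 million

# Gradients are accumulated over several batches before each optimizer step.
effective_batch = batch_size * accumulation_steps
steps_per_epoch = math.ceil(num_sequences / effective_batch)
total_steps = steps_per_epoch * epochs

print(f"head dimension:        {head_dim}")
print(f"estimated parameters:  {total_params:,}")
print(f"effective batch size:  {effective_batch}")
print(f"optimizer steps/epoch: {steps_per_epoch:,}")
print(f"optimizer steps total: {total_steps:,}")
```

Under these assumptions the model comes out at roughly 17 million parameters, with about half of them in the token embedding and the output projection. Tying those two matrices, if the implementation does so, would reduce the estimate by about 4 million.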