# Translator
This is a research project to build a small language model that generates text.

### How to launch in a Docker environment

### How to launch in your environment
- Clone the repository
- Install dependencies:
```shell
pip install poetry && poetry install
```
- Run the code:
```python
from Translator import Writer

writer = Writer.from_pretrained()  # optionally .to("cuda")

print(writer(input_seq="One day I saw a ", temperature=2))  # I highly recommend a high temperature
```

# Model architecture and training pipeline
Transformer decoder architecture with params:
- decoder blocks = 4
- vocab size = 8192
- embedding size = 512
- number of heads = 8
- hidden size in FFN = 1024
- max sequence length = 128
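As a rough sanity check, the hyperparameters above imply an approximate parameter count. This is only a sketch: it ignores biases, layer norms, and positional embeddings, and assumes untied input/output embeddings and standard multi-head attention (none of which are stated explicitly in this README).

```python
# Approximate parameter count for the decoder described above.
# Assumptions (not from the README): untied embeddings, no biases/layer norms counted.
vocab_size = 8192
emb = 512
n_blocks = 4
ffn_hidden = 1024

embedding = vocab_size * emb      # token embedding table
attention = 4 * emb * emb         # Q, K, V and output projections
ffn = 2 * emb * ffn_hidden        # two linear layers in the FFN
per_block = attention + ffn
lm_head = emb * vocab_size        # projection back to the vocabulary

total = embedding + n_blocks * per_block + lm_head
print(f"{total / 1e6:.1f}M parameters")  # 16.8M parameters
```

So the model weighs in at roughly 17M parameters under these assumptions.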



Trained with params:
- loss = CrossEntropyLoss
- optimizer = Adam
- batch size = 400
- accumulation steps = 3
- epochs = 10
- number of sequences in dataset = 21 million
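The figures above also determine the effective batch size and the approximate number of optimizer steps. This is back-of-the-envelope arithmetic, assuming one sequence per sample and no dropped partial batches:

```python
batch_size = 400
accumulation_steps = 3
epochs = 10
dataset_size = 21_000_000  # ~21 million sequences

# Gradient accumulation means one optimizer step covers several batches.
effective_batch = batch_size * accumulation_steps   # sequences per optimizer step
steps_per_epoch = dataset_size // effective_batch
total_steps = steps_per_epoch * epochs

print(effective_batch)  # 1200
print(total_steps)      # 175000
```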



Total training time: 10 hours



# Sources
- Architecture inspired by [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
- [Dataset](https://huggingface.co/datasets/roneneldan/TinyStories)