# Translator

This is a research project that trains a small language model for text generation.
### How to launch in a Docker environment

### How to launch in your environment

- Clone the repository
- Install dependencies:
```shell
pip install poetry && poetry install
```
- Run the code:
```python
from Translator import Writer

writer = Writer.from_pretrained()  # .to("cuda")
print(writer(input_seq="One day I saw a ", temperature=2))  # a high temperature is recommended
```
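The `temperature` argument rescales the model's logits before sampling: values above 1 flatten the next-token distribution and make output more varied, which is why a high setting is suggested above. A minimal sketch of temperature-scaled sampling in plain Python (illustrative only; the repository's actual implementation may differ):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw logits to probabilities, rescaled by temperature."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
p_low = softmax_with_temperature(logits, 0.5)   # peaky: strongly favors the top token
p_high = softmax_with_temperature(logits, 2.0)  # flatter: more diverse sampling
```

With `temperature=2`, the probability mass assigned to the most likely token drops, so sampled continuations become less repetitive.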
# Model architecture and training pipeline

A Transformer decoder architecture with the following parameters:
- decoder blocks = 4
- vocab size = 8192
- embedding size = 512
- number of heads = 8
- hidden size in FFN = 1024
- max sequence length = 128
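A back-of-the-envelope parameter count from the numbers above (a rough sketch; the exact total depends on implementation details such as biases, LayerNorm weights, and whether the input and output embeddings are tied, which is assumed here):

```python
vocab_size = 8192
d_model = 512
n_blocks = 4
d_ffn = 1024

embedding = vocab_size * d_model        # token embedding table
attn_per_block = 4 * d_model * d_model  # Q, K, V and output projections
ffn_per_block = 2 * d_model * d_ffn     # the two FFN linear layers
per_block = attn_per_block + ffn_per_block
total = embedding + n_blocks * per_block  # assumes tied input/output embeddings
print(f"~{total / 1e6:.1f}M parameters")
```

This puts the model at roughly 12–13M weights, i.e. a deliberately small decoder.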
Trained with the following parameters:
- loss = CrossEntropyLoss
- optimizer = Adam
- batch size = 400
- gradient accumulation steps = 3
- epochs = 10
- number of sequences in the dataset = 21 million

Total training time: 10 hours
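From the figures above, the effective batch size and the number of optimizer updates can be estimated (a sketch; it assumes the dataset is iterated once per epoch and any remainder batch is dropped):

```python
batch_size = 400
accumulation_steps = 3
epochs = 10
num_sequences = 21_000_000

effective_batch = batch_size * accumulation_steps            # sequences per optimizer step
batches_per_epoch = num_sequences // batch_size              # forward/backward passes per epoch
updates_per_epoch = batches_per_epoch // accumulation_steps  # optimizer steps per epoch
total_updates = updates_per_epoch * epochs
print(effective_batch, total_updates)
```

That is an effective batch of 1200 sequences and on the order of 175k optimizer steps over the full run.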
# Sources
- Architecture inspired by [Attention Is All You Need](https://arxiv.org/abs/1706.03762)
- [Dataset](https://huggingface.co/datasets/roneneldan/TinyStories)