Gpt-Translate-Nano
Gpt-Translate-Nano is a small-scale translation AI built entirely from scratch for English, French, and German. This project demonstrates the implementation of a Transformer architecture without using any pre-trained models or existing tokenizer assets.
Model Description
- Architecture: Custom Nano-Transformer (Sequence-to-Sequence style via Next Token Prediction).
- Parameters: ~4.66 Million.
- Layers: 4 layers, 4 attention heads.
- Embedding Dimension: 128.
- Context Length: 128 tokens.
- Training Objective: Next-token prediction on concatenated multilingual pairs, formatted as [BOS] Source [MASK] Target [EOS].
Tokenization
- Type: Byte Pair Encoding (BPE) trained from scratch.
- Vocabulary Size: 15,000 tokens.
- Dataset: Trained on a 40,000-line subset of the opus_books parallel corpus (En, Fr, De).
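With the vocabulary size above and the dimensions listed under Model Description, the parameter count can be sanity-checked by hand. The sketch below assumes a conventional GPT-style layout (learned positional embeddings, a 4x feed-forward expansion, biases on all linear layers, and an untied output head with bias). None of these details are confirmed by this README, so treat the result as a plausibility check only.

```python
def count_parameters(vocab_size=15_000, d_model=128, n_layers=4,
                     context_length=128, mlp_ratio=4, head_bias=True):
    """Rough parameter count for an assumed GPT-style decoder stack."""
    token_emb = vocab_size * d_model
    pos_emb = context_length * d_model  # assumption: learned positions

    # One Transformer block (the number of heads does not change the count).
    qkv = d_model * 3 * d_model + 3 * d_model
    attn_out = d_model * d_model + d_model
    d_ff = mlp_ratio * d_model
    mlp = (d_model * d_ff + d_ff) + (d_ff * d_model + d_model)
    layer_norms = 2 * 2 * d_model  # two LayerNorms, weight + bias each
    block = qkv + attn_out + mlp + layer_norms

    final_norm = 2 * d_model
    head = d_model * vocab_size + (vocab_size if head_bias else 0)
    return token_emb + pos_emb + n_layers * block + final_norm + head


total = count_parameters()
print(f"{total:,}")  # 4,664,728
```

Under these assumptions the total lands close to the reported ~4.66 Million, and about 83% of it sits in the token embedding and output head, which is typical when a 15,000-token vocabulary meets a 128-dimensional model.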
How to Use
To use this model, load the weights with the GPTTranslateNano class definition provided in the original notebook, and load the tokenizer from the tokenizer.json file included in this repo.
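Once the model and tokenizer are loaded, translation presumably follows the training format: feed [BOS] Source [MASK] and generate until [EOS]. The sketch below shows that loop with greedy decoding. It is an assumption about inference, not the notebook's code: `next_token_logits` is a hypothetical callable standing in for a forward pass of GPTTranslateNano, and the special-token IDs are placeholders.

```python
BOS_ID, MASK_ID, EOS_ID = 0, 1, 2  # hypothetical; see tokenizer.json
CONTEXT_LENGTH = 128


def greedy_translate(source_ids, next_token_logits,
                     context_length=CONTEXT_LENGTH):
    """Generate target tokens after [BOS] source [MASK] until [EOS]."""
    tokens = [BOS_ID, *source_ids, MASK_ID]
    prompt_len = len(tokens)
    while len(tokens) < context_length:
        logits = next_token_logits(tokens)  # scores for the next token
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == EOS_ID:
            break
        tokens.append(next_id)
    return tokens[prompt_len:]


def dummy_logits(tokens):
    """Stand-in for a real forward pass: emit token 7 twice, then [EOS]."""
    generated = len(tokens) - tokens.index(MASK_ID) - 1
    target = 7 if generated < 2 else EOS_ID
    return [1.0 if i == target else 0.0 for i in range(10)]


print(greedy_translate([4, 5], dummy_logits))  # [7, 7]
```

In practice the returned IDs would be passed through the tokenizer's decoder to get text, and the loop stops at the 128-token context limit if [EOS] never appears.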
Limitations and Bias
Because this 'Nano' model was trained for only 5,000 iterations on a limited dataset, its translations are often 'creative', or syntactically correct but semantically inaccurate. It is a proof of concept intended for educational purposes.