Gpt-Translate-Nano

Gpt-Translate-Nano is a small-scale translation AI built entirely from scratch for English, French, and German. The project implements a Transformer architecture without using any pre-trained models or existing tokenizer assets.

Model Description

  • Architecture: Custom Nano-Transformer that performs sequence-to-sequence translation through next-token prediction.
  • Parameters: ~4.66 million.
  • Layers: 4 layers, 4 attention heads.
  • Embedding Dimension: 128.
  • Context Length: 128 tokens.
  • Training Objective: Next-token prediction on concatenated parallel sentence pairs in the format [BOS] Source [MASK] Target [EOS], where [MASK] separates the source sentence from its translation.
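The card does not spell out the layer internals, so the arithmetic below is only a consistency check. Assuming a nanoGPT-style decoder (bias-free Q/K/V projections, an output projection with bias, a 4x-wide MLP, two LayerNorms per block, learned positional embeddings over the 128-token context, and an untied output head with bias), the listed dimensions give a total close to the stated ~4.66M. The function name `gpt_param_count` and every structural choice in it are assumptions, not the notebook's actual definition.

```python
def gpt_param_count(vocab_size: int, n_embd: int, n_layer: int,
                    block_size: int, mlp_ratio: int = 4) -> int:
    """Parameter count for an assumed nanoGPT-style decoder (hypothetical layout)."""
    # Token embedding table and learned positional embedding table.
    tok_emb = vocab_size * n_embd
    pos_emb = block_size * n_embd
    # Attention: bias-free Q/K/V. The number of heads does not change the total,
    # because n_head * head_size == n_embd.
    qkv = 3 * n_embd * n_embd
    attn_proj = n_embd * n_embd + n_embd  # output projection with bias
    # Feed-forward network: n_embd -> hidden -> n_embd, both layers with bias.
    hidden = mlp_ratio * n_embd
    mlp = (n_embd * hidden + hidden) + (hidden * n_embd + n_embd)
    layer_norms = 2 * (2 * n_embd)  # two LayerNorms, each with weight and bias
    per_block = qkv + attn_proj + mlp + layer_norms
    # Final LayerNorm plus an untied language-model head with bias.
    final_ln = 2 * n_embd
    lm_head = n_embd * vocab_size + vocab_size
    return tok_emb + pos_emb + n_layer * per_block + final_ln + lm_head


total = gpt_param_count(vocab_size=15_000, n_embd=128, n_layer=4, block_size=128)
print(total)  # 4663192
```

Under these assumptions the token embedding and output head together account for about 3.86M of the ~4.66M parameters (roughly 83%), so at this scale the 15,000-token vocabulary, not the four Transformer blocks, dominates the model size.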

Tokenization

  • Type: Byte Pair Encoding (BPE) trained from scratch.
  • Vocabulary Size: 15,000 tokens.
  • Dataset: Trained on a 40,000-line subset of the opus_books parallel corpus (En, Fr, De).
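The card does not say which library or pre-tokenization was used to train the tokenizer. To illustrate what BPE training does, the toy learner below starts from single characters of whitespace-split words and repeatedly merges the most frequent adjacent symbol pair. `learn_bpe` is a hypothetical helper written for this example; the repository's tokenizer.json may differ in pre-tokenization, special tokens, and tie-breaking.

```python
from collections import Counter


def learn_bpe(corpus: list[str], num_merges: int) -> list[tuple[str, str]]:
    """Learn up to `num_merges` BPE merge rules from a list of text lines (toy sketch)."""
    # Each distinct word becomes a tuple of single-character symbols, weighted by frequency.
    vocab: Counter = Counter()
    for line in corpus:
        for word in line.split():
            vocab[tuple(word)] += 1

    merges: list[tuple[str, str]] = []
    for _ in range(num_merges):
        # Count every adjacent symbol pair, weighted by word frequency.
        pairs: Counter = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        merges.append(best)

        # Rewrite every word, fusing each occurrence of the best pair into one symbol.
        new_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            merged: list[str] = []
            i = 0
            while i < len(symbols):
                if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == best:
                    merged.append(symbols[i] + symbols[i + 1])
                    i += 2
                else:
                    merged.append(symbols[i])
                    i += 1
            new_vocab[tuple(merged)] += freq
        vocab = new_vocab
    return merges


print(learn_bpe(["low low lower", "lowest"], 2))  # [('l', 'o'), ('lo', 'w')]
```

Reaching a 15,000-token vocabulary is the same loop run for many more merges over the training lines; production implementations add optimizations such as incremental pair counting instead of recounting every pair on each iteration.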

How to Use

To use this model, load the weights with the GPTTranslateNano class definition provided in the original notebook, and load the tokenizer from the tokenizer.json file included in this repo.
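Because the model was trained on [BOS] Source [MASK] Target [EOS] sequences, translation amounts to feeding [BOS] Source [MASK] as a prompt and generating until [EOS] or until the 128-token context is full. The sketch below shows greedy decoding under that assumption. `greedy_translate`, the `next_token_logits` callable, and the token IDs are all hypothetical placeholders; the notebook's actual inference code may sample differently or handle the context window another way.

```python
from typing import Callable, Sequence


def greedy_translate(
    next_token_logits: Callable[[list[int]], Sequence[float]],
    source_ids: list[int],
    bos_id: int,
    sep_id: int,
    eos_id: int,
    context_length: int = 128,
) -> list[int]:
    """Greedy decoding for a [BOS] source [MASK] target [EOS] layout (hypothetical sketch)."""
    # The prompt mirrors the training format up to the separator token.
    ids = [bos_id, *source_ids, sep_id]
    target: list[int] = []
    # Stop once the sequence fills the model's context window.
    while len(ids) < context_length:
        logits = next_token_logits(ids)
        # Greedy choice: index of the highest logit.
        next_id = max(range(len(logits)), key=logits.__getitem__)
        if next_id == eos_id:
            break
        ids.append(next_id)
        target.append(next_id)
    return target
```

In practice `next_token_logits` would wrap a forward pass of the notebook's GPTTranslateNano model and return the logits at the last position, and the special-token IDs would come from tokenizer.json (for example via `Tokenizer.from_file` and `token_to_id` in the Hugging Face tokenizers library, if that library produced the file). Greedy decoding is used here only for simplicity.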

Limitations and Bias

Because this is a 'Nano' model trained for only 5,000 iterations on a limited dataset, its translations are often 'creative', or syntactically correct but semantically inaccurate. It is a proof of concept for educational purposes.
