Urdu Completion Transformer (10M Parameters)

A small decoder-only Transformer for Urdu text generation, implemented from scratch in Rust using the Burn deep learning framework.

Model Description

This is a GPT-style autoregressive language model trained on Urdu Wikipedia. Given a text prompt in Urdu, it generates a continuation in Wikipedia-style prose.
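
The autoregressive loop described above can be sketched in a few lines. This is a minimal illustration, not the repository's inference code: `generate`, `next_logits`, and `toy_logits` are hypothetical names, the closure stands in for the model's forward pass, and greedy (argmax) selection is an assumption, since the card does not say which sampling strategy the `infer` binary uses.

```rust
/// Greedy autoregressive decoding over token ids.
/// `next_logits` is a hypothetical stand-in for the model's forward pass:
/// it receives the current context and returns one logit per vocabulary entry.
fn generate<F>(
    prompt: &[usize],
    max_new_tokens: usize,
    context_len: usize,
    next_logits: F,
) -> Vec<usize>
where
    F: Fn(&[usize]) -> Vec<f32>,
{
    let mut tokens = prompt.to_vec();
    for _ in 0..max_new_tokens {
        // Only the most recent `context_len` tokens fit in the model's window.
        let start = tokens.len().saturating_sub(context_len);
        let logits = next_logits(&tokens[start..]);
        // Greedy choice (an assumption): take the highest-scoring token id.
        let next = logits
            .iter()
            .enumerate()
            .max_by(|a, b| a.1.total_cmp(b.1))
            .map(|(i, _)| i)
            .expect("logits must not be empty");
        // The chosen token becomes part of the context for the next step.
        tokens.push(next);
    }
    tokens
}

/// Toy "model" over a 5-token vocabulary that always favors (last + 1) % 5.
fn toy_logits(context: &[usize]) -> Vec<f32> {
    let last = context.last().copied().unwrap_or(0);
    let mut logits = vec![0.0_f32; 5];
    logits[(last + 1) % 5] = 1.0;
    logits
}

fn main() {
    // 512 matches the context length listed under Architecture.
    let out = generate(&[0], 4, 512, toy_logits);
    println!("{:?}", out);
}
```

In a real setup the prompt would first be encoded to token ids by the BPE tokenizer and the output ids decoded back to Urdu text; both steps are omitted here.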

Architecture

  • Type: Decoder-only Transformer
  • Parameters: ~10 million
  • Vocabulary: 10,000 (byte-level BPE)
  • Context Length: 512 tokens
  • Layers: 6 transformer blocks
  • Attention Heads: 8
  • Embedding Dimension: 256
  • FFN Dimension: 1,024
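
As a rough sanity check, the "~10 million" figure can be reproduced from the hyperparameters above. The sketch below rests on several assumptions that the card does not confirm: learned positional embeddings, bias terms on every linear layer inside the blocks, two layer norms per block plus a final one, and an output projection without bias. The `Config` struct and function names are illustrative only. The number of attention heads does not change the count, because the heads split the same 256-wide projections (32 dimensions per head).

```rust
/// Hyperparameters taken from the Architecture list.
struct Config {
    vocab: usize,
    context: usize,
    layers: usize,
    d_model: usize,
    d_ff: usize,
}

/// Parameters in one transformer block (assumed layout, see lead-in).
fn block_params(c: &Config) -> usize {
    let d = c.d_model;
    // Q, K, V and output projections: four d x d matrices, each with a bias.
    let attention = 4 * (d * d + d);
    // Feed-forward network: d -> d_ff -> d, with biases.
    let ffn = d * c.d_ff + c.d_ff + c.d_ff * d + d;
    // Two layer norms, each with a scale and a shift vector.
    let norms = 2 * 2 * d;
    attention + ffn + norms
}

/// Total parameters; `tied_head` reuses the token embedding as the output head.
fn total_params(c: &Config, tied_head: bool) -> usize {
    let token_embedding = c.vocab * c.d_model;
    let position_embedding = c.context * c.d_model;
    let final_norm = 2 * c.d_model;
    let head = if tied_head { 0 } else { c.vocab * c.d_model };
    token_embedding + position_embedding + c.layers * block_params(c) + final_norm + head
}

fn card_config() -> Config {
    Config { vocab: 10_000, context: 512, layers: 6, d_model: 256, d_ff: 1_024 }
}

fn main() {
    let c = card_config();
    println!("per block:           {}", block_params(&c));
    println!("total (untied head): {}", total_params(&c, false));
    println!("total (tied head):   {}", total_params(&c, true));
}
```

Under these assumptions the untied variant comes to 9,990,144 parameters, which is consistent with the stated size; with a tied output head the same layout would have about 7.4 million.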

Training

  • Dataset: Urdu Wikipedia (20231101.ur)
  • Epochs: 10
  • Batch Size: 16
  • Optimizer: AdamW (lr=3e-4, gradient clipping=1.0)
  • Hardware: GPU via WGPU (Vulkan backend)
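
The card lists "gradient clipping=1.0" without saying whether gradients are clipped by norm or by value. Assuming the common global-norm variant, the operation looks roughly like the sketch below, written over a flat slice of gradients in plain Rust. `clip_grad_norm` is a hypothetical helper for illustration, not a Burn API.

```rust
/// Scales `grads` in place so that their global L2 norm is at most `max_norm`.
/// Returns the norm measured before clipping.
/// Assumption: norm-based clipping; the card only states the threshold 1.0.
fn clip_grad_norm(grads: &mut [f32], max_norm: f32) -> f32 {
    let norm = grads.iter().map(|g| g * g).sum::<f32>().sqrt();
    if norm > max_norm {
        // Shrink every component by the same factor, keeping the direction.
        let scale = max_norm / norm;
        for g in grads.iter_mut() {
            *g *= scale;
        }
    }
    norm
}

fn main() {
    let mut grads = vec![3.0_f32, 4.0];
    let before = clip_grad_norm(&mut grads, 1.0);
    println!("norm before clipping: {}", before);
    println!("clipped gradients:    {:?}", grads);
}
```

Gradients whose norm is already below the threshold pass through unchanged, so clipping only affects the occasional oversized update.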

Final Metrics

Split   Loss
Train   0.766
Valid   0.737
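
Loss values are often easier to compare as perplexity. Assuming the reported numbers are mean per-token cross-entropy in nats (the card does not state the unit or the averaging), perplexity is simply the exponential of the loss. The helper name `perplexity` is illustrative.

```rust
/// Converts a mean cross-entropy loss (assumed to be in nats per token)
/// into perplexity.
fn perplexity(loss_nats: f64) -> f64 {
    loss_nats.exp()
}

fn main() {
    // Loss values copied from the Final Metrics table.
    let metrics = [("Train", 0.766_f64), ("Valid", 0.737_f64)];
    for &(split, loss) in metrics.iter() {
        println!("{}: loss {:.3} -> perplexity {:.3}", split, loss, perplexity(loss));
    }
}
```

If the loss were instead averaged per byte or per batch, the conversion would not describe per-token perplexity, so the result should be read with that caveat.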

Intended Use

  • Educational: Understanding transformer architecture
  • Research: Baseline for Urdu NLP experiments
  • Demo: Interactive Urdu text completion

Hardware

Trained on consumer hardware:

  • CPU: AMD Ryzen 5 5600X
  • GPU: NVIDIA RTX 4060 (8 GB VRAM)
  • RAM: 16 GB

These constraints limited model size (~10M params) and training duration (10 epochs). The model was still improving at epoch 10, with validation loss (0.737) below training loss (0.766) and no sign of overfitting, so it would likely benefit from continued training.

Limitations

  • Small model: 10M parameters limit factual accuracy and coherence
  • Limited training: 10 epochs on a consumer GPU; more training would improve quality
  • Byte-level tokenization: Occasional character-level artifacts in Urdu script
  • Wikipedia bias: Outputs resemble encyclopedia articles, not conversational text
  • No instruction-following: This is a completion model, not a chatbot
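
One plausible source of the character-level artifacts mentioned above is that Urdu letters occupy two bytes each in UTF-8, so a byte-level vocabulary can stop generation partway through a character. The sketch below shows the effect with plain standard-library calls; `decode_prefix` is a hypothetical helper, and whether this is the exact mechanism behind the model's artifacts is an assumption.

```rust
/// Decodes the first `n_bytes` bytes of `text`, replacing any incomplete
/// character with U+FFFD, the Unicode replacement character.
fn decode_prefix(text: &str, n_bytes: usize) -> String {
    let bytes = text.as_bytes();
    let end = n_bytes.min(bytes.len());
    String::from_utf8_lossy(&bytes[..end]).into_owned()
}

fn main() {
    // Two Urdu letters written as explicit code points:
    // U+06CC (Farsi yeh) followed by U+06C1 (heh goal).
    let text = "\u{06CC}\u{06C1}";
    println!("characters: {}", text.chars().count());
    println!("bytes:      {}", text.len());

    // Cutting after three bytes leaves the second letter half-finished.
    let cut = &text.as_bytes()[..3];
    println!("strict decode succeeds: {}", std::str::from_utf8(cut).is_ok());
    println!("lossy decode:           {}", decode_prefix(text, 3));
}
```

A decoder that buffers bytes until they form a complete character, or that drops a trailing partial character, would hide this kind of artifact at the end of a generation.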

How to Use

This model is implemented in Rust, not Python. See the GitHub repository (URL under Citation) for the source code. To run interactive inference:

# Interactive inference
cargo run --release --bin infer

Example

Input: یہ ایک

Output: یہ ایک بھارتی فلمی اداکارہ ہے۔ متعلقہ روابط بھارتی فلمی اداکاراؤں کی فہرست بھارتی سنیما حوالہ جات...

(Translation: "This is an Indian film actress. Related links: List of Indian film actresses, Indian cinema, References...")

Citation

@misc{Urdu-Completion-Transformer-10M,
  title={Urdu Completion Transformer (10M) in Rust},
  year={2024},
  url={https://github.com/Ibzie/Burn-Deep-Learning-Implementations/tree/main/Transformer}
}

License

MIT
