---
title: PyTorch_Transformer_model  
emoji: 🌐
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.35.0
app_file: app.py
pinned: false
---
# 🌐 Seq2Seq Transformer English-to-Spanish Translator

An interactive web application deployed on Hugging Face Spaces that demonstrates a Sequence-to-Sequence (Seq2Seq) Transformer model built from scratch using PyTorch and served via Streamlit.

The application builds word-level vocabularies from the training dataset, pads variable-length sequences, trains an encoder-decoder Transformer with multi-head attention, and translates at inference time by autoregressive decoding, generating one target token per step.

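The vocabulary and padding steps described above can be sketched in plain Python. This is a minimal illustration, not the code in `app.py`: the helper names (`build_vocab`, `encode`, `pad_batch`), the whitespace tokenization, the lowercasing, and the `<PAD>` and `<UNK>` tokens are all assumptions. Only `<SOS>` and `<EOS>` are named in this README. In the app, the padded lists would then be converted to tensors.

```python
from collections import Counter

# <SOS>/<EOS> are named in this README; <PAD>/<UNK> are assumed for illustration.
PAD, SOS, EOS, UNK = "<PAD>", "<SOS>", "<EOS>", "<UNK>"
SPECIALS = [PAD, SOS, EOS, UNK]


def build_vocab(sentences, min_freq=1):
    """Map each lowercased, whitespace-separated word to an integer index."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for word, freq in counts.items():
        if freq >= min_freq and word not in vocab:
            vocab[word] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Convert a sentence to indices, wrapped in <SOS> ... <EOS>."""
    ids = [vocab.get(w, vocab[UNK]) for w in sentence.lower().split()]
    return [vocab[SOS]] + ids + [vocab[EOS]]


def pad_batch(batch, pad_id):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]


# Hypothetical toy data standing in for the English column of data.csv.
src_vocab = build_vocab(["hello world", "good morning world"])
padded = pad_batch(
    [encode("hello", src_vocab), encode("good morning world", src_vocab)],
    src_vocab[PAD],
)
```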
---

## 🚀 Features

* **Custom Tokenization Pipeline:** Processes raw text data, builds independent source and target vocabularies, and converts each sentence into a tensor of token indices delimited by `<SOS>` and `<EOS>`.
* **Transformer from Scratch:** Implements sinusoidal positional encodings around PyTorch's standard `nn.Transformer` module (multi-head attention), with a causal mask (`tgt_mask`) that stops the decoder from attending to future target tokens during training.
* **On-the-Fly UI Training:** Trains the model on the provided dataset on first load, then caches the trained weights so that later requests go straight to inference.
* **Streamlit Web Interface:** A single text field where users enter an English sentence and see its Spanish translation.

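The positional encodings and causal mask mentioned above can be illustrated without any dependencies. This is a sketch under assumptions: the function names and the list-of-lists representation are hypothetical, and the app would build the same quantities as PyTorch tensors (PyTorch offers `nn.Transformer.generate_square_subsequent_mask` for a float mask of this form). The mask is additive: `0.0` where attention is allowed and `-inf` on future positions.

```python
import math


def sinusoidal_positional_encoding(max_len, d_model):
    """Even dimensions get sin(pos / 10000^(i/d_model)), odd ones the matching cos."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):  # i is the even dimension index
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe


def causal_mask(size):
    """Row r may attend to columns 0..r; later columns are masked with -inf."""
    return [
        [0.0 if col <= row else float("-inf") for col in range(size)]
        for row in range(size)
    ]


pe = sinusoidal_positional_encoding(max_len=4, d_model=6)
mask = causal_mask(3)

# Position 0 encodes as alternating sin(0)=0 and cos(0)=1.
print(pe[0])
# The first target token sees only itself; the last sees all three.
for row in mask:
    print(row)
```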
---

## 📂 Project Structure

To run properly on Hugging Face Spaces, ensure your repository contains the following files in the root directory:

```text
├── app.py               # Main Streamlit web application & PyTorch script
├── requirements.txt     # Python runtime dependencies
├── data.csv             # Custom English/Spanish translation dataset
└── README.md            # Project documentation (this file)