---
title: PyTorch_Transformer_model
emoji: π
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.35.0
app_file: app.py
pinned: false
---
# Seq2Seq Transformer English-to-Spanish Translator

An interactive web application, deployed on Hugging Face Spaces, that demonstrates a sequence-to-sequence (Seq2Seq) Transformer model trained from scratch in PyTorch and served through Streamlit.

On first load, the application builds custom word-level vocabularies from a training dataset, pads variable-length sequences to a common length, trains a standard encoder-decoder Transformer with multi-head attention, and then translates user input in real time with autoregressive decoding.
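
The autoregressive decoding step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in `app.py`: the trained network is replaced by a hypothetical `next_token_logits(src_ids, prefix)` callable that returns one score per target-vocabulary index for the next position, and the `<SOS>`/`<EOS>` indices (1 and 2) are assumed.

```python
from typing import Callable, List, Sequence

# Assumed special-token indices (hypothetical; the app builds its own vocabulary).
SOS_IDX, EOS_IDX = 1, 2


def greedy_decode(
    src_ids: Sequence[int],
    next_token_logits: Callable[[Sequence[int], List[int]], Sequence[float]],
    max_len: int = 20,
) -> List[int]:
    """Generate target indices one at a time, feeding each prediction back in."""
    ys = [SOS_IDX]  # the decoder input always starts with <SOS>
    for _ in range(max_len):
        scores = next_token_logits(src_ids, ys)
        # Greedy choice: keep only the highest-scoring token at this step.
        next_id = max(range(len(scores)), key=lambda i: scores[i])
        ys.append(next_id)
        if next_id == EOS_IDX:  # stop as soon as the model emits <EOS>
            break
    out = ys[1:]  # drop the leading <SOS>
    if out and out[-1] == EOS_IDX:
        out = out[:-1]  # drop the trailing <EOS>
    return out


def toy_scorer(src_ids: Sequence[int], prefix: List[int]) -> List[float]:
    """Hypothetical stand-in model: emits token 5, then 6, then <EOS>."""
    plan = [5, 6, EOS_IDX]
    target = plan[min(len(prefix) - 1, len(plan) - 1)]
    return [1.0 if i == target else 0.0 for i in range(8)]


print(greedy_decode([3, 4], toy_scorer))  # [5, 6]
```

Greedy argmax is the simplest decoding rule. In a PyTorch model, the scores would come from running the network on the source and the current target prefix and reading the output at the last position; beam search is a common alternative that keeps several candidate prefixes instead of one.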

---

## Features

* **Custom Tokenization Pipeline:** Processes raw text, builds separate source and target vocabularies, and converts each sentence into a tensor of token indices wrapped in `<SOS>` and `<EOS>` markers.
* **Transformer from Scratch:** Implements sinusoidal positional encodings by hand and feeds them into PyTorch's standard multi-head attention Transformer (`nn.Transformer`), using a causal target mask (`tgt_mask`) so the decoder cannot look ahead at future tokens during training.
* **On-the-Fly UI Training:** Trains the model on the provided dataset when the app first loads, then caches the trained weights so later requests go straight to inference without retraining.
* **Streamlit Web Interface:** A simple text-field interface where anyone can enter an English sentence and immediately see its Spanish translation.
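
The vocabulary, `<SOS>`/`<EOS>` wrapping and padding steps can be sketched in pure Python. This sketch assumes lowercase whitespace tokenization, a `<PAD>` token at index 0 and an `<UNK>` token for unseen words; those choices and the helper names (`tokenize`, `build_vocab`, `encode`, `pad_batch`) are hypothetical, since this README does not specify them. In the app, the padded lists would then be turned into integer tensors.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical special tokens; the README only names <SOS> and <EOS>.
SPECIALS = ["<PAD>", "<SOS>", "<EOS>", "<UNK>"]


def tokenize(sentence: str) -> List[str]:
    # Assumed word-level tokenizer: lowercase, then split on whitespace.
    return sentence.lower().split()


def build_vocab(sentences: List[str], min_freq: int = 1) -> Dict[str, int]:
    """Map every word seen at least min_freq times to an integer index."""
    counts = Counter(tok for s in sentences for tok in tokenize(s))
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for tok, freq in sorted(counts.items()):  # sorted for a deterministic order
        if freq >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(sentence: str, vocab: Dict[str, int]) -> List[int]:
    """Convert a sentence to indices wrapped in <SOS> ... <EOS>."""
    unk = vocab["<UNK>"]
    body = [vocab.get(tok, unk) for tok in tokenize(sentence)]
    return [vocab["<SOS>"]] + body + [vocab["<EOS>"]]


def pad_batch(seqs: List[List[int]], pad_idx: int = 0) -> List[List[int]]:
    """Right-pad variable-length sequences to the longest one in the batch."""
    width = max(len(s) for s in seqs)
    return [s + [pad_idx] * (width - len(s)) for s in seqs]


# Independent vocabularies for the source (English) and target (Spanish) sides.
src_vocab = build_vocab(["I am hungry", "I am tired"])
tgt_vocab = build_vocab(["tengo hambre", "estoy cansado"])
batch = pad_batch([encode("I am hungry", src_vocab), encode("I", src_vocab)])
print(batch)  # [[1, 6, 4, 5, 2], [1, 6, 2, 0, 0]]
```

Sorting the counted words makes the index assignment independent of row order in the dataset. A fixed padding index like this is what one would pass to `nn.Embedding(padding_idx=...)` and use to build `src_key_padding_mask` for `nn.Transformer`; whether `app.py` does so is not stated here.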
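
Assuming `app.py` follows the standard sinusoidal formulation from the original Transformer paper, position `pos` and dimension index `i` receive `PE(pos, 2i) = sin(pos / 10000^(2i / d_model))` and `PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))`. The pure-Python sketch below computes that table; the function name `sinusoidal_encoding` is hypothetical, and a PyTorch version would hold the same values in a tensor.

```python
import math
from typing import List


def sinusoidal_encoding(max_len: int, d_model: int) -> List[List[float]]:
    """Return a max_len x d_model table of sinusoidal position encodings.

    Even dimensions use sin and odd dimensions use cos; each sin/cos pair
    shares one frequency, and wavelengths grow geometrically with the pair index.
    """
    table = []
    for pos in range(max_len):
        row = []
        for dim in range(d_model):
            i = dim // 2  # index of the sin/cos pair this dimension belongs to
            angle = pos / (10000 ** (2 * i / d_model))
            row.append(math.sin(angle) if dim % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


pe = sinusoidal_encoding(max_len=50, d_model=8)
print(pe[0])  # position 0: sin(0) = 0.0 on even dims, cos(0) = 1.0 on odd dims
```

In a PyTorch module, a table like this is usually registered as a non-trainable buffer (`register_buffer`) and added to the token embeddings before they enter the encoder and decoder; the exact wiring in `app.py` is not shown in this README.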
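
The causal target mask can be illustrated without PyTorch. The sketch below builds the same additive pattern that `nn.Transformer.generate_square_subsequent_mask` returns as a float tensor: `0.0` where position `i` may attend to position `j` (`j <= i`) and `-inf` for future positions, which the softmax turns into zero attention weight. How `app.py` builds its `tgt_mask` is not shown here, and the helper names `causal_mask` and `masked_softmax` are hypothetical.

```python
import math
from typing import List


def causal_mask(size: int) -> List[List[float]]:
    """Additive attention mask: 0.0 on and below the diagonal, -inf above it."""
    return [[0.0 if j <= i else -math.inf for j in range(size)] for i in range(size)]


def masked_softmax(scores: List[float], mask_row: List[float]) -> List[float]:
    """Softmax over one row of attention scores after adding the mask."""
    shifted = [s + m for s, m in zip(scores, mask_row)]
    top = max(shifted)  # finite, because position 0 is never masked
    exps = [math.exp(x - top) for x in shifted]  # math.exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]


mask = causal_mask(3)
# The first target token can only attend to itself, even with equal raw scores.
print(masked_softmax([1.0, 1.0, 1.0], mask[0]))  # [1.0, 0.0, 0.0]
```

Because the mask is added before the softmax rather than multiplied afterwards, masked positions get exactly zero weight and the remaining weights still sum to one.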

---

## Project Structure

To run properly on Hugging Face Spaces, ensure your repository contains the following files in the root directory:

```text
├── app.py             # Main Streamlit web application & PyTorch script
├── requirements.txt   # Python runtime dependencies
├── data.csv           # Custom English/Spanish translation dataset
└── README.md          # Project documentation (this file)
```