metadata
title: PyTorch_Transformer_model
emoji: π
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.35.0
app_file: app.py
pinned: false
Seq2Seq Transformer English-to-Spanish Translator
An interactive web application deployed on Hugging Face Spaces that demonstrates a Sequence-to-Sequence (Seq2Seq) Transformer model built from scratch using PyTorch and served via Streamlit.
The application automatically builds custom word-level vocabularies from a training dataset, handles variable-length sequence padding, trains a classic multi-head attention Transformer network, and performs auto-regressive decoding for real-time translation inference.
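The vocabulary-building and padding steps described above can be sketched roughly as follows. This is a minimal, framework-free illustration and not the code in app.py: the function names (build_vocab, encode, pad_batch), the whitespace tokenizer, and the `<PAD>` and `<UNK>` tokens are assumptions introduced for the example. Only `<SOS>` and `<EOS>` are named in this README.

```python
from collections import Counter

# <PAD> and <UNK> are assumed here; the README only names <SOS> and <EOS>.
PAD, SOS, EOS, UNK = "<PAD>", "<SOS>", "<EOS>", "<UNK>"
SPECIALS = [PAD, SOS, EOS, UNK]


def build_vocab(sentences, min_freq=1):
    """Map each word seen at least min_freq times to an integer index."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    vocab = {tok: i for i, tok in enumerate(SPECIALS)}
    for word in sorted(counts):  # sorted for a deterministic index order
        if counts[word] >= min_freq:
            vocab[word] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Convert a sentence to indices wrapped in <SOS> ... <EOS>."""
    unk = vocab[UNK]
    ids = [vocab.get(w, unk) for w in sentence.lower().split()]
    return [vocab[SOS]] + ids + [vocab[EOS]]


def pad_batch(batch, pad_id):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]


src_sentences = ["hello world", "good morning to you"]
src_vocab = build_vocab(src_sentences)
encoded = [encode(s, src_vocab) for s in src_sentences]
padded = pad_batch(encoded, src_vocab[PAD])
```

The same helpers would be run once on the English column and once on the Spanish column to obtain the two independent vocabularies. The padded lists can then be converted to tensors, for example with torch.tensor(padded).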
Features
- Custom Tokenization Pipeline: Processes raw text data, builds independent source and target vocabularies, and converts sentences into tensor indices with <SOS> and <EOS> boundaries.
- Transformer from Scratch: Implements sinusoidal positional encodings and a standard PyTorch multi-head attention Transformer (nn.Transformer) with proper causal masks (tgt_mask) to prevent lookahead during training.
- On-the-Fly UI Training: Automatically fits the model to the provided dataset on the initial load and caches the trained weights to provide instant inference for users.
- Streamlit Web Interface: A clean, user-friendly text field interface allowing anyone to input English sentences and see Spanish translations instantaneously.
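To make the positional-encoding and causal-mask features more concrete, here is a small sketch in plain Python so the arithmetic is visible. It is an illustration under assumptions, not the app's implementation: the function names are hypothetical, and in app.py the equivalent values would presumably be built as PyTorch tensors. The mask uses the additive convention (0.0 for allowed positions, -inf for future positions), which is the form nn.Transformer accepts for tgt_mask.

```python
import math


def sinusoidal_encoding(max_len, d_model):
    """Return a max_len x d_model table: sin on even dims, cos on odd dims."""
    table = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):
            # Each sin/cos pair shares one frequency: 1 / 10000^(i / d_model).
            angle = pos / (10000 ** (i / d_model))
            table[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                table[pos][i + 1] = math.cos(angle)
    return table


def causal_mask(size):
    """Additive mask: 0.0 where attention is allowed, -inf on future positions."""
    return [
        [0.0 if col <= row else float("-inf") for col in range(size)]
        for row in range(size)
    ]


pe = sinusoidal_encoding(max_len=4, d_model=6)
mask = causal_mask(3)
```

Row r of the mask lets target position r attend only to positions 0 through r, which is what prevents lookahead during training. The encoding table is added to the token embeddings before they enter the Transformer.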
Project Structure
To run properly on Hugging Face Spaces, ensure your repository contains the following files in the root directory:
├── app.py            # Main Streamlit web application & PyTorch script
├── requirements.txt  # Python runtime dependencies
├── data.csv          # Custom English/Spanish translation dataset
└── README.md         # Project documentation (this file)
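As a rough illustration of how data.csv might be read into sentence pairs, the sketch below uses only the standard library. The column names english and spanish and the load_pairs helper are hypothetical; this README does not specify the CSV layout, so adjust them to match the actual header row.

```python
import csv
import io


def load_pairs(handle, src_col="english", tgt_col="spanish"):
    """Read (source, target) sentence pairs, skipping incomplete rows."""
    pairs = []
    for row in csv.DictReader(handle):
        src = (row.get(src_col) or "").strip()
        tgt = (row.get(tgt_col) or "").strip()
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


# In-memory stand-in for data.csv; the header names are assumed, not confirmed.
sample = io.StringIO("english,spanish\nhello,hola\nthank you,gracias\n,\n")
pairs = load_pairs(sample)
```

With a real file, the same function can be called on an open handle, for example open("data.csv", newline="", encoding="utf-8").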