metadata
title: PyTorch_Transformer_model
emoji: 🌐
colorFrom: blue
colorTo: indigo
sdk: streamlit
sdk_version: 1.35.0
app_file: app.py
pinned: false

🌐 Seq2Seq Transformer English-to-Spanish Translator

An interactive web application deployed on Hugging Face Spaces that demonstrates a Sequence-to-Sequence (Seq2Seq) Transformer model built from scratch using PyTorch and served via Streamlit.

The application automatically builds custom word-level vocabularies from a training dataset, handles variable-length sequence padding, trains a classic multi-head attention Transformer network, and performs auto-regressive decoding for real-time translation inference.
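
The preprocessing and decoding steps described above can be sketched in plain Python. This is a minimal illustration, not the code in app.py: the helper names (`build_vocab`, `encode`, `pad_batch`, `greedy_decode`), the `<PAD>` and `<UNK>` tokens, and the whitespace tokenizer are all assumptions, and `next_token` is a hypothetical stand-in for a model forward pass followed by an argmax.

```python
from collections import Counter

# <SOS>/<EOS> come from the README; <PAD> and <UNK> are assumed extras.
PAD, SOS, EOS, UNK = "<PAD>", "<SOS>", "<EOS>", "<UNK>"

def build_vocab(sentences, min_freq=1):
    """Map each lowercased, whitespace-separated word to an integer index."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    vocab = {tok: i for i, tok in enumerate([PAD, SOS, EOS, UNK])}
    for word, freq in sorted(counts.items()):  # sorted for a reproducible mapping
        if freq >= min_freq:
            vocab[word] = len(vocab)
    return vocab

def encode(sentence, vocab):
    """Wrap a sentence in <SOS>/<EOS> and convert it to indices."""
    ids = [vocab.get(w, vocab[UNK]) for w in sentence.lower().split()]
    return [vocab[SOS]] + ids + [vocab[EOS]]

def pad_batch(batch, pad_id):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]

def greedy_decode(next_token, sos_id, eos_id, max_len=20):
    """Auto-regressive loop: feed the growing output back in until <EOS>."""
    output = [sos_id]
    for _ in range(max_len):
        token = next_token(output)  # hypothetical: model forward pass + argmax
        if token == eos_id:
            break
        output.append(token)
    return output[1:]  # drop the leading <SOS>

src_vocab = build_vocab(["hello world", "good morning world"])
batch = pad_batch(
    [encode("hello world", src_vocab), encode("good morning world", src_vocab)],
    src_vocab[PAD],
)
```

In a PyTorch pipeline the padded lists would typically be converted to a tensor, and the source and target languages would each get their own vocabulary built this way.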


🚀 Features

  • Custom Tokenization Pipeline: Processes raw text data, builds independent source and target vocabularies, and converts sentences into tensor indices with <SOS> and <EOS> boundaries.
  • Transformer from Scratch: Implements sinusoidal positional encodings and wraps the standard PyTorch multi-head attention Transformer (nn.Transformer) with a causal target mask (tgt_mask), so the decoder cannot attend to future tokens during training.
  • On-the-Fly UI Training: Trains the model on the provided dataset the first time the app loads, then caches the trained weights so later requests go straight to inference.
  • Streamlit Web Interface: A clean, user-friendly text field interface allowing anyone to input English sentences and see Spanish translations instantaneously.
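
The positional encoding and causal mask named in the feature list can be illustrated without any dependencies. The sketch below uses plain Python lists so it runs without PyTorch; the function names are hypothetical and the app itself presumably works with tensors. The mask follows the float convention used by `nn.Transformer.generate_square_subsequent_mask`: 0.0 where attention is allowed and -inf where it is blocked.

```python
import math

def sinusoidal_positional_encoding(max_len, d_model):
    """Even columns hold sin(pos / 10000^(i/d_model)), odd columns the matching cos."""
    pe = [[0.0] * d_model for _ in range(max_len)]
    for pos in range(max_len):
        for i in range(0, d_model, 2):  # i is the even column index
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

def causal_mask(size):
    """Row i may attend to columns 0..i; later (future) columns get -inf."""
    return [
        [0.0 if j <= i else float("-inf") for j in range(size)]
        for i in range(size)
    ]

pe = sinusoidal_positional_encoding(max_len=4, d_model=6)
mask = causal_mask(3)
```

The encoding is added to the token embeddings before they enter the Transformer, and a mask of this shape is what would be passed as `tgt_mask` for a target sequence of the same length.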

📂 Project Structure

To run properly on Hugging Face Spaces, ensure your repository contains the following files in the root directory:

├── app.py               # Main Streamlit web application & PyTorch script
├── requirements.txt     # Python runtime dependencies
├── data.csv             # Custom English/Spanish translation dataset
└── README.md            # Project documentation (this file)