---
title: Multilingual Transliteration
emoji: 🌐
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.8.0
app_file: app.py
pinned: false
---

# Multilingual Transliteration Model

This project implements a multilingual transliteration model (English -> Hindi, Bengali, Tamil) using a fine-tuned mT5 model. It focuses on optimization using CTranslate2 for fast inference and provides a Gradio-based web interface.

## Project Structure

- `src/`: Source code for training, optimization, and deployment.
- `data/`: Directory for storing datasets (train/test/val).
- `models/`: Directory for saving trained and optimized models.
- `requirements.txt`: Python dependencies.

## Setup

1. **Clone the repository:**

   ```bash
   git clone
   cd
   ```

2. **Create a virtual environment (optional but recommended):**

   ```bash
   python -m venv venv
   .\venv\Scripts\activate  # Windows
   # source venv/bin/activate  # Linux/Mac
   ```

3. **Install dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

## Usage

### 1. Data Preparation

Generate dummy data for training:

```bash
python src/prepare_data.py
```

### 2. Training

Train the mT5 model:

```bash
python src/train.py
```

### 3. Optimization

Optimize the trained model using CTranslate2 and benchmark:

```bash
python src/optimize.py
```

### 4. Run Demo

Launch the Gradio app:

```bash
python src/app.py
```

## Approach

- **Model:** `google/mt5-small` is used as the base model due to its multilingual capabilities and efficiency.
- **Optimization:** CTranslate2 is used to quantize and optimize the model for faster CPU/GPU inference.
- **Deployment:** Gradio provides a simple and interactive UI for the model.
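
The optimization and inference steps in the Approach above can be sketched roughly as follows. This is a minimal illustration and not necessarily what `src/optimize.py` does. The helper names, the `<hi>`-style language tag prefix, and the `int8` quantization setting are all assumptions made for the example. The CTranslate2 calls follow the pattern its documentation shows for T5-family models.

```python
# Hypothetical language tags: the README does not say how the target
# language is signalled to the model, so this prefix scheme is an assumption.
LANG_TAGS = {"Hindi": "hi", "Bengali": "bn", "Tamil": "ta"}


def build_prompt(word: str, language: str) -> str:
    """Prefix the input word with an (assumed) target-language tag."""
    return f"<{LANG_TAGS[language]}> {word.strip()}"


def convert_to_ct2(model_dir: str, output_dir: str, quantization: str = "int8") -> str:
    """Convert a fine-tuned Hugging Face checkpoint to CTranslate2 format."""
    # Imported lazily so build_prompt works without CTranslate2 installed.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(model_dir)
    # force=True overwrites output_dir if it already exists.
    return converter.convert(output_dir, quantization=quantization, force=True)


def transliterate(word: str, language: str, ct2_dir: str,
                  tokenizer_dir: str, device: str = "cpu") -> str:
    """Run one word through the converted model and decode the best hypothesis."""
    import ctranslate2
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
    translator = ctranslate2.Translator(ct2_dir, device=device)

    # CTranslate2 expects token strings rather than token ids.
    prompt = build_prompt(word, language)
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))

    results = translator.translate_batch([tokens])
    output_tokens = results[0].hypotheses[0]
    output_ids = tokenizer.convert_tokens_to_ids(output_tokens)
    return tokenizer.decode(output_ids, skip_special_tokens=True)
```

With a fine-tuned checkpoint saved under a hypothetical path such as `models/mt5-finetuned`, one would call `convert_to_ct2("models/mt5-finetuned", "models/mt5-ct2")` once, then `transliterate("namaste", "Hindi", "models/mt5-ct2", "models/mt5-finetuned")` for each request. Loading the tokenizer and translator once at startup, rather than on every call as in this sketch, would be the natural choice for the Gradio app.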