---
title: Multilingual Transliteration
emoji: ๐
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.8.0
app_file: app.py
pinned: false
---

# Multilingual Transliteration Model

This project implements a multilingual transliteration model (English -> Hindi, Bengali, Tamil) by fine-tuning mT5. The trained model is converted with CTranslate2 for fast inference, and a Gradio-based web interface is provided for trying it out.

## Project Structure

- `src/`: Source code for training, optimization, and deployment.
- `data/`: Directory for storing datasets (train/test/val).
- `models/`: Directory for saving trained and optimized models.
- `requirements.txt`: Python dependencies.

## Setup

1. **Clone the repository:**
   ```bash
   git clone <repo_url>
   cd <repo_name>
   ```
2. **Create a virtual environment (optional but recommended):**
   ```bash
   python -m venv venv
   .\venv\Scripts\activate      # Windows
   # source venv/bin/activate   # Linux/Mac
   ```
3. **Install dependencies:**
   ```bash
   pip install -r requirements.txt
   ```

## Usage

### 1. Data Preparation

Generate dummy data for training:

```bash
python src/prepare_data.py
```
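
The contents of `src/prepare_data.py` are not shown here. As a minimal sketch of what generating dummy data could look like, assuming a JSON Lines layout with hypothetical `source`, `target`, and `lang` fields and a hypothetical `write_dummy_split` helper:

```python
import json
from pathlib import Path

# Hypothetical toy pairs; real training would need a proper transliteration corpus.
DUMMY_PAIRS = {
    "hi": [("namaste", "नमस्ते"), ("bharat", "भारत")],
    "bn": [("bangla", "বাংলা"), ("kolkata", "কলকাতা")],
    "ta": [("tamil", "தமிழ்"), ("chennai", "சென்னை")],
}


def write_dummy_split(path):
    """Write one JSON object per line and return the number of records written."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    count = 0
    with path.open("w", encoding="utf-8") as f:
        for lang, pairs in DUMMY_PAIRS.items():
            for source, target in pairs:
                record = {"source": source, "target": target, "lang": lang}
                # ensure_ascii=False keeps the native scripts readable in the file.
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
                count += 1
    return count
```

Calling `write_dummy_split("data/train.jsonl")` would create the `data/` directory if needed; the file name is an assumption, not necessarily the one the project uses.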

### 2. Training

Train the mT5 model:

```bash
python src/train.py
```
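
mT5 is a text-to-text model, so one common way to train a single model for several target languages is to state the target language in the input text. The sketch below assumes a hypothetical prompt format and a hypothetical `build_example` helper; the format actually used by `src/train.py` is not shown here.

```python
# Hypothetical mapping from language code to the name used in the prompt.
LANG_NAMES = {"hi": "Hindi", "bn": "Bengali", "ta": "Tamil"}


def build_example(source, target, lang):
    """Turn one transliteration pair into an (input_text, target_text) pair."""
    if lang not in LANG_NAMES:
        raise ValueError(f"unsupported language code: {lang!r}")
    # Normalise the Latin-script input; the target script is left untouched.
    cleaned = source.strip().lower()
    input_text = f"transliterate English to {LANG_NAMES[lang]}: {cleaned}"
    return input_text, target.strip()
```

The resulting strings would then be tokenized and fed to a sequence-to-sequence training loop.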

### 3. Optimization

Convert the trained model to CTranslate2 format and benchmark it:

```bash
python src/optimize.py
```
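
As a rough sketch of what this step might involve, the code below converts a Hugging Face checkpoint with CTranslate2's `TransformersConverter` and times an arbitrary inference callable. The function names, the `int8` default, and the timing strategy are assumptions for illustration, not the contents of `src/optimize.py`.

```python
import time


def convert_to_ct2(model_dir, output_dir, quantization="int8"):
    """Convert a fine-tuned Hugging Face checkpoint to CTranslate2 format."""
    # Imported lazily so the timing helper below works without ctranslate2.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(model_dir)
    # force=True overwrites an existing output directory.
    return converter.convert(output_dir, quantization=quantization, force=True)


def benchmark(fn, inputs, repeats=3):
    """Return the best average latency (seconds per input) over `repeats` passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for item in inputs:
            fn(item)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(inputs))
    return best
```

Running `benchmark` once with the original model's inference function and once with the converted model's, on the same inputs, gives a like-for-like latency comparison.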

### 4. Run Demo

Launch the Gradio app:

```bash
python src/app.py
```
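
For orientation, a Gradio front end for this kind of model can be quite small. The sketch below is not the project's `app.py`: `make_handler`, `build_demo`, and the `backend` callable (any function taking the text and the target language and returning a string) are hypothetical names.

```python
LANGUAGES = ["Hindi", "Bengali", "Tamil"]


def make_handler(backend):
    """Wrap a backend callable (text, language) -> str with basic input checks."""

    def handler(text, language):
        text = (text or "").strip()
        if not text:
            return ""
        if language not in LANGUAGES:
            return f"Unsupported language: {language}"
        return backend(text, language)

    return handler


def build_demo(backend):
    """Build the Gradio UI; call .launch() on the result to serve it."""
    import gradio as gr  # imported lazily so make_handler has no dependencies

    return gr.Interface(
        fn=make_handler(backend),
        inputs=[
            gr.Textbox(label="English text"),
            gr.Dropdown(choices=LANGUAGES, value="Hindi", label="Target language"),
        ],
        outputs=gr.Textbox(label="Transliteration"),
        title="Multilingual Transliteration",
    )
```

Keeping the input checks in `make_handler` separate from the UI makes them easy to exercise with a stub backend before a model is loaded.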

## Approach

- **Model:** `google/mt5-small` is used as the base model because it is multilingual and small enough to train and serve efficiently.
- **Optimization:** CTranslate2 is used to quantize and optimize the model for faster CPU/GPU inference.
- **Deployment:** Gradio provides a simple, interactive UI for the model.