---
title: Multilingual Transliteration
emoji: 🌐
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.8.0
app_file: app.py
pinned: false
---
# Multilingual Transliteration Model
This project implements a multilingual transliteration model (English -> Hindi, Bengali, Tamil) by fine-tuning mT5. The trained model is converted with CTranslate2 for fast inference and served through a Gradio-based web interface.
## Project Structure
- `src/`: Source code for training, optimization, and deployment.
- `data/`: Directory for storing datasets (train/test/val).
- `models/`: Directory for saving trained and optimized models.
- `requirements.txt`: Python dependencies.
## Setup
1. **Clone the repository:**
```bash
git clone <repo_url>
cd <repo_name>
```
2. **Create a virtual environment (optional but recommended):**
```bash
python -m venv venv
.\venv\Scripts\activate # Windows
# source venv/bin/activate # Linux/Mac
```
3. **Install dependencies:**
```bash
pip install -r requirements.txt
```
## Usage
### 1. Data Preparation
Generate dummy data for training:
```bash
python src/prepare_data.py
```
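The contents of `src/prepare_data.py` are not shown in this README. As a rough illustration of what a dummy-data step could look like, the sketch below builds a few tagged word pairs and writes them to `train`/`val`/`test` files. The JSONL layout, the `<hi>`-style language tag, the file names, and the sample pairs are all assumptions made for this example, not the format the script actually uses.

```python
import json
import random
from pathlib import Path

# Hypothetical sample pairs: (English word, target language code, transliteration).
SAMPLE_PAIRS = [
    ("namaste", "hi", "नमस्ते"),
    ("bharat", "hi", "भारत"),
    ("kolkata", "bn", "কলকাতা"),
    ("bangla", "bn", "বাংলা"),
    ("chennai", "ta", "சென்னை"),
    ("tamil", "ta", "தமிழ்"),
]


def make_records(pairs):
    """Turn (source, lang, target) tuples into tagged seq2seq records."""
    return [{"input": f"<{lang}> {src}", "target": tgt} for src, lang, tgt in pairs]


def split_records(records, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle deterministically, then split into train/val/test lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_frac))
    n_test = max(1, int(len(shuffled) * test_frac))
    return {
        "val": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }


def write_splits(splits, out_dir):
    """Write one JSON object per line to <out_dir>/<split>.jsonl."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        with open(out_dir / f"{name}.jsonl", "w", encoding="utf-8") as f:
            for row in rows:
                # ensure_ascii=False keeps Indic scripts readable in the file.
                f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Under these assumptions, `write_splits(split_records(make_records(SAMPLE_PAIRS)), "data")` would populate the `data/` directory with three small files.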
### 2. Training
Train the mT5 model:
```bash
python src/train.py
```
### 3. Optimization
Optimize the trained model using CTranslate2 and benchmark:
```bash
python src/optimize.py
```
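For readers who want a sense of what this step might involve, here is a minimal sketch that converts a fine-tuned checkpoint with CTranslate2's `TransformersConverter` and times an arbitrary inference function. It is not the code in `src/optimize.py`: the function names, the paths passed in, and the `int8` quantization setting are assumptions for illustration.

```python
import time


def convert_to_ctranslate2(model_dir, output_dir, quantization="int8"):
    """Convert a Hugging Face checkpoint to CTranslate2 format.

    model_dir and output_dir are hypothetical paths; int8 is an assumed choice.
    """
    # Imported lazily so the timing helper below works without ctranslate2 installed.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(model_dir)
    # force=True overwrites output_dir if it already exists.
    return converter.convert(output_dir, quantization=quantization, force=True)


def benchmark(fn, inputs, warmup=1, repeats=5):
    """Return the mean wall-clock seconds per call of fn(inputs)."""
    for _ in range(warmup):
        fn(inputs)  # warm-up calls are excluded from the timing
    start = time.perf_counter()
    for _ in range(repeats):
        fn(inputs)
    return (time.perf_counter() - start) / repeats
```

Running `benchmark` once on the original PyTorch model's inference function and once on the converted model's, with the same list of inputs, gives a like-for-like latency comparison.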
### 4. Run Demo
Launch the Gradio app:
```bash
python src/app.py
```
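As a rough picture of how such an app can be wired up, the sketch below builds a two-input Gradio `Interface`. The `transliterate` function here is only a placeholder that echoes the tagged input; in the real app it would call the optimized model. The function names, labels, and language codes are assumptions, not taken from `src/app.py`.

```python
# Display name -> hypothetical language code used to tag the input.
LANGUAGES = {"Hindi": "hi", "Bengali": "bn", "Tamil": "ta"}


def transliterate(text, language):
    """Placeholder: a real app would run the optimized model here."""
    tag = LANGUAGES[language]
    return f"<{tag}> {text.strip()}"


def build_demo():
    """Build the Gradio interface (gradio is imported lazily)."""
    import gradio as gr

    return gr.Interface(
        fn=transliterate,
        inputs=[
            gr.Textbox(label="English text"),
            gr.Dropdown(choices=list(LANGUAGES), value="Hindi", label="Target language"),
        ],
        outputs=gr.Textbox(label="Transliteration"),
        title="Multilingual Transliteration",
    )
```

Calling `build_demo().launch()` would start the local web server.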
## Approach
- **Model:** `google/mt5-small` is used as the base model because it is pretrained on multilingual text and is the smallest mT5 checkpoint, which keeps training and inference cheap.
- **Optimization:** CTranslate2 is used to quantize and optimize the model for faster CPU/GPU inference.
- **Deployment:** Gradio provides a simple and interactive UI for the model.
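
The three pieces above meet at inference time. The sketch below shows one way a converted model could be queried, following the tokenize / `translate_batch` / decode pattern that CTranslate2 documents for T5-style models. The function name and the `<lang>` prompt tag are hypothetical, and the translator and tokenizer are passed in rather than constructed, so nothing here reflects the repository's actual code.

```python
def transliterate_ct2(text, lang, translator, tokenizer):
    """Run one input through a CTranslate2 translator and decode the best hypothesis.

    translator: e.g. a ctranslate2.Translator loaded from the converted model.
    tokenizer:  e.g. the Hugging Face tokenizer of the base checkpoint.
    """
    prompt = f"<{lang}> {text}"  # hypothetical tagging scheme
    # CTranslate2 expects token strings, not token ids.
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
    results = translator.translate_batch([tokens])
    best = results[0].hypotheses[0]  # hypotheses are ordered best-first
    ids = tokenizer.convert_tokens_to_ids(best)
    return tokenizer.decode(ids, skip_special_tokens=True)
```

In practice the arguments might be `ctranslate2.Translator(<converted_model_dir>, device="cpu")` and `transformers.AutoTokenizer.from_pretrained("google/mt5-small")`, with the directory being wherever the optimization step saved its output.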