---
title: Multilingual Transliteration
emoji: 🌐
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.8.0
app_file: app.py
pinned: false
---

# Multilingual Transliteration Model

This project implements multilingual transliteration (English to Hindi, Bengali, and Tamil) with a fine-tuned mT5 model. The trained model is converted to CTranslate2 for fast inference and served through a Gradio-based web interface.

## Project Structure
- `src/`: Source code for training, optimization, and deployment.
- `data/`: Directory for storing datasets (train/test/val).
- `models/`: Directory for saving trained and optimized models.
- `requirements.txt`: Python dependencies.

## Setup

1.  **Clone the repository:**
    ```bash
    git clone <repo_url>
    cd <repo_name>
    ```

2.  **Create a virtual environment (optional but recommended):**
    ```bash
    python -m venv venv
    .\venv\Scripts\activate       # Windows
    # source venv/bin/activate    # Linux/macOS
    ```

3.  **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```

## Usage

### 1. Data Preparation
Generate dummy data for training:
```bash
python src/prepare_data.py
```
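
For orientation, here is a minimal sketch of what dummy-data generation could look like. It is not the contents of `src/prepare_data.py`. The seed pairs, the `<lang>` prefix on the input, the JSONL layout, and the helper names are all assumptions made for illustration.

```python
import json
import random
from pathlib import Path

# Hypothetical seed pairs: (language tag, romanized input, native-script target).
SEED_PAIRS = [
    ("hi", "namaste", "नमस्ते"),
    ("hi", "bharat", "भारत"),
    ("bn", "dhonnobad", "ধন্যবাদ"),
    ("bn", "bangla", "বাংলা"),
    ("ta", "vanakkam", "வணக்கம்"),
    ("ta", "tamil", "தமிழ்"),
]


def make_records(pairs):
    """Turn (lang, source, target) tuples into records with a language-tagged input."""
    return [
        {"input": f"<{lang}> {src}", "target": tgt, "lang": lang}
        for lang, src, tgt in pairs
    ]


def split_records(records, val_frac=0.2, test_frac=0.2, seed=0):
    """Shuffle deterministically and split into train/val/test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    return {
        "val": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }


def write_jsonl(records, path):
    """Write one JSON object per line, keeping native scripts readable."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Calling `write_jsonl(split, f"data/{name}.jsonl")` for each entry of `split_records(make_records(SEED_PAIRS))` would produce one file per split under `data/`. The file names are illustrative only.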

### 2. Training
Train the mT5 model:
```bash
python src/train.py
```
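
As a rough illustration of the tokenization step that seq2seq fine-tuning usually needs, the sketch below prepares a batch for an mT5-style model. It assumes a Hugging Face tokenizer and records with `input` and `target` fields. Those field names, the helper name, and the `max_length` default are hypothetical and may differ from what `src/train.py` does.

```python
def preprocess_batch(batch, tokenizer, max_length=64):
    """Tokenize a batch of the form {"input": [...], "target": [...]}.

    Passing `text_target` makes a Hugging Face tokenizer return `labels`
    alongside `input_ids` and `attention_mask`.
    """
    return tokenizer(
        batch["input"],
        text_target=batch["target"],
        max_length=max_length,  # assumed limit; transliterated words are short
        truncation=True,
    )
```

With the `datasets` library this would typically be applied as `dataset.map(lambda b: preprocess_batch(b, tokenizer), batched=True)`, where `tokenizer` comes from `AutoTokenizer.from_pretrained("google/mt5-small")`.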

### 3. Optimization
Convert the trained model to CTranslate2 format and benchmark its inference speed:
```bash
python src/optimize.py
```
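
The two parts of this step can be sketched as follows. This is an illustration under assumptions, not the code in `src/optimize.py`. The conversion uses CTranslate2's `TransformersConverter`. The `int8` quantization default, the function names, and the timing setup are hypothetical choices.

```python
import statistics
import time


def convert_model(model_dir, output_dir, quantization="int8"):
    """Convert a fine-tuned Transformers checkpoint to CTranslate2 format."""
    # Imported lazily so the benchmark helper below works without ctranslate2.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(model_dir)
    return converter.convert(output_dir, quantization=quantization, force=True)


def benchmark(fn, inputs, warmup=2, repeats=5):
    """Time `fn` on each input and report per-call latency in seconds."""
    # Warm-up passes are excluded from the measurements.
    for _ in range(warmup):
        for item in inputs:
            fn(item)

    timings = []
    for _ in range(repeats):
        for item in inputs:
            start = time.perf_counter()
            fn(item)
            timings.append(time.perf_counter() - start)

    return {
        "calls": len(timings),
        "mean_s": statistics.mean(timings),
        "median_s": statistics.median(timings),
    }
```

Running `benchmark` once with the original model's generate call and once with the converted model, on the same inputs, gives a like-for-like latency comparison.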

### 4. Run Demo
Launch the Gradio app:
```bash
python src/app.py
```
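
A minimal sketch of how such a Gradio interface could be wired up is shown below. It is not the contents of `src/app.py`. The language-tag mapping, the labels, and the helper names are assumptions, and `transliterate_fn` stands in for whatever function wraps the model.

```python
# Hypothetical mapping from display names to language tags.
LANGUAGES = {"Hindi": "hi", "Bengali": "bn", "Tamil": "ta"}


def make_handler(transliterate_fn):
    """Wrap a (text, lang_tag) -> str function for use as a Gradio callback."""
    def handler(text, language):
        text = (text or "").strip()
        if not text:
            return ""  # nothing to transliterate
        return transliterate_fn(text, LANGUAGES[language])
    return handler


def build_demo(transliterate_fn):
    """Build a text-in, text-out interface with a target-language dropdown."""
    import gradio as gr  # imported lazily so the handler can be tested without Gradio

    return gr.Interface(
        fn=make_handler(transliterate_fn),
        inputs=[
            gr.Textbox(label="English text"),
            gr.Dropdown(choices=list(LANGUAGES), value="Hindi", label="Target language"),
        ],
        outputs=gr.Textbox(label="Transliteration"),
        title="Multilingual Transliteration",
    )
```

Calling `build_demo(my_fn).launch()` starts the local web server. Keeping the handler separate from the UI makes it possible to test the input handling without starting Gradio.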

## Approach
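- **Model:** `google/mt5-small` is the base model. mT5 is pretrained on multilingual text that includes Hindi, Bengali, and Tamil, and `mt5-small` is the smallest released checkpoint, which keeps training and inference cheap.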
- **Optimization:** CTranslate2 is used to quantize and optimize the model for faster CPU/GPU inference.
- **Deployment:** Gradio provides a simple and interactive UI for the model.
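
To make the inference path concrete, the sketch below follows the usual CTranslate2 pattern for T5-style models: encode text to subword tokens, call `translate_batch`, then decode the best hypothesis. The `<lang>` prompt prefix and the `beam_size` default are assumptions, not details confirmed by this project.

```python
def transliterate(translator, tokenizer, text, lang, beam_size=4):
    """Transliterate `text` with a CTranslate2 translator and a Hugging Face tokenizer."""
    prompt = f"<{lang}> {text}"  # hypothetical prompt format

    # CTranslate2 consumes subword token strings, not token IDs.
    source_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
    results = translator.translate_batch([source_tokens], beam_size=beam_size)

    # Take the highest-scoring hypothesis and map it back to text.
    target_tokens = results[0].hypotheses[0]
    target_ids = tokenizer.convert_tokens_to_ids(target_tokens)
    return tokenizer.decode(target_ids, skip_special_tokens=True)
```

Here `translator` would be a `ctranslate2.Translator` pointed at the converted model directory and `tokenizer` would come from `transformers.AutoTokenizer.from_pretrained("google/mt5-small")`. Passing both in as arguments keeps the function easy to test with stand-in objects.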