---

title: Multilingual Transliteration
emoji: 🌐
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.8.0
app_file: app.py
pinned: false
---


# Multilingual Transliteration Model

This project implements multilingual transliteration (English to Hindi, Bengali, and Tamil) with a fine-tuned mT5 model. The trained model is optimized with CTranslate2 for fast inference and served through a Gradio-based web interface.

## Project Structure
- `src/`: Source code for training, optimization, and deployment.
- `data/`: Directory for storing datasets (train/test/val).
- `models/`: Directory for saving trained and optimized models.
- `requirements.txt`: Python dependencies.

## Setup

1.  **Clone the repository:**
    ```bash
    git clone <repo_url>
    cd <repo_name>
    ```


2.  **Create a virtual environment (optional but recommended):**
    ```bash
    python -m venv venv
    .\venv\Scripts\activate      # Windows
    # source venv/bin/activate   # Linux/Mac
    ```


3.  **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```


## Usage

### 1. Data Preparation
Generate dummy data for training:
```bash
python src/prepare_data.py
```
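
For orientation, here is a minimal sketch of what generating dummy transliteration data could look like. It is not the contents of `src/prepare_data.py`: the toy word pairs, the `"<lang>: "` source prefix, the JSONL layout, and the function names `build_records` and `write_splits` are all assumptions made for illustration.

```python
import json
import random
from pathlib import Path

# Hypothetical toy pairs; a real run would use a proper transliteration corpus.
TOY_PAIRS = {
    "hi": [("namaste", "नमस्ते"), ("bharat", "भारत")],
    "bn": [("dhonnobad", "ধন্যবাদ"), ("bangla", "বাংলা")],
    "ta": [("vanakkam", "வணக்கம்"), ("tamil", "தமிழ்")],
}


def build_records(pairs=TOY_PAIRS):
    """Flatten per-language pairs into records with a language-tagged source."""
    records = []
    for lang, items in pairs.items():
        for source, target in items:
            # The "<lang>: " prefix is an assumed convention that tells a single
            # model which target script to produce.
            records.append(
                {"source": f"{lang}: {source}", "target": target, "lang": lang}
            )
    return records


def write_splits(records, out_dir, val_fraction=0.2, seed=0):
    """Shuffle deterministically and write train/val JSONL files to out_dir."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    splits = {"val": shuffled[:n_val], "train": shuffled[n_val:]}
    for name, rows in splits.items():
        with open(out / f"{name}.jsonl", "w", encoding="utf-8") as f:
            for row in rows:
                # ensure_ascii=False keeps the Indic scripts readable on disk.
                f.write(json.dumps(row, ensure_ascii=False) + "\n")
    return {name: len(rows) for name, rows in splits.items()}
```

Calling `write_splits(build_records(), "data")` would write `data/train.jsonl` and `data/val.jsonl`, one JSON object per line.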

### 2. Training
Train the mT5 model:
```bash
python src/train.py
```

### 3. Optimization
Convert the trained model to the CTranslate2 format and benchmark it:
```bash
python src/optimize.py
```
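
As a rough sketch of this step (not the actual `src/optimize.py`), the conversion can be done with CTranslate2's `TransformersConverter`, and a small timing helper can compare inference functions. The `int8` quantization choice, the function names, and any paths passed in are assumptions.

```python
import time


def convert_to_ctranslate2(model_dir, output_dir, quantization="int8"):
    """Convert a Hugging Face checkpoint directory to the CTranslate2 format."""
    # Imported lazily so the timing helper below works without ctranslate2.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(model_dir)
    # force=True overwrites output_dir if it already exists.
    return converter.convert(output_dir, quantization=quantization, force=True)


def benchmark(fn, inputs, repeats=3):
    """Return the best average seconds per input over several passes."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for item in inputs:
            fn(item)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(inputs))
    return best
```

Running `benchmark` once on the original model's inference function and once on the converted one, with the same inputs, gives a like-for-like latency comparison.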

### 4. Run Demo
Launch the Gradio app:
```bash
python src/app.py
```

## Approach
- **Model:** `google/mt5-small` is used as the base model because it is pretrained on multilingual text and is the smallest mT5 checkpoint, which keeps training and inference cheap.
- **Optimization:** CTranslate2 is used to quantize and optimize the model for faster CPU/GPU inference.
- **Deployment:** Gradio provides a simple and interactive UI for the model.
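
The three pieces above could fit together roughly as follows. This is a sketch under stated assumptions rather than the project's `app.py`: the `"<lang>: word"` prompt format, the language codes in `LANGS`, the helper names, and the model directory argument are hypothetical. The token-level calls follow the pattern CTranslate2 documents for T5-style models.

```python
# Assumed language codes; the project may label targets differently.
LANGS = {"hi": "Hindi", "bn": "Bengali", "ta": "Tamil"}


def make_prompt(word, lang):
    """Build the model input, assuming a '<lang>: <word>' prompt convention."""
    if lang not in LANGS:
        raise ValueError(f"unsupported language code: {lang!r}")
    return f"{lang}: {word.strip().lower()}"


def load_transliterator(ct2_dir, tokenizer_name="google/mt5-small", device="cpu"):
    """Return a transliterate(word, lang) function backed by CTranslate2."""
    # Heavy dependencies are imported lazily, only when a model is loaded.
    import ctranslate2
    from transformers import AutoTokenizer

    translator = ctranslate2.Translator(ct2_dir, device=device)
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)

    def transliterate(word, lang):
        # CTranslate2 works on token strings rather than token ids.
        ids = tokenizer.encode(make_prompt(word, lang))
        tokens = tokenizer.convert_ids_to_tokens(ids)
        results = translator.translate_batch([tokens])
        out_tokens = results[0].hypotheses[0]
        out_ids = tokenizer.convert_tokens_to_ids(out_tokens)
        return tokenizer.decode(out_ids, skip_special_tokens=True)

    return transliterate


def build_demo(transliterate):
    """Wrap a transliterate function in a simple Gradio interface."""
    import gradio as gr

    return gr.Interface(
        fn=transliterate,
        inputs=[
            gr.Textbox(label="English word"),
            gr.Dropdown(choices=list(LANGS), value="hi", label="Target language"),
        ],
        outputs=gr.Textbox(label="Transliteration"),
        title="Multilingual Transliteration",
    )
```

With a converted model on disk, `build_demo(load_transliterator(<ct2_model_dir>)).launch()` would start the interface locally.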