---

title: Multilingual Transliteration
emoji: 🌐
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.8.0
app_file: app.py
pinned: false
---


# Multilingual Transliteration Model

This project implements a multilingual transliteration model (English to Hindi, Bengali, and Tamil) built on a fine-tuned mT5 model. The trained model is converted with CTranslate2 for fast inference and served through a Gradio-based web interface.

## Project Structure
- `src/`: Source code for training, optimization, and deployment.
- `data/`: Directory for storing datasets (train/test/val).
- `models/`: Directory for saving trained and optimized models.
- `requirements.txt`: Python dependencies.

## Setup

1.  **Clone the repository:**
    ```bash
    git clone <repo_url>
    cd <repo_name>
    ```


2.  **Create a virtual environment (optional but recommended):**
    ```bash
    python -m venv venv
    .\venv\Scripts\activate      # Windows
    # source venv/bin/activate   # Linux/macOS
    ```


3.  **Install dependencies:**
    ```bash
    pip install -r requirements.txt
    ```


## Usage

### 1. Data Preparation
Generate dummy data for training:
```bash
python src/prepare_data.py
```
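
The contents of `src/prepare_data.py` are not shown in this README, so the following is only a minimal sketch of what generating dummy data could look like. The `transliterate to <language>:` task prefix, the `source`/`target` column names, and the seed word pairs are all assumptions made for illustration.

```python
import csv
import random
from pathlib import Path

# Hypothetical seed pairs for illustration; a real dataset would be far larger.
SEED_PAIRS = {
    "hindi": [("india", "इंडिया"), ("namaste", "नमस्ते")],
    "bengali": [("india", "ইন্ডিয়া"), ("namaste", "নমস্তে")],
    "tamil": [("india", "இந்தியா"), ("namaste", "நமஸ்தே")],
}


def build_rows(pairs=SEED_PAIRS):
    """Turn (english, native) pairs into prefixed seq2seq examples."""
    rows = []
    for language, items in pairs.items():
        for english, native in items:
            # Assumed prompt format: the target language is named in a text prefix.
            rows.append({
                "source": f"transliterate to {language}: {english}",
                "target": native,
            })
    return rows


def split_rows(rows, val_fraction=0.2, seed=0):
    """Shuffle deterministically and return (train, val) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def write_tsv(rows, path):
    """Write rows to a tab-separated file with a header line."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["source", "target"], delimiter="\t")
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_tsv(train, "data/train.tsv")` and `write_tsv(val, "data/val.tsv")` on the output of `split_rows(build_rows())` would place the splits under `data/`; those file names are likewise assumed.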

### 2. Training
Train the mT5 model:
```bash
python src/train.py
```

### 3. Optimization
Optimize the trained model using CTranslate2 and benchmark:
```bash
python src/optimize.py
```
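
As a rough illustration of this step, the sketch below converts a fine-tuned Hugging Face checkpoint with CTranslate2's `TransformersConverter` and times an arbitrary inference function. It is not the content of `src/optimize.py`: the function names, the `int8` quantization choice, and the timing method are assumptions.

```python
import time


def convert_to_ctranslate2(model_dir, output_dir, quantization="int8"):
    """Convert a Hugging Face checkpoint directory to CTranslate2 format."""
    # Imported lazily so the timing helper below works without ctranslate2 installed.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(model_dir)
    # force=True overwrites an existing output directory.
    return converter.convert(output_dir, quantization=quantization, force=True)


def benchmark(fn, inputs, repeats=3):
    """Return the best average seconds-per-input over `repeats` passes."""
    if not inputs:
        raise ValueError("inputs must not be empty")
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        for item in inputs:
            fn(item)
        elapsed = time.perf_counter() - start
        best = min(best, elapsed / len(inputs))
    return best
```

Running `benchmark` once on the original model's inference function and once on the converted model's, with the same inputs, gives a simple like-for-like latency comparison.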

### 4. Run Demo
Launch the Gradio app:
```bash
python src/app.py
```
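
For orientation, a Gradio interface for this task might be wired up roughly as follows. This is a sketch, not the project's `app.py`: the `make_handler` and `build_demo` names, the widget labels, and the input validation are assumptions, and the `transliterate` callable stands in for whatever inference function the app actually uses.

```python
LANGUAGES = ["Hindi", "Bengali", "Tamil"]


def make_handler(transliterate):
    """Wrap an inference callable with basic input checks for the UI."""
    def handler(text, language):
        text = (text or "").strip()
        if not text:
            return ""  # nothing to transliterate
        if language not in LANGUAGES:
            raise ValueError(f"Unsupported language: {language}")
        return transliterate(text, language)
    return handler


def build_demo(transliterate):
    """Build a Gradio Interface with a text box and a language dropdown."""
    # Imported lazily so the handler can be tested without gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=make_handler(transliterate),
        inputs=[
            gr.Textbox(label="English text"),
            gr.Dropdown(choices=LANGUAGES, value="Hindi", label="Target language"),
        ],
        outputs=gr.Textbox(label="Transliteration"),
        title="Multilingual Transliteration",
    )
```

With a real inference function in hand, `build_demo(my_transliterate).launch()` would start the local web server.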

## Approach
- **Model:** `google/mt5-small` is used as the base model due to its multilingual capabilities and efficiency.
- **Optimization:** CTranslate2 is used to quantize and optimize the model for faster CPU/GPU inference.
- **Deployment:** Gradio provides a simple and interactive UI for the model.
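
Putting these pieces together, inference with a converted model could look like the sketch below, which follows the usual CTranslate2 pattern for T5-style models (tokenize, `translate_batch`, decode). The prompt format, the function names, and the use of the base `google/mt5-small` tokenizer are assumptions; the project's own scripts may differ.

```python
def build_prompt(text, language):
    """Assumed prompt format: the target language is named in a text prefix."""
    return f"transliterate to {language.lower()}: {text.strip()}"


def load_pipeline(ct2_dir, tokenizer_name="google/mt5-small", device="cpu"):
    """Load a converted CTranslate2 model and a matching tokenizer."""
    # Imported lazily so build_prompt can be used without these packages.
    import ctranslate2
    from transformers import AutoTokenizer

    translator = ctranslate2.Translator(ct2_dir, device=device)
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    return translator, tokenizer


def transliterate(text, language, translator, tokenizer, beam_size=4):
    """Run one input through the converted model and decode the best hypothesis."""
    prompt = build_prompt(text, language)
    # CTranslate2 works on token strings rather than token ids.
    source_tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
    result = translator.translate_batch([source_tokens], beam_size=beam_size)[0]
    target_ids = tokenizer.convert_tokens_to_ids(result.hypotheses[0])
    return tokenizer.decode(target_ids, skip_special_tokens=True)
```

Whatever prefix is used at inference time has to match the one used during fine-tuning, since the model only learns the language cue it was trained on.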