title: Multilingual Transliteration
emoji: 🌐
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 5.8.0
app_file: app.py
pinned: false

Multilingual Transliteration Model

This project implements multilingual transliteration from English into Hindi, Bengali, and Tamil using a fine-tuned mT5 model. The trained model is converted with CTranslate2 for fast inference and served through a Gradio-based web interface.

Project Structure

  • src/: Source code for training, optimization, and deployment.
  • data/: Directory for storing datasets (train/test/val).
  • models/: Directory for saving trained and optimized models.
  • requirements.txt: Python dependencies.

Setup

  1. Clone the repository:

    git clone <repo_url>
    cd <repo_name>
    
  2. Create a virtual environment (optional but recommended):

    python -m venv venv
    .\venv\Scripts\activate        # Windows
    # source venv/bin/activate     # Linux/macOS: run this instead of the line above
    
  3. Install dependencies:

    pip install -r requirements.txt
    

Usage

1. Data Preparation

Generate dummy data for training:

python src/prepare_data.py
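
For orientation, here is a minimal sketch of what generating dummy data could look like. It is not the contents of src/prepare_data.py: the sample pairs, the `transliterate to <language>:` prompt prefix, the JSONL layout, and the function names are all assumptions made for illustration.

```python
import json
import random
from pathlib import Path

# Hypothetical sample pairs (romanized source, native-script target).
SAMPLES = {
    "hindi": [("namaste", "नमस्ते"), ("bharat", "भारत")],
    "bengali": [("dhonnobad", "ধন্যবাদ"), ("bangla", "বাংলা")],
    "tamil": [("vanakkam", "வணக்கம்"), ("tamil", "தமிழ்")],
}


def build_records(samples):
    """Flatten the samples into records whose input names the target language."""
    records = []
    for lang, pairs in samples.items():
        for source, target in pairs:
            # Assumed prompt format: a text prefix selects the target language.
            records.append(
                {"input": f"transliterate to {lang}: {source}", "target": target}
            )
    return records


def split_records(records, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle deterministically and split into train/val/test lists."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    # Keep at least one example in each held-out split.
    n_val = max(1, int(len(shuffled) * val_frac))
    n_test = max(1, int(len(shuffled) * test_frac))
    return {
        "val": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }


def write_splits(splits, out_dir):
    """Write one JSONL file per split, keeping non-ASCII text readable."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        with open(out / f"{name}.jsonl", "w", encoding="utf-8") as f:
            for row in rows:
                f.write(json.dumps(row, ensure_ascii=False) + "\n")
```

Calling `write_splits(split_records(build_records(SAMPLES)), "data")` would write train.jsonl, val.jsonl, and test.jsonl under data/. A single model can serve all three languages this way because the target language is part of the input text.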

2. Training

Train the mT5 model:

python src/train.py

3. Optimization

Optimize the trained model using CTranslate2 and benchmark:

python src/optimize.py
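
The benchmarking half of this step can be sketched as a small timing harness. This is an illustration under assumptions, not the logic in src/optimize.py: `benchmark` and `speedup` are hypothetical helpers, and `fn` stands for any callable that transliterates one input string (for example, the original model or the CTranslate2 version).

```python
import statistics
import time


def benchmark(fn, inputs, warmup=1, repeats=5):
    """Return per-input latency statistics (milliseconds) for fn over inputs."""
    # Warm-up passes are not timed; they absorb one-off costs such as lazy loading.
    for _ in range(warmup):
        for text in inputs:
            fn(text)

    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for text in inputs:
            fn(text)
        elapsed_ms = (time.perf_counter() - start) * 1000
        timings.append(elapsed_ms / len(inputs))

    return {
        "mean_ms": statistics.mean(timings),
        "median_ms": statistics.median(timings),
        "min_ms": min(timings),
    }


def speedup(baseline, optimized):
    """Ratio of mean latencies; values above 1 mean the optimized model is faster."""
    return baseline["mean_ms"] / optimized["mean_ms"]
```

Running both models over the same inputs and passing the two result dictionaries to `speedup` gives a like-for-like comparison.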

4. Run Demo

Launch the Gradio app:

python src/app.py
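
As a rough picture of the interface, the sketch below wires a text box and a language selector to a function with Gradio's `Interface`. It is not the project's app.py: the `transliterate` function here is a placeholder that only echoes its input, and the labels and function names are assumptions.

```python
LANGUAGES = ["Hindi", "Bengali", "Tamil"]


def transliterate(text, language):
    """Placeholder for model inference; a real app would call the optimized model."""
    if language not in LANGUAGES:
        raise ValueError(f"Unsupported language: {language}")
    return f"[{language}] {text.strip()}"


def build_demo():
    """Build the Gradio interface without launching it."""
    # Imported lazily so the placeholder above can be used without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=transliterate,
        inputs=[
            gr.Textbox(label="English text"),
            gr.Dropdown(choices=LANGUAGES, value="Hindi", label="Target language"),
        ],
        outputs=gr.Textbox(label="Transliteration"),
        title="Multilingual Transliteration",
    )
```

Calling `build_demo().launch()` starts the local web server.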

Approach

  • Model: google/mt5-small is used as the base model due to its multilingual capabilities and efficiency.
  • Optimization: CTranslate2 is used to quantize and optimize the model for faster CPU/GPU inference.
  • Deployment: Gradio provides a simple and interactive UI for the model.
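
The optimization step described above can be sketched with the CTranslate2 Python API. This is a sketch under assumptions rather than the project's code: the directory arguments, the int8 quantization setting, the `transliterate to <language>:` prompt prefix, and all function names are hypothetical. The tokenize, translate, and decode sequence follows the pattern CTranslate2 documents for T5-style models.

```python
def build_prompt(text, language):
    """Assumed prompt format: a text prefix names the target language."""
    return f"transliterate to {language.lower()}: {text}"


def convert_model(model_dir, output_dir, quantization="int8"):
    """Convert a fine-tuned Transformers checkpoint to CTranslate2 format."""
    from ctranslate2.converters import TransformersConverter

    # force=True overwrites output_dir if it already exists.
    TransformersConverter(model_dir).convert(
        output_dir, quantization=quantization, force=True
    )


def load(ct2_dir, tokenizer_dir, device="cpu"):
    """Load the converted model and the tokenizer of the original checkpoint."""
    import ctranslate2
    from transformers import AutoTokenizer

    translator = ctranslate2.Translator(ct2_dir, device=device)
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)
    return translator, tokenizer


def transliterate(translator, tokenizer, text, language):
    """Run one input through the CTranslate2 model and decode the best hypothesis."""
    prompt = build_prompt(text, language)
    # CTranslate2 consumes token strings rather than token ids.
    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(prompt))
    results = translator.translate_batch([tokens])
    output_tokens = results[0].hypotheses[0]
    output_ids = tokenizer.convert_tokens_to_ids(output_tokens)
    return tokenizer.decode(output_ids, skip_special_tokens=True)
```

Quantizing to int8 trades a small amount of numerical precision for lower memory use and faster CPU inference. Whether that affects transliteration quality should be checked on the test split.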