Project / README.md

Abhishek11k

Upload 31 files

724838e verified 9 days ago

	---
	title: Multilingual Transliteration
	emoji: 🌐
	colorFrom: blue
	colorTo: purple
	sdk: gradio
	sdk_version: 5.8.0
	app_file: app.py
	pinned: false
	---

	# Multilingual Transliteration Model

	This project implements multilingual transliteration from English into Hindi, Bengali, and Tamil using a fine-tuned mT5 model. The trained model is optimized with CTranslate2 for fast inference and served through a Gradio web interface.

	## Project Structure
	- `src/`: Source code for training, optimization, and deployment.
	- `data/`: Directory for storing datasets (train/test/val).
	- `models/`: Directory for saving trained and optimized models.
	- `requirements.txt`: Python dependencies.

	## Setup

	1. Clone the repository:
	```bash
	git clone <repo_url>
	cd <repo_name>
	```

	2. Create a virtual environment (optional but recommended):
	```bash
	python -m venv venv
	.\venv\Scripts\activate # Windows
	# source venv/bin/activate # Linux/Mac
	```

	3. Install dependencies:
	```bash
	pip install -r requirements.txt
	```

	## Usage

	### 1. Data Preparation
	Generate dummy data for training:
	```bash
	python src/prepare_data.py
	```
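
As a rough illustration of what a dummy-data step could produce, the sketch below shuffles a handful of toy pairs and writes train/val/test files as JSON Lines. The record fields (`source`, `lang`, `target`), the split fractions, the file names, and the toy pairs themselves are assumptions made for this example; the actual `src/prepare_data.py` may use a different format.

```python
import json
import random
from pathlib import Path

# Illustrative toy pairs (English, target language, target-script spelling).
# These are assumptions for the sketch, not the project's dataset.
TOY_PAIRS = [
    ("namaste", "hindi", "नमस्ते"),
    ("namaste", "bengali", "নমস্তে"),
    ("namaste", "tamil", "நமஸ்தே"),
    ("india", "hindi", "इंडिया"),
    ("india", "bengali", "ইন্ডিয়া"),
    ("india", "tamil", "இந்தியா"),
]


def split_records(records, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle deterministically and split into train/val/test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_frac)
    n_test = int(len(shuffled) * test_frac)
    return {
        "val": shuffled[:n_val],
        "test": shuffled[n_val:n_val + n_test],
        "train": shuffled[n_val + n_test:],
    }


def write_splits(splits, out_dir):
    """Write each split as UTF-8 JSON Lines, one record per line."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    for name, rows in splits.items():
        with open(out_dir / f"{name}.jsonl", "w", encoding="utf-8") as f:
            for source, lang, target in rows:
                record = {"source": source, "lang": lang, "target": target}
                # ensure_ascii=False keeps Indic scripts readable in the file.
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

Calling `write_splits(split_records(TOY_PAIRS * 10), "data")` would then fill `data/` with `train.jsonl`, `val.jsonl`, and `test.jsonl`. Repeating the toy pairs only makes every split non-empty; the same records then appear in more than one split, so it says nothing about model quality.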

	### 2. Training
	Train the mT5 model:
	```bash
	python src/train.py
	```
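
A single seq2seq model can serve several target languages if each input states which language is wanted. One common way to do this is a text prefix, sketched below. The prefix wording, the `build_example` helper, and the field names are hypothetical; the README does not say how `src/train.py` formats its inputs.

```python
# Target languages named in this README; lowercase keys are an assumption.
LANGUAGES = ("hindi", "bengali", "tamil")


def build_example(source, lang, target=None):
    """Return the text pair for one record, with the language as a prefix."""
    if lang not in LANGUAGES:
        raise ValueError(f"unsupported language: {lang!r}")
    example = {"input_text": f"transliterate to {lang}: {source.strip()}"}
    if target is not None:
        # The target is only present for training and evaluation records.
        example["target_text"] = target.strip()
    return example
```

The same helper can be reused at inference time by omitting `target`, which keeps the prompt format identical between training and serving.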

	### 3. Optimization
	Convert the trained model to CTranslate2 and benchmark it:
	```bash
	python src/optimize.py
	```
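
The conversion and benchmarking step might look roughly like the sketch below. It uses CTranslate2's `TransformersConverter` with int8 quantization plus a small stdlib timing helper. The quantization type, the function names, and any paths passed in are assumptions; the actual `src/optimize.py` may differ.

```python
import statistics
import time


def convert_to_ct2(model_dir, output_dir, quantization="int8"):
    """Convert a Hugging Face checkpoint to the CTranslate2 format."""
    # Imported lazily so the benchmark helper works without ctranslate2.
    from ctranslate2.converters import TransformersConverter

    converter = TransformersConverter(model_dir)
    # force=True overwrites an existing output directory.
    return converter.convert(output_dir, quantization=quantization, force=True)


def benchmark(fn, inputs, repeats=5):
    """Return the median wall-clock seconds of fn(inputs) over `repeats` runs."""
    fn(inputs)  # warm-up call, excluded from the timings
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(inputs)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)
```

Passing the original model's generate call and the converted model's translate call to `benchmark` with the same inputs gives a like-for-like latency comparison. The median is used here because it is less sensitive to a single slow run than the mean.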

	### 4. Run Demo
	Launch the Gradio app:
	```bash
	python src/app.py
	```
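
A minimal sketch of how such a Gradio app could be wired is shown below. It assumes the model is exposed as a callable `transliterate_fn(text, language)` returning a string; that callable, the `make_handler` and `build_demo` helpers, and the labels are hypothetical and not taken from the project's `app.py`.

```python
# Target languages named in this README.
LANGUAGES = ["Hindi", "Bengali", "Tamil"]


def make_handler(transliterate_fn):
    """Wrap a backend callable (text, language) -> str with input checks."""
    def handler(text, language):
        text = (text or "").strip()
        if not text:
            return ""
        if language not in LANGUAGES:
            return f"Unsupported language: {language}"
        return transliterate_fn(text, language)
    return handler


def build_demo(transliterate_fn):
    """Build the Gradio interface; gradio is imported lazily."""
    import gradio as gr

    return gr.Interface(
        fn=make_handler(transliterate_fn),
        inputs=[
            gr.Textbox(label="English text"),
            gr.Dropdown(choices=LANGUAGES, value="Hindi", label="Target language"),
        ],
        outputs=gr.Textbox(label="Transliteration"),
        title="Multilingual Transliteration",
    )
```

With a real backend in place, `build_demo(backend).launch()` starts the web UI. Keeping the handler separate from the interface makes the input checks testable without starting Gradio.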

	## Approach
	- Model: `google/mt5-small` is the base model. It is pretrained on multilingual text and is the smallest mT5 checkpoint, which keeps fine-tuning and inference cheap.
	- Optimization: the fine-tuned model is converted to the CTranslate2 format and quantized for faster CPU/GPU inference.
	- Deployment: a Gradio app provides a simple, interactive UI for the model.