| title: NMT demo | |
| emoji: 👌 | |
| colorFrom: red | |
| colorTo: blue | |
| sdk: gradio | |
| sdk_version: "5.19.0" | |
| app_file: app.py | |
| pinned: false | |
| Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference | |
| # Neural Machine Translation for English-Hindi | |
| This project implements a Neural Machine Translation (NMT) system for English-to-Hindi translation. It uses a MarianMT model fine-tuned on a 100k split of the Samanantar dataset and exposes it through a user-friendly Gradio interface. | |
|  | |
| ## Features | |
| - Unidirectional translation from English to Hindi | |
| - User-friendly web interface built with Gradio | |
| - Example translations included | |
| - Built on Helsinki-NLP's MarianMT model | |
| ## Installation | |
| ### Local Setup with Virtual Environment | |
| 1. Clone the repository: | |
| ```bash | |
| git clone https://github.com/yourusername/NLPA_Assignment_2_Group_54.git | |
| cd NLPA_Assignment_2_Group_54 | |
| ``` | |
| 2. Create and activate a virtual environment: | |
| ```bash | |
| python -m venv venv | |
| source venv/bin/activate # On Windows, use: venv\Scripts\activate | |
| ``` | |
| 3. Install the required packages: | |
| ```bash | |
| pip install -r requirements.txt | |
| ``` | |
| ## Usage | |
| 1. Make sure your virtual environment is activated | |
| 2. Run the UI: | |
| ```bash | |
| python nmt_ui.py | |
| ``` | |
| 3. Open your browser and navigate to `http://localhost:7860` | |
| ## Supported Language Pairs | |
| - English -> Hindi (using the `rooftopcoder/opus-mt-en-hi-samanantar-100k` model) | |
| ## Training the Model | |
| The `train.py` script is used to train the MarianMT model on the Samanantar dataset. The script performs the following steps: | |
| - Loads the Samanantar dataset (English-Hindi subset). | |
| - Splits the dataset into training and validation sets. | |
| - Tokenizes the dataset. | |
| - Sets up training arguments optimized for GPU. | |
| - Trains the model using the Hugging Face `Trainer` class. | |
| - Saves the trained model to the specified directory. | |
| - Uploads the trained model to the Hugging Face Hub. | |
| To train the model, run: | |
| ```bash | |
| python train.py | |
| ``` | |
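The split and tokenization steps listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the contents of `train.py`: the helper names `split_pairs` and `tokenize_pairs`, the 10% validation fraction, the seed, and the 128-token limit are all hypothetical. The `tokenizer` argument is assumed to be a Hugging Face tokenizer (for example a `MarianTokenizer`), which accepts target sentences through the `text_target` keyword.

```python
import random


def split_pairs(pairs, val_fraction=0.1, seed=42):
    """Shuffle (english, hindi) pairs deterministically and split them.

    Returns (train_pairs, val_pairs). The fraction and seed are
    illustrative defaults, not values taken from train.py.
    """
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be between 0 and 1")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    # Keep at least one validation example when any data is available.
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def tokenize_pairs(tokenizer, pairs, max_length=128):
    """Tokenize English sources as inputs and Hindi targets as labels.

    `tokenizer` is assumed to behave like a Hugging Face tokenizer.
    """
    sources = [en for en, _ in pairs]
    targets = [hi for _, hi in pairs]
    return tokenizer(
        sources,
        text_target=targets,
        max_length=max_length,
        truncation=True,
    )
```

Fixing the seed keeps the validation set identical across runs, which makes scores from different training runs comparable.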
| ## Testing the Model | |
| The `model_test.py` script is used to test the trained MarianMT model. The script performs the following steps: | |
| - Loads the trained model and tokenizer from the Hugging Face Hub. | |
| - Translates a sample input text from English to Hindi. | |
| - Prints the translated text. | |
| To test the model, run: | |
| ```bash | |
| python model_test.py | |
| ``` | |
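For orientation, loading the model from the Hub and translating one sentence might look like the sketch below. It is not the actual `model_test.py`: the function name `translate` and the `max_length` value are assumptions. The model ID is the one listed under Supported Language Pairs. Running it requires `transformers`, `torch`, and `sentencepiece`, and the first call downloads the model.

```python
MODEL_NAME = "rooftopcoder/opus-mt-en-hi-samanantar-100k"


def translate(text, model_name=MODEL_NAME, max_length=128):
    """Translate one English sentence to Hindi with a MarianMT checkpoint."""
    # Imported lazily so this module can be imported without transformers.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    # Encode the input as a batch of one and return PyTorch tensors.
    batch = tokenizer(
        [text],
        return_tensors="pt",
        padding=True,
        truncation=True,
        max_length=max_length,
    )
    generated = model.generate(**batch, max_length=max_length)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]
```

Calling `print(translate("How are you?"))` would then print the Hindi translation. For repeated calls, load the tokenizer and model once and reuse them rather than reloading inside the function.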
| ## User Interface | |
| The `nmt_ui.py` script provides a Gradio-based user interface for translating text from English to Hindi. The interface also offers transliteration of Romanized Hindi text into Devanagari script. | |
| To launch the interface, run: | |
| ```bash | |
| python nmt_ui.py | |
| ``` | |
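A Gradio interface of this kind can be wired up in a few lines. The sketch below is an assumption about the general shape, not the code in `nmt_ui.py`: `handle_request` and `build_demo` are hypothetical names, and `translate_fn` stands for any function that maps an English string to a Hindi string.

```python
def handle_request(text, translate_fn):
    """Return the translation, or an empty string for blank input."""
    cleaned = text.strip()
    if not cleaned:
        # Skip the model call entirely when there is nothing to translate.
        return ""
    return translate_fn(cleaned)


def build_demo(translate_fn):
    """Build a Gradio Interface around the given translation function."""
    # Imported lazily so handle_request can be used without Gradio installed.
    import gradio as gr

    return gr.Interface(
        fn=lambda text: handle_request(text, translate_fn),
        inputs=gr.Textbox(lines=3, label="English"),
        outputs=gr.Textbox(lines=3, label="Hindi"),
        title="NMT demo",
        examples=[["How are you?"]],
    )
```

With a real translation function in hand, `build_demo(translate_fn).launch(server_port=7860)` would serve the interface at the address given in the Usage section.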
| ## Model Information | |
| This project uses the MarianMT model from Hugging Face Transformers. | |
| ### Notes: | |
| - The model supports English-Hindi translation. | |
| - Based on the Helsinki-NLP/opus-mt-en-hi model. | |
| - Optimized for English -> Hindi translation pairs. | |
| - Includes transliteration support for Romanized Hindi text. | |
| ### Supported Features: | |
| - English -> Hindi translation. | |
| - Romanized Hindi -> Devanagari Hindi transliteration. | |
| ### Examples of Transliteration: | |
| - "namaste" → "नमस्ते" | |
| - "aap kaise ho" → "आप कैसे हो" | |
| - "mera naam" → "मेरा नाम" | |
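To make the input and output of the transliteration step concrete, here is a toy word-level lookup that reproduces the three examples above. It is only an illustration: the README does not say how the project transliterates, and a real system would need a proper transliteration library or model. The table `WORD_TABLE` and the function `transliterate` are hypothetical names.

```python
# Toy table covering only the words used in the examples above.
WORD_TABLE = {
    "namaste": "नमस्ते",
    "aap": "आप",
    "kaise": "कैसे",
    "ho": "हो",
    "mera": "मेरा",
    "naam": "नाम",
}


def transliterate(text, table=WORD_TABLE):
    """Replace known Romanized Hindi words with their Devanagari form.

    Words missing from the table are passed through unchanged.
    """
    return " ".join(table.get(word.lower(), word) for word in text.split())
```

Unknown words are left as they are, so `transliterate("namaste world")` keeps `world` in Latin script.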
| ## Project Structure | |
| ``` | |
| NLPA_Assignment_2_Group_54/ | |
| ├── nmt_ui.py # Main application file with Gradio interface | |
| ├── requirements.txt # Python dependencies | |
| └── README.md # Project documentation | |
| ``` | |
| ## License | |
| MIT | |
| ## Group Members | |
| - Shubhra J Gadhwala: 2023aa05750 | |
| - Sandeep Kumar Yadav: 2023ab05047 | |
| - Ravi Krishna Mayura: 2023ab05157 | |
| - Satheesh Kumar G: 2023ab05041 | |