How to use bajpaideeksha/hinglish-translation with Transformers:
# Use a pipeline as a high-level helper
# Warning: Pipeline type "translation" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# 'pip install "transformers<5.0.0"'
from transformers import pipeline
pipe = pipeline("translation", model="bajpaideeksha/hinglish-translation")
# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("bajpaideeksha/hinglish-translation", dtype="auto")
This model translates English to Hinglish (a mix of Hindi and English). It is a GPT-2 model fine-tuned with LoRA (Low-Rank Adaptation) for efficient, lightweight training.
The model is designed to convert English sentences into Hinglish, a colloquial blend of Hindi and English commonly used in informal communication. It is particularly useful for applications like chatbots, social media tools, and language learning platforms.
This model is intended for:
You can use this model with the Hugging Face transformers library. Below is an example of how to load and use the model:
from transformers import AutoTokenizer, AutoModelForCausalLM
# Load the tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bajpaideeksha/hinglish-translation")
model = AutoModelForCausalLM.from_pretrained("bajpaideeksha/hinglish-translation")
# Input text
input_text = "Did you prepone the meeting?"
# Tokenize input
inputs = tokenizer(input_text, return_tensors="pt")
# Generate output
outputs = model.generate(**inputs, max_length=50)
# Decode and print the output
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
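GPT-2 is a causal language model, so the sequence returned by `generate` begins with the input tokens and the decoded string normally starts with the prompt itself. The helper below is a minimal sketch of how the echoed prompt could be removed. `strip_prompt` is a hypothetical name, not part of this model or of transformers, and the decoded string in the example is a placeholder, not real model output.

```python
def strip_prompt(decoded: str, prompt: str) -> str:
    """Return only the newly generated text from a decoded causal-LM output."""
    if decoded.startswith(prompt):
        # Drop the echoed prompt and any whitespace around the continuation.
        return decoded[len(prompt):].strip()
    # Fall back to the full string if the prompt was not echoed verbatim.
    return decoded.strip()


# Placeholder decoded string, made up for illustration only.
prompt = "Did you prepone the meeting?"
decoded = prompt + " <generated text>"
print(strip_prompt(decoded, prompt))
```

In the example above, this would be applied to the result of `tokenizer.decode(outputs[0], skip_special_tokens=True)` with `input_text` as the prompt.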
Training Details
Dataset
The model was fine-tuned on a custom dataset of 3080 English-Hinglish sentence pairs. The dataset includes colloquial phrases, humor, and regional variations.
Training Parameters
Base model: GPT-2
Fine-tuning method: LoRA (Low-Rank Adaptation)
Epochs: 5
Batch size: 2
Learning rate: 2e-5
FP16 mixed precision: Enabled
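To give a rough sense of why LoRA keeps training lightweight, the sketch below counts the parameters a LoRA adapter would add to GPT-2 (small). The rank and the adapted layer are not stated in this card: a rank of 8 applied only to the fused attention projection (`c_attn`) is an assumption chosen for illustration. The GPT-2 dimensions (hidden size 768, 12 layers) are those of the base `openai-community/gpt2` checkpoint.

```python
def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter: A is (rank x d_in), B is (d_out x rank)."""
    return rank * d_in + d_out * rank


HIDDEN = 768          # GPT-2 (small) hidden size
QKV_OUT = 3 * HIDDEN  # c_attn projects to fused query/key/value (2304)
N_LAYERS = 12         # GPT-2 (small) transformer blocks
RANK = 8              # ASSUMPTION: the rank used for this model is not documented

per_layer = lora_param_count(HIDDEN, QKV_OUT, RANK)
total = per_layer * N_LAYERS
print(f"LoRA parameters per layer: {per_layer}")
print(f"LoRA parameters in total:  {total}")
```

Under these assumptions the adapter adds 294,912 trainable parameters, a small fraction of the roughly 124 million parameters in the base model.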
Hardware
GPU: NVIDIA T4 (Google Colab)
Training time: ~1 hour
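The figures above imply an approximate number of optimizer steps and a rough throughput. The arithmetic below is a sketch that assumes all 3080 pairs were used for training (no held-out split), no gradient accumulation, a single GPU, and that "~1 hour" is taken as 3600 seconds. None of these details are confirmed by this card.

```python
import math


def optimizer_steps(num_examples: int, batch_size: int, epochs: int) -> int:
    """Total optimizer steps, assuming one step per batch and a partial last batch kept."""
    steps_per_epoch = math.ceil(num_examples / batch_size)
    return steps_per_epoch * epochs


steps = optimizer_steps(num_examples=3080, batch_size=2, epochs=5)
steps_per_second = steps / 3600  # ASSUMPTION: "~1 hour" treated as exactly 3600 s

print(f"Total optimizer steps: {steps}")
print(f"Approximate steps per second: {steps_per_second:.2f}")
```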
Limitations
Small Dataset: The model is trained on a relatively small dataset (3080 rows), so it may not generalize well to all types of sentences.
Complex Sentences: The model may struggle with complex or highly technical sentences.
Regional Variations: While the model handles some regional variations, it may not capture all dialects of Hinglish.
Humor and Context: The model may not always understand humor or context perfectly.
Ethical Considerations
Bias: The model may inherit biases present in the training data. Use with caution in sensitive applications.
Misuse: The model should not be used for generating harmful, offensive, or misleading content.
License
This model is licensed under the MIT License. See the LICENSE file for more details.
Thank you for using the Hinglish Translation Model! 😊
Base model
openai-community/gpt2