# ================================================================
# GRADIO UI FOR LUHYA MULTILINGUAL TRANSLATION MODEL
# ================================================================
import gradio as gr
import torch
from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
import time
import json
class LuhyaTranslationInterface:
    """Gradio interface for Luhya translation model"""

    def __init__(self, model_name: str):
        # Keep the checkpoint id around for display/debugging.
        self.model_name = model_name
        # Prefer GPU when one is visible, otherwise fall back to CPU.
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

        # Pull tokenizer and model weights from the hub (or local cache),
        # move the model to the chosen device, and freeze it for inference.
        print(f"Loading model: {model_name}")
        self.tokenizer = M2M100Tokenizer.from_pretrained(model_name)
        self.model = M2M100ForConditionalGeneration.from_pretrained(model_name)
        self.model.to(self.device)
        self.model.eval()

        # Display-name -> internal code mappings used by the UI dropdowns.
        self.languages = {
            "English": "en",
            "Swahili": "sw",
            "Luhya (General)": "luy",
        }
        # Every dialect code is simply "luy_" + lowercased display name,
        # so the table can be generated rather than written out by hand.
        dialect_names = [
            "Bukusu", "Wanga", "Kisa", "Maragoli", "Tachoni",
            "Kabras", "Tsotso", "Marachi", "Luwanga",
        ]
        self.dialects = {name: f"luy_{name.lower()}" for name in dialect_names}

        # Canned [text, source language, target dialect, description] rows
        # offered as one-click examples in the UI.
        self.examples = [
            ["Good morning", "English", "Tsotso", "Basic greeting"],
            ["Hello, how are you?", "English", "Bukusu", "Common question"],
            ["Thank you very much", "English", "Wanga", "Gratitude expression"],
            ["What is your name?", "English", "Maragoli", "Personal question"],
            ["I love you", "English", "Kabras", "Emotional expression"],
            ["Where are you going?", "English", "Tachoni", "Direction question"],
        ]
def translate_text(self, text: str, source_lang: str, target_dialect: str, max_length: int = 128):
"""Translate text using the model"""
if not text.strip():
return "Please enter some text to translate.", "", 0.0
try:
start_time = time.time()
# Map language names to codes
source_code = self.languages.get(source_lang, "en")
target_code = self.dialects.get(target_dialect, "luy_bukusu")
# Set tokenizer languages
self.tokenizer.src_lang = source_code if source_code in ["en", "sw"] else "sw"
self.tokenizer.tgt_lang = "sw" # Use Swahili as base target
# Prepare input text with dialect token
if source_code != "en":
# For non-English input, add source dialect token
input_text = text
else:
# For English input, add target dialect token to guide translation
input_text = f"<{target_code}> {text}"
# Tokenize
inputs = self.tokenizer(input_text, return_tensors="pt", max_length=max_length, truncation=True).to(self.device)
# Generate translation
with torch.no_grad():
outputs = self.model.generate(
**inputs,
max_length=max_length,
num_beams=4,
early_stopping=True,
pad_token_id=self.tokenizer.pad_token_id,
eos_token_id=self.tokenizer.eos_token_id,
do_sample=False,
temperature=1.0
)
# Decode result
translation = self.tokenizer.decode(outputs[0], skip_special_tokens=False)
translation = translation.replace('', '').replace('', '').strip()
# Calculate translation time
translation_time = time.time() - start_time
# Simple confidence score based on presence of target dialect token and length
confidence = self.calculate_confidence(translation, target_code, text)
return translation, f"Translation completed in {translation_time:.2f} seconds", confidence
except Exception as e:
return f"Translation error: {str(e)}", "Error occurred during translation", 0.0
def calculate_confidence(self, translation: str, target_code: str, source_text: str) -> float:
"""Calculate a simple confidence score for the translation"""
score = 0.0
# Check if target dialect token is present
if f"<{target_code}>" in translation:
score += 0.4
# Check if translation is not just copying source
if source_text.lower() not in translation.lower():
score += 0.3
# Check reasonable length
words = translation.split()
if 1 <= len(words) <= 15:
score += 0.2
# Check for repetitive patterns
if not (".)" in translation or "..." in translation):
score += 0.1
return min(1.0, score)
def create_interface(self):
"""Create the Gradio interface"""
# Custom CSS for better styling
css = """
.gradio-container {
font-family: 'Arial', sans-serif;
}
.title {
text-align: center;
color: #2E8B57;
margin-bottom: 20px;
}
.description {
text-align: center;
color: #666;
margin-bottom: 30px;
}
.confidence-high { color: #28a745; }
.confidence-medium { color: #ffc107; }
.confidence-low { color: #dc3545; }
"""
# Create interface
with gr.Blocks(css=css, title="Luhya Multilingual Translator") as demo:
# Header
gr.HTML("""
Translate between English, Swahili, and various Luhya dialects including Bukusu, Wanga, Maragoli, and more.
This model supports bidirectional translation and dialect-specific outputs.
This model was developed to support Luhya language preservation and accessibility. Luhya is a group of related Bantu languages spoken in western Kenya by the Luhya people.
Luhya Multilingual Translation Model
Built with ❤️ for language preservation and community accessibility
Part of the effort to digitize and preserve African languages