# 🧠 MarianMT-Text-Translation-AI-Model-"en-de"

A sequence-to-sequence translation model fine-tuned on English–German sentence pairs. This model translates English text into German and is built using the Hugging Face MarianMTModel. It is suitable for general-purpose translation, language learning, and formal or semi-formal communication across English and German.

---

## ✨ Model Highlights

- 📌 Base Model: Helsinki-NLP/opus-mt-en-de
- 📚 Fine-tuned on a cleaned and tokenized parallel English–German dataset
- 🌍 Direction: English → German
- 🔧 Framework: Hugging Face Transformers + PyTorch

---

## 🧠 Intended Uses

- ✅ Translating English content (emails, documentation, support text) into German
- ✅ Use in educational platforms for learning German
- ✅ Supporting cross-lingual customer service, product documentation, or semi-formal communications

---

## 🚫 Limitations

- ❌ Not optimized for informal, idiomatic, or slang expressions
- ❌ Not ideal for legal, medical, or other sensitive content translation
- 📏 Inputs longer than 128 tokens are truncated (a simple chunking workaround is sketched at the end of this card)
- ⚠️ Domain-specific accuracy may vary (e.g., legal, technical)

---

## 🏋️‍♂️ Training Details

| Attribute        | Value                        |
|------------------|------------------------------|
| Base Model       | `Helsinki-NLP/opus-mt-en-de` |
| Dataset          | WMT14 English–German         |
| Task Type        | Translation                  |
| Max Token Length | 128                          |
| Epochs           | 3                            |
| Batch Size       | 16                           |
| Optimizer        | AdamW                        |
| Loss Function    | CrossEntropyLoss             |
| Framework        | PyTorch + Transformers       |
| Hardware         | CUDA-enabled GPU             |

A rough reconstruction of this setup as a training script is sketched at the end of this card.

---

## 📊 Evaluation Metrics

| Metric     | Score |
|------------|-------|
| BLEU Score | 30.42 |

A sketch of how a score like this can be reproduced also appears at the end of this card.

---

## 🔎 Output Details

- Input: English text string
- Output: Translated German text string

---

## 🚀 Usage

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import torch

model_name = "AventIQ-AI/Ai-Translate-Model-Eng-German"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)
model.eval()

# Move the model to the GPU once, up front, rather than on every call
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)

def translate(text):
    # Truncate at 128 tokens, matching the model's training length
    inputs = tokenizer(text, return_tensors="pt", padding=True,
                       truncation=True, max_length=128).to(device)
    with torch.no_grad():
        outputs = model.generate(**inputs)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

# Example
print(translate("How are you doing today?"))
```

---

## 📁 Repository Structure

```
finetuned-model/
├── config.json               ✅ Model architecture & config
├── pytorch_model.bin         ✅ Model weights
├── tokenizer_config.json     ✅ Tokenizer settings
├── tokenizer.json            ✅ Tokenizer vocabulary (JSON format)
├── source.spm                ✅ SentencePiece model for source language
├── target.spm                ✅ SentencePiece model for target language
├── special_tokens_map.json   ✅ Special tokens mapping
├── generation_config.json    ✅ (Optional) Generation defaults
└── README.md                 ✅ Model card
```

## 🤝 Contributing

Contributions are welcome! Feel free to open an issue or pull request to improve the model, training scripts, or documentation.
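---

## 🛠️ Training Setup Sketch

The training script itself is not part of this repository. The sketch below is a rough reconstruction of the hyperparameters in the Training Details table using Hugging Face's `Seq2SeqTrainer`; the `wmt14` dataset call, its column layout, and the `fp16` choice are assumptions, not the authors' actual code, and it assumes a recent `transformers` version (≥ 4.21) for the `text_target` tokenizer argument.

```python
# Reconstruction of the Training Details table; not the authors' script.
# Assumption: the Hub's "wmt14"/"de-en" dataset with its standard
# {"translation": {"en": ..., "de": ...}} column layout.
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSeq2SeqLM,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

base = "Helsinki-NLP/opus-mt-en-de"              # Base Model
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

raw = load_dataset("wmt14", "de-en")             # Dataset: WMT14 English-German

def preprocess(batch):
    src = [pair["en"] for pair in batch["translation"]]
    tgt = [pair["de"] for pair in batch["translation"]]
    # Max Token Length: 128 on both source and target side
    return tokenizer(src, text_target=tgt, truncation=True, max_length=128)

tokenized = raw.map(preprocess, batched=True,
                    remove_columns=raw["train"].column_names)

args = Seq2SeqTrainingArguments(
    output_dir="finetuned-model",
    num_train_epochs=3,                          # Epochs: 3
    per_device_train_batch_size=16,              # Batch Size: 16
    optim="adamw_torch",                         # Optimizer: AdamW
    fp16=True,                                   # Hardware: CUDA-enabled GPU
)

# CrossEntropyLoss is applied internally by the model's forward pass.
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```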
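---

## 🧪 Reproducing the BLEU Score

The 30.42 BLEU above comes from the authors' evaluation; the exact script is not included here. Below is a minimal sketch using the `sacrebleu` package (an assumption; any BLEU implementation would do) that reuses `translate` from the Usage section. The two sentence pairs are illustrative placeholders, not the actual test set.

```python
# Minimal BLEU evaluation sketch using sacrebleu (pip install sacrebleu).
# Assumption: `sources`/`references` hold a held-out English-German test set;
# the pairs below are placeholders for illustration only.
import sacrebleu

sources = ["How are you doing today?", "The weather is nice."]
references = ["Wie geht es dir heute?", "Das Wetter ist schön."]

hypotheses = [translate(s) for s in sources]   # `translate` from the Usage section
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")
```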
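---

## 📏 Translating Longer Inputs

Because inputs beyond 128 tokens are truncated, long documents should be split before translation. The sketch below chunks text on sentence boundaries and translates in batches, reusing `tokenizer`, `model`, and `device` from the Usage section; the naive period-based splitter and the batch size are assumptions for illustration, not part of the model.

```python
# Sketch for texts longer than the 128-token limit: split into sentences,
# translate in batches, and rejoin. The ". "-based splitter is a naive
# placeholder; a real pipeline would use a proper sentence segmenter.
def translate_long(text, batch_size=8):
    sentences = [s.strip() for s in text.split(". ") if s.strip()]
    translated = []
    for i in range(0, len(sentences), batch_size):
        batch = sentences[i:i + batch_size]
        inputs = tokenizer(batch, return_tensors="pt", padding=True,
                           truncation=True, max_length=128).to(device)
        with torch.no_grad():
            outputs = model.generate(**inputs)
        translated.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return " ".join(translated)

print(translate_long("This is the first sentence. This is the second one."))
```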