---
license: apache-2.0
language:
- en
- es
tags:
- translation
- marian
- nmt
- encoder-decoder
pipeline_tag: translation
widget:
- text: "The weather is beautiful today."
  example_title: "Simple sentence"
- text: "Machine learning is transforming the way we build software applications."
  example_title: "Technical text"
- text: "The European Union has proposed new regulations on artificial intelligence."
  example_title: "Formal text"
datasets:
- opus100
- europarl_bilingual
- un_pc
model-index:
- name: pumadic-en-es
  results: []
---

# Pumatic English-Spanish Translation Model

A neural machine translation model for English to Spanish translation built with the MarianMT architecture.

## Model Description

- **Model type:** Encoder-Decoder (MarianMT architecture)
- **Language pair:** English → Spanish
- **Parameters:** ~74.7M
- **GPU:** H100
- **Trained by:** [pumad](https://huggingface.co/pumad)

## Training Details

### Training Data

The model was trained on high-quality parallel corpora:

- **OPUS-100** - Multilingual parallel corpus
- **Europarl** - European Parliament proceedings
- **UN Parallel Corpus (UNPC)** - United Nations documents

### Training Procedure

- **Hardware:** NVIDIA H100 GPU
- **Framework:** Hugging Face Transformers
- **Batch size:** 128
- **Learning rate:** 2e-5
- **Epochs:** 3
- **Max sequence length:** 128 tokens

### Data Preprocessing

- Quality filtering: Removed pairs with fewer than 5 words or more than 200 words
- Length ratio filtering: Excluded pairs with extreme length ratios (< 0.5 or > 2.0)
- Deduplication: Removed duplicate source sentences

## Usage

### Using the Transformers library

```python
from transformers import MarianMTModel, MarianTokenizer

model_name = "pumad/pumadic-en-es"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

text = "Hello, how are you today?"
inputs = tokenizer(text, return_tensors="pt", padding=True)
translated = model.generate(**inputs)
output = tokenizer.decode(translated[0], skip_special_tokens=True)
print(output)
```

### Using the Pipeline API

```python
from transformers import pipeline

translator = pipeline("translation", model="pumad/pumadic-en-es")
result = translator("The quick brown fox jumps over the lazy dog.")
print(result[0]['translation_text'])
```

## Demo

Try this model live at [pumatic.eu](https://pumatic.eu)

API documentation available at [pumatic.eu/docs](https://pumatic.eu/docs)

## Limitations

- Optimized for general-purpose translation; domain-specific terminology may vary in quality
- Maximum input length of ~400 characters per chunk for optimal results
- Best performance on formal/written text; colloquial expressions may be less accurate

## License

Apache 2.0

## Citation

If you use this model, please cite:

```bibtex
@misc{pumatic-en-es,
  author = {pumad},
  title = {Pumatic English-Spanish Translation Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/pumad/pumadic-en-es}
}
```
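
## Appendix: Preprocessing Sketch

The filtering rules in the Data Preprocessing section can be sketched as below. This is an illustrative reimplementation from the stated thresholds (5-200 words, length ratio 0.5-2.0, source-side deduplication), not the actual training scripts; the function names `keep_pair` and `deduplicate` are hypothetical.

```python
def keep_pair(src: str, tgt: str, min_words: int = 5, max_words: int = 200,
              min_ratio: float = 0.5, max_ratio: float = 2.0) -> bool:
    """Illustrative quality filter mirroring the thresholds described above."""
    src_len, tgt_len = len(src.split()), len(tgt.split())
    # Quality filter: drop pairs with fewer than 5 or more than 200 words.
    if not (min_words <= src_len <= max_words and min_words <= tgt_len <= max_words):
        return False
    # Ratio filter: drop pairs with extreme source/target length ratios.
    ratio = src_len / tgt_len
    return min_ratio <= ratio <= max_ratio


def deduplicate(pairs):
    """Keep only the first occurrence of each source sentence."""
    seen, kept = set(), []
    for src, tgt in pairs:
        if src not in seen:
            seen.add(src)
            kept.append((src, tgt))
    return kept
```

A real pipeline would typically apply these filters per corpus before concatenation, but the order shown here preserves the same end result for the stated rules.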