metadata
license: apache-2.0
language:
  - en
  - de
tags:
  - translation
  - marian
  - nmt
  - encoder-decoder
pipeline_tag: translation
widget:
  - text: The weather is beautiful today.
    example_title: Simple sentence
  - text: Machine learning is transforming the way we build software applications.
    example_title: Technical text
  - text: >-
      The European Union has proposed new regulations on artificial
      intelligence.
    example_title: Formal text
datasets:
  - opus100
  - europarl_bilingual
  - un_pc
model-index:
  - name: pumadic-en-de
    results: []

Pumatic English-German Translation Model

A neural machine translation model for English-to-German translation, built on the MarianMT architecture.

Model Description

  • Model type: Encoder-Decoder (MarianMT architecture)
  • Language pair: English → German
  • Parameters: ~74.7M
  • Training GPU: NVIDIA H100
  • Trained by: pumad

Training Details

Training Data

The model was trained on three parallel corpora:

  • OPUS-100 - Multilingual parallel corpus
  • Europarl - European Parliament proceedings
  • UN Parallel Corpus (UNPC) - United Nations documents

Training Procedure

  • Hardware: NVIDIA H100 GPU
  • Framework: Hugging Face Transformers
  • Batch size: 128
  • Learning rate: 2e-5
  • Epochs: 3
  • Max sequence length: 128 tokens
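
The hyperparameters above can be collected into a small configuration sketch. The key names follow `transformers.Seq2SeqTrainingArguments`; whether that class was actually used, and whether 128 is a per-device batch size on a single GPU without gradient accumulation, are assumptions rather than details stated in this card. The helper `total_optimizer_steps` is a hypothetical name introduced only for illustration.

```python
import math

# Hyperparameters copied from the list above. The key names match
# transformers.Seq2SeqTrainingArguments; that this class was used, and that
# 128 is a per-device batch size on one GPU with no gradient accumulation,
# are assumptions.
TRAINING_KWARGS = {
    "per_device_train_batch_size": 128,
    "learning_rate": 2e-5,
    "num_train_epochs": 3,
}
MAX_SEQ_LENGTH = 128  # tokens, as listed under "Max sequence length"


def total_optimizer_steps(num_pairs, kwargs=TRAINING_KWARGS):
    """Return the number of optimizer steps for a corpus of num_pairs sentence pairs."""
    steps_per_epoch = math.ceil(num_pairs / kwargs["per_device_train_batch_size"])
    return steps_per_epoch * kwargs["num_train_epochs"]
```

With the Transformers library installed, the dictionary could be unpacked as `Seq2SeqTrainingArguments(output_dir="out", **TRAINING_KWARGS)`, where `"out"` is a placeholder path.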

Data Preprocessing

  • Quality filtering: Removed pairs with fewer than 5 words or more than 200 words
  • Length ratio filtering: Excluded pairs with extreme length ratios (< 0.5 or > 2.0)
  • Deduplication: Removed duplicate source sentences
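
The three preprocessing rules above can be sketched as a single filter. This is a minimal illustration, not the actual preprocessing script: it assumes that lengths are counted in whitespace-separated words, that the 5 to 200 word limit applies to both sides of a pair, that the ratio is source words divided by target words, and that duplicates are exact matches on the stripped source sentence. The function name `filter_pairs` is hypothetical.

```python
def filter_pairs(pairs, min_words=5, max_words=200, min_ratio=0.5, max_ratio=2.0):
    """Yield (source, target) pairs that pass the length, ratio, and dedup filters."""
    seen_sources = set()
    for src, tgt in pairs:
        src_len = len(src.split())
        tgt_len = len(tgt.split())
        # Guard against empty targets before dividing below.
        if tgt_len == 0:
            continue
        # Length filter (assumed to apply to both source and target).
        if not (min_words <= src_len <= max_words and min_words <= tgt_len <= max_words):
            continue
        # Length ratio filter (assumed to be source words / target words).
        ratio = src_len / tgt_len
        if ratio < min_ratio or ratio > max_ratio:
            continue
        # Deduplication on the source side: keep only the first occurrence.
        key = src.strip()
        if key in seen_sources:
            continue
        seen_sources.add(key)
        yield src, tgt
```

Because the function is a generator, it can stream over a large corpus without holding every pair in memory; only the set of seen source sentences grows.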

Usage

Using the Transformers library

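from transformers import MarianMTModel, MarianTokenizer

# Load the tokenizer and model weights from the Hugging Face Hub
model_name = "pumad/pumadic-en-de"
tokenizer = MarianTokenizer.from_pretrained(model_name)
model = MarianMTModel.from_pretrained(model_name)

text = "Hello, how are you today?"
# Tokenize, truncating to the 128-token maximum sequence length used in training
inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=128)
# Generate the German translation and decode it back to text
translated = model.generate(**inputs)
output = tokenizer.decode(translated[0], skip_special_tokens=True)
print(output)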

Using the Pipeline API

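from transformers import pipeline

# Build a translation pipeline; the model is downloaded on first use
translator = pipeline("translation", model="pumad/pumadic-en-de")
# The pipeline returns a list with one dict per input text
result = translator("The quick brown fox jumps over the lazy dog.")
print(result[0]["translation_text"])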

Demo

Try this model live at pumatic.eu

API documentation is available at pumatic.eu/docs

Limitations

  • Optimized for general-purpose translation; domain-specific terminology may vary in quality
  • Inputs work best in chunks of about 400 characters; the model was trained with a maximum sequence length of 128 tokens
  • Best performance on formal/written text; colloquial expressions may be less accurate
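
For longer inputs, one way to respect the chunk-size guidance above is to split text at sentence boundaries before translating. The sketch below is an illustration under stated assumptions, not the chunking used by the hosted demo: the function name `chunk_text` is hypothetical, sentence detection is a naive regular expression that will mis-split abbreviations, and a single sentence longer than the limit is cut by character count, which may break a word.

```python
import re


def chunk_text(text, max_chars=400):
    """Split text into chunks of at most max_chars, breaking at sentence ends where possible."""
    # Naive sentence splitter: break on whitespace that follows ., !, or ?
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], ""
    for sentence in sentences:
        # A sentence longer than the limit is hard-split by character count.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        # Extend the current chunk if the sentence still fits, else start a new one.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

The resulting list can be passed directly to a translation pipeline, which accepts a list of strings and returns one result per chunk; the translated pieces can then be joined with spaces.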

License

Apache 2.0

Citation

If you use this model, please cite:

@misc{pumatic-en-de,
  author = {pumad},
  title = {Pumatic English-German Translation Model},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/pumad/pumadic-en-de}
}