opus-mt-mul-en-big (Core ML)

Core ML conversion of opus-mt-mul-en-big (multilingual-to-English, Marian NMT, ~0.3B parameters) for deployment on iOS and macOS.

Contents

  • OpusMT_Encoder_128.mlpackage – Marian encoder (input_ids, attention_mask → hidden states)
  • OpusMT_Decoder_128.mlpackage – Marian decoder (input_ids, encoder_hidden_states, encoder_attention_mask → logits)
  • tokenizer/ – Marian tokenizer (sentencepiece + config)
  • config.json – Model config
  • Optional: test.txt – Multilingual sample sentences; test_results_*.csv / test_results_*.json – Evaluation outputs

Input format

Source text must be prefixed with a language code so the model routes correctly:

>>jpn<< γ“γ‚Œγ―γƒ†γ‚Ήγƒˆγ§γ™γ€‚
>>fra<< Ceci est un test.
>>deu<< Das ist ein Test.

Use the opus-mt language code (e.g. jpn, fra, deu, eng). For languages without a supported code, the notebook typically falls back to eng; treat that output as reference only.
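The prefixing rule above can be sketched in Python as follows. This is a minimal illustration, not the notebook's code. SUPPORTED_CODES is a hypothetical subset used only for the example. Take the full list of valid opus-mt codes from the tokenizer and config files in this repo.

```python
from __future__ import annotations

# Hypothetical, illustrative subset of opus-mt codes; the real list comes
# from the tokenizer/config files in this repo.
SUPPORTED_CODES = frozenset({"jpn", "fra", "deu", "eng"})
FALLBACK_CODE = "eng"  # used when a code is not in the supported set


def format_source(text: str, lang: str,
                  supported: frozenset[str] = SUPPORTED_CODES) -> str:
    """Return the source text prefixed with '>>{code}<< ', falling back to eng."""
    code = lang if lang in supported else FALLBACK_CODE
    return f">>{code}<< {text.strip()}"


print(format_source("Ceci est un test.", "fra"))  # >>fra<< Ceci est un test.
print(format_source("Sawubona", "zul"))           # >>eng<< Sawubona
```

The fallback only selects a routing tag. It does not make the model support the language, so the eng-prefixed output remains reference only.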

Usage on iOS / macOS

  1. Add the .mlpackage files and tokenizer assets to your Xcode project (or load from disk).
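  2. Format the input as >>{lang}<< {text}, then tokenize it with the included tokenizer.
  3. Pad or truncate the token ids to 128, then run the encoder with input_ids (int32, shape [1, 128]) and attention_mask (int32, shape [1, 128]; 1 for real tokens, 0 for padding).
  4. Run the decoder in a loop. Starting from the decoder start token, feed the encoder hidden states, the encoder attention mask and the decoder input ids generated so far. Take the argmax of the logits at the last position as the next token, and stop at EOS or the maximum length. Decode the generated ids with the tokenizer.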
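The fixed-shape input preparation (step 3) and the greedy loop (step 4) can be sketched in plain Python as below. This sketch rests on several assumptions:

- pad_inputs and greedy_decode are hypothetical helpers.
- The pad, EOS and decoder-start ids are assumed to come from config.json (pad_token_id, eos_token_id, decoder_start_token_id, as in Hugging Face Marian configs).
- next_logits stands in for the code that actually calls the decoder package (e.g. coremltools on macOS, or Core ML from Swift). Its output feature names are not documented here.

```python
from __future__ import annotations

from typing import Callable, Sequence

MAX_LEN = 128  # fixed sequence length of the *_128 packages


def pad_inputs(ids: Sequence[int], pad_id: int, eos_id: int,
               max_len: int = MAX_LEN) -> tuple[list[list[int]], list[list[int]]]:
    """Truncate/pad token ids to max_len and build the matching attention mask.

    Assumes `ids` already ends with EOS, as the Marian tokenizer emits it;
    on truncation the last kept position is set to EOS. Both results have
    shape [1, max_len] and must be converted to int32 before prediction.
    """
    ids = list(ids)
    if len(ids) > max_len:
        ids = ids[: max_len - 1] + [eos_id]
    mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids += [pad_id] * (max_len - len(ids))
    return [ids], [mask]


def greedy_decode(next_logits: Callable[[list[int]], Sequence[float]],
                  start_id: int, eos_id: int, max_len: int = MAX_LEN) -> list[int]:
    """Append the argmax token until EOS or max_len; returns ids without start/EOS.

    `next_logits` is a hypothetical wrapper around the decoder package. It
    closes over the encoder hidden states and encoder attention mask and
    returns the logits for the position after `ids`. If the package takes
    fixed [1, 128] decoder ids (suggested by the _128 suffix, not confirmed),
    it should pad `ids` and read the logits at index len(ids) - 1.
    """
    out = [start_id]
    while len(out) < max_len:
        logits = next_logits(out)
        token = max(range(len(logits)), key=lambda i: logits[i])
        if token == eos_id:
            break
        out.append(token)
    return out[1:]


# Toy stand-in for the decoder that emits ids 7, 8 and then EOS (id 0).
def toy_logits(ids: list[int]) -> list[float]:
    script = [7, 8, 0]
    return [1.0 if t == script[len(ids) - 1] else 0.0 for t in range(10)]


print(pad_inputs([5, 6, 0], pad_id=9, eos_id=0, max_len=6))
# ([[5, 6, 0, 9, 9, 9]], [[1, 1, 1, 0, 0, 0]])
print(greedy_decode(toy_logits, start_id=9, eos_id=0))  # [7, 8]
```

Each loop iteration is one decoder call. Because the start token occupies one slot, this sketch caps a translation at 127 generated tokens when max_len is 128.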

Both the encoder and decoder use float32 precision. This avoids the quality loss (e.g. on Korean and other CJK text) and decoder output collapse seen with float16.

Evaluation

Sample sentences and test results are provided in this repo:

  • test.txt – One sentence per line in the form N. Language name (English), followed by an ideographic space (U+3000) and the source text
  • test_results_*.csv / test_results_*.json – Batch translation results (index, code, source, translation)
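Below is a minimal Python sketch for reading test.txt. It assumes each line holds exactly one ideographic space (U+3000) between the numbered label and the source text, and that the label starts with the line number followed by ". ". parse_test_line is a hypothetical helper that does not ship with the repo, and the "Japanese" label in the example is illustrative.

```python
from __future__ import annotations

IDEOGRAPHIC_SPACE = "\u3000"  # separator between label and source text


def parse_test_line(line: str) -> tuple[int, str, str]:
    """Split 'N. Language name<U+3000>Source text' into (N, name, source)."""
    label, source = line.rstrip("\r\n").split(IDEOGRAPHIC_SPACE, 1)
    number, name = label.split(". ", 1)
    return int(number), name.strip(), source.strip()


print(parse_test_line("1. Japanese\u3000これはテストです。"))
# (1, 'Japanese', 'これはテストです。')

# Whole file (not run here):
# with open("test.txt", encoding="utf-8") as fh:
#     rows = [parse_test_line(line) for line in fh if line.strip()]
```

Open the file as UTF-8 so the ideographic space and CJK text are decoded correctly. Each parsed source string can then go through the >>{lang}<< prefixing described under Input format.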

License

Apache 2.0 (inherited from OPUS-MT / Marian).
