opus-mt-en-sla-ct2

CTranslate2 int8 quantized conversion of Helsinki-NLP/opus-mt-en-sla, packaged for offline use in Playto.

Purpose

Lightweight English→Polish NMT (served through the multilingual English→Slavic model) for the Playto desktop game-language-learning tool, used as a low-spec / no-GPU fallback when local LLM inference is not available.

Source model

  • Upstream: Helsinki-NLP/opus-mt-en-sla — MarianMT, transformer architecture
  • License: CC-BY 4.0
  • No fine-tuning, no weight modification — purely a format conversion + int8 quantization of the upstream weights

Conversion

pip install ctranslate2 transformers sentencepiece
ct2-transformers-converter \
  --model Helsinki-NLP/opus-mt-en-sla \
  --output_dir opus-mt-en-sla \
  --quantization int8 \
  --copy_files source.spm target.spm
tar czf opus-mt-en-sla-ct2.tar.gz opus-mt-en-sla

File layout (inside opus-mt-en-sla-ct2.tar.gz)

File                     Approx. size   Purpose
model.bin                ~59 MB         CTranslate2 int8 quantized weights
shared_vocabulary.json   ~2.5 MB        CTranslate2 vocab
source.spm               ~800 KB        SentencePiece source tokenizer
target.spm               ~800 KB        SentencePiece target tokenizer
config.json              ~250 B         CTranslate2 config
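
As a quick sanity check after unpacking, the layout can be verified before loading the model. The sketch below is only illustrative: the helper name missing_model_files is hypothetical, and it assumes the archive was extracted into a directory named opus-mt-en-sla, the name used in the usage examples.

```python
from pathlib import Path

# File names taken from the file layout listed for the archive.
EXPECTED_FILES = (
    "model.bin",
    "shared_vocabulary.json",
    "source.spm",
    "target.spm",
    "config.json",
)


def missing_model_files(model_dir):
    """Return the expected file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    # Assumed extraction directory; adjust to wherever the archive was unpacked.
    missing = missing_model_files("opus-mt-en-sla")
    print("missing files:", missing or "none")
```

An empty list means all five files are present; sizes are not checked because the table only gives approximate values.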

Usage

With ctranslate2 (Python)

import ctranslate2
import sentencepiece

translator = ctranslate2.Translator("opus-mt-en-sla", device="cpu", compute_type="int8")
sp_source = sentencepiece.SentencePieceProcessor("opus-mt-en-sla/source.spm")
sp_target = sentencepiece.SentencePieceProcessor("opus-mt-en-sla/target.spm")

# Prepend >>pol<< as its own token so SentencePiece cannot split it into sub-pieces.
source_tokens = [">>pol<<"] + sp_source.encode("Hello, how are you?", out_type=str) + ["</s>"]
results = translator.translate_batch([source_tokens])
print(sp_target.decode(results[0].hypotheses[0]))
# → "Cześć, jak się masz?"

With ct2rs (Rust)

use ct2rs::{Translator, Tokenizer};

let tokenizer = Tokenizer::new("opus-mt-en-sla")?;
let translator = Translator::with_tokenizer("opus-mt-en-sla", tokenizer, /* config */)?;
let result = translator.translate_batch(&[">>pol<< Hello, how are you?".to_string()], /* options */)?;

Important: MarianMT models require </s> appended to source token sequences. The ct2rs::Tokenizer wrapper handles this automatically; raw SentencePiece calls must add it manually.
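
For the raw SentencePiece path, the manual step can be wrapped in a small helper so it is not forgotten. This is a minimal sketch, not part of Playto or ctranslate2: prepare_source_tokens is a hypothetical name, and it assumes the target-language token is passed as a separate token in front of the SentencePiece pieces.

```python
EOS_TOKEN = "</s>"


def prepare_source_tokens(pieces, lang_token=">>pol<<"):
    """Return [lang_token] + pieces + ["</s>"], without duplicating either marker."""
    tokens = list(pieces)
    # Add the target-language token only if it is not already first.
    if not tokens or tokens[0] != lang_token:
        tokens.insert(0, lang_token)
    # MarianMT expects the end-of-sentence token at the end of the source.
    if tokens[-1] != EOS_TOKEN:
        tokens.append(EOS_TOKEN)
    return tokens


# Illustrative pieces only, not real tokenizer output.
pieces = ["▁Hello", ",", "▁how", "▁are", "▁you", "?"]
print(prepare_source_tokens(pieces))
```

The result can be passed directly as one element of the list given to translate_batch. Calling the helper twice on the same sequence leaves it unchanged.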

Quality

A Good rate of about 50% on the Playto fixture corpus when the input carries the >>pol<< language prefix. Without the prefix, the model produces Croatian/Serbian, its default Slavic output. Playto's NMT pipeline prepends the prefix automatically, per manifest config.
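
Outside Playto, the same prefixing has to be done by the caller. The sketch below shows one way to do it at the text level; it does not reproduce Playto's actual manifest format, and both the function name with_language_prefix and the manifest key language_prefix are hypothetical.

```python
def with_language_prefix(text, manifest):
    """Prepend the manifest's language prefix unless the text already starts with it."""
    # "language_prefix" is a hypothetical key, not Playto's real manifest schema.
    prefix = manifest.get("language_prefix", "")
    if not prefix or text.lstrip().startswith(prefix):
        return text
    return f"{prefix} {text.lstrip()}"


manifest = {"language_prefix": ">>pol<<"}  # illustrative manifest entry
print(with_language_prefix("Hello, how are you?", manifest))
```

Skipping text that already carries the prefix keeps the step safe to apply more than once in a pipeline.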

For higher-quality translation, Playto's LLM-based translation path is recommended.

Attribution

This is a derivative work of Helsinki-NLP/opus-mt-en-sla. The CC-BY 4.0 license is inherited from upstream.

Helsinki-NLP. OPUS-MT — Open Machine Translation Models.
https://github.com/Helsinki-NLP/Opus-MT

Disclaimer

NMT output quality is significantly lower than that of modern LLM-based translation. This model is intended as a lightweight fallback for environments where LLM inference is not viable (for example low VRAM, mobile, or a slow connection).
