CA byte-LM translator — en2x

A vocabulary-free, decoder-only translator that operates directly on UTF-8 bytes, with a causal Neural-Cellular-Automaton front-end. It is fine-tuned for the en2x direction from the pretrained base sujayrittikar/ca-byte-lm-indic6 as prompted continuation (prefix-LM, with the loss computed on the target span only). WMT-2026 Indic MT research track. Covers Assamese, Khasi, Manipuri, Meitei-Mayek, Mizo, Nyishi ↔ English, and is the only system for Khasi, Nyishi and Meitei-Mayek.
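
To make "vocabulary-free, UTF-8-byte" and "target-span loss" concrete, here is a minimal sketch that encodes a prompt and its target as raw bytes and builds a mask marking only the target positions. The prompt template, the newline end marker and the `build_example` helper are assumptions for illustration; the format actually used in ca_byte_lm.py may differ.

```python
def build_example(src: str, tgt: str, src_lang: str, tgt_lang: str):
    """Return (byte_ids, loss_mask) for one prefix-LM training example.

    Hypothetical helper: the prompt template below is assumed, not taken
    from the model's own code.
    """
    prompt = f"{src_lang}: {src}\n{tgt_lang}: ".encode("utf-8")
    target = (tgt + "\n").encode("utf-8")  # newline as an assumed end marker
    ids = list(prompt + target)            # every "token" is a byte value in 0..255
    # 0 over the prompt (the prefix), 1 over the target span.
    mask = [0] * len(prompt) + [1] * len(target)
    return ids, mask


ids, mask = build_example("Hello.", "নমস্কাৰ।", "English", "Assamese")
```

Only positions where the mask is 1 would contribute to the loss, so the model is trained to continue the prompt rather than to reproduce it. No tokenizer or vocabulary file is involved: any script is representable, at the cost of longer sequences (characters in the Bengali-Assamese script take three bytes each in UTF-8).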

Dev chrF++ (best checkpoint @ step 60000)

language	chrF++
Assamese	14.85
Manipuri	16.06
Mizo	35.62
Khasi	25.58
Nyishi	68.76

Dev numbers are measured on the 2025 test set, which was folded into training, so they are indicative only and should not be read as held-out scores.

Usage

from ca_byte_lm import from_hub, translate

# Load the model, its config and metadata from the Hub onto the GPU.
model, cfg, meta = from_hub("sujayrittikar/ca-byte-mt-en2x", device="cuda")

# Translate one English sentence into Khasi.
print(translate(model, cfg, meta, "The government announced a new policy.", "English", "Khasi", device="cuda"))
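
To translate one sentence into several target languages, the call above can be wrapped in a small helper. This is a sketch: `translate_to_many` is a hypothetical name, and the assumption that `translate` accepts the other language names exactly as written in the table (for example "Mizo") has not been checked against ca_byte_lm.py.

```python
from typing import Callable, Dict, Iterable


def translate_to_many(
    translate_one: Callable[[str, str], str],
    text: str,
    targets: Iterable[str],
) -> Dict[str, str]:
    """Translate one English sentence into each target language name.

    `translate_one(text, target_language)` is injected so the helper does
    not depend on how the model is loaded.
    """
    return {tgt: translate_one(text, tgt) for tgt in targets}


# Assumed wiring with the loaded model, mirroring the usage call:
#   translate_one = lambda s, tgt: translate(
#       model, cfg, meta, s, "English", tgt, device="cuda")
#   results = translate_to_many(translate_one, "Good morning.", ["Khasi", "Mizo"])
```

Passing the translation function in as an argument keeps the model loaded once and makes the loop easy to check with a stub before running on a GPU.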

Weights: ca_byte_lm.pt; architecture: ca_byte_lm.py; config: config.json.
