metadata
license: apache-2.0
pipeline_tag: translation
tags:
  - SimulMT
  - Mamba-2
  - Cross-attention
  - en-ru
datasets:
  - OldestSalt/translation_enru
language:
  - en
  - ru

Bimba

Bimba is an almost linear simultaneous machine translation (SimulMT) model trained with a wait-k policy (k = 3, 5, 7, 9, 11) on an English-Russian (en-ru) translation dataset.
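For readers unfamiliar with wait-k, here is a minimal sketch of the read schedule it implies. It assumes the standard wait-k definition (before writing target token t, the model has read min(k + t - 1, S) source tokens). The function name `wait_k_schedule` is illustrative and is not part of the Bimba codebase.

```python
from typing import List


def wait_k_schedule(k: int, source_len: int, target_len: int) -> List[int]:
    """For each target step t (1-based), return how many source tokens are visible.

    Assumes the standard wait-k rule: g(t) = min(k + t - 1, source_len).
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    return [min(k + t - 1, source_len) for t in range(1, target_len + 1)]


if __name__ == "__main__":
    # With k = 3 the model waits for 3 source tokens, then alternates
    # one write / one read until the source is exhausted.
    print(wait_k_schedule(k=3, source_len=6, target_len=6))  # [3, 4, 5, 6, 6, 6]
```

Larger k values (such as 9 or 11) trade latency for more source context per target token.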

Architecture

The model has an encoder-decoder architecture in which the self-attention blocks are replaced with Mamba-2 blocks. The encoder is therefore linear in the source length, but cross-attention still takes all encoder outputs as input, so the overall complexity of Bimba is O(S * T) (with S and T being the source and target sequence lengths), which is not strictly linear.
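To make the O(S * T) term concrete, the sketch below counts how many (target step, source position) pairs cross-attention has to score. It assumes S and T are the source and target lengths and, for the simultaneous case, the standard wait-k schedule min(k + t - 1, S). The helper `cross_attention_pairs` is hypothetical and only illustrates the counting argument; it does not measure Bimba itself.

```python
from typing import Optional


def cross_attention_pairs(source_len: int, target_len: int, k: Optional[int] = None) -> int:
    """Count (target step, source position) pairs scored by cross-attention.

    k=None models full-sentence decoding, where every target step sees the
    whole source. Otherwise step t sees min(k + t - 1, source_len) tokens.
    """
    total = 0
    for t in range(1, target_len + 1):
        visible = source_len if k is None else min(k + t - 1, source_len)
        total += visible
    return total


if __name__ == "__main__":
    print(cross_attention_pairs(100, 100))       # 10000
    print(cross_attention_pairs(200, 200))       # 40000: doubling both lengths quadruples the work
    print(cross_attention_pairs(100, 100, k=3))  # 5247: fewer pairs, but still quadratic growth
```

The wait-k schedule lowers the constant factor, but the count still grows with the product of the two lengths, which is why the model is described as almost linear rather than linear.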

[Figure: Bimba inference]

Bimba was developed and trained as part of a master's thesis, and I hope to continue research in the linear SimulMT field.

Usage

To use Bimba, clone the GitHub repository and load the model with the HybridMamba2MT class:

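# model_classes comes from the cloned Bimba GitHub repository
from model_classes import HybridMamba2MT
from transformers import AutoTokenizer

# Load the tokenizer and the model weights from "OldestSalt/Bimba"
tokenizer = AutoTokenizer.from_pretrained("OldestSalt/Bimba")
model = HybridMamba2MT.from_pretrained("OldestSalt/Bimba")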

Translation

Maybe someday I will add an example of simultaneous translation here.
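Until such an example exists, the general control flow of wait-k decoding can be sketched without relying on Bimba's API. Everything here is an assumption for illustration: `wait_k_decode`, the `next_token` callback, and the toy model are hypothetical stand-ins, and the real HybridMamba2MT interface may look quite different.

```python
from typing import Callable, Iterable, List


def wait_k_decode(
    source_stream: Iterable[str],
    next_token: Callable[[List[str], List[str]], str],
    k: int,
    eos: str = "</s>",
    max_len: int = 256,
) -> List[str]:
    """Generic wait-k loop: READ until k + len(target) source tokens are
    available (or the source ends), then WRITE one target token."""
    src_iter = iter(source_stream)
    source: List[str] = []
    target: List[str] = []
    source_done = False
    while len(target) < max_len:
        # READ phase: catch up to the wait-k schedule.
        while not source_done and len(source) < k + len(target):
            try:
                source.append(next(src_iter))
            except StopIteration:
                source_done = True
        # WRITE phase: the (hypothetical) model sees only the source read so far.
        token = next_token(source, target)
        if token == eos:
            break
        target.append(token)
    return target


def toy_next_token(source: List[str], target: List[str]) -> str:
    """Toy stand-in for a model: 'translates' by upper-casing the aligned source token."""
    if len(target) < len(source):
        return source[len(target)].upper()
    return "</s>"


if __name__ == "__main__":
    print(wait_k_decode(["a", "b", "c", "d"], toy_next_token, k=2))  # ['A', 'B', 'C', 'D']
```

With a real model, `next_token` would wrap tokenization, an incremental encoder pass over the newly read source tokens, and one decoder step.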

Tokenizer

This model was distilled from NLLB-200-1.3B, so Bimba uses its tokenizer.