Does this support word-by-word detection instead of line-by-line for multi-lang?
title :p
If not, are there any plans to make word-by-word detection available for the multi-lang model?
We did some initial experiments with word-by-word detection and found that it yielded a worse model. The model had trouble settling on a consistent granularity when some languages were labeled word by word and others at the line level. With Chinese and Japanese it was not obvious how to handle this, especially because text in those languages often has English words mixed in.
For English we can do word level, but for multilingual it is a bit more complicated.
Hmm, I see. What about Korean then?
It should be possible to do a word-level Korean model, but we found that this single-model, multi-language approach performed best with a single line-level prediction. Mixing word-level output for some languages with line-level output for others hurt performance.