does this support word-by-word detection instead of line-by-line for multi??

#3
by hnyll - opened

title :p

hnyll changed discussion title from does this support word-by-word detection instead of line-by-line ?? to does this support word-by-word detection instead of line-by-line for multi??

If not, are there any plans to make word-by-word detection available for the multilingual model?

hnyll changed discussion status to closed
hnyll changed discussion status to open
NVIDIA org

We ran some initial experiments with word-by-word detection and found that it produced a worse model. The model had trouble settling on a consistent granularity when some languages were labeled word by word and others at line level. For Chinese and Japanese it was not obvious how to define word boxes at all, since those scripts do not separate words with spaces, and those languages also often have English words mixed in.

For English we can do word-level detection, but for the multilingual model it is a bit more complicated.

Hmm, I see. What about Korean, then?

NVIDIA org

It should be possible to build a word-level Korean model, but we found that, for this single-model multilingual approach, a single line-level prediction performed best. Making some languages word-level and others line-level hurt performance.

emelryan changed discussion status to closed
