does this support word-by-word detection instead of line-by-line for multi??

by hnyll - opened 19 days ago

title :p

hnyll changed discussion title from does this support word-by-word detection instead of line-by-line ?? to does this support word-by-word detection instead of line-by-line for multi?? 19 days ago

hnyll

19 days ago

if not, any plans to have word-by-word for multi-lang model available?

hnyll changed discussion status to closed 19 days ago

hnyll changed discussion status to open 19 days ago

emelryan

NVIDIA org 18 days ago

We did some initial experiments with word-by-word detection and found that it yielded a worse model. The model had trouble settling on a consistent granularity when some languages were labeled word by word and others at the line level. With Chinese and Japanese it was not obvious how to handle this, especially because those languages often have English words mixed in.

For English we can do word level, but for multilingual it is a bit more complicated.
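
If approximate word boxes are needed from line-level output in the meantime, one rough workaround is to slice each line box in proportion to character counts. This is only a sketch under several assumptions: `split_line_into_words` is a hypothetical helper (not part of the model or any library), boxes are assumed to be axis-aligned `(x_min, y_min, x_max, y_max)` tuples, text is assumed horizontal, and every character is assumed to be equally wide. It also shows the granularity problem described above: whitespace splitting works for English or Korean, but a Chinese line with no spaces comes back as a single "word".

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max), axis-aligned.
Box = Tuple[float, float, float, float]


def split_line_into_words(text: str, box: Box) -> List[Tuple[str, Box]]:
    """Approximate word boxes by slicing a line box in proportion to
    character counts. Hypothetical helper, not the model's API."""
    x_min, y_min, x_max, y_max = box
    total_chars = len(text)
    if total_chars == 0:
        return []
    # Assumption: every character (including spaces) has the same width.
    char_width = (x_max - x_min) / total_chars
    words = []
    cursor = 0  # character offset of the current word within the line
    for word in text.split(" "):
        if word:
            start = x_min + cursor * char_width
            end = x_min + (cursor + len(word)) * char_width
            words.append((word, (start, y_min, end, y_max)))
        cursor += len(word) + 1  # skip past the word and the space after it
    return words


english = split_line_into_words("hello world", (0, 0, 110, 10))
korean = split_line_into_words("안녕하세요 세계", (0, 0, 80, 10))
chinese = split_line_into_words("你好世界", (0, 0, 40, 10))
print(len(english), len(korean), len(chinese))
```

Because the split relies on spaces, it gives nothing useful for Chinese or Japanese without a separate word segmenter, and the equal-width assumption makes the boxes approximate at best for mixed scripts.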

hnyll

18 days ago

hmm i see. what about korean then?

emelryan

NVIDIA org 17 days ago

It should be possible to do a word-level Korean model, but we found that this single-model, multi-language approach performed best with a single line-level prediction. Doing some languages at the word level and others at the line level hurt performance.

emelryan changed discussion status to closed 15 days ago
