long texts are not labelled to the end

by valentinbeuze - opened May 2, 2023

May 2, 2023

If I copy and paste your default text ten times ("Apple est créée le 1er avril..."), the last paragraphs are not labelled.
Any idea why? Is it related to a predefined maximum number of words for inference?
Do I have to cut my text into blocks to use your model?
Thanks

Jean-Baptiste

Owner May 2, 2023

Hello Valentin,

There is indeed a predefined maximum number of tokens for each model. For camembert models this is around 500 tokens. Text beyond that limit is not processed, which is why your last paragraphs are not labelled. Depending on how many tokens each word is split into, you will be limited to a certain number of words (I would guess probably around 100 to 200 words).
You can find models which handle more tokens, but there will always be a limit.
So yes, I would recommend splitting your text before running inference.
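As a rough illustration of that splitting step, here is a minimal sketch that packs whole sentences into chunks that stay under a token budget. The names `split_into_chunks` and `count_words`, the `max_tokens=400` default, and the naive regex sentence splitter are all assumptions made for this example, not part of the model or of any library. The default counter only counts whitespace-separated words, so it underestimates the real token count.

```python
import re
from typing import Callable, List


def count_words(text: str) -> int:
    # Stand-in counter (assumption): whitespace-separated words.
    # A real tokenizer usually produces more tokens than words.
    return len(text.split())


def split_into_chunks(
    text: str,
    max_tokens: int = 400,  # hypothetical budget, kept below the ~500 limit
    count_tokens: Callable[[str], int] = count_words,
) -> List[str]:
    """Greedily group sentences into chunks of at most max_tokens."""
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and count_tokens(candidate) > max_tokens:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Note: a single sentence longer than max_tokens is kept as its own
    # oversized chunk; it would need a further split.
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine."
    for chunk in split_into_chunks(sample, max_tokens=6):
        print(chunk)
```

To count real tokens, you could pass something like `count_tokens=lambda s: len(tokenizer.encode(s))`, where `tokenizer` is the Hugging Face tokenizer loaded for the model you are using, then run the model on each chunk separately and concatenate the results.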

Thanks,
Jean-Baptiste

Jean-Baptiste changed discussion status to closed May 3, 2023
