long texts are not labelled to the end
If I copy and paste your default text ten times ("Apple est créée le 1er avril..."), the labelling stops partway through.
The last paragraphs are not labelled.
Any idea? Is it related to a predefined maximum number of words at inference?
Do I have to cut my text into blocks to use your model?
Thanks
Hello Valentin,
There is indeed a predefined maximum number of tokens for each model. For CamemBERT models this is 512 tokens, special tokens included. Depending on how many tokens each word is split into, you will be limited to a certain number of words (I would guess probably around 100 to 200 words).
You can find models that handle more tokens, but there will always be a limit.
So yes, I would recommend splitting your text into blocks before passing it to the model.
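One possible way to do the splitting is sketched below. It is only an illustration, not code shipped with the model: `split_into_chunks` and `tag_long_text` are hypothetical helper names, the sentence-splitting regex is deliberately naive, and the budget of 400 tokens is an arbitrary margin under the model limit. The sketch packs whole sentences into blocks, runs the NER pipeline on each block, and shifts the entity offsets back to positions in the full text. It assumes the pipeline returns `start` and `end` character offsets for each entity.

```python
import re
from typing import Callable, Dict, List, Tuple


def split_into_chunks(
    text: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 400,  # assumed safety margin, not a value from the model card
) -> List[Tuple[int, str]]:
    """Greedily pack whole sentences into chunks of at most max_tokens tokens.

    Returns (start_offset, chunk_text) pairs, where start_offset is the
    character position of the chunk in the original text. A single sentence
    longer than max_tokens is kept as its own (oversized) chunk.
    """
    chunks: List[Tuple[int, str]] = []
    start = end = None
    # Naive sentence split: text up to ., ! or ? followed by whitespace or the end.
    for match in re.finditer(r"\S.*?(?:[.!?](?=\s|$)|$)", text, flags=re.DOTALL):
        s_start, s_end = match.span()
        if start is None:
            start, end = s_start, s_end
        elif count_tokens(text[start:s_end]) <= max_tokens:
            end = s_end  # the sentence still fits in the current chunk
        else:
            chunks.append((start, text[start:end]))
            start, end = s_start, s_end
    if start is not None:
        chunks.append((start, text[start:end]))
    return chunks


def tag_long_text(
    text: str,
    ner: Callable[[str], List[Dict]],
    count_tokens: Callable[[str], int],
    max_tokens: int = 400,
) -> List[Dict]:
    """Run `ner` on each chunk and map entity offsets back to the full text."""
    entities: List[Dict] = []
    for offset, chunk in split_into_chunks(text, count_tokens, max_tokens):
        for ent in ner(chunk):
            ent = dict(ent)
            if ent.get("start") is not None and ent.get("end") is not None:
                ent["start"] += offset
                ent["end"] += offset
            entities.append(ent)
    return entities


# Possible usage with the real model (downloads the weights, so left commented out):
# from transformers import AutoTokenizer, pipeline
# tokenizer = AutoTokenizer.from_pretrained("Jean-Baptiste/camembert-ner")
# ner = pipeline("token-classification", model="Jean-Baptiste/camembert-ner",
#                tokenizer=tokenizer, aggregation_strategy="simple")
# entities = tag_long_text(long_text, ner, lambda s: len(tokenizer.tokenize(s)))
```

Splitting on sentence boundaries rather than at a fixed number of characters keeps an entity from being cut in half between two blocks, at the cost of slightly uneven block sizes.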
Thanks,
Jean-Baptiste