token limit - warning
Hi
As mentioned in Issue #2:
"the model can't handle inputs of longer than 512 tokens"
Is there a warning I can get when I exceed the limit?
I split the text into sentences, and in most cases they are well within the limit, but there are exceptions. Is there any way to flag these exceptions before I run the "dictabert-morph" model?
Maybe I could run the tokenizer only (without the morphology), and if I reach 512 tokens I know I probably need to split before running the morph model?
Right now the code automatically truncates any sentence that exceeds 512 tokens.
A good solution would be to run the tokenizer on its own and check whether the number of tokens exceeds 512.
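A minimal sketch of that tokenizer-only check, in case it helps. The helper names `find_long_sentences` and `make_hf_counter` are made up for this example and are not part of the model's interface. It also assumes that the tokenizer loads with a plain `AutoTokenizer.from_pretrained` call and that the 512 limit includes the special tokens the tokenizer adds:

```python
from typing import Callable, List, Sequence, Tuple

MAX_TOKENS = 512  # limit quoted in this thread


def find_long_sentences(
    sentences: Sequence[str],
    count_tokens: Callable[[str], int],
    max_tokens: int = MAX_TOKENS,
) -> List[Tuple[int, int]]:
    """Return (index, token_count) for every sentence longer than max_tokens."""
    flagged = []
    for i, sentence in enumerate(sentences):
        n = count_tokens(sentence)
        if n > max_tokens:
            flagged.append((i, n))
    return flagged


def make_hf_counter(model_name: str = "dicta-il/dictabert-morph") -> Callable[[str], int]:
    """Build a token counter backed by the model's own tokenizer.

    Hypothetical helper: assumes the tokenizer can be loaded without
    any extra arguments.
    """
    from transformers import AutoTokenizer  # imported lazily; downloads on first use

    tokenizer = AutoTokenizer.from_pretrained(model_name)

    def count(sentence: str) -> int:
        # truncation=False so the real length is reported;
        # special tokens are counted because they also take up positions.
        encoded = tokenizer(sentence, add_special_tokens=True, truncation=False)
        return len(encoded["input_ids"])

    return count


if __name__ == "__main__":
    sentences = ["בשנת 1948 השלים אפרים קישון את לימודיו בפיסול מתכת."]
    counter = make_hf_counter()
    for index, n_tokens in find_long_sentences(sentences, counter):
        print(f"sentence {index} has {n_tokens} tokens, split it before running the model")
```

Anything this flags can be split further before it goes to the morph model, so nothing gets silently truncated.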
Alternatively, if you have a preferred approach that would need to be added to the interface, feel free to make the modifications and open a PR. We welcome contributions :)