Instructions for using togethercomputer/m2-bert-80M-32k-retrieval with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use togethercomputer/m2-bert-80M-32k-retrieval with Transformers:
```python
# Load model directly
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "togethercomputer/m2-bert-80M-32k-retrieval",
    trust_remote_code=True,
    dtype="auto",
)
```
- Notebooks
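Once loaded, the model can be run on tokenized text. A minimal inference sketch, with two assumptions to verify against the model card: that the repository ships a compatible tokenizer loadable via `AutoTokenizer`, and that the custom remote code returns the embedding under a `sentence_embedding` key:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "togethercomputer/m2-bert-80M-32k-retrieval"

# Assumption: the repo provides a tokenizer config; otherwise load the
# base tokenizer named in the model card.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, trust_remote_code=True
)
model.eval()  # disable dropout etc. for inference

inputs = tokenizer(
    "A long document to embed.",
    return_tensors="pt",
    truncation=True,
    max_length=32768,  # the 32k context this checkpoint is trained for
)
# no_grad() skips building the autograd graph, which substantially
# reduces memory use at inference time
with torch.no_grad():
    outputs = model(**inputs)

# Assumption: the remote code exposes the embedding under this key
embedding = outputs["sentence_embedding"]
```

The `torch.no_grad()` context matters here: without it, activations are kept around for backpropagation, which can exhaust GPU memory on long inputs even when no training is happening.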
- Google Colab
- Kaggle
What server configuration is required to run this model?
#2
by hongdouzi - opened
I set max_seq_length=5000, but I still run out of memory when inputting text of that length.
My system RAM is about 32 GB, and my GPU memory is approximately 24 GB.
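One common workaround when a single long input overflows memory is to embed the text in overlapping chunks and combine the per-chunk embeddings (e.g. by averaging). A sketch of the windowing logic alone; the `chunk_ids` helper and its `window`/`stride` parameters are illustrative, not part of the model's API:

```python
def chunk_ids(token_ids, window=4096, stride=3584):
    """Split a token-id sequence into overlapping windows.

    window: tokens per chunk; stride: step between chunk starts,
    so consecutive chunks overlap by (window - stride) tokens.
    """
    if len(token_ids) <= window:
        return [token_ids]
    chunks = []
    start = 0
    while start < len(token_ids):
        chunks.append(token_ids[start:start + window])
        if start + window >= len(token_ids):
            break
        start += stride
    return chunks

# Example: 10,000 ids -> chunks of at most 4096 tokens, 512-token overlap
chunks = chunk_ids(list(range(10_000)))
```

Each chunk can then be passed through the model separately (each forward pass stays within the memory budget), and the resulting embeddings pooled into one vector for the document.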
Do you run the model in inference mode, or wrap the forward pass in a torch.no_grad() context, when running inference?
You could set it up with the Inference Endpoints available on the Hugging Face Hub, but note that they charge for as long as the endpoint is running!
You might be able to get away with running it locally, but honestly, unless you're running very fast GPUs, you're better off using Inference Endpoints, Google Colab, or RunPod.