Some other quantizations
#1
by localAGI - opened
Hey, any chance you could add an fp16 variant of the model?
Does it make any difference in execution?
I am running on GPU. AFAIK the fp16 model would be around 28G, so it should do nicely with 80-90% offloading to a card with 24 GB of VRAM.
Might be able to do it.
I'm just not sure whether partial offloading is supported with CTranslate2, and I'm also not sure why you would want to load in fp16. fp16 would also be about 32GB.
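For anyone comparing the two size figures in this thread, here is a rough back-of-the-envelope sketch in Python. It assumes about 15.5 billion parameters (my assumption for a StarCoder-family model; it is not stated in this thread) and counts weights only, ignoring activations, the KV cache, and runtime overhead. The helper names `weight_size` and `fraction_on_gpu` are made up for illustration.

```python
# Weight-only memory estimate. Ignores activations, KV cache, and runtime overhead.
N_PARAMS = 15.5e9  # assumption: approximate parameter count, not taken from this thread

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}


def weight_size(n_params, dtype):
    """Return (decimal GB, binary GiB) needed to store the weights."""
    n_bytes = n_params * BYTES_PER_PARAM[dtype]
    return n_bytes / 1e9, n_bytes / 2**30


def fraction_on_gpu(n_params, dtype, vram_gib):
    """Upper bound on the share of weights that fits in vram_gib of GPU memory."""
    size_gib = weight_size(n_params, dtype)[1]
    return min(1.0, vram_gib / size_gib)


if __name__ == "__main__":
    for dtype in BYTES_PER_PARAM:
        gb, gib = weight_size(N_PARAMS, dtype)
        print(f"{dtype}: {gb:.1f} GB / {gib:.1f} GiB")
    share = fraction_on_gpu(N_PARAMS, "float16", 24)
    print(f"fp16 share that fits in 24 GiB: {share:.0%}")
```

Under that assumption, part of the gap between "28G" and "32GB" may just be binary versus decimal units, and the 80-90% estimate is an upper bound, since real usage needs headroom beyond the weights. Whether CTranslate2 can split a model between GPU and CPU this way is a separate question that the sketch does not answer.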