fastest inference
#2
by ehartford - opened
Hi, I would like advice on the fastest way to run inference with this model.
I want to run it on 5 million samples, and it looks like that will take several months unless I find a faster way.
Hi @ehartford,
I have found DeepSpeed Inference to work quite well for this model. It lets you use tensor parallelism, so the model is split across several GPUs.
Here are some links to get started:
https://deepspeed.readthedocs.io/en/latest/inference-init.html
https://www.deepspeed.ai/tutorials/inference-tutorial/
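For reference, a rough sketch of what wrapping the model with DeepSpeed Inference could look like. This is not a tested recipe for this model: it assumes you have already loaded the reward model as a regular `torch.nn.Module`, and the helper names `inference_kwargs` and `shard_for_inference`, as well as the choice to turn kernel injection off, are just illustrative assumptions.

```python
def inference_kwargs(tp_size):
    """Keyword arguments for deepspeed.init_inference (values are illustrative)."""
    return {
        # Split each layer's weights across `tp_size` GPUs (tensor parallelism).
        "tensor_parallel": {"tp_size": tp_size},
        # Assumption: skip fused-kernel injection, which may not support a custom reward head.
        "replace_with_kernel_inject": False,
    }


def shard_for_inference(model, tp_size):
    """Wrap an already-loaded torch.nn.Module in a DeepSpeed inference engine."""
    # Imported lazily so the helper above can be used without DeepSpeed installed.
    import deepspeed
    import torch

    engine = deepspeed.init_inference(model, dtype=torch.half, **inference_kwargs(tp_size))
    return engine.module  # call this like the original model
```

The script would then be started with the DeepSpeed launcher, for example `deepspeed --num_gpus 4 score.py` (the script name is made up), so that one process runs per GPU and `tp_size` matches the number of GPUs.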
Also note that it is a bit faster to have the tokenizer pad to 'longest' rather than 'max_length', since each batch is then padded only to its longest sequence instead of the full maximum length.
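To make the padding point concrete, here is a small sketch. `encode_batch` shows the tokenizer call with `padding="longest"` and expects any Hugging Face tokenizer to be passed in; `padded_tokens` is a hypothetical helper that only counts how many token slots the model would process under each strategy. The sequence lengths, batch size, and `max_length=2048` are made-up numbers for illustration.

```python
def encode_batch(tokenizer, texts, max_length=2048):
    """Tokenize one batch, padding only up to the longest sequence in it."""
    return tokenizer(
        texts,
        padding="longest",      # instead of padding="max_length"
        truncation=True,
        max_length=max_length,  # illustrative cap, not taken from the model card
        return_tensors="pt",
    )


def padded_tokens(lengths, batch_size, padding, max_length):
    """Total token slots processed for the given padding strategy."""
    total = 0
    for i in range(0, len(lengths), batch_size):
        # Truncate each sequence length to the cap, as the tokenizer would.
        batch = [min(n, max_length) for n in lengths[i:i + batch_size]]
        width = max_length if padding == "max_length" else max(batch)
        total += width * len(batch)
    return total


lengths = [120, 300, 80, 512]  # made-up token counts for four samples
longest_cost = padded_tokens(lengths, 2, "longest", 2048)
max_length_cost = padded_tokens(lengths, 2, "max_length", 2048)
```

Sorting or bucketing samples by length before batching would shrink the 'longest' cost further, since similar-length samples need less padding.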
Hope this helps!