Could you share scripts for fast inference?
#3
by chujiezheng - opened
Thanks for your great work! I am trying to run this 34B RM, but it is very slow when loaded with transformers (device_map='auto') and processing long texts (length 2048). Could you share scripts that enable fast inference, for example with tensor parallelism?
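For anyone hitting the same slowdown: this is not the authors' script and it does not implement tensor parallelism. It is a minimal, model-agnostic sketch of one common mitigation for long inputs, namely grouping examples of similar token length so each batch wastes less compute on padding. The function names (`length_bucketed_batches`, `padded_tokens`) and the `max_tokens_per_batch` budget are hypothetical and chosen only for illustration. It assumes you already have a token count for each example.

```python
from typing import Iterator, List, Sequence


def length_bucketed_batches(
    lengths: Sequence[int], max_tokens_per_batch: int
) -> Iterator[List[int]]:
    """Yield batches of example indices with similar lengths.

    Examples are visited longest first. A batch is closed as soon as adding
    one more example would push (longest length * batch size), i.e. the
    padded token count, over max_tokens_per_batch.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i], reverse=True)
    batch: List[int] = []
    longest = 0
    for idx in order:
        candidate_longest = max(longest, lengths[idx])
        if batch and candidate_longest * (len(batch) + 1) > max_tokens_per_batch:
            yield batch
            batch, candidate_longest = [], lengths[idx]
        batch.append(idx)
        longest = candidate_longest
    if batch:
        yield batch


def padded_tokens(lengths: Sequence[int], batches: Sequence[List[int]]) -> int:
    """Total tokens processed when each batch is padded to its longest member."""
    return sum(max(lengths[i] for i in b) * len(b) for b in batches)


if __name__ == "__main__":
    # Hypothetical token counts for six prompts.
    lengths = [2048, 100, 2000, 120, 90, 1900]
    bucketed = list(length_bucketed_batches(lengths, max_tokens_per_batch=4096))
    in_order = [[0, 1], [2, 3], [4, 5]]  # naive batches of two, in input order
    print("bucketed batches:", bucketed)
    print("padded tokens, bucketed:", padded_tokens(lengths, bucketed))
    print("padded tokens, in order:", padded_tokens(lengths, in_order))
```

Each yielded list holds indices into the original inputs, so scores can be written back in the original order after the model runs on each batch. Whether this helps in practice depends on how varied the input lengths are; if every text is close to 2048 tokens, there is little padding to remove.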