Instructions to use OptimalScale/robin-7b-v2-delta with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use OptimalScale/robin-7b-v2-delta with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="OptimalScale/robin-7b-v2-delta")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("OptimalScale/robin-7b-v2-delta")
model = AutoModelForCausalLM.from_pretrained("OptimalScale/robin-7b-v2-delta")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use OptimalScale/robin-7b-v2-delta with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "OptimalScale/robin-7b-v2-delta"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OptimalScale/robin-7b-v2-delta",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker:
```shell
docker model run hf.co/OptimalScale/robin-7b-v2-delta
```
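The server exposes an OpenAI-compatible API, so the curl call above can also be made from Python. The sketch below is a hypothetical helper (the function names `build_payload` and `complete` are illustrative, not part of vLLM); it assumes the server from the previous step is running on localhost port 8000 and uses only the standard library:

```python
# Hypothetical client for the vLLM server started above.
# Assumes the server is listening on http://localhost:8000.
import json
import urllib.request

def build_payload(prompt, max_tokens=512, temperature=0.5):
    """Assemble the JSON body expected by the /v1/completions endpoint."""
    return {
        "model": "OptimalScale/robin-7b-v2-delta",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(prompt, url="http://localhost:8000/v1/completions"):
    """POST the prompt to the server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

The same client works against the SGLang server below by changing the port in the URL.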
- SGLang
How to use OptimalScale/robin-7b-v2-delta with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "OptimalScale/robin-7b-v2-delta" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OptimalScale/robin-7b-v2-delta",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images:
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "OptimalScale/robin-7b-v2-delta" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OptimalScale/robin-7b-v2-delta",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use OptimalScale/robin-7b-v2-delta with Docker Model Runner:
```shell
docker model run hf.co/OptimalScale/robin-7b-v2-delta
```
Robin Model Card
Model Details
Robin is a series of models finetuned from LLaMA on several high-quality datasets.
- Developed by: LMFlow
- Model type: An auto-regressive language model based on the transformer architecture.
- License: Non-commercial license
- Finetuned from model: LLaMA.
Model Sources
- Repository: https://github.com/OptimalScale/LMFlow/
- Blog: https://medium.com/@hkust.ml/robin-v2-launches-achieves-unparalleled-performance-on-openllm-4f6886e822c1
- Paper: https://arxiv.org/abs/2306.12420
- Demo: https://lmflow.com/
Uses
Robin is primarily utilized for conducting research on extensive language models and chatbots, catering to users specializing in natural language processing, machine learning, and artificial intelligence research.
How to Get Started with the Model
We provide four kinds of demos including:
- Online Service: If you don't want to run any code and just want to try our models, you can use our deployed instruction-tuned LLaMA online.
- Colab Chatbot (shell): An interactive shell-based chatbot for you to easily deploy a chatbot on colab.
- Colab Chatbot (web): An interactive web-based chatbot for you to easily deploy your own chatbot on colab.
- Local Deploy: We also provide a way for you to deploy your model/chatbot locally, which means you can deploy much larger models than with the previous three methods if you have enough resources.
Please refer to https://github.com/OptimalScale/LMFlow#demos
Training Details
Expanding upon the initial idea of self-instruct techniques, we incorporated several different data sources and built a new dataset called LMFlow Dataset. The new training split was created by merging the following datasets:
- ShareGPT: 50K English and 10K Chinese examples randomly sampled from ShareGPT.
- GPT-4-LLM: 52K English examples from GPT-4-LLM.
- BELLE: 80K Chinese examples randomly sampled from BELLE.
See more details in the "Instruction Tuning" section in our paper.
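The merge described above can be sketched in a few lines. This is a hypothetical illustration, not LMFlow's actual data pipeline: `sample_k`, `build_lmflow_split`, and the dataset arguments are made-up names, while the sample sizes come from the list in this card.

```python
# Hypothetical sketch of assembling the LMFlow training split.
# Each argument is a list of examples from one source dataset.
import random

def sample_k(examples, k, seed=42):
    """Randomly sample k examples (or keep all of them if fewer exist)."""
    rng = random.Random(seed)
    return rng.sample(examples, k) if len(examples) >= k else list(examples)

def build_lmflow_split(sharegpt_en, sharegpt_zh, gpt4_llm_en, belle_zh):
    """Merge the four sources into a single instruction-tuning split."""
    return (
        sample_k(sharegpt_en, 50_000)    # ShareGPT, English
        + sample_k(sharegpt_zh, 10_000)  # ShareGPT, Chinese
        + list(gpt4_llm_en)              # all 52K GPT-4-LLM English examples
        + sample_k(belle_zh, 80_000)     # BELLE, Chinese
    )
```

Fixing the random seed, as above, keeps such a merge reproducible across runs.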
Evaluation
Robin is evaluated with LMFlow Benchmark. See more details in this paper.
Citation
If you find this repository useful, please consider giving it a star and citing our paper:
```bibtex
@misc{lmflow,
  author = {Shizhe Diao and Rui Pan and Hanze Dong and KaShun Shum and Jipeng Zhang and Wei Xiong and Tong Zhang},
  title = {LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://optimalscale.github.io/LMFlow/}},
}
```