Instructions to use OptimalScale/robin-7b-v2-delta with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use OptimalScale/robin-7b-v2-delta with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="OptimalScale/robin-7b-v2-delta")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("OptimalScale/robin-7b-v2-delta")
model = AutoModelForCausalLM.from_pretrained("OptimalScale/robin-7b-v2-delta")
```

- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use OptimalScale/robin-7b-v2-delta with vLLM:
Install from pip and serve the model:
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "OptimalScale/robin-7b-v2-delta"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OptimalScale/robin-7b-v2-delta",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker:
```shell
docker model run hf.co/OptimalScale/robin-7b-v2-delta
```
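The server exposes an OpenAI-compatible API, so the curl call above can also be made from Python. The sketch below is a hypothetical helper (the function names `build_payload` and `complete` are illustrative, not part of vLLM); it assumes the server from the previous step is running on localhost port 8000 and uses only the standard library:

```python
# Hypothetical client for the vLLM server started above.
# Assumes the server is listening on http://localhost:8000.
import json
import urllib.request

def build_payload(prompt, max_tokens=512, temperature=0.5):
    """Assemble the JSON body expected by the /v1/completions endpoint."""
    return {
        "model": "OptimalScale/robin-7b-v2-delta",
        "prompt": prompt,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def complete(prompt, url="http://localhost:8000/v1/completions"):
    """POST the prompt to the server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

The same client works against the SGLang server below by changing the port in the URL.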
- SGLang
How to use OptimalScale/robin-7b-v2-delta with SGLang:
Install from pip and serve the model:
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "OptimalScale/robin-7b-v2-delta" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OptimalScale/robin-7b-v2-delta",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Use Docker images:
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "OptimalScale/robin-7b-v2-delta" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "OptimalScale/robin-7b-v2-delta",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use OptimalScale/robin-7b-v2-delta with Docker Model Runner:
```shell
docker model run hf.co/OptimalScale/robin-7b-v2-delta
```
Robin Model Card
Model Details
Robin is a series of models finetuned from LLaMA on several high-quality datasets.
- Developed by: LMFlow
- Model type: An auto-regressive language model based on the transformer architecture.
- License: Non-commercial license
- Finetuned from model: LLaMA.
Model Sources
- Repository: https://github.com/OptimalScale/LMFlow/
- Blog: https://medium.com/@hkust.ml/robin-v2-launches-achieves-unparalleled-performance-on-openllm-4f6886e822c1
- Paper: https://arxiv.org/abs/2306.12420
- Demo: https://lmflow.com/
Uses
Robin is primarily utilized for conducting research on extensive language models and chatbots, catering to users specializing in natural language processing, machine learning, and artificial intelligence research.
How to Get Started with the Model
We provide four kinds of demos including:
- Online Service: If you don't want to run any code and just want to try our models, you can use our deployed instruction-tuned LLaMA online.
- Colab Chatbot (shell): An interactive shell-based chatbot for you to easily deploy a chatbot on colab.
- Colab Chatbot (web): An interactive web-based chatbot for you to easily deploy your own chatbot on colab.
- Local Deploy: We also provide a way for you to deploy your model/chatbot locally, which means you can deploy much larger models than with the previous three methods if you have enough resources.
Please refer to https://github.com/OptimalScale/LMFlow#demos
Training Details
Expanding upon the initial idea of self-instruct techniques, we incorporated several different data sources and built a new dataset called LMFlow Dataset. The new training split was created by merging the following datasets:
- ShareGPT: 50K English and 10K Chinese examples randomly sampled from ShareGPT.
- GPT-4-LLM: 52K English examples from GPT-4-LLM.
- BELLE: 80K Chinese examples randomly sampled from BELLE.
See more details in the "Instruction Tuning" section in our paper.
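The merge described above can be sketched in a few lines. This is a hypothetical illustration, not LMFlow's actual data pipeline: `sample_k`, `build_lmflow_split`, and the dataset arguments are made-up names, while the sample sizes come from the list in this card.

```python
# Hypothetical sketch of assembling the LMFlow training split.
# Each argument is a list of examples from one source dataset.
import random

def sample_k(examples, k, seed=42):
    """Randomly sample k examples (or keep all of them if fewer exist)."""
    rng = random.Random(seed)
    return rng.sample(examples, k) if len(examples) >= k else list(examples)

def build_lmflow_split(sharegpt_en, sharegpt_zh, gpt4_llm_en, belle_zh):
    """Merge the four sources into a single instruction-tuning split."""
    return (
        sample_k(sharegpt_en, 50_000)    # ShareGPT, English
        + sample_k(sharegpt_zh, 10_000)  # ShareGPT, Chinese
        + list(gpt4_llm_en)              # all 52K GPT-4-LLM English examples
        + sample_k(belle_zh, 80_000)     # BELLE, Chinese
    )
```

Fixing the random seed, as above, keeps such a merge reproducible across runs.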
Evaluation
Robin is evaluated with LMFlow Benchmark. See more details in this paper.
Citation
If you find this repository useful, please consider giving it a star and citing our paper:
```bibtex
@misc{lmflow,
  author = {Shizhe Diao and Rui Pan and Hanze Dong and KaShun Shum and Jipeng Zhang and Wei Xiong and Tong Zhang},
  title = {LMFlow: An Extensible Toolkit for Finetuning and Inference of Large Foundation Models},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://optimalscale.github.io/LMFlow/}},
}
```