Instructions to use openchat/openchat-3.5-0106 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openchat/openchat-3.5-0106 with Transformers:
Use a pipeline as a high-level helper:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="openchat/openchat-3.5-0106")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

Or load the model directly:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openchat/openchat-3.5-0106")
model = AutoModelForCausalLM.from_pretrained("openchat/openchat-3.5-0106")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use openchat/openchat-3.5-0106 with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "openchat/openchat-3.5-0106"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openchat/openchat-3.5-0106",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker:

```shell
docker model run hf.co/openchat/openchat-3.5-0106
```
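The same chat-completions request can be made from Python using only the standard library. This is a minimal sketch, assuming the vLLM server from the step above is running on localhost:8000; the helper names (`build_payload`, `chat`) are illustrative, not part of vLLM, and the SGLang server below speaks the same API on port 30000.

```python
import json
import urllib.request

# Endpoint and model name mirror the curl example above.
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "openchat/openchat-3.5-0106"


def build_payload(prompt: str) -> dict:
    """Build the JSON body expected by the /v1/chat/completions endpoint."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt: str) -> str:
    """POST the prompt to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# chat("What is the capital of France?")  # requires a running server
```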
- SGLang
How to use openchat/openchat-3.5-0106 with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "openchat/openchat-3.5-0106" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openchat/openchat-3.5-0106",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "openchat/openchat-3.5-0106" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openchat/openchat-3.5-0106",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Docker Model Runner
How to use openchat/openchat-3.5-0106 with Docker Model Runner:
```shell
docker model run hf.co/openchat/openchat-3.5-0106
```
Train Mistral 7B 0.2
Why don't you guys train Mistral 7B v0.2, which has a 32k context length, on long-context as well as short-context data? Long-context datasets such as:
- wckwan/M4LE
- THUDM/LongBench
- togethercomputer/Long-Data-Collections
or maybe your own curated long-context ones.
Yeah, I agree. I was considering using this model in a Mixtral merge because of its scores, but that would be difficult given the context constraint of only 8k: it would limit every other Mistral model in the merge to 8k, despite their being able to produce 32k tokens of content.
+1
I would say that Mistral 7B v0.2 is not a pretrained model but an instruction-tuned one, and therefore already carries biases from its finetuning phase. For complete control over the model's behavior, it is best to start from a pretrained base. That may be why.
+1
nvm
And I think they should definitely go beyond 7B parameters with OpenChat!
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2 is fine-tuned on the base model mistral-7B-v0.2, which is now officially made available by Mistral AI:
mistral-7B-v0.2
- https://models.mistralcdn.com/mistral-7b-v0-2/mistral-7B-v0.2.tar (PyTorch)
- https://huggingface.co/alpindale/Mistral-7B-v0.2-hf (Safetensors)
- https://huggingface.co/bartowski/Mistral-7B-v0.2-hf-GGUF (GGUF)
I would love to see an OpenChat fine-tune based on mistral-7B-v0.2 with a 32k context length.
OpenChat team, I depth up-scaled Mistral-7B-v0.2, following UpStage's paper SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling, in case you want to train OpenChat on a slightly bigger model.
Joseph717171/Mistral-10.7B-v0.2
- 32K Context Window
- 🚫 Sliding Window Attention
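For context, depth up-scaling as described in the SOLAR paper makes two copies of the 32-layer base, drops 8 layers from the seam of each copy, and concatenates the halves into a 48-layer (~10.7B-parameter) model. A sketch of the layer-index arithmetic, assuming this merge follows the paper's recipe (the function name is illustrative):

```python
# SOLAR-style depth up-scaling (DUS) index map: for a base with n_layers
# layers, copy 1 contributes layers 0..n_layers-m-1 and copy 2 contributes
# layers m..n_layers-1, giving 2*(n_layers - m) layers in the result.
def dus_layer_map(n_layers: int = 32, m: int = 8) -> list[int]:
    """Return base-model layer indices for each layer of the up-scaled model."""
    first_half = list(range(n_layers - m))   # layers 0..23 from copy 1
    second_half = list(range(m, n_layers))   # layers 8..31 from copy 2
    return first_half + second_half          # 48 layers in total

mapping = dus_layer_map()  # len(mapping) == 48 for the default 32-layer base
```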
@Joseph717171 You're too late, bro, they don't care:
https://huggingface.co/openchat/openchat-3.5-0106-gemma/discussions/4
Oh well, it was worth a shot.