Instructions to use openchat/openchat-3.5-0106 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use openchat/openchat-3.5-0106 with Transformers:
Use a pipeline as a high-level helper:

```python
from transformers import pipeline

pipe = pipeline("text-generation", model="openchat/openchat-3.5-0106")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

Or load the model directly:

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openchat/openchat-3.5-0106")
model = AutoModelForCausalLM.from_pretrained("openchat/openchat-3.5-0106")

messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```

- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use openchat/openchat-3.5-0106 with vLLM:
Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "openchat/openchat-3.5-0106"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openchat/openchat-3.5-0106",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker:

```shell
docker model run hf.co/openchat/openchat-3.5-0106
```
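The same chat-completions request can be made from Python using only the standard library. This is a minimal sketch, assuming the vLLM server from the step above is running on localhost:8000; the helper names (`build_payload`, `chat`) are illustrative, not part of vLLM, and the SGLang server below speaks the same API on port 30000.

```python
import json
import urllib.request

# Endpoint and model name mirror the curl example above.
API_URL = "http://localhost:8000/v1/chat/completions"
MODEL = "openchat/openchat-3.5-0106"


def build_payload(prompt: str) -> dict:
    """Build the JSON body expected by the /v1/chat/completions endpoint."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt: str) -> str:
    """POST the prompt to the server and return the assistant's reply text."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# chat("What is the capital of France?")  # requires a running server
```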
- SGLang
How to use openchat/openchat-3.5-0106 with SGLang:
Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "openchat/openchat-3.5-0106" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openchat/openchat-3.5-0106",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

Use Docker images:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "openchat/openchat-3.5-0106" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openchat/openchat-3.5-0106",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

- Docker Model Runner
How to use openchat/openchat-3.5-0106 with Docker Model Runner:
```shell
docker model run hf.co/openchat/openchat-3.5-0106
```
Train Mistral 7B 0.2
Why don't you guys train Mistral 7B v0.2, which has a 32k context length, on long-context as well as short-context data? Long-context datasets such as:
- wckwan/M4LE
- THUDM/LongBench
- togethercomputer/Long-Data-Collections
or maybe your own curated long-context ones.
Yeah, I agree. I was considering using this model in a Mixtral merge because of its scores, but that would be difficult given the context constraint of only 8k: it would limit every other Mistral model in the merge to 8k, despite their being able to produce 32k tokens of content.
+1
I would say that Mistral 7B v0.2 is not a pretrained model but an instruction-tuned one, and therefore already carries biases from its finetuning phase. For complete control over the model's behavior, it is best to start from a pretrained base. That may be why.
+1
nvm
And I think they should definitely go beyond 7B parameters with OpenChat!
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.2 is fine-tuned on the base model mistral-7B-v0.2, which is now officially made available by Mistral AI:
mistral-7B-v0.2
- https://models.mistralcdn.com/mistral-7b-v0-2/mistral-7B-v0.2.tar (PyTorch)
- https://huggingface.co/alpindale/Mistral-7B-v0.2-hf (Safetensors)
- https://huggingface.co/bartowski/Mistral-7B-v0.2-hf-GGUF (GGUF)
I would love to see an OpenChat fine-tune based on mistral-7B-v0.2 with a 32k context length.
OpenChat team, I depth up-scaled Mistral-7B-v0.2, following UpStage's paper SOLAR 10.7B: Scaling Large Language Models with Simple yet Effective Depth Up-Scaling, in case you want to train OpenChat on a slightly bigger model.
Joseph717171/Mistral-10.7B-v0.2
- 32K Context Window
- 🚫 Sliding Window Attention
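For context, depth up-scaling as described in the SOLAR paper makes two copies of the 32-layer base, drops 8 layers from the seam of each copy, and concatenates the halves into a 48-layer (~10.7B-parameter) model. A sketch of the layer-index arithmetic, assuming this merge follows the paper's recipe (the function name is illustrative):

```python
# SOLAR-style depth up-scaling (DUS) index map: for a base with n_layers
# layers, copy 1 contributes layers 0..n_layers-m-1 and copy 2 contributes
# layers m..n_layers-1, giving 2*(n_layers - m) layers in the result.
def dus_layer_map(n_layers: int = 32, m: int = 8) -> list[int]:
    """Return base-model layer indices for each layer of the up-scaled model."""
    first_half = list(range(n_layers - m))   # layers 0..23 from copy 1
    second_half = list(range(m, n_layers))   # layers 8..31 from copy 2
    return first_half + second_half          # 48 layers in total

mapping = dus_layer_map()  # len(mapping) == 48 for the default 32-layer base
```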
@Joseph717171 You're too late, bro, they don't care:
https://huggingface.co/openchat/openchat-3.5-0106-gemma/discussions/4
Oh well, it was worth a shot.