Instructions to use Open-Orca/Mistral-7B-OpenOrca with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Open-Orca/Mistral-7B-OpenOrca with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Open-Orca/Mistral-7B-OpenOrca")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Open-Orca/Mistral-7B-OpenOrca")
model = AutoModelForCausalLM.from_pretrained("Open-Orca/Mistral-7B-OpenOrca")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
- Inference
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Open-Orca/Mistral-7B-OpenOrca with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Open-Orca/Mistral-7B-OpenOrca"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Open-Orca/Mistral-7B-OpenOrca",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
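The same OpenAI-compatible endpoint can also be called from Python. A minimal sketch using the openai client (the api_key value is just a placeholder, since the local vLLM server does not require one by default):

# Call the local vLLM server (started above on port 8000) from Python.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # placeholder key

response = client.chat.completions.create(
    model="Open-Orca/Mistral-7B-OpenOrca",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)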
- SGLang
How to use Open-Orca/Mistral-7B-OpenOrca with SGLang:
Install from pip and serve model
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Open-Orca/Mistral-7B-OpenOrca" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Open-Orca/Mistral-7B-OpenOrca",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
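As with vLLM, the SGLang server exposes an OpenAI-compatible chat API, so the curl call above can be reproduced from Python. A minimal sketch using requests (assumes the server is running locally on port 30000 as started above):

# Call the local SGLang server's OpenAI-compatible endpoint from Python.
import requests

payload = {
    "model": "Open-Orca/Mistral-7B-OpenOrca",
    "messages": [{"role": "user", "content": "What is the capital of France?"}],
}
resp = requests.post("http://localhost:30000/v1/chat/completions", json=payload)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])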
Use Docker images
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Open-Orca/Mistral-7B-OpenOrca" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Open-Orca/Mistral-7B-OpenOrca",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
- Docker Model Runner
How to use Open-Orca/Mistral-7B-OpenOrca with Docker Model Runner:
docker model run hf.co/Open-Orca/Mistral-7B-OpenOrca
I'm getting an error: <unk> set to 0 in the tokenizer config
I'm having trouble with the provided tokenizer; it's unclear what's happening in that error. (Sorry for not being more helpful!)
I concur. I'm trying to load this model in text-generation-inference. Here's the stack:
2023-10-03T07:17:50.796666Z INFO download: text_generation_launcher: Successfully downloaded weights.
2023-10-03T07:17:50.796964Z INFO shard-manager: text_generation_launcher: Starting shard rank=0
2023-10-03T07:18:00.806182Z INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2023-10-03T07:18:05.539569Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
File "/opt/conda/bin/text-generation-server", line 8, in <module>
sys.exit(app())
File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
return get_command(self)(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
return self.main(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
return _main(
File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
rv = self.invoke(ctx)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 783, in invoke
return __callback(*args, **kwargs)
File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
return callback(**use_params) # type: ignore
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 83, in serve
server.serve(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 207, in serve
asyncio.run(
File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
self.run_forever()
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
self._run_once()
File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
handle._run()
File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 159, in serve_inner
model = get_model(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 252, in get_model
return FlashMistral(
File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/flash_mistral.py", line 297, in __init__
tokenizer = LlamaTokenizerFast.from_pretrained(
File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1854, in from_pretrained
return cls._from_pretrained(
File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 1886, in _from_pretrained
slow_tokenizer = (cls.slow_tokenizer_class)._from_pretrained(
File "/opt/conda/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2073, in _from_pretrained
raise ValueError(
ValueError: Non-consecutive added token '<unk>' found. Should have index 32000 but has index 0 in saved vocabulary.
You'll need to get into whatever environment you have set up for ooba (e.g. conda) and do:
pip install git+https://github.com/huggingface/transformers
This is because Mistral support in Transformers has not been released to PyPI yet, so you need to install from the development snapshot.
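To double-check that the source install picked up Mistral support, something like this should now run cleanly (a quick sketch; as far as I know Mistral landed in the 4.34 line, so the installed dev version should be at or above that):

# Sanity check after pip install git+https://github.com/huggingface/transformers
import transformers
print(transformers.__version__)  # expect a 4.34+ (dev) version

from transformers import AutoTokenizer

# With Mistral support present, this should no longer raise the
# "Non-consecutive added token '<unk>'" error from the saved vocabulary.
tokenizer = AutoTokenizer.from_pretrained("Open-Orca/Mistral-7B-OpenOrca")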
Thanks, that worked for me.
I assumed that since text-generation-inference:1.1.0 has support for Mistral, it would work out of the box. Instead I had to create a new image, e.g.:
FROM ghcr.io/huggingface/text-generation-inference:1.1.0
RUN apt-get update -y && \
DEBIAN_FRONTEND=noninteractive apt-get install -y git && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
RUN pip3 install --no-cache-dir \
"git+https://github.com/huggingface/transformers"
Is there a way to do this programmatically yet? (I'm trying to host it here, on Hugging Face.)
I seem to still be getting:
raise TypeError(f"{config.model_type} isn't supported yet.")
TypeError: mistral isn't supported yet.
even after updating with the given command.
I'm just loading it through AutoTokenizer.from_pretrained
It should be fixed by now; you just have to set the max token lengths when you deploy.
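(If it helps: in the text-generation-inference launcher those limits are set with options along the lines of --max-input-length and --max-total-tokens; double-check the exact flag names against your TGI version.)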