Instructions to use solidrust/Hermes-2-Pro-Mistral-7B-AWQ with libraries, inference providers, notebooks, and local apps.

Libraries

How to use solidrust/Hermes-2-Pro-Mistral-7B-AWQ with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="solidrust/Hermes-2-Pro-Mistral-7B-AWQ")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("solidrust/Hermes-2-Pro-Mistral-7B-AWQ")
model = AutoModelForCausalLM.from_pretrained("solidrust/Hermes-2-Pro-Mistral-7B-AWQ")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use solidrust/Hermes-2-Pro-Mistral-7B-AWQ with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "solidrust/Hermes-2-Pro-Mistral-7B-AWQ"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "solidrust/Hermes-2-Pro-Mistral-7B-AWQ",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
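
For readers who prefer Python to curl, the same request can be sketched with the standard library only. This is a minimal illustration, not part of the model card: the helper names `build_chat_request` and `send` are hypothetical, and the default `base_url` assumes the vLLM server from the commands above is listening on port 8000.

```python
import json
import urllib.request


def build_chat_request(prompt,
                       model="solidrust/Hermes-2-Pro-Mistral-7B-AWQ",
                       base_url="http://localhost:8000"):
    """Build the same POST request as the curl command, without sending it."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=60):
    """Send the request and return the parsed JSON body (needs a running server)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With a server running, `send(build_chat_request("What is the capital of France?"))` should return an OpenAI-style chat completion object. Passing `base_url="http://localhost:30000"` would target an SGLang server started on that port instead.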

Use Docker

docker model run hf.co/solidrust/Hermes-2-Pro-Mistral-7B-AWQ

SGLang

How to use solidrust/Hermes-2-Pro-Mistral-7B-AWQ with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "solidrust/Hermes-2-Pro-Mistral-7B-AWQ" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "solidrust/Hermes-2-Pro-Mistral-7B-AWQ",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "solidrust/Hermes-2-Pro-Mistral-7B-AWQ" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "solidrust/Hermes-2-Pro-Mistral-7B-AWQ",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use solidrust/Hermes-2-Pro-Mistral-7B-AWQ with Docker Model Runner:
```
docker model run hf.co/solidrust/Hermes-2-Pro-Mistral-7B-AWQ
```

update config.json vocab_size to tokenizer length

by sparsh35 - opened Mar 28, 2024


sparsh35

Mar 28, 2024

This sets vocab_size to the tokenizer length, i.e., 32000. Under high throughput, vLLM can sample padded tokens, which results in an error in vLLM. It is an open issue here: https://github.com/vllm-project/vllm/issues/340
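
The mismatch described here can be sketched in a few lines of Python. The sketch assumes a config vocab_size of 32032, inferred from the tokenizer length of 32000 plus the 32 extra tokens mentioned later in this thread; the helper names are hypothetical. Masking the logits is shown only as one generic way to avoid sampling such ids and is not necessarily what vLLM does.

```python
def unmapped_token_ids(config_vocab_size, tokenizer_length):
    """Return the ids the model can emit but the tokenizer has no entry for."""
    return range(tokenizer_length, config_vocab_size)


def mask_unmapped_logits(logits, tokenizer_length):
    """Return a copy of logits with every id >= tokenizer_length set to -inf."""
    return [
        value if token_id < tokenizer_length else float("-inf")
        for token_id, value in enumerate(logits)
    ]


# Assumed sizes: 32000 tokenizer entries, 32032 rows in the model's vocabulary.
gap = unmapped_token_ids(config_vocab_size=32032, tokenizer_length=32000)
print(len(gap), gap[0], gap[-1])  # 32 32000 32031

# Toy example with a 4-entry vocabulary where only the first 3 ids are mapped.
toy_logits = [0.1, 2.0, 0.5, 3.0]
masked = mask_unmapped_logits(toy_logits, tokenizer_length=3)
print(masked.index(max(masked)))  # 1
```

Without the mask, greedy sampling on the toy logits would pick id 3, which the toy tokenizer could not decode.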

update config.json vocab_size to tokenizer length ada38480

paulml

Mar 28, 2024

The "fix" gives this error:
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/workspace/vllm/entrypoints/openai/api_server.py", line 236, in <module>
    engine = AsyncLLMEngine.from_engine_args(engine_args)
  File "/workspace/vllm/engine/async_llm_engine.py", line 628, in from_engine_args
    engine = cls(parallel_config.worker_use_ray,
  File "/workspace/vllm/engine/async_llm_engine.py", line 321, in __init__
    self.engine = self._init_engine(*args, **kwargs)
  File "/workspace/vllm/engine/async_llm_engine.py", line 369, in _init_engine
    return engine_class(*args, **kwargs)
  File "/workspace/vllm/engine/llm_engine.py", line 128, in __init__
    self._init_workers()
  File "/workspace/vllm/engine/llm_engine.py", line 181, in _init_workers
    self._run_workers("load_model")
  File "/workspace/vllm/engine/llm_engine.py", line 1041, in _run_workers
    driver_worker_output = getattr(self.driver_worker,
  File "/workspace/vllm/worker/worker.py", line 100, in load_model
    self.model_runner.load_model()
  File "/workspace/vllm/worker/model_runner.py", line 88, in load_model
    self.model = get_model(self.model_config,
  File "/workspace/vllm/model_executor/utils.py", line 52, in get_model
    return get_model_fn(model_config, device_config, **kwargs)
  File "/workspace/vllm/model_executor/model_loader.py", line 86, in get_model
    model.load_weights(model_config.model, model_config.download_dir,
  File "/workspace/vllm/model_executor/models/llama.py", line 391, in load_weights
    weight_loader(param, loaded_weight)
  File "/workspace/vllm/model_executor/layers/vocab_parallel_embedding.py", line 88, in weight_loader
    assert loaded_weight.shape[parallel_dim] == self.org_vocab_size
AssertionError
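
The final assertion compares the number of rows in the checkpoint's embedding weight with the vocabulary size from config.json. A rough sketch of why editing only the config trips it is shown below, using plain tuples in place of tensors. The function name is hypothetical, and the 32032 row count is an assumption based on the 32 extra tokens mentioned in this thread; 4096 is the Mistral-7B hidden size and is included only for illustration.

```python
def embedding_rows_match(loaded_shape, org_vocab_size, parallel_dim=0):
    """Mimic the check in the traceback: checkpoint rows vs. configured vocab size."""
    return loaded_shape[parallel_dim] == org_vocab_size


HIDDEN_SIZE = 4096       # Mistral-7B hidden size, illustrative only
CHECKPOINT_ROWS = 32032  # assumption: 32000 base tokens + 32 extra tokens

checkpoint_shape = (CHECKPOINT_ROWS, HIDDEN_SIZE)

# Original config: vocab_size matches the stored weights.
print(embedding_rows_match(checkpoint_shape, org_vocab_size=32032))  # True

# Edited config: vocab_size lowered to the tokenizer length, weights unchanged.
print(embedding_rows_match(checkpoint_shape, org_vocab_size=32000))  # False
```

Under these assumptions, changing vocab_size in config.json without resizing the stored embedding leaves the two numbers out of sync, which is consistent with the AssertionError above.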

sparsh35

Mar 28, 2024

my bad

sparsh35

Mar 28, 2024

Yes, it is giving an error. It seems the vLLM team is working on it; hopefully the fix gets merged soon.

Suparious

SolidRusT Networks org Mar 28, 2024

The model was trained with these extra 32 tokens. They seem to be related to function calling; see https://github.com/NousResearch/Hermes-Function-Calling

Suparious changed pull request status to closed Mar 28, 2024

sparsh35

Apr 8, 2024

It is done in https://github.com/vllm-project/vllm/pull/3500, and it is working fine as of now. Just upgrade to the latest vLLM. Posting this here for anyone who might run into the same difficulty.
