Instructions to use sethuiyer/SynthIQ-7b with libraries and local apps.

Libraries

How to use sethuiyer/SynthIQ-7b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="sethuiyer/SynthIQ-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Print the result so the generated reply is visible when run as a script
print(pipe(messages))

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("sethuiyer/SynthIQ-7b")
model = AutoModelForCausalLM.from_pretrained("sethuiyer/SynthIQ-7b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use sethuiyer/SynthIQ-7b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sethuiyer/SynthIQ-7b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sethuiyer/SynthIQ-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
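
The same request can also be issued from Python. The sketch below uses only the standard library and assumes the vLLM server started above is listening on localhost:8000. The helper names `build_chat_request` and `send_chat_request` are illustrative and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(prompt, base_url="http://localhost:8000",
                       model="sethuiyer/SynthIQ-7b"):
    """Build a POST request for the OpenAI-compatible chat completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request, timeout=60):
    """Send the request and return the decoded JSON body.

    Requires a running server; nothing is sent until this is called.
    """
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send_chat_request(build_chat_request("What is the capital of France?"))` mirrors the curl command. Passing `base_url="http://localhost:30000"` would target a server on the SGLang port instead.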

Use Docker

docker model run hf.co/sethuiyer/SynthIQ-7b

SGLang

How to use sethuiyer/SynthIQ-7b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "sethuiyer/SynthIQ-7b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sethuiyer/SynthIQ-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
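
The server answers with a JSON body in the OpenAI chat completion shape, where the reply text sits under `choices[0].message.content`. The sketch below pulls that field out; `extract_reply` is a hypothetical helper, and `sample_response` is a hand-written placeholder in that shape, not real model output.

```python
def extract_reply(response):
    """Return the assistant text from the first choice of a chat completion."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    return choices[0]["message"]["content"]


# Hand-written placeholder in the OpenAI chat completion shape (not real output).
sample_response = {
    "object": "chat.completion",
    "model": "sethuiyer/SynthIQ-7b",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "<model reply here>"},
            "finish_reason": "stop",
        }
    ],
}

print(extract_reply(sample_response))
```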

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sethuiyer/SynthIQ-7b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sethuiyer/SynthIQ-7b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use sethuiyer/SynthIQ-7b with Docker Model Runner:
```
docker model run hf.co/sethuiyer/SynthIQ-7b
```
Browse Quantizations to use this model in llama.cpp, Ollama, LM Studio, or any compatible app.

SynthIQ

This is SynthIQ, rated 92.23/100 by GPT-4 across a varied set of complex prompts. I built it with mergekit by merging the models listed in the YAML config below.

Benchmark Name	Score
ARC	65.87
HellaSwag	85.82
MMLU	64.75
TruthfulQA	57.00
Winogrande	78.69
GSM8K	64.06
AGIEval	42.67
GPT4All	73.71
Bigbench	44.59

Update - 19/01/2024

Tested to work well with AutoGen and CrewAI.

GGUF Files

Q4_K_M - medium, balanced quality - recommended

Q6_K - very large, extremely low quality loss

Q8_0 - very large, extremely low quality loss - not recommended

Important Update: SynthIQ is now available on Ollama. You can use it by running the command ollama run stuehieyr/synthiq in your terminal. If you have limited computing resources, check out this video to learn how to run it on a Google Colab backend.

YAML Config

slices:
  - sources:
      - model: Weyaxi/OpenHermes-2.5-neural-chat-v3-3-openchat-3.5-1210-Slerp
        layer_range: [0, 32]
      - model: uukuguy/speechless-mistral-six-in-one-7b
        layer_range: [0, 32]

merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1

parameters:
  t:
    - filter: self_attn
      value: [0, 0.5, 0.3, 0.7, 1]
    - filter: mlp
      value: [1, 0.5, 0.7, 0.3, 0]
    - value: 0.5 # fallback for rest of tensors
tokenizer_source: union

dtype: bfloat16
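
To make the `t` lists in this config concrete, here is a minimal sketch of how a five-value gradient could map onto 32 layers, together with the spherical interpolation itself. It assumes the listed values are spread evenly over model depth and interpolated linearly between anchors, and that `t = 0` returns the first endpoint and `t = 1` the second. The function names are illustrative and this is not mergekit's code.

```python
import math


def gradient_value(anchors, layer_idx, num_layers):
    """Linearly interpolate evenly spaced anchor values at a layer's depth."""
    if num_layers < 2:
        return float(anchors[0])
    position = layer_idx / (num_layers - 1) * (len(anchors) - 1)
    low = math.floor(position)
    high = min(low + 1, len(anchors) - 1)
    frac = position - low
    return (1 - frac) * anchors[low] + frac * anchors[high]


def slerp(t, v0, v1, eps=1e-8):
    """Spherical linear interpolation between two equal-length vectors."""
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # A zero vector has no direction: fall back to linear interpolation.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    omega = math.acos(max(-1.0, min(1.0, cos_omega)))
    sin_omega = math.sin(omega)
    if abs(sin_omega) < eps:
        # Nearly parallel vectors: linear interpolation is numerically safer.
        return [(1 - t) * a + t * b for a, b in zip(v0, v1)]
    w0 = math.sin((1 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


SELF_ATTN_T = [0, 0.5, 0.3, 0.7, 1]  # values from the config above
MLP_T = [1, 0.5, 0.7, 0.3, 0]        # values from the config above
NUM_LAYERS = 32                      # matches layer_range [0, 32]

for layer in (0, 8, 16, 24, 31):
    attn_t = gradient_value(SELF_ATTN_T, layer, NUM_LAYERS)
    mlp_t = gradient_value(MLP_T, layer, NUM_LAYERS)
    print(f"layer {layer:2d}: self_attn t={attn_t:.3f}, mlp t={mlp_t:.3f}")
```

Under these assumptions the two lists mirror each other, so at any depth the attention and MLP weights lean toward opposite endpoints.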

Prompt template: ChatML

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
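
When sending raw text to the model rather than going through a chat endpoint, the template above can be filled in with a small helper. This is a minimal sketch: `format_chatml` is a hypothetical name, and the default system message is an assumption, not something specified for SynthIQ.

```python
def format_chatml(prompt, system_message="You are a helpful assistant."):
    """Fill the ChatML template and leave the assistant turn open for generation."""
    return (
        f"<|im_start|>system\n{system_message}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


print(format_chatml("Who are you?"))
```

The string ends right after the assistant header so that the model's continuation becomes the reply.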

The license is the Llama 2 license, because uukuguy/speechless-mistral-six-in-one-7b is released under the Llama 2 license.

Nous Benchmark Evaluation Results

Detailed results can be found here

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	69.37
AI2 Reasoning Challenge (25-Shot)	65.87
HellaSwag (10-Shot)	85.82
MMLU (5-Shot)	64.75
TruthfulQA (0-shot)	57.00
Winogrande (5-shot)	78.69
GSM8k (5-shot)	64.06
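
The Avg. row appears to be the unweighted mean of the six benchmark scores in the table. A quick check, assuming equal weights:

```python
# Scores copied from the table above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 65.87,
    "HellaSwag (10-Shot)": 85.82,
    "MMLU (5-Shot)": 64.75,
    "TruthfulQA (0-shot)": 57.00,
    "Winogrande (5-shot)": 78.69,
    "GSM8k (5-shot)": 64.06,
}

# Unweighted mean over all six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"mean of {len(scores)} scores: {average:.3f}")
```

The result agrees with the reported 69.37 to within rounding.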