Instructions to use wolfram/miqu-1-103b with libraries and local apps.

Libraries

How to use wolfram/miqu-1-103b with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="wolfram/miqu-1-103b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("wolfram/miqu-1-103b")
model = AutoModelForCausalLM.from_pretrained("wolfram/miqu-1-103b", device_map="auto")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
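The slice in the final `print` call is there because, for a decoder-only model, `generate` returns the prompt tokens followed by the newly generated ones. A minimal sketch of that step with plain lists; the token ids and the helper name `strip_prompt` are hypothetical and used only for illustration:

```python
def strip_prompt(output_ids, prompt_length):
    """Return only the tokens that were generated after the prompt."""
    return output_ids[prompt_length:]


# Hypothetical token ids standing in for a tokenized chat prompt.
prompt_ids = [1, 733, 16289]
# generate() output for a decoder-only model: prompt first, then new tokens.
output_ids = prompt_ids + [315, 837, 2]

new_ids = strip_prompt(output_ids, len(prompt_ids))
print(new_ids)
```

Decoding only `new_ids` keeps the echoed prompt out of the printed reply.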

Local Apps

vLLM

How to use wolfram/miqu-1-103b with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "wolfram/miqu-1-103b"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "wolfram/miqu-1-103b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
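The same request can be sent from Python instead of curl. This is a minimal sketch using only the standard library, assuming a server is already running at the address from the curl example; the helper names `build_chat_payload` and `chat` are hypothetical, and the response is assumed to follow the OpenAI chat completions layout that the curl comment refers to.

```python
import json
import urllib.request


def build_chat_payload(model, prompt):
    """Return the same JSON body that the curl example sends."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt, model="wolfram/miqu-1-103b", base_url="http://localhost:8000/v1"):
    """POST a single-turn chat request and return the reply text."""
    body = json.dumps(build_chat_payload(model, prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    # Assumed OpenAI-style layout: text lives in choices[0].message.content.
    return reply["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` would send the request; passing a different `base_url` points the same helper at another OpenAI-compatible endpoint.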

Use Docker

docker model run hf.co/wolfram/miqu-1-103b

SGLang

How to use wolfram/miqu-1-103b with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "wolfram/miqu-1-103b" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "wolfram/miqu-1-103b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "wolfram/miqu-1-103b" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "wolfram/miqu-1-103b",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use wolfram/miqu-1-103b with Docker Model Runner:
```
docker model run hf.co/wolfram/miqu-1-103b
```

Kindly asking for quants

by wolfram - opened Feb 26, 2024

Discussion

wolfram

Owner Feb 26, 2024

Kindly asking @LoneStriker or any other kind soul if you could make GGUF or EXL2 quants of this? I've made some myself but it will take days until the uploads finish, so if you get around to it earlier than that, I'd appreciate that a lot!
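For a sense of scale, the weight storage of a model this size can be estimated from the parameter count and the bits stored per weight. A rough sketch, assuming about 103 billion parameters (taken from the model name) and nominal bit widths; real GGUF and EXL2 files mix precisions and carry metadata, so actual file sizes will differ:

```python
# Assumed parameter count, read off the "103b" in the model name.
PARAMS = 103e9


def weight_size_gb(bits_per_weight, n_params=PARAMS):
    """Approximate weight storage in gigabytes (10**9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Nominal bit widths chosen for illustration, not measured quant sizes.
for label, bits in [("16-bit", 16), ("5-bit", 5), ("4-bit", 4), ("3-bit", 3)]:
    print(f"{label:>6}: ~{weight_size_gb(bits):.1f} GB")
```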

mradermacher

Feb 27, 2024

It's in my queue, but it might take a week because my queue is currently quite long (and the number of slow-to-quantize methods has recently exploded :), so that shouldn't discourage anybody else. I will publish some static quants relatively soon, though. I'll write here once done.

mradermacher

Feb 27, 2024

The static quants will slowly appear at https://huggingface.co/mradermacher/miqu-1-103b-GGUF and (days later) imatrix ones at https://huggingface.co/mradermacher/miqu-1-103b-i1-GGUF
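One way to check which quant files have appeared so far is to list the repository contents. A minimal sketch, assuming the `huggingface_hub` package is installed; `gguf_files` and `list_remote_gguf` are hypothetical helper names, and matching ".gguf" anywhere in the file name is an assumption meant to also catch files that were split into parts.

```python
def gguf_files(filenames):
    """Keep names that look like GGUF files (or parts of one), sorted."""
    return sorted(name for name in filenames if ".gguf" in name.lower())


def list_remote_gguf(repo_id):
    """List GGUF files currently present in a Hugging Face model repo."""
    # Imported here so the filter above works without the package installed.
    from huggingface_hub import list_repo_files

    return gguf_files(list_repo_files(repo_id))
```

Calling `list_remote_gguf("mradermacher/miqu-1-103b-GGUF")` needs network access and returns whatever is in the repository at that moment.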

wolfram

Owner Feb 27, 2024 (edited Feb 27, 2024)

Thank you very much, @mradermacher! I've updated the model card to link to yours. I also mentioned the new quants on Twitter/X. If you have an account there, let me know so I can link you there, too!

mradermacher

Feb 28, 2024

Thanks, linking to the model card is more than enough. To "speed" things up, I started to use an old i7-2600 server to jump the queue for this model. Let's hope that llama.cpp's non-AVX2 code is up to the job (and I wonder how many i-quants per day I will get out of that box). I'm having fun.

mradermacher

Feb 29, 2024

The static quants should be completed by now, and the imatrix repo has a few low-bit quants, and that old server is pumping out another quant every few hours (up to Q5_K). I guess that's it from my side. Hope @LoneStriker finds the opportunity to convert this interesting model, too.

LoneStriker

Mar 1, 2024

> The static quants should be completed by now, and the imatrix repo has a few low-bit quants, and that old server is pumping out another quant every few hours (up to Q5_K). I guess that's it from my side. Hope @LoneStriker finds the opportunity to convert this interesting model, too.

A few quants up:
https://huggingface.co/models?search=LoneStriker/miqu-1-103b

wolfram

Owner Mar 2, 2024

Thanks a lot, @mradermacher and @LoneStriker, this has been very helpful. 👍 I've updated the READMEs with links and credits (and made a quick quant announcement tweet).
