Instructions to use wolfram/miqu-1-103b with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use wolfram/miqu-1-103b with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="wolfram/miqu-1-103b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("wolfram/miqu-1-103b")
model = AutoModelForCausalLM.from_pretrained("wolfram/miqu-1-103b")
messages = [
    {"role": "user", "content": "Who are you?"},
]
# Apply the chat template and tokenize in one step, moving tensors to the model's device
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (skip the prompt tokens)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use wolfram/miqu-1-103b with vLLM:
Install from pip and serve model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "wolfram/miqu-1-103b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "wolfram/miqu-1-103b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```shell
docker model run hf.co/wolfram/miqu-1-103b
```
- SGLang
How to use wolfram/miqu-1-103b with SGLang:
Install from pip and serve model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "wolfram/miqu-1-103b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "wolfram/miqu-1-103b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```shell
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "wolfram/miqu-1-103b" \
        --host 0.0.0.0 \
        --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "wolfram/miqu-1-103b",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use wolfram/miqu-1-103b with Docker Model Runner:
```shell
docker model run hf.co/wolfram/miqu-1-103b
```
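Both the vLLM and SGLang servers in the instructions above expose an OpenAI-compatible `/v1/chat/completions` endpoint, so the same request the curl examples send can also be made from Python. The following is a minimal sketch using only the standard library (`json`, `urllib.request`) rather than any official client. The helper names `build_chat_request`, `extract_reply` and `chat` and the 600-second timeout are hypothetical choices. The sketch assumes the server answers with the usual OpenAI-style `choices[0].message.content` field.

```python
import json
import urllib.request


def build_chat_request(model: str, user_message: str) -> dict:
    """Build the same JSON body that the curl examples send (hypothetical helper)."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }


def extract_reply(body: dict) -> str:
    """Pull the assistant text out of an OpenAI-style chat completion response."""
    return body["choices"][0]["message"]["content"]


def chat(base_url: str, payload: dict, timeout: float = 600.0) -> str:
    """POST the payload to <base_url>/v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # A 103B model can take a while to answer, hence the generous (assumed) timeout.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_reply(json.load(response))
```

For example, `chat("http://localhost:8000", build_chat_request("wolfram/miqu-1-103b", "What is the capital of France?"))` targets the vLLM server on the port used in its curl example. Use `http://localhost:30000` for the SGLang server instead.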
Kindly asking for quants
Kindly asking @LoneStriker or any other kind soul: could you make GGUF or EXL2 quants of this? I've made some myself, but it will take days until the uploads finish, so if you get around to it before then, I'd appreciate it a lot!
It's in my queue, but it might take a week, because my queue is currently quite long (and the number of slow-to-quantize methods has recently exploded :)), so that shouldn't discourage anybody else. I will publish some static quants relatively soon, though. I'll write here once they're done.
The static quants will slowly appear at https://huggingface.co/mradermacher/miqu-1-103b-GGUF and (days later) imatrix ones at https://huggingface.co/mradermacher/miqu-1-103b-i1-GGUF
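Part of why quants of a model this size take days to produce and upload is sheer volume. A quant's size is roughly parameter count × bits per weight ÷ 8 bytes. The back-of-the-envelope sketch below rests on a few assumptions:

* The model has about 103 billion parameters, a figure taken from its name.
* GGUF metadata and any tensors a quant may keep at higher precision are ignored.
* The bits-per-weight figures come from the block layouts of llama.cpp's simple Q8_0 and Q4_0 formats.

The K-quants and i-quants mentioned in this thread use other layouts and are not covered. The function name `gguf_size_gb` is hypothetical.

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10**9 bytes), ignoring file metadata."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: ~103e9 parameters, taken from the model name "miqu-1-103b".
N_PARAMS = 103e9

# Bits per weight derived from block layouts (bytes per block * 8 / weights per block):
#   F16  : plain 16-bit floats
#   Q8_0 : 32 int8 weights + one fp16 scale  = 34 bytes per 32 weights
#   Q4_0 : 32 4-bit weights + one fp16 scale = 18 bytes per 32 weights
BPW = {
    "F16": 16.0,
    "Q8_0": 34 * 8 / 32,
    "Q4_0": 18 * 8 / 32,
}

for name, bpw in BPW.items():
    print(f"{name:5s} ~{gguf_size_gb(N_PARAMS, bpw):6.1f} GB")
```

Under these assumptions, it prints about 206 GB for F16, 109 GB for Q8_0 and 58 GB for Q4_0. Real files will differ somewhat.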
Thank you very much, @mradermacher! I've updated the model card to link to your quants. I also mentioned the new quants on Twitter/X. If you have an account there, let me know so I can link you there, too!
Thanks, linking to the model card is more than enough. To "speed" things up, I started using an old i7-2600 server to jump the queue for this model. Let's hope llama.cpp's non-AVX2 code is up to the job (and I wonder how many i-quants per day I will get out of that box). I'm having fun.
The static quants should be completed by now, and the imatrix repo has a few low-bit quants, and that old server is pumping out another quant every few hours (up to Q5_K). I guess that's it from my side. Hope @LoneStriker finds the opportunity to convert this interesting model, too.
A few quants up:
https://huggingface.co/models?search=LoneStriker/miqu-1-103b
Thanks a lot, @mradermacher and @LoneStriker, this has been very helpful. 👍 I've updated the READMEs with links and credits (and posted a quick tweet announcing the quants).