OmniStep-new
An omnimodal music generator + chat assistant. Replaces sovthpaw/omnistep-12a3b (v1 transitional baseline) as the current OmniStep in the OmniSenter family.
What it does
OmniStep-new is a single bundled model that handles:
| Capability | How |
|---|---|
| Chat / instruction following | Qwen3-8B text backbone |
| Image + video understanding | Cosmos multimodal heads (wired into the chat template via vision/video/image_pad special tokens) |
| Music generation (lyrics → music) | ACE-Step v1.5 turbo DiT + 1.7B LM + VAE, attached as modules |
| Tool / function calling | <tool_call> tags in chat template, supports OpenAI-style tool schemas |
| Long context | 40K native, YaRN-extendable to 256K+ |
It is not the agentic SFT variant — that's a separate LoRA on top. This is the base, ready for SFT warm-start or direct inference.
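The Long context row says the 40K native window is YaRN-extendable, but the card does not show how to turn that on. A minimal sketch, assuming the checkpoint follows the Hugging Face transformers convention of a `rope_scaling` entry in `config.json` with the keys `rope_type`, `factor` and `original_max_position_embeddings`, where `factor` is the target length divided by the original window. The 40960 default mirrors the card's "40K native" figure; upstream Qwen3 documents 32768 for this key, so which value fits this merge is an assumption to check against the shipped config. The helper names are hypothetical.

```python
import copy
import json


def yarn_rope_scaling(target_ctx: int, original_ctx: int = 40960) -> dict:
    """Build a transformers-style YaRN rope_scaling entry.

    factor = target_ctx / original_ctx. The 40960 default mirrors the card's
    "40K native" figure and is an assumption; upstream Qwen3 uses 32768 here.
    """
    if target_ctx <= original_ctx:
        raise ValueError("target_ctx must exceed the native window")
    return {
        "rope_type": "yarn",
        "factor": target_ctx / original_ctx,
        "original_max_position_embeddings": original_ctx,
    }


def with_yarn(cfg: dict, target_ctx: int, original_ctx: int = 40960) -> dict:
    """Return a copy of a config dict with the YaRN entry added."""
    new_cfg = copy.deepcopy(cfg)
    new_cfg["rope_scaling"] = yarn_rope_scaling(target_ctx, original_ctx)
    return new_cfg


def patch_config_file(path: str, target_ctx: int, original_ctx: int = 40960) -> None:
    """Rewrite a local config.json in place, leaving every other key as-is."""
    with open(path) as f:
        cfg = json.load(f)
    with open(path, "w") as f:
        json.dump(with_yarn(cfg, target_ctx, original_ctx), f, indent=2)
```

For example, `with_yarn(cfg, 262144, original_ctx=32768)` yields a factor of 8.0. On the GGUF side, llama.cpp exposes the same idea through its `--rope-scaling yarn`, `--rope-scale` and `--yarn-orig-ctx` options.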
Architecture
```
OmniStep-new (16GB F16 GGUF bundle)
├─ Qwen3-8B text backbone (8.2B params, 36 layers, q_norm/k_norm)
├─ Cosmos multimodal heads (vision encoder + projector)
└─ ACE-Step music modules (1.7B LM + DiT v1.5 turbo + VAE)
```
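With a 36-layer backbone and the `-c 262144` context used in the llama.cpp command under Usage, the KV cache becomes a memory cost on the same scale as the weights. A back-of-the-envelope estimate, assuming the upstream Qwen3-8B attention shape (8 key/value heads, head dimension 128; neither number is stated on this card): every token stores one key and one value vector per KV head per layer, so bytes = 2 × layers × kv_heads × head_dim × bytes_per_element × context.

```python
GIB = 2 ** 30


def kv_cache_bytes(n_ctx: int, n_layers: int = 36, n_kv_heads: int = 8,
                   head_dim: int = 128, bytes_per_elem: float = 2.0) -> float:
    """Estimate KV-cache size in bytes for a dense GQA transformer.

    The leading 2 counts the separate K and V tensors. n_kv_heads=8 and
    head_dim=128 are assumed from upstream Qwen3-8B, not from this card.
    bytes_per_elem=2.0 corresponds to an F16 cache.
    """
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem * n_ctx


if __name__ == "__main__":
    for ctx in (40_960, 262_144):
        f16 = kv_cache_bytes(ctx) / GIB
        # llama.cpp's q8_0 packs 32 values into 34 bytes (32 int8 + one f16 scale)
        q8 = kv_cache_bytes(ctx, bytes_per_elem=34 / 32) / GIB
        print(f"ctx={ctx:>7}: F16 {f16:6.3f} GiB, q8_0 {q8:6.3f} GiB")
```

Under these assumptions an F16 cache is 5.625 GiB at 40,960 tokens and 36 GiB at 262,144 tokens, so at full length the cache type, not just the weights, decides whether a 24GB card is enough.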
Built via Darwin Family weight-space merging (MRI-Trust Fusion) from:
- Parent A: `nvidia/Cosmos3-Nano` (text body extracted)
- Parent B: `Qwen/Qwen3-8B` (gen-1 SFT-merged base)
- Genome: `rho_b=0.5`, `tau=0.4`
- Merge script: `SouthpawIN/evolutionary-training/scripts/cosmos_qwen3_darwin_merge.py`
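The card does not spell out what MRI-Trust Fusion does or how `rho_b` and `tau` enter it. For readers new to weight-space merging, the sketch below shows only the simplest member of that family: per-tensor linear interpolation between two parents whose tensors share names and shapes. It is a stand-in, not the Darwin/MRI-Trust procedure; reading `rho_b` as Parent B's share is an assumption made for illustration, and `tau` is not modeled at all. Tensors are flat Python lists so the example runs without a deep-learning framework.

```python
from typing import Dict, List

Tensor = List[float]  # flattened weights, for illustration only


def linear_merge(parent_a: Dict[str, Tensor], parent_b: Dict[str, Tensor],
                 rho_b: float) -> Dict[str, Tensor]:
    """Per-tensor linear interpolation: (1 - rho_b) * A + rho_b * B.

    NOT the MRI-Trust Fusion method; treating rho_b as B's weight is an
    assumption. Both parents must share tensor names and lengths.
    """
    if not 0.0 <= rho_b <= 1.0:
        raise ValueError("rho_b must be in [0, 1]")
    if parent_a.keys() != parent_b.keys():
        raise KeyError("parents have different tensor names")
    merged = {}
    for name, a in parent_a.items():
        b = parent_b[name]
        if len(a) != len(b):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [(1.0 - rho_b) * x + rho_b * y for x, y in zip(a, b)]
    return merged
```

Under that reading, `rho_b=0.5` would be an equal-weight average of the two parents.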
Files
| File | Size | Notes |
|---|---|---|
| `OmniStep-new-F16.gguf` | 16 GB | llama.cpp F16 single-file GGUF |
| `model-00001..7-of-7.safetensors` | 16 GB | HF sharded safetensors |
| `config.json` | <1 KB | Qwen3-compatible |
| `tokenizer.json` / `vocab.json` / `merges.txt` | ~16 MB | Standard Qwen3 tokenizer + 26 multimodal/tool special tokens |
| `chat_template.json` | ~5 KB | Tool-calling + vision/video/image_pad tokens |
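Because the repo carries the same weights twice (a 16 GB GGUF and 16 GB of sharded safetensors), most setups need only one set. A minimal sketch using `huggingface_hub.snapshot_download` and its `allow_patterns` filter; the two pattern lists are assumptions derived from the table above, not published by the author, and `select()` uses `fnmatch` locally to preview what each list would match before anything is downloaded.

```python
from fnmatch import fnmatch
from typing import Iterable, List

# Pattern sets derived from the Files table (assumptions, not official)
GGUF_ONLY = ["OmniStep-new-F16.gguf"]
TRANSFORMERS_ONLY = ["*.safetensors", "*.json", "*.txt"]


def select(files: Iterable[str], patterns: List[str]) -> List[str]:
    """Preview which repo files a pattern list would match (fnmatch-style)."""
    return [f for f in files if any(fnmatch(f, p) for p in patterns)]


def download(patterns: List[str], repo_id: str = "sovthpaw/omnistep-new") -> str:
    """Fetch only the matching files and return the local snapshot directory."""
    from huggingface_hub import snapshot_download  # imported lazily

    return snapshot_download(repo_id=repo_id, allow_patterns=patterns)
```

For llama.cpp, `download(GGUF_ONLY)` fetches just the GGUF, whose path can then be passed to `llama-server -m`; for transformers, `TRANSFORMERS_ONLY` skips the GGUF.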
Usage
llama.cpp (recommended — fits a single 24GB GPU)
```bash
llama-server -m OmniStep-new-F16.gguf \
  --host 127.0.0.1 --port 9080 \
  -ngl 99 -c 262144 -fa on \
  -ctk turbo4 -ctv turbo4 --no-mmap
```
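Once running, `llama-server` exposes an OpenAI-compatible HTTP API. A minimal standard-library client, assuming the host and port from the command above and the usual `/v1/chat/completions` route; `chat()` needs the server to be up, while `build_chat_request()` can be inspected offline. The function names are illustrative.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9080"  # matches --host/--port in the command above


def build_chat_request(prompt: str, max_tokens: int = 256,
                       temperature: float = 0.7) -> urllib.request.Request:
    """Build a POST to llama-server's OpenAI-compatible chat endpoint."""
    body = {
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=600) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

Any OpenAI-compatible SDK pointed at `http://127.0.0.1:9080/v1` should work the same way; the standard library is used here only to keep the sketch dependency-free.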
HuggingFace transformers (text-only mode)
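```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("sovthpaw/omnistep-new")
model = AutoModelForCausalLM.from_pretrained(
    "sovthpaw/omnistep-new",
    device_map="auto",
    torch_dtype="bfloat16",
)

# Format the prompt with the bundled chat template
msgs = [{"role": "user", "content": "Hello, who are you?"}]
inputs = tok.apply_chat_template(
    msgs, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)  # follow the device chosen by device_map="auto"

out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```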
Tool calling
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("sovthpaw/omnistep-new")
model = AutoModelForCausalLM.from_pretrained("sovthpaw/omnistep-new", device_map="auto", torch_dtype="bfloat16")

# OpenAI-style function schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
    },
}]
msgs = [{"role": "user", "content": "What's the weather in Tokyo?"}]
prompt = tok.apply_chat_template(msgs, tools=tools, add_generation_prompt=True, tokenize=False)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
# Decode only the tokens generated after the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:]))
# → <tool_call>{"name": "get_weather", "arguments": {"city": "Tokyo"}}</tool_call>
```
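The model returns tool calls as `<tool_call>{...}</tool_call>` text, so the caller has to extract and execute them. A minimal sketch, assuming each tag wraps one JSON object with `name` and `arguments` keys as in the expected output above; `get_weather` here is a hypothetical stub, and a full loop would append each result as a `tool` message and call `generate` again.

```python
import json
import re
from typing import Any, Callable, Dict, List

TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Extract every JSON payload wrapped in <tool_call> tags."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            calls.append(json.loads(raw))
        except json.JSONDecodeError:
            continue  # skip malformed calls rather than crash
    return calls


def get_weather(city: str) -> str:
    """Hypothetical stub standing in for a real weather lookup."""
    return f"Sunny in {city}"


REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Dispatch parsed calls to local functions and return tool messages."""
    results = []
    for call in parse_tool_calls(text):
        fn = REGISTRY.get(call.get("name"))
        if fn is None:
            continue  # unknown tool name: ignore it
        args = call.get("arguments", {})
        if isinstance(args, str):  # some outputs encode arguments as a JSON string
            args = json.loads(args)
        results.append({"role": "tool", "name": call["name"], "content": str(fn(**args))})
    return results
```

Passing the decoded text from the example above to `run_tool_calls` would produce one `tool` message for `get_weather`.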
What's next
This is gen 0. The roadmap:
- ✅ omnistep-new (this repo) — Darwin-merged 8B base + Cosmos heads + ACE-Step music
- ⏳ omnistep-sft-8b — QLoRA agentic SFT (Stage 1, finished 3954/3954 steps, loss 0.2352)
- ⏳ omnistep-new-merged — SFT adapter merged into this base for the deployable chat agent
- ⏳ senter-ohm-32a8b — sparse-upcycled 32A8B MoE (Stage 3) with this as the base expert
Provenance
- Built with `SouthpawIN/evolutionary-training`
- Merge script: `scripts/cosmos_qwen3_darwin_merge.py`
- Date: 2026-06-22
- Naming: per `the-omni-family.md`, OmniStep = multimodal native + music + agentic backbone. This is the multimodal + music half; the agentic SFT is the LoRA adapter on top.
TOWARDS SELF-IMPROVEMENT — Chris (via Nous Girl), 2026-06-22