Instructions for using sovthpaw/omnistep-new with libraries, inference providers, notebooks, and local apps.

Libraries

Transformers

How to use sovthpaw/omnistep-new with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="sovthpaw/omnistep-new")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("sovthpaw/omnistep-new")
model = AutoModelForCausalLM.from_pretrained("sovthpaw/omnistep-new")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = tokenizer.apply_chat_template(
	messages,
	add_generation_prompt=True,
	tokenize=True,
	return_dict=True,
	return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

llama-cpp-python

How to use sovthpaw/omnistep-new with llama-cpp-python:

# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="sovthpaw/omnistep-new",
	filename="OmniStep-new-F16.gguf",
)

llm.create_chat_completion(
	messages = [
		{
			"role": "user",
			"content": [
				{
					"type": "text",
					"text": "Describe this image in one sentence."
				},
				{
					"type": "image_url",
					"image_url": {
						"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
					}
				}
			]
		}
	]
)

Local Apps

llama.cpp

How to use sovthpaw/omnistep-new with llama.cpp:

Install (macOS, Linux)

curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf sovthpaw/omnistep-new:F16
# Run inference directly in the terminal:
llama cli -hf sovthpaw/omnistep-new:F16

Install from WinGet (Windows)

winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf sovthpaw/omnistep-new:F16
# Run inference directly in the terminal:
llama cli -hf sovthpaw/omnistep-new:F16

Use pre-built binary

# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf sovthpaw/omnistep-new:F16
# Run inference directly in the terminal:
./llama-cli -hf sovthpaw/omnistep-new:F16

Build from source code

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf sovthpaw/omnistep-new:F16
# Run inference directly in the terminal:
./build/bin/llama-cli -hf sovthpaw/omnistep-new:F16

Use Docker

docker model run hf.co/sovthpaw/omnistep-new:F16

vLLM

How to use sovthpaw/omnistep-new with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "sovthpaw/omnistep-new"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sovthpaw/omnistep-new",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker

docker model run hf.co/sovthpaw/omnistep-new:F16

SGLang

How to use sovthpaw/omnistep-new with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "sovthpaw/omnistep-new" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sovthpaw/omnistep-new",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "sovthpaw/omnistep-new" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "sovthpaw/omnistep-new",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Ollama
How to use sovthpaw/omnistep-new with Ollama:
```
ollama run hf.co/sovthpaw/omnistep-new:F16
```

Unsloth Studio

How to use sovthpaw/omnistep-new with Unsloth Studio:

Install Unsloth Studio (macOS, Linux, WSL)

curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sovthpaw/omnistep-new to start chatting

Install Unsloth Studio (Windows)

irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for sovthpaw/omnistep-new to start chatting

Using HuggingFace Spaces for Unsloth

# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for sovthpaw/omnistep-new to start chatting

Pi

How to use sovthpaw/omnistep-new with Pi:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf sovthpaw/omnistep-new:F16

Configure the model in Pi

# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "sovthpaw/omnistep-new:F16"
        }
      ]
    }
  }
}

Run Pi

# Start Pi in your project directory:
pi

Hermes Agent

How to use sovthpaw/omnistep-new with Hermes Agent:

Start the llama.cpp server

# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama serve -hf sovthpaw/omnistep-new:F16

Configure Hermes

# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default sovthpaw/omnistep-new:F16

Run Hermes

hermes

Docker Model Runner
How to use sovthpaw/omnistep-new with Docker Model Runner:
```
docker model run hf.co/sovthpaw/omnistep-new:F16
```

Lemonade

How to use sovthpaw/omnistep-new with Lemonade:

Pull the model

# Download Lemonade from https://lemonade-server.ai/
lemonade pull sovthpaw/omnistep-new:F16

Run and chat with the model

lemonade run user.omnistep-new-F16

List all available models

lemonade list

OmniStep-new

An omnimodal music generator + chat assistant. Replaces sovthpaw/omnistep-12a3b (v1 transitional baseline) as the current OmniStep in the OmniSenter family.

What it does

OmniStep-new is a single bundled model that handles:

Capability	How
Chat / instruction following	Qwen3-8B text backbone
Image + video understanding	Cosmos multimodal heads (built into the chat template as `<
Music generation (lyrics → music)	ACE-Step v1.5 turbo DiT + 1.7B LM + VAE, attached as modules
Tool / function calling	`<tool_call>` tags in chat template, supports OpenAI-style tool schemas
Long context	40K native, YaRN-extendable to 256K+

It is not the agentic SFT variant — that's a separate LoRA on top. This is the base, ready for SFT warm-start or direct inference.
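
The "Long context" row implies a RoPE scaling factor when the window is extended with YaRN. A minimal sketch of that arithmetic, assuming "40K native" means 40,960 tokens and taking 262,144 tokens (256K) as the target. The `rope_scaling` key names follow the convention published for Qwen3 checkpoints in transformers; applying them to this model is an assumption, not something the card confirms.

```python
def yarn_factor(native_ctx: int, target_ctx: int) -> float:
    """Return the YaRN scaling factor needed to stretch native_ctx to target_ctx."""
    if native_ctx <= 0 or target_ctx < native_ctx:
        raise ValueError("target_ctx must be >= native_ctx > 0")
    return target_ctx / native_ctx


NATIVE_CTX = 40_960   # assumption: "40K native" read as 40 * 1024 tokens
TARGET_CTX = 262_144  # 256K tokens

factor = yarn_factor(NATIVE_CTX, TARGET_CTX)

# Hypothetical config fragment; key names borrowed from Qwen3's YaRN convention.
rope_scaling = {
    "rope_type": "yarn",
    "factor": factor,
    "original_max_position_embeddings": NATIVE_CTX,
}
print(rope_scaling)
```

Under these assumptions the factor comes out at 6.4. A smaller target window would need a proportionally smaller factor.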

Architecture

OmniStep-new (16GB F16 GGUF bundle)
├─ Qwen3-8B text backbone     (8.2B params, 36 layers, q_norm/k_norm)
├─ Cosmos multimodal heads     (vision encoder + projector)
└─ ACE-Step music modules      (1.7B LM + DiT v1.5 turbo + VAE)

Built via Darwin Family weight-space merging (MRI-Trust Fusion) from:

Parent A: nvidia/Cosmos3-Nano (text body extracted)
Parent B: Qwen/Qwen3-8B (gen-1 SFT-merged base)
Genome: rho_b=0.5, tau=0.4
Merge script: SouthpawIN/evolutionary-training/scripts/cosmos_qwen3_darwin_merge.py
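
Weight-space merging combines two checkpoints tensor by tensor rather than retraining. The sketch below shows the simplest form, a linear interpolation, on plain Python lists. It is not the MRI-Trust Fusion procedure in the merge script, which this card does not describe. Reading `rho_b` as the share taken from Parent B is an assumption, `tau` is left out because its role is not stated here, and `linear_merge` and the toy tensor names are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def linear_merge(parent_a: Weights, parent_b: Weights, rho_b: float) -> Weights:
    """Interpolate matching tensors: (1 - rho_b) * A + rho_b * B."""
    if not 0.0 <= rho_b <= 1.0:
        raise ValueError("rho_b must lie in [0, 1]")
    if parent_a.keys() != parent_b.keys():
        raise ValueError("parents must share the same tensor names")
    merged: Weights = {}
    for name, a_vals in parent_a.items():
        b_vals = parent_b[name]
        if len(a_vals) != len(b_vals):
            raise ValueError(f"shape mismatch for {name}")
        merged[name] = [(1.0 - rho_b) * a + rho_b * b for a, b in zip(a_vals, b_vals)]
    return merged


# Toy stand-ins for two checkpoints with identical layouts.
a = {"layer.0.weight": [0.0, 2.0], "layer.0.bias": [1.0]}
b = {"layer.0.weight": [4.0, 6.0], "layer.0.bias": [3.0]}
merged = linear_merge(a, b, rho_b=0.5)
print(merged)
```

With `rho_b=0.5` every merged value is the midpoint of the two parents, which is why both checkpoints must share tensor names and shapes before any merge of this kind can run.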

Files

File	Size	Notes
`OmniStep-new-F16.gguf`	16 GB	llama.cpp F16 single-file GGUF
`model-00001..7-of-7.safetensors`	16 GB	HF sharded safetensors
`config.json`	<1 KB	Qwen3-compatible
`tokenizer.json` / `vocab.json` / `merges.txt`	~16 MB	Standard Qwen3 tokenizer + 26 multimodal/tool special tokens
`chat_template.json`	~5 KB	Tool-calling + vision/video/image_pad tokens

Usage

llama.cpp (recommended — fits a single 24GB GPU)

llama-server -m OmniStep-new-F16.gguf \
  --host 127.0.0.1 --port 9080 \
  -ngl 99 -c 262144 -fa on \
  -ctk turbo4 -ctv turbo4 --no-mmap
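
Once the server is running, it can be queried through llama-server's OpenAI-compatible chat endpoint. A minimal client sketch using only the standard library, assuming the host and port from the command above. `build_chat_request` and `chat` are hypothetical helper names, and the `model` value is a placeholder label rather than a name the server is known to require.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:9080/v1"  # host and port from the llama-server command


def build_chat_request(prompt: str, max_tokens: int = 200) -> urllib.request.Request:
    """Build (but do not send) a POST request for /chat/completions."""
    payload = {
        "model": "OmniStep-new-F16",  # placeholder label, an assumption
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Building the request needs no running server; sending it does.
req = build_chat_request("Hello, who are you?")
print(req.full_url)
# reply = chat("Hello, who are you?")  # uncomment once llama-server is up
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.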

HuggingFace transformers (text-only mode)

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("sovthpaw/omnistep-new")
model = AutoModelForCausalLM.from_pretrained(
    "sovthpaw/omnistep-new",
    device_map="auto",
    torch_dtype="bfloat16",
)
# Send inputs to the device chosen by device_map instead of hard-coding "cuda"
inputs = tok("Hello, who are you?", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
print(tok.decode(out[0]))

Tool calling

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("sovthpaw/omnistep-new")
model = AutoModelForCausalLM.from_pretrained("sovthpaw/omnistep-new", device_map="auto", torch_dtype="bfloat16")

# JSON-schema description of the tool the model may call
tools = [{
    "name": "get_weather",
    "description": "Get current weather for a city",
    "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]}
}]
msgs = [{"role": "user", "content": "What's the weather in Tokyo?"}]
prompt = tok.apply_chat_template(msgs, tools=tools, add_generation_prompt=True, tokenize=False)
# Send inputs to the device chosen by device_map instead of hard-coding "cuda"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
# Decode only the newly generated tokens, skipping the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[-1]:]))
# → <tool_call>{"name": "get_weather", "arguments": {"city": "Tokyo"}}</tool_call>
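
The generated text still has to be turned into a structured call before a tool can be run. A minimal parsing sketch, assuming the model emits one JSON object per `<tool_call>...</tool_call>` pair as in the expected output above. `extract_tool_calls` is a hypothetical helper, not part of transformers.

```python
import json
import re
from typing import Any, Dict, List

# Capture whatever sits between the opening and closing tags, across newlines.
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def extract_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Return every well-formed tool call found in the generated text."""
    calls = []
    for raw in TOOL_CALL_RE.findall(text):
        try:
            call = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed JSON instead of failing the whole turn
        if isinstance(call, dict) and "name" in call:
            calls.append(call)
    return calls


generated = '<tool_call>{"name": "get_weather", "arguments": {"city": "Tokyo"}}</tool_call>'
calls = extract_tool_calls(generated)
print(calls)
```

Each returned dictionary can then be dispatched by its `name`, with `arguments` passed to the matching function.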

What's next

This is gen 0. The roadmap:

✅ omnistep-new (this repo) — Darwin-merged 8B base + Cosmos heads + ACE-Step music
⏳ omnistep-sft-8b — QLoRA agentic SFT (Stage 1, finished 3954/3954 steps, loss 0.2352)
⏳ omnistep-new-merged — SFT adapter merged into this base for the deployable chat agent
⏳ senter-ohm-32a8b — sparse-upcycled 32A8B MoE (Stage 3) with this as the base expert

Provenance

Built with SouthpawIN/evolutionary-training
Merge script: scripts/cosmos_qwen3_darwin_merge.py
Date: 2026-06-22
Naming: per the-omni-family.md — OmniStep = multimodal native + music + agentic backbone. This is the multimodal + music half; the agentic SFT is the LoRA adapter on top.

TOWARDS SELF-IMPROVEMENT — Chris (via Nous Girl), 2026-06-22

Safetensors: model size 8B params, tensor type BF16