Use from the llama-cpp-python library
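# !pip install llama-cpp-python huggingface-hub

from llama_cpp import Llama

# Download the GGUF from the Hub (requires huggingface-hub) and load it
llm = Llama.from_pretrained(
    repo_id="sovthpaw/omnistep-new",
    filename="OmniStep-new-F16.gguf",
)

# Note: image parts are only processed when the model is loaded with a
# multimodal chat handler and a matching projector file; without one,
# the image may be ignored or the request may fail.
response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    },
                },
            ],
        }
    ]
)
print(response["choices"][0]["message"]["content"])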
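To send a local image instead of a public URL, OpenAI-style `image_url` parts can carry a base64 data URI, which llama-cpp-python's multimodal chat handlers should accept alongside http(s) URLs. The sketch below builds such a message; the helper names `image_data_uri` and `image_message` are hypothetical, and it still assumes a multimodal chat handler is configured on the `Llama` instance.

```python
import base64


def image_data_uri(data: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URI for an image_url part."""
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def image_message(prompt: str, data: bytes, mime_type: str = "image/jpeg") -> dict:
    """Build one user message with a text part followed by an image part."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": image_data_uri(data, mime_type)}},
        ],
    }
```

Usage would look like `llm.create_chat_completion(messages=[image_message("Describe this image in one sentence.", open("photo.jpg", "rb").read())])`, where `photo.jpg` is a placeholder path.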

OmniStep-new


An omnimodal music generator and chat assistant. It replaces sovthpaw/omnistep-12a3b (the v1 transitional baseline) as the current OmniStep model in the OmniSenter family.

What it does

OmniStep-new is a single bundled model that handles:

Capability | How
--- | ---
Chat / instruction following | Qwen3-8B text backbone
Image + video understanding | Cosmos multimodal heads (built into the chat template as `<`…)
Music generation (lyrics → music) | ACE-Step v1.5 turbo DiT + 1.7B LM + VAE, attached as modules
Tool / function calling | `<tool_call>` tags in the chat template; supports OpenAI-style tool schemas
Long context | 40K tokens native, YaRN-extendable to 256K+
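YaRN extension is usually expressed as a `rope_scaling` entry whose `factor` is the target context divided by the original one. The sketch below builds that dict in the shape transformers uses for Qwen3 (`rope_type`, `factor`, `original_max_position_embeddings`). It assumes this merge keeps upstream Qwen3's RoPE setup; the 32,768-token original length in the usage note comes from Qwen's published Qwen3 recipe, not from this card, and the helper name is hypothetical.

```python
def yarn_rope_scaling(original_ctx: int, target_ctx: int) -> dict:
    """Build a transformers-style rope_scaling entry for YaRN context extension.

    The scaling factor is simply target_ctx / original_ctx.
    """
    if target_ctx <= original_ctx:
        raise ValueError("target_ctx must be larger than original_ctx")
    return {
        "rope_type": "yarn",
        "factor": target_ctx / original_ctx,
        "original_max_position_embeddings": original_ctx,
    }


# Example: stretching a 32,768-token original window to 262,144 tokens
print(yarn_rope_scaling(32768, 262144))
```

With those numbers the factor is 8.0; 131,072 tokens would give 4.0. The dict would go into `rope_scaling` in `config.json`, and llama.cpp exposes the same idea through `--rope-scaling yarn`, `--rope-scale`, and `--yarn-orig-ctx`.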

It is not the agentic SFT variant, which is a separate LoRA adapter trained on top of it. This is the base model, usable directly for inference or as a warm start for SFT.

Architecture

OmniStep-new (16GB F16 GGUF bundle)
├─ Qwen3-8B text backbone     (8.2B params, 36 layers, q_norm/k_norm)
├─ Cosmos multimodal heads     (vision encoder + projector)
└─ ACE-Step music modules      (1.7B LM + DiT v1.5 turbo + VAE)
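The 16 GB bundle size is roughly what the 8.2B-parameter text backbone alone occupies at 16 bits per weight. A quick check, ignoring GGUF metadata and headers (the helper name is illustrative):

```python
def weight_bytes(n_params: float, bits_per_param: float) -> float:
    """Approximate size of dense weights, ignoring file metadata and headers."""
    return n_params * bits_per_param / 8


backbone = weight_bytes(8.2e9, 16)  # F16/BF16 store 16 bits per parameter
print(f"{backbone / 1e9:.1f} GB ({backbone / 2**30:.1f} GiB)")
```

This prints `16.4 GB (15.3 GiB)`.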

Built via Darwin Family weight-space merging (MRI-Trust Fusion) from:

  • Parent A: nvidia/Cosmos3-Nano (text body extracted)
  • Parent B: Qwen/Qwen3-8B (gen-1 SFT-merged base)
  • Genome: rho_b=0.5, tau=0.4
  • Merge script: SouthpawIN/evolutionary-training/scripts/cosmos_qwen3_darwin_merge.py
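The card does not describe how MRI-Trust Fusion uses `rho_b` and `tau`, so the sketch below is not that method. It is only the simplest weight-space merge, per-tensor linear interpolation between two parents with matching tensor names, for intuition. Treating `rho_b` as parent B's mixing weight is an assumption, `tau` is not modeled, and tensors are flattened Python lists to keep the example dependency-free.

```python
from typing import Dict, List

Tensor = List[float]  # flattened weights; a real merge would use framework tensors


def linear_merge(parent_a: Dict[str, Tensor], parent_b: Dict[str, Tensor],
                 rho_b: float) -> Dict[str, Tensor]:
    """Per-tensor interpolation: (1 - rho_b) * A + rho_b * B.

    NOT MRI-Trust Fusion; rho_b is hypothetically treated as parent B's weight.
    """
    if parent_a.keys() != parent_b.keys():
        raise ValueError("parents must share the same tensor names")
    merged: Dict[str, Tensor] = {}
    for name, wa in parent_a.items():
        wb = parent_b[name]
        if len(wa) != len(wb):
            raise ValueError(f"shape mismatch for tensor {name!r}")
        merged[name] = [(1 - rho_b) * a + rho_b * b for a, b in zip(wa, wb)]
    return merged


print(linear_merge({"w": [0.0, 2.0]}, {"w": [2.0, 4.0]}, rho_b=0.5))
```

At `rho_b=0.5` this is a plain average, so the example prints `{'w': [1.0, 3.0]}`.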

Files

File | Size | Notes
--- | --- | ---
OmniStep-new-F16.gguf | 16 GB | llama.cpp F16 single-file GGUF
model-00001..7-of-7.safetensors | 16 GB (total) | HF sharded safetensors
config.json | <1 KB | Qwen3-compatible
tokenizer.json / vocab.json / merges.txt | ~16 MB | Standard Qwen3 tokenizer + 26 multimodal/tool special tokens
chat_template.json | ~5 KB | Tool-calling + vision/video/image_pad tokens

Usage

llama.cpp (recommended; fits on a single 24 GB GPU)

llama-server -m OmniStep-new-F16.gguf \
  --host 127.0.0.1 --port 9080 \
  -ngl 99 -c 262144 -fa on \
  -ctk turbo4 -ctv turbo4 --no-mmap
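At `-c 262144` the KV cache, not the weights, dominates memory, which is why the quantized `-ctk`/`-ctv` cache types matter. The estimate below assumes the merge keeps upstream Qwen3-8B's attention shape (36 layers as stated above, plus 8 KV heads and a head dimension of 128 from the upstream config). `turbo4` is not among the stock llama.cpp cache types (f16, q8_0, q4_0, and so on) and presumably comes from a specific build; its bit width is not stated here, so the loop uses standard types as reference points.

```python
def kv_cache_bytes(n_ctx: int, n_layers: int = 36, n_kv_heads: int = 8,
                   head_dim: int = 128, bits_per_elem: float = 16) -> float:
    """Size of the K and V caches for a single sequence of n_ctx tokens."""
    elems_per_token = 2 * n_layers * n_kv_heads * head_dim  # 2 = one K plus one V
    return n_ctx * elems_per_token * bits_per_elem / 8


# q8_0 and q4_0 store 8.5 and 4.5 bits per element including block scales
for label, bits in [("f16", 16), ("q8_0", 8.5), ("q4_0", 4.5)]:
    gib = kv_cache_bytes(262144, bits_per_elem=bits) / 2**30
    print(f"{label}: {gib:.1f} GiB")
```

An F16 cache at this length comes to 36 GiB. Adding roughly 15 GiB of F16 weights shows how much the KV-cache quantization has to save for everything to fit in 24 GB.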

HuggingFace transformers (text-only mode)

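from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("sovthpaw/omnistep-new")
model = AutoModelForCausalLM.from_pretrained(
    "sovthpaw/omnistep-new",
    device_map="auto",
    torch_dtype="bfloat16",
)

# Format the prompt with the bundled chat template and move it to the model's device
msgs = [{"role": "user", "content": "Hello, who are you?"}]
inputs = tok.apply_chat_template(
    msgs, add_generation_prompt=True, return_tensors="pt", return_dict=True
).to(model.device)
out = model.generate(**inputs, max_new_tokens=200, do_sample=True, temperature=0.7)
# Decode only the newly generated tokens, not the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))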
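Upstream Qwen3's chat template makes the model emit a `<think>...</think>` reasoning block before the answer by default. Assuming this merge's template keeps that behavior (the card does not say), the hypothetical helper below separates the reasoning from the visible reply. If the template also keeps Qwen3's switch, passing `enable_thinking=False` to `apply_chat_template` should suppress the block instead.

```python
import re
from typing import Tuple

_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(text: str) -> Tuple[str, str]:
    """Split a Qwen3-style <think>...</think> block from the visible answer.

    Returns (thinking, answer); thinking is "" when no block is present.
    """
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    thinking = match.group(1).strip()
    answer = (text[:match.start()] + text[match.end():]).strip()
    return thinking, answer


thinking, answer = split_thinking("<think>\nGreet the user.\n</think>\n\nHello!")
print(answer)
```

The example prints `Hello!`, with the reasoning text kept in `thinking`.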

Tool calling

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("sovthpaw/omnistep-new")
model = AutoModelForCausalLM.from_pretrained("sovthpaw/omnistep-new", device_map="auto", torch_dtype="bfloat16")

# OpenAI-style tool schema
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {"type": "object", "properties": {"city": {"type": "string"}}, "required": ["city"]},
    },
}]
msgs = [{"role": "user", "content": "What's the weather in Tokyo?"}]
prompt = tok.apply_chat_template(msgs, tools=tools, add_generation_prompt=True, tokenize=False)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
# Decode only the tokens generated after the prompt
print(tok.decode(out[0][inputs["input_ids"].shape[1]:]))
# Illustrative reply:
# <tool_call>{"name": "get_weather", "arguments": {"city": "Tokyo"}}</tool_call>
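To act on a reply like the one above, the `<tool_call>` payloads have to be extracted and routed to real functions. The sketch below parses every tagged JSON payload and dispatches it through a name-to-function registry; `parse_tool_calls` and the `get_weather` stub are hypothetical, and accepting `arguments` as either an object or a JSON string is a defensive assumption rather than documented model behavior.

```python
import json
import re
from typing import Any, Dict, List

_TOOL_CALL_RE = re.compile(r"<tool_call>\s*(.*?)\s*</tool_call>", re.DOTALL)


def parse_tool_calls(text: str) -> List[Dict[str, Any]]:
    """Extract each <tool_call>{...}</tool_call> payload as {"name", "arguments"}."""
    calls = []
    for payload in _TOOL_CALL_RE.findall(text):
        call = json.loads(payload)
        args = call.get("arguments", {})
        if isinstance(args, str):  # tolerate arguments serialized as a JSON string
            args = json.loads(args)
        calls.append({"name": call["name"], "arguments": args})
    return calls


def get_weather(city: str) -> str:  # hypothetical stub standing in for a real API
    return f"Sunny in {city}"


registry = {"get_weather": get_weather}
reply = '<tool_call>{"name": "get_weather", "arguments": {"city": "Tokyo"}}</tool_call>'
results = [registry[c["name"]](**c["arguments"]) for c in parse_tool_calls(reply)]
print(results)
```

This prints `['Sunny in Tokyo']`. In a full loop, each result would go back to the model as a `tool` role message before generating again.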

What's next

This is gen 0. The roadmap:

  1. omnistep-new (this repo) — Darwin-merged 8B base + Cosmos heads + ACE-Step music
  2. omnistep-sft-8b — QLoRA agentic SFT (Stage 1, finished 3954/3954 steps, loss 0.2352)
  3. omnistep-new-merged — SFT adapter merged into this base for the deployable chat agent
  4. senter-ohm-32a8b — sparse-upcycled 32A8B MoE (Stage 3) with this as the base expert
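Step 3 folds the SFT adapter into the base weights so no adapter is needed at inference time. Under the standard LoRA formulation, which is also what PEFT's `merge_and_unload` applies for plain LoRA, each adapted matrix becomes W' = W + (alpha / r) · B · A. The sketch uses plain Python lists and hypothetical helper names; the card does not state this adapter's rank or alpha, and a QLoRA adapter would normally be merged into a higher-precision copy of the base rather than the 4-bit one.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Naive matrix product of a (m x k) and b (k x n)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix, alpha: float) -> Matrix:
    """Fold a LoRA update into a base weight: W' = W + (alpha / r) * B @ A.

    Shapes: W is (d_out, d_in), B is (d_out, r), A is (r, d_in).
    """
    r = len(lora_a)
    delta = matmul(lora_b, lora_a)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Rank-1 example: 2x2 identity base, A is 1x2, B is 2x1, alpha = 2
print(merge_lora([[1, 0], [0, 1]], [[1, 2]], [[1], [0]], alpha=2.0))
```

Here B · A = [[1, 2], [0, 0]] and the scale is 2, so the merged weight is `[[3.0, 4.0], [0.0, 1.0]]`.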

Provenance

  • Built with SouthpawIN/evolutionary-training
  • Merge script: scripts/cosmos_qwen3_darwin_merge.py
  • Date: 2026-06-22
  • Naming: per the-omni-family.md, OmniStep = natively multimodal + music + agentic backbone. This repo is the multimodal + music half; the agentic SFT is the LoRA adapter on top.

TOWARDS SELF-IMPROVEMENT — Chris (via Nous Girl), 2026-06-22

Safetensors: 8B params, tensor type BF16