How to use from
llama.cpp
Install (macOS, Linux)
curl -LsSf https://llama.app/install.sh | sh
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf IDEAHQ/ava-nautilus:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf IDEAHQ/ava-nautilus:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama serve -hf IDEAHQ/ava-nautilus:Q4_K_M
# Run inference directly in the terminal:
llama cli -hf IDEAHQ/ava-nautilus:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf IDEAHQ/ava-nautilus:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf IDEAHQ/ava-nautilus:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf IDEAHQ/ava-nautilus:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf IDEAHQ/ava-nautilus:Q4_K_M
Use Docker
docker model run hf.co/IDEAHQ/ava-nautilus:Q4_K_M
Quick Links

Tali Nautilus

Tali Nautilus is the reasoning-focused on-device LLM family for TaliOS.

What it does

Where Tali Storm handles general dialogue and routine command interpretation, Nautilus is the model TaliOS reaches for when a user request requires multi-step reasoning or looking at images:

  • Multi-step planning — "find every email from the legal team about contract X this quarter, and summarize the open items" requires Nautilus to plan, dispatch sub-actions through TaliOS, and reason over the results.
  • Vision-language — "what is highlighted on the screen and what should I do next?" requires the VL variant to read pixels, not just accessibility metadata.

Where it sits in TaliOS

Nautilus runs only when Storm (the cheaper general-purpose LLM) signals it cannot complete the task. The TaliOS runtime escalates from Storm → Nautilus on a best-effort heuristic; the user never selects a model manually.

User speech → Tali STT → text → Tali NLU → vector match
                                     │
                                     ▼ (NLU miss)
                               Tali Storm → action / reply
                                     │
                                     ▼ (needs reasoning or vision)
                               Tali Nautilus → action / reply

Variants

Tali ID Active Params Architecture Target
TALI-NAUTILUS-4B 4B Dense Phone
TALI-NAUTILUS-9B 9B Dense Desktop / tablet
TALI-NAUTILUS-30B-A3B 3B active Mixture-of-Experts Phone (flagship)
TALI-NAUTILUS-120B-A12B 12B active Mixture-of-Experts Desktop
TALI-NAUTILUS-CASCADE 3B active MoE, reasoning-tuned Phone (deep reasoning)
TALI-NAUTILUS-VL-8B 8B Vision-Language Vision tasks

Mixture-of-Experts (MoE) variants only load "active" parameters for any given token; memory cost is proportional to active params, not total.

Vision-Language (VL) accepts an image as part of its input — used for screen-content understanding and document scenes.

Quantization: Q4_K_M for on-device targets; full-precision weights retained for desktop.

File format

Shipped weights are wrapped in AON — Tali's encrypted, signed asset container. The .aon extension is the only format the OS or external tooling sees.

License

Proprietary — Intelligent Devices LLC.

Downloads last month
-
GGUF
Model size
4B params
Architecture
nemotron_h
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support