Instructions to use AllThingsIntel/Apollo-V0.1-4B-Thinking with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="AllThingsIntel/Apollo-V0.1-4B-Thinking",
    filename="Apollo-V0.1-4B-Thinking-F16.gguf",
)

# Example prompt; replace with your own messages
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Hello! Introduce yourself."},
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Use Docker
```sh
docker model run hf.co/AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
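Whichever install path you use, `llama-server` exposes an OpenAI-compatible chat endpoint, by default at `http://localhost:8080`. Below is a minimal sketch of querying it from Python using only the standard library; the port, path, and response shape assume `llama-server` defaults, and the prompt is illustrative:

```python
import json
import urllib.request


def build_payload(prompt: str) -> dict:
    # OpenAI-compatible chat completion request body
    return {
        "model": "AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M",
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt: str, base_url: str = "http://localhost:8080/v1") -> str:
    # POST the payload to a running llama-server instance
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # First choice carries the assistant's reply
    return body["choices"][0]["message"]["content"]
```

Call `chat("Hello! Who are you?")` while the server is running to get a reply string back.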
- LM Studio
- Jan
- Ollama
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with Ollama:
```sh
ollama run hf.co/AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
- Unsloth Studio
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for AllThingsIntel/Apollo-V0.1-4B-Thinking to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for AllThingsIntel/Apollo-V0.1-4B-Thinking to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for AllThingsIntel/Apollo-V0.1-4B-Thinking to start chatting
```
- Pi
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with Pi:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Configure the model in Pi

```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add the following to ~/.pi/agent/models.json:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M" }
      ]
    }
  }
}
```

Run Pi

```sh
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with Hermes Agent:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Configure Hermes
```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Run Hermes
```sh
hermes
```
- Docker Model Runner
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with Docker Model Runner:
```sh
docker model run hf.co/AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
- Lemonade
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Run and chat with the model
```sh
lemonade run user.Apollo-V0.1-4B-Thinking-Q4_K_M
```
List all available models
```sh
lemonade list
```
Apollo-V0.1-4B-Thinking by AllThingsIntel
Unbound intellect. Authentic personas. Unscripted logic. This is a 4B-parameter model that thinks in-character instead of just responding.
Model Description
Apollo-V0.1-4B-Thinking is a specialized fine-tune of Qwen 3 4B Thinking 2507. We've lifted many of the typical creative inhibitions, allowing the model to explore a wider spectrum of human themes and narratives.
Its true power lies in its ability to embody a persona. When given a character through the system prompt, the model's responses and its internal reasoning traces adopt that personality. You're reading a character's genuine thought process instead of just getting an answer.
To our knowledge, Apollo is the first reasoning model with such deeply integrated, in-character reasoning, making its thought process as compelling as its final output.
This makes it uniquely suited for bringing digital characters to life.
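The persona is supplied through the system prompt. A minimal sketch of what that looks like as an OpenAI-style message list; the character text here is purely illustrative:

```python
# In-character setup: the system prompt carries the persona, and the
# model's reasoning traces are expected to stay in that voice.
persona = (
    "You are Captain Elara Voss, a weary starship captain who speaks "
    "in clipped, nautical metaphors and distrusts bureaucrats."
)

messages = [
    {"role": "system", "content": persona},
    {"role": "user", "content": "Captain, the fleet wants a status report."},
]
```

This `messages` list can be passed to `llm.create_chat_completion(...)` in llama-cpp-python, or POSTed to any of the OpenAI-compatible local servers described below.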
Primary Uses
- Creative & Writing Partner:
- Assume the persona of an award-winning novelist to generate deeply emotional, long-form story arcs and prose.
- Bring a character from your novel to life. Interact with them directly to better understand their voice, motivations, and how they would react in different scenarios, making your writing more authentic.
- Interactive Tutoring & Mentorship:
- Embody a computer science professor who uses the Socratic method, guiding you toward a solution with insightful questions instead of simply providing the answer.
- Learn from any expert persona you can imagine, such as a seasoned historian.
- Advanced Character Simulation:
- The ideal engine for powering dynamic NPCs in games or interactive fiction. Feed it a backstory, personality, and recent interaction logs to generate deeply realistic and unscripted behavior.
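One way the backstory, personality, and interaction logs mentioned above might be packed into a single system prompt; the field names and format are illustrative, not a documented API:

```python
def build_npc_prompt(backstory: str, personality: str, log: list[str]) -> str:
    # Keep only the last few turns so the prompt stays short
    recent = "\n".join(log[-5:])
    return (
        f"Backstory: {backstory}\n"
        f"Personality: {personality}\n"
        f"Recent interactions:\n{recent}\n"
        "Stay in character at all times."
    )
```

The returned string would then be used as the system prompt for each NPC turn, with fresh log entries appended as play continues.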
Use Cases to Avoid
Factual reporting, emotionless summarization, and any scenario where a distinct personality is a bug, not a feature.
Model Status & Usage Guidelines
This is a Development Preview. Apollo-V0.1 is an early-stage, experimental model. We invite the community to test its limits and report findings to help guide our iterative development process toward the V1.0 milestone.
As a character-simulation engine, Apollo is designed to adopt the viewpoints and quirks of the personas it is given. The resulting content is a direct reflection of the character being simulated, and the user is responsible for the personas and scenarios they create. We encourage users to engage with the model's deep simulation capabilities thoughtfully.
Recommended Settings
During internal testing, this checkpoint of the model produced more coherent and consistent results when using lower temperature settings. For best results, we recommend starting with a temperature between 0.1 and 0.5.
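With the llama-cpp-python API shown earlier, temperature can be set per request. A small sketch that keeps the value inside the recommended band; the clamping helper is just an illustrative safeguard, not part of any library:

```python
def clamp_temperature(t: float, lo: float = 0.1, hi: float = 0.5) -> float:
    # Keep sampling temperature inside the recommended 0.1-0.5 range
    return max(lo, min(hi, t))

request_kwargs = {
    "temperature": clamp_temperature(0.3),
    # messages=... would be supplied as in the llama-cpp-python example
}
```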
Contact & Feedback
For bug reports, improvement suggestions, collaboration requests, or licensing inquiries, please contact us at: AllThingsIntel@gmail.com
Downloads last month: 10,871
Model tree for AllThingsIntel/Apollo-V0.1-4B-Thinking
Base model
Qwen/Qwen3-4B-Thinking-2507