Instructions to use AllThingsIntel/Apollo-V0.1-4B-Thinking with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with llama-cpp-python:
```python
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="AllThingsIntel/Apollo-V0.1-4B-Thinking",
    filename="Apollo-V0.1-4B-Thinking-F16.gguf",
)

# Example prompt; replace with your own messages
llm.create_chat_completion(
    messages=[
        {"role": "user", "content": "Hello! Introduce yourself."},
    ]
)
```
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with llama.cpp:
Install from brew
```sh
brew install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Install from WinGet (Windows)
```sh
winget install llama.cpp

# Start a local OpenAI-compatible server with a web UI:
llama-server -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M

# Run inference directly in the terminal:
llama-cli -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Use pre-built binary
```sh
# Download a pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases

# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M

# Run inference directly in the terminal:
./llama-cli -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Build from source code
```sh
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli

# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M

# Run inference directly in the terminal:
./build/bin/llama-cli -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Use Docker
```sh
docker model run hf.co/AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
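Whichever install path you use, `llama-server` exposes an OpenAI-compatible chat endpoint, by default at `http://localhost:8080`. Below is a minimal sketch of querying it from Python using only the standard library; the port, path, and response shape assume `llama-server` defaults, and the prompt is illustrative:

```python
import json
import urllib.request


def build_payload(prompt: str) -> dict:
    # OpenAI-compatible chat completion request body
    return {
        "model": "AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M",
        "messages": [{"role": "user", "content": prompt}],
    }


def chat(prompt: str, base_url: str = "http://localhost:8080/v1") -> str:
    # POST the payload to a running llama-server instance
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # First choice carries the assistant's reply
    return body["choices"][0]["message"]["content"]
```

Call `chat("Hello! Who are you?")` while the server is running to get a reply string back.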
- LM Studio
- Jan
- Ollama
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with Ollama:
```sh
ollama run hf.co/AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
- Unsloth Studio
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
```sh
curl -fsSL https://unsloth.ai/install.sh | sh

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for AllThingsIntel/Apollo-V0.1-4B-Thinking to start chatting
```
Install Unsloth Studio (Windows)
```powershell
irm https://unsloth.ai/install.ps1 | iex

# Run Unsloth Studio:
unsloth studio -H 0.0.0.0 -p 8888

# Then open http://localhost:8888 in your browser
# Search for AllThingsIntel/Apollo-V0.1-4B-Thinking to start chatting
```
Using HuggingFace Spaces for Unsloth
```sh
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for AllThingsIntel/Apollo-V0.1-4B-Thinking to start chatting
```
- Pi
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with Pi:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Configure the model in Pi

```sh
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
```

Add the following to ~/.pi/agent/models.json:

```json
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        { "id": "AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M" }
      ]
    }
  }
}
```

Run Pi

```sh
# Start Pi in your project directory:
pi
```
- Hermes Agent
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with Hermes Agent:
Start the llama.cpp server
```sh
# Install llama.cpp:
brew install llama.cpp

# Start a local OpenAI-compatible server:
llama-server -hf AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Configure Hermes
```sh
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup

# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Run Hermes
```sh
hermes
```
- Docker Model Runner
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with Docker Model Runner:
```sh
docker model run hf.co/AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
- Lemonade
How to use AllThingsIntel/Apollo-V0.1-4B-Thinking with Lemonade:
Pull the model
```sh
# Download Lemonade from https://lemonade-server.ai/
lemonade pull AllThingsIntel/Apollo-V0.1-4B-Thinking:Q4_K_M
```
Run and chat with the model
```sh
lemonade run user.Apollo-V0.1-4B-Thinking-Q4_K_M
```
List all available models
```sh
lemonade list
```
Apollo-V0.1-4B-Thinking by AllThingsIntel
Unbound intellect. Authentic personas. Unscripted logic. This is a 4B-parameter model that thinks in-character instead of just responding.
Model Description
Apollo-V0.1-4B-Thinking is a specialized fine-tune of Qwen 3 4B Thinking 2507. We've lifted many of the typical creative inhibitions, allowing the model to explore a wider spectrum of human themes and narratives.
Its true power lies in its ability to embody a persona. When given a character through the system prompt, the model's responses and its internal reasoning traces adopt that personality. You're reading a character's genuine thought process instead of just getting an answer.
To our knowledge, Apollo is the first reasoning model with such deeply integrated, in-character reasoning, making its thought process as compelling as its final output.
This makes it uniquely suited for bringing digital characters to life.
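The persona is supplied through the system prompt. A minimal sketch of what that looks like as an OpenAI-style message list; the character text here is purely illustrative:

```python
# In-character setup: the system prompt carries the persona, and the
# model's reasoning traces are expected to stay in that voice.
persona = (
    "You are Captain Elara Voss, a weary starship captain who speaks "
    "in clipped, nautical metaphors and distrusts bureaucrats."
)

messages = [
    {"role": "system", "content": persona},
    {"role": "user", "content": "Captain, the fleet wants a status report."},
]
```

This `messages` list can be passed to `llm.create_chat_completion(...)` in llama-cpp-python, or POSTed to any of the OpenAI-compatible local servers described below.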
Primary Uses
- Creative & Writing Partner:
- Assume the persona of an award-winning novelist to generate deeply emotional, long-form story arcs and prose.
- Bring a character from your novel to life. Interact with them directly to better understand their voice, motivations, and how they would react in different scenarios, making your writing more authentic.
- Interactive Tutoring & Mentorship:
- Embody a computer science professor who uses the Socratic method, guiding you toward a solution with insightful questions instead of simply providing the answer.
- Learn from any expert persona you can imagine, such as a seasoned historian.
- Advanced Character Simulation:
- The ideal engine for powering dynamic NPCs in games or interactive fiction. Feed it a backstory, personality, and recent interaction logs to generate deeply realistic and unscripted behavior.
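One way the backstory, personality, and interaction logs mentioned above might be packed into a single system prompt; the field names and format are illustrative, not a documented API:

```python
def build_npc_prompt(backstory: str, personality: str, log: list[str]) -> str:
    # Keep only the last few turns so the prompt stays short
    recent = "\n".join(log[-5:])
    return (
        f"Backstory: {backstory}\n"
        f"Personality: {personality}\n"
        f"Recent interactions:\n{recent}\n"
        "Stay in character at all times."
    )
```

The returned string would then be used as the system prompt for each NPC turn, with fresh log entries appended as play continues.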
Use Cases to Avoid
Factual reporting, emotionless summarization, and any scenario where a distinct personality is a bug, not a feature.
Model Status & Usage Guidelines
This is a Development Preview. Apollo-V0.1 is an early-stage, experimental model. We invite the community to test its limits and report findings to help guide our iterative development process toward the V1.0 milestone.
As a character-simulation engine, Apollo is designed to adopt the viewpoints and quirks of the personas it is given. The resulting content is a direct reflection of the character being simulated, and the user is responsible for the personas and scenarios they create. We encourage users to engage with the model's deep simulation capabilities thoughtfully.
Recommended Settings
During internal testing, this checkpoint of the model produced more coherent and consistent results when using lower temperature settings. For best results, we recommend starting with a temperature between 0.1 and 0.5.
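With the llama-cpp-python API shown earlier, temperature can be set per request. A small sketch that keeps the value inside the recommended band; the clamping helper is just an illustrative safeguard, not part of any library:

```python
def clamp_temperature(t: float, lo: float = 0.1, hi: float = 0.5) -> float:
    # Keep sampling temperature inside the recommended 0.1-0.5 range
    return max(lo, min(hi, t))

request_kwargs = {
    "temperature": clamp_temperature(0.3),
    # messages=... would be supplied as in the llama-cpp-python example
}
```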
Contact & Feedback
For bug reports, improvement suggestions, collaboration requests, or licensing inquiries, please contact us at: AllThingsIntel@gmail.com
Downloads last month: 10,871
Model tree for AllThingsIntel/Apollo-V0.1-4B-Thinking
Base model
Qwen/Qwen3-4B-Thinking-2507