Instructions to use WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF",
    filename="Opus4.7-Distill-GODsGhost-Codex-4B-Q4_K_M.gguf",
)
llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": "What is the capital of France?"
        }
    ]
)
- Notebooks
- Google Colab
- Kaggle
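In a notebook you usually want just the reply text rather than the whole response object. The llama-cpp-python call shown above returns an OpenAI-style dictionary, so a small helper along these lines can pull the text out. This is a minimal sketch: the name `reply_text` and the sample dictionary are illustrative assumptions, not part of the library.

```python
def reply_text(completion: dict) -> str:
    """Return the assistant text from an OpenAI-style chat completion dict."""
    choices = completion.get("choices") or []
    if not choices:
        raise ValueError("completion contains no choices")
    return choices[0]["message"]["content"]


# Hypothetical response, shaped like the dict that create_chat_completion returns.
example = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "Paris."},
            "finish_reason": "stop",
        }
    ]
}

print(reply_text(example))
```

With a loaded model, the same helper would be used as `reply_text(llm.create_chat_completion(messages=[...]))`.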
- Local Apps
- llama.cpp
How to use WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
Use Docker
docker model run hf.co/WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
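Once `llama-server` is running, it can be queried from a script as well as from the web UI. The sketch below uses only the Python standard library and assumes the server listens at `http://localhost:8080/v1` (the address used in the Pi configuration on this page) and exposes the usual `/chat/completions` route. The function names `build_chat_request` and `ask` are illustrative, not part of llama.cpp.

```python
import json
import urllib.request

# Assumed default address of llama-server; change it if you pass a different port.
DEFAULT_BASE_URL = "http://localhost:8080/v1"
MODEL_ID = "WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M"


def build_chat_request(prompt, model=MODEL_ID, base_url=DEFAULT_BASE_URL):
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model=MODEL_ID, base_url=DEFAULT_BASE_URL, timeout=120):
    """Send the request to a running server and return the reply text."""
    request = build_chat_request(prompt, model, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# With the server running:
#     print(ask("What is the capital of France?"))
```

The same helper should also work against other OpenAI-compatible servers by passing a different `base_url`, for example `http://localhost:8000/v1` for the vLLM server in the vLLM section.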
- LM Studio
- Jan
- vLLM
How to use WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF with vLLM:
Install from pip and serve model
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF",
    "messages": [
      {
        "role": "user",
        "content": "What is the capital of France?"
      }
    ]
  }'
Use Docker
docker model run hf.co/WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
- Ollama
How to use WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF with Ollama:
ollama run hf.co/WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
- Unsloth Studio
How to use WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF to start chatting
- Pi
How to use WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF with Pi:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
Configure the model in Pi
# Install Pi:
npm install -g @mariozechner/pi-coding-agent
# Add to ~/.pi/agent/models.json:
{
  "providers": {
    "llama-cpp": {
      "baseUrl": "http://localhost:8080/v1",
      "api": "openai-completions",
      "apiKey": "none",
      "models": [
        {
          "id": "WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M"
        }
      ]
    }
  }
}
Run Pi
# Start Pi in your project directory:
pi
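If `~/.pi/agent/models.json` already holds other providers, pasting the snippet over the file would discard them. The sketch below merges the `llama-cpp` entry into an existing configuration instead. It assumes the file follows exactly the layout shown above; the function name `add_llama_cpp_provider` is hypothetical and not part of Pi.

```python
import copy


def add_llama_cpp_provider(config, model_id, base_url="http://localhost:8080/v1"):
    """Return a copy of `config` with the llama-cpp provider and model added.

    The key names mirror the models.json snippet on this page; other
    providers already present in `config` are left untouched.
    """
    merged = copy.deepcopy(config)
    providers = merged.setdefault("providers", {})
    provider = providers.setdefault("llama-cpp", {})
    provider.update(
        {"baseUrl": base_url, "api": "openai-completions", "apiKey": "none"}
    )
    models = provider.setdefault("models", [])
    # Avoid listing the same model twice when the script is run again.
    if not any(entry.get("id") == model_id for entry in models):
        models.append({"id": model_id})
    return merged


# Example with a hypothetical existing provider named "other":
existing = {"providers": {"other": {"baseUrl": "http://example.invalid/v1"}}}
updated = add_llama_cpp_provider(
    existing, "WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M"
)
print(sorted(updated["providers"]))
```

To apply it to the real file, load the file with `json.load`, pass the result through the function, and write it back with `json.dump(updated, f, indent=2)`.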
- Hermes Agent
How to use WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF with Hermes Agent:
Start the llama.cpp server
# Install llama.cpp:
brew install llama.cpp
# Start a local OpenAI-compatible server:
llama-server -hf WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
Configure Hermes
# Install Hermes:
curl -fsSL https://hermes-agent.nousresearch.com/install.sh | bash
hermes setup
# Point Hermes at the local server:
hermes config set model.provider custom
hermes config set model.base_url http://127.0.0.1:8080/v1
hermes config set model.default WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
Run Hermes
hermes
- Docker Model Runner
How to use WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF with Docker Model Runner:
docker model run hf.co/WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
- Lemonade
How to use WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull WithinUsAI/Opus4.7-GODs.Ghost.Codex-4B.GGuF:Q4_K_M
Run and chat with the model
lemonade run user.Opus4.7-GODs.Ghost.Codex-4B.GGuF-Q4_K_M
List all available models
lemonade list
Could you make another new model, similar to "Opus4.7-GODs.Ghost.Codex-4B.GGuF", in a LLaVA-like version?
Such a model could add OCR (e.g. reading pdf, docx, doc, rtf, txt, md, json) and vision (e.g. webcam input, pattern recognition, thermal detection, facial expression detection). Could you also make a model that can generate images and video?
Thanks for the feedback! The current Opus4.7-GODs.Ghost.Codex-4B is highly specialized for text, reasoning, and coding tasks.
Adding vision and OCR (like LLaVA) requires a Vision-Language Model (VLM) architecture, where a vision encoder is attached to the base model. While this 4B model doesn't have that, I am actively developing upcoming multimodal models (like the Ghost Codex XI line) that will integrate image, audio, and video understanding!
Also, just to clarify: models like LLaVA only understand images. To actually generate images and video, the model needs to be hooked up to a diffusion generator, which is a different beast entirely. Stay tuned to the WithinUsAI page for future multimodal releases!
Thank you. :)
You're welcome
Also
Check out my Gemma4-Overlooked.Thinker.Uncensored-E2B model. Unlike LLaVA, which needs a separate vision encoder, my Gemma 4 build is natively multimodal. It handles deep OCR, reads PDFs, and analyzes images and video frames right out of the box.
I just deployed a free interactive Space for it, so you can test its vision capabilities right now without downloading anything:
https://huggingface.co/spaces/WithinUsAI/Gemma4-Overlooked.Thinker.Uncensored-E2B.gguf
One quick note: it can read and analyze images/video perfectly, but it doesn't generate them (that requires a separate diffusion model). Give the Space a try and let me know what you think!
LLaVA models are multimodal, which means they combine vision understanding (images) with language understanding and generation (text). Vision-language integration: LLaVA works by connecting a visual encoder (which "sees" the image) to a large language model (like Vicuna or Llama), so the model understands both text and visuals, whether animated (.gif, .mp4, .avi) or non-animated (.png, .webp, .jpg, .bmp). That is why the model can watch, analyze, and react to video and images. Generation is a separate thing, e.g.: Stable Diffusion → image generation, Sora → video generation. Thank you :)
Yes, exactly! That is a perfect breakdown of how LLaVA combines a visual encoder with an LLM to understand images and text.
I brought up my Gemma4-Overlooked.Thinker.Uncensored-E2B model because it handles those exact vision tasks, but it is built differently. It handles deep OCR, reading PDFs, and analyzing images and video frames natively—meaning it doesn't need a separate visual encoder bolted on to "see" like LLaVA does.
Since you are looking for a model with strong OCR and vision capabilities, I highly recommend testing it out. You can just drop an image or document straight into the free interactive Space I deployed here to see how it performs:
https://huggingface.co/spaces/WithinUsAI/Gemma4-Overlooked.Thinker.Uncensored-E2B.gguf
Drop a screenshot or a PDF in there and let me know how its analysis compares to your experience with LLaVA!
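For anyone following the thread, the LLaVA-style wiring discussed above (a visual encoder whose output is connected to an LLM) can be sketched in a few lines. This is a toy illustration only: the dimensions are tiny, the random numbers stand in for real encoder outputs and learned weights, and the name `project` is made up for this example. It is not LLaVA's actual code.

```python
import random


def project(patch_features, weight):
    """Linear projection: map each vision feature vector to the LLM embedding size.

    patch_features: list of vectors, each of length vision_dim
    weight: matrix with vision_dim rows and llm_dim columns
    """
    return [
        [sum(f * w for f, w in zip(patch, column)) for column in zip(*weight)]
        for patch in patch_features
    ]


random.seed(0)
VISION_DIM, LLM_DIM = 4, 6  # toy sizes, far smaller than any real model

# Stand-in for the visual encoder output: 3 image patches.
patch_features = [[random.random() for _ in range(VISION_DIM)] for _ in range(3)]
# Stand-in for the projection weights (learned during training in a real model).
weight = [[random.random() for _ in range(LLM_DIM)] for _ in range(VISION_DIM)]
# Stand-in for the embeddings of 5 text tokens from the prompt.
text_embeddings = [[random.random() for _ in range(LLM_DIM)] for _ in range(5)]

# Image patches become "tokens" of the same width as text tokens,
# so the LLM can read one combined sequence.
image_tokens = project(patch_features, weight)
llm_input = image_tokens + text_embeddings

print(len(llm_input), len(llm_input[0]))
```

The point of the sketch is only that images end up as extra input tokens for the language model. Nothing in this path produces pixels, which is why image and video generation needs a separate model.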