Cosmos-Reason2-8B-GGUF
GGUF quantizations of nvidia/Cosmos-Reason2-8B for use with llama.cpp and compatible tools.
Built on NVIDIA Cosmos
About the Model
NVIDIA Cosmos Reason 2 is an open, 8B-parameter reasoning vision-language model (VLM) for physical AI and robotics. It is post-trained from Qwen3-VL-8B-Instruct and understands space, time, and fundamental physics.
Key capabilities:
- Physical AI reasoning with spatio-temporal understanding
- Object detection with 2D/3D point localization and bounding boxes
- Long-context understanding up to 256K input tokens
- Video analytics, data curation, and robot planning
For full details, see the original model card.
Quantization Details
| File | Quant | Size |
|---|---|---|
Cosmos-Reason2-8B-F16.gguf |
F16 | 16 GB |
Cosmos-Reason2-8B-Q8_0.gguf |
Q8_0 | 8.2 GB |
Cosmos-Reason2-8B-Q4_K_M.gguf |
Q4_K_M | 4.7 GB |
mmproj-Cosmos-Reason2-8B-F16.gguf |
F16 | 1.1 GB |
Note: The vision encoder (
mmproj) is kept at F16 precision.
How to Use
llama-server -hf Kbenkhaled/Cosmos-Reason2-8B-GGUF:Q8_0
llama-server -hf Kbenkhaled/Cosmos-Reason2-8B-GGUF:F16
llama-server -hf Kbenkhaled/Cosmos-Reason2-8B-GGUF:Q4_K_M
Recommended Parameters
Vision / Multimodal
| Parameter | Value |
|---|---|
| temperature | 0.7 |
| top_k | 20 |
| top_p | 0.8 |
| presence_penalty | 1.5 |
| max_tokens | 4096+ |
Text Only
| Parameter | Value |
|---|---|
| temperature | 1.0 |
| top_k | 40 |
| top_p | 1.0 |
| presence_penalty | 2.0 |
| max_tokens | 32768 |
Tips
- Video input: Use
fps=4to match the model's training setup. - Chain-of-thought: Append
Answer the question in the following format: <think>\nyour reasoning\n</think>\n\n<answer>\nyour answer\n</answer>.to the system prompt for detailed reasoning. - Context window: The model supports up to 256K tokens natively. Use
-c 8192or higher as needed. - GPU offload: Add
-ngl 99to offload all layers to GPU for faster inference.
License
This model is released under the NVIDIA Open Model License. Use of this model must be consistent with NVIDIA's Trustworthy AI terms.
When redistributing or building products with this model, include:
"Built on NVIDIA Cosmos"
Credits
- Original model: NVIDIA โ Cosmos-Reason2-8B
- Architecture: Qwen3-VL-8B-Instruct
- Quantization tooling: llama.cpp by ggml-org
- Downloads last month
- -
Hardware compatibility
Log In
to add your hardware
4-bit
8-bit
16-bit