Cosmos-Reason2-8B-GGUF

GGUF quantizations of nvidia/Cosmos-Reason2-8B for use with llama.cpp and compatible tools.

Built on NVIDIA Cosmos

About the Model

NVIDIA Cosmos Reason 2 is an open, 8B-parameter reasoning vision-language model (VLM) for physical AI and robotics. It is post-trained from Qwen3-VL-8B-Instruct and understands space, time, and fundamental physics.

Key capabilities:

Physical AI reasoning with spatio-temporal understanding
Object detection with 2D/3D point localization and bounding boxes
Long-context understanding up to 256K input tokens
Video analytics, data curation, and robot planning

For full details, see the original model card.

Quantization Details

File	Quant	Size
`Cosmos-Reason2-8B-F16.gguf`	F16	16 GB
`Cosmos-Reason2-8B-Q8_0.gguf`	Q8_0	8.2 GB
`Cosmos-Reason2-8B-Q4_K_M.gguf`	Q4_K_M	4.7 GB
`mmproj-Cosmos-Reason2-8B-F16.gguf`	F16	1.1 GB

Note: The vision encoder (mmproj) is kept at F16 precision.

How to Use

llama-server -hf Kbenkhaled/Cosmos-Reason2-8B-GGUF:Q8_0
llama-server -hf Kbenkhaled/Cosmos-Reason2-8B-GGUF:F16
llama-server -hf Kbenkhaled/Cosmos-Reason2-8B-GGUF:Q4_K_M

Recommended Parameters

Vision / Multimodal

Parameter	Value
temperature	0.7
top_k	20
top_p	0.8
presence_penalty	1.5
max_tokens	4096+

Text Only

Parameter	Value
temperature	1.0
top_k	40
top_p	1.0
presence_penalty	2.0
max_tokens	32768

Tips

Video input: Use fps=4 to match the model's training setup.
Chain-of-thought: Append Answer the question in the following format: <think>\nyour reasoning\n</think>\n\n<answer>\nyour answer\n</answer>. to the system prompt for detailed reasoning.
Context window: The model supports up to 256K tokens natively. Use -c 8192 or higher as needed.
GPU offload: Add -ngl 99 to offload all layers to GPU for faster inference.

License

This model is released under the NVIDIA Open Model License. Use of this model must be consistent with NVIDIA's Trustworthy AI terms.

When redistributing or building products with this model, include:

"Built on NVIDIA Cosmos"

Credits

Original model: NVIDIA — Cosmos-Reason2-8B
Architecture: Qwen3-VL-8B-Instruct
Quantization tooling: llama.cpp by ggml-org

Downloads last month: -

GGUF

Model size

8B params

Architecture

qwen3vl

Hardware compatibility

4-bit

8-bit

16-bit

Model tree for Kbenkhaled/Cosmos-Reason2-8B-GGUF

Base model

Qwen/Qwen3-VL-8B-Instruct

Finetuned

nvidia/Cosmos-Reason2-8B

Quantized

(5)

this model