Cosmos-Reason2-8B-GGUF

GGUF quantizations of nvidia/Cosmos-Reason2-8B for use with llama.cpp and compatible tools.

Built on NVIDIA Cosmos

About the Model

NVIDIA Cosmos Reason 2 is an open, 8B-parameter reasoning vision-language model (VLM) for physical AI and robotics. It is post-trained from Qwen3-VL-8B-Instruct and understands space, time, and fundamental physics.

Key capabilities:

  • Physical AI reasoning with spatio-temporal understanding
  • Object detection with 2D/3D point localization and bounding boxes
  • Long-context understanding up to 256K input tokens
  • Video analytics, data curation, and robot planning

For full details, see the original model card.

Quantization Details

File                               Quant   Size
Cosmos-Reason2-8B-F16.gguf         F16     16 GB
Cosmos-Reason2-8B-Q8_0.gguf        Q8_0    8.2 GB
Cosmos-Reason2-8B-Q4_K_M.gguf      Q4_K_M  4.7 GB
mmproj-Cosmos-Reason2-8B-F16.gguf  F16     1.1 GB

Note: The vision encoder (mmproj) is kept at F16 precision.

How to Use

Start an OpenAI-compatible server with llama.cpp's llama-server, pulling the chosen quant directly from Hugging Face:

llama-server -hf Kbenkhaled/Cosmos-Reason2-8B-GGUF:Q8_0
llama-server -hf Kbenkhaled/Cosmos-Reason2-8B-GGUF:F16
llama-server -hf Kbenkhaled/Cosmos-Reason2-8B-GGUF:Q4_K_M
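Once the server is running (it listens on port 8080 by default), you can query it through its OpenAI-compatible /v1/chat/completions endpoint. The sketch below uses only the Python standard library; the server URL, JPEG image type, and the `build_chat_request`/`send` helper names are assumptions for illustration, not part of the model card:

```python
import base64
import json
import urllib.request

SERVER = "http://localhost:8080"  # llama-server's default address (assumed)


def build_chat_request(prompt, image_path=None):
    """Build an OpenAI-style chat payload; attach an image as a base64 data URL."""
    if image_path is None:
        content = prompt
    else:
        with open(image_path, "rb") as f:
            b64 = base64.b64encode(f.read()).decode("ascii")
        content = [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ]
    return {"messages": [{"role": "user", "content": content}],
            "max_tokens": 4096}


def send(payload):
    """POST the payload to llama-server and return the assistant's reply text."""
    req = urllib.request.Request(
        SERVER + "/v1/chat/completions",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# Example (requires the server to be up):
# print(send(build_chat_request("Is the robot arm clear of the table edge?",
#                               image_path="frame.jpg")))
```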

Recommended Parameters

Vision / Multimodal

Parameter         Value
temperature       0.7
top_k             20
top_p             0.8
presence_penalty  1.5
max_tokens        4096+

Text Only

Parameter         Value
temperature       1.0
top_k             40
top_p             1.0
presence_penalty  2.0
max_tokens        32768
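The two presets above can be kept as plain dicts and merged into a request body. A small sketch: the parameter names follow the OpenAI-style JSON body that llama-server accepts (top_k is a llama-server extension to the OpenAI schema), and the `make_payload` helper is a hypothetical name:

```python
# Recommended sampler presets from the tables above.
VISION_PARAMS = {"temperature": 0.7, "top_k": 20, "top_p": 0.8,
                 "presence_penalty": 1.5, "max_tokens": 4096}
TEXT_PARAMS = {"temperature": 1.0, "top_k": 40, "top_p": 1.0,
               "presence_penalty": 2.0, "max_tokens": 32768}


def make_payload(messages, vision=False):
    """Pick the preset matching the input modality and merge it into the payload."""
    params = VISION_PARAMS if vision else TEXT_PARAMS
    return {"messages": messages, **params}
```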

Tips

  • Video input: Use fps=4 to match the model's training setup.
  • Chain-of-thought: Append Answer the question in the following format: <think>\nyour reasoning\n</think>\n\n<answer>\nyour answer\n</answer>. to the system prompt for detailed reasoning.
  • Context window: The model supports up to 256K tokens natively. Use -c 8192 or higher as needed.
  • GPU offload: Add -ngl 99 to offload all layers to GPU for faster inference.
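If you use the chain-of-thought prompt from the tips above, replies arrive wrapped in <think> and <answer> tags, which a small regex can separate. A sketch that assumes the model actually follows the requested format (the `split_reasoning` helper is a hypothetical name):

```python
import re


def split_reasoning(reply):
    """Split a '<think>...</think> <answer>...</answer>' reply into its parts."""
    think = re.search(r"<think>\s*(.*?)\s*</think>", reply, re.DOTALL)
    answer = re.search(r"<answer>\s*(.*?)\s*</answer>", reply, re.DOTALL)
    return (think.group(1) if think else None,
            answer.group(1) if answer else reply.strip())


reasoning, answer = split_reasoning(
    "<think>\nThe cup is left of the plate.\n</think>\n\n<answer>\nleft\n</answer>"
)
# reasoning -> "The cup is left of the plate."
# answer    -> "left"
```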

License

This model is released under the NVIDIA Open Model License. Use of this model must be consistent with NVIDIA's Trustworthy AI terms.

When redistributing or building products with this model, include:

"Built on NVIDIA Cosmos"
