Ex0bit/OLMo-3-7B-Instruct-NVFP4-1M

Text Generation

8-bit precision


Ex0bit committed on Nov 22, 2025

Commit 2f26161 · verified · 1 Parent(s): ad2b339

Upload README.md with huggingface_hub

Files changed (1)

README.md +22 -1


@@ -1,10 +1,31 @@
 # OLMo-3-7B-Instruct-NVFP4-1M
 NVFP4 quantized version of [allenai/Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) with extended 1M token context support via linear RoPE scaling.
 ## Model Description
-This model is the NVFP4 (4-bit floating point) quantized version of OLMo-3-7B-Instruct, optimized for NVIDIA GPUs with Blackwell/Ada Lovelace architecture support. The quantization uses NVIDIA's ModelOpt library with two-level scaling: E4M3 FP8 per block plus FP32 global scale.
 ### Key Features

+---
+language:
+- en
+license: apache-2.0
+library_name: transformers
+tags:
+- olmo
+- nvfp4
+- quantized
+- long-context
+- vllm
+- modelopt
+datasets:
+- allenai/c4
+base_model: allenai/Olmo-3-7B-Instruct
+pipeline_tag: text-generation
+model-index:
+- name: OLMo-3-7B-Instruct-NVFP4-1M
+  results: []
+---
 # OLMo-3-7B-Instruct-NVFP4-1M
 NVFP4 quantized version of [allenai/Olmo-3-7B-Instruct](https://huggingface.co/allenai/Olmo-3-7B-Instruct) with extended 1M token context support via linear RoPE scaling.
 ## Model Description
+This model is the NVFP4 (4-bit floating point) quantized version of OLMo-3-7B-Instruct, optimized for NVIDIA DGX Spark systems with Blackwell GB10 GPUs and Ada Lovelace architecture support. The quantization uses NVIDIA's ModelOpt library with two-level scaling: E4M3 FP8 per block plus FP32 global scale.
 ### Key Features