---
license: other
base_model: TheDrummer/Behemoth-R1-123B-v2
tags:
- nvfp4
- modelopt
- quantized
- blackwell
- b200
library_name: transformers
---

# Behemoth-R1-V2 ModelOpt NVFP4

NVFP4 quantized version of [TheDrummer/Behemoth-R1-123B-v2](https://huggingface.co/TheDrummer/Behemoth-R1-123B-v2) using NVIDIA Model Optimizer.

## Quantization Details

| Property | Value |
|----------|-------|
| **Original Model** | TheDrummer/Behemoth-R1-123B-v2 |
| **Quantization** | NVFP4 (FP4 weights, FP16 activations) |
| **Method** | NVIDIA ModelOpt PTQ |
| **Calibration Samples** | 512 |
| **Max Sequence Length** | 4096 |

## Hardware Requirements

- **Optimal**: NVIDIA Blackwell GPUs (B100, B200, RTX PRO 6000 Blackwell)
- **Compatible**: Hopper/Ampere (falls back to weight-only mode)

## Usage with vLLM

```python
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheHouseOfTheDude/Behemoth-R1-V2_ModelOpt-NVFP4",
    quantization="modelopt",
    trust_remote_code=True,
)

sampling_params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=512)
outputs = llm.generate(["Write a story about..."], sampling_params)
print(outputs[0].outputs[0].text)
```

## Chat Template

Uses the Mistral v7 (non-Tekken) format. See the original model card for usage details.

## Credits

- Original Model: [TheDrummer](https://huggingface.co/TheDrummer)
- Quantization: TheHouseOfTheDude
- Quantization Framework: NVIDIA ModelOpt
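## Checking Your GPU

As a supplement to the hardware notes above, here is a minimal sketch of how you might decide which execution path this quantization will take on your hardware. The `nvfp4_mode` helper is hypothetical (not part of vLLM or ModelOpt); it assumes the usual CUDA compute-capability majors: 10+ for data-center Blackwell (native FP4 tensor cores), 9 for Hopper, 8 for Ampere.

```python
# Hypothetical helper: map a CUDA compute-capability tuple to the
# NVFP4 execution mode described in the Hardware Requirements section.
# Blackwell (SM 10.x and newer) has native FP4 tensor-core support;
# older architectures fall back to weight-only dequantization.
def nvfp4_mode(compute_capability: tuple) -> str:
    major, _minor = compute_capability
    if major >= 10:  # e.g. B100/B200 (SM 10.0)
        return "native-nvfp4"
    return "weight-only"  # Hopper (9.x), Ampere (8.x)

# With PyTorch installed, you could query the local GPU like this:
#   import torch
#   mode = nvfp4_mode(torch.cuda.get_device_capability())
print(nvfp4_mode((10, 0)))  # native-nvfp4
print(nvfp4_mode((9, 0)))   # weight-only
```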