---
license: mit
base_model: LocoreMind/LocoOperator-4B
tags:
- nvfp4
- quantized
- qwen3
- agent
- tool-calling
- code
- nvidia
- modelopt
- spark
pipeline_tag: text-generation
---

# LocoOperator-4B — NVFP4 Quantized

NVFP4-quantized version of [LocoreMind/LocoOperator-4B](https://huggingface.co/LocoreMind/LocoOperator-4B), an agent/tool-calling model based on Qwen3-4B-Instruct.

## Quantization Details

| Property | Value |
|----------|-------|
| **Base model** | LocoreMind/LocoOperator-4B (Qwen3-4B finetune) |
| **Quantization** | NVFP4 (weights) + FP8 (KV cache) |
| **Group size** | 16 |
| **Tool** | NVIDIA TensorRT Model Optimizer (modelopt 0.35.0) |
| **Calibration** | cnn_dailymail (default) |
| **Original size** | ~8 GB (BF16) |
| **Quantized size** | 2.7 GB |
| **Excluded** | `lm_head` (kept in higher precision) |

## Intended Use

Optimized for deployment on NVIDIA Blackwell GPUs (GB10/GB100), particularly the DGX Spark. The NVFP4 format leverages Blackwell's native FP4 tensor cores for maximum throughput.

Best suited for:

- Agent/tool-calling workflows
- Code generation
- Instruction following

## Usage

### With transformers + modelopt

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "DJLougen/LocoOperator-4B-NVFP4",
    device_map="auto",
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DJLougen/LocoOperator-4B-NVFP4")
```

### With TensorRT-LLM

Convert to a TensorRT-LLM engine for optimal inference performance on Spark/Blackwell hardware.

## Quality Check

Example outputs (cnn_dailymail calibration text):

**Before quantization:**

> "I'm excited to be doing the final two films," he said. "I can't wait to see what happens."

**After NVFP4 quantization:**

> "I don't think I'll be particularly extravagant," Radcliffe said. "I don't think I'll be one of those people who, as soon as they turn 18, suddenly buy themselves a massive sports car collection or something similar."
Both outputs are coherent and contextually appropriate.

## Hardware

- **Quantized on:** NVIDIA DGX Spark (GB10, 128 GB unified memory)
- **Docker image:** `nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev`
- **Target deployment:** Any NVIDIA Blackwell GPU with FP4 tensor core support
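
## Appendix: NVFP4 Numerics Sketch

For intuition about the format itself, here is a minimal pure-Python sketch of NVFP4-style fake quantization with one shared scale per group of 16 values. This is a simplified illustration, not the modelopt implementation: real NVFP4 packs two 4-bit E2M1 codes per byte and stores each group's scale in FP8 E4M3, while this sketch keeps scales in full precision.

```python
# The 8 non-negative magnitudes representable in FP4 E2M1
# (1 sign bit, 2 exponent bits, 1 mantissa bit).
E2M1_LEVELS = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def nvfp4_fake_quantize(values, group_size=16):
    """Quantize-dequantize a flat list of floats, NVFP4-style.

    Each group of `group_size` values shares one scale, chosen so the
    group's largest magnitude maps to the top E2M1 level (6.0).
    """
    out = []
    for start in range(0, len(values), group_size):
        group = values[start:start + group_size]
        amax = max(abs(v) for v in group)
        scale = amax / 6.0 if amax > 0 else 1.0
        for v in group:
            # Round the scaled magnitude to the nearest representable level.
            level = min(E2M1_LEVELS, key=lambda lvl: abs(abs(v) / scale - lvl))
            out.append(level * scale if v >= 0 else -level * scale)
    return out
```

Per group, 16 BF16 weights (32 bytes) become 16 four-bit codes plus one FP8 scale (9 bytes), roughly consistent with the ~3x size reduction in the table above; layers excluded from quantization, such as `lm_head`, account for the remaining difference.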