---
license: mit
base_model: LocoreMind/LocoOperator-4B
tags:
- nvfp4
- quantized
- qwen3
- agent
- tool-calling
- code
- nvidia
- modelopt
- spark
pipeline_tag: text-generation
---
# LocoOperator-4B — NVFP4 Quantized
NVFP4-quantized version of [LocoreMind/LocoOperator-4B](https://huggingface.co/LocoreMind/LocoOperator-4B), an agent/tool-calling model based on Qwen3-4B-Instruct.
## Quantization Details
| Property | Value |
|----------|-------|
| **Base model** | LocoreMind/LocoOperator-4B (Qwen3-4B finetune) |
| **Quantization** | NVFP4 (weights) + FP8 (KV cache) |
| **Group size** | 16 |
| **Tool** | NVIDIA TensorRT Model Optimizer (modelopt 0.35.0) |
| **Calibration** | cnn_dailymail (default) |
| **Original size** | ~8 GB (BF16) |
| **Quantized size** | 2.7 GB |
| **Excluded** | `lm_head` (kept in higher precision) |
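To make the group-size-16 figure concrete: NVFP4 stores each weight as a 4-bit FP4 (E2M1) value and attaches a scale to every group of 16 elements. The toy sketch below illustrates the round-trip on one group. It is a simplification for intuition only: the real format uses FP8 (E4M3) group scales plus a per-tensor scale, and modelopt's implementation differs in detail.

```python
# Simplified sketch of NVFP4-style group quantization (group size 16).
# Real NVFP4 uses FP8 (E4M3) group scales and a per-tensor scale;
# this toy version uses plain float scales to show the idea.

# The non-negative magnitudes representable in FP4 E2M1.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def quantize_group(values, grid=E2M1_GRID):
    """Quantize one group: scale so the absmax maps to 6.0, then snap to the grid."""
    amax = max(abs(v) for v in values)
    scale = amax / grid[-1] if amax > 0 else 1.0
    quantized = []
    for v in values:
        mag = min(grid, key=lambda g: abs(abs(v) / scale - g))
        quantized.append(mag if v >= 0 else -mag)
    return quantized, scale

def dequantize_group(quantized, scale):
    return [q * scale for q in quantized]

weights = [0.11, -0.32, 0.05, 0.6, -0.48, 0.0, 0.24, 0.9,
           -0.15, 0.33, 0.07, -0.88, 0.41, 0.2, -0.05, 0.5]
q, s = quantize_group(weights)
deq = dequantize_group(q, s)
max_err = max(abs(a - b) for a, b in zip(weights, deq))
print(f"scale={s:.4f} max_abs_error={max_err:.4f}")
```

The per-group scale is what keeps the error bounded even though FP4 has only 16 code points; the `lm_head` is excluded because the output projection is more sensitive to this coarse grid.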
## Intended Use
Optimized for deployment on NVIDIA Blackwell GPUs (GB10/GB100), particularly the DGX Spark. The NVFP4 format leverages Blackwell's native FP4 tensor cores for maximum throughput.
Best suited for:
- Agent/tool-calling workflows
- Code generation
- Instruction following
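For agent workflows, Qwen3-family models typically emit tool calls as JSON wrapped in `<tool_call>` tags (Hermes-style). A minimal parser sketch follows; the exact wire format depends on this model's chat template, so verify against the checkpoint's `tokenizer_config.json` before relying on it:

```python
import json
import re

# Assumed Hermes/Qwen-style tool-call format:
# <tool_call>{"name": ..., "arguments": {...}}</tool_call>
TOOL_CALL_RE = re.compile(r"<tool_call>\s*(\{.*?\})\s*</tool_call>", re.DOTALL)

def parse_tool_calls(text: str) -> list[dict]:
    """Extract structured tool calls from raw model output."""
    calls = []
    for match in TOOL_CALL_RE.finditer(text):
        try:
            calls.append(json.loads(match.group(1)))
        except json.JSONDecodeError:
            pass  # skip malformed calls rather than crashing the agent loop
    return calls

output = (
    "Let me check the weather.\n"
    '<tool_call>\n{"name": "get_weather", "arguments": {"city": "Berlin"}}\n</tool_call>'
)
for call in parse_tool_calls(output):
    print(call["name"], call["arguments"])
```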
## Usage
### With transformers + modelopt
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained(
    "DJLougen/LocoOperator-4B-NVFP4",
    device_map="auto",
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained("DJLougen/LocoOperator-4B-NVFP4")
```
### With TensorRT-LLM
Convert the checkpoint to a TensorRT-LLM engine for the best inference performance on DGX Spark/Blackwell hardware.
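A deployment sketch using TensorRT-LLM's OpenAI-compatible server inside the container listed below. The `trtllm-serve` invocation and its flags are assumptions based on recent TensorRT-LLM releases; check the TensorRT-LLM documentation for your container version.

```shell
# Serve the NVFP4 checkpoint with TensorRT-LLM (command assumes a recent release).
docker run --rm -it --gpus all \
  nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev \
  trtllm-serve DJLougen/LocoOperator-4B-NVFP4 --host 0.0.0.0 --port 8000
```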
## Quality Check
Example outputs (cnn_dailymail calibration text):
**Before quantization:**
> "I'm excited to be doing the final two films," he said. "I can't wait to see what happens."
**After NVFP4 quantization:**
> "I don't think I'll be particularly extravagant," Radcliffe said. "I don't think I'll be one of those people who, as soon as they turn 18, suddenly buy themselves a massive sports car collection or something similar."
Both outputs are coherent and contextually appropriate.
## Hardware
- **Quantized on:** NVIDIA DGX Spark (GB10, 128 GB unified memory)
- **Docker image:** `nvcr.io/nvidia/tensorrt-llm/release:spark-single-gpu-dev`
- **Target deployment:** Any NVIDIA Blackwell GPU with FP4 tensor core support