cost estimates?

by lightenup - opened Dec 2, 2025

Dec 2, 2025

Hi - looks very nice!
Roughly how much does it cost to create an NVFP4 checkpoint like this for a ~100B MoE model, assuming one has to use rented GPUs?

Firworks

Owner Dec 2, 2025

Doing the quantization to NVFP4 of an existing model? It's pretty cheap. It should take less than an hour of time on a cloud instance. I used a single RTX Pro 6000 Blackwell to make this one.
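As a rough back-of-the-envelope sketch of what that means in money: the rental cost is just wall-clock hours times the hourly rate. The hourly rate and the download time below are hypothetical placeholders, not quotes from any provider; only the "less than an hour" quantization time comes from the reply above. Plug in your own numbers.

```python
def rental_cost(hours: float, usd_per_hour: float) -> float:
    """Cost of renting one GPU instance for the given wall-clock time."""
    return hours * usd_per_hour


# All three figures are assumptions for illustration only.
download_hours = 0.5   # hypothetical time to fetch the original weights
quantize_hours = 1.0   # upper bound taken from "less than an hour"
usd_per_hour = 2.0     # hypothetical price for a single-GPU cloud instance

total = rental_cost(download_hours + quantize_hours, usd_per_hour)
print(f"~${total:.2f}")  # prints "~$3.00"
```

With those placeholder values the whole job lands at a few dollars, which is consistent with "pretty cheap".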

lightenup

Dec 2, 2025

Hmm, interesting! I always assumed you had to load the original higher/full-precision model weights for calibration.

Firworks

Owner Dec 2, 2025

I use llm-compressor to do the quantization, and it works in chunks, so you don't have to load the whole model into VRAM at once. It only loads a partial set of layers at a time as it progresses through the model, calibrating and quantizing them. You do have to download the full weights, which does take a while, but once you've got them it goes pretty quickly.
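To put rough numbers on the two points above (the full download and the chunked VRAM use), here is a small sketch. It does not call llm-compressor; it is plain arithmetic. The BF16 source precision, the 1 Gbit/s link, the 50 equally sized layers, and the chunk size of 4 are all assumptions chosen for illustration, and the function names are made up for this example.

```python
def weights_size_gb(params_billion: float, bytes_per_param: float) -> float:
    """Checkpoint size in GB: 1e9 params * bytes per param / 1e9 bytes per GB."""
    return params_billion * bytes_per_param


def download_minutes(size_gb: float, gbit_per_s: float) -> float:
    """Time to download size_gb over a link of gbit_per_s, in minutes."""
    seconds = size_gb * 8 / gbit_per_s
    return seconds / 60


def peak_chunk_gb(layer_sizes_gb: list[float], layers_per_chunk: int) -> float:
    """Largest amount of weights resident at once when layers are processed in chunks."""
    return max(
        sum(layer_sizes_gb[i:i + layers_per_chunk])
        for i in range(0, len(layer_sizes_gb), layers_per_chunk)
    )


# Assumed: ~100B parameters stored as BF16 (2 bytes each).
size = weights_size_gb(100, 2)
print(f"download: {size:.0f} GB, ~{download_minutes(size, 1.0):.0f} min at 1 Gbit/s")

# Assumed: 50 equally sized layers, processed 4 at a time.
layers = [size / 50] * 50
print(f"peak weights in VRAM: {peak_chunk_gb(layers, 4):.0f} GB of {size:.0f} GB")
```

Under these assumptions the weights resident at any moment are a small fraction of the full checkpoint, which is why a single GPU can be enough even when the whole model would not fit. Calibration activations and other overhead are not counted here.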

lightenup

Dec 2, 2025

of course - that makes a lot of sense. Thanks!!
