cost estimates?
Hi - looks very nice!
Roughly how much does it cost to create an NVFP4 checkpoint like this for a ~100B MoE model, assuming one has to use rented GPUs?
Quantizing an existing model to NVFP4? It's pretty cheap: it should take less than an hour on a cloud instance. I used a single RTX Pro 6000 Blackwell to make this one.
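For a sense of scale, here is some back-of-the-envelope arithmetic for the weight footprint of a ~100B-parameter model. It is a rough sketch under stated assumptions: every weight is counted as quantized, and NVFP4 is counted as 4.5 bits per parameter (a 4-bit E2M1 value plus one 8-bit FP8 scale per 16-element block). Checkpoints typically keep a few layers such as lm_head in higher precision, so real files run somewhat larger. The 96 GB figure is the RTX Pro 6000 Blackwell's memory.

```python
# Rough weight-footprint arithmetic for a hypothetical ~100B-parameter model.
# Assumptions: every weight is quantized (real checkpoints usually keep a few
# layers, e.g. lm_head, in higher precision), and GB means 10**9 bytes.

BF16_BITS = 16
# NVFP4: a 4-bit E2M1 value per element plus one 8-bit FP8 (E4M3) scale shared
# by each 16-element block; the single per-tensor FP32 scale is negligible.
NVFP4_BITS = 4 + 8 / 16


def weight_gb(num_params: float, bits_per_param: float) -> float:
    """Approximate storage for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


PARAMS = 100e9    # ~100B parameters (round number for illustration)
GPU_VRAM_GB = 96  # memory on one RTX Pro 6000 Blackwell

bf16_gb = weight_gb(PARAMS, BF16_BITS)
nvfp4_gb = weight_gb(PARAMS, NVFP4_BITS)
print(f"BF16 weights:  {bf16_gb:.2f} GB (fits on one GPU: {bf16_gb <= GPU_VRAM_GB})")
print(f"NVFP4 weights: {nvfp4_gb:.2f} GB")
```

The BF16 weights alone come to about 200 GB, more than one card's 96 GB, so the full-precision model cannot simply be loaded onto that GPU in one piece.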
Hmm, interesting! I always assumed you had to load the original higher/full-precision model weights for calibration.
I use llm-compressor for the quantization, and it works in chunks, so you don't have to load the whole model into VRAM at once. It loads only a subset of layers at a time, calibrating and quantizing them as it progresses through the model. You do have to download the full weights, which takes a while, but once you have them it goes pretty quickly.
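The chunked approach described above can be sketched in plain Python. This is a toy illustration of the idea, not llm-compressor's code: `load_layer`, `sequential_quantize`, and the other helpers are hypothetical, the "layers" are small random matrices, and the quantizer is a simplified NVFP4-style fake quantizer (FP4 E2M1 values with one scale per 16-element block, but without the FP8 encoding of the block scales or the per-tensor global scale).

```python
import random

# Magnitudes an FP4 (E2M1) element can represent; the sign is stored separately.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK = 16  # NVFP4 shares one scale across each 16-element block


def quantize_block(values):
    """Fake-quantize one block: map the largest |x| to 6, snap to the E2M1 grid, rescale.

    Simplification: real NVFP4 stores this scale in FP8 (E4M3) and also applies
    a per-tensor FP32 scale; both are omitted here.
    """
    amax = max(abs(v) for v in values)
    scale = amax / 6.0 if amax > 0 else 1.0
    out = []
    for v in values:
        mag = min(E2M1_GRID, key=lambda g: abs(g - abs(v) / scale))
        out.append((mag if v >= 0 else -mag) * scale)
    return out


def quantize_matrix(w):
    """Block-quantize each row of a weight matrix given as a list of rows."""
    q = []
    for row in w:
        qrow = []
        for i in range(0, len(row), BLOCK):
            qrow.extend(quantize_block(row[i:i + BLOCK]))
        q.append(qrow)
    return q


def matvec(w, x):
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def load_layer(idx, dim, seed=0):
    """Hypothetical stand-in for reading one layer's full-precision weights from disk."""
    rng = random.Random(seed + idx)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(dim)]


def sequential_quantize(num_layers, dim, calib_inputs):
    """Load, calibrate, and quantize one layer at a time."""
    quantized_layers, act_amax = [], []
    hidden = calib_inputs
    for idx in range(num_layers):
        w = load_layer(idx, dim)  # the only full-precision layer held at this point
        # Calibration statistic for this layer's input (stand-in for real activation stats).
        act_amax.append(max(abs(v) for x in hidden for v in x))
        qw = quantize_matrix(w)
        del w  # release the full-precision weights before loading the next layer
        quantized_layers.append(qw)
        # Push the calibration batch through the quantized layer (plus ReLU) so the
        # next layer is calibrated on the activations it will actually receive.
        hidden = [[max(0.0, v) for v in matvec(qw, x)] for x in hidden]
    return quantized_layers, act_amax


rng = random.Random(42)
calib = [[rng.uniform(-1.0, 1.0) for _ in range(32)] for _ in range(4)]
layers, amax = sequential_quantize(num_layers=3, dim=32, calib_inputs=calib)
print(len(layers), [round(a, 3) for a in amax])
```

In this sketch only one layer's full-precision weights exist at any moment, and the calibration activations are the only thing carried from one layer to the next, so peak full-precision memory tracks the largest layer rather than the whole model. The quantized layers are simply kept in a list here for simplicity.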
of course - that makes a lot of sense. Thanks!!