This is Alibaba-Apsara/DASD-4B-Thinking quantized to FP8 W8A8 with LLM Compressor. The model was created and tested by The Valdanito. It is compatible with SGLang v0.5.7 and vLLM v0.14.0, and was tested on an NVIDIA H800.
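Since the card states vLLM compatibility, here is a minimal serving sketch, assuming a CUDA GPU with FP8 support (e.g. H800/H100); the model ID is taken from the model tree and the flags are illustrative, not required:

```shell
# Launch an OpenAI-compatible vLLM server for the FP8 checkpoint.
# vLLM picks up the FP8 W8A8 scheme from the quantization config
# stored in the checkpoint, so no extra quantization flag is needed.
vllm serve valdanito/DASD-4B-Thinking-FP8-DYNAMIC \
  --dtype auto \
  --max-model-len 8192
```

The server then accepts standard OpenAI-style `/v1/chat/completions` requests on port 8000 by default.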

Safetensors model size: 4B params
Tensor types: BF16 · F8_E4M3
Model tree: valdanito/DASD-4B-Thinking-FP8-DYNAMIC, one of 12 quantized variants of the base model.