Quantization of model
#1
by kuliev-vitaly - opened
The model is too large to run on even 2× A100 GPUs. Quantization would help reduce the hardware requirements.
Could you please provide an AWQ (4-bit) or FP8 quantization?