QAT?
by Downtown-Case - opened
Has Arcee considered a highly quantized (2-3 bit?) QAT run, like Baidu did for ERNIE 4.5, to make tight quantization more usable?
https://huggingface.co/baidu/ERNIE-4.5-300B-A47B-2Bits-Paddle
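For context, the core of a QAT run is just fake-quantization in the forward pass with a straight-through estimator in the backward pass. A minimal PyTorch sketch (the symmetric per-row scaling and the 3-bit default are illustrative assumptions, not Baidu's actual recipe):

```python
import torch

class FakeQuant(torch.autograd.Function):
    """Symmetric fake-quantization with a straight-through estimator (STE):
    the forward pass snaps weights to a low-bit grid, while the backward
    pass lets gradients through as if quantization were the identity."""

    @staticmethod
    def forward(ctx, w, bits):
        qmax = 2 ** (bits - 1) - 1                              # e.g. 3 for 3-bit
        scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
        return torch.round(w / scale).clamp(-qmax - 1, qmax) * scale

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None                                   # STE: pass-through

def qat_linear(x, weight, bias=None, bits=3):
    """Linear layer whose weights see quantization noise during training."""
    return torch.nn.functional.linear(x, FakeQuant.apply(weight, bits), bias)
```

The optimizer still updates the full-precision weights; only the forward pass sees the low-bit grid, so the model learns to tolerate it.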
One could even do “targeted” QAT on just the MoE FFN layers and leave the other layers at ~8-bit.
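A sketch of the targeting itself, assuming expert FFNs are identifiable by module name (the `experts` substring is a guess at the checkpoint layout, not Trinity's actual naming):

```python
import torch

def plan_bit_widths(model, expert_pattern="experts", expert_bits=2, default_bits=8):
    """Assign a target bit-width per Linear layer: aggressive low-bit for
    MoE expert FFNs, ~8-bit for attention and everything else."""
    return {
        name: expert_bits if expert_pattern in name else default_bits
        for name, module in model.named_modules()
        if isinstance(module, torch.nn.Linear)
    }
```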
I understand commercial deployments are a priority, but:
- A more cheaply deployable 400B would put Trinity Large in a unique niche.
- At the risk of sounding cynical, you could ride the current TurboQuant hype with a major QAT release.
- It’d be cheap compared to the finetuning cost.
- It’d be local-inference friendly, opening up inference on 128GB RAM machines (rough numbers below).
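Back-of-the-envelope on that last point, assuming ~400B total parameters and ignoring KV cache and runtime overhead:

```python
params = 400e9  # assumed total parameter count for a "400B" model
for bits in (8, 4, 3, 2):
    print(f"{bits}-bit weights: ~{params * bits / 8 / 1e9:.0f} GB")
# 8-bit: ~400 GB   4-bit: ~200 GB   3-bit: ~150 GB   2-bit: ~100 GB
```

Only around 2-3 bits for the bulk of the weights squeezes under 128 GB, which is exactly the targeted-QAT scenario above.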