This is a grouped-query attention (GQA) version of the original model facebook/opt-125m. The original multi-head attention (MHA) architecture is preserved: rather than collapsing each group into a single K/V head, all 12 K/V heads are kept, and the heads that belong to the same group share the same mean-pooled K or V values. Each layer therefore has 6 groups of KV heads instead of the 12 independent KV heads of the original MHA implementation.
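The mean-pooling described above can be sketched roughly as follows. This is an illustrative sketch, not the script used to produce this checkpoint. It assumes that pooling is applied to the K/V projection weights and biases (equivalent to pooling the projected values, since the projection is linear), that adjacent heads form a group (heads 0 and 1, heads 2 and 3, and so on), and that weights use the `(out_features, in_features)` layout. The function name `mean_pool_kv_heads` and the toy dimensions are hypothetical.

```python
import numpy as np


def mean_pool_kv_heads(weight, bias, num_heads, num_groups):
    """Replace each head's K (or V) projection with its group's mean.

    weight: (num_heads * head_dim, hidden), bias: (num_heads * head_dim,).
    The output has the same shapes, so the MHA layout is unchanged.
    Assumption: consecutive heads belong to the same group.
    """
    out_dim, hidden = weight.shape
    head_dim = out_dim // num_heads
    heads_per_group = num_heads // num_groups

    # Split the output dimension into (group, head within group, head_dim).
    w = weight.reshape(num_groups, heads_per_group, head_dim, hidden)
    b = bias.reshape(num_groups, heads_per_group, head_dim)

    # Average over the heads inside each group.
    w_mean = w.mean(axis=1, keepdims=True)
    b_mean = b.mean(axis=1, keepdims=True)

    # Copy the group mean back to every head in that group.
    w_shared = np.broadcast_to(w_mean, w.shape).reshape(out_dim, hidden)
    b_shared = np.broadcast_to(b_mean, b.shape).reshape(out_dim)
    return w_shared.copy(), b_shared.copy()


# Toy sizes for illustration; opt-125m itself uses 12 heads of size 64
# with a hidden size of 768.
num_heads, num_groups, head_dim, hidden = 12, 6, 4, 8
rng = np.random.default_rng(0)
k_w = rng.standard_normal((num_heads * head_dim, hidden))
k_b = rng.standard_normal(num_heads * head_dim)

k_w_shared, k_b_shared = mean_pool_kv_heads(k_w, k_b, num_heads, num_groups)
```

Because the pooled weights keep their original shapes, a checkpoint converted this way would still load into the unmodified MHA model class. The trade-off is that it does not shrink the KV cache the way a true 6-head GQA layout would.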
