Upstream transformers support with `use_bidirectional_attention`

#13
by michaelfeil - opened

Can you make a PR in transformers for use_bidirectional_attention for the llama arch?

@michaelfeil do you mean adding bidirectional support to the Llama architecture in transformers? Is there any issue with using a custom LlamaBidirectionalModel class?

Yeah, similar to Gemma3Embedding.

E.g., refer to the enum implementation in Text-embeddings-inference!
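For context, the flag under discussion toggles the attention mask: a causal Llama lets position i attend only to positions ≤ i, while an embedding model wants every token to attend to every other. A minimal, framework-free sketch of the two mask shapes (function names are illustrative, not part of transformers):

```python
def causal_mask(n: int) -> list[list[int]]:
    # 1 = may attend, 0 = masked out; position i sees only j <= i
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]

def bidirectional_mask(n: int) -> list[list[int]]:
    # every position attends to every other position
    return [[1] * n for _ in range(n)]

print(causal_mask(3))         # lower-triangular mask
print(bidirectional_mask(3))  # all-ones mask
```

A `use_bidirectional_attention` config option would, in effect, swap the first mask for the second inside the model's attention layers.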

michaelfeil changed discussion status to closed
