Upstream transformers support with `use_bidirectional_attention`
#13
by michaelfeil - opened
Can you make a PR in transformers to add `use_bidirectional_attention` for the Llama architecture?
@michaelfeil do you mean adding bidirectional support to Llama's architecture in transformers? Is there any issue with using the custom `LlamaBidirectionalModel` class?
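For context, here is a minimal sketch of what such a custom class could look like. This is not the repo's actual remote code, and it assumes a transformers version where `LlamaModel` still routes mask construction through `_update_causal_mask` (newer releases have moved this into separate masking utilities); the idea is simply to replace the causal mask with a padding-only mask so attention becomes bidirectional:

```python
# Hypothetical sketch: a bidirectional Llama encoder built by dropping the
# causal (lower-triangular) mask and keeping only the padding mask.
# Assumes a transformers version whose LlamaModel calls _update_causal_mask
# (eager/sdpa attention paths).
import torch
from transformers import LlamaModel


class LlamaBidirectionalModel(LlamaModel):
    def _update_causal_mask(self, attention_mask, input_tensor, *args, **kwargs):
        if attention_mask is None:
            # No padding: returning None means no mask at all,
            # i.e. fully bidirectional attention.
            return None
        # Expand the 2D padding mask [batch, seq] to an additive 4D mask
        # broadcastable to [batch, 1, q_len, kv_len], with no causal part:
        # 0.0 where attention is allowed, dtype-min where it is padded out.
        dtype = input_tensor.dtype
        min_value = torch.finfo(dtype).min
        keep = attention_mask[:, None, None, :].to(dtype)  # 1 = attend, 0 = pad
        return (1.0 - keep) * min_value
```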
Yeah, similar to Gemma3Embedding.
E.g., refer to the enum implementation in text-embeddings-inference!
Since it's merged here, https://huggingface.co/nvidia/llama-embed-nemotron-8b/commit/1acaf42b890bafa464ef9a58d1c0db0dd26120d4, I am closing.
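With that commit merged, loading should work through the repo's remote code. A hedged usage sketch follows; note that the mean pooling below is an illustrative assumption, not necessarily the pooling the model card prescribes:

```python
# Assumed usage: the merged commit ships custom code in the repo, so
# trust_remote_code=True should be enough to pick up the bidirectional model.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "nvidia/llama-embed-nemotron-8b"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id, trust_remote_code=True).eval()

batch = tokenizer(["a query", "a passage"], padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = model(**batch).last_hidden_state  # [batch, seq, dim]

# Mean-pool over non-padding tokens (assumed pooling strategy).
mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
embeddings = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```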
michaelfeil changed discussion status to closed