Integrate with Transformers v5 and Sentence Transformers v5.4
Hello!
Pull Request overview
- Integrate eager-embed-v1 with Sentence Transformers v5.4
- Restore training-consistent embeddings under transformers 5.x via a thin custom modeling class, fully backwards compatible with existing user code
Details
While integrating, I found a subtle but important behavioral change in transformers. The model was trained with transformers==4.57.1 using model_outputs.hidden_states[-1] for embedding extraction. In 4.57.1, that field on Qwen3VLForConditionalGeneration was the state of the text decoder BEFORE the model's final RMSNorm. In transformers 5.x the capture logic was changed (the tie_last_hidden_states default in the new capture_outputs decorator), so hidden_states[-1] now equals the post-norm last_hidden_state. Anyone running the README example on a modern transformers install silently gets a different embedding space than the one the model was optimized for (ranking is preserved, but absolute scores differ materially: 0.29 vs 0.23 on the README's query/text-1 example).
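The effect described above can be illustrated with a framework-free sketch. It assumes a toy RMSNorm with a made-up, non-uniform learned gain and made-up 4-dimensional hidden states; none of the numbers come from the actual model. Cosine similarity ignores the uniform rescaling part of RMSNorm, so it is the per-dimension gain that moves the scores.

```python
import math

def rms_norm(x, weight, eps=1e-6):
    """RMSNorm: scale x to unit root-mean-square, then apply a per-dimension gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy pre-norm hidden states for a query and a document (made-up values).
query_pre = [1.0, 2.0, 0.5, -1.0]
doc_pre = [0.5, 1.5, 1.0, 0.5]
# Made-up, non-uniform gain standing in for a trained RMSNorm weight.
gain = [0.1, 0.1, 3.0, 3.0]

pre_score = cosine(query_pre, doc_pre)
post_score = cosine(rms_norm(query_pre, gain), rms_norm(doc_pre, gain))
print(round(pre_score, 4), round(post_score, 4))
```

With a gain of all ones the two scores would coincide, which is why the difference only appears once the norm carries trained weights.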
To fix this, I added a small modeling_eager_embed.py with two thin subclasses (EagerEmbedModel and EagerEmbedForConditionalGeneration) that replace the text model's final RMSNorm with nn.Identity(). Because the classes are registered via auto_map for AutoModel and AutoModelForImageTextToText, both the Sentence Transformers path and the AutoModelForImageTextToText path produce the exact pre-norm representation the model was trained on, under any transformers version. The swap requires trust_remote_code=True. The architectures field in config.json is unchanged (still Qwen3VLForConditionalGeneration), so existing code using Qwen3VLForConditionalGeneration.from_pretrained(...) or AutoModel.from_pretrained(...) without trust_remote_code keeps its original behaviour.
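As a rough, framework-free sketch of the subclassing idea (not the contents of modeling_eager_embed.py), the toy classes below swap a final norm for a no-op in a subclass constructor. ToyTextModel, ToyEmbedModel, and the plain-Python RMSNorm and Identity are hypothetical stand-ins; the real classes operate on PyTorch modules.

```python
import math

class RMSNorm:
    """Toy RMSNorm with a per-dimension gain (stand-in for the model's final norm)."""
    def __init__(self, weight, eps=1e-6):
        self.weight = weight
        self.eps = eps

    def __call__(self, x):
        rms = math.sqrt(sum(v * v for v in x) / len(x) + self.eps)
        return [w * v / rms for v, w in zip(x, self.weight)]

class Identity:
    """No-op replacement, playing the role of nn.Identity()."""
    def __call__(self, x):
        return x

class ToyTextModel:
    """Hypothetical decoder: a placeholder layer stack followed by a final norm."""
    def __init__(self, norm_weight):
        self.norm = RMSNorm(norm_weight)

    def layers(self, x):
        # Placeholder for the decoder stack; simply doubles each value.
        return [2.0 * v for v in x]

    def forward(self, x):
        return self.norm(self.layers(x))

class ToyEmbedModel(ToyTextModel):
    """Same layers as the parent, but the final norm is swapped for a no-op."""
    def __init__(self, norm_weight):
        super().__init__(norm_weight)
        self.norm = Identity()

base = ToyTextModel([0.5, 2.0, 1.0])
embed = ToyEmbedModel([0.5, 2.0, 1.0])
x = [1.0, -2.0, 3.0]
# The subclass returns the pre-norm output of the layer stack unchanged.
print(embed.forward(x) == base.layers(x))
```

Because only the norm attribute changes, the parent class and its callers are untouched, which mirrors the opt-in nature of the change described above.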
I also extended the chat template with a new optional add_embedding_token flag that, when set, appends <|endoftext|> after the generation prompt. The existing add_generation_prompt conditional is unchanged, so code that does apply_chat_template(... add_generation_prompt=True) + "<|endoftext|>" continues to produce the exact same string. Sentence Transformers opts into the new flag via processing_kwargs.chat_template.add_embedding_token=true in sentence_bert_config.json.
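The equivalence claimed above can be sketched with a simplified ChatML-style renderer. This is an illustration only: the render function is hypothetical, it ignores images, tools, and everything else the real chat_template.jinja handles, and appending the token even when add_generation_prompt is off is an assumption of the sketch.

```python
def render(messages, add_generation_prompt=False, add_embedding_token=False):
    """Simplified ChatML-style rendering; not the real chat_template.jinja."""
    text = ""
    for message in messages:
        text += f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
    if add_generation_prompt:
        text += "<|im_start|>assistant\n"
    if add_embedding_token:
        # New optional flag: append the token used for embedding extraction.
        text += "<|endoftext|>"
    return text

messages = [{"role": "user", "content": "Query: What is the capital city of Uruguay?"}]

# Existing user code: build the prompt, then append the token by hand.
legacy = render(messages, add_generation_prompt=True) + "<|endoftext|>"
# New opt-in path: let the template append the token.
opt_in = render(messages, add_generation_prompt=True, add_embedding_token=True)
print(legacy == opt_in)
```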
Finally, I added {"query": "Query: ", "document": ""} to the prompts dict in config_sentence_transformers.json so Sentence Transformers users can call model.encode_query(...) / model.encode_document(...) without manually prefixing "Query: " to their queries. For message-format models, ST injects prompts as a system-role message, so I also adjusted the chat template: when a system message is present (and no tools are declared), its text is inlined into the user message instead of being rendered as a separate <|im_start|>system\n...<|im_end|>\n block. This matches the flat-text tokenization the model was trained on. The plain-user-only path (no system message) is bit-for-bit identical to the original template.
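A message-level sketch of the inlining behaviour might look like the following. The helper inline_system_prompt is hypothetical and assumes string contents and plain concatenation of the system text in front of the first user message; the real template does this inside Jinja and also handles multimodal content.

```python
def inline_system_prompt(messages):
    """Fold a leading system message into the first user message.

    Assumption: contents are plain strings and are joined by simple concatenation.
    """
    if not messages or messages[0]["role"] != "system":
        return list(messages)  # no system message: leave the conversation untouched
    system, rest = messages[0], messages[1:]
    merged = []
    inlined = False
    for message in rest:
        if not inlined and message["role"] == "user":
            message = {**message, "content": system["content"] + message["content"]}
            inlined = True
        merged.append(message)
    # If there was no user message to inline into, keep the original messages.
    return merged if inlined else list(messages)

with_prompt = [
    {"role": "system", "content": "Query: "},
    {"role": "user", "content": "What is the capital city of Uruguay?"},
]
plain = [{"role": "user", "content": "Query: What is the capital city of Uruguay?"}]

print(inline_system_prompt(with_prompt))
print(inline_system_prompt(plain) == plain)
```

Under these assumptions, a prompt injected as a system message and a manually prefixed query end up as the same single user message.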
Added files:
- modeling_eager_embed.py: custom EagerEmbedModel / EagerEmbedForConditionalGeneration classes that make the text model's final RMSNorm a no-op to restore pre-norm embeddings
- modules.json: ST module pipeline (Transformer, Pooling, Normalize)
- config_sentence_transformers.json: ST model config with cosine similarity and {"query": "Query: ", "document": ""} prompts so model.encode_query(...) prepends the training-time prefix automatically
- sentence_bert_config.json: transformer task, modality config for text/image/video/message, processing kwargs with add_generation_prompt and add_embedding_token for the chat template, unpad_inputs: false
- 1_Pooling/config.json: lasttoken pooling at dimension 2560
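For readers unfamiliar with the Pooling and Normalize steps listed above, here is a minimal sketch of last-token pooling followed by L2 normalization on a single sequence. It uses plain Python lists and toy 2-dimensional vectors instead of the model's 2560 dimensions, and it is not the Sentence Transformers implementation.

```python
import math

def last_token_pool(token_embeddings, attention_mask):
    """Return the embedding of the last attended (non-padding) token."""
    last_index = max(i for i, keep in enumerate(attention_mask) if keep)
    return token_embeddings[last_index]

def l2_normalize(vector):
    """Scale a vector to unit length so dot products equal cosine similarities."""
    norm = math.sqrt(sum(v * v for v in vector))
    return [v / norm for v in vector]

# Three token positions with toy hidden states; the last position is padding.
token_embeddings = [[1.0, 0.0], [3.0, 4.0], [9.0, 9.0]]
attention_mask = [1, 1, 0]

embedding = l2_normalize(last_token_pool(token_embeddings, attention_mask))
print(embedding)
```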
Modified files:
- config.json: added auto_map pointing AutoModel and AutoModelForImageTextToText at the new classes; architectures unchanged for backwards compatibility
- chat_template.jinja: added an optional add_embedding_token conditional that appends <|endoftext|>; the original add_generation_prompt conditional is unchanged; in the non-tools path, a leading system message is inlined into the user message instead of being wrapped in a <|im_start|>system\n...<|im_end|>\n block (no-op when no system message is present, so plain-user inputs are bit-for-bit identical to before)
- README.md: added sentence-transformers tag; added a "Using Sentence Transformers" section with text and image retrieval examples; updated the "Using transformers" example to use AutoModelForImageTextToText with trust_remote_code=True so it loads the no-op-norm subclass and therefore also produces training-consistent embeddings under transformers 5.x; added expected-output comments to every code snippet
import requests
from io import BytesIO
from PIL import Image
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("eagerworks/eager-embed-v1", revision="refs/pr/2", trust_remote_code=True)
# Multilingual text retrieval
# `encode_query` automatically prepends the "Query: " prefix the model was trained on.
queries = ["What is the capital city of Uruguay?"]
documents = [
    "Montevideo es la capital y la ciudad más poblada de la República Oriental del Uruguay, así como la capital del departamento homónimo",
    "El río Uruguay es un río internacional que forma parte de la cuenca del Plata. Nace en Brasil, recorre unos 1.800 km y desemboca en el Río de la Plata",
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 2560) (2, 2560)
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.2907, 0.1573]])
# Image document retrieval
MAX_IMAGE_SIZE = 784

def fetch_image(url):
    img = Image.open(BytesIO(requests.get(url).content)).convert("RGB")
    return img.resize((MAX_IMAGE_SIZE, MAX_IMAGE_SIZE))

queries = ["Where can we find the animal llama?"]
documents = [
    fetch_image("https://huggingface.co/Tevatron/dse-phi3-docmatix-v2/resolve/main/animal-llama.png"),
    fetch_image("https://huggingface.co/Tevatron/dse-phi3-docmatix-v2/resolve/main/meta-llama.png"),
]
query_embeddings = model.encode_query(queries)
document_embeddings = model.encode_document(documents)
print(query_embeddings.shape, document_embeddings.shape)
# (1, 2560) (2, 2560)
similarities = model.similarity(query_embeddings, document_embeddings)
print(similarities)
# tensor([[0.2709, 0.0930]])
Note that none of the old behaviour is affected or changed. Existing code paths (direct Qwen3VLForConditionalGeneration imports, AutoModel without trust_remote_code, apply_chat_template(... add_generation_prompt=True) + "<|endoftext|>") all produce bit-for-bit identical output to before. The custom modeling class and the new template flag are purely additive opt-ins that restore training-consistent behaviour for Sentence Transformers users and for users on transformers 5.x.
- Tom Aarsen