Explicit Title Format Performs Worse Than Simple Format (Contradicts Docs)
The documentation states:

> "providing a title, if available, will improve model performance"

But testing shows the opposite: the simple format performs up to 18.7% better.
```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")

title = "Marketing Manager"
content = "Lead marketing team and drive brand growth."
query = "Marketing Manager"

# Compare formats
doc_official = f"title: {title} | text: {content}"  # Per docs
doc_simple = f"{title} | {content}"                 # Simple format
query_prompt = f"task: search result | query: {query}"

q_emb = model.encode(query_prompt)
sim_official = util.cos_sim(q_emb, model.encode(doc_official)).item()
sim_simple = util.cos_sim(q_emb, model.encode(doc_simple)).item()

print(f"Official format: {sim_official:.4f}")
print(f"Simple format: {sim_simple:.4f}")
```

Output:

```
Official format: 0.5867
Simple format: 0.6961 (+18.7% better!)
```
Additional Findings
- Large-scale eval (500 queries, 80k docs): the official format scores 9.3% worse on NDCG@100 (a minimal repro sketch follows below).
- Title-only documents: the official format loses 17.5% performance when `text:` is empty.
- Longer titles suffer more: 60-character titles show larger performance gaps.
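For anyone who wants to reproduce this kind of comparison, here is a minimal sketch. The corpus, queries, and relevance labels are toy placeholders (not the 500-query / 80k-doc setup above), and `ndcg_at_k` is a hand-rolled helper rather than a standard evaluation harness:

```python
import numpy as np
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("google/embeddinggemma-300m")

# Toy placeholders: swap in a real corpus, real queries, and real relevance labels.
corpus = [
    {"title": "Marketing Manager", "text": "Lead marketing team and drive brand growth."},
    {"title": "Data Engineer", "text": "Build and maintain batch and streaming data pipelines."},
    {"title": "Sales Associate", "text": "Support retail customers and close in-store sales."},
]
queries = ["Marketing Manager", "data pipelines"]
relevant = {0: {0}, 1: {1}}  # query index -> set of relevant document indices

def ndcg_at_k(ranked_ids, relevant_ids, k=100):
    gains = [1.0 if doc_id in relevant_ids else 0.0 for doc_id in ranked_ids[:k]]
    dcg = sum(g / np.log2(i + 2) for i, g in enumerate(gains))
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant_ids), k)))
    return dcg / ideal if ideal > 0 else 0.0

def evaluate(doc_format):
    # Embed all documents with the given formatting function, then score each query.
    doc_embs = model.encode([doc_format(d["title"], d["text"]) for d in corpus])
    scores = []
    for qi, query in enumerate(queries):
        q_emb = model.encode(f"task: search result | query: {query}")
        sims = util.cos_sim(q_emb, doc_embs)[0]
        ranked = np.argsort(-sims.numpy()).tolist()
        scores.append(ndcg_at_k(ranked, relevant[qi]))
    return float(np.mean(scores))

official = evaluate(lambda t, x: f"title: {t} | text: {x}")
simple = evaluate(lambda t, x: f"{t} | {x}")
print(f"Mean NDCG@100, official format: {official:.4f}")
print(f"Mean NDCG@100, simple format:   {simple:.4f}")
```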
Question
Should I use simple concatenation (`{title} | {content}`) instead of the documented format (`title: {title} | text: {content}`) for retrieval tasks?
The simple format consistently outperforms. Is this expected behavior, or should the documentation be updated?
Hi @Farhadtehrani, apologies for the delayed response.
Yes, you can use the simple concatenation format if it consistently improves NDCG on your dataset. Embedding models are highly sensitive to input distribution, and empirical retrieval metrics should always guide your format choice.
This is happening because the `title: ... | text: ...` template encodes structural priors learned during training. If your data deviates from that distribution, those priors can degrade representation quality. Removing the explicit field prefixes eliminates this structural bias and can improve robustness under schema mismatch.
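If you want to keep the documented field prefixes for well-formed records but avoid feeding the model empty fields (the title-only case you measured), a small guard along these lines is one option; `format_document` here is just an illustrative helper, not part of the library:

```python
def format_document(title: str, text: str, use_field_prefixes: bool = True) -> str:
    """Build the document string, dropping any field that is empty."""
    title, text = title.strip(), text.strip()
    if use_field_prefixes:
        parts = [f"title: {title}"] if title else []
        if text:
            parts.append(f"text: {text}")
    else:
        parts = [p for p in (title, text) if p]
    return " | ".join(parts)

# format_document("Marketing Manager", "Lead marketing team and drive brand growth.")
#   -> "title: Marketing Manager | text: Lead marketing team and drive brand growth."
# format_document("Marketing Manager", "")
#   -> "title: Marketing Manager"
# format_document("Marketing Manager", "Lead marketing team...", use_field_prefixes=False)
#   -> "Marketing Manager | Lead marketing team..."
```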
This is expected: it is a classic distribution-shift effect in instruction-tuned embedding models. The recommended template is optimised for typical document shapes (short titles, substantive body text) and is not guaranteed to dominate across all corpora.
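As a side note, you do not have to guess which templates a checkpoint was tuned for: Sentence Transformers exposes any prompts bundled in the model config via the `prompts` attribute, so you can print them and compare against whatever you construct by hand (the exact keys and strings depend on the checkpoint):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("google/embeddinggemma-300m")

# Prompt templates bundled with the checkpoint's Sentence Transformers config;
# keys and strings depend on the model, so inspect rather than assume.
for name, template in model.prompts.items():
    print(f"{name!r}: {template!r}")
```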
The current recommendation is a general baseline, but your findings highlight that it could benefit from clarification. Optimal formatting is highly dataset-dependent and should always be validated against domain-specific retrieval metrics. I will pass this feedback along to the team to see if we can add a clarification to the model card.
Thank you!