rdmlx Claude Opus 4.6 committed on
Commit 5763d3f · 1 Parent(s): d9c155c

Add BGE instruction prefix for query embeddings

BGE models are designed to use instruction prefixes for queries
(not documents). Adding the prefix improves retrieval quality by
better aligning query embeddings with pre-computed document
embeddings. Addresses dssjon/biblos#35.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Files changed (1)
  1. app.py +2 -0
app.py CHANGED
@@ -141,6 +141,8 @@ async def load_model_and_data():
 
     def generate_embedding(text: str) -> np.ndarray:
         """Generate embedding for input text using loaded model"""
+        # BGE instruction prefix for queries (improves retrieval quality)
+        text = "Represent this sentence for searching relevant passages: " + text
         # Tokenize
         inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
 
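
The commit message notes that the prefix applies to queries, not documents. A minimal sketch of how that distinction could be made explicit is shown below. The names `BGE_QUERY_INSTRUCTION`, `prepare_text`, and `is_query` are hypothetical and do not appear in app.py, where `generate_embedding` prepends the prefix unconditionally and is presumably called only for queries.

```python
# Hypothetical helper, not part of app.py: shows the query/document split
# described in the commit message.

# Prefix string taken verbatim from the diff above.
BGE_QUERY_INSTRUCTION = "Represent this sentence for searching relevant passages: "


def prepare_text(text: str, is_query: bool = True) -> str:
    """Return the string that would be passed to the tokenizer.

    Queries get the BGE instruction prefix; documents are left unchanged so
    they stay consistent with embeddings computed without the prefix.
    """
    if is_query:
        return BGE_QUERY_INSTRUCTION + text
    return text


if __name__ == "__main__":
    print(prepare_text("love your neighbor"))
    print(prepare_text("A passage to be indexed.", is_query=False))
```

Because the prefix is part of the tokenized string, it counts toward the `max_length=512` truncation limit, so very long queries lose a few tokens of their own text.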