rdmlx and Claude Opus 4.6 committed
Commit 5763d3f · 1 Parent(s): d9c155c
Add BGE instruction prefix for query embeddings
BGE models are designed to use instruction prefixes for queries
(not documents). Adding the prefix improves retrieval quality by
better aligning query embeddings with pre-computed document
embeddings. Addresses dssjon/biblos#35.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
app.py CHANGED
@@ -141,6 +141,8 @@ async def load_model_and_data():
 
 def generate_embedding(text: str) -> np.ndarray:
     """Generate embedding for input text using loaded model"""
+    # BGE instruction prefix for queries (improves retrieval quality)
+    text = "Represent this sentence for searching relevant passages: " + text
     # Tokenize
     inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True, max_length=512)
 
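The change prepends the prefix unconditionally inside generate_embedding, which matches the commit message only as long as that function is called for queries and never for documents. A minimal sketch of one way to make that distinction explicit follows. It assumes nothing about the rest of app.py: `prepare_text` and its `is_query` flag are hypothetical names introduced here for illustration, and only the prefix string is taken from the commit.

```python
# Prefix string copied from the commit; everything else here is illustrative.
BGE_QUERY_PREFIX = "Represent this sentence for searching relevant passages: "


def prepare_text(text: str, is_query: bool = True) -> str:
    """Return the string that would be handed to the tokenizer.

    Hypothetical helper (not in app.py): queries get the BGE instruction
    prefix, documents are passed through unchanged.
    """
    # Skip the prefix for documents, and avoid adding it twice to a query
    # that already carries it.
    if is_query and not text.startswith(BGE_QUERY_PREFIX):
        return BGE_QUERY_PREFIX + text
    return text


if __name__ == "__main__":
    print(prepare_text("love your neighbor"))
    print(prepare_text("love your neighbor", is_query=False))
```

Under this assumption, generate_embedding would tokenize `prepare_text(text)` for search queries, while any code path that re-embeds documents would pass `is_query=False` so the results stay comparable to the pre-computed document embeddings.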