add known issues
README.md CHANGED

@@ -93,6 +93,18 @@ print(f"Similarity: {similarity.item():.4f}")
 - Cross-lingual retrieval
 - Text classification with embeddings
 
+## Known Issues
+
+When encoding documents without any instruction prefix, you may encounter unexpected behavior due to an [upstream issue in Qwen3-Embedding](https://huggingface.co/Qwen/Qwen3-Embedding-8B/discussions/21). To avoid this issue, we recommend adding `"- "` (a dash followed by a space) at the beginning of your text when encoding documents:
+
+```python
+# Recommended: add a "- " prefix for document encoding
+documents = ["- " + doc for doc in documents]
+embeddings = model.encode(documents)
+```
+
+This workaround ensures consistent and expected embedding behavior.
+
 ## Limitations
 
 - Performance may vary across different domains and languages
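The prefixing step added in this diff can be wrapped in a small helper so the workaround is applied exactly once per document. This is a minimal sketch of that idea; the `add_doc_prefix` name is ours, not part of the README, and it only prepares the strings for `model.encode`:

```python
def add_doc_prefix(docs):
    """Prepend the '- ' workaround prefix, skipping docs that already carry it.

    Idempotent: calling it twice does not stack prefixes, so it is safe to
    apply defensively before every encode() call.
    """
    return [doc if doc.startswith("- ") else "- " + doc for doc in docs]


documents = ["first document", "- already prefixed document"]
prepared = add_doc_prefix(documents)
print(prepared)  # ['- first document', '- already prefixed document']
```

The prepared list would then be passed to `model.encode(prepared)` as in the README snippet above.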