Create README.md
README.md
CHANGED
@@ -1,14 +1,50 @@
# SBERT + FAISS Semantic Search

This Hugging Face Space hosts a **semantic search system** built with:

- [Sentence-BERT (SBERT)](https://www.sbert.net/) for embeddings
- [FAISS](https://faiss.ai/) for fast vector search
- [MS MARCO v1.1 dataset](https://microsoft.github.io/msmarco/) (10,000-passage subset)
- [Gradio](https://gradio.app/) for the interactive interface

---

## 🔹 Features
- Enter a **query** to retrieve the **Top-10 most similar passages**.
- Optionally provide **ground truth relevant passages** (one per line) to compute **IR metrics**:
  - Precision@10
  - Recall@10
  - F1-score
  - Mean Reciprocal Rank (MRR)
  - Normalized Discounted Cumulative Gain (nDCG@10)
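
The metrics listed above can be sketched for a single query as follows. This is a minimal illustration, not the Space's actual code: the function name `evaluate` is hypothetical, and it assumes binary relevance, exact string matching between retrieved and ground truth passages, and no duplicate passages in the ranking. For one query, MRR reduces to the reciprocal rank of the first relevant result.

```python
import math


def evaluate(retrieved, relevant, k=10):
    """Binary-relevance IR metrics for one query (hypothetical helper).

    retrieved: ranked list of passage strings, best first
    relevant:  iterable of ground truth passage strings
    """
    relevant = set(relevant)
    # 1 where a top-k result is in the ground truth, else 0 (exact match assumed)
    hits = [1 if passage in relevant else 0 for passage in retrieved[:k]]
    num_hits = sum(hits)

    precision = num_hits / k
    recall = num_hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)

    # Reciprocal rank of the first relevant result (0 if none was retrieved)
    reciprocal_rank = 0.0
    for rank, hit in enumerate(hits, start=1):
        if hit:
            reciprocal_rank = 1.0 / rank
            break

    # DCG with binary gains, normalized by the best achievable DCG at k
    dcg = sum(hit / math.log2(rank + 1) for rank, hit in enumerate(hits, start=1))
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    ndcg = dcg / idcg if idcg else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mrr": reciprocal_rank,
        "ndcg": ndcg,
    }
```

Dividing precision by `k` rather than by the number of results actually returned is a design choice; it penalizes rankings that return fewer than `k` passages.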

---

## 🔹 How to Use
1. Type a query into the input box.
2. (Optional) Paste one or more relevant passages into the second box, each on a new line.
3. Press **Submit**.
4. View:
   - **Top-10 retrieved passages** with FAISS similarity scores
   - **Evaluation metrics** if ground truth passages were provided
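
The steps above could be wired up roughly as shown below. This is a sketch under assumptions, not the Space's source: `parse_ground_truth`, `run_query`, `build_demo`, and the `search_fn` callback (taking a query and `k`, returning `(passage, score)` pairs) are hypothetical names, and the evaluation output is reduced to a simple match count to keep the example short.

```python
def parse_ground_truth(text):
    """Split the second input box into passages, one per non-empty line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def run_query(query, ground_truth_text, search_fn, k=10):
    """Handle one Submit click: search, format results, summarize matches."""
    results = search_fn(query, k)  # assumed to return [(passage, score), ...]
    listing = "\n".join(
        f"{rank}. [{score:.4f}] {passage}"
        for rank, (passage, score) in enumerate(results, start=1)
    )

    relevant = set(parse_ground_truth(ground_truth_text or ""))
    if not relevant:
        note = "No ground truth provided; metrics skipped."
    else:
        hits = sum(1 for passage, _ in results if passage in relevant)
        note = f"{hits} of the top {len(results)} results match the ground truth."
    return listing, note


def build_demo(search_fn):
    """Build a two-box Gradio interface around run_query."""
    import gradio as gr  # third-party; imported lazily so the helpers stay stdlib-only

    return gr.Interface(
        fn=lambda query, ground_truth: run_query(query, ground_truth, search_fn),
        inputs=[
            gr.Textbox(label="Query"),
            gr.Textbox(label="Relevant passages (optional, one per line)", lines=5),
        ],
        outputs=[
            gr.Textbox(label="Top-10 passages"),
            gr.Textbox(label="Evaluation"),
        ],
    )
```

Calling `build_demo(my_search).launch()` would start the interface, where `my_search` is whatever retrieval function backs the Space.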

---

## 🔹 Tech Stack
- **Embeddings:** `sentence-transformers/all-mpnet-base-v2`
- **Indexing:** FAISS (L2 distance)
- **Dataset:** MS MARCO v1.1 (first 10,000 passages)
- **Interface:** Gradio
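
One way the stack above might fit together is sketched below. The README only states that FAISS is used with L2, so the exact `faiss.IndexFlatL2` index is an assumption, as are the helper names `build_index`, `search`, and `brute_force_top_k`. The pure-Python reference shows what an exact L2 index computes: FAISS flat L2 search returns squared Euclidean distances, so a lower score means a closer match.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_top_k(query_vec, vectors, k=10):
    """Reference for exact L2 search: (index, squared distance), nearest first."""
    scored = sorted((squared_l2(query_vec, v), i) for i, v in enumerate(vectors))
    return [(i, d) for d, i in scored[:k]]


def build_index(passages, model_name="sentence-transformers/all-mpnet-base-v2"):
    """Embed the passages and store them in an exact L2 FAISS index (assumed type)."""
    import faiss  # third-party; imported lazily
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_name)
    embeddings = model.encode(passages, convert_to_numpy=True).astype("float32")
    index = faiss.IndexFlatL2(embeddings.shape[1])  # dimension taken from the model output
    index.add(embeddings)
    return model, index


def search(model, index, passages, query, k=10):
    """Return up to k (passage, squared L2 distance) pairs, nearest first."""
    query_vec = model.encode([query], convert_to_numpy=True).astype("float32")
    distances, ids = index.search(query_vec, k)
    # FAISS pads with id -1 when the index holds fewer than k vectors
    return [(passages[i], float(d)) for d, i in zip(distances[0], ids[0]) if i != -1]
```

With 10,000 passages, an exact flat index is typically fast enough that no approximate index is needed.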

---

## 🔹 Citation
If you use this system in research, please cite:

- [Sentence-BERT](https://arxiv.org/abs/1908.10084)
- [MS MARCO](https://microsoft.github.io/msmarco/)

---

## 🔹 Author
Built for a research project on **user-centered evaluation of semantic search systems**.