Add usage instructions to retrieve_kaggle.py
benchmarks/retrieval/retrieve_kaggle.py
CHANGED
@@ -1,4 +1,19 @@
-"""Script to call retrieval on the Kaggle dataset.
+"""Script to call retrieval on the Kaggle dataset.
+
+Steps:
+1. Make sure that your repository is already indexed. You can find instructions in the README for how to run the `sage-index` command.
+2. Download the test file from the Kaggle competition (https://www.kaggle.com/competitions/code-retrieval-for-hugging-face-transformers/data). You will pass the path to this file via the --benchmark flag below.
+3. Run this script:
+```
+# After you have cloned the repository:
+cd sage
+pip install -e .
+
+# Run the actual retrieval script. Your flags may vary, but this is one example:
+python benchmarks/retrieval/retrieve_kaggle.py --benchmark=/path/to/kaggle/test/file.csv --mode=remote --pinecone-index-name=your-index --index-namespace=your-namespace
+```
+To see a full list of flags, check out config.py (https://github.com/Storia-AI/sage/blob/main/sage/config.py).
+"""
 
 import csv
 import json
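The invocation in step 3 of the docstring can also be assembled from Python, which may be handy when running the benchmark against several indexes. The sketch below is a minimal illustration and not part of this commit: `build_retrieve_command` is a hypothetical helper, it uses only the four flags that appear in the docstring, and it assumes the working directory is the root of the sage checkout.

```python
import shlex
import sys


def build_retrieve_command(benchmark, index_name, namespace, mode="remote"):
    """Return the argv list for retrieve_kaggle.py.

    Hypothetical helper: only the flags shown in the docstring are used.
    See sage/config.py for the full list of supported flags.
    """
    return [
        sys.executable,  # run with the same interpreter that has sage installed
        "benchmarks/retrieval/retrieve_kaggle.py",
        f"--benchmark={benchmark}",
        f"--mode={mode}",
        f"--pinecone-index-name={index_name}",
        f"--index-namespace={namespace}",
    ]


if __name__ == "__main__":
    cmd = build_retrieve_command(
        "/path/to/kaggle/test/file.csv", "your-index", "your-namespace"
    )
    # Print the shell-quoted command so it can be inspected before running.
    print(shlex.join(cmd))
    # To execute it from the repository root, one option is:
    #   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems if the benchmark path contains spaces.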