Spaces: Asish22/code-crawler

juliaturc committed on Oct 3, 2024

Commit 7cec107 (1 parent: 292215b)

Default --retrieval-alpha to 1.0

We've shown on our benchmark that BM25 actively hurts code retrieval. It also comes with some overhead (needing nltk models, etc.), so it makes sense to default to dense retrieval only.

Files changed (1)

sage/config.py +1 -1

@@ -137,7 +137,7 @@ def add_vector_store_args(parser: ArgumentParser) -> Callable:
     )
     parser.add(
         "--retrieval-alpha",
-        default=0.5,
+        default=1.0,
         type=float,
         help="Takes effect for Pinecone retriever only. The weight of the dense (embeddings-based) vs sparse (BM25) "
         "encoder in the final retrieval score. A value of 0.0 means BM25 only, 1.0 means embeddings only.",