# DARE: Distribution-Aware Retrieval for R Functions
DARE (Distribution-Aware Retrieval Embedding) is a specialized bi-encoder model designed to retrieve statistical and data analysis tools (R functions) based on **both the user's query and the profile of the data**.
It is fine-tuned from `sentence-transformers/all-MiniLM-L6-v2` to serve as a high-precision tool retrieval module for Large Language Model (LLM) Agents in automated data science workflows.
- **Architecture:** Bi-encoder (Sentence Transformer)
- **Base Model:** `sentence-transformers/all-MiniLM-L6-v2` (22.7M parameters)
- **Task:** Dense Retrieval for Tool-Augmented LLMs
- **Performance:** State-of-the-art on R package retrieval tasks
- **Domain:** R programming language, Data Science, Statistical Analysis functions
- **Max Sequence Length:** 256 tokens
## 💡 Why DARE? (The Input Formatting)
Unlike traditional semantic search models that only take a natural language query, DARE is trained to be **distribution-conditional**. It expects a concatenated input of the user's intent AND the data profile (e.g., high-dimensional, sparse, categorical).
To get optimal retrieval results, **do not just pass the raw query**. Append the data constraints as a JSON-like string at the end of the query.
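As a concrete illustration, the concatenation can be done with a few lines of Python. The helper name and the profile keys below are hypothetical examples, not part of the model's API:

```python
import json

def build_dare_query(intent: str, profile: dict) -> str:
    """Append a JSON-style data profile to the user's raw intent."""
    return f"{intent} {json.dumps(profile)}"

# Hypothetical profile keys, for illustration only.
q = build_dare_query(
    "Fit a regression model robust to outliers",
    {"n_rows": 120, "n_cols": 4, "outliers": True, "target": "numeric"},
)
print(q)
```

Keeping the profile as a compact JSON tail lets the encoder condition on the data's shape and type without requiring a second input field.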
### Usage (Sentence-Transformers)