Stephen-SMJ committed on
Commit
1d99f91
·
verified ·
1 Parent(s): 4a8f15d

Update README.md

Files changed (1)
  1. README.md +4 -5
README.md CHANGED
@@ -14,7 +14,7 @@ base_model: sentence-transformers/all-MiniLM-L6-v2
 
 # DARE: Distribution-Aware Retrieval for R Functions
 
-DARE (Distribution-Aware Retrieval Embedding) is a specialized bi-encoder model designed to retrieve statistical and data analysis tools (R functions) based on **both natural language user queries and underlying data constraints** (data profiles).
+DARE (Distribution-Aware Retrieval Embedding) is a specialized bi-encoder model designed to retrieve statistical and data analysis tools (R functions) based on **both user queries and conditional on data profile**.
 
 It is fine-tuned from `sentence-transformers/all-MiniLM-L6-v2` to serve as a high-precision tool retrieval module for Large Language Model (LLM) Agents in automated data science workflows.
 
@@ -22,13 +22,12 @@ It is fine-tuned from `sentence-transformers/all-MiniLM-L6-v2` to serve as a hig
 - **Architecture:** Bi-encoder (Sentence Transformer)
 - **Base Model:** `sentence-transformers/all-MiniLM-L6-v2` (22.7M parameters)
 - **Task:** Dense Retrieval for Tool-Augmented LLMs
+- **Performance**: SoTA on R package retrieval tasks.
 - **Domain:** R programming language, Data Science, Statistical Analysis functions
 - **Max Sequence Length:** 256 tokens
 
-## 💡 Why DARE? (The Input Formatting)
-Unlike traditional semantic search models that only take a natural language query, DARE is trained to be **distribution-conditional**. It expects a concatenated input of the user's intent AND the data profile (e.g., high-dimensional, sparse, categorical).
-
-To get optimal retrieval results, **do not just pass the raw query**. Append the data constraints as a JSON-like string at the end of the query.
+<!-- ## 💡 Why DARE? (The Input Formatting)
+Unlike traditional semantic search models that only take a natural language query, DARE is trained to be **distribution-conditional**. It expects a concatenated input of the user's intent AND the data profile (e.g., high-dimensional, sparse, categorical). -->
 
 ### Usage (Sentence-Transformers)
 
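For context, the input-formatting guidance removed (commented out) by this commit recommends appending the data constraints as a JSON-like string to the query. A minimal sketch of that formatting in Python, assuming a hypothetical profile schema — the `build_dare_input` helper and the profile keys (`n_rows`, `n_cols`, `sparsity`, `types`) are illustrative, not a documented API of the model card:

```python
import json

def build_dare_input(query: str, data_profile: dict) -> str:
    """Concatenate the natural-language query with a JSON-like data profile,
    as the (now commented-out) README section recommends.
    The profile keys below are illustrative assumptions, not a fixed schema."""
    return f"{query} {json.dumps(data_profile)}"

text = build_dare_input(
    "fit a regression model robust to outliers",
    {"n_rows": 120, "n_cols": 5000, "sparsity": "high", "types": "numeric"},
)
print(text)

# The resulting string would then be embedded with the model, e.g.:
#   from sentence_transformers import SentenceTransformer
#   model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")  # or the DARE checkpoint
#   embedding = model.encode(text)
```

Keep the combined string within the model's 256-token maximum sequence length, or the tail of the data profile will be truncated.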