[dejan.ai/blog/grounding-classifier](https://dejan.ai/blog/grounding-classifier/)

# Prompt Grounding Classifier — DeBERTa v3 Large (Fine-Tuned)

This model predicts whether a prompt **requires grounding** in external sources like web search, databases, or RAG pipelines.

It was fine-tuned from [microsoft/deberta-v3-large](https://huggingface.co/microsoft/deberta-v3-large) using binary labels:

- `1` = grounding required
- `0` = self-contained prompt

---

## 🚀 Use Case

This classifier acts as a **routing layer** in an LLM pipeline, helping decide:

- When to trigger retrieval
- When to let the model respond from internal knowledge
- How to optimize for latency and cost
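
Below is a minimal routing sketch, assuming `tokenizer` and `model` are loaded as in the How to Use section further down; `retrieve_and_answer` and `answer_directly` are hypothetical stand-ins for your own retrieval and plain-LLM paths.

```python
import torch

def needs_grounding(prompt: str) -> bool:
    # Label 1 = grounding required, 0 = self-contained prompt.
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    return logits.argmax(dim=-1).item() == 1

def route(prompt: str, retrieve_and_answer, answer_directly):
    # Retrieval fires only when the classifier asks for it, so
    # self-contained prompts skip the retrieval round-trip and its cost.
    if needs_grounding(prompt):
        return retrieve_and_answer(prompt)
    return answer_directly(prompt)
```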
---

## 📦 Training Details

- Model: DeBERTa v3 Large
- Fine-tuning: Full (no adapters)
- Dropout: 0.1
- Scheduler: Cosine with warmup
- Batch size: 24 (accumulated)
- Evaluation: every 500 steps
- Metric used for best checkpoint: F1
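
A hedged reconstruction of this recipe with `transformers.TrainingArguments` is sketched below; the learning rate, warmup fraction, and the per-device/accumulation split behind the effective batch of 24 are assumptions, not values stated in this card.

```python
from transformers import (AutoConfig, AutoModelForSequenceClassification,
                          TrainingArguments)

# Dropout 0.1 as listed above (also the DeBERTa v3 default).
config = AutoConfig.from_pretrained(
    "microsoft/deberta-v3-large",
    num_labels=2,
    hidden_dropout_prob=0.1,
    attention_probs_dropout_prob=0.1,
)
model = AutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-large", config=config
)

args = TrainingArguments(
    output_dir="grounding-classifier",
    per_device_train_batch_size=8,   # assumption: 8 x 3 accumulation = effective 24
    gradient_accumulation_steps=3,
    learning_rate=1e-5,              # assumption: not stated in the card
    lr_scheduler_type="cosine",      # cosine with warmup, as listed
    warmup_ratio=0.06,               # assumption: warmup fraction not stated
    eval_strategy="steps",           # `evaluation_strategy` in older transformers
    eval_steps=500,                  # evaluation every 500 steps, as listed
    save_steps=500,
    load_best_model_at_end=True,
    metric_for_best_model="f1",      # best checkpoint selected by F1, as listed
)
```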
---

## 🧪 Example Predictions

| Prompt | Grounding | Confidence |
|-----------------------------------------------------|-----------|------------|
| What’s the exchange rate for USD to Yen right now?  | 1         | 0.999      |
| Tell me a bedtime story about a robot and a dragon. | 0         | 0.9961     |
| Who is the current CEO of Microsoft?                | 1         | 0.9986     |

---

## 🧠 How to Use

```python
import torch.nn.functional as F
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# The checkpoint id below is a placeholder; use this model's published repo id.
MODEL_ID = "..."
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_ID).eval()

prompt = "Who is the current CEO of Microsoft?"  # example prompt from the table above
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model(**inputs).logits

probs = F.softmax(outputs, dim=-1)   # class probabilities
label = probs.argmax().item()        # 1 = grounding required, 0 = self-contained
confidence = probs[0][label].item()  # probability of the predicted class
```
---

## 🧾 Dataset Origin

Prompts were collected using a Gemini 2.5 Pro + Google Search toolchain with grounding enabled. Each prompt's response was parsed to extract Gemini's grounding confidence, which served as soft supervision for the binary labels (a sketch of this parsing follows the list):

- Label `1` if grounding confidence was present
- Label `0` if the response required no external evidence
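
The sketch below shows one way that parsing could look. It assumes raw Gemini API JSON in which grounded answers carry `groundingMetadata.groundingSupports[].confidenceScores`; the field names follow the public grounding docs, and the author's actual parsing code is not shown in this card.

```python
import json

def label_from_response(response_json: str) -> int:
    """Return 1 if the response carries grounding confidence, else 0."""
    data = json.loads(response_json)
    for candidate in data.get("candidates", []):
        metadata = candidate.get("groundingMetadata", {})
        for support in metadata.get("groundingSupports", []):
            if support.get("confidenceScores"):  # grounding confidence present
                return 1
    return 0  # no external evidence used
```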