---
datasets:
- ministere-culture/comparia-conversations
- anon8231489123/ShareGPT_Vicuna_unfiltered
language:
- fr
- en
base_model:
- answerdotai/ModernBERT-base
pipeline_tag: text-classification
---

# 🚦 La Route 2.0 — AI Prompt Router

La Route 2.0 is like **a GPS for AI prompts.**
When you give it a piece of text (a question, a request, or any message), it analyzes it and decides:

- **How sensitive** the content is (low / high)
- **What size model** you need (small / large)
- **Which tool** is best suited to answer (an offline LLM, an LLM with extra research abilities, or a search engine)

The goal: ✅ **save resources, improve safety, and get better answers** by sending each prompt to the right place instead of using the same heavy model for everything.

---

## 📊 What It Predicts

| Task        | Labels                                                   |
|-------------|----------------------------------------------------------|
| Sensitivity | `low`, `high`                                            |
| Model size  | `small`, `large`                                         |
| Best tool   | `LLM-with-research-mode`, `Offline-LLM`, `Search-engine` |

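The usage example below reads these label sets from two small JSON files shipped with the model. A sketch of their assumed structure follows; the task key strings and index order here are illustrative, so check `label_maps.json` and `num_labels.json` in the repo for the exact values:

```python
# Assumed contents, for orientation only: the task keys and index order
# are illustrative; the label sets come from the table above.
label_maps = {
    "sensitivity": {"0": "low", "1": "high"},
    "model_size":  {"0": "small", "1": "large"},
    "best_tool":   {"0": "LLM-with-research-mode", "1": "Offline-LLM", "2": "Search-engine"},
}
num_labels = {"sensitivity": 2, "model_size": 2, "best_tool": 3}
```
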
---

## 🔎 How It Works (In Simple Terms)

1. **You send a prompt** (e.g. *"Who is the Prime Minister of Canada?"*)
2. The model classifies it:
   - Sensitivity → Low
   - Model size → Small
   - Best tool → Search engine
3. The system then **routes the prompt** to the cheapest, safest, or most efficient tool.

It's like a **traffic controller** for prompts, making sure each one takes the best route to the right "answering engine."

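The routing layer itself is not part of this model; it belongs to the surrounding system. Below is a minimal sketch of what it could look like, assuming a predictions dict shaped like the output of `classify_text` in the usage example further down. The task keys, label strings, and backend names are assumptions, not part of this repo:

```python
# Hypothetical routing policy. Task keys, label strings, and backend
# names are illustrative assumptions.
def route(predictions: dict) -> str:
    sensitivity = predictions["sensitivity"]["label"]  # "low" / "high"
    size = predictions["model_size"]["label"]          # "small" / "large"
    tool = predictions["best_tool"]["label"]

    if sensitivity == "high":
        # Keep sensitive prompts on secure, on-premise models.
        return "onprem-llm-large" if size == "large" else "onprem-llm-small"
    if tool == "Search-engine":
        return "search-engine"
    if tool == "LLM-with-research-mode":
        return "llm-research-mode"
    # Plain offline LLM, sized by the model-size head
    return "offline-llm-large" if size == "large" else "offline-llm-small"
```
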
---

## 🖼️ Workflow Diagram

*(add an exported image file `workflow.png` with this chart so it displays on Hugging Face)*

```text
User Prompt
    │
    ▼
Shared ModernBERT Encoder
    │
    ├── Sensitivity → low/high
    ├── Model Size  → small/large
    └── Best Tool   → LLM-with-research-mode / Offline-LLM / Search-engine
    │
    ▼
Route to Best Model for Answer
```

---

## 💡 Why use La Route 2.0?

- **⚖️ Safer by design**: Prompts are automatically routed to the **most appropriate model**. Instead of forcing *all* requests through the strictest (or loosest) setup, you can use **cloud LLMs for everyday, non-sensitive queries** and keep **sensitive prompts on secure, on-premise models**.
- **💸 More efficient**: Don't waste compute on heavyweight models when a smaller one will do. This saves **costs, energy, and latency** by balancing resources intelligently.
- **🛠 Right tool for the job**: Not all prompts need an LLM. For factual lookups, a **search engine** may be faster and more accurate. For longer reasoning, a **research-mode LLM** is better. Routing ensures **each request is solved by the tool best suited to it**.

---

## 🔧 Quick Usage Example

```python
import json

import torch
import torch.nn.functional as F
from huggingface_hub import snapshot_download
from transformers import AutoModel, AutoTokenizer

repo_id = "monsimas/la-route-2"
model_dir = snapshot_download(repo_id)

tokenizer = AutoTokenizer.from_pretrained(model_dir)

# Load label maps (task -> {class index as string -> label name})
with open(f"{model_dir}/label_maps.json") as f:
    label_maps = json.load(f)
with open(f"{model_dir}/num_labels.json") as f:
    num_labels_dict = json.load(f)

# Shared encoder with one linear classification head per task
class MultiTaskModel(torch.nn.Module):
    def __init__(self, shared_model, num_labels_dict):
        super().__init__()
        self.shared_model = shared_model
        h = shared_model.config.hidden_size
        self.heads = torch.nn.ModuleDict({
            task: torch.nn.Linear(h, n) for task, n in num_labels_dict.items()
        })

    def forward(self, input_ids, attention_mask):
        out = self.shared_model(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # [CLS] token embedding
        return {t: self.heads[t](pooled) for t in self.heads}

# Load base encoder + multitask heads
base_model = AutoModel.from_pretrained("answerdotai/ModernBERT-base")
model = MultiTaskModel(base_model, num_labels_dict)
state_dict = torch.load(f"{model_dir}/model_state.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.eval()

def classify_text(text):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=384, padding=True)
    with torch.no_grad():
        # Pass the tensors explicitly; the custom forward only takes these two.
        logits = model(inputs["input_ids"], inputs["attention_mask"])
    predictions = {}
    for task, logit in logits.items():
        probs = F.softmax(logit, dim=-1)
        pred = torch.argmax(probs, dim=-1).item()
        predictions[task] = {
            "label": label_maps[task][str(pred)],
            "confidence": float(probs[0, pred]),
        }
    return predictions

print(classify_text("Who is the Prime Minister of Canada?"))
```

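The result contains one entry per classification head. Its shape looks roughly like the sketch below; the task keys, labels, and confidence values are placeholders, not real model output:

```python
# Illustrative shape only; real keys come from the JSON files in the repo
# and real labels/confidences from the trained weights.
example_output = {
    "sensitivity": {"label": "low",   "confidence": 0.97},
    "model_size":  {"label": "small", "confidence": 0.93},
    "best_tool":   {"label": "Search-engine", "confidence": 0.90},
}
```
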
---

## 🛠️ Training Details

- **Base model:** `answerdotai/ModernBERT-base`
- **Data:** Compar:IA-conversations + ShareGPT (augmented for coverage)
- **Max length:** 384 tokens
- **Batch size:** 8
- **Learning rate:** 5e-5
- **Multitask heads:** Sensitivity, Model Size, Best Tool

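The training script is not published here; the sketch below shows one plausible formulation of the multitask objective, assuming the three heads are trained jointly with an unweighted sum of per-head cross-entropy losses (the actual loss weighting is not documented):

```python
import torch.nn.functional as F

# Plausible joint objective (assumption): unweighted sum of per-head
# cross-entropies. `batch` is assumed to carry one integer label per task.
def multitask_loss(model, batch):
    logits = model(batch["input_ids"], batch["attention_mask"])  # task -> [B, n_labels]
    return sum(
        F.cross_entropy(logits[task], batch[f"{task}_labels"])
        for task in logits
    )
```
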
---

## ⚖️ Limitations

- Tool and label definitions are domain-specific.
- The classifier does **not** generate answers itself; it only routes prompts.
- Sensitivity classification may mislabel edge cases.

---