monsimas committed on
Commit 1479328 · verified · 1 Parent(s): 8e7749a

Create README.md

  1. README.md +152 -0
README.md ADDED
@@ -0,0 +1,152 @@
1
+ ---
2
+ datasets:
3
+ - ministere-culture/comparia-conversations
4
+ - anon8231489123/ShareGPT_Vicuna_unfiltered
5
+ language:
6
+ - fr
7
+ - en
8
+ base_model:
9
+ - answerdotai/ModernBERT-base
10
+ pipeline_tag: text-classification
11
+ ---
12
+
13
+ # 🚦 La Route 2.0 — AI Prompt Router
14
+
15
+ La Route 2.0 is like **a GPS for AI prompts.**
16
+ When you give it a piece of text (a question, a request, or any message), it analyzes it and decides:
17
+
18
+ - **How sensitive** the content is (low / high)
19
+ - **What size model** you need (small / large)
20
+ - **Which tool** is best to answer (an offline LLM, an LLM with extra research abilities, or a search engine)
21
+
22
+ The goal: ✅ **save resources, improve safety, and get better answers** by sending each prompt to the right place instead of using the same heavy model for everything.
23
+
24
+ ---
25
+
26
+ ## 📊 What It Predicts
27
+
28
+ | Task | Labels |
29
+ |-------------|--------------------------------------------------------------|
30
+ | Sensitivity | `low`, `high` |
31
+ | Model size | `small`, `large` |
32
+ | Best tool | `LLM-with-research-mode`, `Offline-LLM`, `Search-engine` |
33
+
34
+ ---
35
+
36
+ ## 🔎 How It Works (In Simple Terms)
37
+
38
+ 1. **You send a prompt** (e.g. *"Who is the Prime Minister of Canada?"*)
39
+ 2. The model classifies it:
40
+ - Sensitivity → Low
41
+ - Model size → Small
42
+ - Best tool → Search engine
43
+ 3. The system then **routes the prompt** to the cheapest, safest, or most efficient tool.
44
+
45
+ It’s like a **traffic controller** for prompts — making sure each one takes the best route to the right “answering engine.”
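The routing step described above can be sketched as a small function that maps the three predicted labels to a destination. This is only an illustration: the task keys (`sensitivity`, `model_size`, `best_tool`), the hosting names, and the confidence values in the example are assumptions, not something the model card defines.

```python
from typing import Any, Dict


def choose_route(predictions: Dict[str, Dict[str, Any]]) -> Dict[str, str]:
    """Turn the three predicted labels into a routing decision.

    The task keys and hosting names are hypothetical and only serve
    this sketch; adapt them to the keys found in label_maps.json.
    """
    sensitivity = predictions["sensitivity"]["label"]
    model_size = predictions["model_size"]["label"]
    tool = predictions["best_tool"]["label"]

    # Sensitive prompts stay on-premise; everything else may go to the cloud.
    hosting = "on-premise" if sensitivity == "high" else "cloud"
    return {"hosting": hosting, "model_size": model_size, "tool": tool}


# Illustrative predictions for "Who is the Prime Minister of Canada?"
example = {
    "sensitivity": {"label": "low", "confidence": 0.97},
    "model_size": {"label": "small", "confidence": 0.91},
    "best_tool": {"label": "Search-engine", "confidence": 0.88},
}
route = choose_route(example)
print(route)
```

Keeping the policy in one function makes it easy to audit which prompts are allowed to leave the secure environment.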
46
+
47
+ ---
48
+
49
+ ## 🖼️ Workflow Diagram
50
+
51
+ *(add an exported image file `workflow.png` with this chart so it displays on Hugging Face)*
52
+
53
+ ```text
54
+ User Prompt
55
+      │
56
+      ▼
57
+ Shared ModernBERT Encoder
58
+      │
59
+      ├── Sensitivity → low / high
60
+      ├── Model Size  → small / large
61
+      └── Best Tool   → LLM-with-research-mode / Offline-LLM / Search-engine
62
+      │
63
+      ▼
64
+ Route to Best Model for Answer
65
+ ```
66
+
67
+ ---
68
+
69
+ ## 💡 Why use La Route 2.0?
70
+
71
+ - **⚖️ Safer by design**: Prompts are automatically routed to the **most appropriate model**. Instead of forcing *all* requests through the strictest (or loosest) setup, you can use **cloud LLMs for everyday, non‑sensitive queries** and keep **sensitive prompts on secure, on‑premise models**.
72
+ - **💸 More efficient**: Don’t waste compute on heavyweight models when a smaller one will do. This saves **costs, energy, and latency** by balancing resources intelligently.
73
+ - **🛠 Right tool for the job**: Not all prompts need an LLM. For factual lookups, a **search engine** may be faster and more accurate. For longer reasoning, a **research‑mode LLM** is better. Routing ensures **each request is solved by the tool best suited to it**.
74
+
75
+ ---
76
+
77
+ ## 🔧 Quick Usage Example
78
+
79
+ ```python
80
+ from transformers import AutoTokenizer, AutoModel
81
+ from huggingface_hub import snapshot_download
82
+ import torch, json, torch.nn.functional as F
83
+
84
+ repo_id = "monsimas/la-route-2"
85
+ model_dir = snapshot_download(repo_id)  # local copy of the model repo
86
+
87
+ tokenizer = AutoTokenizer.from_pretrained(model_dir)
88
+
89
+ # Load label maps
90
+ with open(f"{model_dir}/label_maps.json") as f:
91
+     label_maps = json.load(f)
92
+ with open(f"{model_dir}/num_labels.json") as f:
93
+     num_labels_dict = json.load(f)
94
+
95
+ # Define model
96
+ class MultiTaskModel(torch.nn.Module):
97
+     def __init__(self, shared_model, num_labels_dict):
98
+         super().__init__()
99
+         self.shared_model = shared_model
100
+         h = shared_model.config.hidden_size
101
+         self.heads = torch.nn.ModuleDict({  # one linear head per task
102
+             task: torch.nn.Linear(h, n) for task, n in num_labels_dict.items()
103
+         })
104
+     def forward(self, input_ids, attention_mask):
105
+         out = self.shared_model(input_ids=input_ids, attention_mask=attention_mask)
106
+         pooled = out.last_hidden_state[:, 0]  # first-token representation
107
+         return {t: self.heads[t](pooled) for t in self.heads}
108
+
109
+ # Load base encoder + multitask heads
110
+ base_model = AutoModel.from_pretrained("answerdotai/ModernBERT-base")
111
+ model = MultiTaskModel(base_model, num_labels_dict)
112
+ state_dict = torch.load(f"{model_dir}/model_state.pt", map_location="cpu")
113
+ model.load_state_dict(state_dict)
114
+ model.eval()
115
+
116
+ def classify_text(text):
117
+     inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=384, padding=True)
118
+     with torch.no_grad():
119
+         logits = model(**inputs)
120
+     predictions = {}
121
+     for task, logit in logits.items():
122
+         probs = F.softmax(logit, dim=-1)
123
+         pred = torch.argmax(probs, dim=-1).item()  # single prompt, so one index
124
+         predictions[task] = {
125
+             "label": label_maps[task][str(pred)],
126
+             "confidence": float(probs[0, pred])
127
+         }
128
+     return predictions
129
+
130
+ print(classify_text("Who is the Prime Minister of Canada?"))
131
+ ```
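The decoding step inside `classify_text` (softmax, then argmax, then label lookup) can be tried without downloading the model. The sketch below reimplements it in plain Python on made-up logits; the `example_map` label map is hypothetical and only mirrors the string-keyed lookup used in the usage example, while the real maps come from `label_maps.json`.

```python
import math
from typing import Dict, List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def decode(logits: List[float], id_to_label: Dict[str, str]) -> Dict[str, object]:
    """Pick the most probable class and report its label and probability."""
    probs = softmax(logits)
    pred = max(range(len(probs)), key=probs.__getitem__)
    return {"label": id_to_label[str(pred)], "confidence": probs[pred]}


# Hypothetical label map and logits, for illustration only.
example_map = {"0": "low", "1": "high"}
result = decode([2.0, 0.0], example_map)
print(result)
```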
132
+
133
+ ---
134
+
135
+ ## 🛠️ Training Details
136
+ - **Base model:** `answerdotai/ModernBERT-base`
137
+ - **Data:** Compar:IA-conversations + ShareGPT (augmented for coverage)
138
+ - **Max length:** 384 tokens
139
+ - **Batch size:** 8
140
+ - **Learning rate:** 5e-5
141
+ - **Multitask heads:** Sensitivity, Model Size, Best Tool
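The training details do not say how the three heads were optimized together. One common choice for a shared encoder with several classification heads is to sum the per-task cross-entropy losses; the sketch below shows that idea in plain Python. It is an assumption for illustration, not a description of the actual training code, and the task keys are hypothetical.

```python
import math
from typing import Dict, List


def cross_entropy(logits: List[float], target: int) -> float:
    """Cross-entropy of one example: log-sum-exp of logits minus the target logit."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target]


def multitask_loss(task_logits: Dict[str, List[float]],
                   task_targets: Dict[str, int]) -> float:
    """Unweighted sum of per-task losses (an assumed combination rule)."""
    return sum(cross_entropy(task_logits[t], task_targets[t]) for t in task_logits)


# Hypothetical logits for one prompt: two binary heads and one 3-way head.
logits = {
    "sensitivity": [0.0, 0.0],
    "model_size": [0.0, 0.0],
    "best_tool": [0.0, 0.0, 0.0],
}
targets = {"sensitivity": 0, "model_size": 1, "best_tool": 2}
loss = multitask_loss(logits, targets)
print(loss)
```

With all-zero logits each head predicts a uniform distribution, so the total is `2 * ln(2) + ln(3)`; per-task weights could be added if one head dominates.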
142
+
143
+ ---
144
+
145
+ ## ⚖️ Limitations
146
+ - Tool and label definitions are domain-specific.
147
+ - The classifier does **not** generate answers itself — only routes prompts.
148
+ - Sensitivity classification may mislabel edge cases.
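Because the sensitivity head can mislabel edge cases, a caller may want to treat uncertain `low` predictions as `high`. The sketch below shows one such conservative policy; the function name and the 0.8 threshold are hypothetical and should be tuned on your own data.

```python
from typing import Any, Dict


def conservative_sensitivity(prediction: Dict[str, Any], threshold: float = 0.8) -> str:
    """Return the sensitivity label, escalating uncertain 'low' results to 'high'.

    The threshold is an assumed value, not one recommended by the model card.
    """
    if prediction["label"] == "low" and prediction["confidence"] < threshold:
        return "high"
    return prediction["label"]


print(conservative_sensitivity({"label": "low", "confidence": 0.62}))
print(conservative_sensitivity({"label": "low", "confidence": 0.95}))
```

This trades some efficiency for safety: borderline prompts end up on the stricter path.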
149
+
150
+ ---
151
+