# Dynamic Tool Exposure Design for ToGMAL MCP

- **Date**: October 18, 2025
- **Status**: Design Proposal
- **Impact**: Moderate - improves efficiency, enables ML-driven tool discovery
## Problem Statement

The current ToGMAL MCP server exposes all 5 tools at startup, regardless of conversation context:

- `check_math_physics`
- `check_medical_advice`
- `check_file_operations`
- `check_code_quality`
- `check_claims`
Issues:
- LLM must decide which tools are relevant (cognitive overhead)
- Irrelevant tools clutter the tool list
- No way to automatically add ML-discovered limitation checks
- Fixed architecture doesn't scale to 10+ professional domains
## Proposed Solution

Dynamic tool exposure based on:
- Conversation context (what domain is being discussed?)
- ML clustering results (what new patterns were discovered?)
- User metadata (what domains does this user work in?)
## Design Changes

### 1. Context-Aware Tool Filtering
Current:
```python
# server.py
@server.list_tools()
async def list_tools() -> list[Tool]:
    # Always returns all 5 tools
    return [
        Tool(name="check_math_physics", ...),
        Tool(name="check_medical_advice", ...),
        Tool(name="check_file_operations", ...),
        Tool(name="check_code_quality", ...),
        Tool(name="check_claims", ...),
    ]
```
Proposed:
```python
# server.py
from typing import Optional

from .config import ML_CLUSTERING_ENABLED
from .context_analyzer import analyze_conversation_context
from .ml_tools import get_ml_discovered_tools


@server.list_tools()
async def list_tools(
    conversation_history: Optional[list[dict]] = None,
    user_context: Optional[dict] = None,
) -> list[Tool]:
    """
    Dynamically expose tools based on conversation context.

    Args:
        conversation_history: Recent messages for domain detection
        user_context: User metadata (role, industry, preferences)
    """
    # Detect relevant domains from the conversation
    domains = await analyze_conversation_context(
        conversation_history=conversation_history,
        user_context=user_context,
    )

    # Build the tool list based on detected domains
    tools = []

    # Core tools (always available)
    tools.append(Tool(name="check_claims", ...))  # General-purpose

    # Domain-specific tools (conditional)
    if "mathematics" in domains or "physics" in domains:
        tools.append(Tool(name="check_math_physics", ...))
    if "medicine" in domains or "healthcare" in domains:
        tools.append(Tool(name="check_medical_advice", ...))
    if "coding" in domains or "file_system" in domains:
        tools.append(Tool(name="check_file_operations", ...))
        tools.append(Tool(name="check_code_quality", ...))

    # ML-discovered tools (dynamic)
    if ML_CLUSTERING_ENABLED:
        ml_tools = await get_ml_discovered_tools(domains)
        tools.extend(ml_tools)

    return tools
```
### 2. Context Analyzer Module

New file: `togmal/context_analyzer.py`
```python
"""
Context analyzer for domain detection.

Determines which limitation checks are relevant.
"""
from collections import Counter
from typing import Any, Dict, List, Optional

# Domain keywords mapping
DOMAIN_KEYWORDS = {
    "mathematics": ["math", "calculus", "algebra", "geometry", "proof", "theorem", "equation"],
    "physics": ["physics", "force", "energy", "quantum", "relativity", "mechanics"],
    "medicine": ["medical", "diagnosis", "treatment", "symptom", "disease", "patient", "doctor"],
    "healthcare": ["health", "medication", "drug", "therapy", "clinical"],
    "law": ["legal", "law", "court", "regulation", "compliance", "attorney", "contract"],
    "finance": ["financial", "investment", "stock", "portfolio", "trading", "tax"],
    "coding": ["code", "programming", "function", "class", "debug", "git", "api"],
    "file_system": ["file", "directory", "path", "write", "delete", "permission"],
}


async def analyze_conversation_context(
    conversation_history: Optional[List[Dict[str, str]]] = None,
    user_context: Optional[Dict[str, Any]] = None,
    threshold: float = 0.3,
) -> List[str]:
    """
    Analyze the conversation to detect relevant domains.

    Args:
        conversation_history: Recent messages [{"role": "user", "content": "..."}]
        user_context: User metadata {"industry": "healthcare", "role": "developer"}
        threshold: Minimum confidence to include a domain (0-1)

    Returns:
        List of detected domains, e.g. ["mathematics", "coding"]
    """
    detected_domains = set()

    # Strategy 1: keyword matching across the conversation
    if conversation_history:
        domain_scores = _score_domains_by_keywords(conversation_history)
        # Add domains above threshold
        for domain, score in domain_scores.items():
            if score >= threshold:
                detected_domains.add(domain)

    # Strategy 2: user context hints
    if user_context and "industry" in user_context:
        industry = user_context["industry"].lower()
        # Map industry to domains
        if "health" in industry or "medical" in industry:
            detected_domains.update(["medicine", "healthcare"])
        elif "tech" in industry or "software" in industry:
            detected_domains.add("coding")
        elif "finance" in industry or "bank" in industry:
            detected_domains.add("finance")

    # Strategy 3: always include domains explicitly mentioned in the last message
    if conversation_history:
        last_message = conversation_history[-1].get("content", "").lower()
        for domain, keywords in DOMAIN_KEYWORDS.items():
            if any(kw in last_message for kw in keywords):
                detected_domains.add(domain)

    return list(detected_domains)


def _score_domains_by_keywords(
    conversation_history: List[Dict[str, str]],
    recent_weight: float = 2.0,
) -> Dict[str, float]:
    """
    Score domains by keyword frequency, weighting recent messages higher.

    Returns:
        Dict of {domain: score}, normalized to 0-1.
    """
    domain_counts = Counter()
    total_messages = len(conversation_history)

    for i, message in enumerate(conversation_history):
        content = message.get("content", "").lower()
        # Weight recent messages higher: oldest gets 1.0, newest approaches recent_weight
        recency_weight = 1.0 + (i / total_messages) * (recent_weight - 1.0)
        for domain, keywords in DOMAIN_KEYWORDS.items():
            matches = sum(1 for kw in keywords if kw in content)
            domain_counts[domain] += matches * recency_weight

    # Normalize scores
    max_count = max(domain_counts.values()) if domain_counts else 1
    return {
        domain: count / max_count
        for domain, count in domain_counts.items()
    }
```
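As a quick sanity check on the recency weighting, the scoring logic can be exercised standalone (a minimal sketch that duplicates `_score_domains_by_keywords` with a trimmed two-domain keyword map, not the module itself):

```python
from collections import Counter

# Trimmed keyword map for illustration (the real module defines more domains)
KEYWORDS = {
    "mathematics": ["math", "theorem", "equation"],
    "coding": ["code", "function", "debug"],
}

def score_domains(history, recent_weight=2.0):
    """Score domains by keyword hits, weighting later messages up to recent_weight."""
    counts = Counter()
    total = len(history)
    for i, message in enumerate(history):
        content = message.get("content", "").lower()
        # Oldest message gets weight 1.0; the newest approaches recent_weight
        weight = 1.0 + (i / total) * (recent_weight - 1.0)
        for domain, kws in KEYWORDS.items():
            counts[domain] += sum(1 for kw in kws if kw in content) * weight
    max_count = max(counts.values()) if counts else 1
    return {d: c / max_count for d, c in counts.items()}

history = [
    {"role": "user", "content": "Prove this theorem about equations"},    # math, older
    {"role": "user", "content": "Now help me debug this function code"},  # coding, recent
]
scores = score_domains(history)
# The recent coding message (3 hits x weight 1.5) outscores the older math one (2 hits x weight 1.0)
```

With the oldest message weighted 1.0 and the newest approaching `recent_weight`, the recent coding question dominates even though the math message also has keyword hits.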
### 3. ML-Discovered Tools Integration

New file: `togmal/ml_tools.py`
```python
"""
Dynamically generate tools from ML clustering results.
"""
import json
from pathlib import Path
from typing import List, Optional

from mcp.types import Tool

ML_TOOLS_CACHE_PATH = Path("./data/ml_discovered_tools.json")


async def get_ml_discovered_tools(
    relevant_domains: Optional[List[str]] = None,
) -> List[Tool]:
    """
    Load ML-discovered limitation checks as MCP tools.

    Args:
        relevant_domains: Only return tools for these domains (None = all)

    Returns:
        List of dynamically generated Tool objects
    """
    if not ML_TOOLS_CACHE_PATH.exists():
        return []

    # Load ML-discovered patterns
    with open(ML_TOOLS_CACHE_PATH) as f:
        ml_patterns = json.load(f)

    tools = []
    for pattern in ml_patterns.get("patterns", []):
        domain = pattern.get("domain")

        # Filter by relevant domains
        if relevant_domains and domain not in relevant_domains:
            continue

        # Only include high-confidence patterns
        if pattern.get("confidence", 0) < 0.8:
            continue

        # Generate the tool dynamically; the "check_ml_" prefix is what
        # call_tool() uses to route these to the dynamic handler
        tool = Tool(
            name=f"check_ml_{pattern['id']}",
            description=pattern["description"],
            inputSchema={
                "type": "object",
                "properties": {
                    "prompt": {"type": "string"},
                    "response": {"type": "string"},
                },
                "required": ["prompt", "response"],
            },
        )
        tools.append(tool)

    return tools


async def update_ml_tools_cache(research_pipeline_output: dict):
    """
    Called by the research pipeline to update the available ML tools.

    Args:
        research_pipeline_output: Latest clustering/anomaly detection results
    """
    # Extract high-confidence patterns
    patterns = []
    for cluster in research_pipeline_output.get("clusters", []):
        if cluster.get("is_dangerous", False) and cluster.get("purity", 0) > 0.7:
            patterns.append({
                "id": cluster["id"],
                "domain": cluster["domain"],
                "description": f"Check for {cluster['pattern_description']}",
                "confidence": cluster["purity"],
                "heuristic": cluster.get("detection_rule", ""),
                "examples": cluster.get("examples", [])[:3],
            })

    # Save to cache
    ML_TOOLS_CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
    with open(ML_TOOLS_CACHE_PATH, "w") as f:
        json.dump({
            "updated_at": research_pipeline_output["timestamp"],
            "patterns": patterns,
        }, f, indent=2)
```
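The cache contract between `update_ml_tools_cache()` and `get_ml_discovered_tools()` can be illustrated end to end with a temporary file (a self-contained sketch with hypothetical cluster data; it re-applies the same `is_dangerous` and `purity > 0.7` promotion rule inline rather than importing the module):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical pipeline output in the shape update_ml_tools_cache() expects
pipeline_output = {
    "timestamp": "2025-10-18T10:00:00Z",
    "clusters": [
        {"id": "cluster_a", "domain": "file_system", "is_dangerous": True,
         "purity": 0.92, "pattern_description": "recursive deletion", "detection_rule": "rm -rf"},
        {"id": "cluster_b", "domain": "coding", "is_dangerous": True,
         "purity": 0.55, "pattern_description": "noisy cluster", "detection_rule": ""},
    ],
}

# Same promotion rule as the module: dangerous AND purity > 0.7 (cluster_b is dropped)
patterns = [
    {"id": c["id"], "domain": c["domain"], "confidence": c["purity"],
     "description": f"Check for {c['pattern_description']}"}
    for c in pipeline_output["clusters"]
    if c.get("is_dangerous") and c.get("purity", 0) > 0.7
]

# Round-trip through a JSON cache file, as the module does
cache = Path(tempfile.mkdtemp()) / "ml_discovered_tools.json"
cache.write_text(json.dumps({"updated_at": pipeline_output["timestamp"], "patterns": patterns}))
loaded = json.loads(cache.read_text())

# The "check_ml_" prefix matches the routing convention in call_tool()
tool_names = [f"check_ml_{p['id']}" for p in loaded["patterns"]]
```

Only the high-purity cluster survives the round trip, so a noisy cluster never becomes an exposed tool.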
### 4. Tool Handler Registration

Modified: `togmal/server.py`
```python
# Dynamic handler registration for ML tools
import json

from mcp.types import TextContent


@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    """
    Route tool calls to the appropriate handlers.

    Supports both static and ML-discovered tools.
    """
    # Static tools (existing)
    if name == "check_math_physics":
        return await check_math_physics(**arguments)
    elif name == "check_medical_advice":
        return await check_medical_advice(**arguments)
    # ... etc

    # ML-discovered tools (dynamic)
    elif name.startswith("check_ml_"):
        return await handle_ml_tool(name, arguments)
    else:
        raise ValueError(f"Unknown tool: {name}")


async def handle_ml_tool(tool_name: str, arguments: dict) -> list[TextContent]:
    """
    Execute an ML-discovered limitation check.

    Args:
        tool_name: e.g. "check_ml_cluster_47"
        arguments: {"prompt": "...", "response": "..."}
    """
    # Load the ML pattern definition
    pattern = await load_ml_pattern(tool_name)
    if not pattern:
        return [TextContent(
            type="text",
            text=f"Error: ML pattern not found for {tool_name}",
        )]

    # Run the heuristic check
    result = await run_ml_heuristic(
        prompt=arguments["prompt"],
        response=arguments["response"],
        heuristic=pattern["heuristic"],
        examples=pattern["examples"],
    )

    return [TextContent(
        type="text",
        text=json.dumps(result, indent=2),
    )]
```
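The if/elif chain in `call_tool` grows by one branch per static tool; a common alternative is a registry keyed by tool name, which keeps registration next to each handler. A minimal sketch with dummy handlers (the handler bodies are illustrative, not the real ToGMAL checks):

```python
import asyncio

# Registry mapping tool names to async handlers (dummy handlers for illustration)
HANDLERS = {}

def register(name):
    """Decorator that records a handler under its tool name."""
    def wrap(fn):
        HANDLERS[name] = fn
        return fn
    return wrap

@register("check_math_physics")
async def check_math_physics(prompt: str) -> str:
    return f"math check on: {prompt}"

async def call_tool(name: str, arguments: dict) -> str:
    # ML-discovered tools still route through a single dynamic handler
    if name.startswith("check_ml_"):
        return f"ml check: {name}"
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f"Unknown tool: {name}")
    return await handler(**arguments)

result = asyncio.run(call_tool("check_math_physics", {"prompt": "F = ma"}))
ml_result = asyncio.run(call_tool("check_ml_cluster_47", {}))
```

Adding a new static tool then means writing one decorated function, with no edit to the dispatch logic.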
## Configuration

New file: `togmal/config.py`
```python
"""Configuration for dynamic tool exposure."""

# Enable/disable dynamic behavior
DYNAMIC_TOOLS_ENABLED = True

# Enable ML-discovered tools
ML_CLUSTERING_ENABLED = True

# Context analysis settings
DOMAIN_DETECTION_THRESHOLD = 0.3   # 0-1, confidence required
CONVERSATION_HISTORY_LENGTH = 10   # How many messages to analyze

# ML tools settings
ML_TOOLS_MIN_CONFIDENCE = 0.8      # Only expose high-confidence patterns
ML_TOOLS_CACHE_TTL = 3600          # Seconds to cache ML tools

# Always-available tools (never filtered)
CORE_TOOLS = ["check_claims"]      # General-purpose checks
```
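`ML_TOOLS_CACHE_TTL` implies that the ML tool list should be reloaded periodically rather than on every `list_tools()` call, but none of the modules above implement that guard. One minimal sketch (the class and method names are hypothetical, not part of the existing codebase; `now` is passed explicitly so the behavior is deterministic):

```python
import time

ML_TOOLS_CACHE_TTL = 3600  # seconds, as in config.py

class TTLCache:
    """Tiny time-based cache guard for the ML tool list (illustrative API)."""
    def __init__(self, ttl: float):
        self.ttl = ttl
        self.value = None
        self.loaded_at = None

    def get(self, now: float):
        """Return the cached value, or None if it is missing or expired."""
        if self.loaded_at is None or now - self.loaded_at >= self.ttl:
            return None
        return self.value

    def put(self, value, now: float):
        self.value = value
        self.loaded_at = now

cache = TTLCache(ML_TOOLS_CACHE_TTL)
t0 = time.time()
cache.put(["check_ml_cluster_47"], now=t0)
fresh = cache.get(now=t0 + 10)    # within the TTL: cached list returned
stale = cache.get(now=t0 + 4000)  # past the TTL: None, caller reloads from disk
```

A `get` that returns `None` would trigger a re-read of `ml_discovered_tools.json`, bounding file I/O to once per hour.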
## Example Usage

### Before (Static)
```python
# LLM sees all 5 tools regardless of context
tools = [
    "check_math_physics",     # Not relevant
    "check_medical_advice",   # Not relevant
    "check_file_operations",  # RELEVANT
    "check_code_quality",     # RELEVANT
    "check_claims",           # RELEVANT
]

# User: "How do I delete all files in a directory?"
# LLM must reason about which tools to use
```
### After (Dynamic)
```python
# Conversation: "How do I delete all files in a directory?"
# Detected domains: ["coding", "file_system"]
tools = [
    "check_file_operations",  # ✅ Relevant
    "check_code_quality",     # ✅ Relevant
    "check_claims",           # ✅ Core tool
    # check_math_physics   - filtered out
    # check_medical_advice - filtered out
]

# Cleaner tool list; the LLM focuses on relevant checks
```
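The filtering shown above can be condensed into a pure function that is easy to unit-test (a sketch; the real `list_tools()` builds `Tool` objects rather than name strings):

```python
CORE_TOOLS = ["check_claims"]

# Domain-to-tool mapping mirroring the conditionals in list_tools()
DOMAIN_TOOLS = {
    "mathematics": ["check_math_physics"],
    "physics": ["check_math_physics"],
    "medicine": ["check_medical_advice"],
    "healthcare": ["check_medical_advice"],
    "coding": ["check_file_operations", "check_code_quality"],
    "file_system": ["check_file_operations", "check_code_quality"],
}

def select_tools(domains):
    """Return core tools plus the deduplicated union of domain-specific tools."""
    selected = list(CORE_TOOLS)
    for domain in domains:
        for tool in DOMAIN_TOOLS.get(domain, []):
            if tool not in selected:
                selected.append(tool)
    return selected

tools = select_tools(["coding", "file_system"])
```

Overlapping domains (here `coding` and `file_system` share both tools) are deduplicated, so the exposed list stays minimal.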
### With ML Tools
```python
# After the research pipeline discovers a new pattern:
# "Users frequently attempt dangerous recursive deletions"

# Next conversation about file operations:
tools = [
    "check_file_operations",
    "check_code_quality",
    "check_claims",
    "check_ml_recursive_delete_danger",  # ✅ Auto-added by ML!
]
```
## Implementation Priority

**Phase 1 (Week 1): Context analyzer**
- Implement keyword-based domain detection
- Add a conversation-history parameter to `list_tools()`
- Test with the existing 5 tools

**Phase 2 (Week 2): ML tool integration**
- Create the `ml_tools.py` module
- Implement tool caching from the research pipeline
- Dynamic handler registration

**Phase 3 (Week 3): Optimization**
- Add user context hints
- Improve domain detection accuracy
- Performance testing
## Benefits

- **Reduced Cognitive Load**: LLM sees only relevant tools
- **Scalability**: Can add 10+ domains without overwhelming the LLM
- **ML Integration**: Research pipeline automatically exposes new checks
- **Efficiency**: Fewer irrelevant tool calls
- **Personalization**: Tools adapt to user context
## Backward Compatibility

**Option 1 (Recommended): Feature flag**
```python
if DYNAMIC_TOOLS_ENABLED:
    tools = await list_tools_dynamic(conversation_history)
else:
    tools = await list_tools_static()  # Original behavior
```
**Option 2: MCP protocol parameter**
```python
# Client can request static or dynamic behavior
@server.list_tools()
async def list_tools(mode: str = "dynamic") -> list[Tool]:
    if mode == "static":
        return ALL_TOOLS
    else:
        return filter_tools_by_context()
```
## Testing Strategy
```python
# tests/test_dynamic_tools.py

async def test_math_context_exposes_math_tool():
    conversation = [
        {"role": "user", "content": "What's the derivative of x^2?"}
    ]
    tools = await list_tools(conversation_history=conversation)
    tool_names = [t.name for t in tools]
    assert "check_math_physics" in tool_names
    assert "check_medical_advice" not in tool_names


async def test_medical_context_exposes_medical_tool():
    conversation = [
        {"role": "user", "content": "What are symptoms of diabetes?"}
    ]
    tools = await list_tools(conversation_history=conversation)
    tool_names = [t.name for t in tools]
    assert "check_medical_advice" in tool_names
    assert "check_math_physics" not in tool_names


async def test_ml_tool_added_after_research_update():
    # Simulate the research pipeline discovering a new pattern
    research_output = {
        "timestamp": "2025-10-18T10:00:00Z",
        "clusters": [
            {
                "id": "cluster_recursive_delete",
                "domain": "file_system",
                "is_dangerous": True,
                "purity": 0.92,
                "pattern_description": "recursive deletion without confirmation",
                "detection_rule": "check for 'rm -rf' or 'shutil.rmtree' without safeguards",
            }
        ],
    }
    await update_ml_tools_cache(research_output)

    # Check that the new tool is exposed
    conversation = [{"role": "user", "content": "Delete all files recursively"}]
    tools = await list_tools(conversation_history=conversation)
    tool_names = [t.name for t in tools]
    assert "check_ml_cluster_recursive_delete" in tool_names
```
## Future Enhancements

- **Semantic Analysis**: Use embeddings for domain detection (more accurate)
- **User Learning**: Remember which tools a user frequently needs
- **Proactive Suggestions**: "This conversation may benefit from a medical advice check"
- **Tool Composition**: Combine multiple ML patterns into meta-tools
- **A/B Testing**: Measure whether dynamic exposure improves safety outcomes
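For the semantic-analysis enhancement, the core change is replacing substring matching with cosine similarity between a message embedding and per-domain prototype embeddings. A sketch with tiny hand-made vectors standing in for a real sentence-embedding model (the vectors and threshold are purely illustrative):

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Toy 3-d "embeddings"; a real system would embed reference texts per domain
DOMAIN_PROTOTYPES = {
    "mathematics": [1.0, 0.1, 0.0],
    "medicine": [0.0, 1.0, 0.1],
}

def detect_domains(message_embedding, threshold=0.7):
    """Return domains whose prototype is close enough to the message embedding."""
    return [d for d, proto in DOMAIN_PROTOTYPES.items()
            if cosine(message_embedding, proto) >= threshold]

# A message embedding near the mathematics prototype
domains = detect_domains([0.9, 0.2, 0.0])
```

Unlike keyword matching, this catches paraphrases ("find the slope of the curve") with no literal keyword overlap, at the cost of an embedding call per message.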
## Decision

**Recommendation**: ✅ Implement dynamic tool exposure
Rationale:
- Essential for scaling beyond 5 tools
- Enables ML-driven tool discovery (key innovation!)
- Improves LLM efficiency
- Maintains backward compatibility
- Relatively low implementation cost (~1 week)
**When**: Implement in Phase 2 of the integration (after the core ToGMAL-Aqumen bidirectional flow is working)