
Dynamic Tool Exposure Design for ToGMAL MCP

Date: October 18, 2025
Status: Design Proposal
Impact: Moderate - improves efficiency, enables ML-driven tool discovery


Problem Statement

The ToGMAL MCP server currently exposes all five tools at startup, regardless of conversation context:

  • check_math_physics
  • check_medical_advice
  • check_file_operations
  • check_code_quality
  • check_claims

Issues:

  1. LLM must decide which tools are relevant (cognitive overhead)
  2. Irrelevant tools clutter the tool list
  3. No way to automatically add ML-discovered limitation checks
  4. Fixed architecture doesn't scale to 10+ professional domains

Proposed Solution

Dynamic Tool Exposure based on:

  1. Conversation context (what domain is being discussed?)
  2. ML clustering results (what new patterns were discovered?)
  3. User metadata (what domains does this user work in?)

Design Changes

1. Context-Aware Tool Filtering

Current:

# server.py
@server.list_tools()
async def list_tools() -> list[Tool]:
    # Always returns all 5 tools
    return [
        Tool(name="check_math_physics", ...),
        Tool(name="check_medical_advice", ...),
        Tool(name="check_file_operations", ...),
        Tool(name="check_code_quality", ...),
        Tool(name="check_claims", ...),
    ]

Proposed:

# server.py
from typing import Optional

from .config import ML_CLUSTERING_ENABLED
from .context_analyzer import analyze_conversation_context
from .ml_tools import get_ml_discovered_tools

@server.list_tools()
async def list_tools(
    conversation_history: Optional[list[dict]] = None,
    user_context: Optional[dict] = None
) -> list[Tool]:
    """
    Dynamically expose tools based on conversation context
    
    Args:
        conversation_history: Recent messages for domain detection
        user_context: User metadata (role, industry, preferences)
    """
    # Detect relevant domains from conversation
    domains = await analyze_conversation_context(
        conversation_history=conversation_history,
        user_context=user_context
    )
    
    # Build tool list based on detected domains
    tools = []
    
    # Core tools (always available)
    tools.append(Tool(name="check_claims", ...))  # General-purpose
    
    # Domain-specific tools (conditional)
    if "mathematics" in domains or "physics" in domains:
        tools.append(Tool(name="check_math_physics", ...))
    
    if "medicine" in domains or "healthcare" in domains:
        tools.append(Tool(name="check_medical_advice", ...))
    
    if "coding" in domains or "file_system" in domains:
        tools.append(Tool(name="check_file_operations", ...))
        tools.append(Tool(name="check_code_quality", ...))
    
    # ML-discovered tools (dynamic)
    if ML_CLUSTERING_ENABLED:
        ml_tools = await get_ml_discovered_tools(domains)
        tools.extend(ml_tools)
    
    return tools

2. Context Analyzer Module

New file: togmal/context_analyzer.py

"""
Context analyzer for domain detection
Determines which limitation checks are relevant
"""

from collections import Counter
from typing import Any, Dict, List, Optional

# Domain keywords mapping
DOMAIN_KEYWORDS = {
    "mathematics": ["math", "calculus", "algebra", "geometry", "proof", "theorem", "equation"],
    "physics": ["physics", "force", "energy", "quantum", "relativity", "mechanics"],
    "medicine": ["medical", "diagnosis", "treatment", "symptom", "disease", "patient", "doctor"],
    "healthcare": ["health", "medication", "drug", "therapy", "clinical"],
    "law": ["legal", "law", "court", "regulation", "compliance", "attorney", "contract"],
    "finance": ["financial", "investment", "stock", "portfolio", "trading", "tax"],
    "coding": ["code", "programming", "function", "class", "debug", "git", "api"],
    "file_system": ["file", "directory", "path", "write", "delete", "permission"],
}

async def analyze_conversation_context(
    conversation_history: Optional[List[Dict[str, str]]] = None,
    user_context: Optional[Dict[str, Any]] = None,
    threshold: float = 0.3
) -> List[str]:
    """
    Analyze conversation to detect relevant domains
    
    Args:
        conversation_history: Recent messages [{"role": "user", "content": "..."}]
        user_context: User metadata {"industry": "healthcare", "role": "developer"}
        threshold: Minimum confidence to include domain (0-1)
    
    Returns:
        List of detected domains, e.g., ["mathematics", "coding"]
    """
    detected_domains = set()
    
    # Strategy 1: Keyword matching in conversation
    if conversation_history:
        domain_scores = _score_domains_by_keywords(conversation_history)
        
        # Add domains above threshold
        for domain, score in domain_scores.items():
            if score >= threshold:
                detected_domains.add(domain)
    
    # Strategy 2: User context hints
    if user_context:
        if "industry" in user_context:
            industry = user_context["industry"].lower()
            # Map industry to domains
            if "health" in industry or "medical" in industry:
                detected_domains.update(["medicine", "healthcare"])
            elif "tech" in industry or "software" in industry:
                detected_domains.add("coding")
            elif "finance" in industry or "bank" in industry:
                detected_domains.add("finance")
    
    # Strategy 3: Always include if explicitly mentioned in last message
    if conversation_history:
        last_message = conversation_history[-1].get("content", "").lower()
        
        for domain, keywords in DOMAIN_KEYWORDS.items():
            if any(kw in last_message for kw in keywords):
                detected_domains.add(domain)
    
    return list(detected_domains)


def _score_domains_by_keywords(
    conversation_history: List[Dict[str, str]],
    recent_weight: float = 2.0
) -> Dict[str, float]:
    """
    Score domains based on keyword frequency (recent messages weighted higher)
    
    Returns:
        Dict of {domain: score} normalized 0-1
    """
    domain_counts = Counter()
    total_messages = len(conversation_history)
    
    for i, message in enumerate(conversation_history):
        content = message.get("content", "").lower()
        
        # Weight recent messages higher
        recency_weight = 1.0 + (i / total_messages) * (recent_weight - 1.0)
        
        for domain, keywords in DOMAIN_KEYWORDS.items():
            matches = sum(1 for kw in keywords if kw in content)
            domain_counts[domain] += matches * recency_weight
    
    # Normalize scores (guard against an all-zero count table)
    max_count = max(domain_counts.values(), default=0) or 1
    return {
        domain: count / max_count
        for domain, count in domain_counts.items()
    }
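To make the recency weighting concrete, here is a standalone sketch of the same scoring logic over a toy two-message history (keyword table trimmed to two domains; the `score` helper is illustrative, not part of the module above):

```python
from collections import Counter

# Trimmed keyword table, just for this example
KEYWORDS = {
    "mathematics": ["math", "calculus"],
    "coding": ["code", "debug"],
}

def score(history: list, recent_weight: float = 2.0) -> dict:
    """Same weighting as _score_domains_by_keywords, over plain strings."""
    counts = Counter()
    n = len(history)
    for i, text in enumerate(history):
        text = text.lower()
        weight = 1.0 + (i / n) * (recent_weight - 1.0)  # later messages weigh more
        for domain, kws in KEYWORDS.items():
            counts[domain] += sum(kw in text for kw in kws) * weight
    top = max(counts.values(), default=0) or 1  # guard against all-zero counts
    return {d: c / top for d, c in counts.items()}

scores = score(["Help me debug this code", "Now explain calculus"])
# coding: two hits in the older message (weight 1.0) -> 2.0 -> normalized 1.0
# mathematics: one hit in the newer message (weight 1.5) -> 1.5 -> normalized 0.75
```

Note that a single hit in the newest message (1.5) still scores below two hits in an older one (2.0); the weighting biases toward recency without discarding earlier context.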

3. ML-Discovered Tools Integration

New file: togmal/ml_tools.py

"""
Dynamically generate tools from ML clustering results
"""

from typing import List, Optional
from mcp.types import Tool
import json
from pathlib import Path

ML_TOOLS_CACHE_PATH = Path("./data/ml_discovered_tools.json")

async def get_ml_discovered_tools(
    relevant_domains: Optional[List[str]] = None
) -> List[Tool]:
    """
    Load ML-discovered limitation checks as MCP tools
    
    Args:
        relevant_domains: Only return tools for these domains (None = all)
    
    Returns:
        List of dynamically generated Tool objects
    """
    if not ML_TOOLS_CACHE_PATH.exists():
        return []
    
    # Load ML-discovered patterns
    with open(ML_TOOLS_CACHE_PATH) as f:
        ml_patterns = json.load(f)
    
    tools = []
    
    for pattern in ml_patterns.get("patterns", []):
        domain = pattern.get("domain")
        
        # Filter by relevant domains
        if relevant_domains and domain not in relevant_domains:
            continue
        
        # Only include high-confidence patterns
        if pattern.get("confidence", 0) < 0.8:
            continue
        
        # Generate tool dynamically; the "check_ml_" prefix is what the
        # call_tool router uses to dispatch to handle_ml_tool
        tool = Tool(
            name=f"check_ml_{pattern['id']}",
            description=pattern["description"],
            inputSchema={
                "type": "object",
                "properties": {
                    "prompt": {"type": "string"},
                    "response": {"type": "string"}
                },
                "required": ["prompt", "response"]
            }
        )
        
        tools.append(tool)
    
    return tools


async def update_ml_tools_cache(research_pipeline_output: dict):
    """
    Called by research pipeline to update available ML tools
    
    Args:
        research_pipeline_output: Latest clustering/anomaly detection results
    """
    # Extract high-confidence patterns
    patterns = []
    
    for cluster in research_pipeline_output.get("clusters", []):
        # Cache anything above 0.7 purity; get_ml_discovered_tools applies
        # the stricter ML_TOOLS_MIN_CONFIDENCE (0.8) cut at exposure time
        if cluster.get("is_dangerous", False) and cluster.get("purity", 0) > 0.7:
            pattern = {
                "id": cluster["id"],
                "domain": cluster["domain"],
                "description": f"Check for {cluster['pattern_description']}",
                "confidence": cluster["purity"],
                "heuristic": cluster.get("detection_rule", ""),
                "examples": cluster.get("examples", [])[:3]
            }
            patterns.append(pattern)
    
    # Save to cache
    ML_TOOLS_CACHE_PATH.parent.mkdir(parents=True, exist_ok=True)
    with open(ML_TOOLS_CACHE_PATH, 'w') as f:
        json.dump({
            "updated_at": research_pipeline_output["timestamp"],
            "patterns": patterns
        }, f, indent=2)
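With the filter above, a cluster reaches the cache only when it is flagged dangerous and its purity exceeds 0.7. A standalone round-trip sketch of that filter (temporary file instead of the real cache path; the `cache_patterns` helper name is illustrative):

```python
import json
import tempfile
from pathlib import Path

def cache_patterns(output: dict, path: Path) -> None:
    """Keep only dangerous, high-purity clusters, as in update_ml_tools_cache."""
    patterns = [
        {"id": c["id"], "confidence": c["purity"]}
        for c in output.get("clusters", [])
        if c.get("is_dangerous") and c.get("purity", 0) > 0.7
    ]
    path.write_text(json.dumps({"patterns": patterns}, indent=2))

with tempfile.TemporaryDirectory() as d:
    path = Path(d) / "ml_discovered_tools.json"
    cache_patterns({"clusters": [
        {"id": "a", "is_dangerous": True, "purity": 0.92},
        {"id": "b", "is_dangerous": True, "purity": 0.55},   # below purity cut
        {"id": "c", "is_dangerous": False, "purity": 0.99},  # not dangerous
    ]}, path)
    kept = [p["id"] for p in json.loads(path.read_text())["patterns"]]
# kept == ["a"]
```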

4. Tool Handler Registration

Modified: togmal/server.py

import json  # ML check results are returned as serialized JSON

# Dynamic handler registration for ML tools
@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    """
    Route tool calls to appropriate handlers
    Supports both static and ML-discovered tools
    """
    # Static tools (existing)
    if name == "check_math_physics":
        return await check_math_physics(**arguments)
    elif name == "check_medical_advice":
        return await check_medical_advice(**arguments)
    # ... etc
    
    # ML-discovered tools (dynamic)
    elif name.startswith("check_ml_"):
        return await handle_ml_tool(name, arguments)
    
    else:
        raise ValueError(f"Unknown tool: {name}")


async def handle_ml_tool(tool_name: str, arguments: dict) -> list[TextContent]:
    """
    Execute ML-discovered limitation check
    
    Args:
        tool_name: e.g., "check_ml_cluster_47"
        arguments: {"prompt": "...", "response": "..."}
    """
    # Load ML pattern definition
    pattern = await load_ml_pattern(tool_name)
    
    if not pattern:
        return [TextContent(
            type="text",
            text=f"Error: ML pattern not found for {tool_name}"
        )]
    
    # Run heuristic check
    result = await run_ml_heuristic(
        prompt=arguments["prompt"],
        response=arguments["response"],
        heuristic=pattern["heuristic"],
        examples=pattern["examples"]
    )
    
    return [TextContent(
        type="text",
        text=json.dumps(result, indent=2)
    )]
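`load_ml_pattern` and `run_ml_heuristic` are referenced above but never defined in this proposal. A sketch of the lookup half, assuming tool names take the form `check_ml_<pattern_id>` and the cache layout written by `update_ml_tools_cache` (the `cache_path` parameter is added here for testability):

```python
import json
from pathlib import Path
from typing import Optional

ML_TOOLS_CACHE_PATH = Path("./data/ml_discovered_tools.json")

async def load_ml_pattern(
    tool_name: str,
    cache_path: Path = ML_TOOLS_CACHE_PATH
) -> Optional[dict]:
    """Map a tool name like 'check_ml_cluster_47' back to its cached pattern."""
    if not cache_path.exists():
        return None
    pattern_id = tool_name.removeprefix("check_ml_")  # Python 3.9+
    data = json.loads(cache_path.read_text())
    for pattern in data.get("patterns", []):
        if pattern.get("id") == pattern_id:
            return pattern
    return None
```

`run_ml_heuristic` would be the other half: evaluating the cached `heuristic` rule against the prompt/response pair, which is left open here.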

Configuration

New file: togmal/config.py

"""Configuration for dynamic tool exposure"""

# Enable/disable dynamic behavior
DYNAMIC_TOOLS_ENABLED = True

# Enable ML-discovered tools
ML_CLUSTERING_ENABLED = True

# Context analysis settings
DOMAIN_DETECTION_THRESHOLD = 0.3  # 0-1, confidence required
CONVERSATION_HISTORY_LENGTH = 10  # How many messages to analyze

# ML tools settings
ML_TOOLS_MIN_CONFIDENCE = 0.8  # Only expose high-confidence patterns
ML_TOOLS_CACHE_TTL = 3600  # Seconds to cache ML tools

# Always-available tools (never filtered)
CORE_TOOLS = ["check_claims"]  # General-purpose checks
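ML_TOOLS_CACHE_TTL above implies the tools cache should be re-read at most once per hour, but the design does not specify the mechanism. One minimal option, assuming an in-memory timestamped wrapper (the `TTLCache` class is a sketch, not part of the proposal):

```python
import time

ML_TOOLS_CACHE_TTL = 3600  # seconds, mirrors config.py

class TTLCache:
    """Re-invoke the loader only when the TTL has elapsed."""
    def __init__(self, loader, ttl=ML_TOOLS_CACHE_TTL):
        self._loader = loader
        self._ttl = ttl
        self._value = None
        self._loaded_at = -float("inf")  # force a load on first access

    def get(self):
        now = time.monotonic()
        if now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value

# Demo: the loader runs once; the second get() is served from cache
calls = []
cache = TTLCache(lambda: calls.append(1) or len(calls), ttl=3600)
first = cache.get()
second = cache.get()  # within TTL: no reload
```

`get_ml_discovered_tools` could wrap its JSON read in such a cache so a busy server is not re-parsing the file on every `list_tools()` call.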

Example Usage

Before (Static)

# LLM sees all 5 tools regardless of context
tools = [
    "check_math_physics",      # Not relevant
    "check_medical_advice",    # Not relevant
    "check_file_operations",   # RELEVANT
    "check_code_quality",      # RELEVANT
    "check_claims"             # RELEVANT
]

# User: "How do I delete all files in a directory?"
# LLM must reason about which tools to use

After (Dynamic)

# Conversation: "How do I delete all files in a directory?"
# Detected domains: ["coding", "file_system"]

tools = [
    "check_file_operations",   # ✅ Relevant
    "check_code_quality",      # ✅ Relevant
    "check_claims"             # ✅ Core tool
    # check_math_physics - filtered out
    # check_medical_advice - filtered out
]

# Cleaner tool list, LLM focuses on relevant checks

With ML Tools

# After research pipeline discovers new pattern:
# "Users frequently attempt dangerous recursive deletions"

# Next conversation about file operations:
tools = [
    "check_file_operations",
    "check_code_quality", 
    "check_claims",
    "check_ml_recursive_delete_danger"  # ✅ Auto-added by ML!
]

Implementation Priority

Phase 1 (Week 1): Context analyzer

  • Implement keyword-based domain detection
  • Add conversation history parameter to list_tools()
  • Test with existing 5 tools

Phase 2 (Week 2): ML tool integration

  • Create ml_tools.py module
  • Implement tool caching from research pipeline
  • Dynamic handler registration

Phase 3 (Week 3): Optimization

  • Add user context hints
  • Improve domain detection accuracy
  • Performance testing
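On improving detection accuracy: the keyword strategy uses plain substring matching, so "math" fires on "aftermath". A sketch of a stricter whole-word matcher (the `count_keyword_hits` helper is illustrative):

```python
import re

def count_keyword_hits(text: str, keywords: list) -> int:
    """Count whole-word keyword occurrences, case-insensitive."""
    text = text.lower()
    return sum(
        len(re.findall(rf"\b{re.escape(kw)}\b", text))
        for kw in keywords
    )

# "aftermath" does not trigger the "math" keyword:
hits_neg = count_keyword_hits("The aftermath of the storm", ["math"])            # 0
hits_pos = count_keyword_hits("I love math and calculus", ["math", "calculus"])  # 2
```

Swapping this into `_score_domains_by_keywords` would trade a little regex overhead for fewer spurious domain activations.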

Benefits

  1. Reduced Cognitive Load: LLM sees only relevant tools
  2. Scalability: Can add 10+ domains without overwhelming LLM
  3. ML Integration: Research pipeline automatically exposes new checks
  4. Efficiency: Fewer irrelevant tool calls
  5. Personalization: Tools adapt to user context

Backward Compatibility

Option 1 (Recommended): Feature flag

if DYNAMIC_TOOLS_ENABLED:
    tools = await list_tools_dynamic(conversation_history)
else:
    tools = await list_tools_static()  # Original behavior

Option 2: MCP protocol parameter

# Client can request static or dynamic
@server.list_tools()
async def list_tools(mode: str = "dynamic") -> list[Tool]:
    if mode == "static":
        return ALL_TOOLS
    else:
        return filter_tools_by_context()

Testing Strategy

# tests/test_dynamic_tools.py

async def test_math_context_exposes_math_tool():
    conversation = [
        {"role": "user", "content": "What's the derivative of x^2?"}
    ]
    
    tools = await list_tools(conversation_history=conversation)
    tool_names = [t.name for t in tools]
    
    assert "check_math_physics" in tool_names
    assert "check_medical_advice" not in tool_names


async def test_medical_context_exposes_medical_tool():
    conversation = [
        {"role": "user", "content": "What are symptoms of diabetes?"}
    ]
    
    tools = await list_tools(conversation_history=conversation)
    tool_names = [t.name for t in tools]
    
    assert "check_medical_advice" in tool_names
    assert "check_math_physics" not in tool_names


async def test_ml_tool_added_after_research_update():
    # Simulate research pipeline discovering new pattern
    research_output = {
        "timestamp": "2025-10-18T10:00:00Z",
        "clusters": [
            {
                "id": "cluster_recursive_delete",
                "domain": "file_system",
                "is_dangerous": True,
                "purity": 0.92,
                "pattern_description": "recursive deletion without confirmation",
                "detection_rule": "check for 'rm -rf' or 'shutil.rmtree' without safeguards"
            }
        ]
    }
    
    await update_ml_tools_cache(research_output)
    
    # Check that new tool is exposed
    conversation = [{"role": "user", "content": "Delete all files recursively"}]
    tools = await list_tools(conversation_history=conversation)
    tool_names = [t.name for t in tools]
    
    assert "check_ml_cluster_recursive_delete" in tool_names

Future Enhancements

  1. Semantic Analysis: Use embeddings for domain detection (more accurate)
  2. User Learning: Remember which tools user frequently needs
  3. Proactive Suggestions: "This conversation may benefit from medical advice check"
  4. Tool Composition: Combine multiple ML patterns into meta-tools
  5. A/B Testing: Measure if dynamic exposure improves safety outcomes

Decision

Recommendation: Implement dynamic tool exposure

Rationale:

  • Essential for scaling beyond 5 tools
  • Enables ML-driven tool discovery (key innovation!)
  • Improves LLM efficiency
  • Maintains backward compatibility
  • Relatively low implementation cost (~1 week)

When: Implement in Phase 2 of integration (after core ToGMAL-Aqumen bidirectional flow working)