# ToGMAL Architecture

## System Overview

```
┌─────────────────────────────────────────────────────────────────┐
│                         Claude Desktop                          │
│                    (or other MCP Client)                        │
└────────────────────────────┬────────────────────────────────────┘
                             │ stdio/MCP Protocol
                             │
┌────────────────────────────▼────────────────────────────────────┐
│                     ToGMAL MCP Server                           │
│                    (togmal_mcp.py)                              │
│  ┌──────────────────────────────────────────────────────────┐  │
│  │                   MCP Tools Layer                         │  │
│  │  - togmal_analyze_prompt                                 │  │
│  │  - togmal_analyze_response                               │  │
│  │  - togmal_submit_evidence                                │  │
│  │  - togmal_get_taxonomy                                   │  │
│  │  - togmal_get_statistics                                 │  │
│  └──────────────────┬───────────────────────────────────────┘  │
│                     │                                           │
│  ┌──────────────────▼───────────────────────────────────────┐  │
│  │              Detection Heuristics                         │  │
│  │  ┌────────────────────────────────────────────────────┐  │  │
│  │  │  Math/Physics Speculation Detector                 │  │  │
│  │  │  - Pattern: "theory of everything"                 │  │  │
│  │  │  - Pattern: "new equation"                         │  │  │
│  │  │  - Pattern: excessive notation                     │  │  │
│  │  └────────────────────────────────────────────────────┘  │  │
│  │  ┌────────────────────────────────────────────────────┐  │  │
│  │  │  Ungrounded Medical Advice Detector                │  │  │
│  │  │  - Pattern: "you probably have"                    │  │  │
│  │  │  - Pattern: "take Xmg"                             │  │  │
│  │  │  - Check: has_sources                              │  │  │
│  │  └────────────────────────────────────────────────────┘  │  │
│  │  ┌────────────────────────────────────────────────────┐  │  │
│  │  │  Dangerous File Operations Detector                │  │  │
│  │  │  - Pattern: "rm -rf"                               │  │  │
│  │  │  - Pattern: recursive deletion                     │  │  │
│  │  │  - Check: has_safeguards                           │  │  │
│  │  └────────────────────────────────────────────────────┘  │  │
│  │  ┌────────────────────────────────────────────────────┐  │  │
│  │  │  Vibe Coding Overreach Detector                    │  │  │
│  │  │  - Pattern: "complete app"                         │  │  │
│  │  │  - Pattern: large line counts                      │  │  │
│  │  │  - Check: has_planning                             │  │  │
│  │  └────────────────────────────────────────────────────┘  │  │
│  │  ┌────────────────────────────────────────────────────┐  │  │
│  │  │  Unsupported Claims Detector                       │  │  │
│  │  │  - Pattern: "always/never"                         │  │  │
│  │  │  - Pattern: statistics without source              │  │  │
│  │  │  - Check: has_hedging                              │  │  │
│  │  └────────────────────────────────────────────────────┘  │  │
│  └──────────────────┬───────────────────────────────────────┘  │
│                     │                                           │
│  ┌──────────────────▼───────────────────────────────────────┐  │
│  │           Risk Assessment & Interventions                 │  │
│  │  - Calculate weighted risk score                         │  │
│  │  - Map to risk levels (LOW → CRITICAL)                   │  │
│  │  - Recommend interventions                               │  │
│  └──────────────────┬───────────────────────────────────────┘  │
│                     │                                           │
│  ┌──────────────────▼───────────────────────────────────────┐  │
│  │              Taxonomy Database                            │  │
│  │  - In-memory storage (extendable to persistent)          │  │
│  │  - Evidence entries with metadata                        │  │
│  │  - Filtering and pagination                              │  │
│  └───────────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────────┘
```

## Data Flow - Prompt Analysis

```
User Prompt
    │
    ├─────────────────────────────────────────────┐
    │                                             │
    ▼                                             │
togmal_analyze_prompt                             │
    │                                             │
    ├──► Math/Physics Detector ──► Result 1      │
    │                                             │
    ├──► Medical Advice Detector ──► Result 2    │
    │                                             │
    ├──► File Ops Detector ──► Result 3          │
    │                                             │
    ├──► Vibe Coding Detector ──► Result 4       │
    │                                             │
    └──► Unsupported Claims Detector ──► Result 5│
                                                  │
    ┌─────────────────────────────────────────────┘
    │
    ▼
Risk Calculation
    │
    ├─► Weight results
    ├─► Calculate score
    └─► Map to risk level
        │
        ▼
Intervention Recommendation
    │
    ├─► Step breakdown?
    ├─► Human-in-loop?
    ├─► Web search?
    └─► Simplified scope?
        │
        ▼
Format Response (Markdown/JSON)
    │
    └──► Return to Client
```
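
The fan-out and formatting steps of this flow can be sketched as follows. This is a minimal illustration, not the code in `togmal_mcp.py`: the `keyword_detector` helper, the `DETECTORS` registry, and the category keys are all assumptions, and only two of the five detectors are shown, reduced to keyword checks. Risk calculation and intervention selection are left out.

```python
import json
from typing import Any, Callable, Dict

Detector = Callable[[str], Dict[str, Any]]


def keyword_detector(*keywords: str) -> Detector:
    """Build a toy detector that flags text containing any of the keywords."""
    def detect(text: str) -> Dict[str, Any]:
        lowered = text.lower()
        found = [k for k in keywords if k in lowered]
        return {"detected": bool(found), "confidence": len(found) / len(keywords)}
    return detect


# Hypothetical registry: two detectors, keyed by assumed category names.
DETECTORS: Dict[str, Detector] = {
    "math_physics": keyword_detector("theory of everything", "new equation"),
    "file_operations": keyword_detector("rm -rf"),
}


def analyze_prompt(prompt: str, response_format: str = "markdown") -> str:
    # Fan out: every detector sees the same prompt and returns its own result.
    results = {name: detect(prompt) for name, detect in DETECTORS.items()}
    flagged = [name for name, result in results.items() if result["detected"]]

    # Format the response as JSON or Markdown before returning it to the client.
    if response_format == "json":
        return json.dumps({"flagged": flagged, "results": results})
    if not flagged:
        return "No issues detected."
    return "\n".join(["Detected categories:"] + [f"- {name}" for name in flagged])
```

Keeping the detectors in a registry means each one stays independent of the others, which matches the separate `Result 1` to `Result 5` branches in the diagram.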

## Detection Pipeline

```
Input Text
    │
    ▼
┌───────────────────────────┐
│   Preprocessing           │
│   - Lowercase             │
│   - Strip whitespace      │
└───────────┬───────────────┘
            │
            ▼
┌───────────────────────────┐
│   Pattern Matching        │
│   - Regex patterns        │
│   - Keyword detection     │
│   - Structural analysis   │
└───────────┬───────────────┘
            │
            ▼
┌───────────────────────────┐
│   Confidence Scoring      │
│   - Count matches         │
│   - Weight by type        │
│   - Normalize to [0,1]    │
└───────────┬───────────────┘
            │
            ▼
┌───────────────────────────┐
│   Context Checks          │
│   - has_sources?          │
│   - has_hedging?          │
│   - has_safeguards?       │
└───────────┬───────────────┘
            │
            ▼
Detection Result
{
  detected: bool,
  categories: list,
  confidence: float,
  metadata: dict
}
```
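
As an illustration of the four stages, here is a minimal sketch of one detector. The regexes, the `SOURCE_MARKERS` list, the saturation point of three matches, and the halving of confidence when sources are present are all assumptions chosen for the example; the real detector may score differently.

```python
import re
from typing import Any, Dict, List

# Hypothetical patterns; the actual regexes in the server may differ.
MEDICAL_PATTERNS: Dict[str, List[str]] = {
    "diagnosis": [r"you probably have"],
    "dosage": [r"take \d+\s?mg"],
}
SOURCE_MARKERS = ("according to", "source:", "http://", "https://")


def detect_medical_advice(text: str) -> Dict[str, Any]:
    # Preprocessing: lowercase and strip surrounding whitespace.
    cleaned = text.lower().strip()

    # Pattern matching: count regex hits per sub-category.
    hits = {
        name: sum(len(re.findall(pattern, cleaned)) for pattern in patterns)
        for name, patterns in MEDICAL_PATTERNS.items()
    }
    categories = [name for name, count in hits.items() if count > 0]
    total = sum(hits.values())

    # Confidence scoring: normalize the match count to [0, 1].
    # Three matches saturate the score (assumed value).
    confidence = min(1.0, total / 3)

    # Context check: cited sources lower the confidence (assumed adjustment).
    has_sources = any(marker in cleaned for marker in SOURCE_MARKERS)
    if has_sources:
        confidence *= 0.5

    return {
        "detected": total > 0,
        "categories": categories,
        "confidence": round(confidence, 3),
        "metadata": {"has_sources": has_sources, "match_count": total},
    }
```

The returned dictionary has the same four fields as the `Detection Result` shape shown in the diagram.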

## Risk Calculation Algorithm

```
For each detection category:
    
    Math/Physics:
        risk += confidence × 0.5
    
    Medical Advice:
        risk += confidence × 1.5  # High weight
    
    File Operations:
        risk += confidence × 2.0  # Highest weight: critical actions
    
    Vibe Coding:
        risk += confidence × 0.4
    
    Unsupported Claims:
        risk += confidence × 0.3

Total Risk Score:
    
    ≥ 1.5 → CRITICAL
    ≥ 1.0 → HIGH
    ≥ 0.5 → MODERATE
    < 0.5 → LOW
```
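
The weighting and thresholds above translate directly into a short function. The sketch below takes the weights and cut-offs from the algorithm as written; the dictionary keys and the function name `calculate_risk` are assumptions made for the example.

```python
from typing import Dict, Tuple

# Weights and thresholds copied from the algorithm above.
# The category keys are hypothetical names chosen for this sketch.
CATEGORY_WEIGHTS: Dict[str, float] = {
    "math_physics": 0.5,
    "medical_advice": 1.5,
    "file_operations": 2.0,
    "vibe_coding": 0.4,
    "unsupported_claims": 0.3,
}
RISK_THRESHOLDS = [(1.5, "CRITICAL"), (1.0, "HIGH"), (0.5, "MODERATE")]


def calculate_risk(confidences: Dict[str, float]) -> Tuple[float, str]:
    """Sum confidence x weight over detected categories and map to a level."""
    score = sum(CATEGORY_WEIGHTS[name] * conf for name, conf in confidences.items())
    # Thresholds are checked from highest to lowest; the first match wins.
    for threshold, level in RISK_THRESHOLDS:
        if score >= threshold:
            return score, level
    return score, "LOW"
```

With these weights, a file-operation detection at confidence 0.8 scores 1.6 and is CRITICAL on its own, while an unsupported claim at full confidence scores 0.3 and stays LOW.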

## Intervention Decision Tree

```
                    Detection Results
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
        ▼                 ▼                 ▼
  Math/Physics?     Medical Advice?   File Operations?
        │                 │                 │
        ├─► Yes           ├─► Yes           ├─► Yes
        │   │             │   │             │   │
        │   ├─► Step      │   ├─► Human    │   ├─► Human
        │   │   Breakdown │   │   in Loop   │   │   in Loop
        │   │             │   │             │   │
        │   └─► Web       │   └─► Web       │   └─► Step
        │       Search    │       Search    │       Breakdown
        │                 │                 │
        └─► No            └─► No            └─► No
            │                 │                 │
            ▼                 ▼                 ▼
      Continue          Continue          Continue

                    ┌───────────┐
                    │  Combine  │
                    │  Results  │
                    └─────┬─────┘
                          │
                          ▼
              Intervention List
              (deduplicated)
```
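
One way to read the tree is as a lookup table followed by an order-preserving merge. The sketch below covers only the three branches shown in the diagram; the category keys and intervention identifiers are assumed names, not taken from the server code.

```python
from typing import Dict, List

# Mapping read off the decision tree above; all names are hypothetical.
INTERVENTIONS_BY_CATEGORY: Dict[str, List[str]] = {
    "math_physics": ["step_breakdown", "web_search"],
    "medical_advice": ["human_in_loop", "web_search"],
    "file_operations": ["human_in_loop", "step_breakdown"],
}


def recommend_interventions(detected: Dict[str, bool]) -> List[str]:
    """Collect interventions for every detected category, without duplicates."""
    combined: List[str] = []
    for category, interventions in INTERVENTIONS_BY_CATEGORY.items():
        if not detected.get(category, False):
            continue  # the "No -> Continue" branch
        for name in interventions:
            if name not in combined:  # deduplicate while keeping first-seen order
                combined.append(name)
    return combined
```

For example, a prompt that trips both the math/physics and medical detectors yields `web_search` once, not twice.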

## Taxonomy Database Schema

```
TAXONOMY_DB = {
    "category_name": [
        {
            "id": "abc123def456",
            "category": "math_physics_speculation",
            "prompt": "User's prompt text...",
            "response": "LLM's response text...",
            "description": "Why problematic...",
            "severity": "high",
            "timestamp": "2025-10-18T00:00:00",
            "prompt_hash": "a1b2c3d4"
        },
        { ... more entries ... }
    ],
    "another_category": [ ... ]
}

Indices:
- By category (dict key)
- By severity (filter)
- By timestamp (sort)
- By hash (deduplication)
```
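
The sketch below shows how entries with this schema could be written and queried in memory. It is an assumption-laden illustration: the helper names `submit_entry` and `query_entries`, the use of a truncated SHA-256 digest for `prompt_hash`, and the 12-character UUID fragment for `id` are choices made for the example, not confirmed details of the server.

```python
import hashlib
import uuid
from collections import defaultdict
from datetime import datetime
from typing import Any, Dict, List, Optional

TAXONOMY_DB: Dict[str, List[Dict[str, Any]]] = defaultdict(list)


def submit_entry(category: str, prompt: str, response: str,
                 description: str, severity: str) -> Optional[Dict[str, Any]]:
    """Store an evidence entry; return None if the prompt is a duplicate."""
    # Assumed hashing scheme: first 8 hex characters of SHA-256.
    prompt_hash = hashlib.sha256(prompt.encode("utf-8")).hexdigest()[:8]
    if any(e["prompt_hash"] == prompt_hash for e in TAXONOMY_DB[category]):
        return None  # deduplication by hash
    entry = {
        "id": uuid.uuid4().hex[:12],
        "category": category,
        "prompt": prompt,
        "response": response,
        "description": description,
        "severity": severity,
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "prompt_hash": prompt_hash,
    }
    TAXONOMY_DB[category].append(entry)
    return entry


def query_entries(category: str, severity: Optional[str] = None,
                  offset: int = 0, limit: int = 20) -> List[Dict[str, Any]]:
    """Filter by severity, sort newest first, then paginate."""
    entries = TAXONOMY_DB.get(category, [])
    if severity is not None:
        entries = [e for e in entries if e["severity"] == severity]
    entries = sorted(entries, key=lambda e: e["timestamp"], reverse=True)
    return entries[offset:offset + limit]
```

Because ISO 8601 timestamps sort lexicographically, sorting on the string field is enough here and no datetime parsing is needed.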

## Component Responsibilities

### MCP Tools Layer
**Responsibilities:**
- Input validation (Pydantic models)
- Parameter extraction
- Tool orchestration
- Response formatting
- Character limit enforcement

**Does NOT:**
- Perform detection logic
- Calculate risk scores
- Store data directly

### Detection Heuristics Layer
**Responsibilities:**
- Pattern matching
- Confidence scoring
- Context analysis
- Detection result generation

**Does NOT:**
- Make intervention decisions
- Format responses
- Handle I/O

### Risk Assessment Layer
**Responsibilities:**
- Aggregate detection results
- Calculate weighted risk scores
- Map scores to risk levels
- Generate intervention recommendations

**Does NOT:**
- Perform detection
- Format responses
- Store data

### Taxonomy Database
**Responsibilities:**
- Store evidence entries
- Support filtering/pagination
- Provide statistics
- Maintain capacity limits

**Does NOT:**
- Perform analysis
- Make decisions
- Format responses

## Extension Points

### Adding New Detection Categories

```python
from enum import Enum
from typing import Any, Dict

# 1. Add enum value
class CategoryType(str, Enum):
    NEW_CATEGORY = "new_category"

# 2. Create detector function
def detect_new_category(text: str) -> Dict[str, Any]:
    patterns = { ... }
    # Detection logic
    return {
        'detected': bool,
        'categories': list,
        'confidence': float
    }

# 3. Update analysis functions
def analyze_prompt(params):
    results['new_category'] = detect_new_category(params.prompt)
    # ... rest of logic

# 4. Update risk calculation
def calculate_risk_level(results):
    if results['new_category']['detected']:
        risk_score += results['new_category']['confidence'] * WEIGHT

# 5. Add intervention logic
def recommend_interventions(results):
    if results['new_category']['detected']:
        interventions.append({ ... })
```

### Adding Persistent Storage

```python
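# 1. Define storage backend
class TaxonomyStorage:
    def __init__(self, backend: str = "sqlite"):
        self.backend = backend  # e.g. "sqlite", "postgres", "mongodb"

    # Methods are coroutines so that tool functions can await them
    async def save(self, category, entry): ...
    async def load(self, category, filters): ...
    async def get_stats(self): ...

# 2. Replace in-memory dict
storage = TaxonomyStorage(backend="sqlite")  # or "postgres", "mongodb"

# 3. Update tool functions
@mcp.tool()
async def submit_evidence(params):
    # Build `entry` from params as before, then
    # instead of: TAXONOMY_DB[category].append(entry)
    await storage.save(params.category, entry)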
```

### Adding ML Models

```python
# 1. Define model interface
class AnomalyDetector:
    def fit(self, X): ...
    def predict(self, x) -> float: ...

# 2. Train from taxonomy
detector = AnomalyDetector()
training_data = get_training_data_from_taxonomy()
detector.fit(training_data)

# 3. Use in detection
def detect_with_ml(text: str) -> float:
    features = extract_features(text)
    anomaly_score = detector.predict(features)
    return anomaly_score
```

## Performance Characteristics

### Time Complexity
- **Pattern Matching**: O(n) where n = text length
- **All Detectors**: O(n) (a fixed number of detectors, each O(n))
- **Risk Calculation**: O(1) (fixed number of categories)
- **Taxonomy Query**: O(m·log m) where m = matching entries
- **Overall**: O(n + m·log m)

### Space Complexity
- **Server Base**: ~50 MB
- **Per Request**: ~1 KB (temporary)
- **Per Taxonomy Entry**: ~1 KB
- **Total with 1000 entries**: ~51 MB

### Latency
- **Single Detection**: ~10-50 ms
- **All Detections**: ~50-100 ms
- **Format Response**: ~1-10 ms
- **Total Per Request**: ~100-150 ms

## Security Considerations

### Input Validation
```
User Input
    │
    ▼
Pydantic Model
    │
    ├─► Type checking
    ├─► Length limits
    ├─► Pattern validation
    └─► Field constraints
        │
        ▼
    Valid Input
```
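
The checks in this flow can be illustrated without any third-party dependency. The server itself uses Pydantic models; the sketch below deliberately swaps in a standard-library `dataclass` so it runs anywhere, and the class name, field names, length limit, and allowed formats are all assumed values.

```python
from dataclasses import dataclass

MAX_PROMPT_LENGTH = 50_000               # assumed limit, for illustration only
ALLOWED_FORMATS = ("markdown", "json")   # assumed set of response formats


@dataclass(frozen=True)
class AnalyzePromptInput:
    """Stand-in for a Pydantic input model, using only the standard library."""
    prompt: str
    response_format: str = "markdown"

    def __post_init__(self) -> None:
        # Type checking
        if not isinstance(self.prompt, str):
            raise TypeError("prompt must be a string")
        # Field constraint: reject empty or whitespace-only prompts
        if not self.prompt.strip():
            raise ValueError("prompt must not be empty")
        # Length limit
        if len(self.prompt) > MAX_PROMPT_LENGTH:
            raise ValueError(f"prompt exceeds {MAX_PROMPT_LENGTH} characters")
        # Pattern validation: only known response formats are accepted
        if self.response_format not in ALLOWED_FORMATS:
            raise ValueError(f"response_format must be one of {ALLOWED_FORMATS}")
```

Validation runs at construction time, so any object that exists has already passed every check, which is the same guarantee a Pydantic model gives.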

### Privacy Protection
```
┌────────────────────────────────────┐
│  NO External API Calls             │
│  NO Data Transmission              │
│  NO Logging Sensitive Info         │
│  YES Local Processing Only         │
│  YES User Consent Required         │
│  YES Data Stays on Device          │
└────────────────────────────────────┘
```

### Human-in-the-Loop
```
Sensitive Operation Detected
    │
    ▼
Request User Confirmation
    │
    ├─► Yes → Proceed
    │
    └─► No → Cancel
```
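
A confirmation gate of this kind can be sketched as a small wrapper. The function name `run_with_confirmation` and the `confirm` callback are hypothetical; in an MCP setting the confirmation would normally be handled by the client, so this only illustrates the control flow.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def run_with_confirmation(
    description: str,
    operation: Callable[[], T],
    confirm: Callable[[str], bool],
) -> Optional[T]:
    """Run `operation` only if the user confirms; otherwise cancel and return None."""
    question = f"Sensitive operation detected: {description}. Proceed?"
    if confirm(question):
        return operation()  # Yes -> Proceed
    return None             # No -> Cancel
```

Passing the confirmation step in as a callback keeps the gate testable: a test can supply `lambda _: False` and check that the operation never runs.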

## Scalability Path

### Current: Single Instance
```
Client → stdio → ToGMAL Server → Response
```

### Future: HTTP Transport
```
Multiple Clients → HTTP → ToGMAL Server → Response
                          ↓
                    Shared Database
```

### Advanced: Distributed
```
Clients → Load Balancer → ToGMAL Servers (N)
                              ↓
                        Shared Database
                              ↓
                        ML Model Cache
```

## Monitoring Points

```
┌─────────────────────────────────────┐
│  Metrics to Track                   │
├─────────────────────────────────────┤
│  - Tool call frequency              │
│  - Detection rates by category      │
│  - Risk level distribution          │
│  - Intervention effectiveness       │
│  - False positive rate              │
│  - Response latency                 │
│  - Taxonomy growth rate             │
│  - User feedback submissions        │
└─────────────────────────────────────┘
```

---

This architecture supports:
- ✅ Privacy-preserving analysis
- ✅ Low-latency detection
- ✅ Extensible design
- ✅ Production readiness
- ✅ Future ML integration