brijeshvadi committed
Commit 4f84aa7 · verified · 1 Parent(s): 53cc896

Upload README.md with huggingface_hub

Files changed (1): README.md added (+91, -0)

---
license: mit
language:
- en
tags:
- text-classification
- mcp
- tool-calling
- qa-testing
- grok
- error-detection
datasets:
- brijeshvadi/mcp-tool-calling-benchmark
metrics:
- accuracy
- f1
pipeline_tag: text-classification
model-index:
- name: mcp-error-classifier
  results:
  - task:
      type: text-classification
      name: MCP Error Classification
    metrics:
    - name: Accuracy
      type: accuracy
      value: 0.923
    - name: F1
      type: f1
      value: 0.891
---

# MCP Error Classifier

A fine-tuned text classification model that detects and categorizes MCP (Model Context Protocol) tool-calling errors in AI assistant responses.

## Model Description

This model classifies AI assistant tool-calling behavior into six categories (one correct-behavior class and five error types) identified during QA testing of Grok's MCP connector integrations:

| Label | Description | Training Samples |
|-------|-------------|------------------|
| `CORRECT` | Tool invoked correctly with proper parameters | 2,847 |
| `TOOL_BYPASS` | Model answered from training data instead of invoking the tool | 1,203 |
| `FALSE_SUCCESS` | Model claimed success but the tool was never called | 892 |
| `HALLUCINATION` | Model fabricated tool response data | 756 |
| `BROKEN_CHAIN` | Multi-step workflow failed mid-chain | 441 |
| `STALE_DATA` | Tool called but returned outdated cached results | 312 |

## Training Details

- **Base Model:** `distilbert-base-uncased`
- **Training Data:** 6,451 labeled MCP interaction logs across 12 platforms
- **Platforms Tested:** Supabase, Notion, Miro, Vercel, Netlify, Canva, Linear, GitHub, Box, Slack, Google Drive, Jotform
- **Epochs:** 5
- **Learning Rate:** 2e-5
- **Batch Size:** 32

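The label distribution above is heavily imbalanced (2,847 `CORRECT` samples vs. 312 `STALE_DATA`). The card does not say how this was handled during fine-tuning; one common approach is inverse-frequency class weighting on the loss, sketched below. The weighting scheme itself is an assumption, not documented training behavior.

```python
# Hypothetical sketch: inverse-frequency class weights for the label
# counts in the table above. Whether class weighting was actually used
# during fine-tuning is not stated in this card.
counts = {
    "CORRECT": 2847,
    "TOOL_BYPASS": 1203,
    "FALSE_SUCCESS": 892,
    "HALLUCINATION": 756,
    "BROKEN_CHAIN": 441,
    "STALE_DATA": 312,
}

total = sum(counts.values())   # 6,451, matching the training-data figure
num_labels = len(counts)

# weight_i = total / (num_labels * count_i): rare classes get larger weights,
# so the loss is not dominated by the majority CORRECT class.
weights = {label: total / (num_labels * n) for label, n in counts.items()}
```

Weights like these can be passed to a weighted cross-entropy loss during fine-tuning.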
## Usage

```python
from transformers import pipeline

classifier = pipeline("text-classification", model="brijeshvadi/mcp-error-classifier")

result = classifier("Grok responded with project details but never called the Supabase list_projects tool")
# Output: [{'label': 'TOOL_BYPASS', 'score': 0.94}]
```

## Intended Use

- QA evaluation of AI assistants' MCP tool-calling reliability
- Automated error categorization in MCP testing pipelines
- Benchmarking tool-use accuracy across different LLM providers

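For the QA and benchmarking uses above, per-interaction predictions are typically rolled up into an error profile for a test run. A minimal sketch, assuming the list-of-dicts output format shown in the Usage section (the prediction values here are hypothetical):

```python
from collections import Counter

# Hypothetical per-interaction predictions, in the same shape as the
# pipeline output shown in the Usage section.
predictions = [
    {"label": "TOOL_BYPASS", "score": 0.94},
    {"label": "CORRECT", "score": 0.88},
    {"label": "CORRECT", "score": 0.91},
    {"label": "FALSE_SUCCESS", "score": 0.77},
]

# Count each label to build an error profile for the run.
profile = Counter(p["label"] for p in predictions)

# Share of interactions where tool-calling behaved correctly.
correct_rate = profile["CORRECT"] / len(predictions)

print(profile.most_common())  # [('CORRECT', 2), ('TOOL_BYPASS', 1), ('FALSE_SUCCESS', 1)]
print(correct_rate)           # 0.5
```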
## Limitations

- Trained primarily on Grok interaction logs; may underperform on Claude/ChatGPT patterns
- English only
- Requires context about which tool was expected vs. what was called

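Because of the last limitation, callers must encode the expected tool into the classifier input themselves. The card does not document the input template used during training; the helper below is one hypothetical way to do it, and both the function name and the template format are assumptions.

```python
def build_classifier_input(expected_tool: str, response: str) -> str:
    """Combine the expected tool name and the assistant's response into a
    single text for classification.

    Hypothetical helper: the model card does not specify the input format
    the classifier was trained on, so this template is an assumption.
    """
    return f"Expected tool: {expected_tool}. Assistant response: {response}"


text = build_classifier_input(
    "supabase.list_projects",
    "Here are your projects: alpha, beta.",
)
```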
## Citation

```bibtex
@misc{mcp-error-classifier-2026,
  author    = {Brijesh Vadi},
  title     = {MCP Error Classifier: Detecting Tool-Calling Failures in AI Assistants},
  year      = {2026},
  publisher = {Hugging Face},
}
```