API Reference
Complete API documentation for all 8 MissionControlMCP tools.
1. PDF Reader
read_pdf(file_path: str) -> Dict[str, Any]
Extract text and metadata from PDF files.
Parameters:
file_path (str): Absolute path to the PDF file
Returns:
{
"text": str, # Full text content from all pages
"pages": int, # Number of pages
"metadata": { # Document metadata
"author": str,
"creator": str,
"producer": str,
"subject": str,
"title": str,
"creation_date": str,
"modification_date": str
}
}
Example:
from tools.pdf_reader import read_pdf
result = read_pdf("C:/docs/report.pdf")
print(f"Pages: {result['pages']}")
print(f"Author: {result['metadata']['author']}")
print(result['text'][:500]) # First 500 chars
Errors:
FileNotFoundError: PDF file not found
ImportError: PyPDF2 not installed
Exception: Invalid or corrupted PDF
get_pdf_info(file_path: str) -> Dict[str, Any]
Get basic PDF information without extracting text.
Parameters:
file_path (str): Path to the PDF file
Returns:
{
"page_count": int,
"is_encrypted": bool,
"file_size_bytes": int,
"file_name": str
}
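Because get_pdf_info skips text extraction, it works well as a cheap pre-check before the heavier read_pdf call, e.g. to skip encrypted files. A minimal sketch (the stub below is an illustration-only stand-in used when the tools package is not importable; it only mirrors the documented return fields):

```python
import os
import tempfile

try:
    from tools.pdf_reader import get_pdf_info
except ImportError:
    # Illustration-only stub mirroring the documented return fields.
    def get_pdf_info(file_path):
        return {
            "page_count": 0,
            "is_encrypted": False,
            "file_size_bytes": os.path.getsize(file_path),
            "file_name": os.path.basename(file_path),
        }

# Create a placeholder file so the sketch is self-contained.
with tempfile.NamedTemporaryFile(suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4")
    path = tmp.name

info = get_pdf_info(path)
if not info["is_encrypted"]:
    print(f"{info['file_name']}: {info['file_size_bytes']} bytes")
os.unlink(path)
```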
2. Text Extractor
extract_text(text: str, operation: str, **kwargs) -> Dict[str, Any]
Process and extract information from text.
Parameters:
text (str): Input text to process
operation (str): Operation type:
"clean" - Remove extra whitespace
"summarize" - Create summary
"chunk" - Split into chunks
"keywords" - Extract keywords
**kwargs: Operation-specific parameters
Operation: clean
extract_text(text, operation="clean")
# Returns: {"result": str, "word_count": int}
Operation: summarize
extract_text(text, operation="summarize", max_length=500)
# max_length: Maximum summary length (default: 500)
# Returns: {"result": str, "word_count": int, "original_length": int}
Operation: chunk
extract_text(text, operation="chunk", chunk_size=100, overlap=20)
# chunk_size: Characters per chunk (default: 100)
# overlap: Overlapping characters (default: 20)
# Returns: {"chunks": List[str], "chunk_count": int}
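The overlap parameter means each chunk repeats the tail of the previous one, so context spanning a chunk boundary is not lost. A self-contained sketch of that behavior (not the tool's actual implementation):

```python
def chunk_text(text, chunk_size=100, overlap=20):
    # Step forward by chunk_size - overlap so consecutive
    # chunks share `overlap` characters at their boundary.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("abcdefghij" * 5, chunk_size=20, overlap=5)
# The last 5 chars of each chunk reappear at the start of the next.
print(len(chunks), chunks[0][-5:] == chunks[1][:5])
```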
Operation: keywords
extract_text(text, operation="keywords", top_n=10)
# top_n: Number of keywords (default: 10)
# Returns: {"result": str, "keywords": List[str]}
Example:
from tools.text_extractor import extract_text
# Get keywords
result = extract_text("Your text here...", operation="keywords")
print(result['result']) # "keyword1, keyword2, keyword3"
# Summarize
summary = extract_text("Long text...", operation="summarize", max_length=200)
print(summary['result'])
3. Web Fetcher
fetch_web_content(url: str, timeout: int = 30) -> Dict[str, Any]
Fetch and parse web page content.
Parameters:
url (str): Website URL
timeout (int): Request timeout in seconds (default: 30)
Returns:
{
"url": str,
"title": str,
"content": str, # Clean text content
"html": str, # Raw HTML
"links": List[str], # All URLs found
"status_code": int, # HTTP status
"timestamp": str
}
Example:
from tools.web_fetcher import fetch_web_content
result = fetch_web_content("https://example.com")
print(f"Title: {result['title']}")
print(f"Content: {result['content'][:200]}")
print(f"Links found: {len(result['links'])}")
Errors:
requests.exceptions.Timeout: Request timed out
requests.exceptions.RequestException: Network error
Exception: Invalid URL or parsing error
4. RAG Search
search_documents(query: str, documents: List[str], top_k: int = 3) -> Dict[str, Any]
Semantic search using vector embeddings and FAISS.
Parameters:
query (str): Search query
documents (List[str]): List of documents to search
top_k (int): Number of results to return (default: 3)
Returns:
{
"query": str,
"total_documents": int,
"returned_results": int,
"results": [
{
"rank": int,
"document": str,
"score": float, # 0.0 to 1.0 (higher = more relevant)
"distance": float # L2 distance
}
]
}
Example:
from tools.rag_search import search_documents
docs = [
"Machine learning is a subset of AI",
"Python is a programming language",
"Data science uses statistics"
]
result = search_documents("artificial intelligence", docs, top_k=2)
for item in result['results']:
print(f"Score: {item['score']:.4f} - {item['document']}")
Features:
- Semantic matching (understands meaning, not just keywords)
- Uses sentence-transformers (all-MiniLM-L6-v2)
- FAISS for fast vector search
multi_query_search(queries: List[str], documents: List[str], top_k: int = 3) -> Dict[str, Any]
Search multiple queries at once.
Returns:
{
"queries": List[str],
"results": {
"query1": [results],
"query2": [results]
}
}
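A usage sketch following the single-query example above (the stub is an illustration-only stand-in that mirrors the documented return shape, used only when the tools package is not importable):

```python
try:
    from tools.rag_search import multi_query_search
except ImportError:
    # Illustration-only stub mirroring the documented return shape.
    def multi_query_search(queries, documents, top_k=3):
        return {
            "queries": queries,
            "results": {
                q: [{"rank": 1, "document": d, "score": 0.0, "distance": 0.0}
                    for d in documents[:top_k]]
                for q in queries
            },
        }

docs = [
    "Machine learning is a subset of AI",
    "Python is a programming language",
]
out = multi_query_search(["artificial intelligence", "coding"], docs, top_k=1)
for query, hits in out["results"].items():
    print(f"{query}: {hits[0]['document']}")
```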
5. Data Visualizer
visualize_data(data: str, chart_type: str, x_column: str = None, y_column: str = None, title: str = "Data Visualization") -> Dict[str, Any]
Create charts from CSV or JSON data.
Parameters:
data (str): CSV or JSON string
chart_type (str): Chart type:
"bar" - Bar chart
"line" - Line chart
"pie" - Pie chart
"scatter" - Scatter plot
x_column (str): X-axis column name
y_column (str): Y-axis column name
title (str): Chart title
Returns:
{
"image_base64": str, # Base64-encoded PNG image
"dimensions": {
"width": int,
"height": int
},
"chart_type": str,
"title": str,
"columns_used": {
"x": str,
"y": str
}
}
Example:
from tools.data_visualizer import visualize_data
import base64
csv_data = """month,revenue
Jan,5000000
Feb,5200000
Mar,5400000"""
result = visualize_data(
data=csv_data,
chart_type="line",
x_column="month",
y_column="revenue",
title="Revenue Trends"
)
# Save chart
with open("chart.png", "wb") as f:
f.write(base64.b64decode(result['image_base64']))
6. File Converter
convert_file(input_path: str, output_path: str, conversion_type: str) -> Dict[str, Any]
Convert between PDF, TXT, and CSV formats.
Parameters:
input_path (str): Input file path
output_path (str): Output file path
conversion_type (str): Conversion type:
"pdf_to_txt" - PDF → Text
"txt_to_pdf" - Text → PDF
"csv_to_txt" - CSV → Text
"txt_to_csv" - Text → CSV
Returns:
{
"success": bool,
"input_file": str,
"output_file": str,
"conversion_type": str,
"file_size_bytes": int
}
Example:
from tools.file_converter import convert_file
result = convert_file(
input_path="document.pdf",
output_path="document.txt",
conversion_type="pdf_to_txt"
)
print(f"Converted: {result['success']}")
print(f"Output: {result['output_file']}")
7. Email Intent Classifier
classify_email_intent(email_text: str) -> Dict[str, Any]
Classify email intent using NLP pattern matching.
Parameters:
email_text (str): Email content (subject + body)
Returns:
{
"intent": str, # Primary intent
"confidence": float, # 0.0 to 1.0
"secondary_intents": [
{
"intent": str,
"confidence": float
}
],
"explanation": str
}
Intent Types:
complaint - Customer complaints
inquiry - Information requests
request - Action requests
feedback - Suggestions/reviews
order - Purchase-related
meeting - Meeting scheduling
urgent - High priority issues
application - Job applications
sales - Sales pitches
other - Unclassified
Example:
from tools.email_intent_classifier import classify_email_intent
email = """
Subject: Order Issue
My order #12345 hasn't arrived yet. Can you help?
"""
result = classify_email_intent(email)
print(f"Intent: {result['intent']}") # "complaint"
print(f"Confidence: {result['confidence']}") # 0.85
classify_batch(emails: List[str]) -> Dict[str, Any]
Classify multiple emails at once.
Returns:
{
"results": [
{"email_index": int, "intent": str, "confidence": float},
...
],
"total_processed": int
}
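A usage sketch for batch classification (the stub is an illustration-only stand-in mirroring the documented return shape, used only when the tools package is not importable):

```python
try:
    from tools.email_intent_classifier import classify_batch
except ImportError:
    # Illustration-only stub mirroring the documented return shape.
    def classify_batch(emails):
        return {
            "results": [
                {"email_index": i, "intent": "other", "confidence": 0.0}
                for i in range(len(emails))
            ],
            "total_processed": len(emails),
        }

emails = [
    "Subject: Order Issue\nMy order #12345 hasn't arrived yet.",
    "Subject: Meeting\nCan we schedule a call for Tuesday?",
]
batch = classify_batch(emails)
print(f"Processed: {batch['total_processed']}")
for item in batch["results"]:
    print(f"Email {item['email_index']}: {item['intent']} ({item['confidence']:.2f})")
```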
8. KPI Generator
generate_kpis(data: str, metrics: List[str] = None) -> Dict[str, Any]
Calculate business KPIs from financial data.
Parameters:
data (str): JSON string with business data
metrics (List[str]): Metric categories (optional):
"revenue" - Revenue-related KPIs
"growth" - Growth rates
"efficiency" - Efficiency metrics
"customer" - Customer metrics
"operational" - Operational metrics
Input Data Format:
{
"revenue": 5000000,
"costs": 3000000,
"customers": 2500,
"current_revenue": 5000000,
"previous_revenue": 4500000,
"current_customers": 2500,
"previous_customers": 2300,
"employees": 50,
"marketing_spend": 500000,
"sales": 5000000,
"cogs": 2000000
}
Returns:
{
"kpis": {
"total_revenue": float,
"profit": float,
"profit_margin_percent": float,
"revenue_growth": float,
"revenue_per_customer": float,
"revenue_per_employee": float,
"customer_growth_rate": float,
...
},
"summary": str, # Executive summary
"trends": List[str], # Identified trends
"metrics_analyzed": List[str],
"data_points": int
}
Example:
from tools.kpi_generator import generate_kpis
import json
data = {
"revenue": 5000000,
"costs": 3000000,
"customers": 2500,
"employees": 50
}
result = generate_kpis(json.dumps(data), metrics=["revenue", "efficiency"])
print(f"Profit: ${result['kpis']['profit']:,.0f}")
print(f"Margin: {result['kpis']['profit_margin_percent']:.1f}%")
print(f"\nSummary: {result['summary']}")
Error Handling
All tools follow consistent error handling:
try:
result = tool_function(params)
except FileNotFoundError as e:
print(f"File not found: {e}")
except ValueError as e:
print(f"Invalid input: {e}")
except ImportError as e:
print(f"Missing dependency: {e}")
except Exception as e:
print(f"Unexpected error: {e}")
Type Hints
All functions use Python type hints:
from typing import Dict, Any, List
def function_name(param: str) -> Dict[str, Any]:
...
Logging
All tools use Python logging:
import logging
logger = logging.getLogger(__name__)
logger.info("Operation completed")
logger.warning("Warning message")
logger.error("Error occurred")
Dependencies
See requirements.txt for all dependencies:
mcp>=1.0.0
pypdf2>=3.0.0
requests>=2.31.0
beautifulsoup4>=4.12.0
pandas>=2.0.0
numpy>=1.24.0
matplotlib>=3.7.0
seaborn>=0.12.0
scikit-learn>=1.3.0
nltk>=3.8.0
pydantic>=2.0.0
faiss-cpu>=1.7.4
sentence-transformers>=2.2.0
MCP Integration
All tools are registered in mcp_server.py:
server.register_tool(
name="pdf_reader",
description="Extract text and metadata from PDF files",
input_schema={
"type": "object",
"properties": {
"file_path": {"type": "string"}
},
"required": ["file_path"]
}
)
Version Information
- API Version: 1.0.0
- Python: 3.8+
- MCP Protocol: 1.0.0
Support
For issues or questions:
- GitHub: AlBaraa-1/CleanEye-Hackathon
- Documentation: README.md
- Examples: EXAMPLES.md
- Testing: TESTING.md
Complete API reference for MissionControlMCP!