---
title: Calculus Agent
emoji: 🌌
colorFrom: gray
colorTo: gray
sdk: docker
pinned: false
license: mit
short_description: Multi-Agent Calculus Orchestration System
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Pochi 4.o: Multi-Agent Calculus Orchestration System
Pochi is a high-performance, asynchronous AI platform specialized in solving complex calculus problems. It utilizes a stateful multi-agent system built on LangGraph, coordinating multiple specialized LLMs and symbolic computation engines to achieve pedagogical excellence and mathematical precision.
## Live Demo
| Platform | URL |
|---|---|
| Hugging Face | Visit Pochi on Hugging Face |
## Project Achievements & Performance
Pochi's performance and reliability are continuously monitored via LangSmith. The following data highlights the system's operational excellence and high-speed reasoning capabilities.
### System Health & Usage
| Metric | Value | Description |
|---|---|---|
| Total Runs | 476 | Cumulative successful execution cycles. |
| Total Tokens | 1.86M | Aggregate token throughput across all agents. |
| Median Tokens | 2,846 | Median context size per solver request. |
| Success Rate | 99% | System resilience against API and execution errors. |
| Streaming Adoption | 99% | Percentage of responses delivered via SSE for real-time feedback. |
### Latency Performance
Latency varies significantly based on task complexity (e.g., Simple symbolic math vs. Multi-image OCR + Recursive code fixing).
| Stage | P50 (Median) | P99 (Tail) |
|---|---|---|
| Time to First Token (TTFT) | 0.53s | 5.30s |
| End-to-End Latency | 1.51s | 36.95s |
**Analysis:**
- Responsiveness: A P50 TTFT of 0.53s ensures that users perceive an "instant" start to the response, crucial for engagement.
- Efficiency: The P50 latency of 1.51s for full calculus resolution demonstrates the high-performance nature of the asynchronous multi-agent orchestration.
- Complexity Buffer: The P99 latency (~37s) accounts for the most intensive "Self-Healing" loops, where the system may perform multiple recursive code fixes or deep vision analysis.
## Highlight Features
- Multi-Agent Orchestration: Stateful DAG-based workflow using LangGraph for complex, multi-stage reasoning.
- Parallel Sub-problem Processing: Intelligent decomposition of complex queries into independent atomic tasks executed in parallel.
- Multimodal OCR Intelligence: High-fidelity vision extraction from up to 5 concurrent images with specialized math support.
- Hybrid Solving Engine: Seamlessly combines symbolic precision (Wolfram Alpha) with algorithmic logic (Python Executor).
- Intelligent Long-Term Memory: Massive 256K token context window with proactive memory management and token tracking.
- Premium UI/UX: Modern glassmorphism design with reactive animations, interactive tours, and native LaTeX rendering.
## System Architecture and Pipeline
The system is engineered as a directed acyclic graph (DAG) of specialized nodes, managed by a central orchestrator that maintains a consistent state throughout the conversation turn.
### The Execution Pipeline
- Vision Ingestion (OCR Agent): Processes up to 5 concurrent image inputs. Utilizing Llama-4 Maverick, it extracts raw text and LaTeX-formatted mathematical expressions.
- Strategic Decomposition (Planner): Analyzes user intent and OCR data to generate a vectorized execution plan. It decomposes composite problems into independent atomic tasks (JSON defined).
- Parallel Orchestration (Executor): The core processing engine that spawns asynchronous execution threads for each atomic task:
  - Symbolic Branch: Direct interface with the Wolfram Alpha API for verified algebraic and calculus manipulation.
  - Algorithmic Branch: Python Code Engine (Qwen3-32B) for numerical methods or complex multi-step logic.
  - Heuristic Branch: Direct LLM solving for theoretical or conceptual queries.
- Self-Correction Loop (Code Engine): If the Algorithmic Branch encounters execution errors, a specialized CodeFixer (GPT-OSS-120B) performs recursive debugging and code modification.
- Contextual Synthesis (Synthetic Agent): Aggregates atomic results, resolves inter-task dependencies, and consults conversation history to produce a structured, pedagogical response.
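The parallel orchestration step above can be sketched as a minimal `asyncio` fan-out. This is an illustrative sketch, not the project's actual code: the task shape, branch names, and solver stubs are all assumptions standing in for the real Wolfram, code-engine, and LLM calls.

```python
import asyncio

# Hypothetical atomic tasks as the Planner might emit them (illustrative shape).
TASKS = [
    {"id": 1, "branch": "symbolic", "query": "integrate x^2"},
    {"id": 2, "branch": "algorithmic", "query": "sum of first 100 primes"},
    {"id": 3, "branch": "heuristic", "query": "explain the chain rule"},
]

async def solve_symbolic(q):    # stands in for the Wolfram Alpha call
    return f"wolfram:{q}"

async def solve_algorithmic(q):  # stands in for the Python code engine
    return f"python:{q}"

async def solve_heuristic(q):   # stands in for direct LLM solving
    return f"llm:{q}"

BRANCHES = {
    "symbolic": solve_symbolic,
    "algorithmic": solve_algorithmic,
    "heuristic": solve_heuristic,
}

async def execute_plan(tasks):
    # Spawn one coroutine per atomic task and await them all concurrently.
    coros = [BRANCHES[t["branch"]](t["query"]) for t in tasks]
    outputs = await asyncio.gather(*coros)
    return dict(zip((t["id"] for t in tasks), outputs))

results = asyncio.run(execute_plan(TASKS))
```

Because the branches run under `asyncio.gather`, total latency is bounded by the slowest atomic task rather than the sum of all of them.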
### Technical Workflow Diagram

```mermaid
graph TD
    User([User Request]) --> API[FastAPI Entry]
    API --> State[Agent State Initialization]
    State --> OCR{OCR Node}
    OCR -- Multi-Image --> Vision[Llama-4 Maverick]
    Vision --> Planner[Planner Node: Kimi K2]
    OCR -- Text Only --> Planner
    Planner --> Plan{Execution Plan}
    Plan -- All Direct --> Synthetic[Synthetic Agent]
    Plan -- Tool Required --> Executor[Parallel Executor Node]
    subgraph ParallelTasks["Async Task Orchestration"]
        Executor --> Wolfram[Wolfram Alpha API]
        Executor --> Code[Qwen3 Code Gen]
        Code --> Exec[Python Executor]
        Exec -- Error --> Fixer[GPT-OSS-120B Fixer]
        Fixer --> Exec
    end
    ParallelTasks --> Synthetic
    Synthetic --> Render[LaTeX Formatter]
    Render --> SSE[SSE Stream]
    SSE --> User
    subgraph Observability["System Monitoring"]
        Tracing[LangSmith Trace]
        Memory[Session Memory Tracker]
        RateLimit[Token/Request Limiter]
    end
    API -.-> Observability
    Executor -.-> Observability
```
## Fault Tolerance and Error Handling
Pochi is built with a "Resilience-First" mindset, ensuring that the system remains operational and provides accurate results even when facing API failures or ambiguous inputs.
### 1. Model Redundancy and Failover

- OCR Failover: If the primary vision model (Maverick) encounters rate limits or internal errors, the system automatically redirects requests to a high-speed fallback model (Scout).
- Model Switching: The `ModelManager` dynamically monitors model health and rate limits (RPM/TPM), performing seamless transitions between tiers without task interruption.
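The failover path described above can be sketched as a thin wrapper that catches rate-limit errors from the primary model and retries on the fallback. Everything here is a hedged illustration: the class shape, the exception type, and the stub callables are assumptions, not the project's actual `ModelManager` API.

```python
class RateLimitError(Exception):
    """Stand-in for the 429-style error a provider SDK would raise."""

class ModelManager:
    # Minimal failover sketch: try the primary, fall back on rate limits.
    def __init__(self, primary, fallback):
        self.primary, self.fallback = primary, fallback

    def call(self, prompt):
        try:
            return self.primary(prompt)
        except RateLimitError:
            # Primary hit its RPM/TPM cap; retry transparently on the
            # fallback tier so the task is never interrupted.
            return self.fallback(prompt)

def maverick(prompt):           # hypothetical primary OCR model, rate-limited
    raise RateLimitError("429 Too Many Requests")

def scout(prompt):              # hypothetical high-speed fallback
    return f"scout:{prompt}"

manager = ModelManager(maverick, scout)
result = manager.call("extract equation")
```

A production version would also track per-model health over time instead of deciding per call, but the control flow is the same.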
### 2. "Self-Healing" Algorithmic Solving

- Recursive Debugging: The Python Code Engine is not a simple "one-shot" executor. If generated code fails (`SyntaxError`, `ZeroDivisionError`, etc.), the system sends the error log back to the `CodeFixer` agent.
- Fix Loop: The system allows multiple recursive fix attempts, in which the agent analyzes the stack trace and rewrites the logic until execution succeeds.
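The fix loop above amounts to a bounded retry around code execution, feeding each failure back to a fixer. This is a minimal sketch under stated assumptions: the attempt limit, the `run_code`/`fix_code` names, and the string-patching "fixer" are illustrative stand-ins for the sandboxed executor and the GPT-OSS-120B agent.

```python
MAX_FIX_ATTEMPTS = 3  # assumed bound; the real limit is configuration-dependent

def run_code(code: str):
    # The real engine runs generated code in a sandbox; plain exec() is
    # used here only to make the sketch self-contained.
    namespace = {}
    exec(code, namespace)
    return namespace["result"]

def fix_code(code: str, error: Exception) -> str:
    # Stand-in for the CodeFixer agent: a real fixer rewrites the logic
    # from the stack trace. Here we just patch one known-bad expression.
    return code.replace("1 / 0", "1 / 2")

def solve_with_healing(code: str):
    for _ in range(MAX_FIX_ATTEMPTS):
        try:
            return run_code(code)
        except Exception as err:
            code = fix_code(code, err)  # feed the error back and retry
    raise RuntimeError("could not repair generated code")

value = solve_with_healing("result = 1 / 0")  # repaired to 1 / 2 on retry
```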
### 3. Graceful Degradation of Tools
- Wolfram-to-Code Fallback: Symbolic math is the gold standard for precision. However, if the Wolfram Alpha API exceeds its 2000-req/month quota or times out, the system automatically shifts the problem to the Algorithmic Branch for a numerical solve.
- Synthesis Resilience: If the Synthetic Agent fails to format the final response (e.g., due to context length), the system performs a "raw-safe" synthesis, delivering the tool results directly to the user to ensure no data is lost.
### 4. Robust State and Parsing
- Durable IO: The background agent task saves intermediate results to the database immediately upon generation. This ensures that even if a client disconnects during a 20-second calculation, the result is waiting for them upon refresh.
- JSON Recovery: LLMs occasionally return malformed JSON. The `Planner` includes multi-stage recovery logic that uses regex and string normalization to repair broken JSON blocks, preventing system crashes on minor formatting errors.
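A multi-stage JSON recovery of this kind typically tries strict parsing first and only then applies targeted repairs. The two repairs shown here (stripping markdown fences, removing trailing commas) are common LLM defects chosen for illustration; the Planner's actual normalization rules are not documented here.

```python
import json
import re

def recover_json(raw: str):
    """Best-effort repair of common LLM JSON defects (illustrative sketch)."""
    try:
        return json.loads(raw)           # stage 1: strict parse
    except json.JSONDecodeError:
        pass
    # Stage 2: strip markdown code fences the model sometimes wraps output in.
    cleaned = re.sub(r"^```(?:json)?|```$", "", raw.strip(), flags=re.MULTILINE)
    # Stage 3: remove trailing commas before a closing brace or bracket.
    cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
    return json.loads(cleaned)

plan = recover_json('```json\n{"tasks": [1, 2, 3],}\n```')
```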
### 5. Memory and Resource Safety

- Context Protection: The `SessionMemoryTracker` proactively blocks requests that would exceed the 256K token limit, preventing "half-baked" or truncated responses from the LLM.
- Rate Limit Resilience: Integrated backoff and retry mechanisms for all third-party API calls (Groq, Wolfram, LangSmith).
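Proactive context protection can be sketched as an admission check that rejects a request before it is sent, rather than letting the LLM truncate mid-response. The class name mirrors the tracker mentioned above, but its interface here is an assumption, not the project's real API.

```python
CONTEXT_LIMIT = 256_000  # 256K token window, per the spec above

class SessionMemoryTracker:
    # Illustrative admission control on per-session token usage.
    def __init__(self, limit: int = CONTEXT_LIMIT):
        self.limit = limit
        self.used = 0

    def admit(self, request_tokens: int) -> bool:
        # Block up front if the request would overflow the context window.
        if self.used + request_tokens > self.limit:
            return False
        self.used += request_tokens
        return True

tracker = SessionMemoryTracker()
ok = tracker.admit(250_000)      # fits within the window
blocked = tracker.admit(10_000)  # rejected: would exceed 256K
```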
## Model Distribution and Specialization
| Component | Model Identifier | Specialization |
|---|---|---|
| OCR (Primary) | Llama-4 Maverick | Multi-modal mathematical extraction. |
| OCR (Fallback) | Llama-4 Scout | High-speed redundancy for simple OCR. |
| Planner & Synthesis | Kimi K2-Instruct | 256K Context, complex reasoning, and pedagogy. |
| Code Generation | Qwen3-32B-Instruct | Optimized for Pythonic mathematical logic. |
| Code Rectification | GPT-OSS-120B | Deep-context code debugging and error resolution. |
| Symbolic Logic | Wolfram Alpha | Deterministic symbolic computation (2000 req/mo). |
## Project Structure

```
.
├── backend/              # FastAPI Application & LangGraph Agents
│   ├── agent/            # Multi-agent logic (Nodes, Graph, State)
│   ├── database/         # SQLite models and migrations
│   ├── tools/            # Symbolic & Algorithmic executor tools
│   └── utils/            # Memory tracking, rate limiting, tracing
├── frontend/             # React (Vite) Application
│   ├── src/
│   │   ├── components/   # UI components (Math rendering, Tour)
│   │   └── App.jsx       # Main application logic
├── Dockerfile            # Containerized deployment
├── pyproject.toml        # Python dependencies & metadata
└── README.md             # Technical documentation
```
## Mathematics & Computation Stack
Pochi utilizes a heavy-duty scientific stack for high-precision calculations:
- Symbolic: SymPy, Wolfram Alpha API
- Numerical: NumPy, SciPy, mpmath
- Optimization: CVXPY, PuLP
- Visuals: Matplotlib, Seaborn, Plotly
- Data: Pandas, Polars, Statsmodels
## Local Deployment
### Environment Configuration

Create a `.env` file in the root directory:

```env
GROQ_API_KEY=your_key_here
WOLFRAM_ALPHA_APP_ID=your_id_here
LANGSMITH_API_KEY=your_key_here   # optional, for tracing
LANGSMITH_PROJECT=calculus-chatbot
LANGSMITH_TRACING=true
```
### Backend Infrastructure

- Initialize the virtual environment: `uv venv && source .venv/bin/activate`
- Install dependencies: `uv pip install -r requirements.txt`
- Launch the service: `python main.py`
### Frontend Application

- Navigate to the workspace: `cd frontend`
- Install packages: `npm install`
- Start the development server: `npm run dev`
### Docker Deployment

Build and run the entire stack:

```bash
docker build -t pochi-app .
docker run -p 7860:7860 -v ./data:/data --env-file .env pochi-app
```
## API Documentation

The backend service automatically generates interactive API documentation.

- Swagger UI: `http://localhost:7860/docs`
- ReDoc: `http://localhost:7860/redoc`
## Advanced Customization

### Prompt Engineering

The system's persona and logic are defined in `backend/agent/prompts.py`:

- `GUARD_PROMPT`: Defines the "Pochi" persona and strict safety guardrails.
- `TOT_PROMPT`: Enforces the Tree-of-Thought reasoning process (Plan -> Solve -> Verify).
- `PLANNER_SYSTEM_PROMPT`: Controls the multi-modal decomposition logic.
Developers can modify these constants to adjust the chatbot's tone or reasoning strictness.
## Security & Privacy Guidelines

- Session Isolation: User sessions are logically isolated in the database (`conversations` table) and the memory cache.
- Transient Data: Uploaded images are processed in memory (or temporary storage) and converted to base64/embeddings; they are never permanently retained on disk.
## Known Limitations
- Multimodal Cap: Supports a maximum of 5 distinct images per query to manage context window limits.
- Symbolic Rate Limit: Wolfram Alpha requests are capped at 2000/month. Heavy usage will degrade to the numerical Python solver (Qwen3).
- Latency: Complex multi-step reasoning (Plan -> Code -> Fix -> Synthesize) may take 15-30s to fully resolve.
## AI Model Rate Limits
The system enforces strict rate limits to ensure stability and usage fairness:
| Model ID | RPM (Req/Min) | RPD (Req/Day) | TPM (Tokens/Min) | TPD (Tokens/Day) |
|---|---|---|---|---|
| Kimi K2 Instruct | 60 | 1,000 | 10,000 | 300,000 |
| Llama-4 Maverick | 30 | 1,000 | 6,000 | 500,000 |
| Llama-4 Scout | 30 | 1,000 | 30,000 | 500,000 |
| Qwen3-32B | 60 | 1,000 | 6,000 | 500,000 |
| GPT-OSS-120B | 30 | 1,000 | 8,000 | 200,000 |
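One common way to enforce an RPM cap like those in the table is a sliding one-minute window of request timestamps. This is a hedged sketch of that technique, not the project's actual limiter; the class name and interface are illustrative.

```python
from collections import deque

class RequestLimiter:
    # Sliding-window RPM limiter: at most `rpm` requests per 60 seconds.
    def __init__(self, rpm: int):
        self.rpm = rpm
        self.calls = deque()  # timestamps of admitted requests

    def allow(self, now: float) -> bool:
        # Evict timestamps that have fallen out of the one-minute window.
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()
        if len(self.calls) < self.rpm:
            self.calls.append(now)
            return True
        return False

limiter = RequestLimiter(rpm=2)   # tiny cap so the sketch is easy to follow
first = limiter.allow(0.0)        # admitted
second = limiter.allow(0.0)       # admitted
third = limiter.allow(0.0)        # rejected: window already holds 2 requests
later = limiter.allow(61.0)       # admitted: the window has rolled past
```

TPM/TPD enforcement works the same way, with token counts summed over the window instead of request counts.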
## API Usage Examples
### Natural Language Calculus

"Find the derivative of f(x) = x^2 + 3x + 2"
### Multimodal Math Analysis (Image Support)

[Upload 2 images of a calculus problem] "Solve the problem in the attached images"
### Algorithmic Mathematical Tasks

"Use Python code to find the first 100 prime numbers and explain the Sieve of Eratosthenes algorithm."
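Responses arrive over an SSE stream (per the streaming-adoption metric above). A client consumes such a stream by collecting the payload of each `data:` line; the event format shown here is the generic SSE convention, not a documented Pochi wire format, so treat it as an assumption.

```python
def parse_sse(stream_lines):
    # Concatenate the payloads of "data:" lines into the full response text.
    # Non-data lines (event names, keep-alive blanks) are skipped.
    chunks = []
    for line in stream_lines:
        if line.startswith("data: "):
            chunks.append(line[len("data: "):])
    return "".join(chunks)

# Hypothetical fragment of a streamed derivative answer:
sample = ["data: f'(x) ", "event: token", "data: = 2x + 3", ""]
answer = parse_sse(sample)
```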
## Troubleshooting

| Issue | Possible Cause | Solution |
|---|---|---|
| 413 Payload Too Large | Uploading images > 10MB total. | Reduce image size or upload fewer files per turn. |
| 429 Too Many Requests | Exceeded Wolfram or LLM rate limits. | Wait 60s or switch to a different model tier in `.env`. |
| LangSmith Error | Invalid or missing API key. | Set `LANGSMITH_TRACING=false` in `.env` to disable. |
| Docker Build Fail | Network timeout on `uv sync`. | Check internet connection or increase Docker memory limit. |
## Contributing

We welcome contributions! Please follow these steps:

- Fork the repository.
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request.
## License
Distributed under the MIT License. See LICENSE for more information.
## Acknowledgments
We deeply appreciate the open-source community and the providers of the powerful technologies that make Pochi possible:
- AI & Logic Providers:
- LangChain & LangGraph: For the robust orchestration framework.
- Groq: For ultra-low latency Llama inference.
- Alibaba: For the Qwen model.
- OpenAI: For the GPT-OSS model.
- Moonshot AI: For the Kimi reasoning model.
- Meta AI: For the Llama vision models.
- Wolfram Alpha: For the symbolic computation engine.
- Frontend Ecosystem:
- React & Vite: For the blazing fast UI.
- Lucide React: For the beautiful icon set.
