---
title: Calculus Agent
emoji: 🌌
colorFrom: gray
colorTo: gray
sdk: docker
pinned: false
license: mit
short_description: Multi-Agent Calculus Orchestration System
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Pochi 4.o: Multi-Agent Calculus Orchestration System
Pochi is a high-performance, asynchronous AI platform specialized in solving complex calculus problems. It utilizes a stateful multi-agent system built on LangGraph, coordinating multiple specialized LLMs and symbolic computation engines to achieve pedagogical excellence and mathematical precision.
## Live Demo
| Platform | URL |
|---|---|
| Hugging Face | Visit Pochi on Hugging Face |
## Project Achievements & Performance
Pochi's performance and reliability are continuously monitored via LangSmith. The following data highlights the system's operational excellence and high-speed reasoning capabilities.
### System Health & Usage
| Metric | Value | Description |
|---|---|---|
| Total Runs | 476 | Cumulative successful execution cycles. |
| Total Tokens | 1.86M | Aggregate token throughput across all agents. |
| Median Tokens | 2,846 | Median context size per solver request. |
| Success Rate | 99% | System resilience against API and execution errors. |
| Streaming Adoption | 99% | Percentage of responses delivered via SSE for real-time feedback. |
### Latency Performance
Latency varies significantly based on task complexity (e.g., Simple symbolic math vs. Multi-image OCR + Recursive code fixing).
| Stage | P50 (Median) | P99 (Tail) |
|---|---|---|
| Time to First Token (TTFT) | 0.53s | 5.30s |
| End-to-End Latency | 1.51s | 36.95s |
**Analysis:**
- Responsiveness: A P50 TTFT of 0.53s ensures that users perceive an "instant" start to the response, crucial for engagement.
- Efficiency: The P50 latency of 1.51s for full calculus resolution demonstrates the high-performance nature of the asynchronous multi-agent orchestration.
- Complexity Buffer: The P99 latency (~37s) accounts for the most intensive "Self-Healing" loops, where the system may perform multiple recursive code fixes or deep vision analysis.
## Highlight Features
- Multi-Agent Orchestration: Stateful DAG-based workflow using LangGraph for complex, multi-stage reasoning.
- Parallel Sub-problem Processing: Intelligent decomposition of complex queries into independent atomic tasks executed in parallel.
- Multimodal OCR Intelligence: High-fidelity vision extraction from up to 5 concurrent images with specialized math support.
- Hybrid Solving Engine: Seamlessly combines symbolic precision (Wolfram Alpha) with algorithmic logic (Python Executor).
- Intelligent Long-Term Memory: Massive 256K token context window with proactive memory management and token tracking.
- Premium UI/UX: Modern glassmorphism design with reactive animations, interactive tours, and native LaTeX rendering.
## System Architecture and Pipeline
The system is engineered as a directed acyclic graph (DAG) of specialized nodes, managed by a central orchestrator that maintains a consistent state throughout the conversation turn.
### The Execution Pipeline
- Vision Ingestion (OCR Agent): Processes up to 5 concurrent image inputs. Utilizing Llama-4 Maverick, it extracts raw text and LaTeX-formatted mathematical expressions.
- Strategic Decomposition (Planner): Analyzes user intent and OCR data to generate a vectorized execution plan. It decomposes composite problems into independent atomic tasks (JSON defined).
- Parallel Orchestration (Executor): The core processing engine that spawns asynchronous execution threads for each atomic task:
  - Symbolic Branch: Direct interface with the Wolfram Alpha API for verified algebraic and calculus manipulation.
  - Algorithmic Branch: Python Code Engine (Qwen3-32B) for numerical methods or complex multi-step logic.
  - Heuristic Branch: Direct LLM solving for theoretical or conceptual queries.
- Self-Correction Loop (Code Engine): If the Algorithmic Branch encounters execution errors, a specialized CodeFixer (GPT-OSS-120B) performs recursive debugging and code modification.
- Contextual Synthesis (Synthetic Agent): Aggregates atomic results, resolves inter-task dependencies, and consults conversation history to produce a structured, pedagogical response.
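The parallel orchestration step above can be sketched as a minimal `asyncio` fan-out. This is an illustrative sketch, not the project's actual code: the task shape, branch names, and solver stubs are all assumptions standing in for the real Wolfram, code-engine, and LLM calls.

```python
import asyncio

# Hypothetical atomic tasks as the Planner might emit them (illustrative shape).
TASKS = [
    {"id": 1, "branch": "symbolic", "query": "integrate x^2"},
    {"id": 2, "branch": "algorithmic", "query": "sum of first 100 primes"},
    {"id": 3, "branch": "heuristic", "query": "explain the chain rule"},
]

async def solve_symbolic(q):    # stands in for the Wolfram Alpha call
    return f"wolfram:{q}"

async def solve_algorithmic(q):  # stands in for the Python code engine
    return f"python:{q}"

async def solve_heuristic(q):   # stands in for direct LLM solving
    return f"llm:{q}"

BRANCHES = {
    "symbolic": solve_symbolic,
    "algorithmic": solve_algorithmic,
    "heuristic": solve_heuristic,
}

async def execute_plan(tasks):
    # Spawn one coroutine per atomic task and await them all concurrently.
    coros = [BRANCHES[t["branch"]](t["query"]) for t in tasks]
    outputs = await asyncio.gather(*coros)
    return dict(zip((t["id"] for t in tasks), outputs))

results = asyncio.run(execute_plan(TASKS))
```

Because the branches run under `asyncio.gather`, total latency is bounded by the slowest atomic task rather than the sum of all of them.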
### Technical Workflow Diagram

```mermaid
graph TD
    User([User Request]) --> API[FastAPI Entry]
    API --> State[Agent State Initialization]
    State --> OCR{OCR Node}
    OCR -- Multi-Image --> Vision[Llama-4 Maverick]
    Vision --> Planner[Planner Node: Kimi K2]
    OCR -- Text Only --> Planner
    Planner --> Plan{Execution Plan}
    Plan -- All Direct --> Synthetic[Synthetic Agent]
    Plan -- Tool Required --> Executor[Parallel Executor Node]
    subgraph ParallelTasks["Async Task Orchestration"]
        Executor --> Wolfram[Wolfram Alpha API]
        Executor --> Code[Qwen3 Code Gen]
        Code --> Exec[Python Executor]
        Exec -- Error --> Fixer[GPT-OSS-120B Fixer]
        Fixer --> Exec
    end
    ParallelTasks --> Synthetic
    Synthetic --> Render[LaTeX Formatter]
    Render --> SSE[SSE Stream]
    SSE --> User
    subgraph Observability["System Monitoring"]
        Tracing[LangSmith Trace]
        Memory[Session Memory Tracker]
        RateLimit[Token/Request Limiter]
    end
    API -.-> Observability
    Executor -.-> Observability
```
## Fault Tolerance and Error Handling
Pochi is built with a "Resilience-First" mindset, ensuring that the system remains operational and provides accurate results even when facing API failures or ambiguous inputs.
### 1. Model Redundancy and Failover

- OCR Failover: If the primary vision model (Maverick) encounters rate limits or internal errors, the system automatically redirects requests to a high-speed fallback model (Scout).
- Model Switching: The `ModelManager` dynamically monitors model health and rate limits (RPM/TPM), performing seamless transitions between tiers without task interruption.
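The failover path described above can be sketched as a thin wrapper that catches rate-limit errors from the primary model and retries on the fallback. Everything here is a hedged illustration: the class shape, the exception type, and the stub callables are assumptions, not the project's actual `ModelManager` API.

```python
class RateLimitError(Exception):
    """Stand-in for the 429-style error a provider SDK would raise."""

class ModelManager:
    # Minimal failover sketch: try the primary, fall back on rate limits.
    def __init__(self, primary, fallback):
        self.primary, self.fallback = primary, fallback

    def call(self, prompt):
        try:
            return self.primary(prompt)
        except RateLimitError:
            # Primary hit its RPM/TPM cap; retry transparently on the
            # fallback tier so the task is never interrupted.
            return self.fallback(prompt)

def maverick(prompt):           # hypothetical primary OCR model, rate-limited
    raise RateLimitError("429 Too Many Requests")

def scout(prompt):              # hypothetical high-speed fallback
    return f"scout:{prompt}"

manager = ModelManager(maverick, scout)
result = manager.call("extract equation")
```

A production version would also track per-model health over time instead of deciding per call, but the control flow is the same.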
### 2. "Self-Healing" Algorithmic Solving

- Recursive Debugging: The Python Code Engine is not a simple "one-shot" executor. If generated code fails (`SyntaxError`, `ZeroDivisionError`, etc.), the system sends the error log back to the `CodeFixer` agent.
- Fix Loop: The system allows multiple recursive fix attempts, in which the agent analyzes the stack trace and rewrites the logic until execution succeeds.
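The fix loop above amounts to a bounded retry around code execution, feeding each failure back to a fixer. This is a minimal sketch under stated assumptions: the attempt limit, the `run_code`/`fix_code` names, and the string-patching "fixer" are illustrative stand-ins for the sandboxed executor and the GPT-OSS-120B agent.

```python
MAX_FIX_ATTEMPTS = 3  # assumed bound; the real limit is configuration-dependent

def run_code(code: str):
    # The real engine runs generated code in a sandbox; plain exec() is
    # used here only to make the sketch self-contained.
    namespace = {}
    exec(code, namespace)
    return namespace["result"]

def fix_code(code: str, error: Exception) -> str:
    # Stand-in for the CodeFixer agent: a real fixer rewrites the logic
    # from the stack trace. Here we just patch one known-bad expression.
    return code.replace("1 / 0", "1 / 2")

def solve_with_healing(code: str):
    for _ in range(MAX_FIX_ATTEMPTS):
        try:
            return run_code(code)
        except Exception as err:
            code = fix_code(code, err)  # feed the error back and retry
    raise RuntimeError("could not repair generated code")

value = solve_with_healing("result = 1 / 0")  # repaired to 1 / 2 on retry
```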
### 3. Graceful Degradation of Tools
- Wolfram-to-Code Fallback: Symbolic math is the gold standard for precision. However, if the Wolfram Alpha API exceeds its 2000-req/month quota or times out, the system automatically shifts the problem to the Algorithmic Branch for a numerical solve.
- Synthesis Resilience: If the Synthetic Agent fails to format the final response (e.g., due to context length), the system performs a "raw-safe" synthesis, delivering the tool results directly to the user to ensure no data is lost.
### 4. Robust State and Parsing
- Durable IO: The background agent task saves intermediate results to the database immediately upon generation. This ensures that even if a client disconnects during a 20-second calculation, the result is waiting for them upon refresh.
- JSON Recovery: LLMs occasionally return malformed JSON. The `Planner` includes multi-stage recovery logic that uses regex and string normalization to repair broken JSON blocks, preventing system crashes on minor formatting errors.
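A multi-stage JSON recovery of this kind typically tries strict parsing first and only then applies targeted repairs. The two repairs shown here (stripping markdown fences, removing trailing commas) are common LLM defects chosen for illustration; the Planner's actual normalization rules are not documented here.

```python
import json
import re

def recover_json(raw: str):
    """Best-effort repair of common LLM JSON defects (illustrative sketch)."""
    try:
        return json.loads(raw)           # stage 1: strict parse
    except json.JSONDecodeError:
        pass
    # Stage 2: strip markdown code fences the model sometimes wraps output in.
    cleaned = re.sub(r"^```(?:json)?|```$", "", raw.strip(), flags=re.MULTILINE)
    # Stage 3: remove trailing commas before a closing brace or bracket.
    cleaned = re.sub(r",\s*([}\]])", r"\1", cleaned)
    return json.loads(cleaned)

plan = recover_json('```json\n{"tasks": [1, 2, 3],}\n```')
```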
### 5. Memory and Resource Safety

- Context Protection: The `SessionMemoryTracker` proactively blocks requests that would exceed the 256K token limit, preventing "half-baked" or truncated responses from the LLM.
- Rate Limit Resilience: Integrated backoff and retry mechanisms for all third-party API calls (Groq, Wolfram, LangSmith).
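Proactive context protection can be sketched as an admission check that rejects a request before it is sent, rather than letting the LLM truncate mid-response. The class name mirrors the tracker mentioned above, but its interface here is an assumption, not the project's real API.

```python
CONTEXT_LIMIT = 256_000  # 256K token window, per the spec above

class SessionMemoryTracker:
    # Illustrative admission control on per-session token usage.
    def __init__(self, limit: int = CONTEXT_LIMIT):
        self.limit = limit
        self.used = 0

    def admit(self, request_tokens: int) -> bool:
        # Block up front if the request would overflow the context window.
        if self.used + request_tokens > self.limit:
            return False
        self.used += request_tokens
        return True

tracker = SessionMemoryTracker()
ok = tracker.admit(250_000)      # fits within the window
blocked = tracker.admit(10_000)  # rejected: would exceed 256K
```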
## Model Distribution and Specialization
| Component | Model Identifier | Specialization |
|---|---|---|
| OCR (Primary) | Llama-4 Maverick | Multi-modal mathematical extraction. |
| OCR (Fallback) | Llama-4 Scout | High-speed redundancy for simple OCR. |
| Planner & Synthesis | Kimi K2-Instruct | 256K Context, complex reasoning, and pedagogy. |
| Code Generation | Qwen3-32B-Instruct | Optimized for Pythonic mathematical logic. |
| Code Rectification | GPT-OSS-120B | Deep-context code debugging and error resolution. |
| Symbolic Logic | Wolfram Alpha | Deterministic symbolic computation (2000 req/mo). |
## Project Structure

```
.
├── backend/              # FastAPI Application & LangGraph Agents
│   ├── agent/            # Multi-agent logic (Nodes, Graph, State)
│   ├── database/         # SQLite models and migrations
│   ├── tools/            # Symbolic & Algorithmic executor tools
│   └── utils/            # Memory tracking, rate limiting, tracing
├── frontend/             # React (Vite) Application
│   ├── src/
│   │   ├── components/   # UI components (Math rendering, Tour)
│   │   └── App.jsx       # Main application logic
├── Dockerfile            # Containerized deployment
├── pyproject.toml        # Python dependencies & metadata
└── README.md             # Technical documentation
```
## Mathematics & Computation Stack
Pochi utilizes a heavy-duty scientific stack for high-precision calculations:
- Symbolic: SymPy, Wolfram Alpha API
- Numerical: NumPy, SciPy, mpmath
- Optimization: CVXPY, PuLP
- Visuals: Matplotlib, Seaborn, Plotly
- Data: Pandas, Polars, Statsmodels
## Local Deployment
### Environment Configuration

Create a `.env` file in the root directory:

```env
GROQ_API_KEY=your_key_here
WOLFRAM_ALPHA_APP_ID=your_id_here
LANGSMITH_API_KEY=your_key_here   # optional, for tracing
LANGSMITH_PROJECT=calculus-chatbot
LANGSMITH_TRACING=true
```
### Backend Infrastructure

- Initialize the virtual environment: `uv venv && source .venv/bin/activate`
- Install dependencies: `uv pip install -r requirements.txt`
- Launch the service: `python main.py`
### Frontend Application

- Navigate to the workspace: `cd frontend`
- Install packages: `npm install`
- Start the development server: `npm run dev`
### Docker Deployment

Build and run the entire stack:

```bash
docker build -t pochi-app .
docker run -p 7860:7860 -v ./data:/data --env-file .env pochi-app
```
## API Documentation

The backend service automatically generates interactive API documentation.

- Swagger UI: `http://localhost:7860/docs`
- ReDoc: `http://localhost:7860/redoc`
## Advanced Customization

### Prompt Engineering

The system's persona and logic are defined in `backend/agent/prompts.py`:

- `GUARD_PROMPT`: Defines the "Pochi" persona and strict safety guardrails.
- `TOT_PROMPT`: Enforces the Tree-of-Thought reasoning process (Plan -> Solve -> Verify).
- `PLANNER_SYSTEM_PROMPT`: Controls the multi-modal decomposition logic.
Developers can modify these constants to adjust the chatbot's tone or reasoning strictness.
## Security & Privacy Guidelines

- Session Isolation: User sessions are logically isolated in the database (`conversations` table) and the memory cache.
- Transient Data: Uploaded images are processed in memory (or temporary storage) and converted to base64/embeddings; they are never permanently retained on disk.
## Known Limitations
- Multimodal Cap: Supports a maximum of 5 distinct images per query to manage context window limits.
- Symbolic Rate Limit: Wolfram Alpha requests are capped at 2000/month. Heavy usage will degrade to the numerical Python solver (Qwen3).
- Latency: Complex multi-step reasoning (Plan -> Code -> Fix -> Synthesize) may take 15-30s to fully resolve.
## AI Model Rate Limits
The system enforces strict rate limits to ensure stability and usage fairness:
| Model ID | RPM (Req/Min) | RPD (Req/Day) | TPM (Tokens/Min) | TPD (Tokens/Day) |
|---|---|---|---|---|
| Kimi K2 Instruct | 60 | 1,000 | 10,000 | 300,000 |
| Llama-4 Maverick | 30 | 1,000 | 6,000 | 500,000 |
| Llama-4 Scout | 30 | 1,000 | 30,000 | 500,000 |
| Qwen3-32B | 60 | 1,000 | 6,000 | 500,000 |
| GPT-OSS-120B | 30 | 1,000 | 8,000 | 200,000 |
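One common way to enforce an RPM cap like those in the table is a sliding one-minute window of request timestamps. This is a hedged sketch of that technique, not the project's actual limiter; the class name and interface are illustrative.

```python
from collections import deque

class RequestLimiter:
    # Sliding-window RPM limiter: at most `rpm` requests per 60 seconds.
    def __init__(self, rpm: int):
        self.rpm = rpm
        self.calls = deque()  # timestamps of admitted requests

    def allow(self, now: float) -> bool:
        # Evict timestamps that have fallen out of the one-minute window.
        while self.calls and now - self.calls[0] >= 60:
            self.calls.popleft()
        if len(self.calls) < self.rpm:
            self.calls.append(now)
            return True
        return False

limiter = RequestLimiter(rpm=2)   # tiny cap so the sketch is easy to follow
first = limiter.allow(0.0)        # admitted
second = limiter.allow(0.0)       # admitted
third = limiter.allow(0.0)        # rejected: window already holds 2 requests
later = limiter.allow(61.0)       # admitted: the window has rolled past
```

TPM/TPD enforcement works the same way, with token counts summed over the window instead of request counts.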
## API Usage Examples
### Natural Language Calculus

"Find the derivative of f(x) = x^2 + 3x + 2"
### Multimodal Math Analysis (Image Support)

[Upload 2 images of a calculus problem] "Solve the problem in the attached images"
### Algorithmic Mathematical Tasks

"Use Python code to find the first 100 prime numbers and explain the Sieve of Eratosthenes algorithm."
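Responses arrive over an SSE stream (per the streaming-adoption metric above). A client consumes such a stream by collecting the payload of each `data:` line; the event format shown here is the generic SSE convention, not a documented Pochi wire format, so treat it as an assumption.

```python
def parse_sse(stream_lines):
    # Concatenate the payloads of "data:" lines into the full response text.
    # Non-data lines (event names, keep-alive blanks) are skipped.
    chunks = []
    for line in stream_lines:
        if line.startswith("data: "):
            chunks.append(line[len("data: "):])
    return "".join(chunks)

# Hypothetical fragment of a streamed derivative answer:
sample = ["data: f'(x) ", "event: token", "data: = 2x + 3", ""]
answer = parse_sse(sample)
```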
## Troubleshooting

| Issue | Possible Cause | Solution |
|---|---|---|
| 413 Payload Too Large | Uploading images > 10MB total. | Reduce image size or upload fewer files per turn. |
| 429 Too Many Requests | Exceeded Wolfram or LLM rate limits. | Wait 60s or switch to a different model tier in `.env`. |
| LangSmith Error | Invalid or missing API key. | Set `LANGSMITH_TRACING=false` in `.env` to disable. |
| Docker Build Fail | Network timeout on `uv sync`. | Check internet connection or increase Docker memory limit. |
## Contributing

We welcome contributions! Please follow these steps:

- Fork the repository.
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request.
## License
Distributed under the MIT License. See LICENSE for more information.
## Acknowledgments
We deeply appreciate the open-source community and the providers of the powerful technologies that make Pochi possible:
- AI & Logic Providers:
- LangChain & LangGraph: For the robust orchestration framework.
- Groq: For ultra-low latency Llama inference.
- Alibaba: For the Qwen model.
- OpenAI: For the GPT-OSS model.
- Moonshot AI: For the Kimi reasoning model.
- Meta AI: For the Llama vision models.
- Wolfram Alpha: For the symbolic computation engine.
- Frontend Ecosystem:
- React & Vite: For the blazing fast UI.
- Lucide React: For the beautiful icon set.
