---
title: Calculus Agent
emoji: 🌌
colorFrom: gray
colorTo: gray
sdk: docker
pinned: false
license: mit
short_description: Multi-Agent Calculus Orchestration System
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Pochi 4.o: Multi-Agent Calculus Orchestration System

Pochi is a high-performance, asynchronous AI platform specialized in solving complex calculus problems. It uses a stateful multi-agent system built on LangGraph, coordinating multiple specialized LLMs and symbolic computation engines for pedagogical clarity and mathematical precision.

## Live Demo

| Platform | URL |
| :--- | :--- |
| **Hugging Face** | [Visit Pochi on Hugging Face](https://huggingface.co/spaces/baeGil/calculus-agent) |

## Project Achievements & Performance

Pochi's performance and reliability are continuously monitored via LangSmith. The data below highlights the system's operational reliability and high-speed reasoning.

![LangSmith Traces](images/traces.png)

### System Health & Usage

| Metric | Value | Description |
| :--- | :--- | :--- |
| **Total Runs** | 476 | Cumulative successful execution cycles. |
| **Total Tokens** | 1.86M | Aggregate token throughput across all agents. |
| **Median Tokens** | 2,846 | Median context size per solver request. |
| **Success Rate** | 99% | System resilience against API and execution errors. |
| **Streaming Adoption** | 99% | Percentage of responses delivered via SSE for real-time feedback. |

### Latency Performance

> Latency varies significantly with task complexity (e.g., simple symbolic math vs. multi-image OCR plus recursive code fixing).

| Stage | P50 (Median) | P99 (Tail) |
| :--- | :---: | :---: |
| **Time to First Token (TTFT)** | 0.53s | 5.30s |
| **End-to-End Latency** | 1.51s | 36.95s |

**Analysis**:

- **Responsiveness**: A P50 TTFT of **0.53s** means users perceive an "instant" start to the response, which is crucial for engagement.
- **Efficiency**: A P50 end-to-end latency of **1.51s** for full calculus resolution demonstrates the efficiency of the asynchronous multi-agent orchestration.
- **Complexity Buffer**: The P99 latency (**~37s**) accounts for the most intensive "Self-Healing" loops, where the system may perform multiple recursive code fixes or deep vision analysis.

## Highlight Features

- **Multi-Agent Orchestration**: Stateful DAG-based workflow using LangGraph for complex, multi-stage reasoning.
- **Parallel Sub-problem Processing**: Intelligent decomposition of complex queries into independent atomic tasks executed in parallel.
- **Multimodal OCR Intelligence**: High-fidelity vision extraction from up to 5 concurrent images, with specialized math support.
- **Hybrid Solving Engine**: Seamlessly combines symbolic precision (Wolfram Alpha) with algorithmic logic (Python Executor).
- **Intelligent Long-Term Memory**: 256K-token context window with proactive memory management and token tracking.
- **Premium UI/UX**: Modern glassmorphism design with reactive animations, interactive tours, and native LaTeX rendering.

## System Architecture and Pipeline

The system is engineered as a directed acyclic graph (DAG) of specialized nodes, managed by a central orchestrator that maintains a consistent state throughout each conversation turn.

### The Execution Pipeline

1. **Vision Ingestion (OCR Agent)**: Processes up to 5 concurrent image inputs. Using Llama-4 Maverick, it extracts raw text and LaTeX-formatted mathematical expressions.
2. **Strategic Decomposition (Planner)**: Analyzes user intent and OCR data to generate a structured execution plan, decomposing composite problems into independent atomic tasks (defined as JSON).
3. **Parallel Orchestration (Executor)**: The core processing engine, which spawns an asynchronous execution task for each atomic sub-problem:
   - **Symbolic Branch**: Direct interface with the Wolfram Alpha API for verified algebraic and calculus manipulation.
   - **Algorithmic Branch**: Python Code Engine (Qwen3-32B) for numerical methods or complex multi-step logic.
   - **Heuristic Branch**: Direct LLM solving for theoretical or conceptual queries.
4. **Self-Correction Loop (Code Engine)**: If the Algorithmic Branch encounters execution errors, a specialized CodeFixer (GPT-OSS-120B) performs recursive debugging and code modification.
5. **Contextual Synthesis (Synthetic Agent)**: Aggregates atomic results, resolves inter-task dependencies, and consults conversation history to produce a structured, pedagogical response.

### Technical Workflow Diagram

```mermaid
graph TD
    User([User Request]) --> API[FastAPI Entry]
    API --> State[Agent State Initialization]
    State --> OCR{OCR Node}
    OCR -- Multi-Image --> Vision[Llama-4 Maverick]
    Vision --> Planner[Planner Node: Kimi K2]
    OCR -- Text Only --> Planner
    Planner --> Plan{Execution Plan}
    Plan -- All Direct --> Synthetic[Synthetic Agent]
    Plan -- Tool Required --> Executor[Parallel Executor Node]

    subgraph ParallelTasks["Async Task Orchestration"]
        Executor --> Wolfram[Wolfram Alpha API]
        Executor --> Code[Qwen3 Code Gen]
        Code --> Exec[Python Executor]
        Exec -- Error --> Fixer[GPT-OSS-120B Fixer]
        Fixer --> Exec
    end

    ParallelTasks --> Synthetic
    Synthetic --> Render[LaTeX Formatter]
    Render --> SSE[SSE Stream]
    SSE --> User

    subgraph Observability["System Monitoring"]
        Tracing[LangSmith Trace]
        Memory[Session Memory Tracker]
        RateLimit[Token/Request Limiter]
    end

    API -.-> Observability
    Executor -.-> Observability
```

## Fault Tolerance and Error Handling

Pochi is built with a "Resilience-First" mindset, ensuring the system remains operational and provides accurate results even when facing API failures or ambiguous inputs.

### 1. Model Redundancy and Failover

- **OCR Failover**: If the primary vision model (Maverick) encounters rate limits or internal errors, the system automatically redirects requests to a high-speed fallback model (Scout).
- **Model Switching**: The `ModelManager` dynamically monitors model health and rate limits (RPM/TPM), performing seamless transitions between tiers without task interruption.

### 2. "Self-Healing" Algorithmic Solving

- **Recursive Debugging**: The Python Code Engine is not a simple "one-shot" executor. If generated code fails (`SyntaxError`, `ZeroDivisionError`, etc.), the system sends the error log back to the `CodeFixer` agent.
- **Fix Loop**: The system allows multiple recursive fix attempts, in which the agent analyzes the stack trace and rewrites the logic until execution succeeds.

### 3. Graceful Degradation of Tools

- **Wolfram-to-Code Fallback**: Symbolic math is the gold standard for precision. However, if the Wolfram Alpha API exceeds its 2,000-requests/month quota or times out, the system automatically shifts the problem to the Algorithmic Branch for a numerical solve.
- **Synthesis Resilience**: If the Synthetic Agent fails to format the final response (e.g., due to context length), the system performs a "raw-safe" synthesis, delivering the tool results directly to the user so no data is lost.

### 4. Robust State and Parsing

- **Durable IO**: The background agent task saves intermediate results to the database immediately upon generation, so even if a client disconnects during a 20-second calculation, the result is waiting upon refresh.
- **JSON Recovery**: LLMs occasionally return malformed JSON. The `Planner` includes multi-stage recovery logic that uses regex and string normalization to repair broken JSON blocks, preventing crashes on minor formatting errors.

### 5. Memory and Resource Safety

- **Context Protection**: The `SessionMemoryTracker` proactively blocks requests that would exceed the 256K-token limit, preventing "half-baked" or truncated responses from the LLM.
- **Rate Limit Resilience**: Integrated backoff and retry mechanisms for all third-party API calls (Groq, Wolfram, LangSmith).
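The multi-stage JSON recovery can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual `Planner` code; the `extract_plan` helper and its specific heuristics (fence stripping, trailing-comma removal) are hypothetical:

```python
import json
import re

def extract_plan(llm_output: str):
    """Best-effort recovery of a JSON plan from raw LLM output.

    Tries a direct parse first, then falls back to the contents of a
    Markdown code fence or the outermost {...} block, normalizing
    common artifacts (trailing commas) before each retry.
    Returns None if nothing parses.
    """
    candidates = [llm_output]

    # Pull the contents of a ```json ... ``` fence, if present.
    fenced = re.search(r"```(?:json)?\s*(.*?)```", llm_output, re.DOTALL)
    if fenced:
        candidates.append(fenced.group(1))

    # Fall back to the outermost {...} block.
    braced = re.search(r"\{.*\}", llm_output, re.DOTALL)
    if braced:
        candidates.append(braced.group(0))

    for text in candidates:
        # Normalize a common LLM artifact: trailing commas before } or ].
        cleaned = re.sub(r",\s*([}\]])", r"\1", text.strip())
        try:
            return json.loads(cleaned)
        except json.JSONDecodeError:
            continue
    return None
```

For example, an output like `` Plan: ```json {"tasks": [{"id": 1,},]} ``` `` — prose, a fence, and trailing commas — still recovers to a usable plan instead of crashing the graph.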
## Model Distribution and Specialization

| Component | Model Identifier | Specialization |
| :--- | :--- | :--- |
| **OCR (Primary)** | Llama-4 Maverick | Multi-modal mathematical extraction. |
| **OCR (Fallback)** | Llama-4 Scout | High-speed redundancy for simple OCR. |
| **Planner & Synthesis** | Kimi K2-Instruct | 256K context, complex reasoning, and pedagogy. |
| **Code Generation** | Qwen3-32B-Instruct | Optimized for Pythonic mathematical logic. |
| **Code Rectification** | GPT-OSS-120B | Deep-context code debugging and error resolution. |
| **Symbolic Logic** | Wolfram Alpha | Deterministic symbolic computation (2,000 req/mo). |

## Project Structure

```text
.
├── backend/              # FastAPI Application & LangGraph Agents
│   ├── agent/            # Multi-agent logic (Nodes, Graph, State)
│   ├── database/         # SQLite models and migrations
│   ├── tools/            # Symbolic & Algorithmic executor tools
│   └── utils/            # Memory tracking, rate limiting, tracing
├── frontend/             # React (Vite) Application
│   ├── src/
│   │   ├── components/   # UI components (Math rendering, Tour)
│   │   └── App.jsx       # Main application logic
├── Dockerfile            # Containerized deployment
├── pyproject.toml        # Python dependencies & metadata
└── README.md             # Technical documentation
```

## Mathematics & Computation Stack

Pochi uses a heavy-duty scientific stack for high-precision calculations:

- **Symbolic**: SymPy, Wolfram Alpha API
- **Numerical**: NumPy, SciPy, mpmath
- **Optimization**: CVXPY, PuLP
- **Visuals**: Matplotlib, Seaborn, Plotly
- **Data**: Pandas, Polars, statsmodels

## Local Deployment

### Environment Configuration

Create a `.env` file in the root directory:

```env
GROQ_API_KEY=your_key_here
WOLFRAM_ALPHA_APP_ID=your_id_here
# Optional, for LangSmith tracing:
LANGSMITH_API_KEY=your_key_here
LANGSMITH_PROJECT=calculus-chatbot
LANGSMITH_TRACING=true
```

### Backend Infrastructure

1. Initialize a virtual environment: `uv venv && source .venv/bin/activate`
2. Install dependencies: `uv pip install -r requirements.txt`
3. Launch the service: `python main.py`

### Frontend Application

1. Navigate to the workspace: `cd frontend`
2. Install packages: `npm install`
3. Start the development server: `npm run dev`

### Docker Deployment

Build and run the entire stack:

```bash
docker build -t pochi-app .
docker run -p 7860:7860 -v ./data:/data --env-file .env pochi-app
```

## API Documentation

The backend service automatically generates interactive API documentation:

- **Swagger UI**: `http://localhost:7860/docs`
- **ReDoc**: `http://localhost:7860/redoc`

## Advanced Customization

### Prompt Engineering

The system's persona and logic are defined in `backend/agent/prompts.py`:

- **GUARD_PROMPT**: Defines the "Pochi" persona and strict safety guardrails.
- **TOT_PROMPT**: Enforces the Tree-of-Thought reasoning process (Plan -> Solve -> Verify).
- **PLANNER_SYSTEM_PROMPT**: Controls the multi-modal decomposition logic.

Developers can modify these constants to adjust the chatbot's tone or reasoning strictness.

## Security & Privacy Guidelines

- **Session Isolation**: User sessions are logically isolated in the database (`conversations` table) and the memory cache.
- **Transient Data**: Uploaded images are processed in memory (or temp storage) and converted to base64/embeddings; for privacy, they are not permanently retained on disk.

## Known Limitations

- **Multimodal Cap**: Supports a maximum of 5 distinct images per query to manage context window limits.
- **Symbolic Rate Limit**: Wolfram Alpha requests are capped at 2,000/month. Heavy usage degrades to the numerical Python solver (Qwen3).
- **Latency**: Complex multi-step reasoning (Plan -> Code -> Fix -> Synthesize) may take 15-30s to fully resolve.
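Since responses are delivered over SSE (the `/docs` page lists the exact endpoints and payload schema), a client needs to split the raw stream into events. The following is a minimal sketch of that framing step only; `parse_sse` is an illustrative helper, not part of the codebase, and it handles just the `data:` field of the SSE format:

```python
def parse_sse(raw: str) -> list[str]:
    """Collect the `data:` payloads from a raw SSE response body.

    Follows the basic SSE framing rules: events are separated by blank
    lines, and multi-line data fields are joined with newlines.
    """
    events, data_lines = [], []
    for line in raw.splitlines():
        if line.startswith("data:"):
            data_lines.append(line[5:].lstrip())
        elif line == "" and data_lines:
            events.append("\n".join(data_lines))
            data_lines = []
    if data_lines:  # stream ended without a trailing blank line
        events.append("\n".join(data_lines))
    return events
```

A real client would feed these payloads through a JSON decoder and render the tokens incrementally; browsers get this framing for free via `EventSource`.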
### AI Model Rate Limits

The system enforces strict rate limits to ensure stability and fair usage:

| Model ID | RPM (Req/Min) | RPD (Req/Day) | TPM (Tokens/Min) | TPD (Tokens/Day) |
| :--- | :---: | :---: | :---: | :---: |
| **Kimi K2 Instruct** | 60 | 1,000 | 10,000 | 300,000 |
| **Llama-4 Maverick** | 30 | 1,000 | 6,000 | 500,000 |
| **Llama-4 Scout** | 30 | 1,000 | 30,000 | 500,000 |
| **Qwen3-32B** | 60 | 1,000 | 6,000 | 500,000 |
| **GPT-OSS-120B** | 30 | 1,000 | 8,000 | 200,000 |

## API Usage Examples

### Natural Language Calculus

> "Compute the derivative of f(x) = x^2 + 3x + 2"

### Multimodal Math Analysis (Image Support)

> [Upload 2 images of a calculus problem] "Solve the problem in the attached images"

### Algorithmic Mathematical Tasks

> "Use Python code to find the first 100 prime numbers and explain the Sieve of Eratosthenes algorithm."

## Troubleshooting

| Issue | Possible Cause | Solution |
| :--- | :--- | :--- |
| **413 Payload Too Large** | Uploading images totaling > 10MB. | Reduce image size or upload fewer files per turn. |
| **429 Too Many Requests** | Exceeded Wolfram or LLM rate limits. | Wait 60s or switch to a different model tier in `.env`. |
| **LangSmith Error** | Invalid or missing API key. | Set `LANGSMITH_TRACING=false` in `.env` to disable tracing. |
| **Docker Build Fail** | Network timeout during `uv sync`. | Check the internet connection or increase Docker's memory limit. |

## Contributing

We welcome contributions! Please follow these steps:

1. Fork the repository.
2. Create a feature branch: `git checkout -b feature/amazing-feature`.
3. Commit your changes: `git commit -m 'Add amazing feature'`.
4. Push to the branch: `git push origin feature/amazing-feature`.
5. Open a Pull Request.

## License

Distributed under the MIT License. See `LICENSE` for more information.
## Acknowledgments

We deeply appreciate the open-source community and the providers of the powerful technologies that make Pochi possible:

- **AI & Logic Providers**:
  - **LangChain & LangGraph**: For the robust orchestration framework.
  - **Groq**: For ultra-low-latency Llama inference.
  - **Alibaba**: For the Qwen model.
  - **OpenAI**: For the GPT-OSS model.
  - **Moonshot AI**: For the Kimi reasoning model.
  - **Meta AI**: For the Llama vision models.
  - **Wolfram Alpha**: For the symbolic computation engine.
- **Frontend Ecosystem**:
  - **React & Vite**: For the blazing-fast UI.
  - **Lucide React**: For the beautiful icon set.