---
title: Calculus Agent
emoji: 🌌
colorFrom: gray
colorTo: gray
sdk: docker
pinned: false
license: mit
short_description: Multi-Agent Calculus Orchestration System
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# Pochi 4.o: Multi-Agent Calculus Orchestration System

Pochi is a high-performance, asynchronous AI platform specialized in solving complex calculus problems. It uses a stateful multi-agent system built on LangGraph, coordinating multiple specialized LLMs and symbolic computation engines for pedagogical clarity and mathematical precision.

## Live Demo

| Platform | URL |
| :--- | :--- |
| **Hugging Face** | [Visit Pochi on Hugging Face](https://huggingface.co/spaces/baeGil/calculus-agent) |

## Project Achievements & Performance

Pochi's performance and reliability are continuously monitored via LangSmith. The following data highlights the system's operational reliability and reasoning speed.

![]()
### System Health & Usage

| Metric | Value | Description |
| :--- | :--- | :--- |
| **Total Runs** | 476 | Cumulative successful execution cycles. |
| **Total Tokens** | 1.86M | Aggregate token throughput across all agents. |
| **Median Tokens** | 2,846 | Median token count per solver request. |
| **Success Rate** | 99% | System resilience against API and execution errors. |
| **Streaming Adoption** | 99% | Percentage of responses delivered via SSE for real-time feedback. |
### Latency Performance

> Latency varies significantly with task complexity (e.g., simple symbolic math vs. multi-image OCR plus recursive code fixing).

| Stage | P50 (Median) | P99 (Tail) |
| :--- | :---: | :---: |
| **Time to First Token (TTFT)** | 0.53s | 5.30s |
| **End-to-End Latency** | 1.51s | 36.95s |

**Analysis**:

- **Responsiveness**: A P50 TTFT of **0.53s** makes the start of the response feel instant, which is crucial for engagement.
- **Efficiency**: A P50 end-to-end latency of **1.51s** for full calculus resolution reflects the efficiency of the asynchronous multi-agent orchestration.
- **Complexity Buffer**: The P99 latency (**~37s**) accounts for the most intensive "Self-Healing" loops, where the system may perform multiple recursive code fixes or deep vision analysis.
## Highlight Features

- **Multi-Agent Orchestration**: Stateful DAG-based workflow using LangGraph for complex, multi-stage reasoning.
- **Parallel Sub-problem Processing**: Intelligent decomposition of complex queries into independent atomic tasks executed in parallel.
- **Multimodal OCR Intelligence**: High-fidelity vision extraction from up to 5 concurrent images with specialized math support.
- **Hybrid Solving Engine**: Seamlessly combines symbolic precision (Wolfram Alpha) with algorithmic logic (Python Executor).
- **Intelligent Long-Term Memory**: 256K-token context window with proactive memory management and token tracking.
- **Premium UI/UX**: Modern glassmorphism design with reactive animations, interactive tours, and native LaTeX rendering.
## System Architecture and Pipeline

The system is engineered as a directed acyclic graph (DAG) of specialized nodes, managed by a central orchestrator that maintains a consistent state throughout each conversation turn.

### The Execution Pipeline

1. **Vision Ingestion (OCR Agent)**: Processes up to 5 concurrent image inputs. Using Llama-4 Maverick, it extracts raw text and LaTeX-formatted mathematical expressions.
2. **Strategic Decomposition (Planner)**: Analyzes user intent and OCR output to generate a structured, JSON-defined execution plan, decomposing composite problems into independent atomic tasks.
3. **Parallel Orchestration (Executor)**: The core processing engine that spawns asynchronous execution threads for each atomic task:
   - **Symbolic Branch**: Direct interface with the Wolfram Alpha API for verified algebraic and calculus manipulation.
   - **Algorithmic Branch**: Python Code Engine (Qwen3-32B) for numerical methods or complex multi-step logic.
   - **Heuristic Branch**: Direct LLM solving for theoretical or conceptual queries.
4. **Self-Correction Loop (Code Engine)**: If the Algorithmic Branch encounters execution errors, a specialized CodeFixer (GPT-OSS-120B) performs recursive debugging and code modification.
5. **Contextual Synthesis (Synthetic Agent)**: Aggregates atomic results, resolves inter-task dependencies, and consults conversation history to produce a structured, pedagogical response.
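The fan-out in step 3 can be sketched with `asyncio`; this is a minimal illustration with hypothetical branch and task names, not the actual executor node:

```python
import asyncio

async def solve_symbolic(task: str) -> str:
    # Placeholder for the Wolfram Alpha branch.
    await asyncio.sleep(0.01)
    return f"symbolic:{task}"

async def solve_algorithmic(task: str) -> str:
    # Placeholder for the Python code-generation branch.
    await asyncio.sleep(0.01)
    return f"code:{task}"

BRANCHES = {"symbolic": solve_symbolic, "algorithmic": solve_algorithmic}

async def run_plan(plan: list) -> list:
    # Each atomic task is dispatched to its branch; gather() awaits them
    # concurrently and preserves the order of the plan.
    coros = [BRANCHES[t["branch"]](t["query"]) for t in plan]
    return await asyncio.gather(*coros)

plan = [
    {"branch": "symbolic", "query": "d/dx x^2"},
    {"branch": "algorithmic", "query": "first 100 primes"},
]
results = asyncio.run(run_plan(plan))
```

Because the tasks are independent by construction (step 2), total latency is bounded by the slowest branch rather than the sum of all branches.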
### Technical Workflow Diagram

```mermaid
graph TD
    User([User Request]) --> API[FastAPI Entry]
    API --> State[Agent State Initialization]
    State --> OCR{OCR Node}
    OCR -- Multi-Image --> Vision[Llama-4 Maverick]
    Vision --> Planner[Planner Node: Kimi K2]
    OCR -- Text Only --> Planner
    Planner --> Plan{Execution Plan}
    Plan -- All Direct --> Synthetic[Synthetic Agent]
    Plan -- Tool Required --> Executor[Parallel Executor Node]
    subgraph ParallelTasks["Async Task Orchestration"]
        Executor --> Wolfram[Wolfram Alpha API]
        Executor --> Code[Qwen3 Code Gen]
        Code --> Exec[Python Executor]
        Exec -- Error --> Fixer[GPT-OSS-120B Fixer]
        Fixer --> Exec
    end
    ParallelTasks --> Synthetic
    Synthetic --> Render[LaTeX Formatter]
    Render --> SSE[SSE Stream]
    SSE --> User
    subgraph Observability["System Monitoring"]
        Tracing[LangSmith Trace]
        Memory[Session Memory Tracker]
        RateLimit[Token/Request Limiter]
    end
    API -.-> Observability
    Executor -.-> Observability
```
## Fault Tolerance and Error Handling

Pochi is built with a "resilience-first" mindset, ensuring the system remains operational and returns accurate results even when facing API failures or ambiguous inputs.

### 1. Model Redundancy and Failover

- **OCR Failover**: If the primary vision model (Maverick) hits rate limits or internal errors, the system automatically redirects requests to a high-speed fallback model (Scout).
- **Model Switching**: The `ModelManager` dynamically monitors model health and rate limits (RPM/TPM), performing seamless transitions between tiers without task interruption.
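A rate-limit-aware failover of the kind the `ModelManager` performs could look roughly like this; tier names and the tiny RPM values are illustrative, not the production configuration:

```python
import time

class ModelManager:
    """Minimal sketch: pick the first model tier whose per-minute budget is free."""

    def __init__(self, tiers, rpm_limits):
        self.tiers = tiers                   # ordered: primary first, fallbacks after
        self.rpm_limits = rpm_limits         # model -> max requests per minute
        self.calls = {m: [] for m in tiers}  # model -> recent call timestamps

    def acquire(self) -> str:
        now = time.monotonic()
        for model in self.tiers:
            # Keep only calls inside the sliding 60-second window.
            window = [t for t in self.calls[model] if now - t < 60.0]
            self.calls[model] = window
            if len(window) < self.rpm_limits[model]:
                window.append(now)
                return model
        raise RuntimeError("all model tiers are rate-limited")

# Tiny limits so the failover is visible immediately.
mgr = ModelManager(["maverick", "scout"], {"maverick": 2, "scout": 30})
picks = [mgr.acquire() for _ in range(4)]
# Once maverick's 2-RPM budget is spent, traffic fails over to scout.
```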
### 2. "Self-Healing" Algorithmic Solving

- **Recursive Debugging**: The Python Code Engine is not a one-shot executor. If generated code fails (SyntaxError, ZeroDivisionError, etc.), the system sends the error log back to the `CodeFixer` agent.
- **Fix Loop**: The system allows multiple recursive fix attempts, with the agent analyzing the stack trace and rewriting the logic until execution succeeds.
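The fix loop can be sketched as follows; the attempt cap, the `fixer` signature, and the `result` convention are assumptions for illustration, and the real engine runs generated code in a sandbox rather than a bare `exec`:

```python
MAX_FIX_ATTEMPTS = 3  # assumed cap; the real limit is a configuration detail

def run_with_self_healing(code: str, fixer):
    """Execute generated code; on failure, feed the error back to the fixer agent."""
    for attempt in range(1 + MAX_FIX_ATTEMPTS):
        try:
            scope = {}
            exec(code, scope)              # sandboxed in the real system
            return scope["result"]
        except Exception as err:
            if attempt == MAX_FIX_ATTEMPTS:
                raise                      # give up after the final attempt
            # The fixer sees the failing code plus the error text, and
            # returns a rewritten candidate for the next iteration.
            code = fixer(code, repr(err))

# Toy fixer that "repairs" a divide-by-zero by guarding the denominator.
def toy_fixer(code: str, error: str) -> str:
    return "result = 1 / (0 or 1)"

out = run_with_self_healing("result = 1 / 0", toy_fixer)
```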
### 3. Graceful Degradation of Tools

- **Wolfram-to-Code Fallback**: Symbolic math is the gold standard for precision, but if the Wolfram Alpha API exceeds its 2,000-req/month quota or times out, the system automatically shifts the problem to the Algorithmic Branch for a numerical solve.
- **Synthesis Resilience**: If the Synthetic Agent fails to format the final response (e.g., due to context length), the system performs a "raw-safe" synthesis, delivering the tool results directly to the user so no data is lost.
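The Wolfram-to-Code fallback reduces to a try/except around the symbolic call, roughly as below; the exception and function names are illustrative:

```python
class WolframQuotaExceeded(Exception):
    """Raised when the monthly Wolfram Alpha quota is exhausted."""

def solve(query: str, wolfram, python_branch):
    """Try the symbolic engine first; degrade to the algorithmic branch on failure."""
    try:
        return wolfram(query)
    except (WolframQuotaExceeded, TimeoutError):
        # Quota hit or timeout: fall back to a numerical solve instead of failing.
        return python_branch(query)

def wolfram_down(query):
    raise WolframQuotaExceeded("2000 req/month quota exhausted")

answer = solve("integrate x^2", wolfram_down, lambda q: f"numeric({q})")
```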
### 4. Robust State and Parsing

- **Durable IO**: The background agent task saves intermediate results to the database immediately upon generation, so even if a client disconnects during a 20-second calculation, the result is waiting for them on refresh.
- **JSON Recovery**: LLMs occasionally return malformed JSON. The `Planner` includes multi-stage recovery logic that uses regex and string normalization to repair broken JSON blocks, preventing crashes on minor formatting errors.
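A multi-stage JSON repair of this kind might look like the following sketch; the exact recovery stages in the `Planner` may differ:

```python
import json
import re

def recover_json(raw: str) -> dict:
    """Best-effort repair of common LLM JSON mistakes (sketch, not exhaustive)."""
    # Stage 1: strip markdown code fences the model may have wrapped around the JSON.
    raw = re.sub(r"```(?:json)?", "", raw).strip()
    # Stage 2: extract the outermost {...} block, ignoring surrounding chatter.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        raw = match.group(0)
    # Stage 3: remove trailing commas before } or ], a frequent LLM error.
    raw = re.sub(r",\s*([}\]])", r"\1", raw)
    return json.loads(raw)

broken = 'Here is the plan:\n```json\n{"tasks": ["t1", "t2",],}\n```'
plan = recover_json(broken)
```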
### 5. Memory and Resource Safety

- **Context Protection**: The `SessionMemoryTracker` proactively blocks requests that would exceed the 256K-token limit, preventing truncated or "half-baked" LLM responses.
- **Rate Limit Resilience**: Integrated backoff-and-retry mechanisms for all third-party API calls (Groq, Wolfram, LangSmith).
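A backoff-and-retry wrapper of the kind described can be sketched as below; retry counts and delays are illustrative:

```python
import random
import time

def with_backoff(call, max_retries=4, base=0.5):
    """Retry a flaky API call with exponential backoff and jitter (sketch)."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except ConnectionError:
            if attempt == max_retries:
                raise  # budget exhausted: surface the error
            # Exponential delay (0.5s, 1s, 2s, ...) plus jitter to avoid
            # synchronized retries; capped here so the demo stays fast.
            delay = base * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(min(delay, 0.01))

attempts = {"n": 0}

def flaky():
    # Fails twice (simulating HTTP 429 responses), then succeeds.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("simulated 429")
    return "ok"

result = with_backoff(flaky)
```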
## Model Distribution and Specialization

| Component | Model Identifier | Specialization |
| :--- | :--- | :--- |
| **OCR (Primary)** | Llama-4 Maverick | Multimodal mathematical extraction. |
| **OCR (Fallback)** | Llama-4 Scout | High-speed redundancy for simple OCR. |
| **Planner & Synthesis** | Kimi K2-Instruct | 256K context, complex reasoning, and pedagogy. |
| **Code Generation** | Qwen3-32B-Instruct | Optimized for Pythonic mathematical logic. |
| **Code Rectification** | GPT-OSS-120B | Deep-context code debugging and error resolution. |
| **Symbolic Logic** | Wolfram Alpha | Deterministic symbolic computation (2,000 req/mo). |
## Project Structure

```text
.
├── backend/              # FastAPI Application & LangGraph Agents
│   ├── agent/            # Multi-agent logic (Nodes, Graph, State)
│   ├── database/         # SQLite models and migrations
│   ├── tools/            # Symbolic & Algorithmic executor tools
│   └── utils/            # Memory tracking, rate limiting, tracing
├── frontend/             # React (Vite) Application
│   └── src/
│       ├── components/   # UI components (Math rendering, Tour)
│       └── App.jsx       # Main application logic
├── Dockerfile            # Containerized deployment
├── pyproject.toml        # Python dependencies & metadata
└── README.md             # Technical documentation
```
## Mathematics & Computation Stack

Pochi uses a heavy-duty scientific stack for high-precision calculation:

- **Symbolic**: SymPy, Wolfram Alpha API
- **Numerical**: NumPy, SciPy, mpmath
- **Optimization**: CVXPY, PuLP
- **Visuals**: Matplotlib, Seaborn, Plotly
- **Data**: Pandas, Polars, Statsmodels
## Local Deployment

### Environment Configuration

Create a `.env` file in the root directory:

```env
GROQ_API_KEY=your_key_here
WOLFRAM_ALPHA_APP_ID=your_id_here
# Optional, for tracing:
LANGSMITH_API_KEY=your_key_here
LANGSMITH_PROJECT=calculus-chatbot
LANGSMITH_TRACING=true
```
### Backend Infrastructure

1. Initialize the virtual environment: `uv venv && source .venv/bin/activate`
2. Install dependencies: `uv pip install -r requirements.txt`
3. Launch the service: `python main.py`

### Frontend Application

1. Navigate to the workspace: `cd frontend`
2. Install packages: `npm install`
3. Start the development server: `npm run dev`
### Docker Deployment

Build and run the entire stack:

```bash
docker build -t pochi-app .
docker run -p 7860:7860 -v ./data:/data --env-file .env pochi-app
```

## API Documentation

The backend service automatically generates interactive API documentation:

- **Swagger UI**: `http://localhost:7860/docs`
- **ReDoc**: `http://localhost:7860/redoc`
## Advanced Customization

### Prompt Engineering

The system's persona and logic are defined in `backend/agent/prompts.py`:

- **GUARD_PROMPT**: Defines the "Pochi" persona and strict safety guardrails.
- **TOT_PROMPT**: Enforces the Tree-of-Thought reasoning process (Plan -> Solve -> Verify).
- **PLANNER_SYSTEM_PROMPT**: Controls the multimodal decomposition logic.

Developers can modify these constants to adjust the chatbot's tone or reasoning strictness.
## Security & Privacy Guidelines

- **Session Isolation**: User sessions are logically isolated in the database (`conversations` table) and memory cache.
- **Transient Data**: Uploaded images are processed in memory (or temp storage) and converted to base64/embeddings; they are not permanently retained on disk.

## Known Limitations

- **Multimodal Cap**: A maximum of 5 distinct images per query, to manage context window limits.
- **Symbolic Rate Limit**: Wolfram Alpha requests are capped at 2,000/month. Heavy usage degrades to the numerical Python solver (Qwen3).
- **Latency**: Complex multi-step reasoning (Plan -> Code -> Fix -> Synthesize) may take 15-30s to fully resolve.
### AI Model Rate Limits

The system enforces strict rate limits for stability and usage fairness:

| Model ID | RPM (Req/Min) | RPD (Req/Day) | TPM (Tokens/Min) | TPD (Tokens/Day) |
| :--- | :---: | :---: | :---: | :---: |
| **Kimi K2 Instruct** | 60 | 1,000 | 10,000 | 300,000 |
| **Llama-4 Maverick** | 30 | 1,000 | 6,000 | 500,000 |
| **Llama-4 Scout** | 30 | 1,000 | 30,000 | 500,000 |
| **Qwen3-32B** | 60 | 1,000 | 6,000 | 500,000 |
| **GPT-OSS-120B** | 30 | 1,000 | 8,000 | 200,000 |
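A TPM budget like those in the table can be enforced with a token bucket; a minimal sketch follows (the actual limiter in `backend/utils/` may differ, and the timestamps here are passed in explicitly to keep the demo deterministic):

```python
class TokenBucket:
    """Tokens-per-minute budget: the bucket refills at `tpm` tokens per 60 s."""

    def __init__(self, tpm: int):
        self.tpm = tpm
        self.tokens = float(tpm)  # start with a full budget
        self.last = 0.0

    def allow(self, cost: int, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at the full budget.
        self.tokens = min(self.tpm, self.tokens + (now - self.last) * self.tpm / 60.0)
        self.last = now
        if cost <= self.tokens:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(tpm=6000)            # e.g. the Qwen3-32B TPM tier above
ok_first = bucket.allow(4000, now=0.0)    # fits in the fresh budget
ok_second = bucket.allow(4000, now=0.0)   # budget exhausted, rejected
ok_later = bucket.allow(4000, now=60.0)   # a minute later, refilled
```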
## API Usage Examples

### Natural Language Calculus

> "Compute the derivative of f(x) = x^2 + 3x + 2"

### Multimodal Math Analysis (Image Support)

> [Upload 2 images of a calculus problem] "Solve the problem in the attached images"

### Algorithmic Mathematical Tasks

> "Use Python code to find the first 100 prime numbers and explain the Sieve of Eratosthenes algorithm."
## Troubleshooting

| Issue | Possible Cause | Solution |
| :--- | :--- | :--- |
| **413 Payload Too Large** | Uploaded images exceed 10MB total. | Reduce image size or upload fewer files per turn. |
| **429 Too Many Requests** | Exceeded Wolfram or LLM rate limits. | Wait 60s or switch to a different model tier in `.env`. |
| **LangSmith Error** | Invalid or missing API key. | Set `LANGSMITH_TRACING=false` in `.env` to disable tracing. |
| **Docker Build Fail** | Network timeout during `uv sync`. | Check the internet connection or increase Docker's memory limit. |
## Contributing

We welcome contributions! Please follow these steps:

1. Fork the repository.
2. Create a feature branch: `git checkout -b feature/amazing-feature`.
3. Commit your changes: `git commit -m 'Add amazing feature'`.
4. Push to the branch: `git push origin feature/amazing-feature`.
5. Open a Pull Request.

## License

Distributed under the MIT License. See `LICENSE` for more information.
## Acknowledgments

We deeply appreciate the open-source community and the providers of the powerful technologies that make Pochi possible:

- **AI & Logic Providers**:
  - **LangChain & LangGraph**: For the robust orchestration framework.
  - **Groq**: For ultra-low latency Llama inference.
  - **Alibaba**: For the Qwen model.
  - **OpenAI**: For the GPT-OSS model.
  - **Moonshot AI**: For the Kimi reasoning model.
  - **Meta AI**: For the Llama vision models.
  - **Wolfram Alpha**: For the symbolic computation engine.
- **Frontend Ecosystem**:
  - **React & Vite**: For the blazing-fast UI.
  - **Lucide React**: For the beautiful icon set.