---
title: Calculus Agent
emoji: 🌌
colorFrom: gray
colorTo: gray
sdk: docker
pinned: false
license: mit
short_description: Multi-Agent Calculus Orchestration System
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Pochi 4.o: Multi-Agent Calculus Orchestration System

Pochi is a high-performance, asynchronous AI platform specialized in solving complex calculus problems. It utilizes a stateful multi-agent system built on LangGraph, coordinating multiple specialized LLMs and symbolic computation engines to achieve pedagogical excellence and mathematical precision.

## Live Demo

| Platform | URL |
| :--- | :--- |
| **Hugging Face** | [Visit Pochi on Hugging Face](https://huggingface.co/spaces/baeGil/calculus-agent) |

## Project Achievements & Performance

Pochi's performance and reliability are continuously monitored via LangSmith. The following data highlights the system's operational excellence and high-speed reasoning capabilities.

![LangSmith Traces](images/traces.png)

### System Health & Usage
| Metric | Value | Description |
| :--- | :--- | :--- |
| **Total Runs** | 476 | Cumulative execution cycles recorded in LangSmith. |
| **Total Tokens** | 1.86M | Aggregate token throughput across all agents. |
| **Median Tokens** | 2,846 | Median context size per solver request. |
| **Success Rate** | 99% | System resilience against API and execution errors. |
| **Streaming Adoption** | 99% | Percentage of responses delivered via SSE for real-time feedback. |

### Latency Performance
> Latency varies significantly based on task complexity (e.g., Simple symbolic math vs. Multi-image OCR + Recursive code fixing).

| Stage | P50 (Median) | P99 (Tail) |
| :--- | :---: | :---: |
| **Time to First Token (TTFT)** | 0.53s | 5.30s |
| **End-to-End Latency** | 1.51s | 36.95s |

**Analysis**:
- **Responsiveness**: A P50 TTFT of **0.53s** ensures that users perceive an "instant" start to the response, crucial for engagement.
- **Efficiency**: The P50 latency of **1.51s** for full calculus resolution demonstrates the high-performance nature of the asynchronous multi-agent orchestration.
- **Complexity Buffer**: The P99 latency (**~37s**) accounts for the most intensive "Self-Healing" loops, where the system may perform multiple recursive code fixes or deep vision analysis.

## Highlight Features

- **Multi-Agent Orchestration**: Stateful DAG-based workflow using LangGraph for complex, multi-stage reasoning.
- **Parallel Sub-problem Processing**: Intelligent decomposition of complex queries into independent atomic tasks executed in parallel.
- **Multimodal OCR Intelligence**: High-fidelity vision extraction from up to 5 concurrent images with specialized math support.
- **Hybrid Solving Engine**: Seamlessly combines symbolic precision (Wolfram Alpha) with algorithmic logic (Python Executor).
- **Intelligent Long-Term Memory**: Massive 256K token context window with proactive memory management and token tracking.
- **Premium UI/UX**: Modern glassmorphism design with reactive animations, interactive tours, and native LaTeX rendering.

## System Architecture and Pipeline

The system is engineered as a directed acyclic graph (DAG) of specialized nodes, managed by a central orchestrator that maintains a consistent state throughout the conversation turn.

### The Execution Pipeline

1.  **Vision Ingestion (OCR Agent)**: Processes up to 5 concurrent image inputs. Utilizing Llama-4 Maverick, it extracts raw text and LaTeX-formatted mathematical expressions.
2.  **Strategic Decomposition (Planner)**: Analyzes user intent and OCR data to generate a structured execution plan, decomposing composite problems into independent atomic tasks defined as JSON.
3.  **Parallel Orchestration (Executor)**: The core processing engine that spawns asynchronous execution threads for each atomic task:
    - **Symbolic Branch**: Direct interface with Wolfram Alpha API for verified algebraic and calculus manipulation.
    - **Algorithmic Branch**: Python Code Engine (Qwen3-32B) for numerical methods or complex multi-step logic.
    - **Heuristic Branch**: Direct LLM solving for theoretical or conceptual queries.
4.  **Self-Correction Loop (Code Engine)**: If the Algorithmic Branch encounters execution errors, a specialized CodeFixer (GPT-OSS-120B) performs recursive debugging and code modification.
5.  **Contextual Synthesis (Synthetic Agent)**: Aggregates atomic results, resolves inter-task dependencies, and consults conversation history to produce a structured, pedagogical response.
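The parallel orchestration in step 3 can be sketched with `asyncio.gather`. This is a minimal illustration, not the production Executor: the three solver functions are hypothetical stand-ins for the Wolfram, Python, and direct-LLM branches.

```python
import asyncio

# Hypothetical solvers standing in for the symbolic, algorithmic,
# and heuristic branches described above.
async def solve_symbolic(task: str) -> str:
    await asyncio.sleep(0.01)  # simulate API latency
    return f"wolfram:{task}"

async def solve_algorithmic(task: str) -> str:
    await asyncio.sleep(0.01)
    return f"python:{task}"

async def solve_heuristic(task: str) -> str:
    await asyncio.sleep(0.01)
    return f"llm:{task}"

BRANCHES = {
    "symbolic": solve_symbolic,
    "algorithmic": solve_algorithmic,
    "heuristic": solve_heuristic,
}

async def execute_plan(atomic_tasks: list[dict]) -> list[str]:
    """Run every atomic task concurrently, routed by branch."""
    coros = [BRANCHES[t["branch"]](t["query"]) for t in atomic_tasks]
    return await asyncio.gather(*coros)  # preserves task order

plan = [
    {"branch": "symbolic", "query": "integrate x^2"},
    {"branch": "algorithmic", "query": "newton sqrt(2)"},
]
results = asyncio.run(execute_plan(plan))
```

Because `asyncio.gather` preserves input order, the Synthetic Agent can match each result back to its atomic task by index.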

### Technical Workflow Diagram

```mermaid
graph TD
    User([User Request]) --> API[FastAPI Entry]
    API --> State[Agent State Initialization]
    State --> OCR{OCR Node}
    
    OCR -- Multi-Image --> Vision[Llama-4 Maverick]
    Vision --> Planner[Planner Node: Kimi K2]
    OCR -- Text Only --> Planner
    
    Planner --> Plan{Execution Plan}
    Plan -- All Direct --> Synthetic[Synthetic Agent]
    Plan -- Tool Required --> Executor[Parallel Executor Node]
    
    subgraph ParallelTasks["Async Task Orchestration"]
        Executor --> Wolfram[Wolfram Alpha API]
        Executor --> Code[Qwen3 Code Gen]
        Code --> Exec[Python Executor]
        Exec -- Error --> Fixer[GPT-OSS-120B Fixer]
        Fixer --> Exec
    end
    
    ParallelTasks --> Synthetic
    Synthetic --> Render[LaTeX Formatter]
    Render --> SSE[SSE Stream]
    SSE --> User
    
    subgraph Observability["System Monitoring"]
        Tracing[LangSmith Trace]
        Memory[Session Memory Tracker]
        RateLimit[Token/Request Limiter]
    end
    
    API -.-> Observability
    Executor -.-> Observability
    
```

## Fault Tolerance and Error Handling

Pochi is built with a "Resilience-First" mindset, ensuring that the system remains operational and provides accurate results even when facing API failures or ambiguous inputs.

### 1. Model Redundancy and Failover
- **OCR Failover**: If the primary vision model (Maverick) encounters rate limits or internal errors, the system automatically redirects requests to a high-speed fallback model (Scout).
- **Model Switching**: The `ModelManager` dynamically monitors model health and rate limits (RPM/TPM), performing seamless transitions between tiers without task interruption.

### 2. "Self-Healing" Algorithmic Solving
- **Recursive Debugging**: The Python Code Engine is not a simple "one-shot" executor. If generated code fails (SyntaxError, ZeroDivisionError, etc.), the system sends the error log back to the `CodeFixer` agent.
- **Fix Loop**: The system allows multiple recursive fix attempts, in which the agent analyzes the stack trace and rewrites the logic until execution succeeds.

### 3. Graceful Degradation of Tools
- **Wolfram-to-Code Fallback**: Symbolic math is the gold standard for precision. However, if the Wolfram Alpha API exceeds its 2000-req/month quota or times out, the system automatically shifts the problem to the Algorithmic Branch for a numerical solve.
- **Synthesis Resilience**: If the Synthetic Agent fails to format the final response (e.g., due to context length), the system performs a "raw-safe" synthesis, delivering the tool results directly to the user to ensure no data is lost.

### 4. Robust State and Parsing
- **Durable IO**: The background agent task saves intermediate results to the database immediately upon generation. This ensures that even if a client disconnects during a 20-second calculation, the result is waiting for them upon refresh.
- **JSON Recovery**: LLMs occasionally return malformed JSON. The `Planner` includes a multi-stage recovery logic that uses regex and string normalization to repair broken JSON blocks, preventing system crashes on minor formatting errors.

### 5. Memory and Resource Safety
- **Context Protection**: The `SessionMemoryTracker` proactively blocks requests that would exceed the 256K token limit, preventing "half-baked" or truncated responses from the LLM.
- **Rate Limit Resilience**: Integrated backoff and retry mechanisms for all third-party API calls (Groq, Wolfram, LangSmith).

## Model Distribution and Specialization

| Component | Model Identifier | Specialization |
| :--- | :--- | :--- |
| **OCR (Primary)** | Llama-4 Maverick | Multi-modal mathematical extraction. |
| **OCR (Fallback)** | Llama-4 Scout | High-speed redundancy for simple OCR. |
| **Planner & Synthesis** | Kimi K2-Instruct | 256K Context, complex reasoning, and pedagogy. |
| **Code Generation** | Qwen3-32B-Instruct | Optimized for Pythonic mathematical logic. |
| **Code Rectification** | GPT-OSS-120B | Deep-context code debugging and error resolution. |
| **Symbolic Logic** | Wolfram Alpha | Deterministic symbolic computation (2000 req/mo). |

## Project Structure

```text
.
├── backend/                # FastAPI Application & LangGraph Agents
│   ├── agent/              # Multi-agent logic (Nodes, Graph, State)
│   ├── database/           # SQLite models and migrations
│   ├── tools/              # Symbolic & Algorithmic executor tools
│   └── utils/              # Memory tracking, rate limiting, tracing
├── frontend/               # React (Vite) Application
│   ├── src/
│   │   ├── components/     # UI components (Math rendering, Tour)
│   │   └── App.jsx         # Main application logic
├── Dockerfile              # Containerized deployment
├── pyproject.toml          # Python dependencies & metadata
└── README.md               # Technical documentation
```

## Mathematics & Computation Stack

Pochi utilizes a heavy-duty scientific stack for high-precision calculations:
- **Symbolic**: SymPy, Wolfram Alpha API
- **Numerical**: NumPy, SciPy, mpmath
- **Optimization**: CVXPY, PuLP
- **Visuals**: Matplotlib, Seaborn, Plotly
- **Data**: Pandas, Polars, Statsmodels

## Local Deployment

### Environment Configuration
Create a `.env` file in the root directory:
```env
GROQ_API_KEY=your_key_here
WOLFRAM_ALPHA_APP_ID=your_id_here
# Optional: enables LangSmith tracing
LANGSMITH_API_KEY=your_key_here
LANGSMITH_PROJECT=calculus-chatbot
LANGSMITH_TRACING=true
```

### Backend Infrastructure
1.  Initialize virtual environment: `uv venv && source .venv/bin/activate`
2.  Install dependencies: `uv pip install -r requirements.txt`
3.  Launch Service: `python main.py`

### Frontend Application
1.  Navigate to workspace: `cd frontend`
2.  Install packages: `npm install`
3.  Development server: `npm run dev`

### Docker Deployment
Build and run the entire stack:
```bash
docker build -t pochi-app .
docker run -p 7860:7860 -v ./data:/data --env-file .env pochi-app
```

## API Documentation

The backend service automatically generates interactive API documentation.
-   **Swagger UI**: `http://localhost:7860/docs`
-   **ReDoc**: `http://localhost:7860/redoc`

## Advanced Customization

### Prompt Engineering
The system's persona and logic are defined in `backend/agent/prompts.py`:
-   **GUARD_PROMPT**: Defines the "Pochi" persona and strict safety guardrails.
-   **TOT_PROMPT**: Enforces the Tree-of-Thought reasoning process (Plan -> Solve -> Verify).
-   **PLANNER_SYSTEM_PROMPT**: Controls the multi-modal decomposition logic.

Developers can modify these constants to adjust the chatbot's tone or reasoning strictness.

## Security & Privacy Guidelines

- **Session Isolation**: User sessions are logically isolated in the database (`conversations` table) and memory cache.
- **Transient Data**: Uploaded images are processed in-memory (or temp storage) and converted to base64/embeddings; they are not permanently retained on disk for privacy.

## Known Limitations

- **Multimodal Cap**: Supports a maximum of 5 distinct images per query to manage context window limits.
- **Symbolic Rate Limit**: Wolfram Alpha requests are capped at 2000/month. Heavy usage will degrade to the numerical Python solver (Qwen3).
- **Latency**: Complex multi-step reasoning (Plan -> Code -> Fix -> Synthesize) may take 15-30s to fully resolve.

### AI Model Rate Limits

The system enforces strict rate limits to ensure stability and usage fairness:

| Model ID | RPM (Req/Min) | RPD (Req/Day) | TPM (Tokens/Min) | TPD (Tokens/Day) |
| :--- | :---: | :---: | :---: | :---: |
| **Kimi K2 Instruct** | 60 | 1,000 | 10,000 | 300,000 |
| **Llama-4 Maverick** | 30 | 1,000 | 6,000 | 500,000 |
| **Llama-4 Scout** | 30 | 1,000 | 30,000 | 500,000 |
| **Qwen3-32B** | 60 | 1,000 | 6,000 | 500,000 |
| **GPT-OSS-120B** | 30 | 1,000 | 8,000 | 200,000 |

## API Usage Examples

### Natural Language Calculus
> "Compute the derivative of f(x) = x^2 + 3x + 2"

### Multimodal Math Analysis (Image Support)
> [Upload 2 images of a calculus problem] "Solve the problem in the attached images"

### Algorithmic Mathematical Tasks
> "Use Python code to find the first 100 prime numbers and explain the Sieve of Eratosthenes algorithm."

## Troubleshooting

| Issue | Possible Cause | Solution |
| :--- | :--- | :--- |
| **413 Payload Too Large** | Uploading images > 10MB total. | Reduce image size or upload fewer files per turn. |
| **429 Too Many Requests** | Exceeded Wolfram or LLM rate limits. | Wait 60s or switch to a different model tier in `.env`. |
| **LangSmith Error** | Invalid or missing API Key. | Set `LANGSMITH_TRACING=false` in `.env` to disable. |
| **Docker Build Fail** | Network timeout on `uv sync`. | Check your internet connection and retry, or increase Docker resource limits. |

## Contributing

We welcome contributions! Please follow these steps:
1.  Fork the repository.
2.  Create a feature branch: `git checkout -b feature/amazing-feature`.
3.  Commit your changes: `git commit -m 'Add amazing feature'`.
4.  Push to the branch: `git push origin feature/amazing-feature`.
5.  Open a Pull Request.

## License

Distributed under the MIT License. See `LICENSE` for more information.

## Acknowledgments

We deeply appreciate the open-source community and the providers of the powerful technologies that make Pochi possible:

- **AI & Logic Providers**:
    - **LangChain & LangGraph**: For the robust orchestration framework.
    - **Groq**: For ultra-low latency Llama inference.
    - **Alibaba**: For the Qwen models.
    - **OpenAI**: For the GPT-OSS model.
    - **Moonshot AI**: For the Kimi reasoning model.
    - **Meta AI**: For the Llama vision models.
    - **Wolfram Alpha**: For the symbolic computation engine.
- **Frontend Ecosystem**:
    - **React & Vite**: For the blazing fast UI.
    - **Lucide React**: For the beautiful icon set.