---
title: Calculus Agent
emoji: 🌌
colorFrom: gray
colorTo: gray
sdk: docker
pinned: false
license: mit
short_description: Multi-Agent Calculus Orchestration System
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# Pochi 4.o: Multi-Agent Calculus Orchestration System

Pochi is a high-performance, asynchronous AI platform specialized in solving complex calculus problems. It utilizes a stateful multi-agent system built on LangGraph, coordinating multiple specialized LLMs and symbolic computation engines to achieve pedagogical excellence and mathematical precision.

## Live Demo

| Platform | URL |
| :--- | :--- |
| **Hugging Face** | [Visit Pochi on Hugging Face](https://huggingface.co/spaces/baeGil/calculus-agent) |

## Project Achievements & Performance

Pochi's performance and reliability are continuously monitored via LangSmith. The following data highlights the system's operational excellence and high-speed reasoning capabilities.

![LangSmith Traces](images/traces.png)

### System Health & Usage
| Metric | Value | Description |
| :--- | :--- | :--- |
| **Total Runs** | 476 | Cumulative execution cycles recorded in LangSmith. |
| **Total Tokens** | 1.86M | Aggregate token throughput across all agents. |
| **Median Tokens** | 2,846 | Median context size per solver request. |
| **Success Rate** | 99% | System resilience against API and execution errors. |
| **Streaming Adoption** | 99% | Percentage of responses delivered via SSE for real-time feedback. |

### Latency Performance
> Latency varies significantly based on task complexity (e.g., Simple symbolic math vs. Multi-image OCR + Recursive code fixing).

| Stage | P50 (Median) | P99 (Tail) |
| :--- | :---: | :---: |
| **Time to First Token (TTFT)** | 0.53s | 5.30s |
| **End-to-End Latency** | 1.51s | 36.95s |

**Analysis**:
- **Responsiveness**: A P50 TTFT of **0.53s** ensures that users perceive an "instant" start to the response, crucial for engagement.
- **Efficiency**: The P50 latency of **1.51s** for full calculus resolution demonstrates the high-performance nature of the asynchronous multi-agent orchestration.
- **Complexity Buffer**: The P99 latency (**~37s**) accounts for the most intensive "Self-Healing" loops, where the system may perform multiple recursive code fixes or deep vision analysis.

## Highlight Features

- **Multi-Agent Orchestration**: Stateful DAG-based workflow using LangGraph for complex, multi-stage reasoning.
- **Parallel Sub-problem Processing**: Intelligent decomposition of complex queries into independent atomic tasks executed in parallel.
- **Multimodal OCR Intelligence**: High-fidelity vision extraction from up to 5 concurrent images with specialized math support.
- **Hybrid Solving Engine**: Seamlessly combines symbolic precision (Wolfram Alpha) with algorithmic logic (Python Executor).
- **Intelligent Long-Term Memory**: Massive 256K token context window with proactive memory management and token tracking.
- **Premium UI/UX**: Modern glassmorphism design with reactive animations, interactive tours, and native LaTeX rendering.

## System Architecture and Pipeline

The system is engineered as a directed acyclic graph (DAG) of specialized nodes, managed by a central orchestrator that maintains a consistent state throughout the conversation turn.

### The Execution Pipeline

1.  **Vision Ingestion (OCR Agent)**: Processes up to 5 concurrent image inputs. Utilizing Llama-4 Maverick, it extracts raw text and LaTeX-formatted mathematical expressions.
2.  **Strategic Decomposition (Planner)**: Analyzes user intent and OCR data to generate a structured execution plan, decomposing composite problems into independent atomic tasks defined as JSON.
3.  **Parallel Orchestration (Executor)**: The core processing engine that spawns asynchronous execution threads for each atomic task:
    - **Symbolic Branch**: Direct interface with Wolfram Alpha API for verified algebraic and calculus manipulation.
    - **Algorithmic Branch**: Python Code Engine (Qwen3-32B) for numerical methods or complex multi-step logic.
    - **Heuristic Branch**: Direct LLM solving for theoretical or conceptual queries.
4.  **Self-Correction Loop (Code Engine)**: If the Algorithmic Branch encounters execution errors, a specialized CodeFixer (GPT-OSS-120B) performs recursive debugging and code modification.
5.  **Contextual Synthesis (Synthetic Agent)**: Aggregates atomic results, resolves inter-task dependencies, and consults conversation history to produce a structured, pedagogical response.
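The parallel orchestration in step 3 can be sketched with `asyncio.gather`. This is a minimal illustration, not the production Executor: the three solver functions are hypothetical stand-ins for the Wolfram, Python, and direct-LLM branches.

```python
import asyncio

# Hypothetical solvers standing in for the symbolic, algorithmic,
# and heuristic branches described above.
async def solve_symbolic(task: str) -> str:
    await asyncio.sleep(0.01)  # simulate API latency
    return f"wolfram:{task}"

async def solve_algorithmic(task: str) -> str:
    await asyncio.sleep(0.01)
    return f"python:{task}"

async def solve_heuristic(task: str) -> str:
    await asyncio.sleep(0.01)
    return f"llm:{task}"

BRANCHES = {
    "symbolic": solve_symbolic,
    "algorithmic": solve_algorithmic,
    "heuristic": solve_heuristic,
}

async def execute_plan(atomic_tasks: list[dict]) -> list[str]:
    """Run every atomic task concurrently, routed by branch."""
    coros = [BRANCHES[t["branch"]](t["query"]) for t in atomic_tasks]
    return await asyncio.gather(*coros)  # preserves task order

plan = [
    {"branch": "symbolic", "query": "integrate x^2"},
    {"branch": "algorithmic", "query": "newton sqrt(2)"},
]
results = asyncio.run(execute_plan(plan))
```

Because `asyncio.gather` preserves input order, the Synthetic Agent can match each result back to its atomic task by index.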

### Technical Workflow Diagram

```mermaid
graph TD
    User([User Request]) --> API[FastAPI Entry]
    API --> State[Agent State Initialization]
    State --> OCR{OCR Node}
    
    OCR -- Multi-Image --> Vision[Llama-4 Maverick]
    Vision --> Planner[Planner Node: Kimi K2]
    OCR -- Text Only --> Planner
    
    Planner --> Plan{Execution Plan}
    Plan -- All Direct --> Synthetic[Synthetic Agent]
    Plan -- Tool Required --> Executor[Parallel Executor Node]
    
    subgraph ParallelTasks["Async Task Orchestration"]
        Executor --> Wolfram[Wolfram Alpha API]
        Executor --> Code[Qwen3 Code Gen]
        Code --> Exec[Python Executor]
        Exec -- Error --> Fixer[GPT-OSS-120B Fixer]
        Fixer --> Exec
    end
    
    ParallelTasks --> Synthetic
    Synthetic --> Render[LaTeX Formatter]
    Render --> SSE[SSE Stream]
    SSE --> User
    
    subgraph Observability["System Monitoring"]
        Tracing[LangSmith Trace]
        Memory[Session Memory Tracker]
        RateLimit[Token/Request Limiter]
    end
    
    API -.-> Observability
    Executor -.-> Observability
    
```

## Fault Tolerance and Error Handling

Pochi is built with a "Resilience-First" mindset, ensuring that the system remains operational and provides accurate results even when facing API failures or ambiguous inputs.

### 1. Model Redundancy and Failover
- **OCR Failover**: If the primary vision model (Maverick) encounters rate limits or internal errors, the system automatically redirects requests to a high-speed fallback model (Scout).
- **Model Switching**: The `ModelManager` dynamically monitors model health and rate limits (RPM/TPM), performing seamless transitions between tiers without task interruption.

### 2. "Self-Healing" Algorithmic Solving
- **Recursive Debugging**: The Python Code Engine is not a simple "one-shot" executor. If generated code fails (SyntaxError, ZeroDivisionError, etc.), the system sends the error log back to the `CodeFixer` agent.
- **Fix Loop**: The system allows multiple recursive fix attempts, in which the agent analyzes the stack trace and rewrites the logic until execution succeeds.

### 3. Graceful Degradation of Tools
- **Wolfram-to-Code Fallback**: Symbolic math is the gold standard for precision. However, if the Wolfram Alpha API exceeds its 2000-req/month quota or times out, the system automatically shifts the problem to the Algorithmic Branch for a numerical solve.
- **Synthesis Resilience**: If the Synthetic Agent fails to format the final response (e.g., due to context length), the system performs a "raw-safe" synthesis, delivering the tool results directly to the user to ensure no data is lost.

### 4. Robust State and Parsing
- **Durable IO**: The background agent task saves intermediate results to the database immediately upon generation. This ensures that even if a client disconnects during a 20-second calculation, the result is waiting for them upon refresh.
- **JSON Recovery**: LLMs occasionally return malformed JSON. The `Planner` includes a multi-stage recovery logic that uses regex and string normalization to repair broken JSON blocks, preventing system crashes on minor formatting errors.

### 5. Memory and Resource Safety
- **Context Protection**: The `SessionMemoryTracker` proactively blocks requests that would exceed the 256K token limit, preventing "half-baked" or truncated responses from the LLM.
- **Rate Limit Resilience**: Integrated backoff and retry mechanisms for all third-party API calls (Groq, Wolfram, LangSmith).

## Model Distribution and Specialization

| Component | Model Identifier | Specialization |
| :--- | :--- | :--- |
| **OCR (Primary)** | Llama-4 Maverick | Multi-modal mathematical extraction. |
| **OCR (Fallback)** | Llama-4 Scout | High-speed redundancy for simple OCR. |
| **Planner & Synthesis** | Kimi K2-Instruct | 256K Context, complex reasoning, and pedagogy. |
| **Code Generation** | Qwen3-32B-Instruct | Optimized for Pythonic mathematical logic. |
| **Code Rectification** | GPT-OSS-120B | Deep-context code debugging and error resolution. |
| **Symbolic Logic** | Wolfram Alpha | Deterministic symbolic computation (2000 req/mo). |

## Project Structure

```text
.
├── backend/                # FastAPI Application & LangGraph Agents
│   ├── agent/              # Multi-agent logic (Nodes, Graph, State)
│   ├── database/           # SQLite models and migrations
│   ├── tools/              # Symbolic & Algorithmic executor tools
│   └── utils/              # Memory tracking, rate limiting, tracing
├── frontend/               # React (Vite) Application
│   ├── src/
│   │   ├── components/     # UI components (Math rendering, Tour)
│   │   └── App.jsx         # Main application logic
├── Dockerfile              # Containerized deployment
├── pyproject.toml          # Python dependencies & metadata
└── README.md               # Technical documentation
```

## Mathematics & Computation Stack

Pochi utilizes a heavy-duty scientific stack for high-precision calculations:
- **Symbolic**: SymPy, Wolfram Alpha API
- **Numerical**: NumPy, SciPy, mpmath
- **Optimization**: CVXPY, PuLP
- **Visuals**: Matplotlib, Seaborn, Plotly
- **Data**: Pandas, Polars, Statsmodels

## Local Deployment

### Environment Configuration
Create a `.env` file in the root directory:
```env
GROQ_API_KEY=your_key_here
WOLFRAM_ALPHA_APP_ID=your_id_here
# Optional: enables LangSmith tracing
LANGSMITH_API_KEY=your_key_here
LANGSMITH_PROJECT=calculus-chatbot
LANGSMITH_TRACING=true
```

### Backend Infrastructure
1.  Initialize virtual environment: `uv venv && source .venv/bin/activate`
2.  Install dependencies: `uv pip install -r requirements.txt`
3.  Launch Service: `python main.py`

### Frontend Application
1.  Navigate to workspace: `cd frontend`
2.  Install packages: `npm install`
3.  Development server: `npm run dev`

### Docker Deployment
Build and run the entire stack:
```bash
docker build -t pochi-app .
docker run -p 7860:7860 -v ./data:/data --env-file .env pochi-app
```

## API Documentation

The backend service automatically generates interactive API documentation.
-   **Swagger UI**: `http://localhost:7860/docs`
-   **ReDoc**: `http://localhost:7860/redoc`

## Advanced Customization

### Prompt Engineering
The system's persona and logic are defined in `backend/agent/prompts.py`:
-   **GUARD_PROMPT**: Defines the "Pochi" persona and strict safety guardrails.
-   **TOT_PROMPT**: Enforces the Tree-of-Thought reasoning process (Plan -> Solve -> Verify).
-   **PLANNER_SYSTEM_PROMPT**: Controls the multi-modal decomposition logic.

Developers can modify these constants to adjust the chatbot's tone or reasoning strictness.

## Security & Privacy Guidelines

- **Session Isolation**: User sessions are logically isolated in the database (`conversations` table) and memory cache.
- **Transient Data**: Uploaded images are processed in-memory (or temp storage) and converted to base64/embeddings; they are not permanently retained on disk for privacy.

## Known Limitations

- **Multimodal Cap**: Supports a maximum of 5 distinct images per query to manage context window limits.
- **Symbolic Rate Limit**: Wolfram Alpha requests are capped at 2000/month. Heavy usage will degrade to the numerical Python solver (Qwen3).
- **Latency**: Complex multi-step reasoning (Plan -> Code -> Fix -> Synthesize) may take 15-30s to fully resolve.

### AI Model Rate Limits

The system enforces strict rate limits to ensure stability and usage fairness:

| Model ID | RPM (Req/Min) | RPD (Req/Day) | TPM (Tokens/Min) | TPD (Tokens/Day) |
| :--- | :---: | :---: | :---: | :---: |
| **Kimi K2 Instruct** | 60 | 1,000 | 10,000 | 300,000 |
| **Llama-4 Maverick** | 30 | 1,000 | 6,000 | 500,000 |
| **Llama-4 Scout** | 30 | 1,000 | 30,000 | 500,000 |
| **Qwen3-32B** | 60 | 1,000 | 6,000 | 500,000 |
| **GPT-OSS-120B** | 30 | 1,000 | 8,000 | 200,000 |

## API Usage Examples

### Natural Language Calculus
> "Compute the derivative of f(x) = x^2 + 3x + 2"

### Multimodal Math Analysis (Image Support)
> [Upload 2 images of a calculus problem] "Solve the problem in the attached images"

### Algorithmic Mathematical Tasks
> "Use Python code to find the first 100 prime numbers and explain the Sieve of Eratosthenes algorithm."

## Troubleshooting

| Issue | Possible Cause | Solution |
| :--- | :--- | :--- |
| **413 Payload Too Large** | Uploading images > 10MB total. | Reduce image size or upload fewer files per turn. |
| **429 Too Many Requests** | Exceeded Wolfram or LLM rate limits. | Wait 60s or switch to a different model tier in `.env`. |
| **LangSmith Error** | Invalid or missing API Key. | Set `LANGSMITH_TRACING=false` in `.env` to disable. |
| **Docker Build Fail** | Network timeout on `uv sync`. | Check your internet connection and retry, or increase Docker resource limits. |

## Contributing

We welcome contributions! Please follow these steps:
1.  Fork the repository.
2.  Create a feature branch: `git checkout -b feature/amazing-feature`.
3.  Commit your changes: `git commit -m 'Add amazing feature'`.
4.  Push to the branch: `git push origin feature/amazing-feature`.
5.  Open a Pull Request.

## License

Distributed under the MIT License. See `LICENSE` for more information.

## Acknowledgments

We deeply appreciate the open-source community and the providers of the powerful technologies that make Pochi possible:

- **AI & Logic Providers**:
    - **LangChain & LangGraph**: For the robust orchestration framework.
    - **Groq**: For ultra-low latency Llama inference.
    - **Alibaba**: For the Qwen models.
    - **OpenAI**: For the GPT-OSS model.
    - **Moonshot AI**: For the Kimi reasoning model.
    - **Meta AI**: For the Llama vision models.
    - **Wolfram Alpha**: For the symbolic computation engine.
- **Frontend Ecosystem**:
    - **React & Vite**: For the blazing fast UI.
    - **Lucide React**: For the beautiful icon set.