	---
	title: Calculus Agent
	emoji: 🌌
	colorFrom: gray
	colorTo: gray
	sdk: docker
	pinned: false
	license: mit
	short_description: Multi-Agent Calculus Orchestration System
	---

	Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

	# Pochi 4.o: Multi-Agent Calculus Orchestration System

	Pochi is a high-performance, asynchronous AI platform specialized in solving complex calculus problems. It utilizes a stateful multi-agent system built on LangGraph, coordinating multiple specialized LLMs and symbolic computation engines to achieve pedagogical excellence and mathematical precision.

	## Live Demo

	| Platform | URL |
	| :--- | :--- |
	| Hugging Face | [Visit Pochi on Hugging Face](https://huggingface.co/spaces/baeGil/calculus-agent) |

	## Project Achievements & Performance

	Pochi's performance and reliability are continuously monitored via LangSmith. The following data highlights the system's operational excellence and high-speed reasoning capabilities.

	![LangSmith Traces](images/traces.png)

	### System Health & Usage
	| Metric | Value | Description |
	| :--- | :--- | :--- |
	| Total Runs | 476 | Cumulative successful execution cycles. |
	| Total Tokens | 1.86M | Aggregate token throughput across all agents. |
	| Median Tokens | 2,846 | Median token count per solver request. |
	| Success Rate | 99% | System resilience against API and execution errors. |
	| Streaming Adoption | 99% | Percentage of responses delivered via SSE for real-time feedback. |

	### Latency Performance
	> Latency varies significantly based on task complexity (e.g., Simple symbolic math vs. Multi-image OCR + Recursive code fixing).

	| Stage | P50 (Median) | P99 (Tail) |
	| :--- | :---: | :---: |
	| Time to First Token (TTFT) | 0.53s | 5.30s |
	| End-to-End Latency | 1.51s | 36.95s |

	Analysis:
	- Responsiveness: A P50 TTFT of 0.53s means half of all responses start streaming within about half a second, which users perceive as an "instant" start.
	- Efficiency: A P50 end-to-end latency of 1.51s shows that a typical request is fully resolved in under two seconds.
	- Complexity Buffer: The P99 latency (~37s) reflects the most intensive "Self-Healing" loops, where the system may perform multiple recursive code fixes or deep vision analysis.
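For readers who want to reproduce P50/P99 figures from their own traces, here is a minimal sketch using the nearest-rank method. The sample latencies are invented for illustration and are not Pochi's LangSmith data; LangSmith may also use a different interpolation rule, so small differences are expected.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Hypothetical end-to-end latencies in seconds (illustrative only).
latencies = [0.9, 1.1, 1.3, 1.5, 1.6, 1.8, 2.4, 3.0, 12.5, 36.0]

p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
print(f"P50={p50}s P99={p99}s")
```

With only a handful of samples the P99 is simply the slowest request, which is why tail latency is dominated by the heaviest fix loops.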

	## Highlight Features

	- Multi-Agent Orchestration: Stateful DAG-based workflow using LangGraph for complex, multi-stage reasoning.
	- Parallel Sub-problem Processing: Intelligent decomposition of complex queries into independent atomic tasks executed in parallel.
	- Multimodal OCR Intelligence: High-fidelity vision extraction from up to 5 concurrent images with specialized math support.
	- Hybrid Solving Engine: Seamlessly combines symbolic precision (Wolfram Alpha) with algorithmic logic (Python Executor).
	- Intelligent Long-Term Memory: Massive 256K token context window with proactive memory management and token tracking.
	- Premium UI/UX: Modern glassmorphism design with reactive animations, interactive tours, and native LaTeX rendering.

	## System Architecture and Pipeline

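	The system is engineered as a directed acyclic graph (DAG) of specialized nodes, managed by a central orchestrator that maintains a consistent state throughout the conversation turn. The one deliberate exception to acyclicity is the code-fix retry loop between the Python Executor and the CodeFixer.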

	### The Execution Pipeline

	1. Vision Ingestion (OCR Agent): Processes up to 5 concurrent image inputs. Utilizing Llama-4 Maverick, it extracts raw text and LaTeX-formatted mathematical expressions.
	2. Strategic Decomposition (Planner): Analyzes user intent and OCR data to generate a vectorized execution plan. It decomposes composite problems into independent atomic tasks, each defined in JSON.
	3. Parallel Orchestration (Executor): The core processing engine, which spawns a concurrent asynchronous task for each atomic task and routes it to one of three branches:
	   - Symbolic Branch: Direct interface with the Wolfram Alpha API for verified algebraic and calculus manipulation.
	   - Algorithmic Branch: Python Code Engine (Qwen3-32B) for numerical methods or complex multi-step logic.
	   - Heuristic Branch: Direct LLM solving for theoretical or conceptual queries.
	4. Self-Correction Loop (Code Engine): If the Algorithmic Branch encounters execution errors, a specialized CodeFixer (GPT-OSS-120B) performs recursive debugging and code modification.
	5. Contextual Synthesis (Synthetic Agent): Aggregates atomic results, resolves inter-task dependencies, and consults conversation history to produce a structured, pedagogical response.
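The fan-out and fan-in shape of steps 2 to 5 can be sketched with plain `asyncio`. This is an illustration only: the branch functions, the `tool` and `query` field names, and `synthesize` are hypothetical stand-ins, and the real system wires these stages together as LangGraph nodes rather than bare coroutines.

```python
import asyncio


async def solve_symbolic(task):
    # Stand-in for a Wolfram Alpha API call.
    await asyncio.sleep(0)
    return f"symbolic:{task['query']}"


async def solve_with_code(task):
    # Stand-in for code generation plus sandboxed execution.
    await asyncio.sleep(0)
    return f"code:{task['query']}"


async def solve_directly(task):
    # Stand-in for a direct LLM answer.
    await asyncio.sleep(0)
    return f"direct:{task['query']}"


# Hypothetical routing table from a plan's "tool" field to a branch.
BRANCHES = {"wolfram": solve_symbolic, "code": solve_with_code, "direct": solve_directly}


async def run_plan(plan):
    """Run every atomic task concurrently; results come back in plan order."""
    coros = [BRANCHES[task["tool"]](task) for task in plan]
    return await asyncio.gather(*coros)


def synthesize(plan, results):
    """Fan-in step: merge per-task results into one response."""
    return "\n".join(f"{task['id']}: {result}" for task, result in zip(plan, results))


plan = [
    {"id": 1, "tool": "wolfram", "query": "integrate x^2"},
    {"id": 2, "tool": "code", "query": "newton sqrt(2)"},
    {"id": 3, "tool": "direct", "query": "define a limit"},
]
results = asyncio.run(run_plan(plan))
answer = synthesize(plan, results)
print(answer)
```

Because `asyncio.gather` preserves input order, the synthesis step can pair each result with its originating task without extra bookkeeping.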

	### Technical Workflow Diagram

	```mermaid
	graph TD
	User([User Request]) --> API[FastAPI Entry]
	API --> State[Agent State Initialization]
	State --> OCR{OCR Node}

	OCR -- Multi-Image --> Vision[Llama-4 Maverick]
	Vision --> Planner[Planner Node: Kimi K2]
	OCR -- Text Only --> Planner

	Planner --> Plan{Execution Plan}
	Plan -- All Direct --> Synthetic[Synthetic Agent]
	Plan -- Tool Required --> Executor[Parallel Executor Node]

	subgraph ParallelTasks["Async Task Orchestration"]
	Executor --> Wolfram[Wolfram Alpha API]
	Executor --> Code[Qwen3 Code Gen]
	Code --> Exec[Python Executor]
	Exec -- Error --> Fixer[GPT-OSS-120B Fixer]
	Fixer --> Exec
	end

	ParallelTasks --> Synthetic
	Synthetic --> Render[LaTeX Formatter]
	Render --> SSE[SSE Stream]
	SSE --> User

	subgraph Observability["System Monitoring"]
	Tracing[LangSmith Trace]
	Memory[Session Memory Tracker]
	RateLimit[Token/Request Limiter]
	end

	API -.-> Observability
	Executor -.-> Observability

	```

	## Fault Tolerance and Error Handling

	Pochi is built with a "Resilience-First" mindset, ensuring that the system remains operational and provides accurate results even when facing API failures or ambiguous inputs.

	### 1. Model Redundancy and Failover
	- OCR Failover: If the primary vision model (Maverick) encounters rate limits or internal errors, the system automatically redirects requests to a high-speed fallback model (Scout).
	- Model Switching: The `ModelManager` dynamically monitors model health and rate limits (RPM/TPM), performing seamless transitions between tiers without task interruption.
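A minimal sketch of the failover idea follows, assuming each model is wrapped in a callable that raises a dedicated exception on rate limits or internal errors. `ModelUnavailable`, `call_with_failover`, and the two stub functions are hypothetical names for illustration; the actual `ModelManager` interface is not shown in this README.

```python
class ModelUnavailable(Exception):
    """Hypothetical error raised when a model is rate limited or fails internally."""


def call_with_failover(models, payload):
    """Try each (name, callable) pair in order and return (name, result) from the first success."""
    errors = []
    for name, call in models:
        try:
            return name, call(payload)
        except ModelUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


def maverick(payload):
    # Stub for the primary vision model: simulate a rate limit.
    raise ModelUnavailable("429 rate limit")


def scout(payload):
    # Stub for the fallback vision model.
    return f"extracted text from {payload}"


used, text = call_with_failover([("maverick", maverick), ("scout", scout)], "image.png")
print(used, text)
```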

	### 2. "Self-Healing" Algorithmic Solving
	- Recursive Debugging: The Python Code Engine is not a simple "one-shot" executor. If generated code fails (`SyntaxError`, `ZeroDivisionError`, etc.), the system sends the error log back to the `CodeFixer` agent.
	- Fix Loop: The system allows for multiple recursive fix attempts, where the agent analyzes the stack trace and re-writes the logic until a successful execution is achieved.
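The fix loop can be sketched as follows. This is an assumption-laden illustration: `run_with_fixes`, `toy_fixer`, the `result` variable convention, and the `max_attempts` cap are all hypothetical, and the toy fixer is a string patch standing in for the LLM-based `CodeFixer`. A bare `exec` is used only for brevity; generated code should run in a real sandbox.

```python
import traceback


def run_with_fixes(code, fixer, max_attempts=3):
    """Execute code; on failure, hand the traceback to fixer and retry with its output."""
    for attempt in range(1, max_attempts + 1):
        namespace = {}
        try:
            exec(code, namespace)  # NOTE: untrusted code needs a proper sandbox
            return namespace.get("result"), attempt
        except Exception:
            error_log = traceback.format_exc()
            code = fixer(code, error_log)
    raise RuntimeError(f"no working code after {max_attempts} attempts")


def toy_fixer(code, error_log):
    # Stand-in for the LLM-based CodeFixer: patches one known bug.
    if "ZeroDivisionError" in error_log:
        return code.replace("/ 0", "/ 2")
    return code


result, attempts = run_with_fixes("result = 10 / 0", toy_fixer)
print(result, attempts)
```

Capping the number of attempts keeps a persistently failing task from looping forever, at the cost of occasionally giving up on a fixable program.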

	### 3. Graceful Degradation of Tools
	- Wolfram-to-Code Fallback: Symbolic math is the gold standard for precision. However, if the Wolfram Alpha API exceeds its 2000-req/month quota or times out, the system automatically shifts the problem to the Algorithmic Branch for a numerical solve.
	- Synthesis Resilience: If the Synthetic Agent fails to format the final response (e.g., due to context length), the system performs a "raw-safe" synthesis, delivering the tool results directly to the user to ensure no data is lost.

	### 4. Robust State and Parsing
	- Durable IO: The background agent task saves intermediate results to the database immediately upon generation. This ensures that even if a client disconnects during a 20-second calculation, the result is waiting for them upon refresh.
	- JSON Recovery: LLMs occasionally return malformed JSON. The `Planner` includes multi-stage recovery logic that uses regex and string normalization to repair broken JSON blocks, preventing system crashes on minor formatting errors.
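A multi-stage JSON repair of this kind might look like the sketch below. The stages chosen here (strip Markdown fences, isolate the outermost object, normalize curly quotes, drop trailing commas) are assumptions about common failure modes, and `recover_json` is a hypothetical name, not the `Planner`'s actual code.

```python
import json
import re

FENCE = "`" * 3  # Markdown code fence, built this way to keep the example readable


def recover_json(raw):
    """Best-effort repair of near-JSON text returned by an LLM."""
    # Stage 1: the happy path.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Stage 2: drop Markdown fences and keep the outermost {...} span.
    text = re.sub(re.escape(FENCE) + r"(?:json)?", "", raw)
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("no JSON object found")
    text = text[start:end + 1]
    # Stage 3: normalize typographic quotes and remove trailing commas.
    # Caveat: this regex would also touch a ",}" sequence inside a string value.
    text = text.replace("\u201c", '"').replace("\u201d", '"')
    text = re.sub(r",\s*([}\]])", r"\1", text)
    return json.loads(text)


raw = "Here is the plan:\n" + FENCE + 'json\n{"tasks": [{"id": 1, "tool": "wolfram",},],}\n' + FENCE
plan = recover_json(raw)
print(plan)
```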

	### 5. Memory and Resource Safety
	- Context Protection: The `SessionMemoryTracker` proactively blocks requests that would exceed the 256K token limit, preventing "half-baked" or truncated responses from the LLM.
	- Rate Limit Resilience: Integrated backoff and retry mechanisms for all third-party API calls (Groq, Wolfram, LangSmith).
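The backoff-and-retry behaviour can be illustrated with a small helper. The names `RateLimited` and `retry_with_backoff`, the retry count, and the delay constants are hypothetical; the README does not state the values Pochi actually uses.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical error representing an HTTP 429 from a provider."""


def retry_with_backoff(call, max_retries=4, base_delay=0.5, sleep=time.sleep):
    """Retry call() on RateLimited, doubling the delay each time and adding jitter."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimited:
            if attempt == max_retries:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)


# Demo: fail twice, then succeed. sleep is stubbed so the example runs instantly.
calls = {"count": 0}


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimited("429")
    return "ok"


delays = []
outcome = retry_with_backoff(flaky, sleep=delays.append)
print(outcome, len(delays))
```

The random jitter spreads out retries from concurrent tasks so they do not all hit the provider again at the same instant.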

	## Model Distribution and Specialization

	| Component | Model Identifier | Specialization |
	| :--- | :--- | :--- |
	| OCR (Primary) | Llama-4 Maverick | Multi-modal mathematical extraction. |
	| OCR (Fallback) | Llama-4 Scout | High-speed redundancy for simple OCR. |
	| Planner & Synthesis | Kimi K2-Instruct | 256K Context, complex reasoning, and pedagogy. |
	| Code Generation | Qwen3-32B-Instruct | Optimized for Pythonic mathematical logic. |
	| Code Rectification | GPT-OSS-120B | Deep-context code debugging and error resolution. |
	| Symbolic Logic | Wolfram Alpha | Deterministic symbolic computation (2000 req/mo). |

	## Project Structure

	```text
	.
	├── backend/                # FastAPI Application & LangGraph Agents
	│   ├── agent/              # Multi-agent logic (Nodes, Graph, State)
	│   ├── database/           # SQLite models and migrations
	│   ├── tools/              # Symbolic & Algorithmic executor tools
	│   └── utils/              # Memory tracking, rate limiting, tracing
	├── frontend/               # React (Vite) Application
	│   └── src/
	│       ├── components/     # UI components (Math rendering, Tour)
	│       └── App.jsx         # Main application logic
	├── Dockerfile              # Containerized deployment
	├── pyproject.toml          # Python dependencies & metadata
	└── README.md               # Technical documentation
	```

	## Mathematics & Computation Stack

	Pochi utilizes a heavy-duty scientific stack for high-precision calculations:
	- Symbolic: SymPy, Wolfram Alpha API
	- Numerical: NumPy, SciPy, mpmath
	- Optimization: CVXPY, PuLP
	- Visuals: Matplotlib, Seaborn, Plotly
	- Data: Pandas, Polars, statsmodels

	## Local Deployment

	### Environment Configuration
	Create a `.env` file in the root directory:
	```env
	GROQ_API_KEY=your_key_here
	WOLFRAM_ALPHA_APP_ID=your_id_here
	# Optional: only needed for LangSmith tracking
	LANGSMITH_API_KEY=your_key_here
	LANGSMITH_PROJECT=calculus-chatbot
	LANGSMITH_TRACING=true
	```
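As a rough illustration of how these variables might be validated at startup, the sketch below checks the environment before the service launches. `load_settings` and the choice of which keys are mandatory are assumptions (the README only marks the LangSmith key as optional), and the function expects the variables to already be in the process environment, for example via Docker's `--env-file`.

```python
import os

# Assumed to be mandatory; the README marks only the LangSmith key as optional.
REQUIRED = ("GROQ_API_KEY", "WOLFRAM_ALPHA_APP_ID")


def load_settings(env=os.environ):
    """Validate required keys and derive the tracing flag from the environment mapping."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required settings: " + ", ".join(missing))
    tracing = env.get("LANGSMITH_TRACING", "false").strip().lower() == "true"
    if tracing and not env.get("LANGSMITH_API_KEY"):
        tracing = False  # tracing needs a key, so fall back to disabled
    return {
        "groq_api_key": env["GROQ_API_KEY"],
        "wolfram_app_id": env["WOLFRAM_ALPHA_APP_ID"],
        "langsmith_tracing": tracing,
        "langsmith_project": env.get("LANGSMITH_PROJECT", "calculus-chatbot"),
    }


settings = load_settings(
    {"GROQ_API_KEY": "k", "WOLFRAM_ALPHA_APP_ID": "w", "LANGSMITH_TRACING": "true"}
)
print(settings["langsmith_tracing"])
```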

	### Backend Infrastructure
	1. Initialize virtual environment: `uv venv && source .venv/bin/activate`
	2. Install dependencies: `uv pip install -r requirements.txt`
	3. Launch Service: `python main.py`

	### Frontend Application
	1. Navigate to workspace: `cd frontend`
	2. Install packages: `npm install`
	3. Development server: `npm run dev`

	### Docker Deployment
	Build and run the entire stack:
	```bash
	docker build -t pochi-app .
	# Use an absolute host path for the bind mount; some Docker versions reject relative paths
	docker run -p 7860:7860 -v "$(pwd)/data:/data" --env-file .env pochi-app
	```

	## API Documentation

	The backend service automatically generates interactive API documentation.
	- Swagger UI: `http://localhost:7860/docs`
	- ReDoc: `http://localhost:7860/redoc`

	## Advanced Customization

	### Prompt Engineering
	The system's persona and logic are defined in `backend/agent/prompts.py`:
	- `GUARD_PROMPT`: Defines the "Pochi" persona and strict safety guardrails.
	- `TOT_PROMPT`: Enforces the Tree-of-Thought reasoning process (Plan -> Solve -> Verify).
	- `PLANNER_SYSTEM_PROMPT`: Controls the multi-modal decomposition logic.

	Developers can modify these constants to adjust the chatbot's tone or reasoning strictness.

	## Security & Privacy Guidelines

	- Session Isolation: User sessions are logically isolated in the database (`conversations` table) and memory cache.
	- Transient Data: Uploaded images are processed in-memory (or in temp storage) and converted to base64/embeddings. For privacy, they are not permanently retained on disk.

	## Known Limitations

	- Multimodal Cap: Supports a maximum of 5 distinct images per query to manage context window limits.
	- Symbolic Rate Limit: Wolfram Alpha requests are capped at 2000/month. Heavy usage will degrade to the numerical Python solver (Qwen3).
	- Latency: Complex multi-step reasoning (Plan -> Code -> Fix -> Synthesize) may take 15-30s to fully resolve.

	### AI Model Rate Limits

	The system enforces strict rate limits to ensure stability and usage fairness:

	| Model ID | RPM (Req/Min) | RPD (Req/Day) | TPM (Tokens/Min) | TPD (Tokens/Day) |
	| :--- | :---: | :---: | :---: | :---: |
	| Kimi K2 Instruct | 60 | 1,000 | 10,000 | 300,000 |
	| Llama-4 Maverick | 30 | 1,000 | 6,000 | 500,000 |
	| Llama-4 Scout | 30 | 1,000 | 30,000 | 500,000 |
	| Qwen3-32B | 60 | 1,000 | 6,000 | 500,000 |
	| GPT-OSS-120B | 30 | 1,000 | 8,000 | 200,000 |
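One way to enforce per-minute limits like these is a sliding-window counter. The sketch below is a hypothetical illustration (`SlidingWindowLimiter` is not a class from the Pochi codebase) that tracks only RPM and TPM; daily limits would need a second, longer window.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Tracks requests and tokens over the trailing 60 seconds."""

    def __init__(self, rpm, tpm, clock=time.monotonic):
        self.rpm, self.tpm, self.clock = rpm, tpm, clock
        self.events = deque()  # (timestamp, tokens) per accepted request

    def _prune(self, now):
        # Forget requests that have left the 60-second window.
        while self.events and now - self.events[0][0] >= 60:
            self.events.popleft()

    def try_acquire(self, tokens):
        """Record the request and return True if it fits both budgets, else return False."""
        now = self.clock()
        self._prune(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True


# Limits for Kimi K2 Instruct from the table above: 60 RPM, 10,000 TPM.
fake_now = [0.0]  # injectable clock so the demo does not need to wait
limiter = SlidingWindowLimiter(rpm=60, tpm=10_000, clock=lambda: fake_now[0])
first = limiter.try_acquire(6_000)   # fits within both budgets
second = limiter.try_acquire(5_000)  # would exceed 10,000 TPM
fake_now[0] = 61.0                   # a minute later the window is empty again
third = limiter.try_acquire(5_000)
print(first, second, third)
```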

	## API Usage Examples

	### Natural Language Calculus
	> "Compute the derivative of f(x) = x^2 + 3x + 2"

	### Multimodal Math Analysis (Image Support)
	> [Upload 2 images of a calculus problem] "Solve the problem in the following images"

	### Algorithmic Mathematical Tasks
	> "Use Python code to find the first 100 prime numbers and explain the Sieve of Eratosthenes algorithm."
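For the algorithmic prompt above (the first 100 primes via the Sieve of Eratosthenes), a response might contain code along these lines. This is an illustrative sketch, not output captured from Pochi; the bound-doubling helper `first_primes` is one of several reasonable ways to pick the sieve size.

```python
def sieve(limit):
    """Return all primes <= limit using the Sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= limit:
        if is_prime[p]:
            # Multiples below p*p were already crossed out by smaller primes.
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False
        p += 1
    return [n for n, flag in enumerate(is_prime) if flag]


def first_primes(count):
    """Double the sieve bound until it contains at least count primes."""
    limit = 16
    while True:
        primes = sieve(limit)
        if len(primes) >= count:
            return primes[:count]
        limit *= 2


primes = first_primes(100)
print(primes[-1])  # 541, the 100th prime
```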

	## Troubleshooting

	| Issue | Possible Cause | Solution |
	| :--- | :--- | :--- |
	| 413 Payload Too Large | Uploading images > 10MB total. | Reduce image size or upload fewer files per turn. |
	| 429 Too Many Requests | Exceeded Wolfram or LLM rate limits. | Wait 60s or switch to a different model tier in `.env`. |
	| LangSmith Error | Invalid or missing API Key. | Set `LANGSMITH_TRACING=false` in `.env` to disable. |
	| Docker Build Fail | Network timeout on `uv sync`. | Check internet connection or increase Docker memory limit. |

	## Contributing

	We welcome contributions! Please follow these steps:
	1. Fork the repository.
	2. Create a feature branch: `git checkout -b feature/amazing-feature`.
	3. Commit your changes: `git commit -m 'Add amazing feature'`.
	4. Push to the branch: `git push origin feature/amazing-feature`.
	5. Open a Pull Request.

	## License

	Distributed under the MIT License. See `LICENSE` for more information.

	## Acknowledgments

	We deeply appreciate the open-source community and the providers of the powerful technologies that make Pochi possible:

	- AI & Logic Providers:
	  - LangChain & LangGraph: For the robust orchestration framework.
	  - Groq: For ultra-low latency Llama inference.
	  - Alibaba: For the Qwen model.
	  - OpenAI: For the GPT-OSS model.
	  - Moonshot AI: For the Kimi reasoning model.
	  - Meta AI: For the Llama vision models.
	  - Wolfram Alpha: For the symbolic computation engine.
	- Frontend Ecosystem:
	  - React & Vite: For the blazing fast UI.
	  - Lucide React: For the beautiful icon set.