design.md
PygmyClaw Agent System
app.py – UI
pygmyclaw.py – AI agent + queue + HF storage
pygmyclaw_multitool.py – tool execution engine
tools.json – tool registry stored on Hugging Face
memory.json – task memory stored on Hugging Face
┌─────────────────────┐
│      Streamlit      │
│       app.py        │
└──────────┬──────────┘
           │
           ▼
┌─────────────────────┐
│      PygmyClaw      │
│    pygmyclaw.py     │
└──────────┬──────────┘
           │
  Queue + Agent logic
           │
           ▼
┌─────────────────────┐
│ pygmyclaw_multitool │
│     run_tool()      │
└─────────────────────┘
           │
           ▼
HF Storage (tools.json / memory.json)
1. Overview
PygmyClaw is a local AI agent framework designed to run in a Hugging Face Docker Space.
The system combines:
- speculative decoding for faster generation
- tool-calling architecture
- dynamic tool creation
- artifact generation (files/code/data)
- background task execution
- interactive UI
The goal is to create a self-extensible AI system where the agent can:
- use tools
- create new tools
- generate code
- install dependencies
- run programs
- manage tasks
- interact through a UI
2. High-Level System Architecture
User
↓
HF Space UI (Gradio)
↓
Agent Engine (pygmyclaw.py)
↓
Speculative Decoding Engine
↓
Tool Call Parser
↓
Tool Executor (subprocess)
↓
Dynamic Tool Registry
↓
Workspace / Artifacts / Queue
↓
Result returned to LLM
3. System Layers
The system is organized into five layers.
Layer 1 – User Interface
Provides an interactive interface to the agent.
Responsibilities:
- display chat interaction
- show generated artifacts
- provide code editor
- allow running generated code
- show execution output
UI components:
Chat Panel
Artifact Viewer
Code Editor
Execution Console
Example layout:
----------------------------------
Chat | Code Editor
|
| demo_nn.py
|
| [Run] [Save]
----------------------------------
Console Output
----------------------------------
4. Layer 2 – Agent Engine
Implemented in:
pygmyclaw.py
This is the central orchestrator of the system.
Responsibilities:
- manage LLM interaction
- run speculative decoding
- parse tool calls
- execute tools
- maintain context
- handle queues
- coordinate artifacts
The engine runs an agent execution loop.
5. Agent Execution Loop
The entire system is driven by the following loop.
User Prompt
↓
Agent (LLM)
↓
LLM outputs JSON tool call
↓
Tool executes
↓
Result returned to LLM
↓
LLM continues reasoning
↓
Final response
Expanded loop:
User Prompt
↓
LLM reasoning
↓
Tool call JSON
↓
Tool executor
↓
Tool result
↓
Context updated
↓
LLM reasoning continues
The loop stops when the LLM returns a final answer instead of a tool call.
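The loop above can be sketched in a few lines of Python. This is a minimal illustration, not the actual PygmyClaw implementation: `call_llm` and `run_tool` are hypothetical stand-ins for the real model call and tool executor.

```python
import json

def call_llm(messages):
    # Hypothetical stand-in for the real model call: returns either
    # a JSON tool call or a plain-text final answer.
    if len(messages) == 1:
        return '{"tool": "sys_info", "parameters": {}}'
    return "All done."

def run_tool(name, parameters):
    # Hypothetical stand-in for the tool executor.
    return {"os": "Linux"}

def agent_loop(prompt, max_steps=10):
    messages = [{"role": "user", "content": prompt}]
    for _ in range(max_steps):
        reply = call_llm(messages)
        try:
            call = json.loads(reply)        # parses: it is a tool call
        except json.JSONDecodeError:
            return reply                    # plain text: the final answer
        result = run_tool(call["tool"], call["parameters"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "step limit reached"

print(agent_loop("what OS am I on?"))  # All done.
```

The key design point is the termination rule: any reply that fails to parse as a tool call is treated as the final answer.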
6. Speculative Decoding Engine
PygmyClaw speeds up generation using speculative decoding.
Architecture:
Drafter 1
Drafter 2
Drafter 3
↓
Verifier
Flow:
User prompt
↓
Draft tokens generated
↓
Verifier checks tokens
↓
Accept or reject
This improves generation speed while maintaining accuracy.
Typical configuration:
3 draft models
1 verifier model
Each runs in a separate Ollama instance.
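The accept-or-reject step can be illustrated with a toy sketch. Real speculative decoding compares token probabilities between models; here `draft_tokens` and `verifier_next` are stand-in functions over hard-coded token lists, purely to show how one verifier pass can accept several draft tokens at once.

```python
def draft_tokens(prefix, k=4):
    # Stand-in drafter: cheaply proposes the next k tokens.
    return ["the", "quick", "brown", "cat"][:k]

def verifier_next(prefix):
    # Stand-in verifier: the token the large model would emit next.
    truth = ["the", "quick", "brown", "fox", "jumps"]
    return truth[len(prefix)] if len(prefix) < len(truth) else None

def speculative_step(prefix):
    """Accept drafted tokens until the first mismatch, then take
    the verifier's token instead of the rejected draft token."""
    accepted = list(prefix)
    for tok in draft_tokens(prefix):
        if verifier_next(accepted) == tok:
            accepted.append(tok)              # draft agreed: keep it
        else:
            correction = verifier_next(accepted)
            if correction is not None:
                accepted.append(correction)   # replace rejected token
            break
    return accepted

print(speculative_step([]))  # ['the', 'quick', 'brown', 'fox']
```

Three draft tokens are accepted and the fourth is corrected, so one verifier pass advances the output by four tokens.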
7. Tool System
Tools allow the agent to perform actions outside the LLM.
Tools run in an isolated subprocess:
pygmyclaw_multitool.py
Execution flow:
Agent
↓
Tool call JSON
↓
Subprocess execution
↓
Tool result JSON
Example tool call:
{
"tool": "sys_info",
"parameters": {}
}
Tool response:
{
"os": "Linux",
"python_version": "3.11"
}
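The JSON-in, JSON-out contract can be sketched as a dispatcher. In PygmyClaw the dispatch happens in a subprocess (pygmyclaw_multitool.py); this illustrative version runs in-process, and the `sys_info` implementation here is an assumption, not the actual tool.

```python
import json
import platform
import sys

def sys_info(**params):
    # Illustrative built-in tool: report basic system information.
    return {"os": platform.system(),
            "python_version": f"{sys.version_info.major}.{sys.version_info.minor}"}

TOOLS = {"sys_info": sys_info}

def run_tool(call_json):
    """Dispatch a tool-call JSON string and return a result JSON string."""
    call = json.loads(call_json)
    tool = TOOLS.get(call["tool"])
    if tool is None:
        return json.dumps({"error": f"unknown tool: {call['tool']}"})
    return json.dumps(tool(**call.get("parameters", {})))

print(run_tool('{"tool": "sys_info", "parameters": {}}'))
```

Because both sides of the contract are plain JSON strings, the same dispatcher works unchanged whether it runs in-process or behind a subprocess boundary.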
8. Tool Categories
The system supports multiple tool types.
System Tools
sys_info
list_files
read_file
write_file
Environment Tools
install_python_package
check_package_installed
execute_shell
Code Tools
write_python_code
run_python_file
format_code
Artifact Tools
create_artifact
update_artifact
delete_artifact
Agent Tools
create_agent
list_agents
run_agent
9. Dynamic Tool Creation
Agents can create new tools dynamically.
Example prompt:
create a tool to fetch python documentation
Agent calls:
{
"tool": "create_tool",
"parameters": {
"name": "python_docs",
"description": "search python docs"
}
}
System creates:
tools/python_docs.py
The new tool becomes immediately available.
This enables self-extending agents.
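A sketch of what `create_tool` might do: write a module file into the tools directory and import it on the spot. The file layout and the idea that the LLM supplies the tool body are assumptions; a temporary directory stands in for `tools/` so the example is runnable.

```python
import importlib.util
from pathlib import Path
from tempfile import mkdtemp

TOOLS_DIR = Path(mkdtemp())  # stands in for the tools/ directory

def create_tool(name, description, body):
    """Write a new tool module to the tools directory and load it,
    making it available without a restart."""
    path = TOOLS_DIR / f"{name}.py"
    path.write_text(f'"""{description}"""\n{body}\n')
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

mod = create_tool("python_docs", "search python docs",
                  "def run(query):\n    return f'results for {query}'")
print(mod.run("asyncio"))  # results for asyncio
```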
10. Tool Discovery
At startup the engine scans the tools directory.
tools/
echo.py
sys_info.py
run_python.py
A tool registry is generated.
Example:
TOOLS = {
    "echo": echo,
    "sys_info": sys_info,
    "run_python": run_python,
}
The tool list is provided to the LLM.
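The scan itself is straightforward. This sketch only collects tool names from the directory; real discovery would also import each module, and a temporary directory stands in for `tools/`.

```python
from pathlib import Path
from tempfile import mkdtemp

tools_dir = Path(mkdtemp())  # stands in for tools/
for name in ("echo", "sys_info", "run_python"):
    (tools_dir / f"{name}.py").write_text("def run(**params):\n    return params\n")

def discover_tools(directory):
    """Build the tool name list by scanning for *.py files."""
    return sorted(p.stem for p in Path(directory).glob("*.py"))

print(discover_tools(tools_dir))  # ['echo', 'run_python', 'sys_info']
```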
11. Artifact System
Artifacts are files generated by the agent.
Examples:
code
datasets
documents
images
logs
Directory structure:
artifacts/
code/
data/
documents/
Example artifact:
artifacts/code/addition.py
Artifacts enable UI interaction.
12. Artifact UI Interaction
When artifacts are created, the UI displays controls for them.
Example:
addition.py
Edit
Run
Download
Artifact metadata may include:
language
dependencies
runnable
created_by
13. Workspace Environment
All agent work occurs in a dedicated workspace.
workspace/
venv/
artifacts/
tools/
agents/
The workspace provides:
- dependency isolation
- file management
- persistent agent data
14. Dependency Management
Agents may install packages.
Example prompt:
create a pytorch neural network demo
Agent detects dependency:
torch
Tool call:
install_python_package("torch")
The package is installed inside the workspace environment:
workspace/venv/
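A sketch of what `install_python_package` might do, assuming a virtualenv already exists at `workspace/venv` and reusing the package allowlist from the security section. The venv layout (`bin/python`, as on Linux) and the function signature are assumptions.

```python
import subprocess
from pathlib import Path

# Allowlist from the security section of this design.
ALLOWED_PACKAGES = {"numpy", "pandas", "torch", "scikit-learn", "matplotlib"}

def install_python_package(name, venv_dir="workspace/venv"):
    """Install a package into the workspace virtualenv via pip,
    rejecting anything not on the allowlist."""
    if name not in ALLOWED_PACKAGES:
        raise ValueError(f"package not on allowlist: {name}")
    python = Path(venv_dir) / "bin" / "python"
    return subprocess.run([str(python), "-m", "pip", "install", name],
                          capture_output=True, text=True)
```

Routing the install through the venv's own interpreter (`venv/bin/python -m pip`) is what keeps dependencies isolated from the host environment.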
15. Code Generation Workflow
Example request:
create a neural network code demo in python using pytorch
Execution flow:
User prompt
↓
Agent reasoning
↓
Install dependency
↓
Generate code
↓
Save artifact
↓
UI displays code
Artifact example:
artifacts/code/pytorch_nn_demo.py
UI shows:
Edit
Run
Download
16. Code Execution Workflow
When the user presses Run:
UI
↓
run_python_file tool
↓
workspace python interpreter
↓
program execution
↓
stdout returned
Console output appears in the UI.
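A sketch of the `run_python_file` tool behind the Run button. To stay runnable anywhere, it uses the current interpreter (`sys.executable`) rather than `workspace/venv/bin/python`; the real tool would point at the workspace venv.

```python
import subprocess
import sys
from pathlib import Path
from tempfile import mkdtemp

def run_python_file(path, timeout=30):
    """Run a Python file in a subprocess and capture its output
    for display in the execution console."""
    proc = subprocess.run([sys.executable, path],
                          capture_output=True, text=True, timeout=timeout)
    return {"stdout": proc.stdout, "stderr": proc.stderr,
            "returncode": proc.returncode}

# Demo: write a tiny script and run it.
demo = Path(mkdtemp()) / "demo.py"
demo.write_text("print('hello from the workspace')\n")
print(run_python_file(str(demo))["stdout"])
```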
17. Task Queue
The system supports background tasks.
Queue storage options:
Redis
JSON file
Example task:
{
"id": "123",
"prompt": "generate dataset"
}
A queue worker processes tasks asynchronously.
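The JSON-file option can be sketched as follows. The file path and task schema here are illustrative, not PygmyClaw's actual format; a temporary directory stands in for the real queue location.

```python
import json
from pathlib import Path
from tempfile import mkdtemp

QUEUE_FILE = Path(mkdtemp()) / "queue.json"  # illustrative queue store

def enqueue(task):
    # Append a task to the on-disk queue.
    tasks = json.loads(QUEUE_FILE.read_text()) if QUEUE_FILE.exists() else []
    tasks.append(task)
    QUEUE_FILE.write_text(json.dumps(tasks))

def dequeue():
    # Pop the oldest task, or return None when the queue is empty.
    tasks = json.loads(QUEUE_FILE.read_text()) if QUEUE_FILE.exists() else []
    if not tasks:
        return None
    task, rest = tasks[0], tasks[1:]
    QUEUE_FILE.write_text(json.dumps(rest))
    return task

enqueue({"id": "123", "prompt": "generate dataset"})
print(dequeue())  # {'id': '123', 'prompt': 'generate dataset'}
```

Swapping this for Redis changes only the storage calls; the enqueue/dequeue interface the worker sees stays the same.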
18. Agent Registry
Agents are stored as configuration files.
agents/
python_coder.json
research_agent.json
Example agent definition:
{
"name": "python_coder",
"model": "qwen2.5",
"tools": [
"write_python_code",
"run_python_file"
]
}
Agents can be created dynamically.
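Loading the registry is a matter of reading every JSON definition in the agents directory. This sketch assumes that layout; a temporary directory stands in for `agents/`.

```python
import json
from pathlib import Path
from tempfile import mkdtemp

agents_dir = Path(mkdtemp())  # stands in for agents/
(agents_dir / "python_coder.json").write_text(json.dumps({
    "name": "python_coder",
    "model": "qwen2.5",
    "tools": ["write_python_code", "run_python_file"],
}))

def load_agents(directory):
    """Load every agent definition from the agents directory,
    keyed by filename stem."""
    return {p.stem: json.loads(p.read_text())
            for p in Path(directory).glob("*.json")}

agents = load_agents(agents_dir)
print(agents["python_coder"]["model"])  # qwen2.5
```

Because agents are plain files, creating one dynamically is just writing another JSON definition and re-scanning the directory.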
19. Redis Usage
Redis is optional but improves scalability.
Uses:
task queue
agent memory
caching
Configuration can be stored in Redis or local config.
20. Hugging Face Space Deployment
The system runs inside a Docker Space.
Components:
Python
Ollama
Gradio
Redis (optional)
Startup sequence:
Start container
↓
Start Ollama
↓
Load models
↓
Start agent engine
↓
Launch UI
21. Security Considerations
Important restrictions include:
- package allowlist
- sandboxed tool execution
- workspace file isolation
- limited shell access
Example allowed packages:
numpy
pandas
torch
scikit-learn
matplotlib
22. End-to-End Example
User request:
create a neural network code demo in python that uses pytorch
System execution:
User prompt
↓
Agent reasoning
↓
install_python_package(torch)
↓
write_file(pytorch_nn_demo.py)
↓
artifact created
↓
UI displays code
User interaction:
Edit code
Run code
Download file
Execution output appears in the console.
23. Final System Architecture
User
↓
HF Space UI
↓
Agent Engine
↓
Speculative Decoding
↓
Tool Parser
↓
Tool Executor
↓
Dynamic Tools
↓
Workspace
↓
Artifacts / Agents / Queue
↓
Result returned to LLM
24. Design Principles
The system follows several core principles.
- Everything is a tool
Actions are performed through tools rather than hardcoded logic.
- Agents can extend themselves
Agents may create tools and agents.
- Artifacts are first-class outputs
Generated files are accessible through the UI.
- Isolation and safety
Tools run in subprocesses.
- Local and lightweight
System runs entirely locally using Ollama.
25. Future Extensions
Potential future capabilities include:
multi-agent collaboration
autonomous project generation
browser automation
dataset generation
long-term memory