### 1. Purpose
The **NexaSci Agent Kit** is a self-contained, local-first agent stack built around:
- **NexaSci Assistant** — a 10B post-trained scientific reasoning model
- **SPECTER (or similar)** — a scientific paper embedding model
- **Tool Server** — FastAPI-based tool-calling backend
- **Sandbox Environment** — controlled Python execution + scientific libraries
- **Simple Web UI** — local interface for interactive use
The kit is designed to:
- Let technical users run the **full scientific agent locally** (on their own GPU)
- Provide a **reusable template** for future agents (e.g., SWE, bio, materials)
- Integrate **reasoning, retrieval, code, and scientific tools** in one place
- Avoid any requirement for hosted services / managed SaaS
----------
### 2. High-Level Architecture
**Components:**
1. **LLM: NexaSci Assistant**
- 10B model
- Post-trained for:
- tool calling (JSON ToolCall / ToolResult protocol)
- structured scientific outputs (hypothesis, methodology, limitations, etc.)
- paper usage + citations
- self-assessment (“I’m not sure → call tools”)
2. **Embedding Model: SPECTER (or similar)**
- Scientific document embedding model
- Used to:
- embed paper abstracts / sections
- perform semantic search over a local corpus
- support similarity queries for the agent
- Runs on CPU or GPU (optional acceleration)
3. **Tool Server (FastAPI)**
- Exposes tools to NexaSci:
- `python.run`: sandboxed Python executor
- `papers.search`: query external APIs or local index
- `papers.fetch`: get metadata/abstracts
- `papers.search_corpus`: query SPECTER-based local corpus (optional)
- Can be extended with:
- chemistry engines (e.g., RDKit-ish workflows)
    - PDE solvers (e.g., FEniCS-like wrappers)
- quantum simulation stubs
4. **Agent Controller**
- Orchestrates the agent loop:
- send user prompt + history to LLM
- parse tool calls
- call tool server
- feed back results
- stop on `final` message
- Stateless, minimal, and reusable across agents
5. **Web UI**
- Lightweight, local-only UI
- Provides:
- input box
- streaming output
- optional view of tool traces
- Built with something simple (e.g. FastAPI + HTML/JS, or Gradio/Streamlit)
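The components above talk to each other through a small JSON protocol (ToolCall out, ToolResult back). As a sketch of that round trip (the field names `tool`, `args`, `ok`, `stdout` are illustrative assumptions, not fixed by this spec):

```python
import json

# Hypothetical wire format for one tool-calling round trip.
# Field names ("tool", "args", "ok", ...) are illustrative assumptions.
tool_call = {
    "tool": "python.run",
    "args": {"code": "print(2 + 2)", "timeout_s": 5},
}

tool_result = {
    "tool": "python.run",
    "ok": True,
    "stdout": "4\n",
    "stderr": "",
    "artifacts": [],
}

# The controller serializes the call for the tool server and appends
# the result back into the conversation as a `tool` message.
payload = json.dumps(tool_call)
roundtrip = json.loads(payload)
```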
----------
### 3. Repository Layout
Proposed repo structure:
```
nexa-sci-agent-kit/
  SPEC.md
  README.md
  docker/
    Dockerfile            # GPU-accelerated base image
    docker-compose.yml    # optional, for combined agent+tools+ui
  agent/
    controller.py         # agent loop (LLM ↔ tools)
    client_llm.py         # NexaSci loading + chat interface (transformers/vLLM)
    tool_client.py        # HTTP client for FastAPI tools
    config.yaml           # model + server config (ports, endpoints, HF repo)
  tools/
    server.py             # FastAPI app exposing tools
    schemas.py            # Pydantic models for ToolCall/ToolResult
    python_sandbox.py     # sandboxing helpers
    paper_sources/
      arxiv_client.py
      pubmed_client.py
      corpus_search.py    # SPECTER-based local search
  webui/
    app.py                # minimal web server (can be Gradio/Streamlit/FastAPI)
    static/               # JS/CSS assets (if needed)
    templates/            # optional HTML templates
  examples/
    run_local_agent.py    # CLI demo (no UI)
    sample_prompts.md     # curated example prompts
  scripts/
    download_models.py    # pull NexaSci + SPECTER weights
    init_corpus.py        # optional: build local paper index
  install.sh              # convenience installer
  requirements.txt
```
This layout is **reusable**: swap `client_llm.py` + tools, and you have a SWE agent kit.
----------
### 4. Models
#### 4.1 NexaSci Assistant (LLM)
- **Weights:** hosted on Hugging Face (e.g. `darkstar/nexa-sci-10b`)
- **Form:** merged distilled + tool-calling QLoRA
- **Capabilities:**
- Hypothesis + methodology generation
- Tool calling (Python, paper search)
- Structured JSON final reports
- Uncertainty detection → calls tools when unsure
**Load options:**
- **Transformers** (`AutoModelForCausalLM`) for simplicity
- **vLLM** for GPU-accelerated inference with long contexts / parallel requests
Config in `agent/config.yaml`:
```yaml
model_repo: "darkstar/nexa-sci-10b"
backend: "vllm"            # or "transformers"
max_tokens: 1024
temperature: 0.3
top_p: 0.9
tool_prefix: "~~~toolcall"
tool_suffix: "~~~"
final_prefix: "~~~final"
final_suffix: "~~~"
```
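Given those sentinel strings, the controller can detect ToolCall and Final blocks with a simple scan. A minimal sketch (the function name and error handling are assumptions):

```python
import json

TOOL_PREFIX, TOOL_SUFFIX = "~~~toolcall", "~~~"
FINAL_PREFIX, FINAL_SUFFIX = "~~~final", "~~~"

def extract_block(text: str, prefix: str, suffix: str):
    """Return the JSON payload between prefix and suffix, or None if absent."""
    start = text.find(prefix)
    if start == -1:
        return None
    start += len(prefix)
    end = text.find(suffix, start)
    if end == -1:
        return None
    return json.loads(text[start:end])

# Example model output containing one tool call.
out = 'Let me check. ~~~toolcall {"tool": "python.run", "args": {"code": "print(1)"}} ~~~'
call = extract_block(out, TOOL_PREFIX, TOOL_SUFFIX)
```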
#### 4.2 Embedding Model (SPECTER or similar)
- **Weights:** e.g. a SPECTER HF repo
- **Use:**
- embed titles/abstracts/sections
- populate FAISS / similar index
- support `papers.search_corpus` tool
Config in `agent/config.yaml`:
```yaml
embedding_model_repo: "allenai/specter2_base"   # example
embedding_device: "cuda"                        # or "cpu"
```
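At its core, corpus search is nearest-neighbour lookup over embedding vectors. A dependency-free sketch of the ranking step (in practice the vectors would come from SPECTER and live in a FAISS index; the toy 3-d vectors here are placeholders):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def search_corpus(query_vec, corpus, top_k=2):
    """Rank papers by cosine similarity of their (precomputed) embeddings."""
    scored = [(cosine(query_vec, vec), pid) for pid, vec in corpus.items()]
    scored.sort(reverse=True)
    return [pid for _, pid in scored[:top_k]]

# Toy 3-d "embeddings" standing in for SPECTER outputs.
corpus = {
    "paper_a": [1.0, 0.0, 0.0],
    "paper_b": [0.9, 0.1, 0.0],
    "paper_c": [0.0, 1.0, 0.0],
}
hits = search_corpus([1.0, 0.05, 0.0], corpus)
```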
----------
### 5. Tool Server & Sandbox
#### 5.1 FastAPI Tool Server
`tools/server.py`:
- Endpoint examples:
- `POST /tools/python.run`
- Input: `{ "code": "...", "timeout_s": 5 }`
- Output: `{ "stdout": "...", "stderr": "...", "artifacts": [] }`
- `POST /tools/papers.search`
- Input: `{ "query": "...", "top_k": 10 }`
- Output: `[ { "title": "...", "abstract": "...", "doi": "...", "year": 2020 } ]`
- `POST /tools/papers.fetch`
- Input: `{ "doi": "10.XXXX/..." }`
- Output: `{ "title": "...", "abstract": "...", "bibtex": "...", ... }`
- `POST /tools/papers.search_corpus` (optional, embedding-based)
- Input: `{ "query": "...", "top_k": 20 }`
- Output: `[ { "paper_id": "...", "title": "...", "abstract": "...", "score": 0.87 } ]`
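Behind these endpoints sits a simple name→handler dispatch. A framework-free sketch of the pattern (`server.py` would wrap each handler in a FastAPI route; the registry and handler bodies here are assumptions):

```python
TOOLS = {}

def tool(name):
    """Register a handler under a tool name (e.g. 'papers.search')."""
    def register(fn):
        TOOLS[name] = fn
        return fn
    return register

@tool("papers.search")
def papers_search(query: str, top_k: int = 10):
    # Real implementation would hit arXiv/PubMed or the local index.
    return [{"title": f"stub result for {query!r}", "year": 2020}][:top_k]

def dispatch(name, args):
    """Route a ToolCall to its handler; unknown tools become error results."""
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    return TOOLS[name](**args)

result = dispatch("papers.search", {"query": "protein folding", "top_k": 1})
```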
#### 5.2 Python Sandbox
`tools/python_sandbox.py` handles:
- Execution in a restricted namespace:
- `numpy`, `scipy`, `pandas`, `matplotlib` available
- optional domain libs: `sympy`, `rdkit`, `ase`, simple PDE solvers
- Constraints:
- time limit (e.g. 5–10 seconds)
- memory limit (via resource module)
- no file system access outside a temp dir
- no network
- Returns:
- stdout / stderr
- optional artifact paths (e.g. plots in `/tmp/artifacts`)
This gives the agent a **safe-ish** playground for:
- simple chemistry calcs
- ODE/PDE toy simulations
- statistical summaries
- plotting
(Domain-heavy engines can be added as specialized tools later.)
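A minimal version of the sandbox runner, using a fresh subprocess with a wall-clock timeout (the memory limits and temp-dir isolation listed above are omitted for brevity; on POSIX they could be added via the `resource` module in a `preexec_fn`):

```python
import subprocess
import sys

def run_sandboxed(code: str, timeout_s: float = 5.0):
    """Execute `code` in a fresh Python process and capture stdout/stderr."""
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
        return {"stdout": proc.stdout, "stderr": proc.stderr, "artifacts": []}
    except subprocess.TimeoutExpired:
        return {"stdout": "", "stderr": f"timed out after {timeout_s}s", "artifacts": []}

out = run_sandboxed("print(sum(range(10)))")
```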
----------
### 6. Agent Controller
`agent/controller.py` implements the core loop:
1. Initialize messages with:
- system prompt (scientific assistant, tool protocol)
- user prompt
2. Call `client_llm.generate(messages)`
3. Parse output:
- If it contains a ToolCall block → parse JSON → dispatch via `tool_client.py`
- Append a `tool` message with the tool result
4. Repeat until a Final block is produced
5. Return final JSON + pretty-render (for UI)
Design goals:
- Keep controller **stateless** and **minimal**
- Use a small set of message roles: `system`, `user`, `assistant`, `tool`
- Make it trivial to plug in a different LLM backend
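With stubs in place of the real model and tool server, the loop above fits in a few lines. A sketch (the `generate`/`call_tool` callables stand in for `client_llm.generate` and `tool_client`; the reply shape is an assumption):

```python
def run_agent(user_prompt, generate, call_tool, max_steps=8):
    """Minimal agent loop: LLM -> tool call -> tool result -> LLM, until final."""
    messages = [
        {"role": "system", "content": "You are a scientific assistant."},
        {"role": "user", "content": user_prompt},
    ]
    for _ in range(max_steps):
        reply = generate(messages)  # dict with either "tool_call" or "final"
        messages.append({"role": "assistant", "content": reply})
        if "final" in reply:
            return reply["final"]
        result = call_tool(reply["tool_call"])
        messages.append({"role": "tool", "content": result})
    return None  # safety valve: give up after max_steps

# Stub LLM: call one tool, then answer from the tool result.
def fake_generate(messages):
    if messages[-1]["role"] == "tool":
        return {"final": "4"}
    return {"tool_call": {"tool": "python.run", "args": {"code": "print(2+2)"}}}

answer = run_agent("What is 2+2?", fake_generate, lambda call: {"stdout": "4\n"})
```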
----------
### 7. Web UI
`webui/app.py`:
- Provides a local web interface:
- text area for prompt
- dropdown for “mode” (e.g. “Explain paper”, “Design experiment”, “Run simulation”)
- button to run agent
- area to show:
- final answer
- optional tool trace (expandable)
- Implementation options:
- **Gradio**: fastest way to get a web UI
- **Streamlit**: also easy, nice for scientists
- Or a simple HTML/JS frontend served via FastAPI
This is _local-only_ by default.
----------
### 8. Docker & GPU Acceleration
#### 8.1 Dockerfile
`docker/Dockerfile` (conceptual spec):
- Base image: `nvidia/cuda:12.x-cudnn-runtime-ubuntu20.04`
- Install:
- Python 3.10+
- `pip`, `uv` or `conda` (your call)
- `torch` + CUDA
- `transformers`, `vllm` (optional)
- `fastapi`, `uvicorn`
- `sentence-transformers` or `specter` deps
- `numpy`, `scipy`, `pandas`, `matplotlib`
- any light scientific deps you want in v1
- Copy repo
- `pip install -r requirements.txt`
- Default `CMD`:
- either start tool server OR start web UI
- Docker Compose can spin up both
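A hedged sketch of what `docker/Dockerfile` could look like (the image tag and package set are illustrative assumptions; check current NVIDIA/PyTorch compatibility before pinning anything):

```dockerfile
# Illustrative only -- tags and versions are assumptions, not tested pins.
FROM nvidia/cuda:12.2.0-cudnn8-runtime-ubuntu22.04

RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /app
COPY requirements.txt .
RUN pip3 install --no-cache-dir -r requirements.txt

COPY . .

# 8000: tool server, 7860: web UI
EXPOSE 8000 7860

# Default: start the tool server; override CMD (or use docker-compose) for the UI.
CMD ["uvicorn", "tools.server:app", "--host", "0.0.0.0", "--port", "8000"]
```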
#### 8.2 Usage
Example:
```bash
docker build -t nexa-sci-agent-kit -f docker/Dockerfile .
docker run --gpus all -p 8000:8000 -p 7860:7860 nexa-sci-agent-kit
```
This should bring up:
- tool server on port 8000
- web UI on port 7860
----------
### 9. Reusability / Template Design
The kit is meant to be cloned as:
- `nexa-sci-agent-kit` → scientific agent
- `nexa-swe-agent-kit` → SWE/debugging agent
- etc.
To create a new agent kit, you:
- swap `model_repo` in `config.yaml`
- swap or extend tools in `tools/server.py`
- adjust system prompt in `agent/client_llm.py`
- optionally adjust UI text
Everything else stays the same.
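For example, retargeting the kit as an SWE agent might start with nothing more than a config change (the repo and tool names below are hypothetical):

```yaml
# agent/config.yaml for a hypothetical SWE variant
model_repo: "darkstar/nexa-swe-10b"   # hypothetical HF repo
backend: "vllm"
# tools/server.py would register e.g. git.diff / tests.run instead of papers.*
```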