### 1. Purpose

The **NexaSci Agent Kit** is a self-contained, local-first agent stack built around:

- **NexaSci Assistant** — a 10B post-trained scientific reasoning model
- **SPECTER (or similar)** — a scientific paper embedding model
- **Tool Server** — FastAPI-based tool-calling backend
- **Sandbox Environment** — controlled Python execution + scientific libraries
- **Simple Web UI** — local interface for interactive use

The kit is designed to:

- Let technical users run the **full scientific agent locally** (on their own GPU)
- Provide a **reusable template** for future agents (e.g., SWE, bio, materials)
- Integrate **reasoning, retrieval, code, and scientific tools** in one place
- Avoid any requirement for hosted services / managed SaaS

----------
### 2. High-Level Architecture

**Components:**

1. **LLM: NexaSci Assistant**
   - 10B model
   - Post-trained for:
     - tool calling (JSON ToolCall / ToolResult protocol)
     - structured scientific outputs (hypothesis, methodology, limitations, etc.)
     - paper usage + citations
     - self-assessment (“I’m not sure → call tools”)
2. **Embedding Model: SPECTER (or similar)**
   - Scientific document embedding model
   - Used to:
     - embed paper abstracts / sections
     - perform semantic search over a local corpus
     - support similarity queries for the agent
   - Runs on CPU or GPU (optional acceleration)
3. **Tool Server (FastAPI)**
   - Exposes tools to NexaSci:
     - `python.run`: sandboxed Python executor
     - `papers.search`: query external APIs or a local index
     - `papers.fetch`: get metadata/abstracts
     - `papers.search_corpus`: query the SPECTER-based local corpus (optional)
   - Can be extended with:
     - chemistry engines (e.g., RDKit-style workflows)
     - PDE solvers (e.g., FEniCS-like wrappers)
     - quantum simulation stubs
4. **Agent Controller**
   - Orchestrates the agent loop:
     - send user prompt + history to the LLM
     - parse tool calls
     - call the tool server
     - feed results back
     - stop on a `final` message
   - Stateless, minimal, and reusable across agents
5. **Web UI**
   - Lightweight, local-only UI
   - Provides:
     - input box
     - streaming output
     - optional view of tool traces
   - Built with something simple (e.g. FastAPI + HTML/JS, or Gradio/Streamlit)
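The ToolCall protocol is delimiter-based: the model wraps a JSON payload in `~~~toolcall … ~~~` or `~~~final … ~~~` blocks, with the delimiters configured in `agent/config.yaml`. A minimal parsing sketch; note that the `"tool"` and `"args"` field names are illustrative assumptions, not fixed by this spec:

```python
import json
import re

# Delimiters mirror tool_prefix/tool_suffix in agent/config.yaml.
# Non-greedy match; assumes the JSON payload never contains the closing "~~~".
TOOLCALL_RE = re.compile(r"~~~toolcall\s*(\{.*?\})\s*~~~", re.DOTALL)
FINAL_RE = re.compile(r"~~~final\s*(\{.*?\})\s*~~~", re.DOTALL)

def parse_model_output(text: str):
    """Classify one LLM turn: ("toolcall", dict), ("final", dict), or ("text", raw)."""
    if (m := TOOLCALL_RE.search(text)):
        return "toolcall", json.loads(m.group(1))
    if (m := FINAL_RE.search(text)):
        return "final", json.loads(m.group(1))
    return "text", text

# Example model output containing a tool call:
out = '~~~toolcall {"tool": "python.run", "args": {"code": "print(1+1)"}} ~~~'
kind, payload = parse_model_output(out)
```

A production parser would also validate the payload against the `schemas.py` Pydantic models before dispatching.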
----------

### 3. Repository Layout

Proposed repo structure:

```
nexa-sci-agent-kit/
  SPEC.md
  README.md
  docker/
    Dockerfile            # GPU-accelerated base image
    docker-compose.yml    # optional, for combined agent+tools+ui
  agent/
    controller.py         # agent loop (LLM ↔ tools)
    client_llm.py         # NexaSci loading + chat interface (transformers/vLLM)
    tool_client.py        # HTTP client for FastAPI tools
    config.yaml           # model + server config (ports, endpoints, HF repo)
  tools/
    server.py             # FastAPI app exposing tools
    schemas.py            # Pydantic models for ToolCall/ToolResult
    python_sandbox.py     # sandboxing helpers
    paper_sources/
      arxiv_client.py
      pubmed_client.py
      corpus_search.py    # SPECTER-based local search
  webui/
    app.py                # minimal web server (can be Gradio/Streamlit/FastAPI)
    static/               # JS/CSS assets (if needed)
    templates/            # optional HTML templates
  examples/
    run_local_agent.py    # CLI demo (no UI)
    sample_prompts.md     # curated example prompts
  scripts/
    download_models.py    # pull NexaSci + SPECTER weights
    init_corpus.py        # optional: build local paper index
    install.sh            # convenience installer
  requirements.txt
```

This layout is **reusable**: swap `client_llm.py` + tools, and you have a SWE agent kit.

----------
### 4. Models

#### 4.1 NexaSci Assistant (LLM)

- **Weights:** hosted on Hugging Face (e.g. `darkstar/nexa-sci-10b`)
- **Form:** merged distilled + tool-calling QLoRA
- **Capabilities:**
  - Hypothesis + methodology generation
  - Tool calling (Python, paper search)
  - Structured JSON final reports
  - Uncertainty detection → calls tools when unsure

**Load options:**

- **Transformers** (`AutoModelForCausalLM`) for simplicity
- **vLLM** for GPU-accelerated inference with long contexts / parallel requests

Config in `agent/config.yaml`:

```yaml
model_repo: "darkstar/nexa-sci-10b"
backend: "vllm"        # or "transformers"
max_tokens: 1024
temperature: 0.3
top_p: 0.9
tool_prefix: "~~~toolcall"
tool_suffix: "~~~"
final_prefix: "~~~final"
final_suffix: "~~~"
```

#### 4.2 Embedding Model (SPECTER or similar)

- **Weights:** e.g. a SPECTER HF repo
- **Use:**
  - embed titles/abstracts/sections
  - populate a FAISS / similar index
  - support the `papers.search_corpus` tool

Config in `agent/config.yaml`:

```yaml
embedding_model_repo: "allenai/specter2_base"  # example
embedding_device: "cuda"                       # or "cpu"
```
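Once embeddings are computed (e.g. by `scripts/init_corpus.py`), `papers.search_corpus` reduces to nearest-neighbor search over stored vectors. A dependency-free cosine-similarity sketch, assuming the SPECTER embeddings are precomputed; a real corpus would use a FAISS index instead of a linear scan:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search_corpus(query_vec, corpus, top_k=20):
    """corpus: list of {"paper_id", "title", "embedding"} dicts.

    Returns results shaped like the papers.search_corpus tool output.
    """
    scored = [
        {"paper_id": p["paper_id"], "title": p["title"],
         "score": cosine(query_vec, p["embedding"])}
        for p in corpus
    ]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]

# Toy 2-d "embeddings" for illustration only:
corpus = [
    {"paper_id": "a", "title": "Graph neural nets", "embedding": [1.0, 0.0]},
    {"paper_id": "b", "title": "Protein folding",   "embedding": [0.0, 1.0]},
]
results = search_corpus([0.9, 0.1], corpus, top_k=1)
```

With normalized vectors, the same ranking comes from an inner-product FAISS index (`IndexFlatIP`), which is the natural drop-in for large corpora.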
----------

### 5. Tool Server & Sandbox

#### 5.1 FastAPI Tool Server

`tools/server.py`:

- Endpoint examples:
  - `POST /tools/python.run`
    - Input: `{ "code": "...", "timeout_s": 5 }`
    - Output: `{ "stdout": "...", "stderr": "...", "artifacts": [] }`
  - `POST /tools/papers.search`
    - Input: `{ "query": "...", "top_k": 10 }`
    - Output: `[ { "title": "...", "abstract": "...", "doi": "...", "year": 2020 } ]`
  - `POST /tools/papers.fetch`
    - Input: `{ "doi": "10.XXXX/..." }`
    - Output: `{ "title": "...", "abstract": "...", "bibtex": "...", ... }`
  - `POST /tools/papers.search_corpus` (optional, embedding-based)
    - Input: `{ "query": "...", "top_k": 20 }`
    - Output: `[ { "paper_id": "...", "title": "...", "abstract": "...", "score": 0.87 } ]`
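One concrete backend for `papers.search` is the public arXiv Atom API at `http://export.arxiv.org/api/query`, which `tools/paper_sources/arxiv_client.py` would wrap. A sketch of the query-URL construction only; the HTTP fetch and Atom XML parsing are left out:

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"

def build_arxiv_url(query: str, top_k: int = 10) -> str:
    """Build an arXiv API query URL for the papers.search tool.

    search_query / start / max_results are the API's documented parameters;
    "all:" searches every metadata field.
    """
    params = {
        "search_query": f"all:{query}",
        "start": 0,
        "max_results": top_k,
    }
    return f"{ARXIV_API}?{urlencode(params)}"

url = build_arxiv_url("graph neural networks", top_k=10)
```

The server endpoint would fetch this URL, parse the Atom feed, and map each entry onto the `{ "title", "abstract", "doi", "year" }` shape above.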
#### 5.2 Python Sandbox

`tools/python_sandbox.py` handles:

- Execution in a restricted namespace:
  - `numpy`, `scipy`, `pandas`, `matplotlib` available
  - optional domain libs: `sympy`, `rdkit`, `ase`, simple PDE solvers
- Constraints:
  - time limit (e.g. 5–10 seconds)
  - memory limit (via the `resource` module)
  - no file system access outside a temp dir
  - no network
- Returns:
  - stdout / stderr
  - optional artifact paths (e.g. plots in `/tmp/artifacts`)

This gives the agent a **safe-ish** playground for:

- simple chemistry calcs
- ODE/PDE toy simulations
- statistical summaries
- plotting

(Domain-heavy engines can be added as specialized tools later.)
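A minimal sketch of the executor behind `python.run`, matching the endpoint's `{ "stdout", "stderr", "artifacts" }` output shape: a fresh subprocess interpreter, a wall-clock timeout, and a throwaway working directory. This is deliberately not a full sandbox; a real deployment would add `resource`-based memory limits (via `preexec_fn` on POSIX), privilege dropping, and network isolation:

```python
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout_s: int = 5) -> dict:
    """Run untrusted code in a fresh interpreter, confined to a temp dir."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                capture_output=True,
                text=True,
                timeout=timeout_s,   # kills the child on expiry
                cwd=workdir,         # relative writes land in the temp dir
            )
            return {"stdout": proc.stdout, "stderr": proc.stderr, "artifacts": []}
        except subprocess.TimeoutExpired:
            return {"stdout": "", "stderr": f"timeout after {timeout_s}s", "artifacts": []}

result = run_sandboxed("print(1 + 1)")
```

Artifacts (e.g. saved plots) would be collected from the temp dir before it is destroyed and returned as paths in the `artifacts` list.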
----------

### 6. Agent Controller

`agent/controller.py` implements the core loop:

1. Initialize messages with:
   - system prompt (scientific assistant, tool protocol)
   - user prompt
2. Call `client_llm.generate(messages)`
3. Parse output:
   - If it contains a ToolCall block → parse JSON → dispatch via `tool_client.py`
   - Append a `tool` message with the tool result
4. Repeat until a Final block is produced
5. Return final JSON + a pretty-rendered version (for the UI)

Design goals:

- Keep the controller **stateless** and **minimal**
- Use a small set of message roles: `system`, `user`, `assistant`, `tool`
- Make it trivial to plug in a different LLM backend
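The whole loop fits in one function when the LLM backend and tool client are injected as callables, which is what makes the controller stateless and backend-agnostic. A sketch under the delimiter protocol from section 4.1, exercised here with a stubbed backend; the `"tool"`/`"args"` payload fields are illustrative assumptions:

```python
import json
import re

# Delimiters mirror agent/config.yaml.
TOOLCALL_RE = re.compile(r"~~~toolcall\s*(\{.*?\})\s*~~~", re.DOTALL)
FINAL_RE = re.compile(r"~~~final\s*(\{.*?\})\s*~~~", re.DOTALL)

def run_agent(user_prompt, generate, dispatch_tool,
              system_prompt="You are NexaSci.", max_steps=8):
    """generate(messages) -> str; dispatch_tool(call_dict) -> result_dict.

    Both are injected, so any LLM backend or tool client plugs in.
    """
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
    for _ in range(max_steps):
        output = generate(messages)
        messages.append({"role": "assistant", "content": output})
        if (m := FINAL_RE.search(output)):
            return json.loads(m.group(1))          # stop on a final block
        if (m := TOOLCALL_RE.search(output)):
            result = dispatch_tool(json.loads(m.group(1)))
            messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("agent did not produce a final answer")

# Stub backend: first emits a tool call, then a final answer.
replies = iter([
    '~~~toolcall {"tool": "python.run", "args": {"code": "print(2**10)"}} ~~~',
    '~~~final {"answer": "1024"} ~~~',
])
final = run_agent("What is 2**10?",
                  generate=lambda msgs: next(replies),
                  dispatch_tool=lambda call: {"stdout": "1024\n", "stderr": ""})
```

The `max_steps` cap is the only policy the controller imposes; everything else (retries, streaming, trace logging) layers on top without changing the loop.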
----------

### 7. Web UI

`webui/app.py`:

- Provides a local web interface:
  - text area for the prompt
  - dropdown for “mode” (e.g. “Explain paper”, “Design experiment”, “Run simulation”)
  - button to run the agent
  - area to show:
    - final answer
    - optional tool trace (expandable)
- Implementation options:
  - **Gradio**: fastest way to get a web UI
  - **Streamlit**: also easy, nice for scientists
  - Or a simple HTML/JS frontend served via FastAPI

This is _local-only_ by default.

----------
### 8. Docker & GPU Acceleration

#### 8.1 Dockerfile

`docker/Dockerfile` (conceptual spec):

- Base image: `nvidia/cuda:12.x-cudnn-runtime-ubuntu20.04`
- Install:
  - Python 3.10+
  - `pip`, `uv`, or `conda` (your call)
  - `torch` + CUDA
  - `transformers`, `vllm` (optional)
  - `fastapi`, `uvicorn`
  - `sentence-transformers` or SPECTER deps
  - `numpy`, `scipy`, `pandas`, `matplotlib`
  - any light scientific deps you want in v1
- Copy the repo
- `pip install -r requirements.txt`
- Default `CMD`:
  - either start the tool server OR start the web UI
  - Docker Compose can spin up both

#### 8.2 Usage

Example:

```bash
docker build -t nexa-sci-agent-kit -f docker/Dockerfile .
docker run --gpus all -p 8000:8000 -p 7860:7860 nexa-sci-agent-kit
```

This should bring up:

- the tool server on port 8000
- the web UI on port 7860

----------
### 9. Reusability / Template Design

The kit is meant to be cloned as:

- `nexa-sci-agent-kit` → scientific agent
- `nexa-swe-agent-kit` → SWE/debugging agent
- etc.

To create a new agent kit, you:

- swap `model_repo` in `config.yaml`
- swap or extend the tools in `tools/server.py`
- adjust the system prompt in `agent/client_llm.py`
- optionally adjust the UI text

Everything else stays the same.