1. Purpose
The NexaSci Agent Kit is a self-contained, local-first agent stack built around:
NexaSci Assistant — a 10B post-trained scientific reasoning model
SPECTER (or similar) — a scientific paper embedding model
Tool Server — FastAPI-based tool-calling backend
Sandbox Environment — controlled Python execution + scientific libraries
Simple Web UI — local interface for interactive use
The kit is designed to:
Let technical users run the full scientific agent locally (on their own GPU)
Provide a reusable template for future agents (e.g., SWE, bio, materials)
Integrate reasoning, retrieval, code, and scientific tools in one place
Avoid any requirement for hosted services / managed SaaS
2. High-Level Architecture
Components:
LLM: NexaSci Assistant
10B model
Post-trained for:
tool calling (JSON ToolCall / ToolResult protocol)
structured scientific outputs (hypothesis, methodology, limitations, etc.)
paper usage + citations
self-assessment (“I’m not sure → call tools”)
Embedding Model: SPECTER (or similar)
Scientific document embedding model
Used to:
embed paper abstracts / sections
perform semantic search over a local corpus
support similarity queries for the agent
Runs on CPU or GPU (optional acceleration)
Tool Server (FastAPI)
Exposes tools to NexaSci:
python.run: sandboxed Python executor
papers.search: query external APIs or local index
papers.fetch: get metadata/abstracts
papers.search_corpus: query SPECTER-based local corpus (optional)
Can be extended with:
chemistry engines (e.g., RDKit-ish workflows)
PDE solvers (e.g., Fenics-like wrappers)
quantum simulation stubs
Agent Controller
Orchestrates the agent loop:
send user prompt + history to LLM
parse tool calls
call tool server
feed back results
stop on a final message
Stateless, minimal, and reusable across agents
Web UI
Lightweight, local-only UI
Provides:
input box
streaming output
optional view of tool traces
Built with something simple (e.g. FastAPI + HTML/JS, or Gradio/Streamlit)
3. Repository Layout
Proposed repo structure:
```
nexa-sci-agent-kit/
├── SPEC.md
├── README.md
├── docker/
│   ├── Dockerfile            # GPU-accelerated base image
│   └── docker-compose.yml    # optional, for combined agent+tools+ui
├── agent/
│   ├── controller.py         # agent loop (LLM ↔ tools)
│   ├── client_llm.py         # NexaSci loading + chat interface (transformers/vLLM)
│   ├── tool_client.py        # HTTP client for FastAPI tools
│   └── config.yaml           # model + server config (ports, endpoints, HF repo)
├── tools/
│   ├── server.py             # FastAPI app exposing tools
│   ├── schemas.py            # Pydantic models for ToolCall/ToolResult
│   ├── python_sandbox.py     # sandboxing helpers
│   └── paper_sources/
│       ├── arxiv_client.py
│       ├── pubmed_client.py
│       └── corpus_search.py  # SPECTER-based local search
├── webui/
│   ├── app.py                # minimal web server (can be Gradio/Streamlit/FastAPI)
│   ├── static/               # JS/CSS assets (if needed)
│   └── templates/            # optional HTML templates
├── examples/
│   ├── run_local_agent.py    # CLI demo (no UI)
│   └── sample_prompts.md     # curated example prompts
├── scripts/
│   ├── download_models.py    # pull NexaSci + SPECTER weights
│   ├── init_corpus.py        # optional: build local paper index
│   └── install.sh            # convenience installer
└── requirements.txt
```
This layout is reusable: swap client_llm.py + tools, and you have a SWE agent kit.
4. Models
4.1 NexaSci Assistant (LLM)
Weights: hosted on Hugging Face (e.g. darkstar/nexa-sci-10b)
Form: merged distilled + tool-calling QLoRA
Capabilities:
Hypothesis + methodology generation
Tool calling (Python, paper search)
Structured JSON final reports
Uncertainty detection → calls tools when unsure
Load options:
Transformers (AutoModelForCausalLM) for simplicity
vLLM for GPU-accelerated inference with long contexts / parallel requests
Config in agent/config.yaml:
```yaml
model_repo: "darkstar/nexa-sci-10b"
backend: "vllm"        # or "transformers"
max_tokens: 1024
temperature: 0.3
top_p: 0.9
tool_prefix: "~~~toolcall"
tool_suffix: "~~~"
final_prefix: "~~~final"
final_suffix: "~~~"
```
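A minimal parser for these markers might look like the following sketch. The marker constants mirror the tool_prefix/tool_suffix and final_prefix/final_suffix values from config.yaml; the function names are illustrative, not part of the kit's documented API:

```python
import json

# Markers mirror tool_prefix/tool_suffix and final_prefix/final_suffix
# from agent/config.yaml.
TOOL_PREFIX, TOOL_SUFFIX = "~~~toolcall", "~~~"
FINAL_PREFIX, FINAL_SUFFIX = "~~~final", "~~~"

def extract_block(text: str, prefix: str, suffix: str):
    """Return the JSON payload between prefix and the next suffix, or None."""
    start = text.find(prefix)
    if start == -1:
        return None
    body_start = start + len(prefix)
    end = text.find(suffix, body_start)
    if end == -1:
        return None
    return json.loads(text[body_start:end].strip())

def parse_model_output(text: str):
    """Classify a completion as a tool call, a final answer, or plain text."""
    tool = extract_block(text, TOOL_PREFIX, TOOL_SUFFIX)
    if tool is not None:
        return ("toolcall", tool)
    final = extract_block(text, FINAL_PREFIX, FINAL_SUFFIX)
    if final is not None:
        return ("final", final)
    return ("text", text)
```

Checking for the tool prefix first works because "~~~toolcall" never matches inside a "~~~final" block.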
4.2 Embedding Model (SPECTER or similar)
Weights: e.g. a SPECTER HF repo
Use:
embed titles/abstracts/sections
populate FAISS / similar index
support the papers.search_corpus tool
Config in agent/config.yaml:
```yaml
embedding_model_repo: "allenai/specter2_base"  # example
embedding_device: "cuda"                       # or "cpu"
```
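Once abstracts are embedded, the ranking behind papers.search_corpus reduces to cosine similarity against the query vector. A dependency-free sketch, assuming precomputed embedding vectors and a hypothetical corpus record schema (`paper_id`, `title`, `abstract`, `embedding`):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search_corpus(query_vec, corpus, top_k=20):
    """Rank papers by similarity of their precomputed embedding to the query.

    `corpus` is a list of dicts with 'paper_id', 'title', 'abstract',
    and 'embedding' keys (hypothetical schema).
    """
    scored = [
        {"paper_id": p["paper_id"], "title": p["title"],
         "abstract": p["abstract"], "score": cosine(query_vec, p["embedding"])}
        for p in corpus
    ]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]
```

In practice the loop over the corpus would be replaced by a FAISS (or similar) index lookup; the scoring and result shape stay the same.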
5. Tool Server & Sandbox
5.1 FastAPI Tool Server
tools/server.py:
Endpoint examples:
POST /tools/python.run
Input: `{ "code": "...", "timeout_s": 5 }`
Output: `{ "stdout": "...", "stderr": "...", "artifacts": [] }`

POST /tools/papers.search
Input: `{ "query": "...", "top_k": 10 }`
Output: `[ { "title": "...", "abstract": "...", "doi": "...", "year": 2020 } ]`

POST /tools/papers.fetch
Input: `{ "doi": "10.XXXX/..." }`
Output: `{ "title": "...", "abstract": "...", "bibtex": "...", ... }`

POST /tools/papers.search_corpus (optional, embedding-based)
Input: `{ "query": "...", "top_k": 20 }`
Output: `[ { "paper_id": "...", "title": "...", "abstract": "...", "score": 0.87 } ]`
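Behind these routes sits a simple name-to-handler dispatch. A stdlib-only sketch with placeholder handler bodies (the real ones would call into the sandbox and the paper clients; the registry shape is an assumption, not the kit's actual schemas.py):

```python
# A minimal dispatch table that a FastAPI app could route into; the
# handler bodies here are placeholders, not real implementations.

def run_python(args: dict) -> dict:
    # Real version: hand off to the sandbox in tools/python_sandbox.py.
    return {"stdout": "", "stderr": "", "artifacts": []}

def search_papers(args: dict) -> dict:
    # Real version: query arXiv/PubMed or the local index.
    return {"results": [], "query": args.get("query", "")}

TOOL_REGISTRY = {
    "python.run": run_python,
    "papers.search": search_papers,
}

def dispatch(tool_name: str, args: dict) -> dict:
    """Look up a tool by name and invoke it; error payload on unknown tools."""
    handler = TOOL_REGISTRY.get(tool_name)
    if handler is None:
        return {"error": f"unknown tool: {tool_name}"}
    return handler(args)
```

Keeping dispatch separate from the HTTP layer makes the tool set easy to extend: registering a new tool is one dictionary entry plus one handler.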
5.2 Python Sandbox
tools/python_sandbox.py handles:
Execution in a restricted namespace:
numpy, scipy, pandas, matplotlib available
optional domain libs: sympy, rdkit, ase, simple PDE solvers
Constraints:
time limit (e.g. 5–10 seconds)
memory limit (via resource module)
no file system access outside a temp dir
no network
Returns:
stdout / stderr
optional artifact paths (e.g. plots in /tmp/artifacts)
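A simplified version of the executor could shell out to a fresh interpreter with a wall-clock timeout. This sketch captures stdout/stderr in the documented result shape but omits the memory limits, temp-dir confinement, and network blocking listed above; the function name is illustrative:

```python
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout_s: int = 5) -> dict:
    """Execute code in a fresh Python subprocess with a wall-clock timeout.

    Simplified sketch: the real sandbox would also set memory limits via
    the `resource` module and block filesystem/network access.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                capture_output=True, text=True,
                timeout=timeout_s, cwd=workdir,
            )
            return {"stdout": proc.stdout, "stderr": proc.stderr, "artifacts": []}
        except subprocess.TimeoutExpired:
            return {"stdout": "", "stderr": f"timeout after {timeout_s}s",
                    "artifacts": []}
```

Running in a subprocess (rather than `exec` in-process) means a runaway snippet can be killed cleanly and cannot corrupt the tool server's own state.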
This gives the agent a safe-ish playground for:
simple chemistry calcs
ODE/PDE toy simulations
statistical summaries
plotting
(Domain-heavy engines can be added as specialized tools later.)
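As an illustration of the "ODE toy simulation" use case, here is the kind of snippet the agent might submit to python.run: exponential decay integrated with explicit Euler, needing nothing beyond the standard library:

```python
# Toy snippet an agent might run in the sandbox: integrate dy/dt = -k*y
# with explicit Euler and compare against the exact solution y0*exp(-k*t).

def euler_decay(y0: float, k: float, dt: float, steps: int) -> float:
    """Explicit Euler for dy/dt = -k*y; returns y at t = steps * dt."""
    y = y0
    for _ in range(steps):
        y -= k * y * dt
    return y
```

With a small step (dt = 0.001, 1000 steps, k = 1) the result lands close to exp(-1) ≈ 0.368, the kind of sanity check the model can report back in its final answer.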
6. Agent Controller
agent/controller.py implements the core loop:
Initialize messages with:
system prompt (scientific assistant, tool protocol)
user prompt
Call client_llm.generate(messages)
Parse output:
If it contains a ToolCall block → parse JSON → dispatch via tool_client.py
Append a tool message with the tool result
Repeat until a Final block is produced
Return final JSON + pretty-render (for UI)
Design goals:
Keep controller stateless and minimal
Use a small set of message roles:
system,user,assistant,toolMake it trivial to plug in a different LLM backend
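The loop above can be sketched as one small, backend-agnostic function. `generate(messages) -> str` and `call_tool(payload) -> dict` are injected callables (hypothetical names), which is what keeps the controller stateless and lets any LLM backend or tool server plug in; the marker strings mirror the config values from section 4.1:

```python
import json

# Marker constants mirror tool_prefix / final_prefix in agent/config.yaml.
TOOL_PREFIX, FINAL_PREFIX, SUFFIX = "~~~toolcall", "~~~final", "~~~"

def run_agent(user_prompt, generate, call_tool, max_steps=8):
    """Core loop: generate → parse → dispatch tool → append result → repeat."""
    messages = [
        {"role": "system", "content": "You are a scientific assistant."},
        {"role": "user", "content": user_prompt},
    ]
    for _ in range(max_steps):
        out = generate(messages)
        messages.append({"role": "assistant", "content": out})
        if FINAL_PREFIX in out:
            body = out.split(FINAL_PREFIX, 1)[1].split(SUFFIX, 1)[0]
            return json.loads(body.strip())
        if TOOL_PREFIX in out:
            body = out.split(TOOL_PREFIX, 1)[1].split(SUFFIX, 1)[0]
            result = call_tool(json.loads(body.strip()))
            messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` cap is a safety valve against a model that never emits a Final block; swapping in a different backend means passing a different `generate`.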
7. Web UI
webui/app.py:
Provides a local web interface:
text area for prompt
dropdown for “mode” (e.g. “Explain paper”, “Design experiment”, “Run simulation”)
button to run agent
area to show:
final answer
optional tool trace (expandable)
Implementation options:
Gradio: fastest way to get a web UI
Streamlit: also easy, nice for scientists
Or a simple HTML/JS frontend served via FastAPI
This is local-only by default.
8. Docker & GPU Acceleration
8.1 Dockerfile
docker/Dockerfile (conceptual spec):
Base image: nvidia/cuda:12.x-cudnn-runtime-ubuntu20.04
Install:
Python 3.10+
pip, uv, or conda (your call)
torch + CUDA
transformers, vllm (optional)
fastapi, uvicorn
sentence-transformers or specter deps
numpy, scipy, pandas, matplotlib
any light scientific deps you want in v1
Copy repo
pip install -r requirements.txt
Default CMD: either start tool server OR start web UI
Docker Compose can spin up both
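A hypothetical docker-compose.yml along these lines; the service names, commands, and GPU reservation are assumptions sketched from the repo layout, not a tested file:

```yaml
# Sketch only: service names, commands, and ports are assumptions.
services:
  tools:
    build:
      context: .
      dockerfile: docker/Dockerfile
    command: uvicorn tools.server:app --host 0.0.0.0 --port 8000
    ports: ["8000:8000"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  webui:
    build:
      context: .
      dockerfile: docker/Dockerfile
    command: python webui/app.py
    ports: ["7860:7860"]
    depends_on: [tools]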
8.2 Usage
Example:
```shell
docker build -t nexa-sci-agent-kit -f docker/Dockerfile .
docker run --gpus all -p 8000:8000 -p 7860:7860 nexa-sci-agent-kit
```
This should bring up:
tool server on port 8000
web UI on port 7860
9. Reusability / Template Design
The kit is meant to be cloned as:
nexa-sci-agent-kit → scientific agent
nexa-swe-agent-kit → SWE/debugging agent
etc.
To create a new agent kit, you:
swap model_repo in config.yaml
swap or extend tools in tools/server.py
adjust system prompt in agent/client_llm.py
optionally adjust UI text
Everything else stays the same.