1. Purpose
The NexaSci Agent Kit is a self-contained, local-first agent stack built around:
NexaSci Assistant — a 10B post-trained scientific reasoning model
SPECTER (or similar) — a scientific paper embedding model
Tool Server — FastAPI-based tool-calling backend
Sandbox Environment — controlled Python execution + scientific libraries
Simple Web UI — local interface for interactive use
The kit is designed to:
Let technical users run the full scientific agent locally (on their own GPU)
Provide a reusable template for future agents (e.g., SWE, bio, materials)
Integrate reasoning, retrieval, code, and scientific tools in one place
Avoid any requirement for hosted services / managed SaaS
2. High-Level Architecture
Components:
LLM: NexaSci Assistant
10B model
Post-trained for:
tool calling (JSON ToolCall / ToolResult protocol)
structured scientific outputs (hypothesis, methodology, limitations, etc.)
paper usage + citations
self-assessment (“I’m not sure → call tools”)
Embedding Model: SPECTER (or similar)
Scientific document embedding model
Used to:
embed paper abstracts / sections
perform semantic search over a local corpus
support similarity queries for the agent
Runs on CPU or GPU (optional acceleration)
Tool Server (FastAPI)
Exposes tools to NexaSci:
python.run: sandboxed Python executor
papers.search: query external APIs or local index
papers.fetch: get metadata/abstracts
papers.search_corpus: query SPECTER-based local corpus (optional)
Can be extended with:
chemistry engines (e.g., RDKit-ish workflows)
PDE solvers (e.g., Fenics-like wrappers)
quantum simulation stubs
Agent Controller
Orchestrates the agent loop:
send user prompt + history to LLM
parse tool calls
call tool server
feed back results
stop on a final message
Stateless, minimal, and reusable across agents
Web UI
Lightweight, local-only UI
Provides:
input box
streaming output
optional view of tool traces
Built with something simple (e.g. FastAPI + HTML/JS, or Gradio/Streamlit)
3. Repository Layout
Proposed repo structure:
```
nexa-sci-agent-kit/
├── SPEC.md
├── README.md
├── docker/
│   ├── Dockerfile            # GPU-accelerated base image
│   └── docker-compose.yml    # optional, for combined agent+tools+ui
├── agent/
│   ├── controller.py         # agent loop (LLM ↔ tools)
│   ├── client_llm.py         # NexaSci loading + chat interface (transformers/vLLM)
│   ├── tool_client.py        # HTTP client for FastAPI tools
│   └── config.yaml           # model + server config (ports, endpoints, HF repo)
├── tools/
│   ├── server.py             # FastAPI app exposing tools
│   ├── schemas.py            # Pydantic models for ToolCall/ToolResult
│   ├── python_sandbox.py     # sandboxing helpers
│   └── paper_sources/
│       ├── arxiv_client.py
│       ├── pubmed_client.py
│       └── corpus_search.py  # SPECTER-based local search
├── webui/
│   ├── app.py                # minimal web server (can be Gradio/Streamlit/FastAPI)
│   ├── static/               # JS/CSS assets (if needed)
│   └── templates/            # optional HTML templates
├── examples/
│   ├── run_local_agent.py    # CLI demo (no UI)
│   └── sample_prompts.md     # curated example prompts
├── scripts/
│   ├── download_models.py    # pull NexaSci + SPECTER weights
│   ├── init_corpus.py        # optional: build local paper index
│   └── install.sh            # convenience installer
└── requirements.txt
```
This layout is reusable: swap client_llm.py + tools, and you have a SWE agent kit.
4. Models
4.1 NexaSci Assistant (LLM)
Weights: hosted on Hugging Face (e.g. darkstar/nexa-sci-10b)
Form: merged distilled + tool-calling QLoRA
Capabilities:
Hypothesis + methodology generation
Tool calling (Python, paper search)
Structured JSON final reports
Uncertainty detection → calls tools when unsure
Load options:
Transformers (AutoModelForCausalLM) for simplicity
vLLM for GPU-accelerated inference with long contexts / parallel requests
Config in agent/config.yaml:
```yaml
model_repo: "darkstar/nexa-sci-10b"
backend: "vllm"        # or "transformers"
max_tokens: 1024
temperature: 0.3
top_p: 0.9
tool_prefix: "~~~toolcall"
tool_suffix: "~~~"
final_prefix: "~~~final"
final_suffix: "~~~"
```
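A minimal parser for these markers might look like the following sketch. The marker constants mirror the tool_prefix/tool_suffix and final_prefix/final_suffix values from config.yaml; the function names are illustrative, not part of the kit's documented API:

```python
import json

# Markers mirror tool_prefix/tool_suffix and final_prefix/final_suffix
# from agent/config.yaml.
TOOL_PREFIX, TOOL_SUFFIX = "~~~toolcall", "~~~"
FINAL_PREFIX, FINAL_SUFFIX = "~~~final", "~~~"

def extract_block(text: str, prefix: str, suffix: str):
    """Return the JSON payload between prefix and the next suffix, or None."""
    start = text.find(prefix)
    if start == -1:
        return None
    body_start = start + len(prefix)
    end = text.find(suffix, body_start)
    if end == -1:
        return None
    return json.loads(text[body_start:end].strip())

def parse_model_output(text: str):
    """Classify a completion as a tool call, a final answer, or plain text."""
    tool = extract_block(text, TOOL_PREFIX, TOOL_SUFFIX)
    if tool is not None:
        return ("toolcall", tool)
    final = extract_block(text, FINAL_PREFIX, FINAL_SUFFIX)
    if final is not None:
        return ("final", final)
    return ("text", text)
```

Checking for the tool prefix first works because "~~~toolcall" never matches inside a "~~~final" block.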
4.2 Embedding Model (SPECTER or similar)
Weights: e.g. a SPECTER HF repo
Use:
embed titles/abstracts/sections
populate FAISS / similar index
support the papers.search_corpus tool
Config in agent/config.yaml:
```yaml
embedding_model_repo: "allenai/specter2_base"  # example
embedding_device: "cuda"                       # or "cpu"
```
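Once abstracts are embedded, the ranking behind papers.search_corpus reduces to cosine similarity against the query vector. A dependency-free sketch, assuming precomputed embedding vectors and a hypothetical corpus record schema (`paper_id`, `title`, `abstract`, `embedding`):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def search_corpus(query_vec, corpus, top_k=20):
    """Rank papers by similarity of their precomputed embedding to the query.

    `corpus` is a list of dicts with 'paper_id', 'title', 'abstract',
    and 'embedding' keys (hypothetical schema).
    """
    scored = [
        {"paper_id": p["paper_id"], "title": p["title"],
         "abstract": p["abstract"], "score": cosine(query_vec, p["embedding"])}
        for p in corpus
    ]
    scored.sort(key=lambda r: r["score"], reverse=True)
    return scored[:top_k]
```

In practice the loop over the corpus would be replaced by a FAISS (or similar) index lookup; the scoring and result shape stay the same.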
5. Tool Server & Sandbox
5.1 FastAPI Tool Server
tools/server.py:
Endpoint examples:
POST /tools/python.run
Input: `{ "code": "...", "timeout_s": 5 }`
Output: `{ "stdout": "...", "stderr": "...", "artifacts": [] }`

POST /tools/papers.search
Input: `{ "query": "...", "top_k": 10 }`
Output: `[ { "title": "...", "abstract": "...", "doi": "...", "year": 2020 } ]`

POST /tools/papers.fetch
Input: `{ "doi": "10.XXXX/..." }`
Output: `{ "title": "...", "abstract": "...", "bibtex": "...", ... }`

POST /tools/papers.search_corpus (optional, embedding-based)
Input: `{ "query": "...", "top_k": 20 }`
Output: `[ { "paper_id": "...", "title": "...", "abstract": "...", "score": 0.87 } ]`
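Behind these routes sits a simple name-to-handler dispatch. A stdlib-only sketch with placeholder handler bodies (the real ones would call into the sandbox and the paper clients; the registry shape is an assumption, not the kit's actual schemas.py):

```python
# A minimal dispatch table that a FastAPI app could route into; the
# handler bodies here are placeholders, not real implementations.

def run_python(args: dict) -> dict:
    # Real version: hand off to the sandbox in tools/python_sandbox.py.
    return {"stdout": "", "stderr": "", "artifacts": []}

def search_papers(args: dict) -> dict:
    # Real version: query arXiv/PubMed or the local index.
    return {"results": [], "query": args.get("query", "")}

TOOL_REGISTRY = {
    "python.run": run_python,
    "papers.search": search_papers,
}

def dispatch(tool_name: str, args: dict) -> dict:
    """Look up a tool by name and invoke it; error payload on unknown tools."""
    handler = TOOL_REGISTRY.get(tool_name)
    if handler is None:
        return {"error": f"unknown tool: {tool_name}"}
    return handler(args)
```

Keeping dispatch separate from the HTTP layer makes the tool set easy to extend: registering a new tool is one dictionary entry plus one handler.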
5.2 Python Sandbox
tools/python_sandbox.py handles:
Execution in a restricted namespace:
numpy, scipy, pandas, matplotlib available
optional domain libs: sympy, rdkit, ase, simple PDE solvers
Constraints:
time limit (e.g. 5–10 seconds)
memory limit (via resource module)
no file system access outside a temp dir
no network
Returns:
stdout / stderr
optional artifact paths (e.g. plots in /tmp/artifacts)
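A simplified version of the executor could shell out to a fresh interpreter with a wall-clock timeout. This sketch captures stdout/stderr in the documented result shape but omits the memory limits, temp-dir confinement, and network blocking listed above; the function name is illustrative:

```python
import subprocess
import sys
import tempfile

def run_sandboxed(code: str, timeout_s: int = 5) -> dict:
    """Execute code in a fresh Python subprocess with a wall-clock timeout.

    Simplified sketch: the real sandbox would also set memory limits via
    the `resource` module and block filesystem/network access.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode
                capture_output=True, text=True,
                timeout=timeout_s, cwd=workdir,
            )
            return {"stdout": proc.stdout, "stderr": proc.stderr, "artifacts": []}
        except subprocess.TimeoutExpired:
            return {"stdout": "", "stderr": f"timeout after {timeout_s}s",
                    "artifacts": []}
```

Running in a subprocess (rather than `exec` in-process) means a runaway snippet can be killed cleanly and cannot corrupt the tool server's own state.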
This gives the agent a safe-ish playground for:
simple chemistry calcs
ODE/PDE toy simulations
statistical summaries
plotting
(Domain-heavy engines can be added as specialized tools later.)
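As an illustration of the "ODE toy simulation" use case, here is the kind of snippet the agent might submit to python.run: exponential decay integrated with explicit Euler, needing nothing beyond the standard library:

```python
# Toy snippet an agent might run in the sandbox: integrate dy/dt = -k*y
# with explicit Euler and compare against the exact solution y0*exp(-k*t).

def euler_decay(y0: float, k: float, dt: float, steps: int) -> float:
    """Explicit Euler for dy/dt = -k*y; returns y at t = steps * dt."""
    y = y0
    for _ in range(steps):
        y -= k * y * dt
    return y
```

With a small step (dt = 0.001, 1000 steps, k = 1) the result lands close to exp(-1) ≈ 0.368, the kind of sanity check the model can report back in its final answer.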
6. Agent Controller
agent/controller.py implements the core loop:
Initialize messages with:
system prompt (scientific assistant, tool protocol)
user prompt
Call client_llm.generate(messages)
Parse output:
If it contains a ToolCall block → parse JSON → dispatch via tool_client.py
Append a tool message with the tool result
Repeat until a Final block is produced
Return final JSON + pretty-render (for UI)
Design goals:
Keep controller stateless and minimal
Use a small set of message roles:
system,user,assistant,toolMake it trivial to plug in a different LLM backend
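The loop above can be sketched as one small, backend-agnostic function. `generate(messages) -> str` and `call_tool(payload) -> dict` are injected callables (hypothetical names), which is what keeps the controller stateless and lets any LLM backend or tool server plug in; the marker strings mirror the config values from section 4.1:

```python
import json

# Marker constants mirror tool_prefix / final_prefix in agent/config.yaml.
TOOL_PREFIX, FINAL_PREFIX, SUFFIX = "~~~toolcall", "~~~final", "~~~"

def run_agent(user_prompt, generate, call_tool, max_steps=8):
    """Core loop: generate → parse → dispatch tool → append result → repeat."""
    messages = [
        {"role": "system", "content": "You are a scientific assistant."},
        {"role": "user", "content": user_prompt},
    ]
    for _ in range(max_steps):
        out = generate(messages)
        messages.append({"role": "assistant", "content": out})
        if FINAL_PREFIX in out:
            body = out.split(FINAL_PREFIX, 1)[1].split(SUFFIX, 1)[0]
            return json.loads(body.strip())
        if TOOL_PREFIX in out:
            body = out.split(TOOL_PREFIX, 1)[1].split(SUFFIX, 1)[0]
            result = call_tool(json.loads(body.strip()))
            messages.append({"role": "tool", "content": json.dumps(result)})
    raise RuntimeError("agent did not finish within max_steps")
```

The `max_steps` cap is a safety valve against a model that never emits a Final block; swapping in a different backend means passing a different `generate`.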
7. Web UI
webui/app.py:
Provides a local web interface:
text area for prompt
dropdown for “mode” (e.g. “Explain paper”, “Design experiment”, “Run simulation”)
button to run agent
area to show:
final answer
optional tool trace (expandable)
Implementation options:
Gradio: fastest way to get a web UI
Streamlit: also easy, nice for scientists
Or a simple HTML/JS frontend served via FastAPI
This is local-only by default.
8. Docker & GPU Acceleration
8.1 Dockerfile
docker/Dockerfile (conceptual spec):
Base image: nvidia/cuda:12.x-cudnn-runtime-ubuntu20.04
Install:
Python 3.10+
pip, uv, or conda (your call)
torch + CUDA
transformers, vllm (optional)
fastapi, uvicorn
sentence-transformers or specter deps
numpy, scipy, pandas, matplotlib
any light scientific deps you want in v1
Copy repo
pip install -r requirements.txt
Default CMD: either start tool server OR start web UI
Docker Compose can spin up both
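A hypothetical docker-compose.yml along these lines; the service names, commands, and GPU reservation are assumptions sketched from the repo layout, not a tested file:

```yaml
# Sketch only: service names, commands, and ports are assumptions.
services:
  tools:
    build:
      context: .
      dockerfile: docker/Dockerfile
    command: uvicorn tools.server:app --host 0.0.0.0 --port 8000
    ports: ["8000:8000"]
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
  webui:
    build:
      context: .
      dockerfile: docker/Dockerfile
    command: python webui/app.py
    ports: ["7860:7860"]
    depends_on: [tools]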
8.2 Usage
Example:
```shell
docker build -t nexa-sci-agent-kit -f docker/Dockerfile .
docker run --gpus all -p 8000:8000 -p 7860:7860 nexa-sci-agent-kit
```
This should bring up:
tool server on port 8000
web UI on port 7860
9. Reusability / Template Design
The kit is meant to be cloned as:
nexa-sci-agent-kit → scientific agent
nexa-swe-agent-kit → SWE/debugging agent
etc.
To create a new agent kit, you:
swap model_repo in config.yaml
swap or extend tools in tools/server.py
adjust system prompt in agent/client_llm.py
optionally adjust UI text
Everything else stays the same.