---
title: ResearchHarness
emoji: 🚀
colorFrom: blue
colorTo: yellow
sdk: docker
app_port: 7860
pinned: false
license: mit
short_description: Lightweight harness for tool-using LLM agents.
---

# ResearchHarness Space Maintenance Notes

This repository is the Hugging Face Docker Space deployment for
[`ResearchHarness`](https://github.com/InternScience/ResearchHarness). It is an online
app mirror, not the public open-source documentation and not a full source mirror.

The public project README, tutorials, benchmark notes, API server documentation,
and local CLI documentation belong in the main GitHub repository. This Space
README should stay focused on long-term deployment maintenance: what is copied
from the main repo, what is intentionally changed for hosted use, and what is
new in the Space.

## Repository Relationship

| Repository | Role |
| --- | --- |
| `github.com/InternScience/ResearchHarness` | Main open-source runtime, CLI, API server, frontend, docs, tests, and benchmark adapters. |
| `huggingface.co/spaces/InternScience/ResearchHarness` | Hugging Face Space app that hosts the browser frontend with managed temporary workspaces. |
| `huggingface.co/datasets/InternScience/ResearchHarness-Data` | Hugging Face dataset receiving collected hosted-run trajectory PRs. |

Maintenance rules:

- Copy only the runtime/frontend pieces needed by the hosted app.
- Do not blindly sync the whole main repository into this Space.
- Space-only deployment logic must not be copied back into the main repo unless
  it is genuinely general-purpose.
- Public documentation should be updated in the main repo, not duplicated here.
- Treat the tables below as the sync boundary. Fully synced files may be copied
  from the main repo and diff-checked. Partially synced files must be updated
  with targeted patches only; do not overwrite them with main-repo files.

## Sync Policy

The Space should stay small and deployment-focused. When the main repository
changes, sync only the files needed by the hosted browser app, then inspect the
diff manually.

For partially synced files, a clean sync means the diff contains only the
specific hunk needed for the current bug or feature. If a full-file copy creates
large unrelated changes, restore the Space version from `HEAD` and reapply the
minimal patch.

### Fully Synced From The Main Repository

These files/directories should normally match the main repo exactly, unless a
future Space-specific need is documented here:

| Path | Purpose |
| --- | --- |
| `agent_base/base.py` | Base agent interface. |
| `agent_base/console_utils.py` | Shared console/event formatting helpers. |
| `agent_base/context_compact.py` | Context compaction logic. |
| `agent_base/model_profiles.py` | Provider/model profile helpers. |
| `agent_base/prompt.py` | Base system prompt. |
| `agent_base/prompts/system_base.md` | Shared base prompt text. |
| `agent_base/provider_compat.py` | Provider compatibility normalization. |
| `agent_base/session_state.py` | Session state serialization. |
| `agent_base/tools/*.py` | Tool implementations exposed by the Space app. |
| `agent_base/trace_utils.py` | Trace writing utilities. |
| `agent_base/utils.py` | Shared runtime utilities, including default `.env` loading. |
| `VERSION` | Version marker shown by the app/runtime when needed. |
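
One way to diff-check the fully synced files is a small comparison script run against a local checkout of the main repository. The sketch below is only an illustration: `find_drift` and the shortened `FULLY_SYNCED` list are hypothetical names, not part of the Space, and the list would need to be extended to cover every path in the table.

```python
import filecmp
from pathlib import Path

# Hypothetical subset of the fully synced paths from the table above.
FULLY_SYNCED = [
    "agent_base/base.py",
    "agent_base/utils.py",
    "VERSION",
]


def find_drift(main_repo: Path, space_repo: Path, paths=FULLY_SYNCED) -> list[str]:
    """Return relative paths that are missing in either checkout or differ in content."""
    drifted = []
    for rel in paths:
        main_file = main_repo / rel
        space_file = space_repo / rel
        if not (main_file.is_file() and space_file.is_file()):
            drifted.append(rel)
        elif not filecmp.cmp(main_file, space_file, shallow=False):
            # shallow=False forces a byte-by-byte comparison.
            drifted.append(rel)
    return drifted
```

An empty result means the listed files match; any returned path should be inspected with a normal diff before copying.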

### Partially Synced And Space-Modified

These files either derive from main-repo files and must be merged manually, or
exist only in the Space, because the hosted Space has different deployment
semantics:

| Path | Maintenance rule |
| --- | --- |
| `agent_base/react_agent.py` | Keep core ReAct/runtime behavior aligned with main. Preserve Space compatibility only when it is genuinely required by the hosted app. |
| `frontend/local_server.py` | Based on the main local frontend server, but Space-modified for managed temporary workspaces, forced `agent_workspace/` + `agent_trace/` layout, workspace zip download, automatic cleanup, trajectory collection hooks, and no arbitrary server-folder picker semantics. Never overwrite this file blindly from main. |
| `frontend/static/index.html` | Starts from the main frontend HTML, but removes the local workspace picker and adds hosted workspace download UI. |
| `frontend/static/app.js` | Starts from the main frontend client, but removes local folder selection and adds download-token / workspace-zip handling. |
| `frontend/static/app.css` | Starts from the main frontend CSS, but includes Space-only hosted workspace/download styling and omits local folder picker modal styles. |
| `requirements.txt` | Starts from the main runtime dependencies, but keeps Space-only hosted dependencies such as `huggingface_hub` and `uvicorn[standard]`. |
| `app.py` | Space-only FastAPI/Hugging Face entrypoint. It owns startup, cleanup scheduling, static mounting, and hosted defaults. |
| `check_space_runtime.py` | Space-only smoke test for deployment import/runtime sanity. |
| `Dockerfile` | Space-only Docker build. |
| `.dockerignore` | Space-only Docker context pruning. |
| `.gitattributes` | Space repository metadata. |
| `.gitignore` | Space-only generated files, cache, and temporary run ignores. |
| `README.md` | Space maintenance notes only. Public project docs belong in the main repo. |

### Out Of Scope For The Space

These main-repo areas should not be copied into this Space unless the hosted app
explicitly starts using them:

| Main-repo path | Reason |
| --- | --- |
| `pyproject.toml`, `MANIFEST.in`, `researchharness/` | PyPI packaging belongs to the main open-source repo, not the hosted app mirror. |
| `.github/` | GitHub CI/release automation does not run in the Hugging Face Space repo. |
| `run_agent.py`, `run_server.py`, `run_frontend.py` | Local CLI/API/frontend entrypoints are not how the Space is launched. |
| `api/` | OpenAI-compatible API server is not part of the Space app. |
| `benchmarks/` | Benchmark adapters and benchmark docs belong to the main repo. |
| `docs/` | Long-form tutorials belong to the main repo. |
| `tests/` | Main local/CI tests belong to the main repo; Space keeps only focused smoke checks. |
| `.env.example` | Public environment template belongs to the main repo. |
| `agent_base/tools/README.md` | Tool documentation belongs to the main repo; Space keeps only runtime code. |
| `agent_base/prompts/plugins/` | Plugin prompt assets are not used by the hosted app unless a future Space feature explicitly needs them. |
| `workspace/`, `api_runs/`, `traces/` | Local placeholder/runtime directories are not checked into Space. |
| Local benchmark helpers such as `benchmarks/**/local_*` | Local development helpers must not be deployed. |

Keeping these files out prevents stale code paths and misleading documentation
from accumulating in the Space.

## Space-Specific Runtime Behavior

These behaviors are intentional hosted-app deltas:

- Users cannot select arbitrary server folders. Each new chat gets an isolated
  managed run directory under `RH_SPACE_RUNS_DIR`.
- The runtime layout is always:
  `run_.../agent_workspace/` for agent-visible files and
  `run_.../agent_trace/` for traces and `session_state_*.json`.
- Uploaded images are saved under `agent_workspace/inputs/images/` and are also
  passed to the model as image inputs when supported.
- Users can download files created or handled by the agent with the
  `Download workspace.zip` button. The zip contains only the current chat's
  `agent_workspace/`; it does not include `agent_trace/`, server files, or
  Space secrets.
- The frontend exposes a per-run model dropdown. Current options are `gpt-5.5`
  and `claude-opus-4-8`; the selection must stay local to that run and must not
  mutate global process environment variables.
- Completed runs are packaged for trajectory collection and submitted as pull
  requests to the configured Hugging Face dataset after the batch threshold is
  reached.
- Old inactive runs are cleaned periodically so the Space does not grow without
  bound.
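
The download rule above (only `agent_workspace/`, never `agent_trace/`) can be sketched as follows. This is a minimal illustration, not the code in `frontend/local_server.py`; the function name `zip_workspace` is an assumption.

```python
import zipfile
from pathlib import Path


def zip_workspace(run_dir: Path, zip_path: Path) -> list[str]:
    """Archive only run_dir/agent_workspace into zip_path and return the archived names."""
    workspace = run_dir / "agent_workspace"
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for file in sorted(workspace.rglob("*")):
            if file.is_file():
                # Store paths relative to the workspace so no server path leaks into the zip.
                arcname = file.relative_to(workspace).as_posix()
                archive.write(file, arcname)
                names.append(arcname)
    return names
```

Because the walk starts at `agent_workspace/`, sibling directories such as `agent_trace/` are never visited. The zip itself should be written outside the workspace so it is not archived into itself.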

## Required Secrets

Configure these as Hugging Face Space secrets before starting the app:

| Secret | Purpose |
| --- | --- |
| `API_KEY` | API key for your OpenAI-compatible LLM provider. |
| `API_BASE` | OpenAI-compatible `/v1` endpoint. |
| `MODEL_NAME` | Main model used by ResearchHarness. |
| `SERPER_KEY` | WebSearch / ScholarSearch key from <https://serper.dev/>. |
| `JINA_KEY` | WebFetch key from <https://jina.ai/>. |
| `MINERU_TOKEN` | ReadPDF key from <https://mineru.net/>. |
| `HF_TOKEN` | Hugging Face token with write access to `InternScience/ResearchHarness-Data`. |
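
A quick pre-flight check for these secrets might look like the sketch below. `missing_secrets` is a hypothetical helper; this README does not say whether `check_space_runtime.py` performs a similar check.

```python
import os

# Names taken from the secrets table above.
REQUIRED_SECRETS = (
    "API_KEY",
    "API_BASE",
    "MODEL_NAME",
    "SERPER_KEY",
    "JINA_KEY",
    "MINERU_TOKEN",
    "HF_TOKEN",
)


def missing_secrets(env=None) -> list[str]:
    """Return the required secret names that are unset or blank in env."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SECRETS if not env.get(name, "").strip()]
```

Calling `missing_secrets()` with no argument inspects the current process environment; passing a dict makes the check easy to test.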

## Optional Runtime Variables

| Variable | Default | Meaning |
| --- | --- | --- |
| `RH_SPACE_RUNS_DIR` | `/tmp/researchharness_space/runs` | Parent directory for temporary per-chat runs. |
| `RH_SPACE_RETENTION_SECONDS` | `21600` | Delete inactive runs older than this many seconds. |
| `RH_SPACE_MAX_RUNS` | `40` | Keep at most this many inactive runs. |
| `RH_SPACE_CLEANUP_INTERVAL_SECONDS` | `900` | Background cleanup interval. |
| `WEBFETCH_TIMEOUT_SECONDS` | `180` | Overall timeout for one WebFetch tool call. |
| `WEBFETCH_MAX_CHARS` | `30000` | Hard maximum characters returned by one URL-only WebFetch call. |
| `RH_COLLECTION_ENABLED` | `true` | Automatically collect completed hosted runs. |
| `RH_COLLECTION_DATASET_REPO` | `InternScience/ResearchHarness-Data` | Dataset repo that receives trajectory PRs. |
| `RH_COLLECTION_BATCH_SIZE` | `5` | Create one dataset PR after this many collected runs. |
| `RH_COLLECTION_MAX_BUNDLE_BYTES` | `20971520` | Drop a single run bundle if it exceeds this byte limit. |
| `PORT` | `7860` | Port used by Hugging Face Docker Spaces. |
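
To make the interaction between `RH_SPACE_RETENTION_SECONDS` and `RH_SPACE_MAX_RUNS` concrete, here is a sketch of one plausible cleanup selection. It assumes age is measured from a run's last activity and that the newest runs are kept when the count limit is exceeded; the actual cleanup code may differ. `env_int` and `runs_to_delete` are hypothetical names.

```python
import os


def env_int(name: str, default: int, env=None) -> int:
    """Read an integer variable, falling back to the documented default when unset or blank."""
    env = os.environ if env is None else env
    raw = env.get(name, "")
    return int(raw) if raw.strip() else default


def runs_to_delete(last_active: dict, now: float, retention_seconds: int, max_runs: int) -> list[str]:
    """Pick inactive runs to delete.

    last_active maps run name -> last-activity timestamp (assumed bookkeeping).
    """
    # Rule 1: anything inactive for longer than the retention window is deleted.
    expired = {name for name, ts in last_active.items() if now - ts > retention_seconds}
    # Rule 2: of the remaining runs, keep only the max_runs most recent ones.
    survivors = sorted(
        (name for name in last_active if name not in expired),
        key=lambda name: last_active[name],
        reverse=True,
    )
    overflow = survivors[max_runs:]
    return sorted(expired | set(overflow))
```

With the defaults from the table, `env_int("RH_SPACE_RETENTION_SECONDS", 21600)` gives a six-hour window and `env_int("RH_SPACE_MAX_RUNS", 40)` caps the number of retained inactive runs.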

## Runtime Layout

```text
/tmp/researchharness_space/runs/
└── run_YYYYMMDD_HHMMSS_<random>/
    ├── agent_workspace/
    │   └── inputs/images/        # user uploaded images, when present
    └── agent_trace/              # trace JSONL and session_state_*.json
```

The frontend exposes the chat UI and a single `Download workspace.zip` action
for the current chat. The workspace path is managed by the server so hosted
users cannot browse or select server folders.
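
A managed run directory with this layout could be created as sketched below. The UTC timestamp and the 8-hex-character random suffix are assumptions chosen to fit the `run_YYYYMMDD_HHMMSS_<random>` pattern; `create_run_dir` is a hypothetical name.

```python
import secrets
from datetime import datetime, timezone
from pathlib import Path


def create_run_dir(runs_dir: Path) -> Path:
    """Create runs_dir/run_<timestamp>_<random>/ with the two fixed subdirectories."""
    stamp = datetime.now(timezone.utc).strftime("%Y%m%d_%H%M%S")
    run_dir = runs_dir / f"run_{stamp}_{secrets.token_hex(4)}"
    # Agent-visible files live here; inputs/images/ is created later, only when images are uploaded.
    (run_dir / "agent_workspace").mkdir(parents=True)
    # Traces and session_state_*.json live here and are never exposed for download.
    (run_dir / "agent_trace").mkdir()
    return run_dir
```

Since the server picks the path, the client never supplies a directory name, which is what keeps hosted users from selecting server folders.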

## Trajectory Collection

Hosted mode automatically collects completed runs without exposing extra UI to users:

- Each completed run is zipped from `agent_workspace/` and `agent_trace/`.
- A `manifest.json` is included inside the zip, and a sidecar `.json` file is kept beside the pending zip.
- If a single bundle is larger than `RH_COLLECTION_MAX_BUNDLE_BYTES` (`20971520` bytes, i.e. 20 MiB, by default), it is dropped immediately.
- Once `RH_COLLECTION_BATCH_SIZE` pending bundles exist, the Space creates a pull request in the configured Hugging Face dataset repo.
- After the dataset PR is created successfully, those local pending bundles are deleted.
- If upload fails, pending bundles are retained and `last_upload_error.json` is written under the local collection directory.
- No redaction is applied in this core hosted collector; keep the dataset private unless you intentionally want to publish the collected traces.
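
The size limit and batch threshold above can be sketched as two small decisions. This is an illustration under assumptions: pending bundles are assumed to be `*.zip` files in one directory, the function names are hypothetical, and the upload step itself is omitted.

```python
from pathlib import Path

# Defaults from the Optional Runtime Variables table.
MAX_BUNDLE_BYTES = 20971520
BATCH_SIZE = 5


def accept_bundle(bundle: Path, max_bytes: int = MAX_BUNDLE_BYTES) -> bool:
    """Delete an oversized bundle and return False; return True if it is kept."""
    if bundle.stat().st_size > max_bytes:
        bundle.unlink()
        return False
    return True


def next_batch(pending_dir: Path, batch_size: int = BATCH_SIZE) -> list[Path]:
    """Return the bundles for one dataset PR, or an empty list below the threshold."""
    pending = sorted(pending_dir.glob("*.zip"))
    return pending[:batch_size] if len(pending) >= batch_size else []
```

Under this sketch, the caller would delete the returned bundles only after the pull request succeeds, and leave them in place on failure so a later attempt can retry.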

## Local Smoke Test

```bash
python3 app.py
```

Then open `http://127.0.0.1:7860`.

Before pushing Space changes, run at least:

```bash
python3 check_space_runtime.py

python3 -B - <<'PY'
from pathlib import Path
import py_compile

for path in Path(".").rglob("*.py"):
    if ".git" not in path.parts:
        py_compile.compile(str(path), doraise=True)
print("syntax ok")
PY

RH_COLLECTION_ENABLED=false python3 -B - <<'PY'
from fastapi.testclient import TestClient
import app

client = TestClient(app.app)
response = client.get("/")
assert response.status_code == 200
assert "ResearchHarness" in response.text
print("app ok")
PY

node --check frontend/static/app.js
git diff --check
```