Commit cd0c04f · 1 parent: 88c4ec6
Committed by Nomearod, co-authored by Claude Opus 4.6 (1M context)

feat: switch deployment to Hugging Face Spaces (16GB free tier)

Render free tier OOMs at 512MB with sentence-transformers. HF Spaces
provides 16GB RAM free with Docker support.

- Dockerfile: user 1000 (HF requirement), port 7860, models pre-downloaded
- README: HF Spaces frontmatter, updated demo URL and curl examples
- Startup warmup re-enabled (16GB is plenty)
- Remove render.yaml

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
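
The 512MB limit mentioned in the commit message can be checked empirically. The following is a rough sketch, using only the Unix standard library, of reading the peak resident memory of the current process after the models have loaded. `peak_rss_mb` and `fits_in` are hypothetical helpers that are not part of this commit, and the 0.8 headroom factor is an arbitrary assumption.

```python
import resource
import sys


def peak_rss_mb() -> float:
    """Peak resident set size of this process in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def fits_in(limit_mb: float, headroom: float = 0.8) -> bool:
    """True if the peak usage stays within a fraction of the memory limit."""
    return peak_rss_mb() <= limit_mb * headroom
```

Calling `fits_in(512)` after loading the embedder and reranker would give a first indication of whether a 512MB instance is viable; calling it with `16 * 1024` does the same for the 16GB tier.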

Files changed (4)
  1. README.md +14 -5
  2. agent_bench/serving/app.py +11 -16
  3. docker/Dockerfile +13 -9
  4. render.yaml +0 -19
README.md CHANGED
@@ -1,3 +1,12 @@
+---
+title: agent-bench
+emoji: "🔍"
+colorFrom: blue
+colorTo: purple
+sdk: docker
+app_port: 7860
+---
+
 # agent-bench
 
 ![CI](https://github.com/tyy0811/agent-bench/actions/workflows/ci.yaml/badge.svg)
@@ -27,21 +36,21 @@ Evaluated on 27 hand-crafted questions using **gpt-4o-mini** ($0.0004/query) ove
 
 ## Live Demo
 
-**https://agent-bench.onrender.com** (Frankfurt, free tier - first request after idle may take ~30-60s for cold start)
+**https://tyy0811-agent-bench.hf.space** (Hugging Face Spaces - first request after idle may take ~30s for cold start)
 
 ```bash
 # In-scope question (expect answer with sources)
-curl -X POST https://agent-bench.onrender.com/ask \
+curl -X POST https://tyy0811-agent-bench.hf.space/ask \
 -H "Content-Type: application/json" \
 -d '{"question": "How do I define a path parameter in FastAPI?"}'
 
 # Out-of-scope question (expect grounded refusal)
-curl -X POST https://agent-bench.onrender.com/ask \
+curl -X POST https://tyy0811-agent-bench.hf.space/ask \
 -H "Content-Type: application/json" \
 -d '{"question": "How do I cook pasta?"}'
 
 # Health check
-curl https://agent-bench.onrender.com/health
+curl https://tyy0811-agent-bench.hf.space/health
 ```
 
 ## Quick Start (Local)
@@ -156,7 +165,7 @@ See [DECISIONS.md](DECISIONS.md) for rationale on building from primitives, RRF
 | Retrieval precision | RRF only | RRF + cross-encoder | Reranking |
 | Provider resilience | None | Retry + backoff | Error handling |
 | Rate limiting | None | 10 RPM per IP | API hardening |
-| Cloud deployment | None | Render (Frankfurt) | Docker → production |
+| Cloud deployment | None | HF Spaces (Docker) | Docker → production |
 | CI/CD | None | GitHub Actions | Automated quality gates |
 
 See [DECISIONS.md](DECISIONS.md) for the reasoning behind each design choice.
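
The curl examples in the README can also be issued from Python. The following is a minimal sketch using only the standard library. The helper names `build_ask_request` and `ask` are hypothetical, the 90-second timeout is an assumption chosen to tolerate a cold start, and the response is assumed to be a JSON object because the diff does not show the response schema.

```python
import json
import urllib.request

BASE_URL = "https://tyy0811-agent-bench.hf.space"  # demo URL from the README


def build_ask_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST /ask request as the curl example (hypothetical helper)."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 90.0) -> dict:
    """Send the request and decode the reply.

    Assumes the endpoint returns a JSON object; the schema is not shown in the diff.
    """
    with urllib.request.urlopen(build_ask_request(question), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Separating request construction from sending keeps the payload easy to inspect without touching the network.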
agent_bench/serving/app.py CHANGED
@@ -104,21 +104,16 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
     app.add_middleware(RateLimitMiddleware, requests_per_minute=config.serving.rate_limit_rpm)
     app.include_router(router)
 
-    # Startup warmup: skip on constrained environments (Render free tier 512MB).
-    # Models load lazily on first request instead.
-    import os
-
-    if os.environ.get("AGENT_BENCH_ENV") != "production":
-
-        @app.on_event("startup")
-        async def warmup() -> None:
-            import structlog
-
-            log = structlog.get_logger()
-            log.info("warmup_start")
-            _ = embedder.embed("warmup")
-            if reranker is not None:
-                _ = reranker.model  # noqa: F841
-            log.info("warmup_complete")
+    # Startup warmup: eager-load models to reduce cold start latency
+    @app.on_event("startup")
+    async def warmup() -> None:
+        import structlog
+
+        log = structlog.get_logger()
+        log.info("warmup_start")
+        _ = embedder.embed("warmup")
+        if reranker is not None:
+            _ = reranker.model  # noqa: F841
+        log.info("warmup_complete")
 
     return app
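
`@app.on_event("startup")` is marked deprecated in recent FastAPI releases in favor of a `lifespan` context manager. The sketch below shows one way the same warmup could be expressed in that style. `make_lifespan` is a hypothetical helper that is not part of this commit, and it assumes only what the diff shows: an `embedder` with an `embed()` method and an optional `reranker` whose `model` attribute triggers loading.

```python
from contextlib import asynccontextmanager
from typing import Any, AsyncIterator, Callable, Optional


def make_lifespan(embedder: Any, reranker: Optional[Any] = None) -> Callable[[Any], Any]:
    """Return a lifespan handler that eager-loads models (hypothetical helper)."""

    @asynccontextmanager
    async def lifespan(app: Any) -> AsyncIterator[None]:
        # Same steps as warmup() in the diff: one dummy embedding, then touch
        # the reranker's lazily loaded model if a reranker is configured.
        embedder.embed("warmup")
        if reranker is not None:
            _ = reranker.model
        yield  # the application serves requests while suspended here

    return lifespan


# Usage, assuming FastAPI is installed:
#   app = FastAPI(lifespan=make_lifespan(embedder, reranker))
```

The handler takes the models as arguments rather than closing over module globals, so it can be exercised with stand-in objects in a unit test.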
docker/Dockerfile CHANGED
@@ -1,12 +1,15 @@
 FROM python:3.11-slim
-WORKDIR /app
 
-# Copy all source before pip install (package needs agent_bench/ to build)
-COPY pyproject.toml .
-COPY agent_bench/ agent_bench/
-COPY configs/ configs/
-COPY data/ data/
-COPY scripts/ scripts/
+# HF Spaces requires user ID 1000
+RUN useradd -m -u 1000 user
+WORKDIR /home/user/app
+
+# Copy source and install
+COPY --chown=user pyproject.toml .
+COPY --chown=user agent_bench/ agent_bench/
+COPY --chown=user configs/ configs/
+COPY --chown=user data/ data/
+COPY --chown=user scripts/ scripts/
 
 RUN pip install --no-cache-dir .
 
@@ -17,5 +20,6 @@ RUN python -c "from sentence_transformers import CrossEncoder; CrossEncoder('cro
 # Run ingestion at build time so the store is ready
 RUN python scripts/ingest.py --doc-dir data/tech_docs/ --store-path .cache/store
 
-EXPOSE 8000
-CMD ["uvicorn", "agent_bench.serving.app:create_app", "--factory", "--host", "0.0.0.0", "--port", "8000"]
+USER user
+EXPOSE 7860
+CMD ["uvicorn", "agent_bench.serving.app:create_app", "--factory", "--host", "0.0.0.0", "--port", "7860"]
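
Because `USER user` appears only after the ingestion step, the visible `RUN` instructions execute as root, so the store they create is root-owned: readable by UID 1000 but probably not writable. Where the pre-downloaded models are cached depends on lines not shown in this diff. A minimal sketch of a startup check follows; `check_runtime` is a hypothetical helper, and the default store path and UID are taken from the Dockerfile above.

```python
import os
from pathlib import Path
from typing import List


def check_runtime(store_path: str = ".cache/store", expected_uid: int = 1000) -> List[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems: List[str] = []
    # os.getuid() exists on Unix only, which covers the Linux container.
    if hasattr(os, "getuid") and os.getuid() != expected_uid:
        problems.append(f"running as uid {os.getuid()}, expected {expected_uid}")
    store = Path(store_path)
    if not store.exists():
        problems.append(f"store path {store} does not exist")
    elif not os.access(store, os.R_OK):
        problems.append(f"store path {store} is not readable")
    return problems
```

Logging the returned list at startup would surface a permissions mismatch immediately instead of on the first request.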
render.yaml DELETED
@@ -1,19 +0,0 @@
-services:
-  - type: web
-    name: agent-bench
-    runtime: python
-    region: frankfurt
-    plan: free
-    autoDeploy: true
-    buildCommand: pip install . && python scripts/ingest.py --doc-dir data/tech_docs/ --store-path .cache/store
-    startCommand: uvicorn agent_bench.serving.app:create_app --factory --host 0.0.0.0 --port $PORT
-    envVars:
-      - key: OPENAI_API_KEY
-        sync: false
-      - key: AGENT_BENCH_ENV
-        value: production
-      - key: PYTHON_VERSION
-        value: 3.11.10
-      - key: PYTHONUNBUFFERED
-        value: "1"
-    healthCheckPath: /health