feat: switch deployment to Hugging Face Spaces (16GB free tier)
Render free tier OOMs at 512MB with sentence-transformers. HF Spaces
provides 16GB RAM free with Docker support.
- Dockerfile: user 1000 (HF requirement), port 7860, models pre-downloaded
- README: HF Spaces frontmatter, updated demo URL and curl examples
- Startup warmup re-enabled (16GB is plenty)
- Remove render.yaml
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- README.md +14 -5
- agent_bench/serving/app.py +11 -16
- docker/Dockerfile +13 -9
- render.yaml +0 -19
README.md
CHANGED
@@ -1,3 +1,12 @@
+---
+title: agent-bench
+emoji: "π"
+colorFrom: blue
+colorTo: purple
+sdk: docker
+app_port: 7860
+---
+
 # agent-bench


@@ -27,21 +36,21 @@ Evaluated on 27 hand-crafted questions using **gpt-4o-mini** ($0.0004/query) ove

 ## Live Demo

-**https://agent-bench.
+**https://tyy0811-agent-bench.hf.space** (Hugging Face Spaces - first request after idle may take ~30s for cold start)

 ```bash
 # In-scope question (expect answer with sources)
-curl -X POST https://agent-bench.
+curl -X POST https://tyy0811-agent-bench.hf.space/ask \
   -H "Content-Type: application/json" \
   -d '{"question": "How do I define a path parameter in FastAPI?"}'

 # Out-of-scope question (expect grounded refusal)
-curl -X POST https://agent-bench.
+curl -X POST https://tyy0811-agent-bench.hf.space/ask \
   -H "Content-Type: application/json" \
   -d '{"question": "How do I cook pasta?"}'

 # Health check
-curl https://agent-bench.
+curl https://tyy0811-agent-bench.hf.space/health
 ```

 ## Quick Start (Local)
@@ -156,7 +165,7 @@ See [DECISIONS.md](DECISIONS.md) for rationale on building from primitives, RRF
 | Retrieval precision | RRF only | RRF + cross-encoder | Reranking |
 | Provider resilience | None | Retry + backoff | Error handling |
 | Rate limiting | None | 10 RPM per IP | API hardening |
-| Cloud deployment | None |
+| Cloud deployment | None | HF Spaces (Docker) | Docker → production |
 | CI/CD | None | GitHub Actions | Automated quality gates |

 See [DECISIONS.md](DECISIONS.md) for the reasoning behind each design choice.
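The curl examples in the README hunk above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names `build_ask_request` and `ask` are hypothetical, and it assumes the `/ask` endpoint returns a JSON object (the diff does not show the response schema).

```python
import json
import urllib.request

# Demo URL taken from the README diff.
BASE_URL = "https://tyy0811-agent-bench.hf.space"


def build_ask_request(question: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /ask request that mirrors the curl example (hypothetical helper)."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/ask",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, timeout: float = 60.0) -> dict:
    """Send the question; the generous timeout leaves room for the ~30s cold start."""
    with urllib.request.urlopen(build_ask_request(question), timeout=timeout) as resp:
        # Assumption: the endpoint responds with a JSON object.
        return json.loads(resp.read().decode("utf-8"))
```

Calling `ask("How do I define a path parameter in FastAPI?")` would perform the same request as the first curl command. Building the request separately from sending it keeps the payload easy to inspect without touching the network.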
agent_bench/serving/app.py
CHANGED
@@ -104,21 +104,16 @@ def create_app(config: AppConfig | None = None) -> FastAPI:
     app.add_middleware(RateLimitMiddleware, requests_per_minute=config.serving.rate_limit_rpm)
     app.include_router(router)

-    # Startup warmup:
-
-
-
-
-
-
-
-
-
-
-        log.info("warmup_start")
-        _ = embedder.embed("warmup")
-        if reranker is not None:
-            _ = reranker.model  # noqa: F841
-        log.info("warmup_complete")
+    # Startup warmup: eager-load models to reduce cold start latency
+    @app.on_event("startup")
+    async def warmup() -> None:
+        import structlog
+
+        log = structlog.get_logger()
+        log.info("warmup_start")
+        _ = embedder.embed("warmup")
+        if reranker is not None:
+            _ = reranker.model  # noqa: F841
+        log.info("warmup_complete")

     return app
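The warmup hook above touches `reranker.model` and discards the result, which suggests the attribute loads its weights lazily on first access. The sketch below illustrates that pattern in isolation; `LazyModel` is a hypothetical stand-in, not the project's reranker class, and the lazy-loading behaviour is an assumption inferred from the diff.

```python
import time
from typing import Optional


class LazyModel:
    """Hypothetical stand-in for a reranker whose weights load on first access."""

    def __init__(self, load_seconds: float = 0.0) -> None:
        self._load_seconds = load_seconds
        self._model: Optional[object] = None
        self.load_count = 0  # how many times the expensive load actually ran

    @property
    def model(self) -> object:
        if self._model is None:
            time.sleep(self._load_seconds)  # simulate the expensive load
            self._model = object()
            self.load_count += 1
        return self._model


def warmup(reranker: Optional[LazyModel]) -> None:
    """Touch the lazy attribute once at startup so the first request does not pay for it."""
    if reranker is not None:
        _ = reranker.model
```

After `warmup(reranker)` runs, later reads of `reranker.model` return the cached object without reloading. Separately, recent FastAPI releases mark `@app.on_event("startup")` as deprecated in favour of a `lifespan` handler, so the same warmup body could move there if the project upgrades.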
docker/Dockerfile
CHANGED
@@ -1,12 +1,15 @@
 FROM python:3.11-slim
-WORKDIR /app

-#
-
-
-
-
-COPY
+# HF Spaces requires user ID 1000
+RUN useradd -m -u 1000 user
+WORKDIR /home/user/app
+
+# Copy source and install
+COPY --chown=user pyproject.toml .
+COPY --chown=user agent_bench/ agent_bench/
+COPY --chown=user configs/ configs/
+COPY --chown=user data/ data/
+COPY --chown=user scripts/ scripts/

 RUN pip install --no-cache-dir .

@@ -17,5 +20,6 @@ RUN python -c "from sentence_transformers import CrossEncoder; CrossEncoder('cro
 # Run ingestion at build time so the store is ready
 RUN python scripts/ingest.py --doc-dir data/tech_docs/ --store-path .cache/store

-
-
+USER user
+EXPOSE 7860
+CMD ["uvicorn", "agent_bench.serving.app:create_app", "--factory", "--host", "0.0.0.0", "--port", "7860"]
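In the Dockerfile above, the install and ingestion steps run before `USER user`, so build-time artifacts are created by root and later read by UID 1000. A small preflight check at startup can confirm they are reachable. This is a sketch only: the `preflight` function is hypothetical, and the default path simply reuses the `--store-path` value from the ingestion command.

```python
import os
from pathlib import Path


def preflight(store_path: str = ".cache/store") -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: list[str] = []
    store = Path(store_path)
    if not store.exists():
        problems.append(f"store not found: {store}")
    elif not os.access(store, os.R_OK):
        # Reached when the store exists but the current user cannot read it.
        problems.append(f"store not readable by uid {os.getuid()}: {store}")
    return problems
```

Logging the returned list before serving traffic would surface a permissions mismatch at startup rather than on the first query.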
render.yaml
DELETED
@@ -1,19 +0,0 @@
-services:
-  - type: web
-    name: agent-bench
-    runtime: python
-    region: frankfurt
-    plan: free
-    autoDeploy: true
-    buildCommand: pip install . && python scripts/ingest.py --doc-dir data/tech_docs/ --store-path .cache/store
-    startCommand: uvicorn agent_bench.serving.app:create_app --factory --host 0.0.0.0 --port $PORT
-    envVars:
-      - key: OPENAI_API_KEY
-        sync: false
-      - key: AGENT_BENCH_ENV
-        value: production
-      - key: PYTHON_VERSION
-        value: 3.11.10
-      - key: PYTHONUNBUFFERED
-        value: "1"
-    healthCheckPath: /health