docs(readme): correct test count 444 → 443
Reconcile README test-count claim with actual `pytest --collect-only`
output (443 tests). Updates the four occurrences in the badge line,
production-engineering bullet, Testing section, and the comparison
table footer row.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
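
A check like the one this commit performs by hand could be scripted. The following is a minimal sketch, not part of the repository: the function names and both regular expressions are hypothetical, written to match the four README spots named above (badge line, production-engineering bullet, Testing section, comparison-table footer row). The expected count is passed in as a plain integer, on the assumption that it is read from the `pytest --collect-only` output separately.

```python
import re

# Hypothetical patterns, modelled on the four README spots listed in the commit message.
# Matches prose such as "443 tests" or "443 deterministic tests".
PROSE_CLAIM = re.compile(r"(\d+)\s+(?:deterministic\s+)?tests\b")
# Matches a table row starting with "| Tests |" and captures the number in its last cell.
TABLE_ROW = re.compile(r"^\|\s*Tests\s*\|.*\|\s*(\d+)\s*\|\s*$")


def find_test_count_claims(readme_text: str) -> list[tuple[int, int]]:
    """Return (line_number, claimed_count) for every test-count claim found."""
    claims = []
    for lineno, line in enumerate(readme_text.splitlines(), start=1):
        for match in PROSE_CLAIM.finditer(line):
            claims.append((lineno, int(match.group(1))))
        row = TABLE_ROW.match(line)
        if row:
            claims.append((lineno, int(row.group(1))))
    return claims


def stale_claims(readme_text: str, collected: int) -> list[tuple[int, int]]:
    """Return only the claims whose number differs from the collected count."""
    return [(n, c) for n, c in find_test_count_claims(readme_text) if c != collected]
```

Calling `stale_claims(readme_text, 443)` on the README contents would then list any line that still carries an outdated number. The table pattern only inspects the last column, which is assumed to be this project's own count.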
README.md
CHANGED
@@ -6,7 +6,7 @@
 
 Agentic knowledge retrieval system with evaluation benchmark. Custom orchestration pipeline + LangChain baseline, evaluated on matched golden datasets across 3 providers (OpenAI, Anthropic, self-hosted vLLM on Modal) and two corpora (FastAPI + Kubernetes). Zero hallucinated citations on all API provider configurations. The separate self-hosted Mistral-7B benchmark is included to show the practical model-size floor where agentic retrieval starts to break down.
 
-`
+`443 tests` · `3 providers` · `2 corpora` · `LangChain comparison` · `K8s + Terraform` · `CI`
 
 ## Benchmark Results
 
@@ -240,7 +240,7 @@ security:
 - **MLOps:** Provider comparison benchmark (API vs self-hosted, real measured data)
 - **Security — detection & redaction**: Two-tier prompt injection detection (heuristic regex + DeBERTa classifier), PII redaction on retrieved context, output validation gate (PII leakage, URL hallucination, blocklist)
 - **Security — audit & compliance**: Append-only JSONL audit trail, HMAC-SHA256 IP hashing (GDPR-aligned), log rotation, config-driven security with Literal-constrained enums
-- **Production engineering**: FastAPI, Docker, CI/CD, structured logging, rate limiting, SSE streaming, conversation sessions,
+- **Production engineering**: FastAPI, Docker, CI/CD, structured logging, rate limiting, SSE streaming, conversation sessions, 443 deterministic tests with mock providers
 
 <details><summary>API Reference</summary>
 
@@ -302,7 +302,7 @@ The golden dataset contains 27 hand-crafted FastAPI questions (19 retrieval · 3
 ## Testing
 
 ```bash
-make test #
+make test # 443 deterministic tests, no API keys needed
 make lint # ruff + mypy
 ```
 
@@ -325,4 +325,4 @@ See [DECISIONS.md](DECISIONS.md) for rationale on building from primitives, RRF
 | **PII redaction** | None | None | Regex + optional NER |
 | **Output validation** | None | None | PII leakage + URL + blocklist |
 | **Audit logging** | None | None | JSONL, HMAC-hashed IPs |
-| Tests | 97 | 205 |
+| Tests | 97 | 205 | 443 |