| --- |
| title: Touchdown Compression Classifier |
| emoji: ๐ |
| colorFrom: blue |
| colorTo: green |
| sdk: docker |
| app_port: 7860 |
| --- |
| |
| # Touchdown Compression Classifier |
|
|
| Free CPU Hugging Face Space scaffold for the managed prompt compression API. |
|
|
| Phase 1 serves deterministic deletion-only compression with receipts. The |
| planned classifier backbone is `microsoft/deberta-v3-small`; the API reports |
| classifier status honestly until a trained KEEP/DROP head or ONNX export is |
| mounted. |
|
|
| Endpoints: |
|
|
| - `GET /health` |
| - `POST /v1/compress` |
| - `POST /v1/classify` |
|
|
| Live Space: |
|
|
| - `https://wchen22-touchdown-compression-classifier.hf.space` |
| - Verified 2026-06-11 with HF CLI: runtime stage `RUNNING`, hardware |
| `cpu-basic`, domain `READY`, repo/runtime SHA |
| `0dfe65a6c82c9e7fa37d2c4a32c8eda3ed4e96d7`. |
| - The deployed scaffold supports chunked ONNX artifact inference for long |
| prompts. Use `hf spaces info wchen22/touchdown-compression-classifier --format |
| json` for the current repo/runtime SHA. |
| - Live smoke: |
| `python3 scripts/smoke_compression_api.py --base-url https://wchen22-touchdown-compression-classifier.hf.space --include-classify --include-batch --include-messages --include-gzip` |
| validates `/health`, `/v1/classify`, single `/v1/compress`, managed |
| `inputs[]` batches, managed `messages[]`, and gzipped JSON request/response |
| transport. |
| - Real-corpus API benchmark: |
| `python3 scripts/benchmark_compression_api.py --base-url https://wchen22-touchdown-compression-classifier.hf.space --input-jsonl benchmarks/prompts/real/kv_stress_seed.jsonl --limit 4 --tokenizer-model Qwen/Qwen2.5-7B-Instruct --require-exact-tokens`. |
| This calls hosted `/v1/compress` over real prompt rows and fails the run if |
| receipts return estimated token counts. Use this before claiming real-token |
| savings. |
| - Full deployment receipt: |
| `python3 scripts/verify_compression_space.py --expected-sha <sha> --out reports/generated/compression_space/hf_space_verification.json` |
| validates HF runtime metadata, repo/runtime SHA agreement, API smoke, and |
| remote/local Space file parity. |
| - Fresh local receipts are written under |
| `reports/generated/compression_space/`; run the full verifier with the |
| current Space SHA to check runtime, API smoke, and remote/local file parity. |
| Current live receipt: |
| `reports/generated/compression_space/hf_space_verification_2026-06-11-managed-messages.json`. |
| - Latest live result: `/v1/compress` saved 27/102 estimated tokens; |
| managed `inputs[]` returned `input_count=2`, `succeeded=2`, `failed=0`; |
| managed `messages[]` returned `message_count=2` with system-role protection; |
| gzip transport returned `response_content_encoding=gzip`; and `/v1/classify` |
| returned KEEP-only DeBERTa tokenizer labels. Receipts include |
| removed-span/char totals, classifier DROP block reasons, tool-schema |
| preservation counts when `tools` or `tool_schemas` are supplied, and |
| `/health` idempotency TTL reporting. |
| Matching `Idempotency-Key` retries replay the first in-memory response; |
| payload conflicts return HTTP 409. This is per-process memory on the Space, |
| not a durable distributed store. |
| The HTTP surface accepts `Content-Encoding: gzip` request bodies and returns |
| gzip responses when the client sends `Accept-Encoding: gzip` or a gzipped |
| request. If an ingress strips the standard `Content-Encoding` header, also |
| send `X-Touchdown-Content-Encoding: gzip`. |
| - `/v1/classify` is tokenizer/fallback KEEP-only until a trained KEEP/DROP head |
| is mounted. `/v1/compress` is rules-first deletion-only compression with |
| safety receipts. The Space app supports both single `input` requests and |
| managed `inputs[]` batches with per-item receipts and partial-error rows. |
| `/v1/compress` now accepts `tokenizer_model`; when the tokenizer loads, |
| receipts report `token_count_exact=true`, `token_count_method=tokenizer`, and |
| the requested model. If the tokenizer cannot load, receipts fall back to |
| estimated counts and the benchmark `--require-exact-tokens` gate fails. |
| - Mount `classifier_manifest.json`, tokenizer files, and optional `model.onnx`; |
| set `TOUCHDOWN_CLASSIFIER_ARTIFACT_DIR` to let the Space use artifact DROP |
| labels through ONNX Runtime or the manifest fallback. ONNX labels are |
| evaluated in chunked windows using manifest `max_length` and `stride`; mounted |
| ONNX labels expose `keep_score`, `drop_score`, and `drop_score_threshold`. |
| DROP spans still pass through protected-span and deletion-only safety gates. |
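
The chunked-window evaluation described in the last bullet can be sketched roughly as follows. This is a minimal illustration, not the Space's actual code: the function names are hypothetical, `stride` is assumed to mean the step between window starts (the manifest may define it differently, for example as an overlap), and overlapping tokens are merged by taking the lowest `drop_score`, a conservative choice that favors KEEP.

```python
from typing import List, Sequence, Tuple


def chunk_windows(n_tokens: int, max_length: int, stride: int) -> List[Tuple[int, int]]:
    """Return half-open (start, end) token windows that cover [0, n_tokens).

    Assumption: `stride` is the step between consecutive window starts.
    """
    if max_length <= 0 or stride <= 0:
        raise ValueError("max_length and stride must be positive")
    if stride > max_length:
        raise ValueError("stride larger than max_length would leave uncovered tokens")
    if n_tokens <= 0:
        return []
    windows = []
    start = 0
    while True:
        end = min(start + max_length, n_tokens)
        windows.append((start, end))
        if end >= n_tokens:
            break
        start += stride
    return windows


def merge_drop_scores(
    n_tokens: int,
    windows: Sequence[Tuple[int, int]],
    window_scores: Sequence[Sequence[float]],
) -> List[float]:
    """Merge per-window drop scores into one score per token.

    Tokens seen by several windows keep the minimum drop score (assumed policy).
    """
    merged: List[float] = [float("inf")] * n_tokens
    for (start, end), scores in zip(windows, window_scores):
        if len(scores) != end - start:
            raise ValueError("score count does not match window length")
        for offset, score in enumerate(scores):
            merged[start + offset] = min(merged[start + offset], score)
    return merged
```

A token would then be a DROP candidate only if its merged score exceeds `drop_score_threshold`, and it would still have to pass the protected-span and deletion-only gates.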
|
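
The `Idempotency-Key` behavior noted in the list above (replay on a matching retry, HTTP 409 on a payload conflict, per-process memory only) could look something like the sketch below. The class name, the TTL default, and the payload fingerprinting scheme are all assumptions made for illustration; the Space's real implementation may differ.

```python
import hashlib
import json
import time
from typing import Any, Callable, Dict, Tuple


class IdempotencyCache:
    """Hypothetical in-memory idempotency store (not durable, single process)."""

    def __init__(self, ttl_seconds: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        # key -> (stored_at, payload_fingerprint, (status, body))
        self._entries: Dict[str, Tuple[float, str, Tuple[int, Any]]] = {}

    @staticmethod
    def _fingerprint(payload: Any) -> str:
        # Canonical JSON so that key order does not change the fingerprint.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def handle(self, key: str, payload: Any, compute: Callable[[Any], Any]) -> Tuple[int, Any]:
        now = self._clock()
        fingerprint = self._fingerprint(payload)
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] >= self._ttl:
            # Expired entries are dropped and the request is treated as new.
            del self._entries[key]
            entry = None
        if entry is not None:
            _, stored_fingerprint, stored_response = entry
            if stored_fingerprint != fingerprint:
                return 409, {"error": "idempotency key reused with a different payload"}
            return stored_response  # replay the first response unchanged
        response = (200, compute(payload))
        self._entries[key] = (now, fingerprint, response)
        return response
```

Because the store lives in process memory, a Space restart or a second replica would forget earlier keys, which is why it should not be treated as a durable distributed store.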
|
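
For the gzip transport mentioned in the list above, a client might build requests along these lines. This is a sketch using only the Python standard library; the helper names are hypothetical, and the payload shape beyond the `input` field is an assumption. The request is only constructed here, not sent.

```python
import gzip
import json
import urllib.request
from typing import Any, Dict, Optional


def build_gzip_request(
    base_url: str,
    path: str,
    payload: Dict[str, Any],
    ingress_strips_encoding: bool = False,
) -> urllib.request.Request:
    """Build a POST request whose JSON body is gzip-compressed."""
    body = gzip.compress(json.dumps(payload).encode("utf-8"))
    headers = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",  # body is gzipped
        "Accept-Encoding": "gzip",   # ask for a gzipped response
    }
    if ingress_strips_encoding:
        # Fallback header for ingresses that strip Content-Encoding.
        headers["X-Touchdown-Content-Encoding"] = "gzip"
    url = base_url.rstrip("/") + path
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def decode_body(raw: bytes, content_encoding: Optional[str]) -> Any:
    """Decode a JSON response body, gunzipping it when the server says so."""
    if (content_encoding or "").lower() == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))
```

Sending the request with `urllib.request.urlopen(req)` and passing `response.read()` plus the `Content-Encoding` response header to `decode_body` would complete the round trip.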
| Deploy: |
|
|
| ```bash |
| hf auth login |
| ./deploy.sh <namespace>/touchdown-compression-classifier |
| ``` |
|
|
| Free CPU Spaces are enough for this scaffold; production traffic should move to |
| paid or owned infrastructure after validation. |
|
|