metadata
title: Touchdown Compression Classifier
emoji: 🚀
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
Touchdown Compression Classifier
Free CPU Hugging Face Space scaffold for the managed prompt compression API.
Phase 1 serves deterministic deletion-only compression with receipts. The
planned classifier backbone is microsoft/deberta-v3-small; the API reports
classifier status honestly until a trained KEEP/DROP head or ONNX export is
mounted.
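Deletion-only means the compressed text can be produced from the original purely by removing characters, with nothing rewritten or reordered. The following is a minimal sketch of that invariant check. `is_deletion_only` and `removed_char_count` are hypothetical helper names used for illustration and are not taken from the Space's code.

```python
def is_deletion_only(original: str, compressed: str) -> bool:
    """Return True if `compressed` is a subsequence of `original`."""
    remaining = iter(original)
    # `ch in remaining` advances the iterator until it finds `ch`, so the
    # characters of `compressed` must occur in the same order as in `original`.
    return all(ch in remaining for ch in compressed)


def removed_char_count(original: str, compressed: str) -> int:
    """Count removed characters, refusing outputs that are not pure deletions."""
    if not is_deletion_only(original, compressed):
        raise ValueError("compressed text is not a pure deletion of the original")
    return len(original) - len(compressed)
```

A check of this kind can back a receipt: if it fails, the compressed output is rejected instead of being returned.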
Endpoints:
- `GET /health`
- `POST /v1/compress`
- `POST /v1/classify`
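The endpoints above can be exercised with the Python standard library alone. This is a rough sketch, not the project's client: `build_request` and `call` are hypothetical helpers, and the `{"input": ...}` body shape is an assumption based on the single-`input` request form this README mentions. The response schema is not assumed here.

```python
import json
import urllib.request

BASE_URL = "https://wchen22-touchdown-compression-classifier.hf.space"


def build_request(path, payload=None, base_url=BASE_URL):
    """Build a GET request (no payload) or a JSON POST request for one endpoint."""
    url = base_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(request, timeout=30):
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


health_request = build_request("/health")
# Assumption: the prompt travels in an `input` field of the JSON body.
compress_request = build_request(
    "/v1/compress", {"input": "Please kindly summarize this."}
)
```

Calling `call(health_request)` performs the actual network round trip; building the requests does not.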
Live Space:
https://wchen22-touchdown-compression-classifier.hf.space

- Verified 2026-06-11 with HF CLI: runtime stage `RUNNING`, hardware `cpu-basic`, domain `READY`, repo/runtime SHA `0dfe65a6c82c9e7fa37d2c4a32c8eda3ed4e96d7`.
- The deployed scaffold supports chunked ONNX artifact inference for long prompts. Use `hf spaces info wchen22/touchdown-compression-classifier --format json` for the current repo/runtime SHA.
- Live smoke: `python3 scripts/smoke_compression_api.py --base-url https://wchen22-touchdown-compression-classifier.hf.space --include-classify --include-batch --include-messages --include-gzip` validates `/health`, `/v1/classify`, single `/v1/compress`, and managed `inputs[]` batch, managed `messages[]`, plus gzipped JSON request/response transport.
- Real-corpus API benchmark: `python3 scripts/benchmark_compression_api.py --base-url https://wchen22-touchdown-compression-classifier.hf.space --input-jsonl benchmarks/prompts/real/kv_stress_seed.jsonl --limit 4 --tokenizer-model Qwen/Qwen2.5-7B-Instruct --require-exact-tokens`. This calls hosted `/v1/compress` over real prompt rows and fails the run if receipts return estimated token counts. Use this before claiming real-token savings.
- Full deployment receipt: `python3 scripts/verify_compression_space.py --expected-sha <sha> --out reports/generated/compression_space/hf_space_verification.json` validates HF runtime metadata, repo/runtime SHA agreement, API smoke, and remote/local Space file parity.
- Fresh local receipts are written under `reports/generated/compression_space/`; run the full verifier with the current Space SHA to check runtime, API smoke, and remote/local file parity. Current live receipt: `reports/generated/compression_space/hf_space_verification_2026-06-11-managed-messages.json`.
- Latest live result: `/v1/compress` saved 27/102 estimated tokens; managed `inputs[]` returned `input_count=2`, `succeeded=2`, `failed=0`, managed `messages[]` returned `message_count=2` with system-role protection, gzip transport returned `response_content_encoding=gzip`, and `/v1/classify` returned KEEP-only DeBERTa tokenizer labels. Receipts include removed-span/char totals, classifier DROP block reasons, tool-schema preservation counts when `tools` or `tool_schemas` are supplied, and `/health` idempotency TTL reporting.
- Matching `Idempotency-Key` retries replay the first in-memory response; payload conflicts return HTTP 409. This is per-process memory on the Space, not a durable distributed store.
- The HTTP surface accepts `Content-Encoding: gzip` request bodies and gzip responses for `Accept-Encoding: gzip` or gzipped requests. If an ingress strips the standard content-encoding header, also send `X-Touchdown-Content-Encoding: gzip`.
- `/v1/classify` is tokenizer/fallback KEEP-only until a trained KEEP/DROP head is mounted.
- `/v1/compress` is rules-first deletion-only compression with safety receipts. The Space app supports both single `input` requests and managed `inputs[]` batches with per-item receipts and partial-error rows.
- `/v1/compress` now accepts `tokenizer_model`; when the tokenizer loads, receipts report `token_count_exact=true`, `token_count_method=tokenizer`, and the requested model. If it cannot load, receipts remain estimated and the benchmark `--require-exact-tokens` gate fails.
- Mount `classifier_manifest.json`, tokenizer files, and optional `model.onnx`; set `TOUCHDOWN_CLASSIFIER_ARTIFACT_DIR` to let the Space use artifact DROP labels through ONNX Runtime or the manifest fallback. ONNX labels are evaluated in chunked windows using manifest `max_length` and `stride`; mounted ONNX labels expose `keep_score`, `drop_score`, and `drop_score_threshold`. DROP spans still pass through protected-span and deletion-only safety gates.
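The `Idempotency-Key` behaviour described above (replay the first response for a matching retry, reject a different payload under the same key) can be sketched as a small per-process cache. This is an illustrative sketch only: `IdempotencyCache`, `IdempotencyConflict`, the 300-second default TTL, and the SHA-256 payload digest are all assumptions, not the Space's implementation.

```python
import hashlib
import json
import time


class IdempotencyConflict(Exception):
    """Same key, different payload; an HTTP layer would map this to 409."""


class IdempotencyCache:
    """Per-process, in-memory replay cache (not durable, not distributed)."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds  # hypothetical default, not the Space's value
        self._clock = clock
        self._entries = {}  # key -> (expires_at, payload_digest, response)

    @staticmethod
    def _digest(payload):
        # Canonical JSON so that key order does not change the digest.
        canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def run(self, key, payload, handler):
        now = self._clock()
        digest = self._digest(payload)
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            if entry[1] != digest:
                raise IdempotencyConflict(key)
            return entry[2]  # replay the first response unchanged
        # No live entry: run the handler once and remember its response.
        response = handler(payload)
        self._entries[key] = (now + self._ttl, digest, response)
        return response
```

Because the entries live in a plain dictionary, a process restart or a second replica forgets them, which is the limitation the README calls out.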
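Chunked evaluation with `max_length` and `stride` can be pictured as sliding overlapping windows over the token sequence, merging per-token scores, and applying `drop_score_threshold`. The sketch below rests on several assumptions that the README does not confirm: `stride` is treated as the overlap between neighbouring windows (as in Hugging Face tokenizers), overlapping scores are averaged, and a token is labelled DROP when its score is at or above the threshold. All three function names are hypothetical.

```python
def window_spans(num_tokens, max_length, stride):
    """Return (start, end) token index pairs covering [0, num_tokens).

    Assumption: `stride` is the number of tokens shared by neighbouring
    windows, so each window starts `max_length - stride` after the previous one.
    """
    if max_length <= 0 or not 0 <= stride < max_length:
        raise ValueError("need max_length > 0 and 0 <= stride < max_length")
    step = max_length - stride
    spans = []
    start = 0
    while True:
        end = min(start + max_length, num_tokens)
        spans.append((start, end))
        if end >= num_tokens:
            return spans
        start += step


def merge_drop_scores(num_tokens, windows_with_scores):
    """Average per-token drop scores over every window that covers the token."""
    totals = [0.0] * num_tokens
    counts = [0] * num_tokens
    for (start, end), scores in windows_with_scores:
        for offset, score in enumerate(scores[: end - start]):
            totals[start + offset] += score
            counts[start + offset] += 1
    return [t / c if c else 0.0 for t, c in zip(totals, counts)]


def label_tokens(drop_scores, drop_score_threshold):
    """Label a token DROP when its merged score reaches the threshold."""
    return ["DROP" if s >= drop_score_threshold else "KEEP" for s in drop_scores]
```

In this picture the resulting DROP labels are only candidates; the protected-span and deletion-only gates would still run afterwards.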
Deploy:
hf auth login
./deploy.sh <namespace>/touchdown-compression-classifier
Free CPU Spaces are enough for this scaffold; production traffic should move to paid or owned infrastructure after validation.