	---
	title: Touchdown Compression Classifier
	emoji: 🚀
	colorFrom: blue
	colorTo: green
	sdk: docker
	app_port: 7860
	---

	# Touchdown Compression Classifier

	Free CPU Hugging Face Space scaffold for the managed prompt compression API.

	Phase 1 serves deterministic deletion-only compression with receipts. The
	planned classifier backbone is `microsoft/deberta-v3-small`; the API reports
	classifier status honestly until a trained KEEP/DROP head or ONNX export is
	mounted.

	Endpoints:

	- `GET /health`
	- `POST /v1/compress`
	- `POST /v1/classify`
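
A minimal sketch of building requests for these endpoints with the Python standard library. The helper names `build_json_request` and `call` are hypothetical and not part of this repo. The `input` field name follows this README's description of single-input requests; the response shape is not assumed here, so the sketch just decodes whatever JSON comes back.

```python
import json
import urllib.request

# Live Space URL as listed in this README.
BASE_URL = "https://wchen22-touchdown-compression-classifier.hf.space"


def build_json_request(path, payload=None, base_url=BASE_URL):
    """Return a urllib Request: GET when payload is None, otherwise a JSON POST.

    Hypothetical helper for illustration; it performs no network I/O.
    """
    url = base_url.rstrip("/") + path
    if payload is None:
        return urllib.request.Request(url, method="GET")
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call(request, timeout=30):
    """Send the request and decode the JSON response body (network I/O)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage (contacts the live Space, so it is left commented out):
# print(call(build_json_request("/health")))
# print(call(build_json_request("/v1/compress", {"input": "Some long prompt text."})))
```

A sleeping Space may need a longer `timeout` on the first call while it wakes up.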

	Live Space:

	- `https://wchen22-touchdown-compression-classifier.hf.space`
	- Verified 2026-06-11 with HF CLI: runtime stage `RUNNING`, hardware
	`cpu-basic`, domain `READY`, repo/runtime SHA
	`0dfe65a6c82c9e7fa37d2c4a32c8eda3ed4e96d7`.
	- The deployed scaffold supports chunked ONNX artifact inference for long
	prompts. Use `hf spaces info wchen22/touchdown-compression-classifier --format
	json` for the current repo/runtime SHA.
	- Live smoke:
	`python3 scripts/smoke_compression_api.py --base-url https://wchen22-touchdown-compression-classifier.hf.space --include-classify --include-batch --include-messages --include-gzip`
	validates `/health`, `/v1/classify`, single `/v1/compress`, managed
	`inputs[]` batches, managed `messages[]`, and gzipped JSON request/response
	transport.
	- Real-corpus API benchmark:
	`python3 scripts/benchmark_compression_api.py --base-url https://wchen22-touchdown-compression-classifier.hf.space --input-jsonl benchmarks/prompts/real/kv_stress_seed.jsonl --limit 4 --tokenizer-model Qwen/Qwen2.5-7B-Instruct --require-exact-tokens`.
	This calls hosted `/v1/compress` over real prompt rows and fails the run if
	receipts return estimated token counts. Use this before claiming real-token
	savings.
	- Full deployment receipt:
	`python3 scripts/verify_compression_space.py --expected-sha <sha> --out reports/generated/compression_space/hf_space_verification.json`
	validates HF runtime metadata, repo/runtime SHA agreement, API smoke, and
	remote/local Space file parity.
	- Fresh local receipts are written under
	`reports/generated/compression_space/`; run the full verifier with the
	current Space SHA to check runtime, API smoke, and remote/local file parity.
	Current live receipt:
	`reports/generated/compression_space/hf_space_verification_2026-06-11-managed-messages.json`.
	- Latest live result: `/v1/compress` saved 27/102 estimated tokens;
	managed `inputs[]` returned `input_count=2`, `succeeded=2`, `failed=0`;
	managed `messages[]` returned `message_count=2` with system-role protection;
	gzip transport returned `response_content_encoding=gzip`; and `/v1/classify`
	returned KEEP-only DeBERTa tokenizer labels. Receipts include
	removed-span/char totals, classifier DROP block reasons, tool-schema
	preservation counts when `tools` or `tool_schemas` are supplied, and
	`/health` idempotency TTL reporting.
	Matching `Idempotency-Key` retries replay the first in-memory response;
	payload conflicts return HTTP 409. This is per-process memory on the Space,
	not a durable distributed store.
	The HTTP surface accepts `Content-Encoding: gzip` request bodies and returns
	gzip responses for `Accept-Encoding: gzip` or gzipped requests. If an ingress
	strips the standard content-encoding header, also send
	`X-Touchdown-Content-Encoding: gzip`.
	- `/v1/classify` is tokenizer/fallback KEEP-only until a trained KEEP/DROP head
	is mounted. `/v1/compress` is rules-first deletion-only compression with
	safety receipts. The Space app supports both single `input` requests and
	managed `inputs[]` batches with per-item receipts and partial-error rows.
	`/v1/compress` now accepts `tokenizer_model`; when the tokenizer loads,
	receipts report `token_count_exact=true`, `token_count_method=tokenizer`, and
	the requested model. If it cannot load, receipts remain estimated and the
	benchmark `--require-exact-tokens` gate fails.
	- Mount `classifier_manifest.json`, tokenizer files, and optional `model.onnx`;
	set `TOUCHDOWN_CLASSIFIER_ARTIFACT_DIR` to let the Space use artifact DROP
	labels through ONNX Runtime or the manifest fallback. ONNX labels are
	evaluated in chunked windows using manifest `max_length` and `stride`; mounted
	ONNX labels expose `keep_score`, `drop_score`, and `drop_score_threshold`.
	DROP spans still pass through protected-span and deletion-only safety gates.
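
The gzip transport and `Idempotency-Key` behaviour described above can be sketched from the client side as follows. This is an illustration under assumptions, not the repo's client: `build_gzip_compress_request` and `decode_response_body` are hypothetical helpers, and only the header names and the `input` and `tokenizer_model` fields come from this README.

```python
import gzip
import json
import urllib.request
import uuid


def build_gzip_compress_request(base_url, payload, idempotency_key=None):
    """Build a gzipped JSON POST to /v1/compress (no network I/O here).

    Both the standard and the X-Touchdown-Content-Encoding headers are set,
    in case an ingress strips the standard one.
    """
    body = gzip.compress(json.dumps(payload).encode("utf-8"))
    headers = {
        "Content-Type": "application/json",
        "Content-Encoding": "gzip",
        "X-Touchdown-Content-Encoding": "gzip",
        "Accept-Encoding": "gzip",
        # Reuse the same key when retrying the same payload.
        "Idempotency-Key": idempotency_key or str(uuid.uuid4()),
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/compress",
        data=body,
        headers=headers,
        method="POST",
    )


def decode_response_body(raw, content_encoding):
    """Decode a JSON response body, gunzipping it first when needed."""
    if (content_encoding or "").lower() == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


# Example payload using field names mentioned in this README.
example_payload = {
    "input": "Some long prompt text.",
    "tokenizer_model": "Qwen/Qwen2.5-7B-Instruct",
}
```

When sending the request with `urllib.request.urlopen`, pass `response.read()` and `response.headers.get("Content-Encoding")` to `decode_response_body`. Retrying with the same key and an identical payload should replay the first response, while the same key with a different payload is expected to return HTTP 409.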
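
To illustrate the chunked windows mentioned above, here is a minimal sketch of how token positions might be split using `max_length` and `stride`. It assumes `stride` means the number of tokens shared by consecutive windows, as in Hugging Face tokenizers; the manifest may define it differently, and the function name `chunk_windows` is hypothetical. Special tokens and the merging of per-window scores are left out.

```python
def chunk_windows(num_tokens, max_length, stride):
    """Return (start, end) token index pairs that cover [0, num_tokens).

    Assumption: `stride` is the overlap between consecutive windows, so each
    window starts `max_length - stride` tokens after the previous one.
    """
    if max_length <= 0:
        raise ValueError("max_length must be positive")
    if not 0 <= stride < max_length:
        raise ValueError("stride must satisfy 0 <= stride < max_length")

    step = max_length - stride
    windows = []
    start = 0
    while start < num_tokens:
        end = min(start + max_length, num_tokens)
        windows.append((start, end))
        if end == num_tokens:
            break  # the last window already reaches the end of the input
        start += step
    return windows


print(chunk_windows(10, 4, 1))  # [(0, 4), (3, 7), (6, 10)]
```

Tokens that fall in an overlap are scored by more than one window, so a real implementation also needs a rule for combining those scores before comparing against `drop_score_threshold`; that rule is not specified here.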

	Deploy:

	```bash
	hf auth login
	./deploy.sh <namespace>/touchdown-compression-classifier
	```

	Free CPU Spaces are enough for this scaffold; production traffic should move to
	paid or owned infrastructure after validation.