Spaces:

wchen22
/

touchdown-compression-classifier

Sleeping

wchen22 commited on 20 days ago

Commit

2dab644

verified ·

1 Parent(s): b6d8182

Cache tokenizer in compression Space

Files changed (2) hide show

README.md CHANGED Viewed

@@ -22,6 +22,15 @@ Endpoints:
 - `POST /v1/compress`
 - `POST /v1/classify`
 Deploy:
 ```bash
@@ -29,6 +38,5 @@ hf auth login
 ./deploy.sh <namespace>/touchdown-compression-classifier
 ```
-The current repo machine must be logged into Hugging Face before this can be
-hosted. Free CPU Spaces are enough for this scaffold; production traffic should
-move to paid or owned infrastructure after validation.

 - `POST /v1/compress`
 - `POST /v1/classify`
+Live Space:
+- `https://wchen22-touchdown-compression-classifier.hf.space`
+- Verified 2026-06-11 on free `cpu-basic`: `/health`, `/v1/classify`, and
+  `/v1/compress` returned 200.
+- `/v1/classify` is tokenizer/fallback KEEP-only until a trained KEEP/DROP head
+  is mounted. `/v1/compress` is rules-first deletion-only compression with
+  safety receipts.
 Deploy:
 ```bash
 ./deploy.sh <namespace>/touchdown-compression-classifier
 ```
+Free CPU Spaces are enough for this scaffold; production traffic should move to
+paid or owned infrastructure after validation.

app.py CHANGED Viewed

@@ -2,6 +2,7 @@ from __future__ import annotations
 import re
 import time
 from typing import Any
 from fastapi import FastAPI, HTTPException
@@ -23,6 +24,13 @@ LOW_SIGNAL_PATTERNS = [
 app = FastAPI(title="Touchdown Compression Classifier", version="0.1.0")
 def _find_spans(text: str, needle: str) -> list[tuple[int, int]]:
     spans = []
     cursor = 0
@@ -239,9 +247,7 @@ def _compress_text(payload: dict[str, Any]) -> dict[str, Any]:
 def _tokens(text: str) -> list[dict[str, Any]]:
     started = time.perf_counter()
     try:
-        from transformers import AutoTokenizer
-        tokenizer = AutoTokenizer.from_pretrained(CLASSIFIER_MODEL)
         encoded = tokenizer(
             text,
             add_special_tokens=False,

 import re
 import time
+from functools import lru_cache
 from typing import Any
 from fastapi import FastAPI, HTTPException
 app = FastAPI(title="Touchdown Compression Classifier", version="0.1.0")
+@lru_cache(maxsize=1)
+def _get_tokenizer():
+    from transformers import AutoTokenizer
+    return AutoTokenizer.from_pretrained(CLASSIFIER_MODEL)
 def _find_spans(text: str, needle: str) -> list[tuple[int, int]]:
     spans = []
     cursor = 0
 def _tokens(text: str) -> list[dict[str, Any]]:
     started = time.perf_counter()
     try:
+        tokenizer = _get_tokenizer()
         encoded = tokenizer(
             text,
             add_special_tokens=False,