---
title: Classifier General
emoji: 🌍
colorFrom: gray
colorTo: purple
sdk: docker
pinned: false
license: mit
short_description: classifier-general
---
# Classifier General API (Refactored)
Refactored into a modular FastAPI backend with clear layers:
- `app/routers`
- `app/services`
- `app/pipelines`
- `app/schemas`
- `app/models`
- `app/core`
## Preserved Endpoint Contract
- `POST /api/classifier` -> returns label string
- `POST /api/language` -> returns language string
- `POST /api/transformer` -> returns `{ filename, content }`
- `POST /classify` -> returns `{ label, language, type? }`
- `POST /configlabel` -> returns labels array
- `GET /labels` -> returns labels array
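As a client-side illustration of the `/classify` response shape listed above, here is a minimal sketch that parses the JSON body into a typed object. The `ClassifyResult` class and `parse_classify_response` helper are hypothetical names, and treating all three fields as strings is an assumption; only the field names and the optional `type` come from the contract.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional


@dataclass(frozen=True)
class ClassifyResult:
    """Hypothetical client-side view of the `/classify` response."""
    label: str
    language: str
    type: Optional[str] = None  # optional per the contract (`type?`)


def parse_classify_response(payload: Mapping[str, Any]) -> ClassifyResult:
    """Build a ClassifyResult from a decoded JSON body.

    Raises KeyError if a required field (`label`, `language`) is missing.
    """
    return ClassifyResult(
        label=payload["label"],
        language=payload["language"],
        type=payload.get("type"),
    )


# Illustrative values only, not output captured from the service.
print(parse_classify_response({"label": "invoice", "language": "en"}))
```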
`POST /configlabel` exact payload:
- body accepts `{"labels":["label1","label2","label3"]}`
- each label is trimmed, and empty values are removed
- duplicates are kept as provided
- returns the stored `string[]` labels
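The normalization rules above can be sketched in a few lines. This is an illustration of the stated behavior, not the service's actual code; `normalize_labels` is a hypothetical name.

```python
from typing import Iterable, List


def normalize_labels(labels: Iterable[str]) -> List[str]:
    """Trim each label and drop empty results.

    Duplicates are kept and the input order is preserved.
    """
    trimmed = (label.strip() for label in labels)
    return [label for label in trimmed if label]


payload = {"labels": [" invoice ", "", "contract", "invoice", "   "]}
print(normalize_labels(payload["labels"]))  # ['invoice', 'contract', 'invoice']
```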
Additional operational endpoints:
- `GET /health/liveness`
- `GET /health/readiness`
- `GET /endpoint/`
## Environment
Copy and edit:
```bash
cp .env.example .env
```
Key vars:
- `CLASSIFIER_MODEL`
- `ENABLE_MODEL_QUANTIZATION`
- `HUGGINGFACE_TOKEN`
- `CLASSIFIER_ENTAILMENT_LABEL_ID` (optional override for when the model config has no entailment label name)
- `DEFAULT_LABELS_CSV`
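One way these variables could be read at startup is sketched below. The variable names come from the list above; everything else is an assumption made for illustration: the `Settings` class, the `load_settings` helper, the set of strings treated as "true", the integer type of the entailment ID, and the comma as the separator in `DEFAULT_LABELS_CSV`.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

# Assumed spellings of "true"; the real service may accept a different set.
_TRUTHY = {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    classifier_model: str
    enable_model_quantization: bool
    huggingface_token: Optional[str]
    entailment_label_id: Optional[int]
    default_labels: List[str]


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the key variables from a mapping (the process environment by default)."""
    raw_id = env.get("CLASSIFIER_ENTAILMENT_LABEL_ID", "").strip()
    labels_csv = env.get("DEFAULT_LABELS_CSV", "")
    quantize = env.get("ENABLE_MODEL_QUANTIZATION", "").strip().lower()
    return Settings(
        classifier_model=env.get("CLASSIFIER_MODEL", "").strip(),
        enable_model_quantization=quantize in _TRUTHY,
        huggingface_token=env.get("HUGGINGFACE_TOKEN") or None,
        # Unset or blank means "no override".
        entailment_label_id=int(raw_id) if raw_id else None,
        # Split on commas, trim each part, and skip empty parts.
        default_labels=[part.strip() for part in labels_csv.split(",") if part.strip()],
    )
```

Passing a plain dictionary instead of `os.environ` keeps the parser easy to exercise in tests without touching the real environment.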
## Local Run
```bash
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 4002 --reload
```
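Once the server is up, the health endpoints can be probed from Python using only the standard library. This is a minimal sketch: the `probe` helper is hypothetical, and treating HTTP 200 as "healthy" is an assumption, since the response bodies of these endpoints are not documented here.

```python
import urllib.error
import urllib.request


def probe(base_url: str, path: str, timeout: float = 5.0) -> bool:
    """Return True if GET base_url + path answers with HTTP 200, else False."""
    url = base_url.rstrip("/") + path
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a non-2xx status.
        return False


if __name__ == "__main__":
    # Port 4002 matches the uvicorn command above.
    for path in ("/health/liveness", "/health/readiness"):
        state = "ok" if probe("http://localhost:4002", path) else "unavailable"
        print(path, state)
```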
## Docker Run
```bash
docker compose up --build
```
## Tests
```bash
pytest -q
```
## Notes
- OCR requires `tesseract-ocr` (installed in Dockerfile).
- Supported extraction formats in this refactor: `.pdf`, `.docx`, `.xlsx`, image formats, and plain text files.
- The classifier model is loaded directly from the Hugging Face Hub and performs zero-shot classification against the labels configured at runtime.
- Language detection runs locally via `langdetect` (no remote language endpoint dependency).
- `/classify` uses only the first PDF page for classification; `/api/transformer` still extracts full content.
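To make the zero-shot note and the `CLASSIFIER_ENTAILMENT_LABEL_ID` override more concrete, the sketch below shows a common way NLI-based zero-shot classification turns per-label logits into scores: pick the entailment logit for each candidate label, then apply a softmax across labels. This describes the general technique, not necessarily this service's internals; `resolve_entailment_id` and `score_labels` are hypothetical names, and the logits in the example are made up.

```python
import math
from typing import List, Mapping, Optional, Sequence, Tuple


def resolve_entailment_id(label2id: Mapping[str, int], override: Optional[int] = None) -> int:
    """Find the entailment class index, preferring an explicit override."""
    if override is not None:
        return override
    for name, index in label2id.items():
        if name.lower().startswith("entail"):
            return index
    raise ValueError("no entailment label in model config; set an override")


def score_labels(
    labels: Sequence[str],
    logits_per_label: Sequence[Sequence[float]],
    entailment_id: int,
) -> List[Tuple[str, float]]:
    """Softmax over the entailment logits, one row of NLI logits per label."""
    entail = [row[entailment_id] for row in logits_per_label]
    peak = max(entail)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in entail]
    total = sum(exps)
    # A list of pairs (not a dict) so duplicate labels are not collapsed.
    return [(label, e / total) for label, e in zip(labels, exps)]


# Made-up logits in (contradiction, neutral, entailment) order.
entailment_id = resolve_entailment_id({"CONTRADICTION": 0, "NEUTRAL": 1, "ENTAILMENT": 2})
scores = score_labels(["invoice", "contract"], [[0.1, 0.2, 3.0], [2.0, 0.5, -1.0]], entailment_id)
print(max(scores, key=lambda pair: pair[1])[0])  # invoice
```

When a model's config uses generic names such as `LABEL_0`, the name lookup fails, which is the situation an explicit override is meant to cover.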