---
title: Classifier General
emoji: ๐ŸŒ
colorFrom: gray
colorTo: purple
sdk: docker
pinned: false
license: mit
short_description: classifier-general
---
# Classifier General API (Refactored)
The backend has been refactored into a modular FastAPI application with these layers:
- `app/routers`
- `app/services`
- `app/pipelines`
- `app/schemas`
- `app/models`
- `app/core`
## Preserved Endpoint Contract
- `POST /api/classifier` -> returns the predicted label as a string
- `POST /api/language` -> returns the detected language as a string
- `POST /api/transformer` -> returns `{ filename, content }` with the extracted content
- `POST /classify` -> returns `{ label, language, type? }` (`type` may be omitted)
- `POST /configlabel` -> returns the stored labels array
- `GET /labels` -> returns the currently configured labels array
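
The two label endpoints have a documented JSON contract, so they can be called from a small standard-library client. This is a minimal sketch, assuming the server is reachable at `http://localhost:4002` (the port used in the Local Run section); the helper names `configlabel_request`, `labels_request`, and `send_json` are hypothetical and not part of this repository. Request formats for the other endpoints are not documented here, so they are left out. A typical call would be `send_json(configlabel_request(["invoice", "contract"]))`, which should return the stored labels array.

```python
import json
import urllib.request

# Assumption: local server started with the "Local Run" command in this README.
BASE_URL = "http://localhost:4002"


def configlabel_request(labels: list, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST /configlabel request with a {"labels": [...]} JSON body."""
    body = json.dumps({"labels": labels}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/configlabel",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def labels_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET /labels request."""
    return urllib.request.Request(f"{base_url}/labels", method="GET")


def send_json(req: urllib.request.Request, timeout: float = 30.0):
    """Send a request and decode its JSON response (a labels array for both endpoints)."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```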

`POST /configlabel` payload and behavior:
- The request body has the form `{"labels":["label1","label2","label3"]}`.
- Each label is trimmed, and values that are empty after trimming are removed.
- Duplicates are kept as provided.
- The response is the stored labels as a `string[]`.
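
The normalization rules above (trim, drop empties, keep order and duplicates) fit in a few lines. This is a minimal sketch of those rules as stated, not the code in `app/services`; the function name `normalize_labels` is hypothetical. For example, `normalize_labels(["  invoice ", "", "contract", "invoice"])` gives `["invoice", "contract", "invoice"]`.

```python
from typing import List


def normalize_labels(raw_labels: List[str]) -> List[str]:
    """Trim each label and drop empty values, preserving order and duplicates."""
    cleaned: List[str] = []
    for label in raw_labels:
        trimmed = label.strip()
        if trimmed:  # skip values that are empty once whitespace is removed
            cleaned.append(trimmed)
    return cleaned
```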

Additional operational endpoints:
- `GET /health/liveness`
- `GET /health/readiness`
- `GET /endpoint/`
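
A readiness probe is useful when scripting against the service, for example waiting for the model to load after `docker compose up`. The sketch below polls a URL until it answers HTTP 200; it assumes a 200 status means "ready", since this README does not document the response body. The function name `wait_until_ready` is hypothetical.

```python
import http.client
import time
import urllib.request


def wait_until_ready(url: str, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds elapse.

    Assumption: a 200 response means the service is ready; the body is ignored.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection refused, timeouts, and non-2xx responses
            # (urllib's HTTPError is an OSError subclass) all mean "not yet".
            pass
        time.sleep(interval)
    return False
```

Usage: `wait_until_ready("http://localhost:4002/health/readiness")` returns `True` once the endpoint answers 200, or `False` after 60 seconds.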
## Environment
Copy the example environment file, then edit it:
```bash
cp .env.example .env
```
Key variables:
- `CLASSIFIER_MODEL`
- `ENABLE_MODEL_QUANTIZATION`
- `HUGGINGFACE_TOKEN`
- `CLASSIFIER_ENTAILMENT_LABEL_ID` (optional; sets the entailment label ID when the model config does not name an entailment label)
- `DEFAULT_LABELS_CSV`
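
The sketch below shows one way these variables could be read into typed settings. It is a minimal illustration under stated assumptions, not the code in `app/core`: it assumes `DEFAULT_LABELS_CSV` is a comma-separated list (as its name suggests), that `ENABLE_MODEL_QUANTIZATION` accepts common truthy strings, and that `CLASSIFIER_ENTAILMENT_LABEL_ID` is an integer. `Settings` and `load_settings` are hypothetical names, and no default model is assumed.

```python
import os
from dataclasses import dataclass, field
from typing import List, Mapping, Optional


@dataclass
class Settings:
    classifier_model: Optional[str]
    enable_model_quantization: bool
    huggingface_token: Optional[str]
    entailment_label_id: Optional[int]
    default_labels: List[str] = field(default_factory=list)


def _parse_bool(value: Optional[str]) -> bool:
    # Assumption: "1", "true", "yes", "on" (any case) enable the flag.
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the README's key variables from a mapping (defaults to the process env)."""
    raw_id = (env.get("CLASSIFIER_ENTAILMENT_LABEL_ID") or "").strip()
    raw_labels = env.get("DEFAULT_LABELS_CSV") or ""
    return Settings(
        classifier_model=env.get("CLASSIFIER_MODEL"),
        enable_model_quantization=_parse_bool(env.get("ENABLE_MODEL_QUANTIZATION")),
        huggingface_token=env.get("HUGGINGFACE_TOKEN") or None,
        entailment_label_id=int(raw_id) if raw_id else None,  # unset -> rely on model config
        # Split on commas, trim, and drop empty entries.
        default_labels=[s.strip() for s in raw_labels.split(",") if s.strip()],
    )
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test.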
## Local Run
```bash
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 4002 --reload
```
## Docker Run
```bash
docker compose up --build
```
## Tests
```bash
pytest -q
```
## Notes
- OCR requires `tesseract-ocr` (installed by the Dockerfile).
- Supported extraction formats in this refactor: `.pdf`, `.docx`, `.xlsx`, image formats, and plain text files.
- The classifier model is loaded directly from the Hugging Face Hub and performs zero-shot classification against the labels configured at runtime.
- Language detection runs locally via `langdetect`, with no dependency on a remote language endpoint.
- For PDFs, `/classify` uses only the first page for classification, while `/api/transformer` still extracts the full content.
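
NLI-based zero-shot classification scores each runtime label by pairing the document text with a hypothesis built from the label (Hugging Face's zero-shot pipeline defaults to the template `"This example is {}."`) and reading the model's entailment logit. The sketch below shows the two pure-Python steps around the model call: finding the entailment class index (matching a label name that starts with "entail", as the Hugging Face pipeline does, and otherwise falling back to an override such as `CLASSIFIER_ENTAILMENT_LABEL_ID`), then applying a softmax over the entailment logits across labels to pick one. It is an illustration of the general technique, not this project's pipeline code; `resolve_entailment_id` and `pick_label` are hypothetical names, and the logits are assumed to come from an NLI model.

```python
import math
from typing import Mapping, Optional, Sequence, Tuple


def resolve_entailment_id(id2label: Mapping[int, str], override: Optional[int] = None) -> int:
    """Return the entailment class index from a model config's id2label mapping.

    The override is used only when no label name starts with "entail".
    """
    for idx, name in id2label.items():
        if name.lower().startswith("entail"):
            return int(idx)
    if override is not None:
        return override
    raise ValueError("no entailment label in model config and no override set")


def pick_label(
    labels: Sequence[str],
    nli_logits: Sequence[Sequence[float]],
    entailment_id: int,
) -> Tuple[str, float]:
    """Softmax the entailment logit across candidate labels; return (best label, prob).

    nli_logits[i] holds the NLI logits for (document text, hypothesis for labels[i]).
    """
    scores = [row[entailment_id] for row in nli_logits]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(labels)), key=probs.__getitem__)
    return labels[best], probs[best]
```

Because the softmax runs across labels, the probabilities always sum to 1 and exactly one label is returned, which matches the single label string returned by `POST /api/classifier`.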