---
title: Classifier General
emoji: 🌍
colorFrom: gray
colorTo: purple
sdk: docker
pinned: false
license: mit
short_description: classifier-general
---
# Classifier General API (Refactored)
Refactored into a modular FastAPI backend with clear layers:

- `app/routers`
- `app/services`
- `app/pipelines`
- `app/schemas`
- `app/models`
- `app/core`
## Preserved Endpoint Contract

- `POST /api/classifier` -> returns label string
- `POST /api/language` -> returns language string
- `POST /api/transformer` -> returns `{ filename, content }`
- `POST /classify` -> returns `{ label, language, type? }`
- `POST /configlabel` -> returns labels array
- `GET /labels` -> returns labels array
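For example, with the service running locally on port 4002, the endpoints above can be exercised with `curl` (the multipart field name `file` and the file name `invoice.pdf` are illustrative assumptions):

```bash
# Classify an uploaded document (returns { label, language, type? })
curl -X POST http://localhost:4002/classify \
  -F "file=@invoice.pdf"

# Replace the runtime label set
curl -X POST http://localhost:4002/configlabel \
  -H "Content-Type: application/json" \
  -d '{"labels": ["invoice", "contract", "report"]}'

# Read the stored labels
curl http://localhost:4002/labels
```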
`POST /configlabel` exact payload:

- body accepts `{"labels": ["label1", "label2", "label3"]}`
- all resulting labels are trimmed, empty values removed
- duplicates are kept if they are provided
- returns the stored `string[]` labels
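The trimming rules above can be sketched in plain Python. This is a hedged illustration of the described behavior, not the actual service code; the function name `normalize_labels` is hypothetical:

```python
def normalize_labels(labels):
    """Trim whitespace, drop empty entries, keep duplicates and order."""
    trimmed = (label.strip() for label in labels)
    return [label for label in trimmed if label]

# Duplicates survive; blank and whitespace-only entries do not.
print(normalize_labels([" label1 ", "", "label2", "label2", "   "]))
# -> ['label1', 'label2', 'label2']
```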
Additional operational endpoints:

- `GET /health/liveness`
- `GET /health/readiness`
- `GET /endpoint/`
## Environment

Copy and edit:

```bash
cp .env.example .env
```
Key vars:

- `CLASSIFIER_MODEL`
- `ENABLE_MODEL_QUANTIZATION`
- `HUGGINGFACE_TOKEN`
- `CLASSIFIER_ENTAILMENT_LABEL_ID` (optional override when the model config has no entailment label name)
- `DEFAULT_LABELS_CSV`
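A plausible `.env` might look like this; the values are illustrative placeholders, not defaults from this repository:

```bash
CLASSIFIER_MODEL=facebook/bart-large-mnli
ENABLE_MODEL_QUANTIZATION=false
HUGGINGFACE_TOKEN=hf_xxx
CLASSIFIER_ENTAILMENT_LABEL_ID=2
DEFAULT_LABELS_CSV=invoice,contract,report
```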
## Local Run

```bash
pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 4002 --reload
```
## Docker Run

```bash
docker compose up --build
```
## Tests

```bash
pytest -q
```
## Notes

- OCR requires `tesseract-ocr` (installed in the Dockerfile).
- Supported extraction formats in this refactor: `.pdf`, `.docx`, `.xlsx`, image formats, and plain text files.
- The classifier model is loaded directly from the Hugging Face Hub and runs true zero-shot classification over runtime labels.
- Language detection runs locally via `langdetect` (no remote language endpoint dependency).
- `/classify` uses only the first PDF page for classification; `/api/transformer` still extracts full content.