metadata
title: Classifier General
emoji: 🌍
colorFrom: gray
colorTo: purple
sdk: docker
pinned: false
license: mit
short_description: classifier-general

Classifier General API (Refactored)

The service has been refactored into a modular FastAPI backend, organized into these layers:

  • app/routers
  • app/services
  • app/pipelines
  • app/schemas
  • app/models
  • app/core
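A rough sketch of how these layers might depend on one another. It assumes that routers stay thin and delegate to services, and that services call pipelines. The names `ZeroShotPipeline`, `ClassifierService`, `classify_route` and `FirstLabelPipeline` are hypothetical and do not come from the codebase. FastAPI wiring is omitted so the sketch runs with the standard library alone.

```python
from dataclasses import dataclass
from typing import Protocol


class ZeroShotPipeline(Protocol):
    """Pipeline layer (hypothetical interface): wraps the model call."""

    def predict(self, text: str, labels: list[str]) -> str: ...


@dataclass
class ClassifierService:
    """Service layer: owns business rules, e.g. which labels are active."""

    pipeline: ZeroShotPipeline
    labels: list[str]

    def classify(self, text: str) -> str:
        if not self.labels:
            raise ValueError("no labels configured")
        return self.pipeline.predict(text, self.labels)


def classify_route(service: ClassifierService, text: str) -> str:
    """Router layer: in a FastAPI app this would be the request handler;
    here it only forwards to the service and returns its result."""
    return service.classify(text)


class FirstLabelPipeline:
    """Stand-in pipeline for the usage example: always picks the first label."""

    def predict(self, text: str, labels: list[str]) -> str:
        return labels[0]


service = ClassifierService(pipeline=FirstLabelPipeline(), labels=["invoice", "contract"])
print(classify_route(service, "Total due: 42 EUR"))  # invoice
```

Keeping the model behind a small interface like this lets the service and router layers be unit-tested with a fake pipeline, without downloading a model.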

Preserved Endpoint Contract

  • POST /api/classifier -> returns label string
  • POST /api/language -> returns language string
  • POST /api/transformer -> returns { filename, content }
  • POST /classify -> returns { label, language, type? }
  • POST /configlabel -> returns labels array
  • GET /labels -> returns labels array
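The response shapes listed above can be checked on the client side with a few small validators. This is a minimal sketch: it assumes `filename` and `content` are strings and that `type`, when present, is also a string. The README does not state either type, so adjust these checks if the service returns something else.

```python
from typing import Any


def is_label_list(payload: Any) -> bool:
    """POST /configlabel and GET /labels: a JSON array of strings."""
    return isinstance(payload, list) and all(isinstance(x, str) for x in payload)


def is_transformer_response(payload: Any) -> bool:
    """POST /api/transformer: an object with filename and content."""
    return (
        isinstance(payload, dict)
        and isinstance(payload.get("filename"), str)
        and isinstance(payload.get("content"), str)
    )


def is_classify_response(payload: Any) -> bool:
    """POST /classify: label and language are required, type is optional."""
    if not isinstance(payload, dict):
        return False
    if not isinstance(payload.get("label"), str):
        return False
    if not isinstance(payload.get("language"), str):
        return False
    # "type" may be absent; when present it is assumed to be a string.
    return "type" not in payload or isinstance(payload["type"], str)
```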

POST /configlabel request and behavior:

  • the body accepts {"labels":["label1","label2","label3"]}
  • each label is trimmed of leading and trailing whitespace, and labels that are empty after trimming are removed
  • duplicates are kept as provided
  • the response is the stored string[] of labels
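The normalization rules for /configlabel can be expressed as a small pure function, together with the request body it expects. This is a sketch of the documented behavior, not the service's own code. The helper names `normalize_labels` and `configlabel_body` are hypothetical, and preserving the input order is an assumption the README does not state.

```python
import json


def normalize_labels(raw: list[str]) -> list[str]:
    """Trim each label and drop empty results; duplicates and order are kept."""
    stripped = (label.strip() for label in raw)
    return [label for label in stripped if label]


def configlabel_body(labels: list[str]) -> bytes:
    """Encode the documented request body {"labels": [...]} as UTF-8 JSON."""
    return json.dumps({"labels": labels}).encode("utf-8")


print(normalize_labels(["  label1", "", "label2 ", "   ", "label2"]))
# ['label1', 'label2', 'label2']
```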

Additional operational endpoints:

  • GET /health/liveness
  • GET /health/readiness
  • GET /endpoint/
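A deployment script or container health check might poll the readiness endpoint until the model has loaded. This minimal sketch uses only the standard library. It assumes `/health/readiness` answers with a 2xx status once the service is ready and with an error status (or no connection) before that. The README does not document the response bodies. `http_ok` and `wait_until_ready` are hypothetical helpers.

```python
import time
import urllib.request
from typing import Callable


def http_ok(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on url answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except OSError:
        # URLError, HTTPError (non-2xx) and socket timeouts all derive from OSError.
        return False


def wait_until_ready(
    check: Callable[[], bool],
    attempts: int = 30,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() until it returns True or the attempts run out."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False


# Usage, assuming the local run on port 4002:
# wait_until_ready(lambda: http_ok("http://localhost:4002/health/readiness"))
```

Passing the check and the sleep function in as parameters keeps the polling loop testable without a running server.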

Environment

Copy the example environment file, then edit it:

cp .env.example .env

Key vars:

  • CLASSIFIER_MODEL
  • ENABLE_MODEL_QUANTIZATION
  • HUGGINGFACE_TOKEN
  • CLASSIFIER_ENTAILMENT_LABEL_ID (optional; sets the entailment class index when the model config does not name an entailment label)
  • DEFAULT_LABELS_CSV
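One way these variables might be read into a typed settings object is shown below. It is a sketch under stated assumptions, not the project's `app/core` code. The accepted boolean spellings, the comma splitting of `DEFAULT_LABELS_CSV`, and treating an empty token as unset are all assumed conventions. `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    classifier_model: Optional[str]
    enable_quantization: bool
    huggingface_token: Optional[str]
    entailment_label_id: Optional[int]
    default_labels: list[str]


def _as_bool(value: Optional[str]) -> bool:
    # Assumed convention: "1", "true", "yes" or "on" (any case) mean enabled.
    return (value or "").strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    label_id = (env.get("CLASSIFIER_ENTAILMENT_LABEL_ID") or "").strip()
    csv = env.get("DEFAULT_LABELS_CSV", "")
    return Settings(
        classifier_model=env.get("CLASSIFIER_MODEL"),
        enable_quantization=_as_bool(env.get("ENABLE_MODEL_QUANTIZATION")),
        # An empty token is treated as "not set".
        huggingface_token=env.get("HUGGINGFACE_TOKEN") or None,
        # Only used when the model config has no entailment label name.
        entailment_label_id=int(label_id) if label_id else None,
        # Split on commas, trim, and drop empty entries.
        default_labels=[part.strip() for part in csv.split(",") if part.strip()],
    )
```

Accepting any mapping instead of reading `os.environ` directly makes the parsing easy to test with a plain dict.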

Local Run

pip install -r requirements.txt
uvicorn main:app --host 0.0.0.0 --port 4002 --reload

Docker Run

docker compose up --build

Tests

pytest -q

Notes

  • OCR requires tesseract-ocr (installed in Dockerfile).
  • Supported extraction formats in this refactor: .pdf, .docx, .xlsx, image formats, and plain text files.
  • The classifier model is loaded directly from the Hugging Face Hub and runs true zero-shot classification over the labels configured at runtime.
  • Language detection runs locally via langdetect (no remote language endpoint dependency).
  • /classify uses only the first page of a PDF for classification, while /api/transformer still extracts the full content.
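Zero-shot classification over runtime labels is commonly done with a natural-language-inference (NLI) model. Each label is turned into a hypothesis, the document text serves as the premise, and the entailment logits are compared across labels. The sketch below follows that scheme, which is also the single-label mode of the Hugging Face `zero-shot-classification` pipeline. It assumes, without confirmation, that this service does the same. The hypothesis template and the `nli` callable signature are assumptions, and `fake_nli` is a stand-in for a real model. The entailment index is looked up by label name first, with an override as the fallback. That fallback is the role `CLASSIFIER_ENTAILMENT_LABEL_ID` plays.

```python
import math
from typing import Callable, Mapping, Optional, Sequence

# Hypothetical signature: raw NLI logits for (premise, hypothesis).
NliFn = Callable[[str, str], Sequence[float]]


def resolve_entailment_id(id2label: Mapping[int, str], override: Optional[int] = None) -> int:
    """Find the entailment class index in the model config, else use the override."""
    for idx, name in id2label.items():
        if name.lower().startswith("entail"):
            return idx
    if override is None:
        raise ValueError("model config has no entailment label; set an override")
    return override


def zero_shot(
    text: str,
    labels: Sequence[str],
    nli: NliFn,
    entailment_id: int,
    template: str = "This example is {}.",
) -> list[tuple[str, float]]:
    """Rank labels by the softmax of their entailment logits (single-label mode)."""
    if not labels:
        raise ValueError("at least one label is required")
    logits = [nli(text, template.format(label))[entailment_id] for label in labels]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    # A list (not a dict) keeps duplicate labels, which /configlabel allows.
    ranked = [(label, e / total) for label, e in zip(labels, exps)]
    ranked.sort(key=lambda pair: pair[1], reverse=True)
    return ranked


def fake_nli(premise: str, hypothesis: str) -> list[float]:
    """Stand-in model returning [contradiction, neutral, entailment] logits."""
    hit = "invoice" in hypothesis and "total" in premise.lower()
    return [0.0, 0.0, 2.0 if hit else 0.0]


ent = resolve_entailment_id({0: "CONTRADICTION", 1: "NEUTRAL", 2: "ENTAILMENT"})
print(zero_shot("Total due: 42 EUR", ["invoice", "contract"], fake_nli, ent)[0][0])  # invoice
```

Because the scores are normalized across the candidate labels, changing the label set through /configlabel changes the ranking without any retraining.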