pdfsys-core

Shared data contracts for the pdfsys pipeline. Every other package depends on this one.

What's in here

  • Enums: RegionType (TEXT / IMAGE / TABLE / FORMULA), Backend (MUPDF / PIPELINE / VLM / DEFERRED).
  • PdfRecord: Frozen dataclass for per-PDF metadata (sha256, source_uri, size, provenance).
  • Layout schema: BBox (coordinates normalized to [0, 1]), LayoutRegion, LayoutPage, LayoutDocument. Together these form the contract between layout-analyser and every parser backend.
  • ExtractedDoc / Segment: Backend-agnostic output schema. All three parser backends emit these.
  • LayoutCache: Content-addressable on-disk cache for LayoutDocuments, keyed by sha256 + model_tag.
  • PdfsysConfig: Hierarchical configuration (paths, router, layout, per-backend settings, runtime).
  • Serde: Generic to_dict() / from_dict() for all the above dataclasses.
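As a rough illustration of how these pieces could fit together, here is a minimal stdlib-only sketch. It is not the package's actual code: the field names (x0, y0, x1, y1, region_type, bbox), the enum values, the hypothetical helpers region_from_dict and cache_path, and the on-disk cache layout are all assumptions made for the example. The real Serde helpers are described as generic; the from-dict side here is written for a single type to keep the sketch short.

```python
from __future__ import annotations

import dataclasses
import hashlib
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Any


class RegionType(Enum):
    # Member names come from the README; the string values are assumed.
    TEXT = "text"
    IMAGE = "image"
    TABLE = "table"
    FORMULA = "formula"


@dataclass(frozen=True)
class BBox:
    # Coordinates normalized to [0, 1]; field names are assumed.
    x0: float
    y0: float
    x1: float
    y1: float

    def __post_init__(self) -> None:
        ok_x = 0.0 <= self.x0 <= self.x1 <= 1.0
        ok_y = 0.0 <= self.y0 <= self.y1 <= 1.0
        if not (ok_x and ok_y):
            raise ValueError(f"BBox outside normalized range: {self}")


@dataclass(frozen=True)
class LayoutRegion:
    # Simplified: the real LayoutRegion may carry more fields.
    region_type: RegionType
    bbox: BBox


def to_dict(obj: Any) -> Any:
    """Recursively turn dataclasses and enums into JSON-friendly values."""
    if isinstance(obj, Enum):
        return obj.value
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        return {f.name: to_dict(getattr(obj, f.name)) for f in dataclasses.fields(obj)}
    if isinstance(obj, (list, tuple)):
        return [to_dict(item) for item in obj]
    return obj


def region_from_dict(data: dict[str, Any]) -> LayoutRegion:
    """Hypothetical inverse of to_dict() for LayoutRegion only."""
    return LayoutRegion(
        region_type=RegionType(data["region_type"]),
        bbox=BBox(**data["bbox"]),
    )


def cache_path(root: Path, sha256: str, model_tag: str) -> Path:
    """Assumed layout: <root>/<model_tag>/<first two hex chars>/<sha256>.json."""
    return root / model_tag / sha256[:2] / f"{sha256}.json"


# Usage: round-trip a region and derive a content-addressed cache location.
region = LayoutRegion(RegionType.TABLE, BBox(0.1, 0.2, 0.5, 0.6))
payload = to_dict(region)
restored = region_from_dict(payload)
digest = hashlib.sha256(b"example PDF bytes").hexdigest()
location = cache_path(Path("cache"), digest, "layout-v1")
```

Keying the path on both the content hash and the model tag means that a cached layout is reused only when the same PDF bytes were analysed by the same layout model, which matches the "sha256 + model_tag" description above.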

Key design rule

This package has zero external dependencies: it uses the Python standard library only. Do not add pymupdf, torch, or any other third-party package here. The types must be importable everywhere without pulling in heavy ML libraries.