---
title: README
emoji: 📚
colorFrom: blue
colorTo: indigo
sdk: static
pinned: false
short_description: Organization card for podscripter-project
---

# podscripter-project

This is the HuggingFace organization for [**podscripter**](https://github.com/algernon725/podscripter), a Dockerized local-first transcription tool built on OpenAI Whisper, pyannote.audio speaker diarization, and sentence-transformers punctuation restoration. Primary language focus: **English**, **Spanish**, **French**.

This org doesn't publish models: Whisper and pyannote live in their own upstream orgs. What lives here is the **supporting data** that the podscripter project owns and republishes under permissive licenses, primarily for testing and reproducibility.

## What's published here

### Datasets

- [`podscripter-project/test-fixtures`](https://huggingface.co/datasets/podscripter-project/test-fixtures): small, curated EN/ES/FR audio clips (CC-BY 4.0) used by podscripter's Tier 1 regression tests. Audio is sourced from permissively licensed public corpora (LibriSpeech, FLEURS, Common Voice, VoxPopuli, AMI, MLS) and trimmed/concatenated to exercise specific pipeline code paths (single-speaker ASR, multi-speaker diarization, chunked-mode transcription). Each clip ships with verbatim transcripts, speaker turns, source attribution, and per-fixture WER/DER thresholds.

## License posture

Everything published here is **permissively licensed** (CC-BY 4.0 or CC0 1.0). Aggregate licenses match the most restrictive component, typically CC-BY 4.0, which requires attribution and indication of changes when redistributed. Per-source attribution lives in each artifact's dataset card and (for the test-fixtures) in [`tests/fixtures/audio/LICENSES.md`](https://github.com/algernon725/podscripter/blob/main/tests/fixtures/audio/LICENSES.md) in the podscripter repo. NC/ND-licensed sources are deliberately excluded so artifacts here can be freely redistributed.

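To make the per-fixture WER thresholds mentioned under Datasets concrete, here is a minimal sketch of how a regression test might compare a transcript against its reference. It is not podscripter's actual test code: the function names, the text normalization (lowercasing and whitespace splitting), and the example threshold are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Normalization here (lowercase + whitespace split) is an assumption;
    a real test suite may normalize punctuation and numbers differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Row-by-row Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def within_threshold(reference: str, hypothesis: str, max_wer: float) -> bool:
    """Hypothetical helper: True when the WER does not exceed the threshold."""
    return word_error_rate(reference, hypothesis) <= max_wer
```

A fixture with a hypothetical threshold of 0.2 would pass if one word out of six were dropped, since that gives a WER of about 0.17.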
## Contributing

Issues, fixture proposals, and bug-reproduction clips all go through the [podscripter GitHub repo](https://github.com/algernon725/podscripter). The [contribution workflow for new audio fixtures](https://github.com/algernon725/podscripter/blob/main/tests/fixtures/audio/README.md#adding-a-new-fixture) covers trimming, licensing requirements, the `.expected.json` schema, and bumping `HF_REVISION` so the dataset and tests stay in lockstep.
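The idea of keeping the dataset and tests in lockstep via `HF_REVISION` can be sketched as follows. This is an illustration only: the value assigned to `HF_REVISION`, the `fixture_url` helper, and the example file name are hypothetical, and the podscripter repo may well fetch fixtures through a client library instead (for example `huggingface_hub.hf_hub_download` with its `revision` argument). The sketch relies only on the Hub's standard `resolve/<revision>/<path>` URL layout for dataset files.

```python
from urllib.parse import quote

# Placeholder value: in practice this would be a commit hash or tag of the
# dataset repo, bumped whenever the fixtures change.
HF_REVISION = "main"


def fixture_url(
    filename: str,
    repo_id: str = "podscripter-project/test-fixtures",
    revision: str = HF_REVISION,
) -> str:
    """Build a download URL for one dataset file at a pinned revision."""
    return (
        f"https://huggingface.co/datasets/{repo_id}"
        f"/resolve/{quote(revision, safe='')}/{quote(filename)}"
    )


if __name__ == "__main__":
    # "clips/example.wav" is a made-up path, not a real file in the dataset.
    print(fixture_url("clips/example.wav", revision="abc123"))
```

Pinning to an immutable commit hash rather than a branch name means a test run always sees the same audio and thresholds, even after the dataset gains new fixtures.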