| title: README | |
| emoji: π | |
| colorFrom: blue | |
| colorTo: indigo | |
| sdk: static | |
| pinned: false | |
| short_description: Organization card for podscripter-project | |
| # podscripter-project | |
| This is the HuggingFace organization for [**podscripter**](https://github.com/algernon725/podscripter), | |
| a Dockerized local-first transcription tool built on OpenAI Whisper, pyannote.audio | |
| speaker diarization, and sentence-transformers punctuation restoration. Primary language | |
| focus: **English**, **Spanish**, **French**. | |
| This org doesn't publish models; Whisper and pyannote live in their own upstream orgs. | |
| What lives here is the **supporting data** that the podscripter project owns and republishes | |
| under permissive licenses, primarily for testing and reproducibility. | |
| ## What's published here | |
| ### Datasets | |
| - [`podscripter-project/test-fixtures`](https://huggingface.co/datasets/podscripter-project/test-fixtures): | |
| small, curated EN/ES/FR audio clips (CC-BY 4.0) used by podscripter's Tier 1 regression | |
| tests. Audio is sourced from permissively licensed public corpora (LibriSpeech, FLEURS, MLS, AMI) | |
| and trimmed/concatenated to exercise specific pipeline code paths (single-speaker ASR, | |
| multi-speaker diarization, chunked-mode transcription). Each clip ships with verbatim transcripts, speaker turns, source | |
| attribution, and per-fixture WER/DER thresholds. | |
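A per-fixture WER threshold can be read as an upper bound on the word-level edit distance divided by the reference length. The following is a minimal sketch of such a check, not podscripter's actual test code: the function name `word_error_rate`, the lowercase/whitespace normalization, and the `0.20` threshold are assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical fixture: one substituted word out of six.
max_wer = 0.20  # illustrative threshold, not taken from the dataset
wer = word_error_rate("the cat sat on the mat", "the cat sat on a mat")
assert wer <= max_wer
```

Because WER divides by the reference length, a hypothesis with many inserted words can score above 1.0, so thresholds are not bounded by 100%.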
| ## License posture | |
| Everything published here is **permissively licensed** (CC-BY 4.0 or CC0 1.0). Aggregate | |
| licenses match the most restrictive component, typically CC-BY 4.0, which requires | |
| attribution and indication of changes when redistributed. Per-source attribution lives in | |
| each artifact's dataset card and (for the test-fixtures) in | |
| [`tests/fixtures/audio/LICENSES.md`](https://github.com/algernon725/podscripter/blob/main/tests/fixtures/audio/LICENSES.md) | |
| in the podscripter repo. | |
| NC/ND-licensed sources are deliberately excluded so artifacts here can be freely | |
| redistributed. | |
| ## Contributing | |
| Issues, fixture proposals, and bug-reproduction clips all go through the | |
| [podscripter GitHub repo](https://github.com/algernon725/podscripter). The | |
| [contribution workflow for new audio fixtures](https://github.com/algernon725/podscripter/blob/main/tests/fixtures/audio/README.md#adding-a-new-fixture) | |
| covers trimming, licensing requirements, the `.expected.json` schema, and bumping | |
| `HF_REVISION` so the dataset and tests stay in lockstep. |
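Pinning the dataset to an exact revision is what keeps the tests and fixtures in lockstep. Below is a minimal sketch of a pinned download using `huggingface_hub.snapshot_download`. The helper names `is_pinned_revision` and `fetch_fixtures` are hypothetical, and the requirement that the revision be a full 40-character commit hash is an assumption; how podscripter actually stores and consumes `HF_REVISION` is documented in the repo.

```python
import re
from typing import Optional

DATASET_REPO = "podscripter-project/test-fixtures"


def is_pinned_revision(revision: str) -> bool:
    """Return True if the revision looks like a full 40-character commit hash."""
    return re.fullmatch(r"[0-9a-f]{40}", revision) is not None


def fetch_fixtures(revision: str, local_dir: Optional[str] = None) -> str:
    """Download the fixtures at an exact revision and return the local path."""
    # Assumption: reject branch names such as "main", which can move
    # underneath the tests and silently change the fixtures.
    if not is_pinned_revision(revision):
        raise ValueError(f"expected a full commit hash, got {revision!r}")

    # Imported here so the validation above works without the dependency.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=DATASET_REPO,
        repo_type="dataset",
        revision=revision,
        local_dir=local_dir,
    )
```

Calling `fetch_fixtures` with a branch name raises `ValueError` before any network access, which turns an accidental unpinned reference into an immediate test failure.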