podscripter


Recent Activity

algernon725 updated a Space about 1 month ago

podscripter-project/README

algernon725 updated a dataset about 1 month ago

podscripter-project/test-fixtures

algernon725 published a Space about 2 months ago

podscripter-project/README


podscripter-project

This is the Hugging Face organization for podscripter, a Dockerized, local-first transcription tool built on OpenAI Whisper, pyannote.audio speaker diarization, and sentence-transformers punctuation restoration. Its primary language focus is English, Spanish, and French.

This org doesn't publish models — Whisper and pyannote live in their own upstream orgs. What lives here is the supporting data that the podscripter project owns and republishes under permissive licenses, primarily for testing and reproducibility.

What's published here

Datasets

podscripter-project/test-fixtures — small, curated EN/ES/FR audio clips (CC-BY 4.0) used by podscripter's Tier 1 regression tests. Audio is sourced from permissively licensed public corpora (LibriSpeech, FLEURS, MLS, AMI) and trimmed/concatenated to exercise specific pipeline code paths (single-speaker ASR, multi-speaker diarization, chunked-mode transcription). Each clip ships with verbatim transcripts, speaker turns, source attribution, and per-fixture WER/DER thresholds.
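To illustrate what a per-fixture WER threshold means in practice, here is a minimal sketch of a word error rate check. It is not podscripter's actual test code: the function names, the lowercase-and-split normalization, and the threshold value in the usage note are all assumptions made for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: text is normalized by lowercasing and whitespace splitting only;
    a real pipeline may also strip punctuation or normalize numerals.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def within_threshold(reference: str, hypothesis: str, max_wer: float) -> bool:
    """Hypothetical helper: True when the transcript meets the fixture's WER limit."""
    return word_error_rate(reference, hypothesis) <= max_wer
```

For example, `within_threshold("the cat sat down", "the cat sit down", 0.3)` passes, because one substituted word out of four gives a WER of 0.25.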

License posture

Everything published here is permissively licensed (CC-BY 4.0 or CC0 1.0). An aggregate artifact takes the license of its most restrictive component, typically CC-BY 4.0, which requires attribution and an indication of changes when redistributed. Per-source attribution lives in each artifact's dataset card and, for test-fixtures, in tests/fixtures/audio/LICENSES.md in the podscripter repo.

NC/ND-licensed sources are deliberately excluded so artifacts here can be freely redistributed.
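The licensing rule above can be sketched as a small function. This is only an illustration of the "most restrictive component wins" idea: the license identifiers, the ranking table, and the function name are assumptions, not part of the podscripter tooling.

```python
# Assumed ranking: a higher number means more restrictive. Only the two licenses
# named in this card are accepted; anything else (including NC/ND variants) is rejected.
RESTRICTIVENESS = {"CC0-1.0": 0, "CC-BY-4.0": 1}


def aggregate_license(component_licenses):
    """Return the license an aggregate artifact would carry under the stated rule."""
    licenses = list(component_licenses)
    if not licenses:
        raise ValueError("at least one component license is required")
    rejected = [name for name in licenses if name not in RESTRICTIVENESS]
    if rejected:
        raise ValueError(f"license not accepted for redistribution: {rejected}")
    # The most restrictive component determines the aggregate license.
    return max(licenses, key=RESTRICTIVENESS.__getitem__)
```

Mixing a CC0 1.0 clip with a CC-BY 4.0 clip yields `"CC-BY-4.0"`, while a component tagged `"CC-BY-NC-4.0"` raises an error instead of being silently included.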

Contributing

Issues, fixture proposals, and bug-reproduction clips all go through the podscripter GitHub repo. The contribution workflow for new audio fixtures covers trimming, licensing requirements, the .expected.json schema, and bumping HF_REVISION so the dataset and tests stay in lockstep.
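As a rough sketch of why a pinned revision keeps the dataset and tests in lockstep, the snippet below refuses floating references such as a branch name and downloads the dataset at an exact commit. It assumes HF_REVISION holds a full 40-character commit hash, which this card does not confirm; the helper names are hypothetical, and only `snapshot_download` from the `huggingface_hub` library is a real API.

```python
import re

DATASET_REPO = "podscripter-project/test-fixtures"

# Assumption: a pinned revision is a full 40-character lowercase hex commit hash.
_COMMIT_HASH = re.compile(r"[0-9a-f]{40}")


def is_pinned(revision: str) -> bool:
    """True only for an exact commit hash, not a branch name like "main"."""
    return _COMMIT_HASH.fullmatch(revision) is not None


def fetch_fixtures(revision: str) -> str:
    """Hypothetical helper: download the fixtures at one exact revision.

    Returns the local directory containing the dataset snapshot.
    """
    if not is_pinned(revision):
        raise ValueError(f"refusing floating revision {revision!r}; pin a commit hash")
    # Imported lazily so the validation above works without the library installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=DATASET_REPO,
        repo_type="dataset",
        revision=revision,
    )
```

Pinning this way means a new fixture only reaches the tests once the revision value is bumped, so the expected thresholds and the audio they describe change together.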


models 0

datasets 1

podscripter-project/test-fixtures


Team members 1
