henry99a committed 12 days ago

Commit

ce3f8aa

1 Parent(s): 6d53dd3

feat: WhisperJAV HuggingFace Space with ChronosJAV pipeline, CPU mode, task queue, download history

Files changed (5)

.gitignore +27 -0
README.md +52 -8
app.py +537 -0
packages.txt +2 -0
requirements.txt +8 -0

.gitignore ADDED

	@@ -0,0 +1,27 @@

+# Generated / runtime files
+outputs/
+temp/
+uploads/
+tasks.json
+# Python
+__pycache__/
+*.pyc
+*.pyo
+.venv/
+venv/
+# IDE
+.idea/
+.vscode/
+*.swp
+*.swo
+# OS
+.DS_Store
+Thumbs.db
+# Gradio
+*.pid
+gradio_cached_examples/
+flagged/

README.md CHANGED

@@ -1,14 +1,58 @@
 ---
 title: WhisperJAV
-emoji: 🚀
-colorFrom: yellow
-colorTo: pink
 sdk: gradio
-sdk_version: 6.14.0
-python_version: '3.13'
 app_file: app.py
-pinned: false
-short_description: WhisperJAV
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

 ---
 title: WhisperJAV
+emoji: 🎙️
+colorFrom: blue
+colorTo: indigo
 sdk: gradio
+sdk_version: 4.44.0
 app_file: app.py
+pinned: true
+license: mit
 ---
+# WhisperJAV — Japanese Subtitle Generator
+**ChronosJAV** pipeline powered by `litagin/anime-whisper`.
+A HuggingFace Space that brings the [WhisperJAV](https://github.com/meizhong986/WhisperJAV) subtitle generator to the cloud. Optimized for the **free CPU tier**.
+## Features
+- **ChronosJAV Pipeline** — Decoupled text generation + timestamp alignment using anime-whisper
+- **Background Processing** — Tasks run in background threads; close your browser and come back later
+- **Task Monitor** — Real-time view of running, queued, and completed tasks
+- **Download History** — All previously generated subtitle files available for download
+- **CPU Only** — Designed for HuggingFace's free hardware (2 vCPU, 16 GB RAM)
+## Usage
+1. Upload a video or audio file (MP4, MKV, WAV, MP3, etc.)
+2. Click **Start Transcription**
+3. Wait for processing (30–60 min per hour of video on CPU)
+4. Download the generated `.srt` or `.vtt` subtitle file from the **Download History** tab
+## Pipeline
+The **ChronosJAV** pipeline separates text generation from timestamp alignment:
+| Stage | Component |
+|-------|-----------|
+| Audio extraction | FFmpeg (48 kHz) |
+| Scene detection | Semantic (MFCC clustering) |
+| Voice activity detection | WhisperSeg ONNX |
+| Text generation | `litagin/anime-whisper` (Whisper large-v3 fine-tune) |
+| Timestamp alignment | VAD-based (VAD_ONLY mode) |
+| Post-processing | Anime-whisper cleaner (ellipsis filtering) |
+## Limitations
+- **CPU only** — processing is 5–10× slower than GPU
+- **Japanese only** — optimized for Japanese dialogue; other languages may produce poor results
+- **First run latency** — the anime-whisper model (~3 GB) downloads on first use
+- **Free tier constraints** — 16 GB RAM, 50 GB disk; very long videos (>4 h) may fail
+## Credits
+- [WhisperJAV](https://github.com/meizhong986/WhisperJAV) by MeiZhong
+- [anime-whisper](https://huggingface.co/litagin/anime-whisper) by litagin
+- [ChronusOmni](https://arxiv.org/abs/2512.09841) — Inspiration for the decoupled pipeline architecture
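
The README describes a decoupled design: text is generated per speech region, while timing comes from voice activity detection. A minimal sketch of the last step, turning timed segments into SRT cues, is shown below. The `Segment` type and the `format_timestamp` and `to_srt` helpers are hypothetical names used for illustration and are not part of the WhisperJAV API. Dropping empty or ellipsis-only lines is only an assumption about what the "ellipsis filtering" post-processing stage does.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Segment:
    start: float  # seconds, assumed to come from a VAD boundary
    end: float    # seconds
    text: str     # text generated for this speech region


def format_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = max(0, round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment]) -> str:
    """Render segments as numbered SRT cues, skipping empty or ellipsis-only text."""
    cues = []
    for seg in segments:
        text = seg.text.strip()
        # Assumption: a line made only of dots, ellipses, or spaces carries no dialogue.
        if not text.strip("…. 。"):
            continue
        index = len(cues) + 1
        cues.append(
            f"{index}\n"
            f"{format_timestamp(seg.start)} --> {format_timestamp(seg.end)}\n"
            f"{text}\n"
        )
    return "\n".join(cues)
```

Because cue numbers are assigned after filtering, the output stays consecutively numbered even when segments are dropped.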

app.py ADDED

	@@ -0,0 +1,537 @@

+"""
+WhisperJAV HuggingFace Space — Japanese Subtitle Generator
+==========================================================
+ChronosJAV pipeline (anime-whisper) · CPU mode · Free tier
+Background task processing · Download history · Real-time monitor
+Architecture:
+  - Gradio Blocks UI with tabs for task submission and monitoring
+  - Background threads process transcription tasks without blocking the frontend
+  - JSON file persists task state across Space restarts
+  - Model auto-downloads from HuggingFace Hub on first use (~3 GB)
+"""
+from __future__ import annotations
+import json
+import os
+import shutil
+import threading
+import time
+import traceback
+import uuid
+from datetime import datetime, timezone
+from pathlib import Path
+from typing import Any, Dict, List, Optional
+import gradio as gr
+# ═══════════════════════════════════════════════════════════════════════
+# Paths & Configuration
+# ═══════════════════════════════════════════════════════════════════════
+BASE_DIR = Path(__file__).resolve().parent
+OUTPUT_DIR = BASE_DIR / "outputs"
+TEMP_DIR = BASE_DIR / "temp"
+UPLOAD_DIR = BASE_DIR / "uploads"
+TASKS_FILE = BASE_DIR / "tasks.json"
+MAX_OUTPUT_FILES = 20  # keep most recent N task directories; prune older
+OUTPUT_DIR.mkdir(exist_ok=True)
+TEMP_DIR.mkdir(exist_ok=True)
+UPLOAD_DIR.mkdir(exist_ok=True)
+# ═══════════════════════════════════════════════════════════════════════
+# Task Store (memory + JSON-backed)
+# ═══════════════════════════════════════════════════════════════════════
+_tasks: Dict[str, dict] = {}
+_lock = threading.Lock()
+_semaphore = threading.Semaphore(1)  # single concurrent CPU task
+def _load() -> None:
+    """Load persisted tasks; mark stale 'queued' or 'running' ones as interrupted."""
+    global _tasks
+    if not TASKS_FILE.exists():
+        return
+    try:
+        raw = json.loads(TASKS_FILE.read_text(encoding="utf-8"))
+        for tid, t in raw.items():
+            t.setdefault("id", tid)
+            # Worker threads do not survive a restart, so unfinished tasks cannot resume.
+            if t.get("status") in ("queued", "running"):
+                t["status"] = "interrupted"
+                t["error"] = "Space restarted before the task finished"
+        _tasks = raw
+    except Exception:
+        _tasks = {}
+def _save() -> None:
+    """Persist lightweight view of tasks to JSON."""
+    with _lock:
+        slim: Dict[str, dict] = {}
+        for tid, t in _tasks.items():
+            slim[tid] = {
+                "id": t.get("id", tid),
+                "filename": t.get("filename", ""),
+                "status": t.get("status", "unknown"),
+                "pipeline": t.get("pipeline", ""),
+                "created_at": str(t.get("created_at", "")),
+                "completed_at": str(t.get("completed_at", "")),
+                "output_srt": t.get("output_srt", ""),
+                "output_vtt": t.get("output_vtt", ""),
+                "error": str(t.get("error", ""))[:500],
+                "duration_seconds": t.get("duration_seconds", 0),
+            }
+        TASKS_FILE.write_text(
+            json.dumps(slim, ensure_ascii=False, indent=2),
+            encoding="utf-8",
+        )
+def _prune_old_outputs() -> None:
+    """Remove task output dirs beyond MAX_OUTPUT_FILES to save disk."""
+    with _lock:
+        completed = sorted(
+            [t for t in _tasks.values() if t.get("status") == "completed"],
+            key=lambda t: str(t.get("completed_at", "")),
+            reverse=True,
+        )
+    keep_ids = {t["id"] for t in completed[:MAX_OUTPUT_FILES]}
+    for child in OUTPUT_DIR.iterdir():
+        if child.is_dir() and child.name not in keep_ids:
+            # ignore_errors=True already suppresses removal failures
+            shutil.rmtree(child, ignore_errors=True)
+# ═══════════════════════════════════════════════════════════════════════
+# Background Worker
+# ═══════════════════════════════════════════════════════════════════════
+def _run_transcription(task_id: str, video_path: str) -> None:
+    """Called in a daemon thread.  Loads whisperjav lazily so that the
+    Gradio UI can start serving immediately while models download."""
+    try:
+        with _lock:
+            _tasks[task_id]["status"] = "running"
+        _save()
+        t0 = time.time()
+        vp = Path(video_path)
+        basename = vp.stem
+        task_out = OUTPUT_DIR / task_id
+        task_tmp = TEMP_DIR / task_id
+        task_out.mkdir(parents=True, exist_ok=True)
+        task_tmp.mkdir(parents=True, exist_ok=True)
+        # ── lazy import (model download happens here on first call) ──
+        from whisperjav.pipelines.qwen_pipeline import QwenPipeline
+        pipeline = QwenPipeline(
+            generator_backend="anime-whisper",
+            device="cpu",
+            dtype="float32",
+            scene_detector="semantic",
+            speech_segmenter="whisperseg",
+            language="Japanese",
+            output_dir=str(task_out),
+            temp_dir=str(task_tmp),
+        )
+        result = pipeline.process({"path": str(vp), "basename": basename})
+        pipeline.cleanup()
+        elapsed = round(time.time() - t0, 1)
+        # ── copy final artefacts ──
+        srt_final = ""
+        vtt_final = ""
+        srt_src = result.get("srt_path", "")
+        if srt_src and os.path.isfile(srt_src):
+            dst = task_out / f"{basename}.srt"
+            shutil.copy2(srt_src, dst)
+            srt_final = str(dst)
+        # Check for sidecar VTT (some configurations emit both)
+        vtt_candidate = task_out / f"{basename}.vtt"
+        if vtt_candidate.is_file():
+            vtt_final = str(vtt_candidate)
+        # ── cleanup temp dir (ignore_errors=True already suppresses failures) ──
+        shutil.rmtree(task_tmp, ignore_errors=True)
+        with _lock:
+            _tasks[task_id].update(
+                status="completed",
+                completed_at=datetime.now(timezone.utc).isoformat(),
+                output_srt=srt_final,
+                output_vtt=vtt_final,
+                duration_seconds=elapsed,
+            )
+        _save()
+        _prune_old_outputs()
+    except Exception as exc:
+        err_text = f"{exc}\n{traceback.format_exc()}"
+        with _lock:
+            _tasks[task_id].update(
+                status="failed",
+                completed_at=datetime.now(timezone.utc).isoformat(),
+                error=err_text[:800],
+            )
+        _save()
+    finally:
+        _semaphore.release()
+# ═══════════════════════════════════════════════════════════════════════
+# Callbacks
+# ═══════════════════════════════════════════════════════════════════════
+def submit_task(video_file) -> tuple:
+    """Kick off a new transcription task."""
+    if video_file is None:
+        return (
+            gr.update(value="Please upload a video or audio file first.", visible=True),
+            _render_monitor(),
+            _render_history(),
+            _render_download_section(),
+        )
+    if not _semaphore.acquire(blocking=False):
+        return (
+            gr.update(value="Another task is already processing.  Please wait for it to finish.", visible=True),
+            _render_monitor(),
+            _render_history(),
+            _render_download_section(),
+        )
+    tid = uuid.uuid4().hex[:12]
+    # Gradio 4.x may return str, dict, or file-like object
+    if isinstance(video_file, str):
+        src_path = video_file
+    elif isinstance(video_file, dict):
+        src_path = video_file.get("name", "")
+    else:
+        src_path = getattr(video_file, "name", "")
+    if not src_path or not os.path.isfile(src_path):
+        _semaphore.release()
+        return (
+            gr.update(value="Upload failed — could not read file path.", visible=True),
+            _render_monitor(),
+            _render_history(),
+            _render_download_section(),
+        )
+    fname = Path(src_path).name
+    # Warn if file is very large (>2 GB) — may cause OOM on free tier
+    file_size_mb = os.path.getsize(src_path) / (1024 * 1024)
+    size_warning = ""
+    if file_size_mb > 2048:
+        size_warning = (
+            f"  (Warning: file is {file_size_mb:.0f} MB.  "
+            "Files >2 GB may fail on the 16 GB free tier.)"
+        )
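+    # Copy to persistent upload location so it survives Gradio tmpdir cleanup
+    persistent = UPLOAD_DIR / f"{tid}_{fname}"
+    try:
+        shutil.copy2(src_path, persistent)
+    except OSError as exc:
+        # Release the worker slot; otherwise every later submission is rejected.
+        _semaphore.release()
+        return (
+            gr.update(value=f"Upload failed: could not store the file ({exc}).", visible=True),
+            _render_monitor(),
+            _render_history(),
+            _render_download_section(),
+        )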
+    with _lock:
+        _tasks[tid] = {
+            "id": tid,
+            "filename": fname,
+            "status": "queued",
+            "pipeline": "ChronosJAV (anime-whisper)",
+            "created_at": datetime.now(timezone.utc).isoformat(),
+            "completed_at": "",
+            "output_srt": "",
+            "output_vtt": "",
+            "error": "",
+            "duration_seconds": 0,
+        }
+    _save()
+    threading.Thread(
+        target=_run_transcription,
+        args=(tid, str(persistent)),
+        daemon=True,
+    ).start()
+    return (
+        gr.update(value=f"Submitted: {fname}  (ID: `{tid}`){size_warning}", visible=True),
+        _render_monitor(),
+        _render_history(),
+        _render_download_section(),
+    )
+# ── HTML renderers ────────────────────────────────────────────────────
+_STATUS_COLORS = {
+    "queued":      "#f0ad4e",
+    "running":     "#5bc0de",
+    "completed":   "#5cb85c",
+    "failed":      "#d9534f",
+    "interrupted": "#999",
+}
+_STATUS_ICONS = {
+    "queued":      "&#9201;",  # ⏱
+    "running":     "&#128260;",  # 🔄
+    "completed":   "&#9989;",  # ✅
+    "failed":      "&#10060;",  # ❌
+    "interrupted": "&#9208;",  # ⏸
+}
+_CSS = """
+<style>
+.tr { font-family: 'SF Mono','Consolas',monospace; font-size: 12px; }
+.tr-card {
+    border: 1px solid #e0e0e0; margin: 4px 0; padding: 8px 12px;
+    border-radius: 6px; background: #fafafa;
+}
+.tr-card .head { display:flex; justify-content:space-between; align-items:flex-start; }
+.tr-card .meta { color: #666; margin-top: 3px; font-size: 11px; }
+.tr-card .dl { margin-top: 6px; }
+.dl-btn {
+    display: inline-block; margin: 2px 4px 2px 0; padding: 2px 10px;
+    background: #28a745; color: #fff; text-decoration: none;
+    border-radius: 4px; font-size: 12px;
+}
+.dl-btn:hover { background: #218838; }
+.hist-table { width: 100%; border-collapse: collapse; font-size: 12px; }
+.hist-table th { background: #2c3e50; color: #fff; padding: 8px; text-align: left; }
+.hist-table td { padding: 6px 8px; border-bottom: 1px solid #ddd; }
+.hist-table tr:hover { background: #f0f0f0; }
+</style>
+"""
+def _render_monitor() -> str:
+    """Return HTML for the real-time task monitor (all tasks, newest first)."""
+    with _lock:
+        items = list(_tasks.values())
+    if not items:
+        return _CSS + "<div style='text-align:center;padding:24px;color:#999;'>No tasks yet.  Upload a file to start.</div>"
+    items.sort(key=lambda t: str(t.get("created_at", "")), reverse=True)
+    html = _CSS + '<div class="tr">'
+    for t in items[:40]:
+        st = t.get("status", "unknown")
+        color = _STATUS_COLORS.get(st, "#999")
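+        icon = _STATUS_ICONS.get(st, "?")
+        # Escape the user-supplied filename before embedding it in HTML
+        name = str(t.get("filename", "?"))[:55].replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
+        html += f"""
+        <div class="tr-card" style="border-left:4px solid {color};">
+          <div class="head">
+            <strong>{icon} {name}</strong>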
+            <span style="color:{color};font-weight:700;white-space:nowrap;">{st.upper()}</span>
+          </div>
+          <div class="meta">
+            ID: {t.get('id','?')} &nbsp;|&nbsp; {t.get('pipeline','')}
+            &nbsp;|&nbsp; {str(t.get('created_at',''))[:19]}
+          </div>"""
+        if st == "completed":
+            html += f'<div class="meta" style="color:#28a745;">Completed in {t.get("duration_seconds",0)}s</div>'
+        if st in ("failed", "interrupted"):
+            err = str(t.get("error", ""))[:250].replace("<", "&lt;").replace(">", "&gt;")
+            html += f'<div class="meta" style="color:#d9534f;">{err}</div>'
+        html += "</div>"
+    html += "</div>"
+    return html
+def _render_history() -> str:
+    """Return an HTML table of completed tasks with inline download links."""
+    with _lock:
+        completed = [t for t in _tasks.values() if t.get("status") == "completed"]
+    if not completed:
+        return _CSS + "<div style='text-align:center;padding:24px;color:#999;'>No completed tasks yet.</div>"
+    completed.sort(key=lambda t: str(t.get("completed_at", "")), reverse=True)
+    html = _CSS + '<table class="hist-table"><thead><tr>'
+    html += "<th>File</th><th>Duration</th><th>Completed</th><th>Download</th>"
+    html += "</tr></thead><tbody>"
+    for t in completed[:MAX_OUTPUT_FILES]:
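+        ca = str(t.get("completed_at", ""))[:19]
+        # Escape the user-supplied filename before embedding it in HTML
+        name = str(t.get("filename", ""))[:55].replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
+        html += f"<tr><td>{name}</td><td>{t.get('duration_seconds',0)}s</td><td>{ca}</td><td>"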
+        srt = t.get("output_srt", "")
+        vtt = t.get("output_vtt", "")
+        if srt and os.path.isfile(srt):
+            rel = os.path.relpath(srt, BASE_DIR).replace("\\", "/")
+            html += f'<a class="dl-btn" href="/file={rel}" download="{Path(srt).name}">SRT</a> '
+        if vtt and os.path.isfile(vtt):
+            rel = os.path.relpath(vtt, BASE_DIR).replace("\\", "/")
+            html += f'<a class="dl-btn" href="/file={rel}" download="{Path(vtt).name}">VTT</a> '
+        html += "</td></tr>"
+    html += "</tbody></table>"
+    return html
+def _render_download_section() -> str:
+    """Quick-download panel for the single most recent completed task."""
+    with _lock:
+        completed = sorted(
+            [t for t in _tasks.values() if t.get("status") == "completed"],
+            key=lambda t: str(t.get("completed_at", "")),
+            reverse=True,
+        )
+    if not completed:
+        return _CSS + "<div style='text-align:center;padding:12px;color:#999;'>No downloads available yet.</div>"
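+    latest = completed[0]
+    # Escape the user-supplied filename before embedding it in HTML
+    name = str(latest.get("filename", "")).replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
+    html = _CSS + '<div style="padding: 8px;">'
+    html += f"<strong>Latest:</strong> {name}<br>"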
+    srt = latest.get("output_srt", "")
+    vtt = latest.get("output_vtt", "")
+    if srt and os.path.isfile(srt):
+        rel = os.path.relpath(srt, BASE_DIR).replace("\\", "/")
+        html += f'<a class="dl-btn" href="/file={rel}" download="{Path(srt).name}">Download SRT</a> '
+    if vtt and os.path.isfile(vtt):
+        rel = os.path.relpath(vtt, BASE_DIR).replace("\\", "/")
+        html += f'<a class="dl-btn" href="/file={rel}" download="{Path(vtt).name}">Download VTT</a> '
+    html += "</div>"
+    return html
+def _auto_refresh() -> tuple:
+    """Called by Gradio's periodic timer to update all panels."""
+    return _render_monitor(), _render_history(), _render_download_section()
+# ═══════════════════════════════════════════════════════════════════════
+# Gradio UI
+# ═══════════════════════════════════════════════════════════════════════
+_FOOTER = """
+<div style="position:fixed;bottom:0;left:0;right:0;padding:6px;
+            background:#f8f8f8;text-align:center;font-size:11px;color:#888;
+            border-top:1px solid #e0e0e0;">
+    WhisperJAV &copy; <a href="https://github.com/meizhong986/WhisperJAV" target="_blank">meizhong986</a>
+    &nbsp;|&nbsp; ChronosJAV pipeline (anime-whisper) &nbsp;|&nbsp;
+    CPU-only &nbsp;|&nbsp; Free HuggingFace Space
+</div>
+"""
+def build_ui() -> gr.Blocks:
+    with gr.Blocks(
+        title="WhisperJAV — Japanese Subtitle Generator",
+        theme=gr.themes.Soft(),
+        css="""
+        footer { visibility: hidden }
+        .app-footer { position: fixed; bottom: 0; left: 0; right: 0; z-index: 100; }
+        """,
+    ) as demo:
+        # ── Header ──
+        gr.Markdown("""
+        # WhisperJAV — Japanese Subtitle Generator
+        **ChronosJAV** pipeline with `litagin/anime-whisper` — a Whisper large-v3
+        fine-tuned on anime and JAV dialogue.  Runs entirely on **CPU** (free tier).
+        First request downloads the model (~3 GB) — please be patient.
+        ⏱️  Processing speed: roughly **30-60 min** per hour of video on CPU.
+        """)
+        with gr.Tabs():
+            # ── Tab 1: New Task ──────────────────────────────────────
+            with gr.TabItem("New Transcription"):
+                with gr.Row():
+                    with gr.Column(scale=2):
+                        upload = gr.File(
+                            label="Upload Video or Audio",
+                            file_types=["video", "audio"],
+                            file_count="single",
+                        )
+                        gr.Markdown(
+                            "**Supported**: MP4, MKV, AVI, MOV, WMV, FLV, WAV, MP3, FLAC, M4A\n\n"
+                            "**Pipeline**: ChronosJAV — Text generation + timestamp alignment.\n"
+                            "The anime-whisper model is tuned specifically for Japanese dialogue."
+                        )
+                        submit_btn = gr.Button(
+                            "Start Transcription",
+                            variant="primary",
+                            size="lg",
+                        )
+                    with gr.Column(scale=1):
+                        status = gr.Textbox(
+                            label="Status",
+                            value="Ready.  Upload a file to begin.",
+                            interactive=False,
+                            lines=3,
+                        )
+                        download_panel = gr.HTML(
+                            value=_CSS + "<div style='text-align:center;padding:12px;color:#999;'>No downloads yet.</div>",
+                        )
+                gr.Markdown("---")
+                gr.Markdown("### Task Monitor  (auto-refreshes every 8 s)")
+                monitor_html = gr.HTML(value=_render_monitor())
+            # ── Tab 2: Download History ──────────────────────────────
+            with gr.TabItem("Download History"):
+                gr.Markdown("Completed tasks with download links.  Click a button to download the subtitle file.")
+                refresh_hist_btn = gr.Button("Refresh", size="sm")
+                history_html = gr.HTML(value=_render_history())
+        # ── Footer ──
+        gr.HTML(_FOOTER, elem_classes=["app-footer"])
+        # ── Events ──
+        submit_btn.click(
+            fn=submit_task,
+            inputs=[upload],
+            outputs=[status, monitor_html, history_html, download_panel],
+        )
+        refresh_hist_btn.click(
+            fn=_render_history,  # single output: return the value itself, not a 1-tuple
+            inputs=[],
+            outputs=[history_html],
+        )
+        # Periodic auto-refresh of monitor + history + download panel
+        demo.load(
+            fn=_auto_refresh,
+            inputs=[],
+            outputs=[monitor_html, history_html, download_panel],
+            every=8,
+        )
+    return demo
+# ═══════════════════════════════════════════════════════════════════════
+# Entry Point
+# ═══════════════════════════════════════════════════════════════════════
+if __name__ == "__main__":
+    _load()
+    _prune_old_outputs()
+    app = build_ui()
+    app.queue(
+        max_size=10,
+        default_concurrency_limit=5,  # allow multiple UI interactions
+    ).launch(
+        server_name="0.0.0.0",
+        server_port=7860,
+        show_error=True,
+    )
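
A note on `_save()` above: it rewrites `tasks.json` in place, so a Space restart in the middle of a write could leave a truncated file, and `_load()` falls back to an empty store when the JSON cannot be parsed. One possible hardening is to write to a temporary file and rename it over the target. The sketch below is an illustration only; `write_json_atomic` is a hypothetical helper name and is not part of this commit.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write JSON to a temp file in the target's directory, then rename it over the target."""
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name, suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, ensure_ascii=False, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace is atomic when source and target are on the same filesystem,
        # which is why the temp file is created next to the target.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the partial temp file; the previous target file is left untouched.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

Under this assumption, `_save()` would call `write_json_atomic(TASKS_FILE, slim)` in place of `TASKS_FILE.write_text(...)`, so readers see either the old file or the new one, never a partial write.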

packages.txt ADDED

	@@ -0,0 +1,2 @@


+ffmpeg
+libsndfile1
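
Both packages are system-level dependencies that Python code can only discover at runtime. A small startup check, sketched below, can make a missing package fail loudly instead of surfacing as an obscure error in the middle of a transcription. The `check_system_deps` helper is a hypothetical name and is not part of this commit.

```python
import ctypes.util
import shutil
from typing import Dict


def check_system_deps() -> Dict[str, bool]:
    """Report whether the ffmpeg binary and the libsndfile shared library are discoverable."""
    return {
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "libsndfile": ctypes.util.find_library("sndfile") is not None,
    }


if __name__ == "__main__":
    for name, found in check_system_deps().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```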

requirements.txt ADDED

	@@ -0,0 +1,8 @@

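+# Index options apply to the whole requirements file (the last --index-url wins),
+# so the CPU wheel index is added as an extra index alongside PyPI.
+--extra-index-url https://download.pytorch.org/whl/cpu
+torch>=2.0.0
+torchaudio>=2.0.0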
+whisperjav[cli,huggingface] @ git+https://github.com/meizhong986/whisperjav.git
+onnxruntime>=1.16.0
+gradio>=4.0.0