henry99a committed on
Commit ce3f8aa · 1 Parent(s): 6d53dd3

feat: WhisperJAV HuggingFace Space with ChronosJAV pipeline, CPU mode, task queue, download history

Files changed (5)
  1. .gitignore +27 -0
  2. README.md +52 -8
  3. app.py +537 -0
  4. packages.txt +2 -0
  5. requirements.txt +8 -0
.gitignore ADDED
@@ -0,0 +1,27 @@
+ # Generated / runtime files
+ outputs/
+ temp/
+ uploads/
+ tasks.json
+
+ # Python
+ __pycache__/
+ *.pyc
+ *.pyo
+ .venv/
+ venv/
+
+ # IDE
+ .idea/
+ .vscode/
+ *.swp
+ *.swo
+
+ # OS
+ .DS_Store
+ Thumbs.db
+
+ # Gradio
+ *.pid
+ gradio_cached_examples/
+ flagged/
README.md CHANGED
@@ -1,14 +1,58 @@
  ---
  title: WhisperJAV
- emoji: 🚀
- colorFrom: yellow
- colorTo: pink
+ emoji: 🎙️
+ colorFrom: blue
+ colorTo: indigo
  sdk: gradio
- sdk_version: 6.14.0
- python_version: '3.13'
+ sdk_version: 4.44.0
  app_file: app.py
- pinned: false
- short_description: WhisperJAV
+ pinned: true
+ license: mit
  ---

- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # WhisperJAV - Japanese Subtitle Generator
+
+ **ChronosJAV** pipeline powered by `litagin/anime-whisper`.
+
+ A HuggingFace Space that brings the [WhisperJAV](https://github.com/meizhong986/WhisperJAV) subtitle generator to the cloud. Optimized for the **free CPU tier**.
+
+ ## Features
+
+ - **ChronosJAV Pipeline** - Decoupled text generation + timestamp alignment using anime-whisper
+ - **Background Processing** - Tasks run in background threads; close your browser and come back later
+ - **Task Monitor** - Real-time view of running, queued, and completed tasks
+ - **Download History** - All previously generated subtitle files available for download
+ - **CPU Only** - Designed for HuggingFace's free hardware (2 vCPU, 16 GB RAM)
+
+ ## Usage
+
+ 1. Upload a video or audio file (MP4, MKV, WAV, MP3, etc.)
+ 2. Click **Start Transcription**
+ 3. Wait for processing (30–60 min per hour of video on CPU)
+ 4. Download the generated `.srt` or `.vtt` subtitle file from the **Download History** tab
+
+ ## Pipeline
+
+ The **ChronosJAV** pipeline separates text generation from timestamp alignment:
+
+ | Stage | Component |
+ |-------|-----------|
+ | Audio extraction | FFmpeg (48 kHz) |
+ | Scene detection | Semantic (MFCC clustering) |
+ | Voice activity detection | WhisperSeg ONNX |
+ | Text generation | `litagin/anime-whisper` (Whisper large-v3 fine-tune) |
+ | Timestamp alignment | VAD-based (VAD_ONLY mode) |
+ | Post-processing | Anime-whisper cleaner (ellipsis filtering) |
+
+ ## Limitations
+
+ - **CPU only** - processing is 5–10× slower than GPU
+ - **Japanese only** - optimized for Japanese dialogue; other languages may produce poor results
+ - **First run latency** - the anime-whisper model (~3 GB) downloads on first use
+ - **Free tier constraints** - 16 GB RAM, 50 GB disk; very long videos (>4 h) may fail
+
+ ## Credits
+
+ - [WhisperJAV](https://github.com/meizhong986/WhisperJAV) by MeiZhong
+ - [anime-whisper](https://huggingface.co/litagin/anime-whisper) by litagin
+ - [ChronusOmni](https://arxiv.org/abs/2512.09841) - Inspiration for the decoupled pipeline architecture
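
Reviewer note: the decoupling described in the README's Pipeline table can be pictured with a minimal sketch. Timestamps come from detected speech segments, text comes from a separate transcription call, and the two are joined into SRT cues. Every name here (`Cue`, `align`, `to_srt`, the `transcribe` callback) is hypothetical and chosen for illustration; none of it is WhisperJAV's actual API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Cue:
    """One subtitle cue: timing from VAD, text from the generator."""
    start: float  # seconds
    end: float    # seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = int(round(seconds * 1000))
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def align(
    segments: Sequence[Tuple[float, float]],
    transcribe: Callable[[float, float], str],
) -> List[Cue]:
    """Pair each speech segment with the text generated for it.

    `transcribe` is a stand-in for the text-generation stage; segments
    whose text is empty after stripping are dropped.
    """
    cues: List[Cue] = []
    for start, end in segments:
        text = transcribe(start, end).strip()
        if text:
            cues.append(Cue(start, end, text))
    return cues


def to_srt(cues: Sequence[Cue]) -> str:
    """Serialize cues as SRT blocks separated by blank lines."""
    blocks = [
        f"{i}\n{format_timestamp(c.start)} --> {format_timestamp(c.end)}\n{c.text}\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

Because the timing never depends on the text model's output in this sketch, either stage could be swapped without touching the other, which is the property the README attributes to the pipeline.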
app.py ADDED
@@ -0,0 +1,537 @@
+ """
+ WhisperJAV HuggingFace Space - Japanese Subtitle Generator
+ ==========================================================
+ ChronosJAV pipeline (anime-whisper) · CPU mode · Free tier
+ Background task processing · Download history · Real-time monitor
+
+ Architecture:
+ - Gradio Blocks UI with tabs for task submission and monitoring
+ - Background threads process transcription tasks without blocking the frontend
+ - JSON file persists task state across Space restarts
+ - Model auto-downloads from HuggingFace Hub on first use (~3 GB)
+ """
+
+ from __future__ import annotations
+
+ import json
+ import os
+ import shutil
+ import threading
+ import time
+ import traceback
+ import uuid
+ from datetime import datetime, timezone
+ from pathlib import Path
+ from typing import Any, Dict, List, Optional
+
+ import gradio as gr
+
+ # ═══════════════════════════════════════════════════════════════════════
+ # Paths & Configuration
+ # ═══════════════════════════════════════════════════════════════════════
+
+ BASE_DIR = Path(__file__).resolve().parent
+ OUTPUT_DIR = BASE_DIR / "outputs"
+ TEMP_DIR = BASE_DIR / "temp"
+ UPLOAD_DIR = BASE_DIR / "uploads"
+ TASKS_FILE = BASE_DIR / "tasks.json"
+ MAX_OUTPUT_FILES = 20  # keep most recent N task directories; prune older
+
+ OUTPUT_DIR.mkdir(exist_ok=True)
+ TEMP_DIR.mkdir(exist_ok=True)
+ UPLOAD_DIR.mkdir(exist_ok=True)
+
+ # ═══════════════════════════════════════════════════════════════════════
+ # Task Store (memory + JSON-backed)
+ # ═══════════════════════════════════════════════════════════════════════
+
+ _tasks: Dict[str, dict] = {}
+ _lock = threading.Lock()
+ _semaphore = threading.Semaphore(1)  # single concurrent CPU task
+
+
+ def _load() -> None:
+     """Load persisted tasks; mark stale 'running' ones as interrupted."""
+     global _tasks
+     if not TASKS_FILE.exists():
+         return
+     try:
+         raw = json.loads(TASKS_FILE.read_text(encoding="utf-8"))
+         for tid, t in raw.items():
+             t.setdefault("id", tid)
+             if t.get("status") == "running":
+                 t["status"] = "interrupted"
+                 t["error"] = "Space restarted while task was running"
+         _tasks = raw
+     except Exception:
+         _tasks = {}
+
+
+ def _save() -> None:
+     """Persist lightweight view of tasks to JSON."""
+     with _lock:
+         slim: Dict[str, dict] = {}
+         for tid, t in _tasks.items():
+             slim[tid] = {
+                 "id": t.get("id", tid),
+                 "filename": t.get("filename", ""),
+                 "status": t.get("status", "unknown"),
+                 "pipeline": t.get("pipeline", ""),
+                 "created_at": str(t.get("created_at", "")),
+                 "completed_at": str(t.get("completed_at", "")),
+                 "output_srt": t.get("output_srt", ""),
+                 "output_vtt": t.get("output_vtt", ""),
+                 "error": str(t.get("error", ""))[:500],
+                 "duration_seconds": t.get("duration_seconds", 0),
+             }
+         TASKS_FILE.write_text(
+             json.dumps(slim, ensure_ascii=False, indent=2),
+             encoding="utf-8",
+         )
+
+
+ def _prune_old_outputs() -> None:
+     """Remove task output dirs beyond MAX_OUTPUT_FILES to save disk."""
+     with _lock:
+         completed = sorted(
+             [t for t in _tasks.values() if t.get("status") == "completed"],
+             key=lambda t: str(t.get("completed_at", "")),
+             reverse=True,
+         )
+         keep_ids = {t["id"] for t in completed[:MAX_OUTPUT_FILES]}
+     for child in sorted(OUTPUT_DIR.iterdir(), key=lambda p: p.stat().st_mtime):
+         if child.is_dir() and child.name not in keep_ids:
+             try:
+                 shutil.rmtree(child, ignore_errors=True)
+             except Exception:
+                 pass
+
+
+ # ═══════════════════════════════════════════════════════════════════════
+ # Background Worker
+ # ═══════════════════════════════════════════════════════════════════════
+
+ def _run_transcription(task_id: str, video_path: str) -> None:
+     """Called in a daemon thread. Loads whisperjav lazily so that the
+     Gradio UI can start serving immediately while models download."""
+     try:
+         with _lock:
+             _tasks[task_id]["status"] = "running"
+         _save()
+
+         t0 = time.time()
+         vp = Path(video_path)
+         basename = vp.stem
+
+         task_out = OUTPUT_DIR / task_id
+         task_tmp = TEMP_DIR / task_id
+         task_out.mkdir(parents=True, exist_ok=True)
+         task_tmp.mkdir(parents=True, exist_ok=True)
+
+         # ── lazy import (model download happens here on first call) ──
+         from whisperjav.pipelines.qwen_pipeline import QwenPipeline
+
+         pipeline = QwenPipeline(
+             generator_backend="anime-whisper",
+             device="cpu",
+             dtype="float32",
+             scene_detector="semantic",
+             speech_segmenter="whisperseg",
+             language="Japanese",
+             output_dir=str(task_out),
+             temp_dir=str(task_tmp),
+         )
+
+         result = pipeline.process({"path": str(vp), "basename": basename})
+         pipeline.cleanup()
+
+         elapsed = round(time.time() - t0, 1)
+
+         # ── copy final artefacts ──
+         srt_final = ""
+         vtt_final = ""
+         srt_src = result.get("srt_path", "")
+         if srt_src and os.path.isfile(srt_src):
+             dst = task_out / f"{basename}.srt"
+             shutil.copy2(srt_src, dst)
+             srt_final = str(dst)
+
+         # Check for sidecar VTT (some configurations emit both)
+         vtt_candidate = task_out / f"{basename}.vtt"
+         if vtt_candidate.is_file():
+             vtt_final = str(vtt_candidate)
+
+         # ── cleanup temp dir ──
+         try:
+             shutil.rmtree(task_tmp, ignore_errors=True)
+         except Exception:
+             pass
+
+         with _lock:
+             _tasks[task_id].update(
+                 status="completed",
+                 completed_at=datetime.now(timezone.utc).isoformat(),
+                 output_srt=srt_final,
+                 output_vtt=vtt_final,
+                 duration_seconds=elapsed,
+             )
+         _save()
+         _prune_old_outputs()
+
+     except Exception as exc:
+         err_text = f"{exc}\n{traceback.format_exc()}"
+         with _lock:
+             _tasks[task_id].update(
+                 status="failed",
+                 completed_at=datetime.now(timezone.utc).isoformat(),
+                 error=err_text[:800],
+             )
+         _save()
+     finally:
+         _semaphore.release()
+
+
+ # ═══════════════════════════════════════════════════════════════════════
+ # Callbacks
+ # ═══════════════════════════════════════════════════════════════════════
+
+ def submit_task(video_file) -> tuple:
+     """Kick off a new transcription task."""
+     if video_file is None:
+         return (
+             gr.update(value="Please upload a video or audio file first.", visible=True),
+             _render_monitor(),
+             _render_history(),
+             _render_download_section(),
+         )
+
+     if not _semaphore.acquire(blocking=False):
+         return (
+             gr.update(value="Another task is already processing. Please wait for it to finish.", visible=True),
+             _render_monitor(),
+             _render_history(),
+             _render_download_section(),
+         )
+
+     tid = uuid.uuid4().hex[:12]
+
+     # Gradio 4.x may return str, dict, or file-like object
+     if isinstance(video_file, str):
+         src_path = video_file
+     elif isinstance(video_file, dict):
+         src_path = video_file.get("name", "")
+     else:
+         src_path = getattr(video_file, "name", "")
+
+     if not src_path or not os.path.isfile(src_path):
+         _semaphore.release()
+         return (
+             gr.update(value="Upload failed - could not read file path.", visible=True),
+             _render_monitor(),
+             _render_history(),
+             _render_download_section(),
+         )
+
+     fname = Path(src_path).name
+
+     # Warn if file is very large (>2 GB) - may cause OOM on free tier
+     file_size_mb = os.path.getsize(src_path) / (1024 * 1024)
+     size_warning = ""
+     if file_size_mb > 2048:
+         size_warning = (
+             f" (Warning: file is {file_size_mb:.0f} MB. "
+             "Files >2 GB may fail on the 16 GB free tier.)"
+         )
+
+     # Copy to persistent upload location so it survives Gradio tmpdir cleanup
+     persistent = UPLOAD_DIR / f"{tid}_{fname}"
+     shutil.copy2(src_path, persistent)
+
+     with _lock:
+         _tasks[tid] = {
+             "id": tid,
+             "filename": fname,
+             "status": "queued",
+             "pipeline": "ChronosJAV (anime-whisper)",
+             "created_at": datetime.now(timezone.utc).isoformat(),
+             "completed_at": "",
+             "output_srt": "",
+             "output_vtt": "",
+             "error": "",
+             "duration_seconds": 0,
+         }
+     _save()
+
+     threading.Thread(
+         target=_run_transcription,
+         args=(tid, str(persistent)),
+         daemon=True,
+     ).start()
+
+     return (
+         gr.update(value=f"Submitted: {fname} (ID: `{tid}`){size_warning}", visible=True),
+         _render_monitor(),
+         _render_history(),
+         _render_download_section(),
+     )
+
+
+ # ── HTML renderers ────────────────────────────────────────────────────
+
+ _STATUS_COLORS = {
+     "queued": "#f0ad4e",
+     "running": "#5bc0de",
+     "completed": "#5cb85c",
+     "failed": "#d9534f",
+     "interrupted": "#999",
+ }
+ _STATUS_ICONS = {
+     "queued": "⏱",  # ⏳
+     "running": "🔄",  # 🔄
+     "completed": "✅",  # ✅
+     "failed": "❌",  # ❌
+     "interrupted": "⏸",  # ⏸
+ }
+
+ _CSS = """
+ <style>
+   .tr { font-family: 'SF Mono','Consolas',monospace; font-size: 12px; }
+   .tr-card {
+     border: 1px solid #e0e0e0; margin: 4px 0; padding: 8px 12px;
+     border-radius: 6px; background: #fafafa;
+   }
+   .tr-card .head { display:flex; justify-content:space-between; align-items:flex-start; }
+   .tr-card .meta { color: #666; margin-top: 3px; font-size: 11px; }
+   .tr-card .dl { margin-top: 6px; }
+   .dl-btn {
+     display: inline-block; margin: 2px 4px 2px 0; padding: 2px 10px;
+     background: #28a745; color: #fff; text-decoration: none;
+     border-radius: 4px; font-size: 12px;
+   }
+   .dl-btn:hover { background: #218838; }
+   .hist-table { width: 100%; border-collapse: collapse; font-size: 12px; }
+   .hist-table th { background: #2c3e50; color: #fff; padding: 8px; text-align: left; }
+   .hist-table td { padding: 6px 8px; border-bottom: 1px solid #ddd; }
+   .hist-table tr:hover { background: #f0f0f0; }
+ </style>
+ """
+
+
+ def _render_monitor() -> str:
+     """Return HTML for the real-time task monitor (all tasks, newest first)."""
+     with _lock:
+         items = list(_tasks.values())
+     if not items:
+         return _CSS + "<div style='text-align:center;padding:24px;color:#999;'>No tasks yet. Upload a file to start.</div>"
+
+     items.sort(key=lambda t: str(t.get("created_at", "")), reverse=True)
+     html = _CSS + '<div class="tr">'
+     for t in items[:40]:
+         st = t.get("status", "unknown")
+         color = _STATUS_COLORS.get(st, "#999")
+         icon = _STATUS_ICONS.get(st, "?")
+         html += f"""
+         <div class="tr-card" style="border-left:4px solid {color};">
+           <div class="head">
+             <strong>{icon} {t.get('filename','?')[:55]}</strong>
+             <span style="color:{color};font-weight:700;white-space:nowrap;">{st.upper()}</span>
+           </div>
+           <div class="meta">
+             ID: {t.get('id','?')} &nbsp;|&nbsp; {t.get('pipeline','')}
+             &nbsp;|&nbsp; {str(t.get('created_at',''))[:19]}
+           </div>"""
+         if st == "completed":
+             html += f'<div class="meta" style="color:#28a745;">Completed in {t.get("duration_seconds",0)}s</div>'
+         if st in ("failed", "interrupted"):
+             err = str(t.get("error", ""))[:250].replace("<", "&lt;").replace(">", "&gt;")
+             html += f'<div class="meta" style="color:#d9534f;">{err}</div>'
+         html += "</div>"
+
+     html += "</div>"
+     return html
+
+
+ def _render_history() -> str:
+     """Return an HTML table of completed tasks with inline download links."""
+     with _lock:
+         completed = [t for t in _tasks.values() if t.get("status") == "completed"]
+     if not completed:
+         return _CSS + "<div style='text-align:center;padding:24px;color:#999;'>No completed tasks yet.</div>"
+
+     completed.sort(key=lambda t: str(t.get("completed_at", "")), reverse=True)
+     html = _CSS + '<table class="hist-table"><thead><tr>'
+     html += "<th>File</th><th>Duration</th><th>Completed</th><th>Download</th>"
+     html += "</tr></thead><tbody>"
+
+     for t in completed[:MAX_OUTPUT_FILES]:
+         ca = str(t.get("completed_at", ""))[:19]
+         html += f"<tr><td>{t.get('filename','')[:55]}</td><td>{t.get('duration_seconds',0)}s</td><td>{ca}</td><td>"
+
+         srt = t.get("output_srt", "")
+         vtt = t.get("output_vtt", "")
+         if srt and os.path.isfile(srt):
+             rel = os.path.relpath(srt, BASE_DIR).replace("\\", "/")
+             html += f'<a class="dl-btn" href="/file={rel}" download="{Path(srt).name}">SRT</a> '
+         if vtt and os.path.isfile(vtt):
+             rel = os.path.relpath(vtt, BASE_DIR).replace("\\", "/")
+             html += f'<a class="dl-btn" href="/file={rel}" download="{Path(vtt).name}">VTT</a> '
+
+         html += "</td></tr>"
+
+     html += "</tbody></table>"
+     return html
+
+
+ def _render_download_section() -> str:
+     """Quick-download panel for the single most recent completed task."""
+     with _lock:
+         completed = sorted(
+             [t for t in _tasks.values() if t.get("status") == "completed"],
+             key=lambda t: str(t.get("completed_at", "")),
+             reverse=True,
+         )
+     if not completed:
+         return _CSS + "<div style='text-align:center;padding:12px;color:#999;'>No downloads available yet.</div>"
+
+     latest = completed[0]
+     html = _CSS + '<div style="padding: 8px;">'
+     html += f"<strong>Latest:</strong> {latest['filename']}<br>"
+     srt = latest.get("output_srt", "")
+     vtt = latest.get("output_vtt", "")
+     if srt and os.path.isfile(srt):
+         rel = os.path.relpath(srt, BASE_DIR).replace("\\", "/")
+         html += f'<a class="dl-btn" href="/file={rel}" download="{Path(srt).name}">Download SRT</a> '
+     if vtt and os.path.isfile(vtt):
+         rel = os.path.relpath(vtt, BASE_DIR).replace("\\", "/")
+         html += f'<a class="dl-btn" href="/file={rel}" download="{Path(vtt).name}">Download VTT</a> '
+     html += "</div>"
+     return html
+
+
+ def _auto_refresh() -> tuple:
+     """Called by Gradio's periodic timer to update all panels."""
+     return _render_monitor(), _render_history(), _render_download_section()
+
+
+ # ═══════════════════════════════════════════════════════════════════════
+ # Gradio UI
+ # ═══════════════════════════════════════════════════════════════════════
+
+ _FOOTER = """
+ <div style="position:fixed;bottom:0;left:0;right:0;padding:6px;
+      background:#f8f8f8;text-align:center;font-size:11px;color:#888;
+      border-top:1px solid #e0e0e0;">
+   WhisperJAV &copy; <a href="https://github.com/meizhong986/WhisperJAV" target="_blank">meizhong986</a>
+   &nbsp;|&nbsp; ChronosJAV pipeline (anime-whisper) &nbsp;|&nbsp;
+   CPU-only &nbsp;|&nbsp; Free HuggingFace Space
+ </div>
+ """
+
+
+ def build_ui() -> gr.Blocks:
+     with gr.Blocks(
+         title="WhisperJAV - Japanese Subtitle Generator",
+         theme=gr.themes.Soft(),
+         css="""
+         footer { visibility: hidden }
+         .app-footer { position: fixed; bottom: 0; left: 0; right: 0; z-index: 100; }
+         """,
+     ) as demo:
+
+         # ── Header ──
+         gr.Markdown("""
+         # WhisperJAV - Japanese Subtitle Generator
+
+         **ChronosJAV** pipeline with `litagin/anime-whisper`, a Whisper large-v3
+         fine-tuned on anime and JAV dialogue. Runs entirely on **CPU** (free tier).
+         First request downloads the model (~3 GB) - please be patient.
+
+         ⏱️ Processing speed: roughly **30-60 min** per hour of video on CPU.
+         """)
+
+         with gr.Tabs():
+             # ── Tab 1: New Task ──────────────────────────────────────
+             with gr.TabItem("New Transcription"):
+                 with gr.Row():
+                     with gr.Column(scale=2):
+                         upload = gr.File(
+                             label="Upload Video or Audio",
+                             file_types=["video", "audio"],
+                             file_count="single",
+                         )
+                         gr.Markdown(
+                             "**Supported**: MP4, MKV, AVI, MOV, WMV, FLV, WAV, MP3, FLAC, M4A\n\n"
+                             "**Pipeline**: ChronosJAV - Text generation + timestamp alignment.\n"
+                             "The anime-whisper model is tuned specifically for Japanese dialogue."
+                         )
+                         submit_btn = gr.Button(
+                             "Start Transcription",
+                             variant="primary",
+                             size="lg",
+                         )
+
+                     with gr.Column(scale=1):
+                         status = gr.Textbox(
+                             label="Status",
+                             value="Ready. Upload a file to begin.",
+                             interactive=False,
+                             lines=3,
+                         )
+                         download_panel = gr.HTML(
+                             value=_CSS + "<div style='text-align:center;padding:12px;color:#999;'>No downloads yet.</div>",
+                         )
+
+                 gr.Markdown("---")
+                 gr.Markdown("### Task Monitor (auto-refreshes every 8 s)")
+                 monitor_html = gr.HTML(value=_render_monitor())
+
+             # ── Tab 2: Download History ──────────────────────────────
+             with gr.TabItem("Download History"):
+                 gr.Markdown("Completed tasks with download links. Click a button to download the subtitle file.")
+                 refresh_hist_btn = gr.Button("Refresh", size="sm")
+                 history_html = gr.HTML(value=_render_history())
+
+         # ── Footer ──
+         gr.HTML(_FOOTER, elem_classes=["app-footer"])
+
+         # ── Events ──
+         submit_btn.click(
+             fn=submit_task,
+             inputs=[upload],
+             outputs=[status, monitor_html, history_html, download_panel],
+         )
+
+         refresh_hist_btn.click(
+             fn=lambda: (_render_history(),),
+             inputs=[],
+             outputs=[history_html],
+         )
+
+         # Periodic auto-refresh of monitor + history + download panel
+         demo.load(
+             fn=_auto_refresh,
+             inputs=[],
+             outputs=[monitor_html, history_html, download_panel],
+             every=8,
+         )
+
+     return demo
+
+
+ # ═══════════════════════════════════════════════════════════════════════
+ # Entry Point
+ # ═══════════════════════════════════════════════════════════════════════
+
+ if __name__ == "__main__":
+     _load()
+     _prune_old_outputs()
+
+     app = build_ui()
+     app.queue(
+         max_size=10,
+         default_concurrency_limit=5,  # allow multiple UI interactions
+     ).launch(
+         server_name="0.0.0.0",
+         server_port=7860,
+         show_error=True,
+     )
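
Reviewer note: `_save()` rewrites `tasks.json` in place, and `_load()` discards the whole store if the file fails to parse, so a restart in the middle of a write could lose the task history. One possible hardening, sketched below under the assumption that the temporary file and `tasks.json` share a filesystem, is to write to a temporary file and swap it in with `os.replace`. The helper names `save_tasks_atomically` and `load_tasks` are hypothetical and are not part of this commit.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Dict


def save_tasks_atomically(tasks: Dict[str, dict], path: Path) -> None:
    """Write the task store to a temp file, then swap it into place.

    Readers see either the old file or the new one, never a partial write.
    """
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(tasks, fh, ensure_ascii=False, indent=2)
        os.replace(tmp_name, path)  # rename over the target in one step
    except BaseException:
        # Do not leave stray temp files behind if anything went wrong.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_tasks(path: Path) -> Dict[str, dict]:
    """Load the store; tasks left in 'running' are marked interrupted,
    mirroring what _load() does in app.py."""
    if not path.exists():
        return {}
    try:
        raw = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, json.JSONDecodeError):
        return {}
    for tid, task in raw.items():
        task.setdefault("id", tid)
        if task.get("status") == "running":
            task["status"] = "interrupted"
    return raw
```

The sketch leaves locking to the caller; in `app.py` the existing `_lock` would still need to guard access to the in-memory dictionary.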
packages.txt ADDED
@@ -0,0 +1,2 @@
+ ffmpeg
+ libsndfile1
requirements.txt ADDED
@@ -0,0 +1,8 @@
+ --index-url https://download.pytorch.org/whl/cpu
+ torch>=2.0.0
+ torchaudio>=2.0.0
+
+ --index-url https://pypi.org/simple/
+ whisperjav[cli,huggingface] @ git+https://github.com/meizhong986/whisperjav.git
+ onnxruntime>=1.16.0
+ gradio>=4.0.0
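
Reviewer note: as far as pip's documentation describes, `--index-url` in a requirements file is a file-wide option and does not apply only to the lines below it. The two `--index-url` lines in `requirements.txt` may therefore not pin `torch` to the CPU wheel index the way the layout suggests; `--extra-index-url` is the usual alternative and is worth verifying against the Space's build log. A small hypothetical check (the function name is illustrative) that flags files with more than one such line:

```python
from typing import List


def find_index_url_lines(requirements_text: str) -> List[int]:
    """Return the 1-based line numbers that set --index-url (or -i).

    More than one hit suggests the author expected per-section scoping,
    which a requirements file does not appear to provide.
    """
    hits: List[int] = []
    for lineno, line in enumerate(requirements_text.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith(("--index-url", "-i ")):
            hits.append(lineno)
    return hits
```

For the file in this commit the check would report two lines, which is the cue to review how the indexes are meant to combine.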