henry99a committed
Commit a013059 · 1 Parent(s): e132ccd

feat: complete port with all 7 pipeline modes, advanced config, pipeline factory

Files changed (2)
  1. README.md +22 -41
  2. app.py +461 -192
README.md CHANGED
@@ -11,49 +11,30 @@ pinned: true
 license: mit
 ---
 
-# WhisperJAV - Japanese Subtitle Generator
+# WhisperJAV - Japanese Subtitle Generator (Full Port)
 
-**ChronosJAV** pipeline powered by `litagin/anime-whisper`.
+Complete port of [WhisperJAV](https://github.com/meizhong986/WhisperJAV) to HuggingFace Spaces.
+All **7 pipeline modes**, ChronosJAV, sensitivity settings, and advanced configuration. CPU-optimized for the free tier.
 
-A HuggingFace Space that brings the [WhisperJAV](https://github.com/meizhong986/WhisperJAV) subtitle generator to the cloud. Optimized for the **free CPU tier**.
+## Pipeline Modes
 
-## Features
-
-- **ChronosJAV Pipeline** - Decoupled text generation + timestamp alignment using anime-whisper
-- **Background Processing** - Tasks run in background threads; close your browser and come back later
-- **Task Monitor** - Real-time view of running, queued, and completed tasks
-- **Download History** - All previously generated subtitle files available for download
-- **CPU Only** - Designed for HuggingFace's free hardware (2 vCPU, 16 GB RAM)
-
-## Usage
-
-1. Upload a video or audio file (MP4, MKV, WAV, MP3, etc.)
-2. Click **Start Transcription**
-3. Wait for processing (30–60 min per hour of video on CPU)
-4. Download the generated `.srt` or `.vtt` subtitle file from the **Download History** tab
-
-## Pipeline
+| Mode | Backend | Best For |
+|------|---------|----------|
+| **anime** | anime-whisper + ChronosJAV | Anime / JAV dialogue |
+| **qwen** | Qwen3-ASR + forced alignment | Maximum accuracy |
+| **balanced** | Faster-Whisper + Silero VAD | Default, noisy content |
+| **fidelity** | OpenAI Whisper + stable-ts | Maximum fidelity |
+| **fast** | Faster-Whisper + auditok | General use, mixed quality |
+| **faster** | Faster-Whisper turbo | Speed, clean audio |
+| **transformers** | Kotoba-Whisper / Qwen | Japanese-optimised models |
 
-The **ChronosJAV** pipeline separates text generation from timestamp alignment:
-
-| Stage | Component |
-|-------|-----------|
-| Audio extraction | FFmpeg (48 kHz) |
-| Scene detection | Semantic (MFCC clustering) |
-| Voice activity detection | WhisperSeg ONNX |
-| Text generation | `litagin/anime-whisper` (Whisper large-v3 fine-tune) |
-| Timestamp alignment | VAD-based (VAD_ONLY mode) |
-| Post-processing | Anime-whisper cleaner (ellipsis filtering) |
-
-## Limitations
-
-- **CPU only** - processing is 5–10× slower than GPU
-- **Japanese only** - optimized for Japanese dialogue; other languages may produce poor results
-- **First run latency** - the anime-whisper model (~3 GB) downloads on first use
-- **Free tier constraints** - 16 GB RAM, 50 GB disk; very long videos (>4 h) may fail
-
-## Credits
+## Features
 
-- [WhisperJAV](https://github.com/meizhong986/WhisperJAV) by MeiZhong
-- [anime-whisper](https://huggingface.co/litagin/anime-whisper) by litagin
-- [ChronusOmni](https://arxiv.org/abs/2512.09841) - Inspiration for the decoupled pipeline architecture
+- **All 7 pipeline modes** with full configuration
+- **Sensitivity settings**: conservative, balanced, aggressive
+- **Scene detection**: semantic, auditok, silero
+- **Voice Activity Detection**: WhisperSeg, Silero, TEN
+- **Background processing** - tasks run in daemon threads
+- **Task monitor** - real-time status with auto-refresh
+- **Download history** - select and download any past subtitle file
+- **Storage Bucket** support - mount `/data` for persistent model cache
app.py CHANGED
@@ -1,14 +1,15 @@
1
  """
2
- WhisperJAV HuggingFace Space - Japanese Subtitle Generator
3
- ==========================================================
4
- ChronosJAV pipeline (anime-whisper) · CPU mode · Free tier
5
- Background task processing · Download history · Real-time monitor
6
 
7
  Architecture:
8
- - Gradio Blocks UI with tabs for task submission and monitoring
9
- - Background threads process transcription tasks without blocking the frontend
10
- - JSON file persists task state across Space restarts
11
- - Model auto-downloads from HuggingFace Hub on first use (~3 GB)
 
12
  """
13
 
14
  from __future__ import annotations
@@ -17,15 +18,13 @@ import json
17
  import os
18
 
19
  # ── Storage Bucket support ──
20
- # Redirect HuggingFace cache to mounted persistent storage so models
21
- # survive Space rebuilds. Falls back to default ~/.cache/huggingface
22
- # if the bucket path is not present.
23
  _BUCKET_HOME = "/data/huggingface"
24
  if os.path.isdir("/data") and os.access("/data", os.W_OK):
25
  os.makedirs(_BUCKET_HOME, exist_ok=True)
26
  os.environ.setdefault("HF_HOME", _BUCKET_HOME)
27
  os.environ.setdefault("HF_HUB_CACHE", os.path.join(_BUCKET_HOME, "hub"))
28
  os.environ.setdefault("TRANSFORMERS_CACHE", os.path.join(_BUCKET_HOME, "hub"))
 
29
  import shutil
30
  import threading
31
  import time
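The bucket check in this hunk can be exercised outside the Space. A minimal sketch, assuming a hypothetical helper `configure_hf_cache` that takes the mount point and an environment mapping as parameters (neither name appears in the commit). It covers only `HF_HOME` and `HF_HUB_CACHE`:

```python
import os
import tempfile
from typing import MutableMapping, Optional


def configure_hf_cache(root: str, env: MutableMapping[str, str]) -> Optional[str]:
    """Point HuggingFace cache variables at <root>/huggingface if root is writable.

    Returns the configured cache home, or None when the mount is absent and
    the library defaults should apply. (Hypothetical helper, not from the commit.)
    """
    if not (os.path.isdir(root) and os.access(root, os.W_OK)):
        return None
    home = os.path.join(root, "huggingface")
    os.makedirs(home, exist_ok=True)
    # setdefault keeps any value that was exported explicitly beforehand.
    env.setdefault("HF_HOME", home)
    env.setdefault("HF_HUB_CACHE", os.path.join(home, "hub"))
    return home


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as fake_bucket:
        env: dict = {}
        print(configure_hf_cache(fake_bucket, env), env)
```

Passing the mapping in, rather than writing to `os.environ` directly, keeps the check testable. In the Space itself the call would have to run before any HuggingFace library is imported, which is why the original block sits above the remaining imports.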
@@ -37,32 +36,70 @@ from typing import Any, Dict, List, Optional
37
 
38
  import gradio as gr
39
 
40
- # ═══════════════════════════════════════════════════════════════════════
41
  # Paths & Configuration
42
- # ═══════════════════════════════════════════════════════════════════════
43
 
44
  BASE_DIR = Path(__file__).resolve().parent
45
  OUTPUT_DIR = BASE_DIR / "outputs"
46
  TEMP_DIR = BASE_DIR / "temp"
47
  UPLOAD_DIR = BASE_DIR / "uploads"
48
  TASKS_FILE = BASE_DIR / "tasks.json"
49
- MAX_OUTPUT_FILES = 20 # keep most recent N task directories; prune older
50
 
51
  OUTPUT_DIR.mkdir(exist_ok=True)
52
  TEMP_DIR.mkdir(exist_ok=True)
53
  UPLOAD_DIR.mkdir(exist_ok=True)
54
 
55
- # ═══════════════════════════════════════════════════════════════════════
56
- # Task Store (memory + JSON-backed)
57
- # ═══════════════════════════════════════════════════════════════════════
58
 
59
  _tasks: Dict[str, dict] = {}
60
  _lock = threading.Lock()
61
- _semaphore = threading.Semaphore(1) # single concurrent CPU task
62
 
63
 
64
  def _load() -> None:
65
- """Load persisted tasks; mark stale 'running' ones as interrupted."""
66
  global _tasks
67
  if not TASKS_FILE.exists():
68
  return
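The removed docstring says `_load` marks stale "running" tasks as interrupted; the body of that step falls outside this hunk. One way the recovery might look, as a sketch with a hypothetical `load_tasks` function that returns the dict instead of assigning a global:

```python
import json
from pathlib import Path
from typing import Dict


def load_tasks(path: Path) -> Dict[str, dict]:
    """Read the task file and mark tasks that were mid-run as interrupted."""
    if not path.exists():
        return {}
    try:
        tasks = json.loads(path.read_text(encoding="utf-8"))
    except (json.JSONDecodeError, OSError):
        # A corrupt or unreadable file should not stop the app from starting.
        return {}
    if not isinstance(tasks, dict):
        return {}
    for task in tasks.values():
        # No worker thread survives a restart, so a "running" task cannot finish.
        if isinstance(task, dict) and task.get("status") == "running":
            task["status"] = "interrupted"
    return tasks
```

Whether "queued" tasks should receive the same treatment is a design choice this sketch leaves open.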
@@ -79,7 +116,6 @@ def _load() -> None:
79
 
80
 
81
  def _save() -> None:
82
- """Persist lightweight view of tasks to JSON."""
83
  with _lock:
84
  slim: Dict[str, dict] = {}
85
  for tid, t in _tasks.items():
@@ -88,6 +124,7 @@ def _save() -> None:
88
  "filename": t.get("filename", ""),
89
  "status": t.get("status", "unknown"),
90
  "pipeline": t.get("pipeline", ""),
 
91
  "created_at": str(t.get("created_at", "")),
92
  "completed_at": str(t.get("completed_at", "")),
93
  "output_srt": t.get("output_srt", ""),
@@ -95,14 +132,10 @@ def _save() -> None:
95
  "error": str(t.get("error", ""))[:500],
96
  "duration_seconds": t.get("duration_seconds", 0),
97
  }
98
- TASKS_FILE.write_text(
99
- json.dumps(slim, ensure_ascii=False, indent=2),
100
- encoding="utf-8",
101
- )
102
 
103
 
104
  def _prune_old_outputs() -> None:
105
- """Remove task output dirs beyond MAX_OUTPUT_FILES to save disk."""
106
  with _lock:
107
  completed = sorted(
108
  [t for t in _tasks.values() if t.get("status") == "completed"],
@@ -118,13 +151,144 @@ def _prune_old_outputs() -> None:
118
  pass
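`_prune_old_outputs` keeps the newest `MAX_OUTPUT_FILES` task directories. The sketch below shows the same keep-newest-N idea in isolation. Note the swapped technique: it ranks directories by filesystem modification time, whereas the commit selects from the task store. `prune_dirs` is a hypothetical name.

```python
import shutil
from pathlib import Path
from typing import List


def prune_dirs(root: Path, keep: int) -> List[Path]:
    """Delete all but the `keep` most recently modified subdirectories of root."""
    keep = max(keep, 0)
    dirs = sorted(
        (p for p in root.iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )
    removed = dirs[keep:]
    for stale in removed:
        shutil.rmtree(stale, ignore_errors=True)
    return removed
```

Ranking by mtime needs no task metadata, but it would also delete directories belonging to tasks that are still running, so a real implementation would probably keep the status filter visible in the diff.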
119
 
120
 
121
- # ═══════════════════════════════════════════════════════════════════════
122
  # Background Worker
123
- # ═══════════════════════════════════════════════════════════════════════
124
 
125
  def _run_transcription(task_id: str, video_path: str) -> None:
126
- """Called in a daemon thread. Loads whisperjav lazily so that the
127
- Gradio UI can start serving immediately while models download."""
128
  try:
129
  with _lock:
130
  _tasks[task_id]["status"] = "running"
@@ -132,29 +296,22 @@ def _run_transcription(task_id: str, video_path: str) -> None:
132
 
133
  t0 = time.time()
134
  vp = Path(video_path)
135
-
136
- # Use the original filename (without upload prefix) for output naming
137
- original_filename = _tasks.get(task_id, {}).get("filename", vp.name)
138
  basename = Path(original_filename).stem
 
 
139
 
140
  task_out = OUTPUT_DIR / task_id
141
  task_tmp = TEMP_DIR / task_id
142
  task_out.mkdir(parents=True, exist_ok=True)
143
  task_tmp.mkdir(parents=True, exist_ok=True)
144
 
145
- # ── lazy import (model download happens here on first call) ──
146
- from whisperjav.pipelines.qwen_pipeline import QwenPipeline
147
-
148
- pipeline = QwenPipeline(
149
- generator_backend="anime-whisper",
150
- model_id="litagin/anime-whisper",
151
- device="cpu",
152
- dtype="float32",
153
- scene_detector="semantic",
154
- speech_segmenter="whisperseg",
155
- language="Japanese",
156
  output_dir=str(task_out),
157
  temp_dir=str(task_tmp),
 
158
  )
159
 
160
  result = pipeline.process({"path": str(vp), "basename": basename})
@@ -162,7 +319,7 @@ def _run_transcription(task_id: str, video_path: str) -> None:
162
 
163
  elapsed = round(time.time() - t0, 1)
164
 
165
- # ── copy final artefacts ──
166
  srt_final = ""
167
  vtt_final = ""
168
  srt_src = result.get("srt_path", "")
@@ -171,12 +328,17 @@ def _run_transcription(task_id: str, video_path: str) -> None:
171
  shutil.copy2(srt_src, dst)
172
  srt_final = str(dst)
173
 
174
- # Check for sidecar VTT (some configurations emit both)
175
  vtt_candidate = task_out / f"{basename}.vtt"
176
  if vtt_candidate.is_file():
177
  vtt_final = str(vtt_candidate)
178
 
179
- # ── cleanup temp dir ──
 
 
 
 
 
 
180
  try:
181
  shutil.rmtree(task_tmp, ignore_errors=True)
182
  except Exception:
@@ -206,33 +368,30 @@ def _run_transcription(task_id: str, video_path: str) -> None:
206
  _semaphore.release()
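The `_semaphore.release()` here pairs with the non-blocking `acquire` in `submit_task`. A compact sketch of that single-worker gate, with acquire and release kept inside one hypothetical helper `try_start` so that the permit is always returned when the work ends:

```python
import threading
from typing import Callable, Optional

_gate = threading.Semaphore(1)  # one CPU-bound job at a time


def try_start(work: Callable[[], None]) -> Optional[threading.Thread]:
    """Run `work` in a daemon thread if the gate is free; return None if busy."""
    if not _gate.acquire(blocking=False):
        return None

    def runner() -> None:
        try:
            work()
        finally:
            # Released even when `work` raises, so the gate cannot stay locked.
            _gate.release()

    thread = threading.Thread(target=runner, daemon=True)
    thread.start()
    return thread
```

A caller would treat a `None` result the way `submit_task` treats a failed acquire: report that another task is already processing.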
207
 
208
 
209
- # ═══════════════════════════════════════════════════════════════════════
210
  # Callbacks
211
- # ═══════════════════════════════════════════════════════════════════════
212
-
213
- def submit_task(video_file) -> tuple:
214
- """Kick off a new transcription task."""
 
 
 
 
215
  if video_file is None:
216
  return (
217
- gr.update(value="Please upload a video or audio file first.", visible=True),
218
- _render_monitor(),
219
- _render_history(),
220
- None,
221
- _get_completed_filenames(),
222
  )
223
 
224
  if not _semaphore.acquire(blocking=False):
225
  return (
226
- gr.update(value="Another task is already processing. Please wait for it to finish.", visible=True),
227
- _render_monitor(),
228
- _render_history(),
229
- None,
230
- _get_completed_filenames(),
231
  )
232
 
233
  tid = uuid.uuid4().hex[:12]
234
 
235
- # Gradio 4.x may return str, dict, or file-like object
236
  if isinstance(video_file, str):
237
  src_path = video_file
238
  elif isinstance(video_file, dict):
@@ -240,37 +399,52 @@ def submit_task(video_file) -> tuple:
240
  else:
241
  src_path = getattr(video_file, "name", "")
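The three branches above normalise whatever Gradio hands to the callback. A sketch of the same normalisation as a standalone function follows. The dict branch is cut off by the hunk boundary, so the keys tried here (`"path"`, then `"name"`) are an assumption, and `resolve_upload_path` is a hypothetical name.

```python
import os
from typing import Any, Optional


def resolve_upload_path(upload: Any) -> Optional[str]:
    """Return a readable file path for an upload value, or None if there is none."""
    if isinstance(upload, str):
        candidate = upload
    elif isinstance(upload, dict):
        # Assumed keys; the commit's dict branch is not visible in this diff.
        candidate = upload.get("path") or upload.get("name") or ""
    else:
        candidate = getattr(upload, "name", "")
    if candidate and os.path.isfile(candidate):
        return candidate
    return None
```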
242
 
243
- if not src_path or not os.path.isfile(src_path):
244
- _semaphore.release()
245
- return (
246
- gr.update(value="Upload failed - could not read file path.", visible=True),
247
- _render_monitor(),
248
- _render_history(),
249
- None,
250
- _get_completed_filenames(),
251
- )
252
 
253
  fname = Path(src_path).name
254
 
255
- # Warn if file is very large (>2 GB) - may cause OOM on free tier
256
  file_size_mb = os.path.getsize(src_path) / (1024 * 1024)
257
  size_warning = ""
258
  if file_size_mb > 2048:
259
- size_warning = (
260
- f" (Warning: file is {file_size_mb:.0f} MB. "
261
- "Files >2 GB may fail on the 16 GB free tier.)"
262
- )
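The size check can be isolated into a small function for testing. A sketch, assuming a hypothetical `size_warning` helper with the 2048 MB threshold as a parameter. The message wording is simplified and is not the commit's exact string.

```python
import os


def size_warning(path: str, limit_mb: float = 2048) -> str:
    """Return a warning suffix when the file exceeds limit_mb, else an empty string."""
    size_mb = os.path.getsize(path) / (1024 * 1024)
    if size_mb > limit_mb:
        return f" (Warning: file is {size_mb:.0f} MB; files over {limit_mb:.0f} MB may fail.)"
    return ""
```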
263
 
264
- # Copy to persistent upload location so it survives Gradio tmpdir cleanup
265
  persistent = UPLOAD_DIR / f"{tid}_{fname}"
266
  shutil.copy2(src_path, persistent)
267
 
268
  with _lock:
269
  _tasks[tid] = {
270
  "id": tid,
271
  "filename": fname,
272
  "status": "queued",
273
- "pipeline": "ChronosJAV (anime-whisper)",
 
274
  "created_at": datetime.now(timezone.utc).isoformat(),
275
  "completed_at": "",
276
  "output_srt": "",
@@ -280,84 +454,62 @@ def submit_task(video_file) -> tuple:
280
  }
281
  _save()
282
 
283
- threading.Thread(
284
- target=_run_transcription,
285
- args=(tid, str(persistent)),
286
- daemon=True,
287
- ).start()
288
 
289
  return (
290
- gr.update(value=f"Submitted: {fname} (ID: `{tid}`){size_warning}", visible=True),
291
- _render_monitor(),
292
- _render_history(),
293
- None,
294
- _get_completed_filenames(),
295
  )
296
 
297
 
298
- # ── HTML renderers ────────────────────────────────────────────────────
299
 
300
  _STATUS_COLORS = {
301
- "queued": "#f0ad4e",
302
- "running": "#5bc0de",
303
- "completed": "#5cb85c",
304
- "failed": "#d9534f",
305
- "interrupted": "#999",
306
  }
307
  _STATUS_ICONS = {
308
- "queued": "⏱", # ⏳
309
- "running": "🔄", # 🔄
310
- "completed": "✅", # ✅
311
- "failed": "❌", # ❌
312
- "interrupted": "⏸", # ⏸
313
  }
314
 
315
- _CSS = """
316
- <style>
317
- .tr { font-family: 'SF Mono','Consolas',monospace; font-size: 12px; }
318
- .tr-card {
319
- border: 1px solid #e0e0e0; margin: 4px 0; padding: 8px 12px;
320
- border-radius: 6px; background: #fafafa;
321
- }
322
  .tr-card .head { display:flex; justify-content:space-between; align-items:flex-start; }
323
- .tr-card .meta { color: #666; margin-top: 3px; font-size: 11px; }
324
- .tr-card .dl { margin-top: 6px; }
325
- .dl-btn {
326
- display: inline-block; margin: 2px 4px 2px 0; padding: 2px 10px;
327
- background: #28a745; color: #fff; text-decoration: none;
328
- border-radius: 4px; font-size: 12px;
329
- }
330
- .dl-btn:hover { background: #218838; }
331
- .hist-table { width: 100%; border-collapse: collapse; font-size: 12px; }
332
- .hist-table th { background: #2c3e50; color: #fff; padding: 8px; text-align: left; }
333
- .hist-table td { padding: 6px 8px; border-bottom: 1px solid #ddd; }
334
- .hist-table tr:hover { background: #f0f0f0; }
335
- </style>
336
- """
337
 
338
 
339
  def _render_monitor() -> str:
340
- """Return HTML for the real-time task monitor (all tasks, newest first)."""
341
  with _lock:
342
  items = list(_tasks.values())
343
  if not items:
344
  return _CSS + "<div style='text-align:center;padding:24px;color:#999;'>No tasks yet. Upload a file to start.</div>"
345
-
346
  items.sort(key=lambda t: str(t.get("created_at", "")), reverse=True)
347
  html = _CSS + '<div class="tr">'
348
  for t in items[:40]:
349
  st = t.get("status", "unknown")
350
  color = _STATUS_COLORS.get(st, "#999")
351
  icon = _STATUS_ICONS.get(st, "?")
352
- html += f"""
353
- <div class="tr-card" style="border-left:4px solid {color};">
 
 
 
 
 
 
354
  <div class="head">
355
  <strong>{icon} {t.get('filename','?')[:55]}</strong>
356
  <span style="color:{color};font-weight:700;white-space:nowrap;">{st.upper()}</span>
357
  </div>
358
  <div class="meta">
359
- ID: {t.get('id','?')} &nbsp;|&nbsp; {t.get('pipeline','')}
360
- &nbsp;|&nbsp; {str(t.get('created_at',''))[:19]}
361
  </div>"""
362
  if st == "completed":
363
  html += f'<div class="meta" style="color:#28a745;">Completed in {t.get("duration_seconds",0)}s</div>'
@@ -365,94 +517,107 @@ def _render_monitor() -> str:
365
  err = str(t.get("error", ""))[:250].replace("<", "&lt;").replace(">", "&gt;")
366
  html += f'<div class="meta" style="color:#d9534f;">{err}</div>'
367
  html += "</div>"
368
-
369
  html += "</div>"
370
  return html
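`_render_monitor` escapes the error text by replacing `<` and `>` by hand, while the filename is interpolated as-is. The standard library's `html.escape` covers `&` and quotes as well. A sketch with a hypothetical `render_error` helper that mirrors the error line's markup:

```python
from html import escape


def render_error(message: str, limit: int = 250) -> str:
    """Truncate, then escape, so a cut never lands inside an HTML entity."""
    safe = escape(message[:limit])
    return f'<div class="meta" style="color:#d9534f;">{safe}</div>'
```

The same call could wrap the filename before it is placed inside `<strong>`, since filenames come from user uploads.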
371
 
372
 
373
  def _render_history() -> str:
374
- """Return an HTML table of completed tasks."""
375
  with _lock:
376
  completed = [t for t in _tasks.values() if t.get("status") == "completed"]
377
  if not completed:
378
  return _CSS + "<div style='text-align:center;padding:24px;color:#999;'>No completed tasks yet.</div>"
379
-
380
  completed.sort(key=lambda t: str(t.get("completed_at", "")), reverse=True)
381
  html = _CSS + '<table class="hist-table"><thead><tr>'
382
- html += "<th>File</th><th>Duration</th><th>Completed</th>"
383
  html += "</tr></thead><tbody>"
384
-
385
  for t in completed[:MAX_OUTPUT_FILES]:
386
  ca = str(t.get("completed_at", ""))[:19]
387
- html += f"<tr><td>{t.get('filename','')[:55]}</td><td>{t.get('duration_seconds',0)}s</td><td>{ca}</td></tr>"
388
-
389
  html += "</tbody></table>"
390
  return html
391
 
392
 
393
  def _get_latest_srt() -> Optional[str]:
394
- """Return the file path of the most recently completed task's SRT."""
395
  with _lock:
396
  completed = sorted(
397
  [t for t in _tasks.values() if t.get("status") == "completed"],
398
- key=lambda t: str(t.get("completed_at", "")),
399
- reverse=True,
400
  )
401
  if not completed:
402
  return None
403
  srt = completed[0].get("output_srt", "")
404
- if srt and os.path.isfile(srt):
405
- return srt
406
- return None
407
 
408
 
409
  def _get_task_file(task_filename: str) -> Optional[str]:
410
- """Given a completed task's display name, return its SRT path."""
 
411
  with _lock:
412
  for t in _tasks.values():
413
  if t.get("filename") == task_filename and t.get("status") == "completed":
414
  srt = t.get("output_srt", "")
415
- if srt and os.path.isfile(srt):
416
- return srt
417
  return None
418
 
419
 
420
  def _get_completed_filenames() -> List[str]:
421
- """Return list of completed task filenames for dropdown."""
422
  with _lock:
423
  completed = sorted(
424
  [t for t in _tasks.values() if t.get("status") == "completed"],
425
- key=lambda t: str(t.get("completed_at", "")),
426
- reverse=True,
427
  )
428
  return [t.get("filename", "?") for t in completed]
429
 
430
 
431
  def _auto_refresh() -> tuple:
432
- """Called by Gradio's periodic timer to update all panels."""
433
  latest = _get_latest_srt()
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
434
  return (
435
- _render_monitor(),
436
- _render_history(),
437
- latest if latest else None,
438
- _get_completed_filenames(),
439
  )
440
 
441
 
442
- # ═══════════════════════════════════════════════════════════════════════
443
  # Gradio UI
444
- # ═══════════════════════════════════════════════════════════════════════
445
 
446
  _FOOTER = """
447
  <div style="position:fixed;bottom:0;left:0;right:0;padding:6px;
448
  background:#f8f8f8;text-align:center;font-size:11px;color:#888;
449
  border-top:1px solid #e0e0e0;">
450
  WhisperJAV &copy; <a href="https://github.com/meizhong986/WhisperJAV" target="_blank">meizhong986</a>
451
- &nbsp;|&nbsp; ChronosJAV pipeline (anime-whisper) &nbsp;|&nbsp;
452
- CPU-only &nbsp;|&nbsp; Free HuggingFace Space
453
  </div>
454
  """
455
 
 
 
 
 
 
 
 
 
 
 
 
456
 
457
  def build_ui() -> gr.Blocks:
458
  with gr.Blocks(
@@ -461,54 +626,147 @@ def build_ui() -> gr.Blocks:
461
  css="""
462
  footer { visibility: hidden }
463
  .app-footer { position: fixed; bottom: 0; left: 0; right: 0; z-index: 100; }
 
464
  """,
465
  ) as demo:
466
 
467
- # ── Header ──
468
  gr.Markdown("""
469
  # WhisperJAV - Japanese Subtitle Generator
470
 
471
- **ChronosJAV** pipeline with `litagin/anime-whisper` - a Whisper large-v3
472
- fine-tuned on anime and JAV dialogue. Runs entirely on **CPU** (free tier).
473
- First request downloads the model (~3 GB) - please be patient.
474
-
475
- ⏱️ Processing speed: roughly **30-60 min** per hour of video on CPU.
476
  """)
477
 
478
  with gr.Tabs():
479
- # ── Tab 1: New Task ──────────────────────────────────────
480
  with gr.Tab("New Transcription"):
481
  with gr.Row():
 
482
  with gr.Column(scale=2):
483
  upload = gr.File(
484
  label="Upload Video or Audio",
485
  file_types=["video", "audio"],
486
  file_count="single",
487
  )
488
- gr.Markdown(
489
- "**Supported**: MP4, MKV, AVI, MOV, WMV, FLV, WAV, MP3, FLAC, M4A\n\n"
490
- "**Pipeline**: ChronosJAV - Text generation + timestamp alignment.\n"
491
- "The anime-whisper model is tuned specifically for Japanese dialogue."
492
- )
493
- submit_btn = gr.Button(
494
- "Start Transcription",
495
- variant="primary",
496
- size="lg",
497
  )
498
 
 
499
  with gr.Column(scale=1):
500
  status = gr.Textbox(
501
  label="Status",
502
  value="Ready. Upload a file to begin.",
503
  interactive=False,
504
- lines=3,
505
  )
506
  latest_download = gr.File(
507
  label="Latest Subtitle",
508
  interactive=False,
509
- visible=True,
510
  )
511
 
512
  gr.Markdown("---")
513
  gr.Markdown("### Task Monitor (auto-refreshes every 8 s)")
514
  monitor_html = gr.HTML(value=_render_monitor())
@@ -517,27 +775,42 @@ def build_ui() -> gr.Blocks:
517
  with gr.Tab("Download History"):
518
  gr.Markdown("Pick a completed task, then download its subtitle file.")
519
  with gr.Row():
520
- with gr.Column(scale=1):
521
- hist_dropdown = gr.Dropdown(
522
- label="Select Completed Task",
523
- choices=_get_completed_filenames(),
524
- interactive=True,
525
- )
526
- with gr.Column(scale=1):
527
- hist_download = gr.File(
528
- label="Subtitle File",
529
- interactive=False,
530
- )
531
  gr.Markdown("---")
532
  history_html = gr.HTML(value=_render_history())
533
 
534
  # ── Footer ──
535
  gr.HTML(_FOOTER, elem_classes=["app-footer"])
536
 
537
- # ── Events ──
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
538
  submit_btn.click(
539
  fn=submit_task,
540
- inputs=[upload],
 
 
 
 
 
541
  outputs=[status, monitor_html, history_html, latest_download, hist_dropdown],
542
  )
543
 
@@ -547,23 +820,19 @@ def build_ui() -> gr.Blocks:
547
  outputs=[hist_download],
548
  )
549
 
550
- # Auto-refresh every 8 seconds (Gradio 5.x Timer API)
551
  timer = gr.Timer(8, active=True)
552
  timer.tick(fn=_auto_refresh, outputs=[monitor_html, history_html, latest_download, hist_dropdown])
553
 
554
  return demo
555
 
556
 
557
- # ═══════════════════════════════════════════════════════════════════════
558
  # Entry Point
559
- # ═══════════════════════════════════════════════════════════════════════
560
 
561
  if __name__ == "__main__":
562
  _load()
563
  _prune_old_outputs()
564
 
565
  app = build_ui()
566
- app.queue(
567
- max_size=10,
568
- default_concurrency_limit=5,
569
- ).launch()
 
1
  """
2
+ WhisperJAV HuggingFace Space - Complete Japanese Subtitle Generator
3
+ ====================================================================
4
+ Full port with all 7 pipeline modes, sensitivity settings, and
5
+ configuration options. CPU-optimized for free HuggingFace tier.
6
 
7
  Architecture:
8
+ - Gradio Blocks UI with full configuration panel
9
+ - Pipeline factory maps mode selection to correct pipeline class
10
+ - Background threads for transcription (non-blocking)
11
+ - JSON-backed task store with download history
12
+ - Auto-detects /data Storage Bucket for persistent model cache
13
  """
14
 
15
  from __future__ import annotations
 
18
  import os
19
 
20
  # ── Storage Bucket support ──
 
 
 
21
  _BUCKET_HOME = "/data/huggingface"
22
  if os.path.isdir("/data") and os.access("/data", os.W_OK):
23
  os.makedirs(_BUCKET_HOME, exist_ok=True)
24
  os.environ.setdefault("HF_HOME", _BUCKET_HOME)
25
  os.environ.setdefault("HF_HUB_CACHE", os.path.join(_BUCKET_HOME, "hub"))
26
  os.environ.setdefault("TRANSFORMERS_CACHE", os.path.join(_BUCKET_HOME, "hub"))
27
+
28
  import shutil
29
  import threading
30
  import time
 
36
 
37
  import gradio as gr
38
 
39
+ # ═══════════════════════════════════════════════════════════════════════════
40
  # Paths & Configuration
41
+ # ═══════════════════════════════════════════════════════════════════════════
42
 
43
  BASE_DIR = Path(__file__).resolve().parent
44
  OUTPUT_DIR = BASE_DIR / "outputs"
45
  TEMP_DIR = BASE_DIR / "temp"
46
  UPLOAD_DIR = BASE_DIR / "uploads"
47
  TASKS_FILE = BASE_DIR / "tasks.json"
48
+ MAX_OUTPUT_FILES = 20
49
 
50
  OUTPUT_DIR.mkdir(exist_ok=True)
51
  TEMP_DIR.mkdir(exist_ok=True)
52
  UPLOAD_DIR.mkdir(exist_ok=True)
53
 
54
+ # ═══════════════════════════════════════════════════════════════════════════
55
+ # Pipeline Config Registry
56
+ # ═══════════════════════════════════════════════════════════════════════════
57
+
58
+ PIPELINE_MODES = [
59
+ "anime", # ChronosJAV - anime-whisper, text gen + VAD alignment
60
+ "qwen", # Qwen3-ASR with forced alignment
61
+ "balanced", # Faster-Whisper + auditok + Silero VAD (default)
62
+ "fidelity", # OpenAI Whisper + auditok + Silero VAD (max accuracy)
63
+ "fast", # Faster-Whisper + auditok (general use)
64
+ "faster", # Faster-Whisper turbo (speed, clean audio)
65
+ "transformers", # HuggingFace Kotoba-Whisper models
66
+ ]
67
+
68
+ PIPELINE_INFO = {
69
+ "anime": "ChronosJAV - anime-whisper (text gen + VAD alignment). Best for anime/JAV dialogue.",
70
+ "qwen": "Qwen3-ASR with forced word-level alignment. High accuracy, slower.",
71
+ "balanced": "Faster-Whisper + auditok + Silero VAD. Good default for noisy, dialogue-heavy content.",
72
+ "fidelity": "OpenAI Whisper + stable-ts. Maximum accuracy, slowest.",
73
+ "fast": "Faster-Whisper + auditok. Good for mixed quality audio.",
74
+ "faster": "Faster-Whisper turbo, no scene detection. Fastest, for clean audio.",
75
+ "transformers": "HuggingFace Kotoba-Whisper (Japanese-optimised). Supports HF and Qwen backends.",
76
+ }
77
+
78
+ SENSITIVITY_OPTIONS = ["balanced", "aggressive", "conservative"]
79
+ LANGUAGE_OPTIONS = ["Japanese", "auto"]
80
+ OUTPUT_FORMATS = ["srt", "vtt", "both"]
81
+ SCENE_DETECTORS = ["semantic", "auditok", "silero", "none"]
82
+ SPEECH_SEGMENTERS = ["whisperseg", "silero", "ten", "none"]
83
+ QWEEN_GENERATORS = ["qwen3", "anime-whisper", "cohere"]
84
+ QWEEN_MODES = ["assembly", "context_aware", "vad_slicing"]
85
+ TRANSFORMERS_BACKENDS = ["hf", "qwen"]
86
+ TRANSFORMERS_MODELS = [
87
+ "kotoba-tech/kotoba-whisper-bilingual-v1.0",
88
+ "kotoba-tech/kotoba-whisper-v2.0",
89
+ "kotoba-tech/kotoba-whisper-v2.1",
90
+ "kotoba-tech/kotoba-whisper-v2.2",
91
+ ]
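The option lists above define what the UI offers, but values read back from the task store could still drift from them. A sketch of a guard that rejects unknown choices before a pipeline is built follows. `validate_config` and the `ALLOWED` table are hypothetical, and only `scene_detector` and `speech_segmenter` are key names that appear in the commit's factory; the other two keys are assumed.

```python
from typing import Dict, Sequence

# Values copied from the option lists in this commit.
ALLOWED: Dict[str, Sequence[str]] = {
    "sensitivity": ("balanced", "aggressive", "conservative"),
    "scene_detector": ("semantic", "auditok", "silero", "none"),
    "speech_segmenter": ("whisperseg", "silero", "ten", "none"),
    "output_format": ("srt", "vtt", "both"),
}


def validate_config(config: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of config; raise ValueError on any unknown choice."""
    for key, value in config.items():
        choices = ALLOWED.get(key)
        # Keys without a registered choice list pass through unchecked.
        if choices is not None and value not in choices:
            raise ValueError(f"{key}={value!r} is not one of {list(choices)}")
    return dict(config)
```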
92
+
93
+ # ═══════════════════════════════════════════════════════════════════════════
94
+ # Task Store
95
+ # ═══════════════════════════════════════════════════════════════════════════
96
 
97
  _tasks: Dict[str, dict] = {}
98
  _lock = threading.Lock()
99
+ _semaphore = threading.Semaphore(1)
100
 
101
 
102
  def _load() -> None:
 
103
  global _tasks
104
  if not TASKS_FILE.exists():
105
  return
 
116
 
117
 
118
  def _save() -> None:
 
119
  with _lock:
120
  slim: Dict[str, dict] = {}
121
  for tid, t in _tasks.items():
 
124
  "filename": t.get("filename", ""),
125
  "status": t.get("status", "unknown"),
126
  "pipeline": t.get("pipeline", ""),
127
+ "config": t.get("config", ""),
128
  "created_at": str(t.get("created_at", "")),
129
  "completed_at": str(t.get("completed_at", "")),
130
  "output_srt": t.get("output_srt", ""),
 
132
  "error": str(t.get("error", ""))[:500],
133
  "duration_seconds": t.get("duration_seconds", 0),
134
  }
135
+ TASKS_FILE.write_text(json.dumps(slim, ensure_ascii=False, indent=2), encoding="utf-8")
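`write_text` truncates `tasks.json` before writing, so a Space restart in the middle of a save could leave a partial file. A common alternative is to write to a temporary file in the same directory and then rename it over the target. This is a sketch of that technique, not what the commit does, and `write_json_atomic` is a hypothetical name.

```python
import json
import os
import tempfile
from pathlib import Path


def write_json_atomic(path: Path, payload: dict) -> None:
    """Write payload as JSON so readers see the old or the new file, never half."""
    fd, tmp_name = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(payload, fh, ensure_ascii=False, indent=2)
            fh.flush()
            os.fsync(fh.fileno())
        # os.replace overwrites the destination in a single step.
        os.replace(tmp_name, path)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```

The temporary file is created next to the target because a rename is only atomic within one filesystem.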
 
 
 
136
 
137
 
138
  def _prune_old_outputs() -> None:
 
139
  with _lock:
140
  completed = sorted(
141
  [t for t in _tasks.values() if t.get("status") == "completed"],
 
151
  pass
152
 
153
 
+# ═══════════════════════════════════════════════════════════════════════════
+# Pipeline Factory
+# ═══════════════════════════════════════════════════════════════════════════
+
+def _build_pipeline(mode: str, output_dir: str, temp_dir: str, **kwargs):
+    """Create the appropriate whisperjav pipeline instance for the given mode."""
+    device = "cpu"
+    dtype = "float32"
+    language = kwargs.get("language", "Japanese")
+
+    if mode == "faster":
+        from whisperjav.pipelines.faster_pipeline import FasterPipeline
+        return FasterPipeline(
+            output_dir=output_dir,
+            temp_dir=temp_dir,
+            keep_temp_files=False,
+            subs_language="native",
+            resolved_config={
+                "provider": {"device": device, "compute_type": dtype},
+                "scene_detection": {"method": "none"},
+                "vad": {"enabled": False},
+                "transcription": {"language": language},
+            },
+        )
+
+    elif mode == "fast":
+        from whisperjav.pipelines.fast_pipeline import FastPipeline
+        return FastPipeline(
+            output_dir=output_dir,
+            temp_dir=temp_dir,
+            keep_temp_files=False,
+            subs_language="native",
+            resolved_config={
+                "provider": {"device": device, "compute_type": dtype},
+                "scene_detection": {"method": kwargs.get("scene_detector", "auditok")},
+                "vad": {"enabled": False},
+                "transcription": {"language": language},
+            },
+        )
+
+    elif mode == "balanced":
+        from whisperjav.pipelines.balanced_pipeline import BalancedPipeline
+        return BalancedPipeline(
+            output_dir=output_dir,
+            temp_dir=temp_dir,
+            keep_temp_files=False,
+            subs_language="native",
+            resolved_config={
+                "provider": {"device": device, "compute_type": dtype},
+                "scene_detection": {"method": kwargs.get("scene_detector", "auditok")},
+                "vad": {
+                    "enabled": kwargs.get("speech_segmenter", "silero") != "none",
+                    "method": kwargs.get("speech_segmenter", "silero"),
+                },
+                "transcription": {"language": language},
+            },
+        )
+
+    elif mode == "fidelity":
+        from whisperjav.pipelines.fidelity_pipeline import FidelityPipeline
+        return FidelityPipeline(
+            output_dir=output_dir,
+            temp_dir=temp_dir,
+            keep_temp_files=False,
+            subs_language="native",
+            resolved_config={
+                "provider": {"device": device, "compute_type": dtype},
+                "scene_detection": {"method": kwargs.get("scene_detector", "auditok")},
+                "vad": {
+                    "enabled": kwargs.get("speech_segmenter", "silero") != "none",
+                    "method": kwargs.get("speech_segmenter", "silero"),
+                },
+                "transcription": {"language": language},
+            },
+        )
+
+    elif mode == "transformers":
+        from whisperjav.pipelines.transformers_pipeline import TransformersPipeline
+        backend = kwargs.get("transformers_backend", "hf")
+        hf_lang = None if language == "auto" else (language[:2].lower() if language != "Japanese" else "ja")
+        return TransformersPipeline(
+            output_dir=output_dir,
+            temp_dir=temp_dir,
+            keep_temp_files=False,
+            subs_language="native",
+            asr_backend=backend,
+            hf_model_id=kwargs.get("hf_model_id", "kotoba-tech/kotoba-whisper-bilingual-v1.0"),
+            hf_device=device,
+            hf_dtype=dtype,
+            hf_language=hf_lang or "ja",
+            qwen_device=device,
+            qwen_dtype=dtype,
+        )
+
+    elif mode == "qwen":
+        from whisperjav.pipelines.qwen_pipeline import QwenPipeline
+        generator = kwargs.get("qwen_generator", "qwen3")
+        model_map = {
+            "qwen3": "Qwen/Qwen3-ASR-1.7B",
+            "anime-whisper": "litagin/anime-whisper",
+            "cohere": "CohereLabs/cohere-transcribe-03-2026",
+        }
+        return QwenPipeline(
+            generator_backend=generator,
+            model_id=kwargs.get("qwen_model_id", model_map.get(generator, model_map["qwen3"])),
+            device=device,
+            dtype=dtype,
+            scene_detector=kwargs.get("scene_detector", "semantic"),
+            speech_segmenter=kwargs.get("speech_segmenter", "whisperseg"),
+            language=None if language == "auto" else language,
+            qwen_input_mode=kwargs.get("qwen_mode", "assembly"),
+            output_dir=output_dir,
+            temp_dir=temp_dir,
+        )
+
+    elif mode == "anime":
+        from whisperjav.pipelines.qwen_pipeline import QwenPipeline
+        return QwenPipeline(
+            generator_backend="anime-whisper",
+            model_id="litagin/anime-whisper",
+            device=device,
+            dtype=dtype,
+            scene_detector=kwargs.get("scene_detector", "semantic"),
+            speech_segmenter=kwargs.get("speech_segmenter", "whisperseg"),
+            language=None if language == "auto" else language,
+            output_dir=output_dir,
+            temp_dir=temp_dir,
+        )
+
+    else:
+        raise ValueError(f"Unknown pipeline mode: {mode}")
285
+
286
+
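The `transformers` branch above derives `hf_language` by truncating the UI language name to two letters. That matches the ISO 639-1 code for names such as Japanese, English and Korean, but not for every language (Chinese would become `ch` rather than `zh`). A minimal sketch of an explicit lookup follows. The helper name `to_hf_language` and the table contents are assumptions, since `LANGUAGE_OPTIONS` is not shown in this diff.

```python
from typing import Optional

# Hypothetical table: extend it to match whatever LANGUAGE_OPTIONS contains.
_LANGUAGE_CODES = {
    "japanese": "ja",
    "english": "en",
    "korean": "ko",
    "chinese": "zh",
}


def to_hf_language(language: str, default: str = "ja") -> Optional[str]:
    """Map a UI language label to a two-letter code; "auto" maps to None."""
    if language == "auto":
        return None
    key = language.strip().lower()
    if key in _LANGUAGE_CODES:
        return _LANGUAGE_CODES[key]
    # Already a two-letter code such as "ja": pass it through unchanged.
    if len(key) == 2 and key.isalpha():
        return key
    # Unknown label: fall back to the default instead of guessing a code.
    return default
```

With a helper like this, the call site could keep its existing `hf_lang or "ja"` fallback for the `"auto"` case.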
287
+ # ═══════════════════════════════════════════════════════════════════════════
288
  # Background Worker
289
+ # ═══════════════════════════════════════════════════════════════════════════
290
 
291
  def _run_transcription(task_id: str, video_path: str) -> None:
 
 
292
  try:
293
  with _lock:
294
  _tasks[task_id]["status"] = "running"
 
296
 
297
  t0 = time.time()
298
  vp = Path(video_path)
299
+ task = _tasks.get(task_id, {})
300
+ original_filename = task.get("filename", vp.name)
 
301
  basename = Path(original_filename).stem
302
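+ mode = str(task.get("pipeline", "anime")).split(" (")[0]  # "pipeline" holds a display label such as "qwen (qwen3)"; keep only the bare mode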
303
+ config = task.get("config", {})
304
 
305
  task_out = OUTPUT_DIR / task_id
306
  task_tmp = TEMP_DIR / task_id
307
  task_out.mkdir(parents=True, exist_ok=True)
308
  task_tmp.mkdir(parents=True, exist_ok=True)
309
 
310
+ pipeline = _build_pipeline(
311
+ mode=mode,
 
 
 
 
 
 
 
 
 
312
  output_dir=str(task_out),
313
  temp_dir=str(task_tmp),
314
+ **config,
315
  )
316
 
317
  result = pipeline.process({"path": str(vp), "basename": basename})
 
319
 
320
  elapsed = round(time.time() - t0, 1)
321
 
322
+ # Copy output files
323
  srt_final = ""
324
  vtt_final = ""
325
  srt_src = result.get("srt_path", "")
 
328
  shutil.copy2(srt_src, dst)
329
  srt_final = str(dst)
330
 
 
331
  vtt_candidate = task_out / f"{basename}.vtt"
332
  if vtt_candidate.is_file():
333
  vtt_final = str(vtt_candidate)
334
 
335
+ # Also look for whisperjav-named files
336
+ for f in task_out.iterdir():
337
+ if f.suffix == ".srt" and not srt_final:
338
+ srt_final = str(f)
339
+ if f.suffix == ".vtt" and not vtt_final:
340
+ vtt_final = str(f)
341
+
342
  try:
343
  shutil.rmtree(task_tmp, ignore_errors=True)
344
  except Exception:
 
368
  _semaphore.release()
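The worker releases `_semaphore` when it finishes, while `submit_task` acquires it without blocking, so a second upload is rejected rather than queued. A minimal, self-contained sketch of that single-slot pattern is below. The names `_slot` and `try_submit` are hypothetical and stand in for the app's own `_semaphore` and `submit_task`.

```python
import threading

# One slot: at most one background job runs at a time.
_slot = threading.Semaphore(1)


def try_submit(job, *args):
    """Start job in a daemon thread, or return None if the slot is busy."""
    if not _slot.acquire(blocking=False):
        return None

    def _runner():
        try:
            job(*args)
        finally:
            # Always free the slot, even if the job raised.
            _slot.release()

    worker = threading.Thread(target=_runner, daemon=True)
    worker.start()
    return worker
```

Because the release sits in a `finally` inside the thread, any failure between acquiring the slot and starting the thread (for example while copying the upload) would still need its own release on the submitting side.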
369
 
370
 
371
+ # ═══════════════════════════════════════════════════════════════════════════
372
  # Callbacks
373
+ # ═══════════════════════════════════════════════════════════════════════════
374
+
375
+ def submit_task(
376
+ video_file, mode, sensitivity, language, output_format,
377
+ scene_detector, speech_segmenter,
378
+ qwen_generator, qwen_model_id, qwen_mode,
379
+ transformers_backend, hf_model_id,
380
+ ) -> tuple:
381
  if video_file is None:
382
  return (
383
+ gr.update(value="Please upload a video or audio file first."),
384
+ _render_monitor(), _render_history(), None, _get_completed_filenames(),
 
 
 
385
  )
386
 
387
  if not _semaphore.acquire(blocking=False):
388
  return (
389
+ gr.update(value="Another task is processing. Please wait."),
390
+ _render_monitor(), _render_history(), None, _get_completed_filenames(),
 
 
 
391
  )
392
 
393
  tid = uuid.uuid4().hex[:12]
394
 
 
395
  if isinstance(video_file, str):
396
  src_path = video_file
397
  elif isinstance(video_file, dict):
 
399
  else:
400
  src_path = getattr(video_file, "name", "")
401
 
402
+ if not src_path or not os.path.isfile(src_path):
403
+ _semaphore.release()
404
+ return (
405
+ gr.update(value="Upload failed: could not read file path."),
406
+ _render_monitor(), _render_history(), None, _get_completed_filenames(),
407
+ )
 
 
 
408
 
409
  fname = Path(src_path).name
410
 
 
411
  file_size_mb = os.path.getsize(src_path) / (1024 * 1024)
412
  size_warning = ""
413
  if file_size_mb > 2048:
414
+ size_warning = f" (Warning: {file_size_mb:.0f} MB, may fail on 16 GB RAM)"
 
 
 
415
 
 
416
  persistent = UPLOAD_DIR / f"{tid}_{fname}"
417
  shutil.copy2(src_path, persistent)
418
 
419
+ # Build config dict for the pipeline factory
420
+ config = {
421
+ "language": language,
422
+ "sensitivity": sensitivity,
423
+ "output_format": output_format,
424
+ "scene_detector": scene_detector,
425
+ "speech_segmenter": speech_segmenter,
426
+ "qwen_generator": qwen_generator,
427
+ "qwen_model_id": qwen_model_id or None,
428
+ "qwen_mode": qwen_mode,
429
+ "transformers_backend": transformers_backend,
430
+ "hf_model_id": hf_model_id,
431
+ }
432
+ # Remove None values
433
+ config = {k: v for k, v in config.items() if v is not None}
434
+
435
+ pipeline_label = mode
436
+ if mode == "qwen":
437
+ pipeline_label = f"qwen ({qwen_generator})"
438
+ elif mode == "transformers":
439
+ pipeline_label = f"transformers ({transformers_backend})"
440
+
441
  with _lock:
442
  _tasks[tid] = {
443
  "id": tid,
444
  "filename": fname,
445
  "status": "queued",
446
+ "pipeline": pipeline_label,
447
+ "config": config,
448
  "created_at": datetime.now(timezone.utc).isoformat(),
449
  "completed_at": "",
450
  "output_srt": "",
 
454
  }
455
  _save()
456
 
457
+ threading.Thread(target=_run_transcription, args=(tid, str(persistent)), daemon=True).start()
 
 
 
 
458
 
459
  return (
460
+ gr.update(value=f"Submitted: {fname} (ID: `{tid}`){size_warning}"),
461
+ _render_monitor(), _render_history(), None, _get_completed_filenames(),
 
 
 
462
  )
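`submit_task` stores a display label such as `qwen (qwen3)` under the `pipeline` key, while `_build_pipeline` only accepts the bare mode names. One way to keep the two apart is to store both, sketched here under the assumption of an extra `mode` key. The helpers `make_label` and `make_task_record` are hypothetical and not part of the app.

```python
from typing import Any, Dict


def make_label(mode: str, config: Dict[str, Any]) -> str:
    """Human-readable label for the monitor, mirroring the rules in submit_task."""
    if mode == "qwen":
        return f"qwen ({config.get('qwen_generator', 'qwen3')})"
    if mode == "transformers":
        return f"transformers ({config.get('transformers_backend', 'hf')})"
    return mode


def make_task_record(task_id: str, filename: str, mode: str,
                     config: Dict[str, Any]) -> Dict[str, Any]:
    """Task record that keeps the factory mode separate from its display label."""
    return {
        "id": task_id,
        "filename": filename,
        "status": "queued",
        "mode": mode,                           # passed to the pipeline factory
        "pipeline": make_label(mode, config),   # shown in the monitor only
        "config": config,
    }
```

The worker could then read `task["mode"]` for the factory call and leave `pipeline` purely for rendering.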
463
 
464
 
465
+ # ── HTML renderers ────────────────────────────────────────────────────────
466
 
467
  _STATUS_COLORS = {
468
+ "queued": "#f0ad4e", "running": "#5bc0de", "completed": "#5cb85c",
469
+ "failed": "#d9534f", "interrupted": "#999",
 
 
 
470
  }
471
  _STATUS_ICONS = {
472
+ "queued": "&#9201;", "running": "&#128260;", "completed": "&#9989;",
473
+ "failed": "&#10060;", "interrupted": "&#9208;",
 
 
 
474
  }
475
 
476
+ _CSS = """<style>
477
+ .tr { font-family:'SF Mono','Consolas',monospace; font-size:12px; }
478
+ .tr-card { border:1px solid #e0e0e0; margin:4px 0; padding:8px 12px; border-radius:6px; background:#fafafa; }
 
 
 
 
479
  .tr-card .head { display:flex; justify-content:space-between; align-items:flex-start; }
480
+ .tr-card .meta { color:#666; margin-top:3px; font-size:11px; }
481
+ .hist-table { width:100%; border-collapse:collapse; font-size:12px; }
482
+ .hist-table th { background:#2c3e50; color:#fff; padding:8px; text-align:left; }
483
+ .hist-table td { padding:6px 8px; border-bottom:1px solid #ddd; }
484
+ .hist-table tr:hover { background:#f0f0f0; }
485
+ </style>"""
 
 
 
 
 
 
 
 
486
 
487
 
488
  def _render_monitor() -> str:
 
489
  with _lock:
490
  items = list(_tasks.values())
491
  if not items:
492
  return _CSS + "<div style='text-align:center;padding:24px;color:#999;'>No tasks yet. Upload a file to start.</div>"
 
493
  items.sort(key=lambda t: str(t.get("created_at", "")), reverse=True)
494
  html = _CSS + '<div class="tr">'
495
  for t in items[:40]:
496
  st = t.get("status", "unknown")
497
  color = _STATUS_COLORS.get(st, "#999")
498
  icon = _STATUS_ICONS.get(st, "?")
499
+ cfg = t.get("config", {})
500
+ extra = ""
501
+ if cfg.get("scene_detector") and cfg["scene_detector"] != "none":
502
+ extra += f" | scene: {cfg['scene_detector']}"
503
+ if cfg.get("speech_segmenter") and cfg["speech_segmenter"] != "none":
504
+ extra += f" | vad: {cfg['speech_segmenter']}"
505
+
506
+ html += f"""<div class="tr-card" style="border-left:4px solid {color};">
507
  <div class="head">
508
  <strong>{icon} {t.get('filename','?')[:55]}</strong>
509
  <span style="color:{color};font-weight:700;white-space:nowrap;">{st.upper()}</span>
510
  </div>
511
  <div class="meta">
512
+ ID: {t.get('id','?')} | {t.get('pipeline','')}{extra} | {str(t.get('created_at',''))[:19]}
 
513
  </div>"""
514
  if st == "completed":
515
  html += f'<div class="meta" style="color:#28a745;">Completed in {t.get("duration_seconds",0)}s</div>'
 
517
  err = str(t.get("error", ""))[:250].replace("<", "&lt;").replace(">", "&gt;")
518
  html += f'<div class="meta" style="color:#d9534f;">{err}</div>'
519
  html += "</div>"
 
520
  html += "</div>"
521
  return html
522
 
523
 
524
  def _render_history() -> str:
 
525
  with _lock:
526
  completed = [t for t in _tasks.values() if t.get("status") == "completed"]
527
  if not completed:
528
  return _CSS + "<div style='text-align:center;padding:24px;color:#999;'>No completed tasks yet.</div>"
 
529
  completed.sort(key=lambda t: str(t.get("completed_at", "")), reverse=True)
530
  html = _CSS + '<table class="hist-table"><thead><tr>'
531
+ html += "<th>File</th><th>Pipeline</th><th>Duration</th><th>Completed</th>"
532
  html += "</tr></thead><tbody>"
 
533
  for t in completed[:MAX_OUTPUT_FILES]:
534
  ca = str(t.get("completed_at", ""))[:19]
535
+ html += f"<tr><td>{t.get('filename','')[:45]}</td><td>{t.get('pipeline','')}</td><td>{t.get('duration_seconds',0)}s</td><td>{ca}</td></tr>"
 
536
  html += "</tbody></table>"
537
  return html
538
 
539
 
540
  def _get_latest_srt() -> Optional[str]:
 
541
  with _lock:
542
  completed = sorted(
543
  [t for t in _tasks.values() if t.get("status") == "completed"],
544
+ key=lambda t: str(t.get("completed_at", "")), reverse=True,
 
545
  )
546
  if not completed:
547
  return None
548
  srt = completed[0].get("output_srt", "")
549
+ return srt if (srt and os.path.isfile(srt)) else None
 
 
550
 
551
 
552
  def _get_task_file(task_filename: str) -> Optional[str]:
553
+ if not task_filename:
554
+ return None
555
  with _lock:
556
  for t in _tasks.values():
557
  if t.get("filename") == task_filename and t.get("status") == "completed":
558
  srt = t.get("output_srt", "")
559
+ return srt if (srt and os.path.isfile(srt)) else None
 
560
  return None
561
 
562
 
563
  def _get_completed_filenames() -> List[str]:
 
564
  with _lock:
565
  completed = sorted(
566
  [t for t in _tasks.values() if t.get("status") == "completed"],
567
+ key=lambda t: str(t.get("completed_at", "")), reverse=True,
 
568
  )
569
  return [t.get("filename", "?") for t in completed]
570
 
571
 
572
  def _auto_refresh() -> tuple:
 
573
  latest = _get_latest_srt()
574
+ return _render_monitor(), _render_history(), latest if latest else None, _get_completed_filenames()
575
+
576
+
577
+ def _update_pipeline_info(mode: str) -> str:
578
+ return PIPELINE_INFO.get(mode, "")
579
+
580
+
581
+ def _on_mode_change(mode: str) -> tuple:
582
+ """Show/hide advanced options based on selected pipeline mode."""
583
+ show_qwen = mode == "qwen"
584
+ show_transformers = mode == "transformers"
585
+ show_legacy = mode in ("balanced", "fidelity", "fast")
586
+ show_scene = mode != "faster"
587
+ show_vad = mode in ("balanced", "fidelity", "qwen", "anime")
588
+
589
  return (
590
+ gr.update(visible=show_scene),
591
+ gr.update(visible=show_vad),
592
+ gr.update(visible=show_qwen),
593
+ gr.update(visible=show_transformers),
594
  )
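The visibility rules in `_on_mode_change` can be checked without starting Gradio if they are expressed as a pure function. The sketch below restates the same four conditions. The name `visibility_for` and the dictionary keys are assumptions chosen to match the output components.

```python
from typing import Dict


def visibility_for(mode: str) -> Dict[str, bool]:
    """Which advanced-option widgets are shown for a given pipeline mode."""
    return {
        "scene_detector": mode != "faster",
        "speech_segmenter": mode in ("balanced", "fidelity", "qwen", "anime"),
        "qwen_group": mode == "qwen",
        "transformers_group": mode == "transformers",
    }
```

`_on_mode_change` could then wrap each value in `gr.update(visible=...)` in the same order as its `outputs` list.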
595
 
596
 
597
+ # ═══════════════════════════════════════════════════════════════════════════
598
  # Gradio UI
599
+ # ═══════════════════════════════════════════════════════════════════════════
600
 
601
  _FOOTER = """
602
  <div style="position:fixed;bottom:0;left:0;right:0;padding:6px;
603
  background:#f8f8f8;text-align:center;font-size:11px;color:#888;
604
  border-top:1px solid #e0e0e0;">
605
  WhisperJAV &copy; <a href="https://github.com/meizhong986/WhisperJAV" target="_blank">meizhong986</a>
606
+ &nbsp;|&nbsp; Full pipeline port &nbsp;|&nbsp; CPU-only &nbsp;|&nbsp; Free HuggingFace Space
 
607
  </div>
608
  """
609
 
610
+ RECOMMENDATIONS = """
611
+ | Content Type | Pipeline | Sensitivity |
612
+ |---|---|---|
613
+ | Anime / JAV Dialogue | **anime** | aggressive |
614
+ | Drama / Dialogue Heavy | **balanced** | aggressive |
615
+ | Group Scenes | **faster** | conservative |
616
+ | Amateur / Homemade | **fast** | conservative |
617
+ | ASMR / Whisper | **fidelity** | aggressive |
618
+ | Maximum Accuracy | **qwen** | balanced |
619
+ """
620
+
621
 
622
  def build_ui() -> gr.Blocks:
623
  with gr.Blocks(
 
626
  css="""
627
  footer { visibility: hidden }
628
  .app-footer { position: fixed; bottom: 0; left: 0; right: 0; z-index: 100; }
629
+ .info-box { padding: 10px; background: #f0f7ff; border-radius: 6px; font-size: 13px; margin-bottom: 8px; }
630
  """,
631
  ) as demo:
632
 
 
633
  gr.Markdown("""
634
  # WhisperJAV - Japanese Subtitle Generator
635
 
636
+ Complete port with **7 pipeline modes** powered by Whisper, Qwen3-ASR,
637
+ anime-whisper, Kotoba, and ChronosJAV. Runs entirely on **CPU** (free tier).
638
+ First request downloads the model (~1–4 GB); please be patient.
 
 
639
  """)
640
 
641
  with gr.Tabs():
642
+ # ── Tab 1: New Transcription ──────────────────────────────
643
  with gr.Tab("New Transcription"):
644
  with gr.Row():
645
+ # Left column: file upload + pipeline select
646
  with gr.Column(scale=2):
647
  upload = gr.File(
648
  label="Upload Video or Audio",
649
  file_types=["video", "audio"],
650
  file_count="single",
651
  )
652
+
653
+ with gr.Row():
654
+ with gr.Column(scale=1):
655
+ mode_select = gr.Dropdown(
656
+ label="Pipeline Mode",
657
+ choices=PIPELINE_MODES,
658
+ value="anime",
659
+ interactive=True,
660
+ )
661
+ with gr.Column(scale=1):
662
+ sensitivity_select = gr.Dropdown(
663
+ label="Sensitivity",
664
+ choices=SENSITIVITY_OPTIONS,
665
+ value="balanced",
666
+ interactive=True,
667
+ )
668
+
669
+ with gr.Row():
670
+ with gr.Column(scale=1):
671
+ language_select = gr.Dropdown(
672
+ label="Language",
673
+ choices=LANGUAGE_OPTIONS,
674
+ value="Japanese",
675
+ interactive=True,
676
+ )
677
+ with gr.Column(scale=1):
678
+ format_select = gr.Dropdown(
679
+ label="Output Format",
680
+ choices=OUTPUT_FORMATS,
681
+ value="srt",
682
+ interactive=True,
683
+ )
684
+
685
+ pipeline_info = gr.Markdown(
686
+ PIPELINE_INFO["anime"],
687
+ elem_classes=["info-box"],
688
  )
689
 
690
+ # Right column: status + downloads
691
  with gr.Column(scale=1):
692
  status = gr.Textbox(
693
  label="Status",
694
  value="Ready. Upload a file to begin.",
695
  interactive=False,
696
+ lines=2,
697
  )
698
  latest_download = gr.File(
699
  label="Latest Subtitle",
700
  interactive=False,
 
701
  )
702
 
703
+ # ── Advanced Options (collapsible) ────────────────────
704
+ with gr.Accordion("Advanced Options", open=False):
705
+ with gr.Row():
706
+ with gr.Column(scale=1):
707
+ scene_detector_select = gr.Dropdown(
708
+ label="Scene Detection",
709
+ choices=SCENE_DETECTORS,
710
+ value="semantic",
711
+ interactive=True,
712
+ )
713
+ with gr.Column(scale=1):
714
+ speech_segmenter_select = gr.Dropdown(
715
+ label="Speech Segmenter (VAD)",
716
+ choices=SPEECH_SEGMENTERS,
717
+ value="whisperseg",
718
+ interactive=True,
719
+ )
720
+
721
+ # Qwen-specific options
722
+ with gr.Group(visible=False) as qwen_group:
723
+ gr.Markdown("**Qwen Pipeline Options**")
724
+ with gr.Row():
725
+ with gr.Column(scale=1):
726
+ qwen_generator_select = gr.Dropdown(
727
+ label="Generator Backend",
728
+ choices=QWEEN_GENERATORS,
729
+ value="qwen3",
730
+ interactive=True,
731
+ )
732
+ with gr.Column(scale=1):
733
+ qwen_mode_select = gr.Dropdown(
734
+ label="Input Mode",
735
+ choices=QWEEN_MODES,
736
+ value="assembly",
737
+ interactive=True,
738
+ )
739
+ qwen_model_id_text = gr.Textbox(
740
+ label="Model ID (leave blank for default)",
741
+ placeholder="Qwen/Qwen3-ASR-1.7B",
742
+ interactive=True,
743
+ )
744
+
745
+ # Transformers-specific options
746
+ with gr.Group(visible=False) as transformers_group:
747
+ gr.Markdown("**Transformers Pipeline Options**")
748
+ with gr.Row():
749
+ with gr.Column(scale=1):
750
+ transformers_backend_select = gr.Dropdown(
751
+ label="ASR Backend",
752
+ choices=TRANSFORMERS_BACKENDS,
753
+ value="hf",
754
+ interactive=True,
755
+ )
756
+ with gr.Column(scale=1):
757
+ hf_model_id_select = gr.Dropdown(
758
+ label="HF Model",
759
+ choices=TRANSFORMERS_MODELS,
760
+ value=TRANSFORMERS_MODELS[0],
761
+ interactive=True,
762
+ allow_custom_value=True,
763
+ )
764
+
765
+ submit_btn = gr.Button("Start Transcription", variant="primary", size="lg")
766
+
767
+ gr.Markdown("---")
768
+ gr.Markdown("### Content-Specific Recommendations")
769
+ gr.Markdown(RECOMMENDATIONS)
770
  gr.Markdown("---")
771
  gr.Markdown("### Task Monitor (auto-refreshes every 8 s)")
772
  monitor_html = gr.HTML(value=_render_monitor())
 
775
  with gr.Tab("Download History"):
776
  gr.Markdown("Pick a completed task, then download its subtitle file.")
777
  with gr.Row():
778
+ hist_dropdown = gr.Dropdown(
779
+ label="Select Completed Task",
780
+ choices=_get_completed_filenames(),
781
+ interactive=True,
782
+ )
783
+ hist_download = gr.File(label="Subtitle File", interactive=False)
 
 
 
 
 
784
  gr.Markdown("---")
785
  history_html = gr.HTML(value=_render_history())
786
 
787
  # ── Footer ──
788
  gr.HTML(_FOOTER, elem_classes=["app-footer"])
789
 
790
+ # ══════════════════════════════════════════════════════════════
791
+ # Events
792
+ # ══════════════════════════════════════════════════════════════
793
+
794
+ mode_select.change(
795
+ fn=_update_pipeline_info,
796
+ inputs=[mode_select],
797
+ outputs=[pipeline_info],
798
+ )
799
+
800
+ mode_select.change(
801
+ fn=_on_mode_change,
802
+ inputs=[mode_select],
803
+ outputs=[scene_detector_select, speech_segmenter_select, qwen_group, transformers_group],
804
+ )
805
+
806
  submit_btn.click(
807
  fn=submit_task,
808
+ inputs=[
809
+ upload, mode_select, sensitivity_select, language_select, format_select,
810
+ scene_detector_select, speech_segmenter_select,
811
+ qwen_generator_select, qwen_model_id_text, qwen_mode_select,
812
+ transformers_backend_select, hf_model_id_select,
813
+ ],
814
  outputs=[status, monitor_html, history_html, latest_download, hist_dropdown],
815
  )
816
 
 
820
  outputs=[hist_download],
821
  )
822
 
 
823
  timer = gr.Timer(8, active=True)
824
  timer.tick(fn=_auto_refresh, outputs=[monitor_html, history_html, latest_download, hist_dropdown])
825
 
826
  return demo
827
 
828
 
829
+ # ═══════════════════════════════════════════════════════════════════════════
830
  # Entry Point
831
+ # ═══════════════════════════════════════════════════════════════════════════
832
 
833
  if __name__ == "__main__":
834
  _load()
835
  _prune_old_outputs()
836
 
837
  app = build_ui()
838
+ app.queue(max_size=10, default_concurrency_limit=5).launch()