travahacker committed on
Commit 0c4fdca · 1 Parent(s): 6948ba2

Add IA-local-auto-cut: automatic video cuts with Whisper + Ollama

.gitignore ADDED
@@ -0,0 +1,56 @@
+ # Sensitive and generated data
+ *.mp4
+ *.mov
+ *.avi
+ *.mkv
+ *.webm
+ *.wav
+ *.mp3
+ *_transcript.json
+ *_cuts.json
+ *_interview_cuts.json
+ *_cuts.sh
+ *_posts.txt
+ *_audio.wav
+ *_preview_parts/
+ export_parts/
+ PREVIEW_*.mp4
+
+ # Persona (personal data - use persona.example.json as a template)
+ persona.json
+
+ # Environment and secrets
+ .env
+ .env.*
+ *.pem
+ *.key
+ secrets/
+ credentials/
+
+ # Python
+ __pycache__/
+ *.py[cod]
+ *$py.class
+ *.so
+ .Python
+ .venv/
+ venv/
+ ENV/
+ .eggs/
+ *.egg-info/
+ *.egg
+
+ # IDE and OS
+ .DS_Store
+ .idea/
+ .vscode/
+ *.code-workspace
+ *.swp
+ *.swo
+ *~
+
+ # Logs and temp files
+ *.log
+ tmp/
+ temp/
+ .cache/
LICENSE ADDED
@@ -0,0 +1,12 @@
+ Creative Commons Attribution 4.0 International (CC BY 4.0)
+
+ Copyright (c) 2025 travahacker
+
+ You are free to:
+ Share — copy and redistribute the material in any medium or format
+ Adapt — remix, transform, and build upon the material for any purpose, even commercially
+
+ Under the following terms:
+ Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made.
+
+ Full legal code: https://creativecommons.org/licenses/by/4.0/legalcode
README.md ADDED
@@ -0,0 +1,118 @@
+ ---
+ license: cc-by-4.0
+ language:
+ - pt
+ tags:
+ - video
+ - transcription
+ - whisper
+ - ollama
+ - ffmpeg
+ - auto-cut
+ ---
+
+ # IA-local-transcript-autocut
+
+ A tool for **automatic video cuts** using local transcription (Whisper) and a local LLM (Ollama). Everything runs on your machine; no data is sent to the cloud.
+
+ > Automatic video cuts with Whisper + Ollama. Transcribes, proposes cuts with local AI, and exports them as MP4.
+
+ **Repository:** [github.com/travahacker/IA-local-auto-cut](https://github.com/travahacker/IA-local-auto-cut)
+
+ ## Workflow
+
+ 1. **Transcribes** the video's audio with [faster-whisper](https://github.com/SYSTRAN/faster-whisper)
+ 2. **Proposes cuts** via Ollama (local AI) or heuristics (React mode, interviews)
+ 3. **Generates ffmpeg scripts** to export the cuts as MP4
+
+ ## Prerequisites
+
+ - **Python 3.9+**
+ - **ffmpeg** (installed on the system)
+ - **Ollama** (optional, for AI-proposed cuts — [ollama.ai](https://ollama.ai))
+
+ ## Installation
+
+ ```bash
+ git clone https://huggingface.co/Veronyka/IA-local-transcript-autocut
+ cd IA-local-transcript-autocut
+ pip install -r requirements.txt
+ ```
+
+ ## Usage
+
+ ### 1. Transcription + AI-proposed cuts (Ollama)
+
+ ```bash
+ # Transcribe, propose cuts with Ollama, and generate the export script
+ python video_cuts_offline_mac_plus_subs.py seu_video.mp4 --preview
+
+ # Transcribe only (saves transcript.json)
+ python video_cuts_offline_mac_plus_subs.py seu_video.mp4 --only-transcribe
+
+ # Reuse an existing transcript
+ python video_cuts_offline_mac_plus_subs.py seu_video.mp4 --only-propose --reuse-transcript
+ ```
+
+ ### 2. React mode (commentary in PT with an EN lead-in)
+
+ For reaction videos where you comment in Portuguese on English-language content:
+
+ ```bash
+ python video_cuts_offline_mac_plus_subs.py video.mp4 --react-mode --preview
+ ```
+
+ ### 3. Interview cuts (question + answer)
+
+ Generates cuts of short questions followed by long answers:
+
+ ```bash
+ # First: transcribe
+ python video_cuts_offline_mac_plus_subs.py entrevista.mp4 --only-transcribe
+
+ # Then: generate the interview cuts
+ python interview_cuts.py entrevista.mp4 --min 60 --max 150 --preview
+ ```
+
+ ### 4. Titles and descriptions for social media
+
+ ```bash
+ python generate_post_texts_from_cuts.py base_do_video
+
+ # With local AI (Ollama) for more creative copy
+ python generate_post_texts_from_cuts.py base_do_video --ollama-model llama3.1:8b
+ ```
+
+ ## Persona (optional)
+
+ To align the cuts with your creator profile, create a `persona.json` from the example:
+
+ ```bash
+ cp persona.example.json persona.json
+ # Edit persona.json with your bio, pillars, tone, etc.
+ ```
+
+ Use it with `--persona persona.json` in the main script.
+
+ ## Main options
+
+ | Flag | Description |
+ |------|-------------|
+ | `--lang pt` | Force the transcription language |
+ | `--whisper-model small` | Whisper model (tiny, base, small, medium, large) |
+ | `--model llama2` | Ollama model used to propose cuts |
+ | `--max-stories 8` | Maximum number of cuts |
+ | `--max-length 60` | Maximum duration per cut (seconds) |
+ | `--preview` | Generates a preview video with all the cuts |
+ | `--persona arquivo.json` | File with persona context |
+
+ ## Generated files
+
+ - `*_transcript.json` — Transcript with timestamps
+ - `*_cuts.json` — Metadata for the proposed cuts
+ - `*_cuts.sh` — Bash script that exports the MP4s
+ - `export_parts/` — Folder with the exported MP4 cuts
+
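The shape of these files can be inferred from the scripts in this commit: a transcript is a list of `{start, end, text}` segments, and a cut references a time range inside it. A minimal sketch of consuming them (the sample data below is hypothetical):

```python
# Hypothetical minimal data in the shape the scripts read/write
# (inferred from generate_post_texts_from_cuts.py).
transcript = [
    {"start": 0.0, "end": 2.5, "text": "Bem-vindos ao canal."},
    {"start": 2.5, "end": 6.0, "text": "Hoje falamos de cortes automáticos."},
]
cuts = [{"start": 2.0, "end": 6.0, "label": "Intro"}]

def text_for_cut(cut, transcript):
    """Collect the transcript text overlapping the cut's time range."""
    s, e = cut["start"], cut["end"]
    picked = [t["text"] for t in transcript
              if min(e, t["end"]) - max(s, t["start"]) > 0.01]
    return " ".join(picked)

print(text_for_cut(cuts[0], transcript))
```

This is the same overlap test `collect_text_for_segments` applies when pairing cuts with transcript segments.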
+ ## License
+
+ [CC BY 4.0](https://creativecommons.org/licenses/by/4.0/) — Creative Commons Attribution 4.0 International. Free to use with attribution.
generate_post_texts_from_cuts.py ADDED
@@ -0,0 +1,174 @@
+ #!/usr/bin/env python3
+ """
+ Generates social media titles and descriptions from the cuts (transcript + cuts).
+
+ Usage:
+     python generate_post_texts_from_cuts.py <base> [--persona "your persona"] [--hashtags #tag1 #tag2]
+     python generate_post_texts_from_cuts.py <base> --ollama-model llama3.1:8b  # uses local AI
+ """
+ import argparse, json, os, re, requests
+ from pathlib import Path
+ from typing import List, Dict, Any
+
+
+ def load_json(path):
+     with open(path, "r", encoding="utf-8") as f:
+         return json.load(f)
+
+ def cap(s: str, n: int) -> str:
+     s = s.strip()
+     return (s[:n-1] + "…") if len(s) > n else s
+
+ def normalize_whitespace(s: str) -> str:
+     return re.sub(r"\s+", " ", s).strip()
+
+ def overlap(a1, a2, b1, b2):
+     return max(0.0, min(a2, b2) - max(a1, b1))
+
+ def collect_text_for_segments(transcript: List[Dict[str, Any]], segments: List[Dict[str, float]]) -> str:
+     buf = []
+     for seg in segments:
+         s, e = float(seg["start"]), float(seg["end"])
+         for t in transcript:
+             ts, te = float(t["start"]), float(t["end"])
+             if overlap(s, e, ts, te) > 0.01:
+                 buf.append(t.get("text", "").strip())
+     txt = " ".join(x for x in buf if x)
+     return normalize_whitespace(txt)
+
+ def first_sentence(s: str, max_len=140) -> str:
+     s = normalize_whitespace(s)
+     m = re.split(r"(?<=[\.\!\?])\s+", s)
+     cand = (m[0] if m else s) or s
+     return cap(cand, max_len)
+
+ def build_titles_and_descs(text: str, persona: str, hashtags: List[str],
+                            yt_len=70, ig_len=140, tt_len=120,
+                            max_ig_tags=5, max_tt_tags=8) -> Dict[str, str]:
+     # Heuristic fallback: first sentence as title, plus PT calls to action.
+     txt = text or ""
+     title = cap(first_sentence(txt, yt_len), yt_len)
+     core_ig = first_sentence(txt, ig_len)
+     ig = f"{core_ig}\nAssiste até o fim e comenta 👇"
+     tags_ig = " ".join(hashtags[:max_ig_tags]) if hashtags else ""
+     if tags_ig:
+         ig = f"{ig}\n{tags_ig}"
+     core_tt = first_sentence(txt, tt_len)
+     tt = f"{core_tt}\nCurte e segue p/ mais 🔔"
+     tags_tt = " ".join(hashtags[:max_tt_tags]) if hashtags else ""
+     if tags_tt:
+         tt = f"{tt}\n{tags_tt}"
+     return {"yt_title": title, "ig_desc": ig.strip(), "tt_desc": tt.strip()}
+
+ def call_ollama(model: str, prompt: str, url: str) -> str:
+     payload = {
+         "model": model,
+         "prompt": prompt,
+         "temperature": 0.4,
+         "stream": False,
+         "format": "json",
+         "options": {"num_ctx": 8192, "num_predict": 384}
+     }
+     r = requests.post(url.rstrip("/") + "/api/generate", json=payload, timeout=120)
+     r.raise_for_status()
+     return r.json().get("response", "")
+
+ def _coerce_json(raw: str) -> Dict[str, str]:
+     """Best-effort repair of an almost-JSON model reply."""
+     txt = (raw or "").strip()
+     try:
+         return json.loads(txt)
+     except Exception:
+         pass
+     m = re.search(r"\{[\s\S]*\}", txt)
+     if not m:
+         raise ValueError("no-json-object")
+     jtxt = m.group(0)
+     jtxt = jtxt.replace("\u201c", '"').replace("\u201d", '"').replace("\u2018", "'").replace("\u2019", "'")
+     jtxt = re.sub(r",\s*(\}|\])", r"\1", jtxt)
+     if '"' not in jtxt and "'" in jtxt:
+         jtxt = jtxt.replace("'", '"')
+     return json.loads(jtxt)
+
+ def with_ollama(text: str, persona: str, hashtags: List[str], model: str, server_url: str) -> Dict[str, str]:
+     # Prompt kept in Portuguese on purpose: the generated copy targets PT audiences.
+     prompt = f'''
+ Responda ESTRITAMENTE em JSON válido (sem texto extra, sem markdown, sem explicações).
+ Gere campos:
+ - yt_title: string (<= 70 chars, chamativo, sem hashtags)
+ - ig_desc: string (≈120–150 chars, termina com linha de hashtags IG)
+ - tt_desc: string (≈100–140 chars, termina com linha de hashtags TikTok)
+
+ PERSONA: {persona or '-'}
+ HASHTAGS_IG: {' '.join(hashtags[:5])}
+ HASHTAGS_TT: {' '.join(hashtags[:8])}
+
+ TEXTO_DO_CORTE (transcrição bruta, use para inspirar o copy):
+ """{text.strip()[:2000]}"""
+
+ Retorne APENAS um objeto JSON com exatamente estas chaves:
+ {{
+ "yt_title": "...",
+ "ig_desc": "...\n{' '.join(hashtags[:5])}",
+ "tt_desc": "...\n{' '.join(hashtags[:8])}"
+ }}
+ '''
+     try:
+         raw = call_ollama(model, prompt, server_url)
+         data = _coerce_json(raw)
+         data["yt_title"] = cap(data.get("yt_title", ""), 70)
+         data["ig_desc"] = cap(data.get("ig_desc", ""), 300)
+         data["tt_desc"] = cap(data.get("tt_desc", ""), 220)
+         return data
+     except Exception as e:
+         print(f"[warn] Ollama returned invalid JSON: {e}. Falling back to the heuristic.")
+         return build_titles_and_descs(text, persona, hashtags)
+
+ def main():
+     ap = argparse.ArgumentParser("Generates social media titles/descriptions from the cuts.")
+     ap.add_argument("base", help="File base name (e.g. 'meu_video' without suffixes)")
+     ap.add_argument("--persona", default="criador(a) de conteúdo",
+                     help="Short persona hint used to compose the copy")
+     ap.add_argument("--hashtags", nargs="*", default=["#criacaodeconteudo", "#video", "#shorts"],
+                     help="Priority hashtags")
+     ap.add_argument("--ollama-model", default="", help="Ollama model for the copy (e.g. llama3.1:8b)")
+     ap.add_argument("--ollama-url", default="http://localhost:11434", help="Ollama URL")
+     ap.add_argument("--out", default="", help="Output file (default: <base>_posts.txt)")
+     args = ap.parse_args()
+
+     base = args.base
+     cuts_path = f"{base}_cuts.json"
+     transcript_path = f"{base}_transcript.json"
+     if not os.path.exists(cuts_path) or not os.path.exists(transcript_path):
+         print(f"ERROR: could not find '{cuts_path}' or '{transcript_path}'. Run from the correct folder.")
+         raise SystemExit(1)
+
+     cuts = load_json(cuts_path)
+     transcript = load_json(transcript_path)
+
+     out_path = args.out or f"{base}_posts.txt"
+     lines = []
+     for i, c in enumerate(cuts, 1):
+         segs = c.get("segments") or []
+         if not segs and "start" in c and "end" in c:
+             segs = [{"start": c["start"], "end": c["end"]}]
+         text = collect_text_for_segments(transcript, segs)
+
+         if args.ollama_model:
+             results = with_ollama(text, args.persona, args.hashtags, args.ollama_model, args.ollama_url)
+         else:
+             results = build_titles_and_descs(text, args.persona, args.hashtags)
+
+         lines.append(f"Corte {i}")
+         lines.append("YouTube Shorts — Título:")
+         lines.append("👉 " + results["yt_title"])
+         lines.append("")
+         lines.append("Instagram Reels — Descrição:")
+         lines.append(results["ig_desc"])
+         lines.append("")
+         lines.append("TikTok — Descrição:")
+         lines.append(results["tt_desc"])
+         lines.append("\n" + "-" * 60 + "\n")
+
+     Path(out_path).write_text("\n".join(lines).rstrip() + "\n", encoding="utf-8")
+     print(f"✅ Generated: {out_path}")
+
+ if __name__ == "__main__":
+     main()
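The repair steps in `_coerce_json` (extract the outermost braces, normalize curly quotes, strip trailing commas) can be exercised standalone; a condensed sketch of the same idea:

```python
import json
import re

def coerce_json(raw: str) -> dict:
    # Grab the outermost {...} span, ignoring any prose around it.
    m = re.search(r"\{[\s\S]*\}", raw)
    if not m:
        raise ValueError("no JSON object found")
    txt = m.group(0)
    # Normalize curly quotes that local models often emit.
    txt = txt.replace("\u201c", '"').replace("\u201d", '"')
    # Drop trailing commas before a closing brace/bracket.
    txt = re.sub(r",\s*(\}|\])", r"\1", txt)
    return json.loads(txt)

print(coerce_json('Claro! {“yt_title”: “Meu corte”,}'))  # → {'yt_title': 'Meu corte'}
```

The full helper above additionally falls back to swapping single quotes when no double quotes are present.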
interview_cuts.py ADDED
@@ -0,0 +1,350 @@
+ #!/usr/bin/env python3
+ """
+ interview_cuts.py — Generates interview cuts in PT (short question + long answer).
+
+ Typical usage:
+     python interview_cuts.py video.mp4 --min 60 --max 150 --qmax 12 --gap 2.0 --lead-in-question yes --max-cuts 20 --preview
+
+ Prerequisites:
+     - A <base>_transcript.json file next to the video (generated by video_cuts_offline_mac_plus_subs.py).
+
+ Outputs:
+     - <base>_interview_cuts.json
+     - <base>_interview_cuts.sh
+     - PREVIEW_<base>_interview.mp4 (optional)
+ """
+ import argparse, json, os, re, shlex, subprocess
+ from pathlib import Path
+ from typing import Any, Dict, List, Optional
+
+ try:
+     import numpy as np
+ except Exception:
+     np = None
+ try:
+     from resemblyzer import VoiceEncoder, preprocess_wav
+     _HAVE_RESEMBLYZER = True
+ except Exception:
+     VoiceEncoder = None
+     preprocess_wav = None
+     _HAVE_RESEMBLYZER = False
+
+
+ def load_json(p: Path):
+     with p.open("r", encoding="utf-8") as f:
+         return json.load(f)
+
+ def save_json(obj, p: Path):
+     with p.open("w", encoding="utf-8") as f:
+         json.dump(obj, f, ensure_ascii=False, indent=2)
+
+ def normspace(s: str) -> str:
+     return re.sub(r"\s+", " ", (s or "").strip())
+
+ def first_sentence(s: str, limit=120) -> str:
+     s = normspace(s)
+     parts = re.split(r"(?<=[.!?])\s+", s)
+     out = parts[0] if parts and parts[0] else s
+     return out[:limit].rstrip()
+
+
+ def ensure_wav_16k_mono(video_path: Path) -> Path:
+     """Export a temporary 16 kHz mono wav next to the video if not present."""
+     wav_path = video_path.with_suffix(".16k.wav")
+     if wav_path.exists():
+         return wav_path
+     cmd = [
+         "ffmpeg", "-y",
+         "-i", str(video_path),
+         "-ac", "1", "-ar", "16000",
+         str(wav_path)
+     ]
+     subprocess.run(cmd, check=True)
+     return wav_path
+
+
+ def diarize_with_resemblyzer(wav_path: Path, n_speakers: int = 2, debug: bool = False):
+     """Lightweight diarization using Resemblyzer."""
+     if not _HAVE_RESEMBLYZER or np is None:
+         raise RuntimeError("pip install resemblyzer numpy scikit-learn soundfile")
+     try:
+         from sklearn.cluster import AgglomerativeClustering
+     except Exception:
+         raise RuntimeError("pip install scikit-learn")
+
+     wav = preprocess_wav(str(wav_path))
+     enc = VoiceEncoder()
+     _, partial_embeds, partial_slices = enc.embed_utterance(wav, return_partials=True)
+     sr = 16000.0
+     duration = float(len(wav)) / sr if len(wav) > 0 else 0.0
+     if len(partial_embeds) == 0 or duration <= 0.0:
+         return []
+     half = 0.8
+     n_parts = len(partial_embeds)
+     partial_times = np.array([duration / 2.0], dtype=float) if duration <= 2 * half else np.linspace(half, duration - half, n_parts)
+     n_samples = len(partial_embeds)
+     if n_samples < 2:
+         return []
+     X = np.vstack(partial_embeds)
+     n_speakers = max(2, int(n_speakers))
+     n_clusters = max(2, min(n_speakers, X.shape[0]))
+     labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(X)
+     segs = []
+     cur_spk = int(labels[0])
+     cur_start = max(0.0, float(partial_times[0] - half))
+     cur_end = float(partial_times[0] + half)
+     for i in range(1, len(labels)):
+         spk = int(labels[i])
+         st = float(partial_times[i] - half)
+         en = float(partial_times[i] + half)
+         if spk == cur_spk and st <= cur_end + 0.1:
+             cur_end = max(cur_end, en)
+         else:
+             segs.append({"start": round(max(0.0, cur_start), 3), "end": round(max(cur_end, cur_start + 0.1), 3), "spk": cur_spk})
+             cur_spk = spk
+             cur_start = st
+             cur_end = en
+     segs.append({"start": round(max(0.0, cur_start), 3), "end": round(max(cur_end, cur_start + 0.1), 3), "spk": cur_spk})
+     return segs
+
+
+ def assign_speakers_to_transcript(transcript: List[Dict[str, Any]], diar_segs: List[Dict[str, Any]]):
+     def spk_at(t: float):
+         for s in diar_segs:
+             if s["start"] - 0.1 <= t <= s["end"] + 0.1:
+                 return s["spk"]
+         if diar_segs:
+             bydist = min(diar_segs, key=lambda s: abs((s["start"] + s["end"]) / 2 - t))
+             return bydist["spk"]
+         return -1
+     return [spk_at((float(seg.get("start", 0)) + float(seg.get("end", 0))) / 2.0) for seg in transcript]
+
+
+ def detect_questions(transcript: List[Dict[str, Any]], qmax: float, wc_max: int, qmark_required: bool, debug: bool = False) -> List[int]:
+     idxs = []
+     for i, seg in enumerate(transcript):
+         st = float(seg.get("start", 0)); en = float(seg.get("end", 0)); d = max(0.0, en - st)
+         text = (seg.get("text") or "").strip()
+         wc = len(text.split())
+         has_qmark = text.endswith("?")
+         dur_ok = d <= qmax
+         wc_ok = wc <= wc_max and wc >= 2
+         is_q = (has_qmark or dur_ok) and wc_ok
+         if qmark_required:
+             is_q = has_qmark and wc_ok
+         if is_q:
+             idxs.append(i)
+     return idxs
+
+
+ def build_interview_cuts(
+     transcript: List[Dict[str, Any]],
+     min_len: float,
+     max_len: float,
+     qmax: float,
+     gap: float,
+     lead_in_question: bool,
+     max_cuts: int,
+     wc_max: int = 35,
+     qmark_required: bool = False,
+     spk_labels: Optional[List[int]] = None,
+     interviewer_id: Optional[int] = None,
+     debug: bool = False,
+ ) -> List[Dict[str, Any]]:
+     if spk_labels is not None and interviewer_id is not None:
+         qs = set()
+         for i, seg in enumerate(transcript):
+             st = float(seg.get("start", 0)); en = float(seg.get("end", 0)); d = en - st
+             text = (seg.get("text") or "").strip()
+             wc = len(text.split())
+             has_q = text.endswith("?")
+             if spk_labels[i] == interviewer_id and wc <= wc_max and (d <= qmax or has_q):
+                 qs.add(i)
+     else:
+         qs = set(detect_questions(transcript, qmax, wc_max, qmark_required, debug))
+     cuts = []
+     n = len(transcript)
+     i = 0
+     while i < n:
+         seg = transcript[i]
+         st = float(seg.get("start", 0)); en = float(seg.get("end", 0)); d = en - st
+         txt = normspace(seg.get("text", ""))
+         if not txt or d < 0.2:
+             i += 1; continue
+         if i in qs:
+             j = i + 1
+             resp_start = None
+             end_time = en
+             collected_text = []
+             segments = []
+             while j < n:
+                 s2 = transcript[j]
+                 st2 = float(s2.get("start", 0)); en2 = float(s2.get("end", 0)); d2 = en2 - st2
+                 txt2 = normspace(s2.get("text", ""))
+                 if j in qs:
+                     break
+                 if d2 < 0.25:
+                     j += 1
+                     continue
+                 if resp_start is not None and st2 - end_time > gap:
+                     break
+                 if txt2:
+                     if resp_start is None:
+                         resp_start = st2
+                     segments.append({"start": st2, "end": en2})
+                     collected_text.append(txt2)
+                     end_time = en2
+                     if end_time - (resp_start if resp_start is not None else st) >= max_len:
+                         break
+                 j += 1
+             if resp_start is not None:
+                 start_cut = st if lead_in_question else resp_start
+                 end_cut = end_time
+                 dur = end_cut - start_cut
+                 if dur >= min_len * 0.6:
+                     label = first_sentence(" ".join(collected_text), 70) or "Resposta marcante"
+                     hook = first_sentence(txt, 90) if lead_in_question else ""
+                     cuts.append({
+                         "start": round(start_cut, 3),
+                         "end": round(end_cut, 3),
+                         "label": label,
+                         "hook": hook,
+                         "reason": "Pergunta curta seguida de resposta longa",
+                         "segments": ([{"start": st, "end": en}] if lead_in_question else []) + segments
+                     })
+                     if len(cuts) >= max_cuts:
+                         break
+             i = max(i + 1, j)
+             continue
+         else:
+             j = i + 1
+             end_time = en
+             collected = [txt] if txt else []
+             segments = [{"start": st, "end": en}]
+             while j < n and float(transcript[j].get("start", 0)) - end_time <= gap:
+                 s2 = transcript[j]
+                 st2 = float(s2.get("start", 0)); en2 = float(s2.get("end", 0))
+                 t2 = normspace(s2.get("text", ""))
+                 if en2 - st2 < 0.25:
+                     j += 1
+                     continue
+                 if t2:
+                     segments.append({"start": st2, "end": en2})
+                     collected.append(t2)
+                     end_time = en2
+                     if end_time - st >= max_len:
+                         break
+                 j += 1
+             dur = end_time - st
+             if dur >= min_len and collected:
+                 cuts.append({
+                     "start": round(st, 3),
+                     "end": round(end_time, 3),
+                     "label": first_sentence(" ".join(collected), 70) or "Resposta destacada",
+                     "hook": "",
+                     "reason": "Resposta contínua em entrevista",
+                     "segments": segments
+                 })
+                 if len(cuts) >= max_cuts:
+                     break
+             i = j
+             continue
+     return cuts
+
+
+ def write_shell_and_preview(video_path: Path, base: str, cuts: List[Dict[str, Any]], preview: bool):
+     out_dir = video_path.parent
+     sh_path = out_dir / f"{base}_interview_cuts.sh"
+     parts_dir = out_dir / "export_parts"
+     parts_dir.mkdir(exist_ok=True)
+
+     lines = ["#!/usr/bin/env bash", "set -e"]
+     for k, c in enumerate(cuts, 1):
+         ss = c["start"]; ee = c["end"]; dd = round(ee - ss, 3)
+         out_file = parts_dir / f"{base}_cut_{k:02}.mp4"
+         cmd = (
+             f"ffmpeg -hide_banner -loglevel warning -y -ss {ss} -i {shlex.quote(str(video_path))} -t {dd} "
+             f"-c:v libx264 -crf 22 -preset veryfast -vf scale=1080:-2:flags=bicubic -c:a aac -b:a 128k {shlex.quote(str(out_file))}"
+         )
+         lines.append(cmd)
+     if preview and cuts:
+         plist = out_dir / f"{base}_interview_preview_list.txt"
+         with plist.open("w", encoding="utf-8") as f:
+             for k in range(1, len(cuts) + 1):
+                 p = parts_dir / f"{base}_cut_{k:02}.mp4"
+                 # Concat entries are resolved relative to the list file's folder.
+                 f.write(f"file export_parts/{p.name}\n")
+         preview_path = out_dir / f"PREVIEW_{base}_interview.mp4"
+         lines.append(f"ffmpeg -hide_banner -loglevel warning -y -f concat -safe 0 -i {shlex.quote(str(plist))} -c copy {shlex.quote(str(preview_path))}")
+
+     sh_path.write_text("\n".join(lines) + "\n", encoding="utf-8")
+     os.chmod(sh_path, 0o755)
+     print(f"✅ Export script: {sh_path}")
+
+
+ def main():
+     ap = argparse.ArgumentParser("Interview cuts (short question + long answer)")
+     ap.add_argument("video", help="Input file (.mp4/.mov)")
+     ap.add_argument("--min", type=float, default=60.0, help="Minimum cut duration in seconds")
+     ap.add_argument("--max", type=float, default=150.0, help="Maximum cut duration in seconds")
+     ap.add_argument("--qmax", type=float, default=12.0, help="Maximum duration to flag a segment as a question")
+     ap.add_argument("--gap", type=float, default=2.0, help="Gap tolerance between segments")
+     ap.add_argument("--lead-in-question", choices=["yes", "no"], default="yes", help="Include the question before the answer")
+     ap.add_argument("--max-cuts", type=int, default=20, help="Cut limit")
+     ap.add_argument("--preview", action="store_true", help="Generate a concat preview command")
+     ap.add_argument("--q-wc-max", type=int, default=35, help="Maximum word count to consider a question")
+     ap.add_argument("--qmark-required", action="store_true", help="Require '?' to flag a question")
+     ap.add_argument("--diarize", action="store_true", help="Enable diarization with Resemblyzer")
+     ap.add_argument("--n-speakers", type=int, default=2, help="Number of speakers to cluster")
+     ap.add_argument("--debug", action="store_true", help="Print diagnostics")
+     args = ap.parse_args()
+
+     video_path = Path(args.video).expanduser().resolve()
+     base = video_path.stem
+     transcript_path = video_path.with_name(f"{base}_transcript.json")
+     if not transcript_path.exists():
+         print(f"ERROR: could not find '{transcript_path.name}'. Generate the transcript first with video_cuts_offline_mac_plus_subs.py")
+         raise SystemExit(1)
+
+     transcript = load_json(transcript_path)
+
+     spk_labels = None
+     interviewer_id = None
+     if args.diarize:
+         try:
+             wav16k = ensure_wav_16k_mono(video_path)
+             diar = diarize_with_resemblyzer(wav16k, n_speakers=args.n_speakers, debug=args.debug)
+             if diar:
+                 spk_labels = assign_speakers_to_transcript(transcript, diar)
+                 totals = {}
+                 for i, seg in enumerate(transcript):
+                     st = float(seg.get("start", 0)); en = float(seg.get("end", 0)); d = max(0.0, en - st)
+                     spk = spk_labels[i] if spk_labels and i < len(spk_labels) else -1
+                     totals[spk] = totals.get(spk, 0.0) + d
+                 if totals:
+                     # The interviewer is assumed to be the speaker with the least talk time.
+                     interviewer_id = sorted(totals.items(), key=lambda kv: kv[1])[0][0]
+         except Exception as e:
+             print(f"[warn] Diarization failed: {e}. Continuing without diarization.")
+
+     cuts = build_interview_cuts(
+         transcript=transcript,
+         min_len=args.min,
+         max_len=args.max,
+         qmax=args.qmax,
+         gap=args.gap,
+         lead_in_question=(args.lead_in_question == "yes"),
+         max_cuts=args.max_cuts,
+         wc_max=args.q_wc_max,
+         qmark_required=args.qmark_required,
+         spk_labels=spk_labels,
+         interviewer_id=interviewer_id,
+         debug=args.debug,
+     )
+
+     out_json = video_path.with_name(f"{base}_interview_cuts.json")
+     save_json(cuts, out_json)
+     print(f"✅ Generated: {out_json}")
+
+     write_shell_and_preview(video_path, base, cuts, preview=args.preview)
+
+ if __name__ == "__main__":
+     main()
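The default question rule in `detect_questions` boils down to: a short segment or a trailing "?", plus a sane word count. A self-contained sketch of that predicate (the sample segments are hypothetical):

```python
def looks_like_question(seg, qmax=12.0, wc_max=35):
    """Mirror of detect_questions' default rule: short duration or a
    trailing '?', and a word count between 2 and wc_max."""
    dur = seg["end"] - seg["start"]
    words = len(seg["text"].split())
    return (seg["text"].strip().endswith("?") or dur <= qmax) and 2 <= words <= wc_max

segs = [
    {"start": 0.0, "end": 4.0, "text": "Como você começou no audiovisual?"},
    {"start": 4.0, "end": 40.0, "text": "Bom, " + "palavra " * 50},
]
print([looks_like_question(s) for s in segs])  # → [True, False]
```

With `--qmark-required` the duration shortcut is dropped and only segments ending in "?" qualify.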
persona.example.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "bio": "Your bio or description as a content creator",
+   "pillars": ["topic1", "topic2", "topic3"],
+   "audience": "Your target audience",
+   "tone": "Your communication tone (e.g. direct, critical, pedagogical)",
+   "redlines": ["what to avoid", "content limits"]
+ }
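A persona file shaped like the example above can be loaded with a small helper (hypothetical; the scripts take the path explicitly via `--persona`):

```python
import json
from pathlib import Path

def load_persona(path: str = "persona.json", fallback: str = "persona.example.json") -> dict:
    """Return the first persona file that exists, or an empty dict.
    Hypothetical helper illustrating the persona.json -> example fallback."""
    for p in (path, fallback):
        if Path(p).exists():
            return json.loads(Path(p).read_text(encoding="utf-8"))
    return {}

print(load_persona("missing.json", "also_missing.json"))  # → {}
```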
requirements.txt ADDED
@@ -0,0 +1,12 @@
+ # IA-local-auto-cut — Automatic video cuts with local transcription
+ # pip install -r requirements.txt
+
+ faster-whisper>=1.0.0
+ langdetect>=1.0.9
+ requests>=2.28.0
+
+ # Optional: for interview_cuts with diarization (speaker identification)
+ # resemblyzer>=0.1.1
+ # numpy>=1.20.0
+ # scikit-learn>=1.0.0
+ # soundfile>=0.12.0
video_cuts_offline_mac_plus_subs.py ADDED
@@ -0,0 +1,668 @@
1
+ #!/usr/bin/env python3
2
+ """
3
+ IA-local-auto-cut: Cortes automáticos de vídeo com transcrição local (Whisper) e IA (Ollama).
4
+
5
+ Transcreve o áudio com faster-whisper, propõe cortes via Ollama ou heurísticas,
6
+ e gera scripts ffmpeg para exportar os cortes.
7
+ """
8
+ import os
9
+ import sys
10
+ import json
11
+ import subprocess
12
+ import argparse
13
+ import re
14
+ import requests
15
+ from typing import List, Dict, Any, Optional
16
+ from pathlib import Path
17
+ from langdetect import detect, DetectorFactory
18
+ DetectorFactory.seed = 1234
19
+
20
+ import pathlib
21
+ try:
22
+ import yaml # optional; only used if provided
23
+ except Exception:
24
+ yaml = None
25
+
26
+
27
+ from faster_whisper import WhisperModel
28
+
29
+
30
+ # --- Audio stream probing helper ---
31
+ def probe_audio_streams(video_path: str) -> List[Dict[str, Any]]:
32
+ """Return list of audio streams with basic metadata using ffprobe."""
33
+ try:
34
+ cmd = [
35
+ "ffprobe", "-v", "error",
36
+ "-select_streams", "a",
37
+ "-show_entries", "stream=index,codec_name,channels,channel_layout:stream_tags=language",
38
+ "-of", "json",
39
+ video_path,
40
+ ]
41
+ out = subprocess.check_output(cmd)
42
+ info = json.loads(out.decode("utf-8", errors="ignore"))
43
+ streams = info.get("streams", [])
44
+ # normalize tag:language to just 'language'
45
+ for s in streams:
46
+ tags = s.get("tags") or {}
47
+ if "language" in tags and "language" not in s:
48
+ s["language"] = tags.get("language")
49
+ return streams
50
+ except Exception as e:
51
+ print(f"ffprobe failed, assuming single audio stream: {e}", file=sys.stderr)
52
+ return []
53
+
54
+
55
+ def extract_audio(video_path: str, audio_path: str, audio_stream: int = -1) -> None:
+     """
+     Extract audio from a video using ffmpeg.
+     If audio_stream == -1, mix all audio streams (if multiple); otherwise
+     extract the specified stream index.
+     """
+     streams = probe_audio_streams(video_path)
+     try:
+         if audio_stream >= 0:
+             # Map a specific audio stream index
+             cmd = [
+                 "ffmpeg", "-y",
+                 "-i", video_path,
+                 "-map", f"0:a:{audio_stream}",
+                 "-vn",
+                 "-acodec", "pcm_s16le",
+                 "-ar", "16000",
+                 "-ac", "1",
+                 audio_path,
+             ]
+         else:
+             n = len(streams)
+             if n <= 1:
+                 # Original behavior (single stream)
+                 cmd = [
+                     "ffmpeg", "-y",
+                     "-i", video_path,
+                     "-vn",
+                     "-acodec", "pcm_s16le",
+                     "-ar", "16000",
+                     "-ac", "1",
+                     audio_path,
+                 ]
+             else:
+                 # Mix all audio streams into one mono track
+                 inputs = "".join(f"[0:a:{i}]" for i in range(n))
+                 filtergraph = f"{inputs}amix=inputs={n}:duration=longest[out]"
+                 cmd = [
+                     "ffmpeg", "-y",
+                     "-i", video_path,
+                     "-filter_complex", filtergraph,
+                     "-map", "[out]",
+                     "-ac", "1",
+                     "-ar", "16000",
+                     "-acodec", "pcm_s16le",
+                     audio_path,
+                 ]
+         print(f"Extracting audio: {' '.join(cmd)}")
+         subprocess.run(cmd, check=True)
+     except subprocess.CalledProcessError as e:
+         print(f"ffmpeg failed extracting audio: {e}", file=sys.stderr)
+         raise
+
+
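The multi-stream branch above builds its `amix` filtergraph by concatenating one input label per audio stream. A minimal standalone sketch of just that string construction (hypothetical helper `build_amix_filtergraph`, not part of the script; no ffmpeg needed to run it):

```python
# Sketch of the amix filtergraph string built in extract_audio() above.
def build_amix_filtergraph(n: int) -> str:
    # One input label per audio stream: [0:a:0][0:a:1]...
    inputs = "".join(f"[0:a:{i}]" for i in range(n))
    # amix merges them; duration=longest keeps the length of the longest stream
    return f"{inputs}amix=inputs={n}:duration=longest[out]"

print(build_amix_filtergraph(2))
# [0:a:0][0:a:1]amix=inputs=2:duration=longest[out]
```

The `[out]` label is what the `-map "[out]"` argument in the command refers to.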
+ def transcribe_audio(
+     audio_path: str, lang: str = "", model_size: str = "small"
+ ) -> List[Dict[str, Any]]:
+     """Transcribe audio using faster-whisper."""
+     print(f"Loading Whisper model '{model_size}'...")
+     model = WhisperModel(model_size, device="cpu", compute_type="int8")
+     print(f"Transcribing audio: {audio_path} with language='{lang or 'auto'}'...")
+     segments, info = model.transcribe(audio_path, language=lang or None)
+     print(f"Detected language: {info.language} with confidence {info.language_probability}")
+     if lang and info.language and lang != info.language:
+         print(f"[warn] Whisper detected '{info.language}' but --lang was '{lang}'.", file=sys.stderr)
+     result = []
+     for segment in segments:
+         result.append(
+             {
+                 "id": segment.id,
+                 "seek": segment.seek,
+                 "start": segment.start,
+                 "end": segment.end,
+                 "text": segment.text.strip(),
+             }
+         )
+     return result
+
+
+ def query_ollama(model: str, prompt: str, temperature: float = 0.2, max_tokens: int = 2048, server_url: str = "http://localhost:11434") -> str:
+     """
+     Query Ollama via HTTP POST /api/generate (works on recent Ollama versions).
+     Requires `ollama serve` running locally.
+     """
+     url = server_url.rstrip("/") + "/api/generate"
+     payload = {
+         "model": model,
+         "prompt": prompt,
+         "stream": False,
+         # Sampling parameters belong under "options" in the generate API
+         "options": {"temperature": temperature, "num_ctx": 8192, "num_predict": max_tokens},
+     }
+     try:
+         r = requests.post(url, json=payload, timeout=600)
+         r.raise_for_status()
+         data = r.json()
+         return data.get("response", "")
+     except requests.exceptions.ConnectionError:
+         print("Failed to connect to Ollama. Is `ollama serve` running?", file=sys.stderr)
+         sys.exit(1)
+     except Exception as e:
+         print(f"Error querying Ollama: {e}", file=sys.stderr)
+         sys.exit(1)
+
+
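The request body above follows Ollama's `/api/generate` schema, where sampling knobs live under `options` rather than at the top level. A minimal sketch of that payload construction (hypothetical helper `build_generate_payload`, not part of the script; field names assume Ollama's documented API):

```python
# Sketch of the JSON payload query_ollama() sends to Ollama's /api/generate.
def build_generate_payload(model: str, prompt: str, temperature: float = 0.2, max_tokens: int = 2048) -> dict:
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,  # ask for one final JSON object instead of a token stream
        "options": {
            "temperature": temperature,  # sampling parameters go under "options"
            "num_ctx": 8192,             # context window
            "num_predict": max_tokens,   # generation cap
        },
    }

payload = build_generate_payload("llama2", "Resuma este vídeo.")
```

With `"stream": False`, the server's reply is a single JSON object whose `response` field holds the full generation.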
+ def load_persona_text(path: str) -> str:
+     if not path:
+         return ""
+     p = pathlib.Path(path)
+     if not p.exists():
+         print(f"Persona file not found: {path}", file=sys.stderr)
+         return ""
+     try:
+         if p.suffix.lower() in {".yaml", ".yml"} and yaml is not None:
+             data = yaml.safe_load(p.read_text(encoding="utf-8"))
+         else:
+             data = json.loads(p.read_text(encoding="utf-8"))
+     except Exception as e:
+         print(f"Failed to read persona file {path}: {e}", file=sys.stderr)
+         return ""
+     # Normalize and stringify
+     parts = []
+     if isinstance(data, dict):
+         bio = data.get("bio") or data.get("about") or ""
+         pillars = data.get("pillars") or data.get("topics") or []
+         audience = data.get("audience") or ""
+         tone = data.get("tone") or ""
+         redlines = data.get("redlines") or data.get("donts") or []
+         parts.append(f"BIO: {bio}")
+         if pillars:
+             parts.append("PILARES: " + ", ".join(map(str, pillars)))
+         if audience:
+             parts.append(f"PÚBLICO: {audience}")
+         if tone:
+             parts.append(f"TOM: {tone}")
+         if redlines:
+             parts.append("NÃO FAZER: " + ", ".join(map(str, redlines)))
+     else:
+         parts.append(str(data))
+     return "\n".join([p for p in parts if p])
+
+
+ # --- PT/EN heuristic helpers ---
+ _pt_words = {" que ", " de ", " pra ", " com ", " não ", " é ", " uma ", " um ", " eu ", " você ", " gente ", " isso ", " então ", " né ", " tá "}
+
+ def is_ptish(text: str) -> bool:
+     t = (text or "").lower()
+     if any(ch in t for ch in "áéíóúâêôãõç"):
+         return True
+     hits = sum(w in t for w in _pt_words)
+     en_hits = sum(w in t for w in [" the ", " and ", " to ", " is ", " of "])
+     return hits >= max(2, en_hits + 1)
+
+
+ def _lang_of(text: str) -> str:
+     t = (text or "").strip()
+     if not t:
+         return "unk"
+     if is_ptish(t):
+         return "pt"
+     try:
+         code = detect(t)
+         if code.startswith("pt"):
+             return "pt"
+         if code.startswith("en"):
+             return "en"
+         return code
+     except Exception:
+         # langdetect can raise on very short or ambiguous text; fall back to keyword hits
+         pt_hits = sum(w in t.lower() for w in [" que ", " de ", " pra ", " com ", " não ", " é "])
+         en_hits = sum(w in t.lower() for w in [" the ", " and ", " to ", " is ", " of "])
+         return "pt" if pt_hits >= en_hits else "en"
+
+
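The heuristic above combines two signals: Portuguese diacritics as an immediate positive, and otherwise a count of Portuguese stopwords against English ones. A standalone sketch of that logic (hypothetical helper `looks_portuguese`, not the script's `is_ptish`, which uses a larger word set):

```python
# Standalone sketch of the keyword-hit heuristic behind is_ptish() above.
_PT = {" que ", " de ", " não ", " é ", " uma ", " eu "}
_EN = {" the ", " and ", " to ", " is ", " of "}

def looks_portuguese(text: str) -> bool:
    t = f" {text.lower()} "  # pad so words at the edges still match the spaced keywords
    if any(ch in t for ch in "áéíóúâêôãõç"):
        return True  # Portuguese diacritics are a strong signal on their own
    pt_hits = sum(w in t for w in _PT)
    en_hits = sum(w in t for w in _EN)
    # Require at least 2 PT hits, and strictly more PT than EN hits
    return pt_hits >= max(2, en_hits + 1)

assert looks_portuguese("eu não sei o que é isso")
assert not looks_portuguese("the quick brown fox is here")
```

Note the script's `is_ptish` lowercases without padding, so a keyword at the very start or end of a segment can be missed; the padding here is a small robustness tweak.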
+ def build_react_cuts(transcript: List[Dict[str, Any]], min_s: int = 60, max_s: int = 180, leadin_s: int = 6, include_en: bool = True, gap_s: float = 1.5) -> List[Dict[str, Any]]:
+     """
+     transcript: list of {start, end, text}
+     Returns cuts as {"segments": [{"start": ..., "end": ...}, ...], "label": "react-pt"}
+     """
+     ann = []
+     for seg in transcript:
+         lang = _lang_of(seg.get("text", ""))
+         ann.append({**seg, "lang": lang})
+
+     total_pt = sum((float(seg["end"]) - float(seg["start"])) for seg in ann if seg["lang"] == "pt")
+     print(f"[react-mode] PT seconds detected (raw): {total_pt:.1f}s")
+
+     cuts: List[Dict[str, Any]] = []
+     i = 0
+     n = len(ann)
+     while i < n:
+         if ann[i]["lang"] != "pt":
+             i += 1
+             continue
+         # Grow a PT block, tolerating up to gap_s seconds of non-PT material inside it
+         j = i + 1
+         gap_acc = 0.0
+         while j < n:
+             if ann[j]["lang"] == "pt":
+                 gap_acc = 0.0
+                 j += 1
+                 continue
+             gap = float(ann[j]["end"]) - float(ann[j]["start"])
+             if gap_acc + gap <= gap_s:
+                 gap_acc += gap
+                 j += 1
+                 continue
+             break
+
+         pt_segs = []
+         for t in range(i, j):
+             if ann[t]["lang"] == "pt":
+                 pt_segs.append({"start": float(ann[t]["start"]), "end": float(ann[t]["end"])})
+         if not pt_segs:
+             i = j
+             continue
+
+         # Optional EN lead-in: walk backwards, taking up to leadin_s seconds of EN audio
+         lead_segments: List[Dict[str, float]] = []
+         if include_en:
+             k = i - 1
+             remaining = float(leadin_s)
+             while k >= 0 and remaining > 0 and ann[k]["lang"] == "en":
+                 s = float(ann[k]["start"]); e = float(ann[k]["end"])
+                 use_s = max(s, e - remaining)
+                 if e - use_s > 0.05:
+                     lead_segments.append({"start": use_s, "end": e})
+                     remaining -= (e - use_s)
+                 k -= 1
+             lead_segments.reverse()
+
+         acc: List[Dict[str, float]] = []
+         acc_len = 0.0
+
+         def flush_window():
+             if not acc:
+                 return
+             segs = []
+             if include_en and lead_segments:
+                 segs.extend(lead_segments)
+             segs.extend(acc)
+             cuts.append({"segments": segs, "label": "react-pt"})
+
+         # Slice the PT block into windows of min_s..max_s seconds
+         for s in pt_segs:
+             seg_len = s["end"] - s["start"]
+             if acc_len + seg_len <= max_s:
+                 acc.append(s); acc_len += seg_len
+                 if acc_len >= min_s:
+                     flush_window()
+                     acc, acc_len = [], 0.0
+             else:
+                 take = max_s - acc_len
+                 if take > 0.2:
+                     seg_cut = {"start": s["start"], "end": s["start"] + take}
+                     acc.append(seg_cut); acc_len += take
+                     flush_window()
+                     rest = s["end"] - (s["start"] + take)
+                     if rest >= 0.2:
+                         acc = [{"start": s["start"] + take, "end": s["end"]}]
+                         acc_len = rest
+                     else:
+                         acc, acc_len = [], 0.0
+                 else:
+                     flush_window()
+                     acc = [s]; acc_len = seg_len
+                     if acc_len >= min_s:
+                         flush_window(); acc, acc_len = [], 0.0
+         # Keep a trailing window if it reaches at least half the minimum
+         if acc_len >= min_s * 0.5:
+             flush_window()
+
+         i = j
+
+     return cuts
+
+
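The outer loop above groups consecutive PT segments into blocks, tolerating short non-PT interruptions up to `gap_s` seconds. A simplified standalone sketch of just that grouping (hypothetical helper `group_pt_blocks`; it omits the min/max windowing and the EN lead-in that `build_react_cuts` adds):

```python
# Simplified sketch of the gap-tolerant grouping used by build_react_cuts().
def group_pt_blocks(segments, gap_s=1.5):
    blocks, current, gap_acc = [], [], 0.0
    for seg in segments:
        if seg["lang"] == "pt":
            current.append(seg)
            gap_acc = 0.0  # a PT segment resets the gap budget
        elif current:
            gap_acc += seg["end"] - seg["start"]
            if gap_acc > gap_s:
                blocks.append(current)  # gap budget exceeded: close the block
                current, gap_acc = [], 0.0
    if current:
        blocks.append(current)
    return blocks

segs = [
    {"start": 0, "end": 2, "lang": "pt"},
    {"start": 2, "end": 3, "lang": "en"},   # 1.0 s gap, tolerated
    {"start": 3, "end": 5, "lang": "pt"},
    {"start": 5, "end": 9, "lang": "en"},   # 4.0 s gap, block ends
    {"start": 9, "end": 11, "lang": "pt"},
]
assert len(group_pt_blocks(segs)) == 2
```

The first block keeps both PT segments because the 1.0 s English interruption stays under the default 1.5 s tolerance.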
+ def propose_cuts(transcript: List[Dict[str, Any]], model: str, max_stories: int, max_length: int, persona_text: str = "") -> List[Dict[str, Any]]:
+     """Propose cuts from the transcript using an Ollama model."""
+     transcript_text = "\n".join(
+         [f"{seg['start']:.2f} --> {seg['end']:.2f}: {seg['text']}" for seg in transcript]
+     )
+     persona_block = ("\nPERSONA DO(A) CRIADOR(A):\n" + persona_text + "\n") if persona_text else ""
+     prompt = (
+         "Você é um(a) editor(a) de vídeos curtos. Dada a transcrição com timestamps (em segundos), "
+         f"proponha NO MÁXIMO {max_stories} cortes de até {max_length} segundos cada, com começo–meio–fim e potencial de engajamento. "
+         "Você PODE montar cada corte como MONTAGEM, juntando trechos não contíguos que conversem entre si (ex.: segundos do minuto 1 + segundos do minuto 3). "
+         "Leve em conta a persona, temas e diferenciais do(a) criador(a) para priorizar trechos alinhados. "
+         "Responda ESTRITAMENTE em JSON (sem texto fora do JSON), como uma lista de objetos com os campos:\n"
+         " - (OU) start (segundos, número) e end (segundos, número) para UM bloco contínuo\n"
+         " - (OU) segments: lista de objetos {start, end} para MONTAGEM\n"
+         " - label (título curto)\n"
+         " - hook (frase de abertura curta, 7–12 palavras, no idioma do trecho)\n"
+         " - reason (por que funciona e como se alinha à persona)\n"
+         " - score_relevance (0–100, alinhamento com persona/pilares)\n"
+         " - score_engagement (0–100, potencial de retenção)\n"
+         " - language (pt, en, pt+en, etc.)\n\n"
+         + persona_block +
+         "TRANSCRIÇÃO:\n"
+         f"{transcript_text}\n\n"
+         "EXEMPLOS DE SAÍDA (apenas um deles por item):\n"
+         "[{\"start\": 72.0, \"end\": 118.5, \"label\": \"Começo da história\", \"hook\": \"Frase chamativa...\", \"reason\": \"Alinha com X...\", \"score_relevance\": 86, \"score_engagement\": 79, \"language\": \"pt\"}]\n"
+         "[{\"segments\":[{\"start\": 12.0, \"end\": 22.5}, {\"start\": 185.0, \"end\": 202.0}], \"label\": \"Conectando pontos\", \"hook\": \"O ponto que ninguém percebe...\", \"reason\": \"Trechos distantes que contam uma ideia completa\", \"score_relevance\": 90, \"score_engagement\": 84, \"language\": \"pt+en\"}]"
+     )
+     response = query_ollama(model, prompt)
+     resp = response.strip()
+     resp = re.sub(r"^```(?:json)?", "", resp).strip()
+     resp = re.sub(r"```$", "", resp).strip()
+     cuts = None
+     try:
+         cuts = json.loads(resp)
+     except json.JSONDecodeError:
+         # Greedy match, so nested lists (e.g. "segments") don't truncate the span early
+         m = re.search(r"\[.*\]", resp, flags=re.DOTALL)
+         if m:
+             try:
+                 cuts = json.loads(m.group(0))
+             except Exception:
+                 cuts = None
+     if not isinstance(cuts, list):
+         print("Model returned non-JSON or invalid JSON. Falling back to heuristic cuts.", file=sys.stderr)
+         cuts = []
+         cur_start = None
+         cur_end = None
+         for seg in transcript:
+             s = float(seg["start"]); e = float(seg["end"])
+             if cur_start is None:
+                 cur_start, cur_end = s, e
+             elif e - cur_start <= max_length:
+                 cur_end = e
+             else:
+                 cuts.append({"start": cur_start, "end": cur_end, "label": "trecho"})
+                 cur_start, cur_end = s, e
+             if len(cuts) >= max_stories:
+                 break
+         if len(cuts) < max_stories and cur_start is not None:
+             cuts.append({"start": cur_start, "end": cur_end, "label": "trecho"})
+         cuts = cuts[:max_stories]
+
+     # Normalize both output shapes to a single "segments" list, capped at max_length
+     norm_cuts = []
+     for c in cuts:
+         try:
+             if "segments" in c and isinstance(c["segments"], list) and c["segments"]:
+                 segs = []
+                 total = 0.0
+                 for seg in c["segments"]:
+                     s = float(seg["start"]); e = float(seg["end"])
+                     if e <= s:
+                         continue
+                     dur = e - s
+                     if total + dur > max_length:
+                         e = s + max(0.01, max_length - total)
+                         dur = e - s
+                     segs.append({"start": s, "end": e})
+                     total += dur
+                     if total >= max_length:
+                         break
+                 if segs:
+                     c2 = {k: v for k, v in c.items() if k != "segments"}
+                     c2["segments"] = segs
+                     norm_cuts.append(c2)
+             elif "start" in c and "end" in c:
+                 s = float(c["start"]); e = float(c["end"])
+                 if e > s:
+                     if (e - s) > max_length:
+                         e = s + max_length
+                     c2 = dict(c)
+                     c2["segments"] = [{"start": s, "end": e}]
+                     norm_cuts.append(c2)
+         except Exception:
+             continue
+     cuts = norm_cuts
+
+     cleaned = []
+     for c in cuts:
+         try:
+             segs = c.get("segments", [])
+             acc = []
+             total = 0.0
+             for seg in segs:
+                 s = float(seg["start"]); e = float(seg["end"])
+                 if e <= s:
+                     continue
+                 dur = e - s
+                 if total + dur > max_length:
+                     e = s + max(0.01, max_length - total)
+                     dur = e - s
+                 acc.append({"start": s, "end": e})
+                 total += dur
+                 if total >= max_length:
+                     break
+             if not acc:
+                 continue
+             label = str(c.get("label", "trecho")).strip() or "trecho"
+             out = {"segments": acc, "label": label}
+             for k in ("hook", "reason", "score_relevance", "score_engagement", "language"):
+                 if k in c:
+                     out[k] = c[k]
+             cleaned.append(out)
+         except Exception:
+             continue
+
+     def score_of(c):
+         try:
+             r = float(c.get("score_relevance", 0))
+             e = float(c.get("score_engagement", 0))
+             return 0.6 * r + 0.4 * e
+         except Exception:
+             return 0.0
+     cleaned_sorted = sorted(cleaned, key=score_of, reverse=True) if cleaned else cleaned
+     cleaned_sorted = cleaned_sorted[:max_stories]
+     print(f"Proposed {len(cleaned_sorted)} cuts (post-sort).")
+     return cleaned_sorted
+
+
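The response-parsing steps above (strip code fences, try `json.loads`, then fall back to a regex over the reply) can be sketched in isolation. Hypothetical helper `parse_model_json`, not part of the script; note the greedy `\[.*\]` so an inner `segments` list does not end the match at the wrong bracket:

```python
import json
import re

# Sketch of the fence-stripping + JSON recovery step used in propose_cuts().
def parse_model_json(reply: str):
    resp = reply.strip()
    resp = re.sub(r"^```(?:json)?", "", resp).strip()  # leading code fence
    resp = re.sub(r"```$", "", resp).strip()           # trailing code fence
    try:
        return json.loads(resp)
    except json.JSONDecodeError:
        # Greedy: span from the first '[' to the last ']' in the reply
        m = re.search(r"\[.*\]", resp, flags=re.DOTALL)
        return json.loads(m.group(0)) if m else None

reply = 'Claro! ```json\n[{"segments": [{"start": 1.0, "end": 2.0}], "label": "x"}]\n```'
cuts = parse_model_json(reply)
assert cuts[0]["label"] == "x"
```

A non-greedy `\[.*?\]` would stop at the `]` that closes the inner `segments` list, producing invalid JSON for montage-style answers.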
+ def save_cuts_json(cuts: List[Dict[str, Any]], output_path: str) -> None:
+     with open(output_path, "w", encoding="utf-8") as f:
+         json.dump(cuts, f, ensure_ascii=False, indent=2)
+     print(f"Saved cuts JSON to {output_path}")
+
+
+ def generate_ffmpeg_script(
+     cuts: List[Dict[str, Any]], video_path: str, output_script_path: str, reencode: bool = False
+ ) -> None:
+     """Generate a shell script with ffmpeg commands to extract the cuts."""
+     lines = [
+         "#!/bin/bash",
+         "# Generated ffmpeg cut script",
+         "set -euo pipefail",
+         "",
+         f"VIDEO=\"{video_path}\"",
+         "BASE=export_parts",
+         "mkdir -p \"$BASE\"",
+         "",
+     ]
+     for i, cut in enumerate(cuts, 1):
+         label = cut.get("label", f"cut_{i}")
+         slug = re.sub(r"[^\w\-]+", "-", label.strip().lower()).strip("-") or f"cut-{i}"
+         out_file = f"{slug}.mp4"
+         parts_dir = f"$BASE/parts_{i:02d}"
+         lines.append(f"mkdir -p {parts_dir}")
+         segs = cut.get("segments") or ([{"start": cut.get("start"), "end": cut.get("end")}])
+         idx = 1
+         for seg in segs:
+             s = float(seg["start"]); e = float(seg["end"])
+             dur = max(0.05, e - s)
+             part_name = f"part_{i:02d}_{idx:02d}.mp4"
+             if reencode:
+                 lines.append(
+                     f"ffmpeg -y -ss {s:.3f} -i \"$VIDEO\" -t {dur:.3f} -vf scale=1080:-2 -c:v libx264 -preset veryfast -crf 22 -c:a aac -b:a 128k \"{parts_dir}/{part_name}\""
+                 )
+             else:
+                 # Note: with -c copy, cuts snap to keyframes; use --reencode for frame-accurate cuts
+                 lines.append(
+                     f"ffmpeg -y -ss {s:.3f} -i \"$VIDEO\" -t {dur:.3f} -c copy \"{parts_dir}/{part_name}\""
+                 )
+             idx += 1
+         list_file = f"{parts_dir}/list.txt"
+         lines.append(f"rm -f {list_file} && touch {list_file}")
+         lines.append(f"for f in {parts_dir}/part_{i:02d}_*.mp4; do echo \"file '$PWD/$f'\" >> {list_file}; done")
+         lines.append(
+             f"ffmpeg -y -f concat -safe 0 -i {list_file} -c copy \"{out_file}\""
+         )
+         lines.append("")
+     with open(output_script_path, "w", encoding="utf-8") as f:
+         f.write("\n".join(lines))
+     os.chmod(output_script_path, 0o755)
+     print(f"Generated ffmpeg script: {output_script_path}")
+
+
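Output filenames above come from the cut labels via a small slug rule: lowercase, collapse every non-word run into a hyphen, strip edge hyphens, and fall back to a numbered name. A standalone sketch (hypothetical helper `slugify`, mirroring the `re.sub` used in the function):

```python
import re

# Sketch of the label -> filename slug rule used by generate_ffmpeg_script().
def slugify(label: str, fallback: str = "cut-1") -> str:
    # \w is Unicode-aware in Python 3, so accented letters survive
    return re.sub(r"[^\w\-]+", "-", label.strip().lower()).strip("-") or fallback

assert slugify("Começo da história!") == "começo-da-história"
assert slugify("   ") == "cut-1"
```

Because `\w` keeps accented characters, slugs may contain non-ASCII letters; that is fine for local filenames but worth knowing before uploading them anywhere.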
+ def generate_preview(cuts: List[Dict[str, Any]], video_path: str, base_name: str) -> Optional[str]:
+     """Create a single low-res preview video that concatenates all cuts in order."""
+     if not cuts:
+         print("No cuts to preview (empty cuts list).", file=sys.stderr)
+         return None
+     try:
+         work_root = pathlib.Path(f"{base_name}_preview_parts")
+         work_root.mkdir(parents=True, exist_ok=True)
+         cut_outputs = []
+         for i, c in enumerate(cuts, 1):
+             segs = c.get("segments") or ([{"start": c.get("start"), "end": c.get("end")}])
+             cut_dir = work_root / f"cut_{i:02d}"
+             cut_dir.mkdir(parents=True, exist_ok=True)
+             part_paths = []
+             for j, seg in enumerate(segs, 1):
+                 s = float(seg["start"]); e = float(seg["end"])
+                 dur = max(0.05, e - s)
+                 part = cut_dir / f"part_{i:02d}_{j:02d}.mp4"
+                 cmd = [
+                     "ffmpeg", "-y",
+                     "-ss", f"{s:.3f}", "-i", video_path,
+                     "-t", f"{dur:.3f}",
+                     "-vf", "scale=1280:-2",
+                     "-c:v", "libx264", "-preset", "veryfast", "-crf", "28",
+                     "-c:a", "aac", "-b:a", "96k",
+                     str(part),
+                 ]
+                 subprocess.run(cmd, check=True)
+                 part_paths.append(part)
+             list_file = cut_dir / "concat_list.txt"
+             with list_file.open("w", encoding="utf-8") as f:
+                 for p in part_paths:
+                     f.write(f"file '{p.resolve()}'\n")
+             cut_out = work_root / f"cut_{i:02d}.mp4"
+             cmd2 = [
+                 "ffmpeg", "-y",
+                 "-f", "concat", "-safe", "0",
+                 "-i", str(list_file),
+                 "-c", "copy",
+                 str(cut_out),
+             ]
+             subprocess.run(cmd2, check=True)
+             cut_outputs.append(cut_out)
+         list_all = work_root / "all.txt"
+         with list_all.open("w", encoding="utf-8") as f:
+             for p in cut_outputs:
+                 f.write(f"file '{p.resolve()}'\n")
+         out_path = f"PREVIEW_{base_name}.mp4"
+         cmd3 = [
+             "ffmpeg", "-y",
+             "-f", "concat", "-safe", "0",
+             "-i", str(list_all),
+             "-c", "copy",
+             out_path,
+         ]
+         subprocess.run(cmd3, check=True)
+         print(f"Generated preview: {out_path}")
+         return out_path
+     except Exception as e:
+         print(f"Failed to generate preview: {e}", file=sys.stderr)
+         return None
+
+
+ def main():
+     parser = argparse.ArgumentParser(
+         description="Offline video-cuts tool with audio transcription and Ollama integration."
+     )
+     parser.add_argument("video", help="Input video file path")
+     parser.add_argument("--lang", default="", help="Language code for transcription (empty for auto-detect)")
+     parser.add_argument("--audio-stream", type=int, default=-1, help="Audio stream index (-1 = mix all)")
+     parser.add_argument("--model", default="llama2", help="Ollama model for proposing cuts")
+     parser.add_argument("--whisper-model", default="small", help="Whisper model (tiny, base, small, medium, large)")
+     parser.add_argument("--only-transcribe", action="store_true", help="Only transcribe and save transcript.json")
+     parser.add_argument("--only-propose", action="store_true", help="Only propose cuts from an existing transcript.json")
+     parser.add_argument("--reencode", action="store_true", help="Re-encode video cuts")
+     parser.add_argument("--max-stories", type=int, default=8, help="Maximum number of cuts")
+     parser.add_argument("--max-length", type=int, default=60, help="Max duration (seconds) per cut")
+     parser.add_argument("--preview", action="store_true", help="Generate preview MP4")
+     parser.add_argument("--persona", type=str, default="", help="Path to persona JSON/YAML (see persona.example.json)")
+     parser.add_argument("--react-mode", action="store_true", help="React mode: PT comments with EN lead-in")
+     parser.add_argument("--react-min", type=int, default=60, help="Min duration (s) per cut in react mode")
+     parser.add_argument("--react-max", type=int, default=180, help="Max duration (s) per cut in react mode")
+     parser.add_argument("--react-leadin", type=int, default=6, help="EN lead-in (s) before the comment")
+     parser.add_argument("--react-include-en", choices=["yes", "no"], default="yes", help="Include EN lead-in")
+     parser.add_argument("--react-gap", type=float, default=1.5, help="Gap tolerance (s) between PT segments")
+     parser.add_argument("--reuse-transcript", action="store_true", help="Reuse an existing transcript.json")
+     args = parser.parse_args()
+
+     video_path = args.video
+     base_name = os.path.splitext(os.path.basename(video_path))[0]
+     audio_path = f"{base_name}_audio.wav"
+     transcript_path = f"{base_name}_transcript.json"
+     cuts_json_path = f"{base_name}_cuts.json"
+     cuts_script_path = f"{base_name}_cuts.sh"
+
+     transcript = None
+     if not args.only_propose:
+         if args.reuse_transcript and os.path.exists(transcript_path):
+             print(f"Reusing existing transcript from {transcript_path}")
+             with open(transcript_path, "r", encoding="utf-8") as f:
+                 transcript = json.load(f)
+         else:
+             if not os.path.exists(audio_path):
+                 extract_audio(video_path, audio_path, audio_stream=args.audio_stream)
+             transcript = transcribe_audio(audio_path, lang=args.lang, model_size=args.whisper_model)
+             with open(transcript_path, "w", encoding="utf-8") as f:
+                 json.dump(transcript, f, ensure_ascii=False, indent=2)
+             print(f"Saved transcript to {transcript_path}")
+         if args.only_transcribe:
+             print("Only transcribe requested, exiting.")
+             sys.exit(0)
+     else:
+         if not os.path.exists(transcript_path):
+             print(f"Transcript file {transcript_path} not found.", file=sys.stderr)
+             sys.exit(1)
+         with open(transcript_path, "r", encoding="utf-8") as f:
+             transcript = json.load(f)
+
+     if args.react_mode:
+         cuts = build_react_cuts(
+             transcript,
+             min_s=args.react_min,
+             max_s=args.react_max,
+             leadin_s=args.react_leadin,
+             include_en=(args.react_include_en == "yes"),
+             gap_s=args.react_gap,
+         )
+         if not cuts:
+             print("No PT blocks long enough for react-mode.")
+             sys.exit(0)
+         save_cuts_json(cuts, cuts_json_path)
+         generate_ffmpeg_script(cuts, video_path, cuts_script_path, reencode=args.reencode)
+         if args.preview:
+             generate_preview(cuts, video_path, base_name)
+         print("Done (react-mode).")
+         sys.exit(0)
+
+     persona_text = load_persona_text(args.persona)
+     cuts = propose_cuts(transcript, args.model, args.max_stories, args.max_length, persona_text)
+     save_cuts_json(cuts, cuts_json_path)
+     generate_ffmpeg_script(cuts, video_path, cuts_script_path, reencode=args.reencode)
+     if args.preview:
+         generate_preview(cuts, video_path, base_name)
+     print("Done.")
+
+
+ if __name__ == "__main__":
+     main()