zhijun.li committed on
Commit
e6bc9f9
·
1 Parent(s): 25ed9e1

Add app.py, requirements and improved README

Browse files
Files changed (4) hide show
  1. .gitignore +6 -0
  2. README.md +34 -6
  3. app.py +192 -0
  4. requirements.txt +5 -0
.gitignore ADDED
@@ -0,0 +1,6 @@
 
 
 
 
 
 
 
1
+ __pycache__/
2
+ *.pyc
3
+ .env
4
+ .DS_Store
5
+ venv/
6
+ .ipynb_checkpoints/
README.md CHANGED
@@ -1,14 +1,42 @@
1
  ---
2
- title: Video To Code Ernie
3
- emoji: 🏃
4
- colorFrom: red
5
- colorTo: indigo
6
  sdk: gradio
7
  sdk_version: 5.49.1
8
  app_file: app.py
9
  pinned: false
10
  license: apache-2.0
11
- short_description: Generate HTML/CSS code from videos using ERNIE 4.5-VL
12
  ---
13
 
14
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ title: Ernie 4.5 Video2Code
3
+ emoji: 🎬
4
+ colorFrom: blue
5
+ colorTo: cyan
6
  sdk: gradio
7
  sdk_version: 5.49.1
8
  app_file: app.py
9
  pinned: false
10
  license: apache-2.0
11
+ short_description: Turn UI/UX video tutorials into executable HTML code instantly.
12
  ---
13
 
14
+ # ERNIE 4.5-VL Video-to-Code Agent
15
+
16
+ **Watch the video, write the code.**
17
+
18
+ This AI Agent uses **Baidu ERNIE 4.5-VL (Vision-Language Model)** to analyze frontend coding tutorials frame-by-frame and reconstruct the final webpage structure, styling, and logic automatically.
19
+
20
+
21
+ ## ✨ Key Features
22
+
23
+ * **👁️ Visual Perception**: The AI "watches" the video, identifying HTML structures, CSS layouts, and interactive elements shown on screen.
24
+ * **🛡️ Sandbox Rendering**: Generated code is rendered inside a secure **Iframe**, allowing you to see the live result immediately without style conflicts.
25
+ * **🧹 Clean Output**: Automatically filters out conversational text to provide pure, ready-to-run HTML/CSS/JS code.
26
+ * **📦 Single-File Download**: Get a standalone `.html` file containing all dependencies.
27
+
28
+ ## 🚀 How to Use
29
+
30
+ 1. **Upload**: Drop an MP4 video file (Frontend tutorials, CSS effects, UI demos).
31
+ * *Constraint: Max video duration is **30 minutes**.*
32
+ 2. **Generate**: Click **"🚀 Generate & Render"**.
33
+ 3. **Preview**:
34
+ * **Live Preview**: See the code running instantly in the browser.
35
+ * **Source Code**: Inspect the generated HTML syntax.
36
+ 4. **Download**: Save the result to your local machine.
37
+
38
+ ## ⚙️ How It Works
39
+
40
+ 1. **Frame Extraction**: The video is processed using OpenCV to capture high-quality keyframes.
41
+ 2. **Parallel Analysis**: ERNIE 4.5-VL processes video segments in parallel to understand the coding progression and visual outcome.
42
+ 3. **Logic Synthesis**: The agent acts as a Senior Frontend Engineer, aggregating the visual insights to write functional code.
app.py ADDED
@@ -0,0 +1,192 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ import os
2
+ import cv2
3
+ import time
4
+ import base64
5
+ import gradio as gr
6
+ from openai import OpenAI
7
+ from concurrent.futures import ThreadPoolExecutor, as_completed
8
+ import re
9
+
10
+
11
# --- Configuration ---
# OpenAI-compatible endpoint for Baidu AI Studio's ERNIE model API.
BASE_URL = "https://aistudio.baidu.com/llm/lmapi/v3"
# Vision-language model used for both frame analysis and final code synthesis.
MODEL_NAME = "ernie-4.5-turbo-vl"
# Worker count for the ThreadPoolExecutor that analyzes video chunks in parallel.
MAX_CONCURRENT_REQUESTS = 4
# Reject uploads longer than this many seconds (30 minutes).
MAX_VIDEO_DURATION_SEC = 1800
16
+
17
def extract_frames(video_path, interval_sec=1):
    """Sample the video about once per `interval_sec` seconds.

    Each sampled frame is resized to 512 px height (aspect ratio preserved)
    and JPEG-encoded at quality 60 to keep API payloads small, then
    base64-encoded. Frames are grouped into chunks of up to 30.

    Args:
        video_path: path to a video file readable by OpenCV.
        interval_sec: sampling interval in seconds.

    Returns:
        List of chunks (each a list of base64 JPEG strings); empty list if
        the video cannot be opened.
    """
    cap = cv2.VideoCapture(video_path)
    if not cap.isOpened():
        return []
    try:
        fps = cap.get(cv2.CAP_PROP_FPS)
        if fps <= 0:
            fps = 30  # some containers report 0/invalid FPS; assume a sane default
        # Guard the modulus: int(fps * interval_sec) is 0 when fps*interval < 1,
        # which would raise ZeroDivisionError. Sample at least every frame.
        step = max(1, int(fps * interval_sec))
        chunk_frames, chunks, frame_count = [], [], 0
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            if frame_count % step == 0:
                height, width = frame.shape[:2]
                scale = 512 / height
                resized_frame = cv2.resize(frame, (int(width * scale), 512))
                _, buffer = cv2.imencode('.jpg', resized_frame, [int(cv2.IMWRITE_JPEG_QUALITY), 60])
                chunk_frames.append(base64.b64encode(buffer).decode('utf-8'))
                if len(chunk_frames) == 30:
                    chunks.append(chunk_frames)
                    chunk_frames = []
            frame_count += 1
        if chunk_frames:
            chunks.append(chunk_frames)
    finally:
        # Release the capture even if decoding/encoding raises mid-loop.
        cap.release()
    return chunks
41
+
42
def process_chunk_with_retry(client, chunk_index, frames_b64, max_retries=3):
    """Describe one chunk of video frames with the VL model, with retries.

    Args:
        client: OpenAI-compatible client.
        chunk_index: position of this chunk in the video; returned alongside
            the summary so the caller can re-order results from parallel workers.
        frames_b64: list of base64-encoded JPEG frames for this chunk.
        max_retries: number of attempts before giving up.

    Returns:
        (chunk_index, summary) — summary is "" when every attempt failed,
        letting the caller simply skip empty results.
    """
    prompt = (
        "This is a segment from a frontend web development video tutorial (screenshots taken every second). "
        "Please focus intently on the code shown on the screen and the resulting web page style. "
        "Describe in detail the HTML structure, CSS styling rules, or JavaScript logic presented in this segment. "
        "Ignore unrelated video elements."
    )
    content = [{"type": "text", "text": prompt}]
    for f in frames_b64:
        content.append({"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{f}"}})
    for attempt in range(max_retries):
        try:
            response = client.chat.completions.create(
                model=MODEL_NAME,
                messages=[{"role": "user", "content": content}],
                temperature=0.1, max_tokens=1024
            )
            return chunk_index, response.choices[0].message.content
        except Exception:
            # Back off before retrying, but don't waste 2 s after the
            # final attempt — just fall through to the empty result.
            if attempt < max_retries - 1:
                time.sleep(2)
    return chunk_index, ""
64
+
65
def aggregate_and_generate_webpage(client, summaries):
    """Merge per-chunk summaries and ask the model for one complete HTML file.

    Args:
        client: OpenAI-compatible client.
        summaries: dict mapping chunk index -> summary text; iterated in index
            order so the narrative follows the video timeline.

    Returns:
        The generated HTML document as a string, best-effort trimmed to the
        `<!DOCTYPE html> ... </html>` span.
    """
    full_summary = "\n".join([f"Segment {i+1} Summary: {s}" for i, s in sorted(summaries.items()) if s])

    # Slightly forceful wording so the model returns raw HTML only.
    final_prompt = f"""
You are an expert Frontend Engineer. Based on the video segment summaries, write a complete HTML file.

**Summaries:**
{full_summary}

**Strict Output Instructions:**
1. Return ONLY the raw HTML code.
2. Start directly with `<!DOCTYPE html>`.
3. End directly with `</html>`.
4. NO introduction text, NO markdown backticks (```), NO explanations after the code.
"""

    response = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[{"role": "user", "content": final_prompt}],
        temperature=0.2, top_p=0.8
    )

    content = response.choices[0].message.content.strip()

    # Strip markdown code fences only at the edges. A blanket
    # content.replace("```", "") would corrupt legitimate backticks inside the
    # generated code (e.g. JavaScript template literals).
    content = re.sub(r'^\s*```(?:html)?\s*', '', content, flags=re.IGNORECASE)
    content = re.sub(r'\s*```\s*$', '', content)

    # Prefer the exact <!DOCTYPE html> ... </html> span; if the closing tag is
    # missing, fall back to dropping any preamble before <!DOCTYPE html>.
    match = re.search(r'(<!DOCTYPE html>.*</html>)', content, re.DOTALL | re.IGNORECASE)
    if match:
        content = match.group(1)
    else:
        start_match = re.search(r'<!DOCTYPE html>', content, re.IGNORECASE)
        if start_match:
            content = content[start_match.start():]

    return content
102
+
103
+
104
def main_process(video_file, progress=gr.Progress()):
    """Full pipeline behind the "Generate & Render" button.

    Validates the upload, extracts frames, summarizes chunks in parallel with
    the VL model, synthesizes a standalone HTML file, and returns
    (iframe preview markup, saved file path, raw HTML source) for the three
    output components.

    NOTE(review): `progress=gr.Progress()` is the Gradio-documented injection
    pattern, not an accidental mutable default — Gradio replaces it per call.
    """
    api_key = os.environ.get("ERNIE_API_KEY")
    if not api_key: raise gr.Error("Server Config Error: API KEY missing.")
    if not video_file: raise gr.Error("Please upload a video.")

    # check duration
    cap = cv2.VideoCapture(video_file)
    fps = cap.get(cv2.CAP_PROP_FPS)
    count = cap.get(cv2.CAP_PROP_FRAME_COUNT)
    duration = count / fps if fps > 0 else 0
    cap.release()
    if duration > MAX_VIDEO_DURATION_SEC:
        raise gr.Error(f"Video too long ({duration/60:.1f}m). Limit is 30m.")

    client = OpenAI(api_key=api_key, base_url=BASE_URL)

    progress(0.1, desc="Extracting frames...")
    chunks = extract_frames(video_file)
    if not chunks: raise gr.Error("Frame extraction failed.")

    progress(0.3, desc="Analyzing content...")
    chunk_summaries = {}
    # Fan out chunk analysis; results arrive out of order, so summaries are
    # keyed by chunk index and re-sorted during aggregation.
    with ThreadPoolExecutor(max_workers=MAX_CONCURRENT_REQUESTS) as executor:
        future_to_chunk = {executor.submit(process_chunk_with_retry, client, i, chunk): i for i, chunk in enumerate(chunks)}
        for i, future in enumerate(as_completed(future_to_chunk)):
            idx, summary = future.result()
            if summary: chunk_summaries[idx] = summary
            progress(0.3 + 0.5 * ((i+1)/len(chunks)), desc=f"Analyzed {i+1}/{len(chunks)}")

    progress(0.8, desc="Synthesizing code...")
    html_code = aggregate_and_generate_webpage(client, chunk_summaries)

    # Save file
    output_path = "generated_website.html"
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(html_code)

    # --- Key change: build a Data-URI iframe ---
    # Encode the HTML as Base64 and put it in the iframe's src. This gives
    # clean sandbox isolation: styles cannot clash with the Gradio page and
    # the generated JS still runs.
    b64_html = base64.b64encode(html_code.encode('utf-8')).decode('utf-8')
    data_uri = f"data:text/html;charset=utf-8;base64,{b64_html}"

    # Build an HTML snippet containing a single iframe for the preview pane.
    iframe_html = f"""
    <iframe
        src="{data_uri}"
        width="100%"
        height="600px"
        style="border: 1px solid #ccc; border-radius: 8px; background-color: white;">
    </iframe>
    """

    progress(1.0, desc="Done!")
    # Return the iframe markup to the HTML component, the path to the
    # download component, and the source to the Code component.
    return iframe_html, output_path, html_code
160
+
161
# --- UI ---
# Gradio front end: video upload on the left; tabbed preview / source /
# download on the right, all fed by a single click handler.

with gr.Blocks(title="Ernie 4.5 Video2Code", theme=gr.themes.Soft()) as demo:
    gr.Markdown("# 🎬 Ernie 4.5-VL: Video to Code Agent")
    gr.Markdown("Upload a frontend video tutorial. The AI will generate and **render** the code instantly.")

    with gr.Row():
        with gr.Column(scale=1):
            video_input = gr.Video(label="Upload Video", format="mp4", height=300)
            submit_btn = gr.Button("🚀 Generate & Render", variant="primary", size="lg")

        with gr.Column(scale=2):
            # Show the rendered preview up front as the first (default) tab
            # instead of hiding it behind the source/download tabs.
            with gr.Tabs():
                with gr.TabItem("🌐 Live Preview (Result)"):
                    # This component receives the iframe markup string.
                    html_preview = gr.HTML(label="Rendered Page")

                with gr.TabItem("📝 Source Code"):
                    code_output = gr.Code(language="html", label="HTML Source")

                with gr.TabItem("⬇️ Download"):
                    file_download = gr.File(label="Download .html File")

    submit_btn.click(
        fn=main_process,
        inputs=[video_input],
        outputs=[html_preview, file_download, code_output]
    )

if __name__ == "__main__":
    demo.launch()
requirements.txt ADDED
@@ -0,0 +1,5 @@
 
 
 
 
 
 
1
+ opencv-python-headless
2
+ openai
3
+ tqdm
4
+ gradio
5
+ numpy