generativevideoeditorcpu

victor (HF Staff) committed on Jan 6

Commit fe730d2

1 Parent(s): bd4d26b

feat: Switch to GLM-4.7 model with improved prompts

Files changed (2)

README.md +5 -5
app.py +49 -70

README.md CHANGED

@@ -1,6 +1,6 @@
 ---
 title: AI Video Composer
-short_description: Create videos with FFMPEG + Qwen2.5-Coder
 emoji: 🏞
 colorFrom: red
 colorTo: yellow
@@ -10,12 +10,12 @@ app_file: app.py
 pinned: false
 disable_embedding: true
 models:
-- Qwen/Qwen2.5-Coder-32B-Instruct
 ---
 # 🏞 AI Video Composer
-AI Video Composer is an intelligent media processing application that uses natural language instructions to create videos from your media assets. It leverages the Qwen2.5-Coder language model to generate FFmpeg commands based on your requirements.
 ## How It Works
@@ -44,7 +44,7 @@ AI Video Composer is an intelligent media processing application that uses natur
 4. **Processing**:
    - The app analyzes your files and instructions
-   - Generates an optimized FFmpeg command using Qwen2.5-Coder
    - Executes the command and returns the processed video
    - Displays the generated FFmpeg command for transparency
@@ -62,7 +62,7 @@ AI Video Composer is an intelligent media processing application that uses natur
 - Built with Gradio for the user interface
 - Uses FFmpeg for media processing
-- Powered by Qwen2.5-Coder for command generation
 - Implements robust error handling and command validation
 - Processes files in a temporary directory for safety
 - Supports both simple operations and complex media transformations

 ---
 title: AI Video Composer
+short_description: Create videos with FFMPEG + GLM-4.7
 emoji: 🏞
 colorFrom: red
 colorTo: yellow
 pinned: false
 disable_embedding: true
 models:
+- zai-org/GLM-4.7
 ---
 # 🏞 AI Video Composer
+AI Video Composer is an intelligent media processing application that uses natural language instructions to create videos from your media assets. It leverages the GLM-4.7 language model to generate FFmpeg commands based on your requirements.
 ## How It Works
 4. **Processing**:
    - The app analyzes your files and instructions
+   - Generates an optimized FFmpeg command using GLM-4.7
    - Executes the command and returns the processed video
    - Displays the generated FFmpeg command for transparency
 - Built with Gradio for the user interface
 - Uses FFmpeg for media processing
+- Powered by GLM-4.7 for command generation
 - Implements robust error handling and command validation
 - Processes files in a temporary directory for safety
 - Supports both simple operations and complex media transformations
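
The README mentions command validation, but the diff does not show how app.py implements it. The following is a minimal sketch of what such a check could look like, assuming the rules stated in the prompts (one ffmpeg command, no chaining, output written to "output.mp4"). The function name `check_ffmpeg_command` is hypothetical and is not taken from app.py.

```python
import shlex

# Shell control operators are built from these characters when left unquoted.
_SHELL_PUNCTUATION = set("();<>|&")


def check_ffmpeg_command(command: str) -> list[str]:
    """Return a list of rule violations; an empty list means the command passes.

    Hypothetical helper for illustration only (not from app.py).
    """
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # split on whitespace, keep operators separate
    lexer.commenters = ""          # do not treat '#' as a comment start
    try:
        args = list(lexer)
    except ValueError as exc:      # e.g. an unterminated quote
        return [f"cannot parse command: {exc}"]

    problems = []
    if not args or args[0] != "ffmpeg":
        problems.append("command must start with ffmpeg")
    # Unquoted '&&', ';', '|' and similar arrive as punctuation-only tokens.
    # A quoted argument that is exactly ';' is also rejected by this sketch.
    if any(tok and set(tok) <= _SHELL_PUNCTUATION for tok in args):
        problems.append("command chaining or redirection is not allowed")
    if not args or args[-1] != "output.mp4":
        problems.append('last argument must be "output.mp4"')
    return problems
```

A caller could feed any reported problems back to the model as error feedback on a retry, in the same spirit as the `previous_error` handling visible in the app.py diff.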

app.py CHANGED

@@ -22,10 +22,10 @@ import shutil
 # Supported models configuration
 MODELS = {
-    "deepseek-ai/DeepSeek-V3": {
-        "base_url": "https://router.huggingface.co/sambanova/v1",
         "env_key": "HF_TOKEN",
-        "model_name": "DeepSeek-V3-0324",
     },
 }
@@ -163,17 +163,19 @@ def get_completion(
         files_info_string += f"| {file_info['type']} | {file_info['name']} | {dimensions} | {duration} | {audio} |\n"
     # Build the user message with optional error feedback
-    user_content = f"""Always output the media as video/mp4 and output file with "output.mp4".
-The current assets and objective follow.
-AVAILABLE ASSETS LIST:
 {files_info_string}
-OBJECTIVE: {prompt} and output at "output.mp4"
-First, think step-by-step about what I'm asking for and reformulate it into a clear technical specification.
-Then provide the FFMPEG command that will accomplish this task."""
     # Add error feedback if this is a retry
     if previous_error and previous_command:
@@ -207,63 +209,40 @@ FORMAT DETECTION KEYWORDS:
         messages = [
             {
                 "role": "system",
-                "content": """
-You are a very experienced media engineer, controlling a UNIX terminal.
-You are an FFMPEG expert with years of experience and multiple contributions to the FFMPEG project.
-You are given:
-(1) a set of video, audio and/or image assets. Including their name, duration, dimensions and file size
-(2) the description of a new video you need to create from the list of assets
-Your objective is to generate the SIMPLEST POSSIBLE single ffmpeg command to create the requested video.
-Key requirements:
-    - First, think step-by-step about what the user is asking for and reformulate it into a clear technical specification
-    - Use the absolute minimum number of ffmpeg options needed
-    - Avoid complex filter chains or filter_complex if possible
-    - Prefer simple concatenation, scaling, and basic filters
-    - Output exactly ONE command that will be directly pasted into the terminal
-    - Never output multiple commands chained together
-    - Output the command in a single line (no line breaks or multiple lines)
-    - If the user asks for waveform visualization make sure to set the mode to `line` with and the use the full width of the video. Also concatenate the audio into a single channel.
-    - For image sequences: Use -framerate and pattern matching (like 'img%d.jpg') when possible, falling back to individual image processing with -loop 1 and appropriate filters only when necessary.
-    - When showing file operations or commands, always use explicit paths and filenames without wildcards - avoid using asterisk (*) or glob patterns. Instead, use specific numbered sequences (like %d), explicit file lists, or show the full filename.
-CRITICAL SLIDESHOW GUIDANCE:
-When creating slideshows from multiple images with different dimensions, ALWAYS follow this proven pattern:
-1. CHOOSE A STANDARD RESOLUTION: Pick 1920x1080 (1080p) as the default target resolution for slideshows, UNLESS the user explicitly requests a different format (e.g., "vertical video", "9:16 ratio", "portrait mode", "TikTok format" → use 1080x1920)
-2. USE SIMPLE SCALE+PAD APPROACH: For each image, scale to fit within the chosen resolution maintaining aspect ratio, then pad with black bars
-3. PROVEN SLIDESHOW PATTERN:
-   ```
-   ffmpeg -loop 1 -t 3 -i image1.jpg -loop 1 -t 3 -i image2.jpg -filter_complex "[0]scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1[v0];[1]scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1[v1];[v0][v1]concat=n=2:v=1:a=0" -c:v libx264 -pix_fmt yuv420p -movflags +faststart output.mp4
-   ```
-4. SLIDESHOW RULES:
-   - Use 1920x1080 as target resolution by default, adjust if user specifies format
-   - For horizontal: scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2
-   - For vertical: scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2
-   - Always add setsar=1 after padding to fix aspect ratio issues
-   - Use 3-second duration per image by default (-t 3)
-   - For 3+ images, extend the pattern: [v0][v1][v2]concat=n=3:v=1:a=0
-5. DIMENSION MISMATCH FIXES:
-   - Never try to concat images with different dimensions directly
-   - Always normalize dimensions first with scale+pad
-   - Black padding is preferable to stretching/distorting images
-6. SLIDESHOW TRANSITIONS:
-   - For fade transitions, add fade=t=in:st=0:d=0.5,fade=t=out:st=2.5:d=0.5 after setsar=1
-   - Keep transitions simple - complex transitions often fail
-   - Only add transitions if specifically requested
-7. SLIDESHOW TIMING:
-   - Default to 3 seconds per image
-   - Adjust timing based on user request (e.g., "5 seconds per image")
-   - Total duration = (number of images × seconds per image)
-Remember: Simpler is better. Only use advanced ffmpeg features if absolutely necessary for the requested output.
-""",
             },
             {
                 "role": "user",
@@ -413,7 +392,7 @@ def compose_video(
     files: list = None,
     top_p: float = 0.7,
     temperature: float = 0.1,
-    model_choice: str = "deepseek-ai/DeepSeek-V3",
 ) -> str:
     """
     Compose videos from existing media assets using natural language instructions.
@@ -443,7 +422,7 @@ def update(
     prompt,
     top_p=1,
     temperature=1,
-    model_choice="deepseek-ai/DeepSeek-V3",
 ):
     if prompt == "":
         raise gr.Error("Please enter a prompt.")
@@ -601,7 +580,7 @@ with gr.Blocks() as demo:
     gr.Markdown(
         """
             # 🏞 AI Video Composer
-            Compose new videos from your assets using natural language. Add video, image and audio assets and let [DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324) generate a new video for you (using FFMPEG).
         """,
         elem_id="header",
     )
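
The slideshow pattern in the system prompt is written out for two images only. As a rough illustration of how it extends to any number of images, the sketch below builds the same scale+pad+setsar chain per input and joins the results with `concat`. The function `build_slideshow_args` and its parameters are hypothetical and not part of app.py, where the model writes the command itself.

```python
def build_slideshow_args(images, width=1920, height=1080, seconds=3):
    """Build an ffmpeg argument list following the prompt's slideshow pattern.

    Hypothetical helper for illustration only (not from app.py).
    """
    if not images:
        raise ValueError("at least one image is required")

    args = ["ffmpeg"]
    for name in images:
        # Each still image is looped for a fixed duration.
        args += ["-loop", "1", "-t", str(seconds), "-i", name]

    # Fit inside the target frame, pad with black bars, reset the sample aspect ratio.
    normalize = (
        f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2,setsar=1"
    )
    chains = [f"[{i}]{normalize}[v{i}]" for i in range(len(images))]
    labels = "".join(f"[v{i}]" for i in range(len(images)))
    chains.append(f"{labels}concat=n={len(images)}:v=1:a=0")

    args += [
        "-filter_complex", ";".join(chains),
        "-c:v", "libx264", "-pix_fmt", "yuv420p", "-movflags", "+faststart",
        "output.mp4",
    ]
    return args
```

For vertical output, the same helper can be called with `width=1080, height=1920`, matching the portrait rule in the prompt. Returning a list rather than a single string means the arguments can be passed to `subprocess.run` without shell quoting.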

 # Supported models configuration
 MODELS = {
+    "zai-org/GLM-4.7": {
+        "base_url": "https://router.huggingface.co/v1",
         "env_key": "HF_TOKEN",
+        "model_name": "zai-org/GLM-4.7",
     },
 }
         files_info_string += f"| {file_info['type']} | {file_info['name']} | {dimensions} | {duration} | {audio} |\n"
     # Build the user message with optional error feedback
+    user_content = f"""## AVAILABLE ASSETS
 {files_info_string}
+## TASK
+{prompt}
+## REQUIREMENTS
+- Output format: MP4 video saved as "output.mp4"
+- Generate a single, complete FFmpeg command
+- Command must work with the exact filenames listed above
+Think briefly about the approach, then output the FFmpeg command in a ```bash code block."""
     # Add error feedback if this is a retry
     if previous_error and previous_command:
         messages = [
             {
                 "role": "system",
+                "content": """You are an expert FFmpeg engineer. Generate precise, working FFmpeg commands.
+## OUTPUT FORMAT
+1. Brief analysis (2-3 sentences max)
+2. Single FFmpeg command in a ```bash code block
+3. Output file must be "output.mp4"
+## CORE RULES
+- ONE command only, no chaining (no && or ;)
+- Use exact filenames from the asset list
+- Keep commands as simple as possible
+- Always use: -c:v libx264 -pix_fmt yuv420p -movflags +faststart
+## SLIDESHOW PATTERN (for multiple images)
+When combining images with different dimensions:
+```bash
+ffmpeg -loop 1 -t 3 -i img1.jpg -loop 1 -t 3 -i img2.jpg -filter_complex "[0]scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1[v0];[1]scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1[v1];[v0][v1]concat=n=2:v=1:a=0" -c:v libx264 -pix_fmt yuv420p output.mp4
+```
+- Default: 1920x1080, 3 seconds per image
+- Vertical/portrait/TikTok: use 1080x1920
+- Always scale+pad to normalize dimensions
+## AUDIO WAVEFORM
+For waveform visualization:
+```bash
+ffmpeg -i audio.mp3 -i bg.png -filter_complex "[0:a]showwaves=s=1920x200:mode=line:colors=white[wave];[1]scale=1920:1080[bg];[bg][wave]overlay=(W-w)/2:(H-h)/2" -c:v libx264 -c:a aac output.mp4
+```
+## WITH BACKGROUND MUSIC
+Add audio to video/slideshow:
+```bash
+ffmpeg ... -i music.mp3 -map "[vout]" -map N:a -shortest -c:a aac output.mp4
+```
+Where N is the audio input index.""",
             },
             {
                 "role": "user",
     files: list = None,
     top_p: float = 0.7,
     temperature: float = 0.1,
+    model_choice: str = "zai-org/GLM-4.7",
 ) -> str:
     """
     Compose videos from existing media assets using natural language instructions.
     prompt,
     top_p=1,
     temperature=1,
+    model_choice="zai-org/GLM-4.7",
 ):
     if prompt == "":
         raise gr.Error("Please enter a prompt.")
     gr.Markdown(
         """
             # 🏞 AI Video Composer
+            Compose new videos from your assets using natural language. Add video, image and audio assets and let [GLM-4.7](https://huggingface.co/zai-org/GLM-4.7) generate a new video for you (using FFMPEG).
         """,
         elem_id="header",
     )
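
The new prompts ask the model for a brief analysis followed by the command in a fenced bash code block, so the caller has to pull the command out of the reply. The diff does not show how app.py does this. The sketch below is one possible approach, assuming the reply contains at least one fenced block with a line starting with `ffmpeg`. The name `extract_ffmpeg_command` is hypothetical.

```python
import re
from typing import Optional

# Build the fence marker programmatically so this example can itself sit in a fenced block.
FENCE = "`" * 3

# Matches a fenced block with an optional "bash" or "sh" language tag.
_CODE_BLOCK = re.compile(FENCE + r"(?:bash|sh)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_ffmpeg_command(reply: str) -> Optional[str]:
    """Return the first ffmpeg command line found in a fenced block, or None.

    Hypothetical helper for illustration only (not from app.py).
    """
    for block in _CODE_BLOCK.findall(reply):
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("ffmpeg "):
                return line
    return None
```

Returning `None` instead of raising leaves the retry decision to the caller, which could then re-prompt the model with an error message.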