victor HF Staff committed on
Commit fe730d2 · 1 Parent(s): bd4d26b

feat: Switch to GLM-4.7 model with improved prompts

Files changed (2)
  1. README.md +5 -5
  2. app.py +49 -70
README.md CHANGED
@@ -1,6 +1,6 @@
1
  ---
2
  title: AI Video Composer
3
- short_description: Create videos with FFMPEG + Qwen2.5-Coder
4
  emoji: 🏞
5
  colorFrom: red
6
  colorTo: yellow
@@ -10,12 +10,12 @@ app_file: app.py
10
  pinned: false
11
  disable_embedding: true
12
  models:
13
- - Qwen/Qwen2.5-Coder-32B-Instruct
14
  ---
15
 
16
  # 🏞 AI Video Composer
17
 
18
- AI Video Composer is an intelligent media processing application that uses natural language instructions to create videos from your media assets. It leverages the Qwen2.5-Coder language model to generate FFmpeg commands based on your requirements.
19
 
20
  ## How It Works
21
 
@@ -44,7 +44,7 @@ AI Video Composer is an intelligent media processing application that uses natur
44
 
45
  4. **Processing**:
46
  - The app analyzes your files and instructions
47
- - Generates an optimized FFmpeg command using Qwen2.5-Coder
48
  - Executes the command and returns the processed video
49
  - Displays the generated FFmpeg command for transparency
50
 
@@ -62,7 +62,7 @@ AI Video Composer is an intelligent media processing application that uses natur
62
 
63
  - Built with Gradio for the user interface
64
  - Uses FFmpeg for media processing
65
- - Powered by Qwen2.5-Coder for command generation
66
  - Implements robust error handling and command validation
67
  - Processes files in a temporary directory for safety
68
  - Supports both simple operations and complex media transformations
 
1
  ---
2
  title: AI Video Composer
3
+ short_description: Create videos with FFMPEG + GLM-4.7
4
  emoji: 🏞
5
  colorFrom: red
6
  colorTo: yellow
 
10
  pinned: false
11
  disable_embedding: true
12
  models:
13
+ - zai-org/GLM-4.7
14
  ---
15
 
16
  # 🏞 AI Video Composer
17
 
18
+ AI Video Composer is an intelligent media processing application that uses natural language instructions to create videos from your media assets. It leverages the GLM-4.7 language model to generate FFmpeg commands based on your requirements.
19
 
20
  ## How It Works
21
 
 
44
 
45
  4. **Processing**:
46
  - The app analyzes your files and instructions
47
+ - Generates an optimized FFmpeg command using GLM-4.7
48
  - Executes the command and returns the processed video
49
  - Displays the generated FFmpeg command for transparency
50
 
 
62
 
63
  - Built with Gradio for the user interface
64
  - Uses FFmpeg for media processing
65
+ - Powered by GLM-4.7 for command generation
66
  - Implements robust error handling and command validation
67
  - Processes files in a temporary directory for safety
68
  - Supports both simple operations and complex media transformations
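The README mentions command validation but this diff does not show what that involves. A minimal sketch of one possible check follows, assuming the generated command arrives as a single string. The name `validate_ffmpeg_command` and the specific rules are hypothetical and are not taken from app.py.

```python
import shlex

# Shell operators that would indicate command chaining (hypothetical rule set).
CHAINING_TOKENS = {"&&", "||", ";", "|", "&"}


def validate_ffmpeg_command(command: str) -> list[str]:
    """Split a generated command into arguments and apply basic sanity checks.

    Raises ValueError if the command does not look like a single ffmpeg call
    whose last argument is output.mp4.
    """
    # shlex honours quotes, so a quoted filter graph stays one argument.
    args = shlex.split(command)
    if not args or args[0] != "ffmpeg":
        raise ValueError("command must start with 'ffmpeg'")
    # Only standalone operator tokens are caught here; an operator glued to a
    # word (for example "a.mp4;") is not detected by this check.
    chained = CHAINING_TOKENS.intersection(args)
    if chained:
        raise ValueError(f"command chaining is not allowed: {sorted(chained)}")
    if args[-1] != "output.mp4":
        raise ValueError("the last argument must be 'output.mp4'")
    return args
```

The returned list can be handed to `subprocess.run` without `shell=True`, in which case shell operators are never interpreted and the chaining check serves only as an early rejection.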
app.py CHANGED
@@ -22,10 +22,10 @@ import shutil
22
 
23
  # Supported models configuration
24
  MODELS = {
25
- "deepseek-ai/DeepSeek-V3": {
26
- "base_url": "https://router.huggingface.co/sambanova/v1",
27
  "env_key": "HF_TOKEN",
28
- "model_name": "DeepSeek-V3-0324",
29
  },
30
  }
31
 
@@ -163,17 +163,19 @@ def get_completion(
163
  files_info_string += f"| {file_info['type']} | {file_info['name']} | {dimensions} | {duration} | {audio} |\n"
164
 
165
  # Build the user message with optional error feedback
166
- user_content = f"""Always output the media as video/mp4 and output file with "output.mp4".
167
- The current assets and objective follow.
168
-
169
- AVAILABLE ASSETS LIST:
170
 
171
  {files_info_string}
172
 
173
- OBJECTIVE: {prompt} and output at "output.mp4"
174
 
175
- First, think step-by-step about what I'm asking for and reformulate it into a clear technical specification.
176
- Then provide the FFMPEG command that will accomplish this task."""
177
 
178
  # Add error feedback if this is a retry
179
  if previous_error and previous_command:
@@ -207,63 +209,40 @@ FORMAT DETECTION KEYWORDS:
207
  messages = [
208
  {
209
  "role": "system",
210
- "content": """
211
- You are a very experienced media engineer, controlling a UNIX terminal.
212
- You are an FFMPEG expert with years of experience and multiple contributions to the FFMPEG project.
213
-
214
- You are given:
215
- (1) a set of video, audio and/or image assets. Including their name, duration, dimensions and file size
216
- (2) the description of a new video you need to create from the list of assets
217
-
218
- Your objective is to generate the SIMPLEST POSSIBLE single ffmpeg command to create the requested video.
219
-
220
- Key requirements:
221
- - First, think step-by-step about what the user is asking for and reformulate it into a clear technical specification
222
- - Use the absolute minimum number of ffmpeg options needed
223
- - Avoid complex filter chains or filter_complex if possible
224
- - Prefer simple concatenation, scaling, and basic filters
225
- - Output exactly ONE command that will be directly pasted into the terminal
226
- - Never output multiple commands chained together
227
- - Output the command in a single line (no line breaks or multiple lines)
228
- - If the user asks for waveform visualization make sure to set the mode to `line` with and the use the full width of the video. Also concatenate the audio into a single channel.
229
- - For image sequences: Use -framerate and pattern matching (like 'img%d.jpg') when possible, falling back to individual image processing with -loop 1 and appropriate filters only when necessary.
230
- - When showing file operations or commands, always use explicit paths and filenames without wildcards - avoid using asterisk (*) or glob patterns. Instead, use specific numbered sequences (like %d), explicit file lists, or show the full filename.
231
-
232
- CRITICAL SLIDESHOW GUIDANCE:
233
- When creating slideshows from multiple images with different dimensions, ALWAYS follow this proven pattern:
234
-
235
- 1. CHOOSE A STANDARD RESOLUTION: Pick 1920x1080 (1080p) as the default target resolution for slideshows, UNLESS the user explicitly requests a different format (e.g., "vertical video", "9:16 ratio", "portrait mode", "TikTok format" → use 1080x1920)
236
- 2. USE SIMPLE SCALE+PAD APPROACH: For each image, scale to fit within the chosen resolution maintaining aspect ratio, then pad with black bars
237
- 3. PROVEN SLIDESHOW PATTERN:
238
- ```
239
- ffmpeg -loop 1 -t 3 -i image1.jpg -loop 1 -t 3 -i image2.jpg -filter_complex "[0]scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1[v0];[1]scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1[v1];[v0][v1]concat=n=2:v=1:a=0" -c:v libx264 -pix_fmt yuv420p -movflags +faststart output.mp4
240
- ```
241
-
242
- 4. SLIDESHOW RULES:
243
- - Use 1920x1080 as target resolution by default, adjust if user specifies format
244
- - For horizontal: scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2
245
- - For vertical: scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2
246
- - Always add setsar=1 after padding to fix aspect ratio issues
247
- - Use 3-second duration per image by default (-t 3)
248
- - For 3+ images, extend the pattern: [v0][v1][v2]concat=n=3:v=1:a=0
249
-
250
- 5. DIMENSION MISMATCH FIXES:
251
- - Never try to concat images with different dimensions directly
252
- - Always normalize dimensions first with scale+pad
253
- - Black padding is preferable to stretching/distorting images
254
-
255
- 6. SLIDESHOW TRANSITIONS:
256
- - For fade transitions, add fade=t=in:st=0:d=0.5,fade=t=out:st=2.5:d=0.5 after setsar=1
257
- - Keep transitions simple - complex transitions often fail
258
- - Only add transitions if specifically requested
259
-
260
- 7. SLIDESHOW TIMING:
261
- - Default to 3 seconds per image
262
- - Adjust timing based on user request (e.g., "5 seconds per image")
263
- - Total duration = (number of images × seconds per image)
264
-
265
- Remember: Simpler is better. Only use advanced ffmpeg features if absolutely necessary for the requested output.
266
- """,
267
  },
268
  {
269
  "role": "user",
@@ -413,7 +392,7 @@ def compose_video(
413
  files: list = None,
414
  top_p: float = 0.7,
415
  temperature: float = 0.1,
416
- model_choice: str = "deepseek-ai/DeepSeek-V3",
417
  ) -> str:
418
  """
419
  Compose videos from existing media assets using natural language instructions.
@@ -443,7 +422,7 @@ def update(
443
  prompt,
444
  top_p=1,
445
  temperature=1,
446
- model_choice="deepseek-ai/DeepSeek-V3",
447
  ):
448
  if prompt == "":
449
  raise gr.Error("Please enter a prompt.")
@@ -601,7 +580,7 @@ with gr.Blocks() as demo:
601
  gr.Markdown(
602
  """
603
  # 🏞 AI Video Composer
604
- Compose new videos from your assets using natural language. Add video, image and audio assets and let [DeepSeek-V3](https://huggingface.co/deepseek-ai/DeepSeek-V3-0324) generate a new video for you (using FFMPEG).
605
  """,
606
  elem_id="header",
607
  )
 
22
 
23
  # Supported models configuration
24
  MODELS = {
25
+ "zai-org/GLM-4.7": {
26
+ "base_url": "https://router.huggingface.co/v1",
27
  "env_key": "HF_TOKEN",
28
+ "model_name": "zai-org/GLM-4.7",
29
  },
30
  }
31
 
 
163
  files_info_string += f"| {file_info['type']} | {file_info['name']} | {dimensions} | {duration} | {audio} |\n"
164
 
165
  # Build the user message with optional error feedback
166
+ user_content = f"""## AVAILABLE ASSETS
167
 
168
  {files_info_string}
169
 
170
+ ## TASK
171
+ {prompt}
172
+
173
+ ## REQUIREMENTS
174
+ - Output format: MP4 video saved as "output.mp4"
175
+ - Generate a single, complete FFmpeg command
176
+ - Command must work with the exact filenames listed above
177
 
178
+ Think briefly about the approach, then output the FFmpeg command in a ```bash code block."""
 
179
 
180
  # Add error feedback if this is a retry
181
  if previous_error and previous_command:
 
209
  messages = [
210
  {
211
  "role": "system",
212
+ "content": """You are an expert FFmpeg engineer. Generate precise, working FFmpeg commands.
213
+
214
+ ## OUTPUT FORMAT
215
+ 1. Brief analysis (2-3 sentences max)
216
+ 2. Single FFmpeg command in a ```bash code block
217
+ 3. Output file must be "output.mp4"
218
+
219
+ ## CORE RULES
220
+ - ONE command only, no chaining (no && or ;)
221
+ - Use exact filenames from the asset list
222
+ - Keep commands as simple as possible
223
+ - Always use: -c:v libx264 -pix_fmt yuv420p -movflags +faststart
224
+
225
+ ## SLIDESHOW PATTERN (for multiple images)
226
+ When combining images with different dimensions:
227
+ ```bash
228
+ ffmpeg -loop 1 -t 3 -i img1.jpg -loop 1 -t 3 -i img2.jpg -filter_complex "[0]scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1[v0];[1]scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1[v1];[v0][v1]concat=n=2:v=1:a=0" -c:v libx264 -pix_fmt yuv420p output.mp4
229
+ ```
230
+ - Default: 1920x1080, 3 seconds per image
231
+ - Vertical/portrait/TikTok: use 1080x1920
232
+ - Always scale+pad to normalize dimensions
233
+
234
+ ## AUDIO WAVEFORM
235
+ For waveform visualization:
236
+ ```bash
237
+ ffmpeg -i audio.mp3 -i bg.png -filter_complex "[0:a]showwaves=s=1920x200:mode=line:colors=white[wave];[1]scale=1920:1080[bg];[bg][wave]overlay=(W-w)/2:(H-h)/2" -c:v libx264 -c:a aac output.mp4
238
+ ```
239
+
240
+ ## WITH BACKGROUND MUSIC
241
+ Add audio to video/slideshow:
242
+ ```bash
243
+ ffmpeg ... -i music.mp3 -map "[vout]" -map N:a -shortest -c:a aac output.mp4
244
+ ```
245
+ Where N is the audio input index.""",
246
  },
247
  {
248
  "role": "user",
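The slideshow pattern in the new system prompt shows two images only. As a rough illustration of how the same filter graph extends to any number of images, here is a sketch of a helper that builds the argument list. `build_slideshow_command` and its parameters are hypothetical and are not part of this commit; the flags mirror the pattern and core rules in the prompt.

```python
def build_slideshow_command(images, width=1920, height=1080, seconds=3):
    """Build an ffmpeg argument list following the scale+pad+concat pattern.

    Each image is looped for `seconds`, scaled to fit inside width x height,
    padded to the exact size, and the normalized streams are concatenated.
    """
    if not images:
        raise ValueError("at least one image is required")
    inputs = []
    chains = []
    for i, name in enumerate(images):
        inputs += ["-loop", "1", "-t", str(seconds), "-i", name]
        chains.append(
            f"[{i}]scale={width}:{height}:force_original_aspect_ratio=decrease,"
            f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2,setsar=1[v{i}]"
        )
    # Join the per-image chains, then concat all labelled streams.
    labels = "".join(f"[v{i}]" for i in range(len(images)))
    graph = ";".join(chains) + f";{labels}concat=n={len(images)}:v=1:a=0"
    return [
        "ffmpeg", *inputs,
        "-filter_complex", graph,
        "-c:v", "libx264", "-pix_fmt", "yuv420p", "-movflags", "+faststart",
        "output.mp4",
    ]
```

For a vertical slideshow, the same helper could be called with `width=1080, height=1920`, matching the prompt's portrait rule.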
 
392
  files: list = None,
393
  top_p: float = 0.7,
394
  temperature: float = 0.1,
395
+ model_choice: str = "zai-org/GLM-4.7",
396
  ) -> str:
397
  """
398
  Compose videos from existing media assets using natural language instructions.
 
422
  prompt,
423
  top_p=1,
424
  temperature=1,
425
+ model_choice="zai-org/GLM-4.7",
426
  ):
427
  if prompt == "":
428
  raise gr.Error("Please enter a prompt.")
 
580
  gr.Markdown(
581
  """
582
  # 🏞 AI Video Composer
583
+ Compose new videos from your assets using natural language. Add video, image and audio assets and let [GLM-4.7](https://huggingface.co/zai-org/GLM-4.7) generate a new video for you (using FFMPEG).
584
  """,
585
  elem_id="header",
586
  )
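Both new prompts ask the model to return a short analysis followed by the command in a fenced `bash` code block. The diff does not show how app.py parses the reply, so the following is only a sketch of one way to pull the command out of such a response. `extract_ffmpeg_command` and `BASH_BLOCK` are hypothetical names.

```python
import re

FENCE = "`" * 3  # three backticks, built this way to keep the example readable

# Matches a fenced block optionally tagged bash or sh and captures its body.
BASH_BLOCK = re.compile(FENCE + r"(?:bash|sh)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_ffmpeg_command(reply: str) -> str:
    """Return the first line starting with 'ffmpeg ' found in a fenced block."""
    for block in BASH_BLOCK.findall(reply):
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("ffmpeg "):
                return line
    raise ValueError("no ffmpeg command found in a fenced code block")
```

Raising on a missing block, rather than falling back to the raw reply, keeps analysis text from being executed by mistake.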