feat: Switch to GLM-4.7 model with improved prompts
README.md
CHANGED
|
@@ -1,6 +1,6 @@
|
|
| 1 |
---
|
| 2 |
title: AI Video Composer
|
| 3 |
-
short_description: Create videos with FFMPEG +
|
| 4 |
emoji: 🏞
|
| 5 |
colorFrom: red
|
| 6 |
colorTo: yellow
|
|
@@ -10,12 +10,12 @@ app_file: app.py
|
|
| 10 |
pinned: false
|
| 11 |
disable_embedding: true
|
| 12 |
models:
|
| 13 |
-
-
|
| 14 |
---
|
| 15 |
|
| 16 |
# 🏞 AI Video Composer
|
| 17 |
|
| 18 |
-
AI Video Composer is an intelligent media processing application that uses natural language instructions to create videos from your media assets. It leverages the
|
| 19 |
|
| 20 |
## How It Works
|
| 21 |
|
|
@@ -44,7 +44,7 @@ AI Video Composer is an intelligent media processing application that uses natur
|
|
| 44 |
|
| 45 |
4. **Processing**:
|
| 46 |
- The app analyzes your files and instructions
|
| 47 |
-
- Generates an optimized FFmpeg command using
|
| 48 |
- Executes the command and returns the processed video
|
| 49 |
- Displays the generated FFmpeg command for transparency
|
| 50 |
|
|
@@ -62,7 +62,7 @@ AI Video Composer is an intelligent media processing application that uses natur
|
|
| 62 |
|
| 63 |
- Built with Gradio for the user interface
|
| 64 |
- Uses FFmpeg for media processing
|
| 65 |
-
- Powered by
|
| 66 |
- Implements robust error handling and command validation
|
| 67 |
- Processes files in a temporary directory for safety
|
| 68 |
- Supports both simple operations and complex media transformations
|
|
|
|
| 1 |
---
|
| 2 |
title: AI Video Composer
|
| 3 |
+
short_description: Create videos with FFMPEG + GLM-4.7
|
| 4 |
emoji: 🏞
|
| 5 |
colorFrom: red
|
| 6 |
colorTo: yellow
|
|
|
|
| 10 |
pinned: false
|
| 11 |
disable_embedding: true
|
| 12 |
models:
|
| 13 |
+
- zai-org/GLM-4.7
|
| 14 |
---
|
| 15 |
|
| 16 |
# 🏞 AI Video Composer
|
| 17 |
|
| 18 |
+
AI Video Composer is an intelligent media processing application that uses natural language instructions to create videos from your media assets. It leverages the GLM-4.7 language model to generate FFmpeg commands based on your requirements.
|
| 19 |
|
| 20 |
## How It Works
|
| 21 |
|
|
|
|
| 44 |
|
| 45 |
4. **Processing**:
|
| 46 |
- The app analyzes your files and instructions
|
| 47 |
+
- Generates an optimized FFmpeg command using GLM-4.7
|
| 48 |
- Executes the command and returns the processed video
|
| 49 |
- Displays the generated FFmpeg command for transparency
|
| 50 |
|
|
|
|
| 62 |
|
| 63 |
- Built with Gradio for the user interface
|
| 64 |
- Uses FFmpeg for media processing
|
| 65 |
+
- Powered by GLM-4.7 for command generation
|
| 66 |
- Implements robust error handling and command validation
|
| 67 |
- Processes files in a temporary directory for safety
|
| 68 |
- Supports both simple operations and complex media transformations
|
app.py
CHANGED
|
@@ -22,10 +22,10 @@ import shutil
|
|
| 22 |
|
| 23 |
# Supported models configuration
|
| 24 |
MODELS = {
|
| 25 |
-
"
|
| 26 |
-
"base_url": "https://router.huggingface.co/
|
| 27 |
"env_key": "HF_TOKEN",
|
| 28 |
-
"model_name": "
|
| 29 |
},
|
| 30 |
}
|
| 31 |
|
|
@@ -163,17 +163,19 @@ def get_completion(
|
|
| 163 |
files_info_string += f"| {file_info['type']} | {file_info['name']} | {dimensions} | {duration} | {audio} |\n"
|
| 164 |
|
| 165 |
# Build the user message with optional error feedback
|
| 166 |
-
user_content = f"""
|
| 167 |
-
The current assets and objective follow.
|
| 168 |
-
|
| 169 |
-
AVAILABLE ASSETS LIST:
|
| 170 |
|
| 171 |
{files_info_string}
|
| 172 |
|
| 173 |
-
|
| 174 |
|
| 175 |
-
|
| 176 |
-
Then provide the FFMPEG command that will accomplish this task."""
|
| 177 |
|
| 178 |
# Add error feedback if this is a retry
|
| 179 |
if previous_error and previous_command:
|
|
@@ -207,63 +209,40 @@ FORMAT DETECTION KEYWORDS:
|
|
| 207 |
messages = [
|
| 208 |
{
|
| 209 |
"role": "system",
|
| 210 |
-
"content": """
|
| 211 |
-
|
| 212 |
-
|
| 213 |
-
|
| 214 |
-
|
| 215 |
-
|
| 216 |
-
|
| 217 |
-
|
| 218 |
-
|
| 219 |
-
|
| 220 |
-
|
| 221 |
-
|
| 222 |
-
|
| 223 |
-
|
| 224 |
-
|
| 225 |
-
|
| 226 |
-
|
| 227 |
-
|
| 228 |
-
|
| 229 |
-
|
| 230 |
-
|
| 231 |
-
|
| 232 |
-
|
| 233 |
-
|
| 234 |
-
|
| 235 |
-
|
| 236 |
-
|
| 237 |
-
|
| 238 |
-
|
| 239 |
-
|
| 240 |
-
|
| 241 |
-
|
| 242 |
-
|
| 243 |
-
|
| 244 |
-
- For horizontal: scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2
|
| 245 |
-
- For vertical: scale=1080:1920:force_original_aspect_ratio=decrease,pad=1080:1920:(ow-iw)/2:(oh-ih)/2
|
| 246 |
-
- Always add setsar=1 after padding to fix aspect ratio issues
|
| 247 |
-
- Use 3-second duration per image by default (-t 3)
|
| 248 |
-
- For 3+ images, extend the pattern: [v0][v1][v2]concat=n=3:v=1:a=0
|
| 249 |
-
|
| 250 |
-
5. DIMENSION MISMATCH FIXES:
|
| 251 |
-
- Never try to concat images with different dimensions directly
|
| 252 |
-
- Always normalize dimensions first with scale+pad
|
| 253 |
-
- Black padding is preferable to stretching/distorting images
|
| 254 |
-
|
| 255 |
-
6. SLIDESHOW TRANSITIONS:
|
| 256 |
-
- For fade transitions, add fade=t=in:st=0:d=0.5,fade=t=out:st=2.5:d=0.5 after setsar=1
|
| 257 |
-
- Keep transitions simple - complex transitions often fail
|
| 258 |
-
- Only add transitions if specifically requested
|
| 259 |
-
|
| 260 |
-
7. SLIDESHOW TIMING:
|
| 261 |
-
- Default to 3 seconds per image
|
| 262 |
-
- Adjust timing based on user request (e.g., "5 seconds per image")
|
| 263 |
-
- Total duration = (number of images × seconds per image)
|
| 264 |
-
|
| 265 |
-
Remember: Simpler is better. Only use advanced ffmpeg features if absolutely necessary for the requested output.
|
| 266 |
-
""",
|
| 267 |
},
|
| 268 |
{
|
| 269 |
"role": "user",
|
|
@@ -413,7 +392,7 @@ def compose_video(
|
|
| 413 |
files: list = None,
|
| 414 |
top_p: float = 0.7,
|
| 415 |
temperature: float = 0.1,
|
| 416 |
-
model_choice: str = "
|
| 417 |
) -> str:
|
| 418 |
"""
|
| 419 |
Compose videos from existing media assets using natural language instructions.
|
|
@@ -443,7 +422,7 @@ def update(
|
|
| 443 |
prompt,
|
| 444 |
top_p=1,
|
| 445 |
temperature=1,
|
| 446 |
-
model_choice="
|
| 447 |
):
|
| 448 |
if prompt == "":
|
| 449 |
raise gr.Error("Please enter a prompt.")
|
|
@@ -601,7 +580,7 @@ with gr.Blocks() as demo:
|
|
| 601 |
gr.Markdown(
|
| 602 |
"""
|
| 603 |
# 🏞 AI Video Composer
|
| 604 |
-
Compose new videos from your assets using natural language. Add video, image and audio assets and let [
|
| 605 |
""",
|
| 606 |
elem_id="header",
|
| 607 |
)
|
|
|
|
| 22 |
|
| 23 |
# Supported models configuration
|
| 24 |
MODELS = {
|
| 25 |
+
"zai-org/GLM-4.7": {
|
| 26 |
+
"base_url": "https://router.huggingface.co/v1",
|
| 27 |
"env_key": "HF_TOKEN",
|
| 28 |
+
"model_name": "zai-org/GLM-4.7",
|
| 29 |
},
|
| 30 |
}
|
| 31 |
|
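The `MODELS` entry above pairs a router URL with the name of the environment variable that holds the token. A minimal sketch of how such an entry might be turned into connection settings follows. The helper name `resolve_model` and the returned keys are assumptions for illustration and do not appear in the diff.

```python
import os

# Mirrors the MODELS entry added in this commit.
MODELS = {
    "zai-org/GLM-4.7": {
        "base_url": "https://router.huggingface.co/v1",
        "env_key": "HF_TOKEN",
        "model_name": "zai-org/GLM-4.7",
    },
}


def resolve_model(model_choice, environ=os.environ):
    """Hypothetical helper: look up one MODELS entry and read its token."""
    if model_choice not in MODELS:
        raise ValueError(f"Unsupported model: {model_choice}")
    config = MODELS[model_choice]
    # The token is read from the variable named by "env_key", never hard-coded.
    api_key = environ.get(config["env_key"])
    if not api_key:
        raise RuntimeError(f"Set {config['env_key']} to use {model_choice}")
    return {
        "base_url": config["base_url"],
        "api_key": api_key,
        "model": config["model_name"],
    }
```

Passing the environment as a parameter keeps the lookup easy to exercise without touching real credentials.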
|
|
|
| 163 |
files_info_string += f"| {file_info['type']} | {file_info['name']} | {dimensions} | {duration} | {audio} |\n"
|
| 164 |
|
| 165 |
# Build the user message with optional error feedback
|
| 166 |
+
user_content = f"""## AVAILABLE ASSETS
|
| 167 |
|
| 168 |
{files_info_string}
|
| 169 |
|
| 170 |
+
## TASK
|
| 171 |
+
{prompt}
|
| 172 |
+
|
| 173 |
+
## REQUIREMENTS
|
| 174 |
+
- Output format: MP4 video saved as "output.mp4"
|
| 175 |
+
- Generate a single, complete FFmpeg command
|
| 176 |
+
- Command must work with the exact filenames listed above
|
| 177 |
|
| 178 |
+
Think briefly about the approach, then output the FFmpeg command in a ```bash code block."""
|
|
|
|
| 179 |
|
| 180 |
# Add error feedback if this is a retry
|
| 181 |
if previous_error and previous_command:
|
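The new user message asks the model to reason briefly and then put the command in a bash code block, so the reply has to be parsed before anything is executed. One possible way to pull the command out is sketched below. The function name `extract_ffmpeg_command` is hypothetical, and the app's actual parsing is not shown in this diff.

```python
import re

# Built from single backticks so the fence marker never appears literally here.
FENCE = "`" * 3
BASH_BLOCK = re.compile(FENCE + r"(?:bash|sh)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_ffmpeg_command(reply):
    """Return the first line starting with 'ffmpeg ' inside a fenced block, or None."""
    for match in BASH_BLOCK.finditer(reply):
        # Join backslash-continued lines before scanning line by line.
        body = match.group(1).replace("\\\n", " ")
        for line in body.splitlines():
            line = line.strip()
            if line.startswith("ffmpeg "):
                return line
    return None
```

Returning `None` rather than raising lets the caller decide whether a reply without a usable command should trigger a retry.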
|
|
|
| 209 |
messages = [
|
| 210 |
{
|
| 211 |
"role": "system",
|
| 212 |
+
"content": """You are an expert FFmpeg engineer. Generate precise, working FFmpeg commands.
|
| 213 |
+
|
| 214 |
+
## OUTPUT FORMAT
|
| 215 |
+
1. Brief analysis (2-3 sentences max)
|
| 216 |
+
2. Single FFmpeg command in a ```bash code block
|
| 217 |
+
3. Output file must be "output.mp4"
|
| 218 |
+
|
| 219 |
+
## CORE RULES
|
| 220 |
+
- ONE command only, no chaining (no && or ;)
|
| 221 |
+
- Use exact filenames from the asset list
|
| 222 |
+
- Keep commands as simple as possible
|
| 223 |
+
- Always use: -c:v libx264 -pix_fmt yuv420p -movflags +faststart
|
| 224 |
+
|
| 225 |
+
## SLIDESHOW PATTERN (for multiple images)
|
| 226 |
+
When combining images with different dimensions:
|
| 227 |
+
```bash
|
| 228 |
+
ffmpeg -loop 1 -t 3 -i img1.jpg -loop 1 -t 3 -i img2.jpg -filter_complex "[0]scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1[v0];[1]scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,setsar=1[v1];[v0][v1]concat=n=2:v=1:a=0" -c:v libx264 -pix_fmt yuv420p output.mp4
|
| 229 |
+
```
|
| 230 |
+
- Default: 1920x1080, 3 seconds per image
|
| 231 |
+
- Vertical/portrait/TikTok: use 1080x1920
|
| 232 |
+
- Always scale+pad to normalize dimensions
|
| 233 |
+
|
| 234 |
+
## AUDIO WAVEFORM
|
| 235 |
+
For waveform visualization:
|
| 236 |
+
```bash
|
| 237 |
+
ffmpeg -i audio.mp3 -i bg.png -filter_complex "[0:a]showwaves=s=1920x200:mode=line:colors=white[wave];[1]scale=1920:1080[bg];[bg][wave]overlay=(W-w)/2:(H-h)/2" -c:v libx264 -c:a aac output.mp4
|
| 238 |
+
```
|
| 239 |
+
|
| 240 |
+
## WITH BACKGROUND MUSIC
|
| 241 |
+
Add audio to video/slideshow:
|
| 242 |
+
```bash
|
| 243 |
+
ffmpeg ... -i music.mp3 -map "[vout]" -map N:a -shortest -c:a aac output.mp4
|
| 244 |
+
```
|
| 245 |
+
Where N is the audio input index.""",
|
| 246 |
},
|
| 247 |
{
|
| 248 |
"role": "user",
|
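The slideshow pattern in the system prompt shows two images. The removed prompt described extending it to three or more inputs, and that extension can be sketched as a small builder. The function name and its parameters are assumptions for illustration; in the app the model writes this command itself.

```python
def build_slideshow_command(images, width=1920, height=1080, seconds=3):
    """Hypothetical builder: scale+pad each image, then concat all of them."""
    if not images:
        raise ValueError("At least one image is required")
    args = ["ffmpeg"]
    chains = []
    for index, image in enumerate(images):
        # Each still image becomes a looped input of fixed duration.
        args += ["-loop", "1", "-t", str(seconds), "-i", image]
        # Normalize dimensions: fit inside the frame, pad the rest, reset SAR.
        chains.append(
            f"[{index}]scale={width}:{height}:force_original_aspect_ratio=decrease,"
            f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2,setsar=1[v{index}]"
        )
    labels = "".join(f"[v{i}]" for i in range(len(images)))
    chains.append(f"{labels}concat=n={len(images)}:v=1:a=0")
    args += [
        "-filter_complex", ";".join(chains),
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-movflags", "+faststart", "output.mp4",
    ]
    return args
```

The result is an argument list rather than a shell string, so it can be passed to `subprocess.run` without shell quoting. For a vertical video, pass `width=1080, height=1920`.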
|
|
|
| 392 |
files: list = None,
|
| 393 |
top_p: float = 0.7,
|
| 394 |
temperature: float = 0.1,
|
| 395 |
+
model_choice: str = "zai-org/GLM-4.7",
|
| 396 |
) -> str:
|
| 397 |
"""
|
| 398 |
Compose videos from existing media assets using natural language instructions.
|
|
|
|
| 422 |
prompt,
|
| 423 |
top_p=1,
|
| 424 |
temperature=1,
|
| 425 |
+
model_choice="zai-org/GLM-4.7",
|
| 426 |
):
|
| 427 |
if prompt == "":
|
| 428 |
raise gr.Error("Please enter a prompt.")
|
|
|
|
| 580 |
gr.Markdown(
|
| 581 |
"""
|
| 582 |
# 🏞 AI Video Composer
|
| 583 |
+
Compose new videos from your assets using natural language. Add video, image and audio assets and let [GLM-4.7](https://huggingface.co/zai-org/GLM-4.7) generate a new video for you (using FFMPEG).
|
| 584 |
""",
|
| 585 |
elem_id="header",
|
| 586 |
)
|
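The CORE RULES in the new system prompt forbid chaining (`&&` or `;`), but a filter graph legitimately contains `;` inside quotes. A rough sketch of a check that separates the two cases follows. The function `has_shell_chaining` is hypothetical, it is stricter than the prompt in that it also flags unquoted `|` and `&`, and it is not the command validation the app itself implements.

```python
def has_shell_chaining(command):
    """Return True if ';', '|' or '&' appears outside quotes in the command."""
    quote = None  # the quote character currently open, if any
    i = 0
    while i < len(command):
        ch = command[i]
        if quote:
            if ch == "\\" and quote == '"':
                i += 2  # backslash escapes the next character in double quotes
                continue
            if ch == quote:
                quote = None
        elif ch in ("'", '"'):
            quote = ch
        elif ch == "\\":
            i += 2  # an escaped character outside quotes is literal
            continue
        elif ch in ";|&":
            return True
        i += 1
    return False
```

A command that fails this check could be rejected and fed back to the model through the existing error-feedback retry path.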