banao-tech committed e0ab64b (verified) · 1 parent: d4cbec4

Update README.md

Files changed (1): README.md (+61 −14)

README.md CHANGED
@@ -1,14 +1,61 @@
- ---
- title: Model Testing
- emoji: 📉
- colorFrom: indigo
- colorTo: red
- sdk: gradio
- sdk_version: 6.5.1
- app_file: app.py
- pinned: false
- license: apache-2.0
- short_description: Space for testing new open source AI Avatar models
- ---
-
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
+ # LatentSync on Hugging Face Spaces (T4 GPU)
+
+ This is a working implementation of LatentSync 1.5 for lip-sync generation on Hugging Face Spaces with a T4 GPU.
+
+ ## Key Fixes Applied
+
+ 1. **Config Path Fixed**: Changed from `configs/unet.yaml` to `configs/unet/stage2.yaml`
+ 2. **Requirements Optimized**: Properly formatted with newlines between packages
+ 3. **Python Version**: Using Python 3.10.13 as specified in `runtime.txt`
+
+ ## Files Required
+
+ - `app.py` - Main application (UPDATED with correct config path)
+ - `requirements.txt` - Python dependencies (UPDATED with proper formatting)
+ - `packages.txt` - System packages (ffmpeg, git)
+ - `runtime.txt` - Should contain: `python-3.10.13`
+
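For reference, the two short files can look like this (contents taken from the list above; this is a side-by-side sketch, not literal file bytes — `requirements.txt` depends on the app and is not shown):

```text
packages.txt:       (one apt package name per line)
    ffmpeg
    git

runtime.txt:        (a single version string)
    python-3.10.13
```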
+ ## How It Works
+
+ 1. Clones the LatentSync repository at runtime
+ 2. Downloads model checkpoints from `ByteDance/LatentSync-1.5`
+ 3. Converts the input image + audio to a static video
+ 4. Runs lip-sync inference
+ 5. Returns the generated video
+
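The steps above can be sketched as command builders. This is a minimal sketch, not the actual `app.py`: the helper names, the checkpoint filename `checkpoints/latentsync_unet.pt`, and the `scripts.inference` flags are assumptions based on the upstream repo and should be checked against the cloned source.

```python
def build_image_to_video_cmd(image_path: str, audio_path: str,
                             out_path: str, fps: int = 25) -> list:
    """ffmpeg command that loops a still image for the audio's duration (step 3)."""
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image_path,  # repeat the single frame
        "-i", audio_path,
        "-vf", f"fps={fps}",
        "-shortest",                     # stop when the audio ends
        out_path,
    ]


def build_inference_cmd(video_path: str, audio_path: str, out_path: str,
                        steps: int = 20, guidance: float = 1.0,
                        seed: int = 0) -> list:
    """LatentSync inference invocation (step 4), run inside the cloned repo.

    Flag names assumed from the upstream inference script — verify locally.
    """
    return [
        "python", "-m", "scripts.inference",
        "--unet_config_path", "configs/unet/stage2.yaml",  # the corrected path
        "--inference_ckpt_path", "checkpoints/latentsync_unet.pt",
        "--video_path", video_path,
        "--audio_path", audio_path,
        "--video_out_path", out_path,
        "--inference_steps", str(steps),
        "--guidance_scale", str(guidance),
        "--seed", str(seed),
    ]
```

Each builder returns an argument list suitable for `subprocess.run(cmd, check=True)`, which avoids shell-quoting issues with user-supplied filenames.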
+ ## Model Notes
+
+ - Using **LatentSync 1.5**, which works better on a T4 GPU (16GB)
+ - Config: `configs/unet/stage2.yaml` (standard stage 2 config)
+ - Alternative: for v1.6, use `configs/unet/stage2_512.yaml` and update `HF_CKPT_REPO` to `ByteDance/LatentSync-1.6`
+
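The 1.5/1.6 switch described above can be captured in one small table. This is a hypothetical sketch: only `HF_CKPT_REPO` and the two config paths come from this README; the surrounding structure is assumed.

```python
# Repo + UNet config per supported LatentSync version (values from this README).
MODEL_CONFIGS = {
    "1.5": {
        "HF_CKPT_REPO": "ByteDance/LatentSync-1.5",
        "unet_config": "configs/unet/stage2.yaml",
    },
    "1.6": {
        "HF_CKPT_REPO": "ByteDance/LatentSync-1.6",
        "unet_config": "configs/unet/stage2_512.yaml",
    },
}


def select_model(version: str = "1.5") -> dict:
    """Return checkpoint repo and config path for the requested version."""
    return MODEL_CONFIGS[version]
```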
+ ## Inference Parameters
+
+ - **Inference Steps**: 10-40 (default 20)
+ - **Guidance Scale**: 0.8-2.0 (default 1.0)
+ - **Seed**: For reproducibility
+ - **DeepCache**: Enabled by default for faster inference
+
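A sketch of how the ranges above might be enforced before calling inference (hypothetical helper, not part of the actual app):

```python
def clamp_params(steps: int = 20, guidance: float = 1.0) -> tuple:
    """Clamp UI inputs to the documented ranges: steps 10-40, guidance 0.8-2.0."""
    steps = max(10, min(40, int(steps)))
    guidance = max(0.8, min(2.0, float(guidance)))
    return steps, guidance
```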
+ ## GPU Requirements
+
+ - T4 Small (16GB) - Works with LatentSync 1.5
+ - Inference takes ~30-60 seconds per generation
+
+ ## Common Issues
+
+ ### If you get "FileNotFoundError: configs/unet.yaml"
+ - Make sure you're using the updated `app.py` with the correct path: `configs/unet/stage2.yaml`
+
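One way to fail fast with a clearer message than the raw `FileNotFoundError` (an illustrative guard; the function name is made up):

```python
import os


def resolve_unet_config(repo_dir: str) -> str:
    """Return the stage-2 UNet config path, or raise a descriptive error."""
    good = os.path.join(repo_dir, "configs", "unet", "stage2.yaml")
    if os.path.isfile(good):
        return good
    raise FileNotFoundError(
        f"{good} not found — LatentSync keeps its UNet configs under "
        "configs/unet/, not at configs/unet.yaml"
    )
```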
+ ### If you get CUDA out of memory
+ - Reduce inference steps to 15
+ - Make sure DeepCache is enabled
+ - Use smaller input images (256x256 recommended)
+
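One way to honor the 256x256 recommendation without distorting the face is to cap the longer side. This size-only helper is an assumption about how the app might do it; feed the result into whatever resize call the app uses:

```python
def fit_within_256(width: int, height: int) -> tuple:
    """Largest (w, h) with both sides <= 256, preserving aspect ratio.

    Never upscales: images already within 256x256 are returned unchanged.
    """
    scale = min(256 / width, 256 / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```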
+ ### If output quality is poor
+ - Try increasing `guidance_scale` to 1.5-2.0
+ - Increase `inference_steps` to 30-40
+ - For v1.6, switch to the `stage2_512.yaml` config for better quality
+
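The speed/quality trade-offs in this troubleshooting section can be summarized as presets. The preset names are made up and the exact values are illustrative picks from the ranges given above:

```python
# Illustrative presets: "fast" uses the OOM advice (15 steps), "quality"
# picks values inside the 30-40 step / 1.5-2.0 guidance ranges above.
QUALITY_PRESETS = {
    "fast":    {"inference_steps": 15, "guidance_scale": 1.0},
    "default": {"inference_steps": 20, "guidance_scale": 1.0},
    "quality": {"inference_steps": 35, "guidance_scale": 1.5},
}
```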
+ ## Credits
+
+ Based on [LatentSync by ByteDance](https://github.com/bytedance/LatentSync)