banao-tech committed on
Commit 7e2b460 · verified · 1 Parent(s): e0ab64b

Update README.md

Files changed (1):
  1. README.md +10 -61
README.md CHANGED
@@ -1,61 +1,10 @@
- # LatentSync on Hugging Face Spaces (T4 GPU)
-
- This is a working implementation of LatentSync 1.5 for lip-sync generation on Hugging Face Spaces with T4 GPU.
-
- ## Key Fixes Applied
-
- 1. **Config Path Fixed**: Changed from `configs/unet.yaml` to `configs/unet/stage2.yaml`
- 2. **Requirements Optimized**: Properly formatted with newlines between packages
- 3. **Python Version**: Using Python 3.10.13 as specified in `runtime.txt`
-
- ## Files Required
-
- - `app.py` - Main application (UPDATED with correct config path)
- - `requirements.txt` - Python dependencies (UPDATED with proper formatting)
- - `packages.txt` - System packages (ffmpeg, git)
- - `runtime.txt` - Should contain: `python-3.10.13`
-
- ## How It Works
-
- 1. Clones LatentSync repository at runtime
- 2. Downloads model checkpoints from `ByteDance/LatentSync-1.5`
- 3. Converts input image + audio to a static video
- 4. Runs lip-sync inference
- 5. Returns the generated video
-
- ## Model Notes
-
- - Using **LatentSync 1.5** which works better on T4 GPU (16GB)
- - Config: `configs/unet/stage2.yaml` (standard stage 2 config)
- - Alternative: For v1.6, use `configs/unet/stage2_512.yaml` and update `HF_CKPT_REPO` to `ByteDance/LatentSync-1.6`
-
- ## Inference Parameters
-
- - **Inference Steps**: 10-40 (default 20)
- - **Guidance Scale**: 0.8-2.0 (default 1.0)
- - **Seed**: For reproducibility
- - **DeepCache**: Enabled by default for faster inference
-
- ## GPU Requirements
-
- - T4 Small (16GB) - Works with LatentSync 1.5
- - Inference takes ~30-60 seconds per generation
-
- ## Common Issues
-
- ### If you get "FileNotFoundError: configs/unet.yaml"
- - Make sure you're using the updated `app.py` with the correct path: `configs/unet/stage2.yaml`
-
- ### If you get CUDA out of memory
- - Reduce inference steps to 15
- - Make sure DeepCache is enabled
- - Use smaller input images (256x256 recommended)
-
- ### If output quality is poor
- - Try increasing guidance_scale to 1.5-2.0
- - Increase inference_steps to 30-40
- - For v1.6, switch to `stage2_512.yaml` config for better quality
-
- ## Credits
-
- Based on [LatentSync by ByteDance](https://github.com/bytedance/LatentSync)
+ title: Model Testing
+ emoji: 📉
+ colorFrom: indigo
+ colorTo: red
+ sdk: gradio
+ sdk_version: 6.5.1
+ app_file: app.py
+ pinned: false
+ license: apache-2.0
+ short_description: Space for testing new open source AI Avatar models
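
The removed README describes startup steps 1-2 (clone the LatentSync repository at runtime, then download checkpoints from `ByteDance/LatentSync-1.5`). A minimal sketch of how a Space could assemble those commands — the helper name `startup_commands` and the exact argument layout are illustrative assumptions, not taken from the actual `app.py`:

```python
# Sketch of the startup steps from the removed README: clone the LatentSync
# repo, then download the ByteDance/LatentSync-1.5 checkpoints.
# This helper only assembles the commands (it does not run them); the
# function name and structure are illustrative, not from the real app.py.
def startup_commands(ckpt_repo: str = "ByteDance/LatentSync-1.5") -> list[list[str]]:
    return [
        # Step 1: fetch the inference code at runtime
        ["git", "clone", "https://github.com/bytedance/LatentSync"],
        # Step 2: fetch model weights into a local checkpoints/ directory
        ["huggingface-cli", "download", ckpt_repo, "--local-dir", "checkpoints"],
    ]

for cmd in startup_commands():
    print(" ".join(cmd))
```

Swapping `ckpt_repo` to `ByteDance/LatentSync-1.6` would match the v1.6 alternative mentioned in the removed notes (together with the `stage2_512.yaml` config).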
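
Step 3 of the removed README (convert the input image + audio to a static video) is the kind of task ffmpeg handles directly, which is presumably why `packages.txt` lists it. A hedged sketch of the command such a step might build — the helper name and exact flag choices are assumptions, not from the actual `app.py`:

```python
# Build an ffmpeg command that loops one still image and muxes it with an
# audio track, yielding the static input video from the removed README's
# step 3. Helper name and flag choices are illustrative, not from app.py.
def image_audio_to_video_cmd(image_path: str, audio_path: str, out_path: str) -> list[str]:
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image_path,   # repeat the still image as frames
        "-i", audio_path,                 # the driving audio track
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac",
        "-pix_fmt", "yuv420p",            # broad player compatibility
        "-shortest",                      # stop when the audio ends
        out_path,
    ]

print(" ".join(image_audio_to_video_cmd("face.png", "speech.wav", "input.mp4")))
```

The resulting `input.mp4` would then be fed to the lip-sync inference step (step 4).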