File size: 6,200 Bytes
54ed165
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
# 🎬 Local AI Video Generator

Generate AI videos **completely locally** on your computer using CogVideoX-2B model!

## 🌟 Features

- βœ… **100% Local** - No API keys, no cloud services, runs on your computer
- πŸš€ **CogVideoX-2B** - State-of-the-art text-to-video model by Tsinghua University
- πŸŽ₯ **6-second videos** - Generate 49 frames at 8 fps (720p quality)
- πŸ’» **GPU or CPU** - Works on both (GPU recommended for speed)
- 🎨 **Simple UI** - Clean web interface for easy video generation

## πŸ“‹ Requirements

### Hardware Requirements

**Minimum (CPU):**
- 16GB RAM
- 10GB free disk space
- Generation time: 5-10 minutes per video

**Recommended (GPU):**
- NVIDIA GPU with 8GB+ VRAM (RTX 3060 or better)
- 16GB RAM
- 10GB free disk space
- Generation time: 30-120 seconds per video

### Software Requirements

- Python 3.9 or higher
- CUDA 11.8+ (for GPU acceleration)

## πŸš€ Quick Start

### 1. Install Dependencies

```bash
# Install PyTorch with CUDA support (for GPU)
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

# Or install PyTorch for CPU only
pip install torch torchvision torchaudio

# Install other requirements
pip install -r requirements_local.txt
```

### 2. Run the Backend

```bash
python backend_local.py
```

The server will start on `http://localhost:5000`

**First Run Notes:**
- The model (~5GB) will be downloaded automatically
- This happens only once
- Subsequent runs will be much faster

### 3. Open the Web Interface

Open `index_local.html` in your browser:

```bash
# On macOS
open index_local.html

# On Linux
xdg-open index_local.html

# On Windows
start index_local.html
```

Or manually open: `http://localhost:5000` and navigate to the HTML file

### 4. Initialize the Model

1. Click the **"πŸš€ Initialize Model"** button in the UI
2. Wait 2-5 minutes for the model to load
3. Once loaded, you can start generating videos!

### 5. Generate Videos

1. Enter a descriptive prompt (e.g., "A cat playing with a ball of yarn")
2. Click **"🎬 Generate Video"**
3. Wait 30-120 seconds (GPU) or 5-10 minutes (CPU)
4. Download or share your video!

## πŸ“ Example Prompts

- "A golden retriever running through a field of flowers at sunset"
- "Ocean waves crashing on a beach, aerial view"
- "A bird flying through clouds, slow motion"
- "City street with cars at night, neon lights"
- "Flowers blooming in a garden, time-lapse"

## 🎯 Tips for Best Results

1. **Be Descriptive** - Include details about lighting, camera angle, movement
2. **Keep it Simple** - Focus on one main subject or action
3. **Use Cinematic Terms** - "aerial view", "close-up", "slow motion", etc.
4. **GPU Recommended** - Much faster generation (30-120s vs 5-10min)
5. **First Generation** - May take longer as model initializes

## πŸ”§ Troubleshooting

### Model Not Loading
- **Issue**: Model fails to download or load
- **Solution**: Check internet connection, ensure 10GB free disk space

### Out of Memory (GPU)
- **Issue**: CUDA out of memory error
- **Solution**: Close other GPU applications, or use CPU mode

### Slow Generation (CPU)
- **Issue**: Takes 5-10 minutes per video
- **Solution**: This is normal for CPU. Consider using a GPU for faster generation

### Server Won't Start
- **Issue**: Port 5000 already in use
- **Solution**: Change port in `backend_local.py` (line 33): `FLASK_PORT = 5001`

### Video Quality Issues
- **Issue**: Video looks blurry or low quality
- **Solution**: This is expected for the 2B model. For better quality, upgrade to CogVideoX-5B (requires more VRAM)

## πŸ“Š Performance Benchmarks

| Hardware | Model Load Time | Generation Time | Quality |
|----------|----------------|-----------------|---------|
| RTX 4090 | 1-2 min | 30-45 sec | Excellent |
| RTX 3060 | 2-3 min | 60-90 sec | Good |
| CPU (16GB) | 3-5 min | 5-10 min | Good |

## πŸ”„ Model Information

- **Model**: CogVideoX-2B
- **Developer**: Tsinghua University (THUDM)
- **License**: Apache 2.0
- **Size**: ~5GB
- **Output**: 49 frames, 720p, 8 fps (~6 seconds)

## πŸ“ File Structure

```
hailuo-clone/
β”œβ”€β”€ backend_local.py          # Local backend server
β”œβ”€β”€ index_local.html          # Web interface for local backend
β”œβ”€β”€ requirements_local.txt    # Python dependencies
β”œβ”€β”€ README_LOCAL.md          # This file
└── generated_videos/        # Output directory (auto-created)
```

## πŸ†š Comparison with Cloud Backends

| Feature | Local (backend_local.py) | Cloud (backend_enhanced.py) |
|---------|-------------------------|----------------------------|
| Setup | Complex (install PyTorch, download model) | Simple (just API keys) |
| Cost | Free (one-time setup) | Pay per generation |
| Speed | 30-120s (GPU) or 5-10min (CPU) | 30-60s |
| Privacy | 100% private | Data sent to cloud |
| Quality | Good (2B model) | Excellent (5B+ models) |
| Internet | Only for first download | Required for every generation |

## πŸ› οΈ Advanced Configuration

### Change Model

Edit `backend_local.py` line 54-56 to use a different model:

```python
# For better quality (requires 16GB+ VRAM)
pipeline = CogVideoXPipeline.from_pretrained(
    "THUDM/CogVideoX-5b",
    torch_dtype=torch.float16
)
```

### Adjust Generation Parameters

Edit `backend_local.py` lines 126-132:

```python
num_frames = 49          # More frames = longer video
guidance_scale = 6.0     # Higher = more prompt adherence
num_inference_steps = 50 # More steps = better quality (slower)
```

### Pre-load Model on Startup

Uncomment lines 232-233 in `backend_local.py`:

```python
logger.info("Pre-loading model...")
initialize_model()
```

## πŸ“š Resources

- [CogVideoX GitHub](https://github.com/THUDM/CogVideo)
- [Diffusers Documentation](https://huggingface.co/docs/diffusers)
- [PyTorch Installation](https://pytorch.org/get-started/locally/)

## 🀝 Support

If you encounter issues:

1. Check the console logs in the terminal
2. Check browser console (F12) for errors
3. Ensure all dependencies are installed correctly
4. Verify GPU drivers are up to date (for GPU mode)

## πŸ“„ License

This project uses CogVideoX-2B which is licensed under Apache 2.0.

---

**Happy Video Generation! 🎬✨**