| # π¬ Local AI Video Generator | |
| Generate AI videos **completely locally** on your computer using CogVideoX-2B model! | |
| ## π Features | |
| - β **100% Local** - No API keys, no cloud services, runs on your computer | |
| - π **CogVideoX-2B** - State-of-the-art text-to-video model by Tsinghua University | |
| - π₯ **6-second videos** - Generate 49 frames at 8 fps (720p quality) | |
| - π» **GPU or CPU** - Works on both (GPU recommended for speed) | |
| - π¨ **Simple UI** - Clean web interface for easy video generation | |
| ## π Requirements | |
| ### Hardware Requirements | |
| **Minimum (CPU):** | |
| - 16GB RAM | |
| - 10GB free disk space | |
| - Generation time: 5-10 minutes per video | |
| **Recommended (GPU):** | |
| - NVIDIA GPU with 8GB+ VRAM (RTX 3060 or better) | |
| - 16GB RAM | |
| - 10GB free disk space | |
| - Generation time: 30-120 seconds per video | |
| ### Software Requirements | |
| - Python 3.9 or higher | |
| - CUDA 11.8+ (for GPU acceleration) | |
| ## π Quick Start | |
| ### 1. Install Dependencies | |
| ```bash | |
| # Install PyTorch with CUDA support (for GPU) | |
| pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118 | |
| # Or install PyTorch for CPU only | |
| pip install torch torchvision torchaudio | |
| # Install other requirements | |
| pip install -r requirements_local.txt | |
| ``` | |
| ### 2. Run the Backend | |
| ```bash | |
| python backend_local.py | |
| ``` | |
| The server will start on `http://localhost:5000` | |
| **First Run Notes:** | |
| - The model (~5GB) will be downloaded automatically | |
| - This happens only once | |
| - Subsequent runs will be much faster | |
| ### 3. Open the Web Interface | |
| Open `index_local.html` in your browser: | |
| ```bash | |
| # On macOS | |
| open index_local.html | |
| # On Linux | |
| xdg-open index_local.html | |
| # On Windows | |
| start index_local.html | |
| ``` | |
| Or manually open: `http://localhost:5000` and navigate to the HTML file | |
| ### 4. Initialize the Model | |
| 1. Click the **"π Initialize Model"** button in the UI | |
| 2. Wait 2-5 minutes for the model to load | |
| 3. Once loaded, you can start generating videos! | |
| ### 5. Generate Videos | |
| 1. Enter a descriptive prompt (e.g., "A cat playing with a ball of yarn") | |
| 2. Click **"π¬ Generate Video"** | |
| 3. Wait 30-120 seconds (GPU) or 5-10 minutes (CPU) | |
| 4. Download or share your video! | |
| ## π Example Prompts | |
| - "A golden retriever running through a field of flowers at sunset" | |
| - "Ocean waves crashing on a beach, aerial view" | |
| - "A bird flying through clouds, slow motion" | |
| - "City street with cars at night, neon lights" | |
| - "Flowers blooming in a garden, time-lapse" | |
| ## π― Tips for Best Results | |
| 1. **Be Descriptive** - Include details about lighting, camera angle, movement | |
| 2. **Keep it Simple** - Focus on one main subject or action | |
| 3. **Use Cinematic Terms** - "aerial view", "close-up", "slow motion", etc. | |
| 4. **GPU Recommended** - Much faster generation (30-120s vs 5-10min) | |
| 5. **First Generation** - May take longer as model initializes | |
| ## π§ Troubleshooting | |
| ### Model Not Loading | |
| - **Issue**: Model fails to download or load | |
| - **Solution**: Check internet connection, ensure 10GB free disk space | |
| ### Out of Memory (GPU) | |
| - **Issue**: CUDA out of memory error | |
| - **Solution**: Close other GPU applications, or use CPU mode | |
| ### Slow Generation (CPU) | |
| - **Issue**: Takes 5-10 minutes per video | |
| - **Solution**: This is normal for CPU. Consider using a GPU for faster generation | |
| ### Server Won't Start | |
| - **Issue**: Port 5000 already in use | |
| - **Solution**: Change port in `backend_local.py` (line 33): `FLASK_PORT = 5001` | |
| ### Video Quality Issues | |
| - **Issue**: Video looks blurry or low quality | |
| - **Solution**: This is expected for the 2B model. For better quality, upgrade to CogVideoX-5B (requires more VRAM) | |
| ## π Performance Benchmarks | |
| | Hardware | Model Load Time | Generation Time | Quality | | |
| |----------|----------------|-----------------|---------| | |
| | RTX 4090 | 1-2 min | 30-45 sec | Excellent | | |
| | RTX 3060 | 2-3 min | 60-90 sec | Good | | |
| | CPU (16GB) | 3-5 min | 5-10 min | Good | | |
| ## π Model Information | |
| - **Model**: CogVideoX-2B | |
| - **Developer**: Tsinghua University (THUDM) | |
| - **License**: Apache 2.0 | |
| - **Size**: ~5GB | |
| - **Output**: 49 frames, 720p, 8 fps (~6 seconds) | |
| ## π File Structure | |
| ``` | |
| hailuo-clone/ | |
| βββ backend_local.py # Local backend server | |
| βββ index_local.html # Web interface for local backend | |
| βββ requirements_local.txt # Python dependencies | |
| βββ README_LOCAL.md # This file | |
| βββ generated_videos/ # Output directory (auto-created) | |
| ``` | |
| ## π Comparison with Cloud Backends | |
| | Feature | Local (backend_local.py) | Cloud (backend_enhanced.py) | | |
| |---------|-------------------------|----------------------------| | |
| | Setup | Complex (install PyTorch, download model) | Simple (just API keys) | | |
| | Cost | Free (one-time setup) | Pay per generation | | |
| | Speed | 30-120s (GPU) or 5-10min (CPU) | 30-60s | | |
| | Privacy | 100% private | Data sent to cloud | | |
| | Quality | Good (2B model) | Excellent (5B+ models) | | |
| | Internet | Only for first download | Required for every generation | | |
| ## π οΈ Advanced Configuration | |
| ### Change Model | |
| Edit `backend_local.py` line 54-56 to use a different model: | |
| ```python | |
| # For better quality (requires 16GB+ VRAM) | |
| pipeline = CogVideoXPipeline.from_pretrained( | |
| "THUDM/CogVideoX-5b", | |
| torch_dtype=torch.float16 | |
| ) | |
| ``` | |
| ### Adjust Generation Parameters | |
| Edit `backend_local.py` lines 126-132: | |
| ```python | |
| num_frames = 49 # More frames = longer video | |
| guidance_scale = 6.0 # Higher = more prompt adherence | |
| num_inference_steps = 50 # More steps = better quality (slower) | |
| ``` | |
| ### Pre-load Model on Startup | |
| Uncomment lines 232-233 in `backend_local.py`: | |
| ```python | |
| logger.info("Pre-loading model...") | |
| initialize_model() | |
| ``` | |
| ## π Resources | |
| - [CogVideoX GitHub](https://github.com/THUDM/CogVideo) | |
| - [Diffusers Documentation](https://huggingface.co/docs/diffusers) | |
| - [PyTorch Installation](https://pytorch.org/get-started/locally/) | |
| ## π€ Support | |
| If you encounter issues: | |
| 1. Check the console logs in the terminal | |
| 2. Check browser console (F12) for errors | |
| 3. Ensure all dependencies are installed correctly | |
| 4. Verify GPU drivers are up to date (for GPU mode) | |
| ## π License | |
| This project uses CogVideoX-2B which is licensed under Apache 2.0. | |
| --- | |
| **Happy Video Generation! π¬β¨** | |