Upload folder using huggingface_hub

- AGENTS.md +38 -0
- CLAUDE.md +175 -0
- README.md +4 -2
- app.py +6 -3
- requirements.txt +17 -8
AGENTS.md ADDED
@@ -0,0 +1,38 @@
# Repository Guidelines

## Project Structure & Module Organization
- `app.py`: Gradio entrypoint orchestrating model initialization, audio preprocessing, and inference flow.
- `src/`: Supporting modules (`audio_analysis/` for wav2vec2 utilities, `vram_management/` for GPU-safe layers, `utils.py` helpers).
- `utils/`: Infrastructure helpers (`model_loader.py` for WAN/InfiniteTalk weights, `gpu_manager.py` for memory checks/cleanup).
- `wan/`: Upstream InfiniteTalk model code; treat as vendor code when updating.
- `assets/` and `examples/`: UI assets and sample media for quick demos; safe to extend.
- `requirements.txt`, `packages.txt`, `Dockerfile`: Deployment dependencies (note: PyTorch + flash-attn are installed via the Dockerfile/HF build, not from requirements).

## Setup, Build, and Local Run
- Create an isolated env: `python -m venv .venv && source .venv/bin/activate`.
- Install Python deps: `pip install -r requirements.txt` (PyTorch/flash-attn come from the base image or the HuggingFace Space build).
- Launch the UI locally: `python app.py` (Gradio on port 7860 by default).
- Quick sanity check: `python -m py_compile app.py` to catch syntax errors before pushing.
- Docker-based run (mirrors the HF build): `docker build -t infinitetalk . && docker run -p 7860:7860 infinitetalk`.

## Coding Style & Naming Conventions
- Python 3.10+, PEP 8 with 4-space indentation; favor type hints where practical.
- Functions/variables: `snake_case`; classes: `PascalCase`; constants: `UPPER_SNAKE_CASE`.
- Prefer `logging` over `print` (consistent with existing modules); keep the log level at INFO for user-facing runs.
- Add concise docstrings for public functions; keep module-level comments minimal and purposeful.

## Testing Guidelines
- No automated test suite yet; aim to add `pytest`-style tests under `tests/` mirroring `src/` modules.
- Until then, validate with: (1) `python -m py_compile` for syntax, (2) a short inference smoke test using `examples/` media at 480p/30–40 steps.
- When adding tests, name files `test_<module>.py` and target functional paths (audio preprocessing, GPU guardrails, model loader paths); a sketch follows below.
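
A minimal sketch of such a test, assuming a hypothetical `tests/test_audio_preprocessing.py` (the file name and invariant are illustrative; only the 16 kHz requirement comes from this repo's docs):

```python
# tests/test_audio_preprocessing.py (hypothetical name, mirroring src/audio_analysis/)
import numpy as np
import librosa


def test_resample_to_16khz_preserves_duration():
    sr_in, seconds = 44100, 2.0
    wav = np.random.default_rng(0).standard_normal(int(sr_in * seconds)).astype(np.float32)
    out = librosa.resample(wav, orig_sr=sr_in, target_sr=16000)
    # Duration should survive resampling to the pipeline's 16 kHz rate.
    assert abs(len(out) / 16000 - seconds) < 0.01
```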

## Commit & Pull Request Guidelines
- Repository has no historical git log; use Conventional Commits (`feat:`, `fix:`, `docs:`, `chore:`) for clarity.
- One topic per commit; keep messages imperative and ≤72 chars in the subject.
- PRs should include: a brief summary of the behavior change, commands run (tests or smoke steps), any new dependencies, and before/after screenshots or sample outputs if UI/inference is affected.
- Avoid committing large model weights or cached downloads; rely on `ModelManager` to fetch at runtime and `.gitignore` the caches.

## Security & Configuration Tips
- For private models, set `HF_TOKEN` in the environment/Space secrets; do not hardcode secrets (see the sketch below).
- Respect the GPU limits in `gpu_manager.py` when adjusting defaults; keep ZeroGPU duration estimates in mind.
- Large files: keep under repo size limits; store extra assets in external storage or release artifacts.
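
A minimal sketch of the token-handling pattern this implies, reading the token from the environment rather than from source (the repo ID is one named in CLAUDE.md; the variable names are illustrative):

```python
import os
from huggingface_hub import snapshot_download

# Token comes from Space secrets / the environment, never from source code.
hf_token = os.environ.get("HF_TOKEN")  # None is fine for public models

local_dir = snapshot_download(
    repo_id="MeiGen-AI/InfiniteTalk",
    token=hf_token,
)
```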
CLAUDE.md ADDED
@@ -0,0 +1,175 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

InfiniteTalk is a talking video generator that creates realistic talking-head videos with accurate lip-sync. It supports two modes:
- **Image-to-Video**: Transform static portraits into talking videos using audio input
- **Video Dubbing**: Re-sync existing videos with new audio while maintaining natural movements

Built on the Wan2.1 diffusion model with specialized audio conditioning for photorealistic results.

## Architecture

### Core Components

**Main Application** (`app.py`)
- Gradio interface with ZeroGPU support via the `@spaces.GPU(duration=180)` decorator
- Two-tab interface: Image-to-Video and Video Dubbing
- Lazy model loading on first inference to minimize startup time
- Global `ModelManager` and `GPUManager` instances for resource management

**Model Pipeline** (`wan/multitalk.py`)
- `InfiniteTalkPipeline`: Main generation pipeline using the Wan2.1-I2V-14B model
- Supports two resolutions: 480p (640x640) and 720p (960x960)
- Uses diffusion-based generation with audio conditioning
- Implements chunked processing for long videos to manage memory

**Audio Processing** (`src/audio_analysis/wav2vec2.py`)
- Custom `Wav2Vec2Model` extending HuggingFace's implementation
- Extracts audio embeddings with temporal interpolation via `linear_interpolation`
- Processes audio at 16kHz with loudness normalization (pyloudnorm)
- Stacks hidden states from all encoder layers for a rich audio representation
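
A minimal sketch of that load-and-normalize step, assuming standard `librosa`/`pyloudnorm` usage (the -23 LUFS target is an illustrative assumption, not a value confirmed by this repo):

```python
import librosa
import pyloudnorm as pyln

wav, sr = librosa.load("speech.wav", sr=16000)  # Wav2Vec2 expects 16 kHz

meter = pyln.Meter(sr)                     # BS.1770 loudness meter
loudness = meter.integrated_loudness(wav)  # measured loudness in LUFS
wav = pyln.normalize.loudness(wav, loudness, -23.0)  # assumed target level
```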

**Model Management** (`utils/model_loader.py`)
- `ModelManager`: Handles lazy loading and caching of models from the HuggingFace Hub
- Downloads three model types:
  - Wan2.1-I2V-14B: Main video generation model (Kijai/WanVideo_comfy)
  - InfiniteTalk weights: Specialized talking-head weights (MeiGen-AI/InfiniteTalk)
  - Wav2Vec2: Audio encoder (TencentGameMate/chinese-wav2vec2-base)
- Models are cached in `HF_HOME` or `/data/.huggingface`

**GPU Management** (`utils/gpu_manager.py`)
- `GPUManager`: Monitors memory usage and performs cleanup
- Calculates ZeroGPU duration based on video length and resolution
- Memory estimation: ~20GB base + 0.8GB/s (480p) or 1.5GB/s (720p)
- Recommends chunking for videos requiring >50GB of memory (sketched below)
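
That heuristic, written out as a small function (a sketch of the documented rule; the actual `GPUManager` method names may differ):

```python
def estimate_memory_gb(video_seconds: float, resolution: str) -> float:
    """~20 GB base plus 0.8 GB/s at 480p or 1.5 GB/s at 720p."""
    per_second = 0.8 if resolution == "480p" else 1.5
    return 20.0 + video_seconds * per_second


def should_chunk(video_seconds: float, resolution: str) -> bool:
    """Chunked processing is recommended above ~50 GB."""
    return estimate_memory_gb(video_seconds, resolution) > 50.0
```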

**Configuration** (`wan/configs/__init__.py`)
- `WAN_CONFIGS`: Model configurations for different tasks (t2v, i2v, infinitetalk)
- `SIZE_CONFIGS`: Resolution mappings (infinitetalk-480: 640x640, infinitetalk-720: 960x960)
- `SUPPORTED_SIZES`: Valid resolution options per model type

### Data Flow

1. **Audio Processing**: Audio file → librosa load → loudness normalization → Wav2Vec2 feature extraction → audio embeddings (shape: [seq_len, batch, dim])
2. **Input Processing**: Image/video → PIL/cache_video → frame extraction → resize and center crop to target resolution
3. **Generation**: InfiniteTalk pipeline combines visual input + audio embeddings → diffusion sampling → video tensor
4. **Output**: Video tensor → save_video_ffmpeg with audio track → MP4 file
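
The resize-and-center-crop in step 2 can be expressed with a one-liner from Pillow; a minimal sketch (the helper name is illustrative, not the repo's actual function):

```python
from PIL import Image, ImageOps

def fit_frame(img: Image.Image, size: tuple[int, int]) -> Image.Image:
    """Resize preserving aspect ratio, then center-crop to the target size."""
    return ImageOps.fit(img, size, method=Image.Resampling.LANCZOS, centering=(0.5, 0.5))

frame = fit_frame(Image.open("portrait.png"), (640, 640))  # 480p target
```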

### Key Design Patterns

- **Lazy Loading**: Models are only loaded on first inference to reduce cold-start time
- **Memory Management**: Aggressive cleanup with `torch.cuda.empty_cache()` and `gc.collect()` after generation (sketched below)
- **ZeroGPU Integration**: `@spaces.GPU` decorator with a duration calculated from video length
- **Offloading**: Models can be offloaded to CPU between forward passes to save VRAM
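
A minimal sketch of that cleanup pattern (standard PyTorch calls; the wrapper function is illustrative):

```python
import gc
import torch

def cleanup_after_generation() -> None:
    """Collect Python garbage, then release cached CUDA blocks."""
    gc.collect()
    if torch.cuda.is_available():
        torch.cuda.empty_cache()
```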

## Development Commands

### Docker Build and Run
```bash
# Build Docker image
docker build -t infinitetalk .

# Run locally
docker run -p 7860:7860 --gpus all infinitetalk
```

### Python Environment
```bash
# Install dependencies (requires PyTorch 2.5.1+ for xfuser compatibility)
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu121
pip install flash-attn==2.7.4.post1 --no-build-isolation  # Optional, may fail on some systems
pip install -r requirements.txt

# Run application
python app.py
```

### System Dependencies
Required packages (see `packages.txt`):
- ffmpeg (video processing)
- build-essential (compilation)
- libsndfile1 (audio I/O)
- git (model downloads)

## Important Implementation Details

### Resolution Handling
- User selects "480p" or "720p" in the UI
- Internally mapped to `infinitetalk-480` (640x640) or `infinitetalk-720` (960x960)
- `sample_shift` parameter: 7 for 480p, 11 for 720p (controls diffusion sampling); see the mapping sketch below
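
The UI-to-config mapping above, collected in one place (a sketch; the real values live in `SIZE_CONFIGS` in `wan/configs/__init__.py`, and this dict name is an assumption):

```python
RESOLUTION_MAP = {
    "480p": {"size_key": "infinitetalk-480", "pixels": (640, 640), "sample_shift": 7},
    "720p": {"size_key": "infinitetalk-720", "pixels": (960, 960), "sample_shift": 11},
}

cfg = RESOLUTION_MAP["480p"]  # e.g. the user picked 480p in the UI
```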

### Audio Embedding Format
Audio embeddings must be saved as `.pt` files in the format expected by the pipeline:
```python
import torch
from einops import rearrange

# embeddings: Wav2Vec2 output obtained with output_hidden_states=True
audio_embeddings = torch.stack(embeddings.hidden_states[1:], dim=1).squeeze(0)
audio_embeddings = rearrange(audio_embeddings, "b s d -> s b d")  # Shape: [seq_len, batch, dim]
torch.save(audio_embeddings, emb_path)
```

### Pipeline Input Format
The `generate_infinitetalk` method expects:
```python
input_clip = {
    "prompt": "",                               # Empty for talking head
    "cond_video": image_or_video_path,
    "cond_audio": {"person1": embedding_path},
    "video_audio": audio_wav_path,
}
```

### ZeroGPU Duration Calculation
```python
base_time = 60  # Model loading
processing_rate = 2.5 if resolution == "480p" else 3.5  # Compute seconds per second of video
duration = int((base_time + video_duration * processing_rate) * 1.2)  # 20% safety margin
duration = min(duration, 300)  # Cap at 300s for the free tier
```
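
Worked example (an illustration of the formula above): a 10-second clip at 480p gives `int((60 + 10 * 2.5) * 1.2) = 102` seconds, comfortably under the 300s cap.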

### Memory Optimization
- Use `offload_model=True` in the pipeline to offload between forward passes
- Enable VRAM management for low-memory scenarios: `pipeline.enable_vram_management()`
- Flash-attention (if available) reduces memory usage significantly
- Chunked processing for videos >15s (480p) or >10s (720p)

## HuggingFace Space Deployment

This project is designed for HuggingFace Spaces with ZeroGPU:
- SDK: `docker` (specified in the README.md frontmatter)
- Hardware: `zero-gpu` (H200 with 70GB VRAM)
- Port: `7860` (Gradio default)
- First generation downloads ~15GB of models (2-3 minutes)
- Subsequent generations: ~40s for a 10s video at 480p

See `DEPLOYMENT.md` for detailed deployment instructions and troubleshooting.

## Common Pitfalls

1. **Flash-attn compilation**: May fail on some systems; the Dockerfile handles this gracefully with an `|| echo "Warning..."` fallback
2. **PyTorch version**: Must be 2.5.1+ for xfuser's `torch.distributed.tensor.experimental` support
3. **Audio sample rate**: Must be 16kHz for the Wav2Vec2 model
4. **Frame format**: The pipeline expects 4n+1 frames (e.g., 81 frames) for proper temporal modeling; see the sketch below
5. **Model paths**: InfiniteTalk weights must be loaded separately from the base Wan model
6. **TOKENIZERS_PARALLELISM**: Set to `'false'` to avoid deadlocks in multi-threaded environments
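
A minimal sketch of snapping a requested frame count to the 4n+1 constraint from pitfall 4 (the helper name is an assumption, not the repo's actual API):

```python
def snap_to_4n_plus_1(frames: int) -> int:
    """Round down to the nearest frame count of the form 4n + 1."""
    return max(1, ((frames - 1) // 4) * 4 + 1)

assert snap_to_4n_plus_1(81) == 81  # already valid (n = 20)
assert snap_to_4n_plus_1(84) == 81  # snapped down to the constraint
```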

## File Structure

```
├── app.py                    # Main Gradio application
├── Dockerfile                # Docker build configuration
├── requirements.txt          # Python dependencies
├── packages.txt              # System dependencies
├── utils/
│   ├── model_loader.py       # Model download and loading
│   └── gpu_manager.py        # GPU memory management
├── wan/
│   ├── multitalk.py          # InfiniteTalk pipeline
│   ├── configs/              # Model configurations
│   ├── modules/              # Model architecture (VAE, DiT, etc.)
│   └── utils/                # Video/audio utilities
└── src/
    └── audio_analysis/
        └── wav2vec2.py       # Audio encoder with interpolation
```
README.md CHANGED
@@ -3,8 +3,10 @@ title: InfiniteTalk - Talking Video Generator
 emoji: 🎬
 colorFrom: blue
 colorTo: purple
-sdk:
-
+sdk: gradio
+sdk_version: 5.6.0
+python_version: "3.10"
+app_file: app.py
 pinned: false
 license: apache-2.0
 hardware: zero-gpu
app.py CHANGED
@@ -5,14 +5,17 @@ Gradio Space with ZeroGPU support
 
 import os
 import sys
+
+# CRITICAL: Set environment variables BEFORE any torch/torchvision imports
+# This prevents torchvision from registering CUDA ops that don't exist on ZeroGPU at import time
+os.environ["TORCHVISION_DISABLE_META_REGISTRATIONS"] = "1"
+os.environ["TORCH_LOGS"] = "-all"  # Reduce torch logging noise
+
 import random
 import logging
 import warnings
 from pathlib import Path
 
-# Prevent torchvision from registering optional CUDA/Meta ops (nms) that may be missing on ZeroGPU
-os.environ.setdefault("TORCHVISION_DISABLE_META_REGISTRATIONS", "1")
-
 import gradio as gr
 import torch
 import numpy as np
requirements.txt CHANGED
@@ -1,19 +1,26 @@
-#
-
+# PyTorch - must be installed first (HuggingFace Spaces handles CUDA)
+--extra-index-url https://download.pytorch.org/whl/cu121
+torch==2.5.1
+torchvision==0.20.1
+torchaudio==2.5.1
 
-#
-
+# Flash attention (optional - may fail on some systems)
+# flash-attn
+
+# Core ML libraries
+xformers==0.0.28.post3
 transformers>=4.49.0
 tokenizers>=0.20.3
 diffusers>=0.31.0
 accelerate>=1.1.1
 einops
+safetensors
 
+# Gradio and Spaces
 gradio>=5.0.0
 spaces
 
+# Video/Image processing
 opencv-python-headless>=4.9.0.80
 moviepy==1.0.3
 imageio
@@ -21,13 +28,14 @@ imageio-ffmpeg
 scikit-image
 decord
 scenedetect
+pillow
 
-#
+# Audio processing
 librosa
 soundfile
 pyloudnorm
 
-#
+# Utilities
 tqdm
 numpy>=1.23.5,<2
 easydict
@@ -35,3 +43,4 @@ ftfy
 loguru
 optimum-quanto==0.2.6
 xfuser>=0.4.1
+huggingface_hub