.ipynb_checkpoints/README-checkpoint.md DELETED
@@ -1,201 +0,0 @@
1
- # 3D Chibi Text-to-Image (14B) Generation
2
-
3
- This repository contains the steps and scripts needed to generate **3D chibi-style images** with the **Wan2.1-T2V-14B** base model, run in its text-to-image task mode (`t2i-14B`), together with LoRA (Low-Rank Adaptation) weights. Given a text prompt, the model produces 3D chibi-style illustrations that emphasize vibrant colors, expressive characters, and dynamic scenes.
4
-
5
- > 🚀 This version is optimized for **text-to-image (t2i)** generation to allow faster testing while maintaining compatibility with future **text-to-video (t2v)** workflows.
6
-
7
- ---
8
-
9
- ## Prerequisites
10
-
11
- Before proceeding, ensure that you have the following installed on your system:
12
-
13
- - **Ubuntu** (or a compatible Linux distribution)
14
- - **Python 3.x**
15
- - **pip** (Python package manager)
16
- - **Git**
17
- - **Git LFS** (Git Large File Storage)
18
-
19
- ---
20
-
21
- ## Installation
22
-
23
- 1. **Update and Install Dependencies**
24
-
25
- ```bash
26
- sudo apt-get update && sudo apt-get install build-essential git-lfs
27
- ```
28
-
29
- 2. **Clone the Repository**
30
-
31
- > ⚠️ Note: You can use any existing Wan2.1-compatible repo structure or clone directly from Hugging Face.
32
-
33
- ```bash
34
- git clone https://huggingface.co/svjack/3D_Chibi_wan_2_1_14_B_text2video_lora
35
- cd 3D_Chibi_wan_2_1_14_B_text2video_lora
36
- ```
37
-
38
- 3. **Install Python Dependencies**
39
-
40
- ```bash
41
- pip install torch torchvision
42
- pip install -r requirements.txt
43
- pip install ascii-magic matplotlib tensorboard huggingface_hub datasets
44
- pip install sageattention==1.0.6
45
- ```
46
-
47
- 4. **Download Model Weights**
48
-
49
- > 📌 **Note**: You can view previous results in the respective repositories:
50
- - [Xiang_Handsome LoRA](https://huggingface.co/svjack/Xiang_Handsome_wan_2_1_14_B_text2video_lora)
51
- - [Taiga_Aisaka LoRA](https://huggingface.co/svjack/Taiga_Aisaka_wan_2_1_14_B_text2video_lora)
52
- - [Sebastian_Michaelis LoRA](https://huggingface.co/svjack/Sebastian_Michaelis_wan_2_1_14_B_text2video_lora)
53
- - [3D_Chibi LoRA 14B](https://huggingface.co/svjack/3D_Chibi_wan_2_1_14_B_text2video_lora)
54
-
55
- ```bash
56
- # Base Models
57
- wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/wan2.1_t2v_14B_bf16.safetensors
58
- wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/Wan2.1_VAE.pth
59
- wget https://huggingface.co/Wan-AI/Wan2.1-T2V-14B/resolve/main/models_t5_umt5-xxl-enc-bf16.pth
60
-
61
- # LoRA Weights
62
- wget https://huggingface.co/svjack/Xiang_Handsome_wan_2_1_14_B_text2video_lora/resolve/main/Xiang_Handsome_outputs/Xiang_Handsome_w14_lora-000067.safetensors
63
- wget https://huggingface.co/svjack/Taiga_Aisaka_wan_2_1_14_B_text2video_lora/resolve/main/Taiga_Aisaka_w14_outputs/Taiga_Aisaka_w14_lora-000010.safetensors
64
- wget https://huggingface.co/svjack/Sebastian_Michaelis_wan_2_1_14_B_text2video_lora/resolve/main/Sebastian_Michaelis_w14_outputs/Sebastian_Michaelis_w14_lora-000007.safetensors
65
- wget https://huggingface.co/svjack/3D_Chibi_wan_2_1_14_B_text2video_lora/resolve/main/3D_Chibi_w14_outputs/3D_Chibi_w14_lora-000024.safetensors
66
- ```
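As an alternative to `wget`, the same files can be fetched with `huggingface_hub`, which the installation step already installs. The sketch below is only an illustration: the helper names `split_resolve_url` and `download` are made up here, and it assumes every URL follows the `<owner>/<repo>/resolve/<revision>/<path>` layout seen in the commands above.

```python
from urllib.parse import urlparse

# URL copied from the wget commands above.
LORA_URL = (
    "https://huggingface.co/svjack/3D_Chibi_wan_2_1_14_B_text2video_lora/"
    "resolve/main/3D_Chibi_w14_outputs/3D_Chibi_w14_lora-000024.safetensors"
)


def split_resolve_url(url):
    """Split a Hugging Face 'resolve' URL into (repo_id, revision, filename)."""
    parts = urlparse(url).path.strip("/").split("/")
    # Assumed layout: <owner>/<repo>/resolve/<revision>/<path inside repo...>
    if len(parts) < 5 or parts[2] != "resolve":
        raise ValueError(f"not a resolve URL: {url}")
    return "/".join(parts[:2]), parts[3], "/".join(parts[4:])


def download(url, local_dir="."):
    """Fetch one file via huggingface_hub (imported lazily; needs network)."""
    from huggingface_hub import hf_hub_download

    repo_id, revision, filename = split_resolve_url(url)
    return hf_hub_download(
        repo_id=repo_id, filename=filename, revision=revision, local_dir=local_dir
    )


if __name__ == "__main__":
    # Only parses the URL; call download(LORA_URL) to actually fetch the file.
    print(split_resolve_url(LORA_URL))
```

Unlike `wget`, `hf_hub_download` with `local_dir` keeps the repository's sub-directory (here `3D_Chibi_w14_outputs/`) under the target directory, so the paths passed to `--lora_weight` need to match wherever the files end up.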
67
-
68
- ---
69
-
70
- ## Usage
71
-
72
- To generate an image, use the `wan_generate_video.py` script with the `--task t2i-14B` parameter.
73
-
74
- ### Example 1: Xiang InfiniteYou Handsome Style
75
-
76
- ```bash
77
- python wan_generate_video.py --fp8 --task t2i-14B --video_size 480 832 --infer_steps 20 \
78
- --save_path save --output_type both \
79
- --dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
80
- --t5 models_t5_umt5-xxl-enc-bf16.pth \
81
- --attn_mode torch \
82
- --lora_weight Xiang_Handsome_w14_lora-000067.safetensors 3D_Chibi_w14_lora-000024.safetensors \
83
- --lora_multiplier 1.0 \
84
- --interactive
85
- ```
86
-
87
- #### Prompt
88
-
89
- ```text
90
- "3D Chibi Style ,In the style of Xiang InfiniteYou Handsome, Xiang, a young person with short, black hair and glasses, stands in a quiet office space. The soft glow of a desk lamp casts a warm light across his thoughtful expression, while the hum of distant keyboards and the faint scent of coffee linger in the air. Outside the window, the city lights twinkle like distant stars, blending with the muted glow of computer screens as the workday stretches on around him."
91
- ```
92
-
93
- -- without 3D_Chibi lora text2video output
94
-
95
- -- with 3D_Chibi lora text2image output
96
-
97
- ---
98
-
99
- ### Example 2: Taiga Aisaka Style
100
-
101
- ```bash
102
- python wan_generate_video.py --fp8 --task t2i-14B --video_size 480 832 --infer_steps 20 \
103
- --save_path save --output_type both \
104
- --dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
105
- --t5 models_t5_umt5-xxl-enc-bf16.pth \
106
- --attn_mode torch \
107
- --lora_weight Taiga_Aisaka_w14_lora-000010.safetensors 3D_Chibi_w14_lora-000024.safetensors \
108
- --lora_multiplier 1.0 \
109
- --interactive
110
- ```
111
-
112
- #### Prompt
113
-
114
- ```text
115
- "3D Chibi Style, 一个身穿红色高中校服的金发女孩,正在吃汉堡。"
116
- ```
117
-
118
- -- without 3D_Chibi lora text2video output
119
-
120
- -- with 3D_Chibi lora text2image output
121
-
122
- ---
123
-
124
- ### Example 3: Sebastian Michaelis (Black Butler) Style
125
-
126
- ```bash
127
- python wan_generate_video.py --fp8 --task t2i-14B --video_size 480 832 --infer_steps 20 \
128
- --save_path save --output_type both \
129
- --dit wan2.1_t2v_14B_bf16.safetensors --vae Wan2.1_VAE.pth \
130
- --t5 models_t5_umt5-xxl-enc-bf16.pth \
131
- --attn_mode torch \
132
- --lora_weight Sebastian_Michaelis_w14_lora-000007.safetensors 3D_Chibi_w14_lora-000024.safetensors \
133
- --lora_multiplier 1.0 \
134
- --interactive
135
- ```
136
-
137
- #### Prompt
138
-
139
- ```text
140
- "3D Chibi Style, In the style of Black Butler , The video opens with a close-up of a character dressed in a black suit, white shirt, and black tie. stands in a quiet office space. The soft glow of a desk lamp casts a warm light across his thoughtful expression, while the hum of distant keyboards and the faint scent of coffee linger in the air. Outside the window, the city lights twinkle like distant stars, blending with the muted glow of computer screens as the workday stretches on around him."
141
- ```
142
-
143
- -- without 3D_Chibi lora text2video output
144
-
145
- -- with 3D_Chibi lora text2image output
146
-
147
- ---
148
-
149
- ## Key Parameters
150
-
151
- | Parameter | Description |
152
- |----------|-------------|
153
- | `--fp8` | Enable FP8 precision to lower VRAM usage |
154
- | `--task` | Set to `t2i-14B` for image generation |
155
- | `--video_size` | Output resolution (e.g., `480 832`) |
156
- | `--infer_steps` | Number of inference steps; fewer steps run faster at lower quality (`20` is recommended for a quick test) |
157
- | `--lora_weight` | Path to LoRA weight files (can specify multiple) |
158
- | `--lora_multiplier` | Strength of LoRA effect (default: 1.0) |
159
- | `--prompt` | Include `"3D Chibi Style"` for best results |
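The flags in the table can also be assembled programmatically, which helps when switching between LoRA combinations. This is a minimal sketch, not part of the repository: `build_command` is a hypothetical helper, and it only reuses flags and file names that appear in the usage examples of this README.

```python
import shlex


def build_command(lora_weights, video_size=(480, 832), infer_steps=20,
                  lora_multiplier=1.0, save_path="save"):
    """Assemble the argv list for wan_generate_video.py from README flags."""
    return [
        "python", "wan_generate_video.py",
        "--fp8", "--task", "t2i-14B",
        "--video_size", str(video_size[0]), str(video_size[1]),
        "--infer_steps", str(infer_steps),
        "--save_path", save_path, "--output_type", "both",
        "--dit", "wan2.1_t2v_14B_bf16.safetensors",
        "--vae", "Wan2.1_VAE.pth",
        "--t5", "models_t5_umt5-xxl-enc-bf16.pth",
        "--attn_mode", "torch",
        # Several LoRA files may follow a single --lora_weight flag.
        "--lora_weight", *lora_weights,
        "--lora_multiplier", str(lora_multiplier),
        "--interactive",
    ]


if __name__ == "__main__":
    cmd = build_command([
        "Taiga_Aisaka_w14_lora-000010.safetensors",
        "3D_Chibi_w14_lora-000024.safetensors",
    ])
    # Print a copy-pasteable shell line; subprocess.run(cmd, check=True) would run it.
    print(shlex.join(cmd))
```

Returning a list rather than a single string avoids shell-quoting problems if a weight path contains spaces.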
160
-
161
- ---
162
-
163
- ## Style Characteristics
164
-
165
- For optimal results, prompts should emphasize:
166
- - **Chibi-style characters** with exaggerated heads and facial expressions
167
- - **Vibrant colors** and dynamic lighting effects
168
- - **Fantasy or magical settings** (e.g., gardens, castles, floating islands)
169
- - **Neon or glowing elements**, especially in futuristic or energetic scenes
170
-
171
- ---
172
-
173
- ## Output
174
-
175
- Generated files are saved in the directory given by `--save_path`:
176
- - PNG image file
177
- - (Optional) MP4 video (if `--output_type both` is used)
178
-
179
- ---
180
-
181
- ## Troubleshooting
182
-
183
- - Ensure all model weights are correctly downloaded and placed in the right directories.
184
- - Check available GPU memory; at least **20 GB of VRAM** is recommended for the 14B model.
185
- - Verify no conflicts exist between Python packages using `pip check`.
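The first two checks can be scripted as a quick preflight. The sketch below is an assumption-laden illustration: `missing_files` and `total_vram_gb` are hypothetical helpers, and the file list assumes the weights sit in the working directory under the names used in the download step.

```python
from pathlib import Path

# File names taken from the download step; adjust if stored elsewhere.
REQUIRED_FILES = [
    "wan2.1_t2v_14B_bf16.safetensors",
    "Wan2.1_VAE.pth",
    "models_t5_umt5-xxl-enc-bf16.pth",
    "3D_Chibi_w14_lora-000024.safetensors",
]


def missing_files(base_dir=".", names=REQUIRED_FILES):
    """Return the required files that are absent from base_dir."""
    base = Path(base_dir)
    return [name for name in names if not (base / name).is_file()]


def total_vram_gb(device=0):
    """Return total memory of a CUDA device in GiB, or None if unavailable."""
    try:
        import torch  # imported lazily so the file check works without torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_properties(device).total_memory / 1024**3


if __name__ == "__main__":
    print("missing:", missing_files())
    print("VRAM (GiB):", total_vram_gb())
```

An empty `missing` list and a VRAM figure at or above the recommended 20 GB suggest the setup is ready; `pip check` still has to be run separately for package conflicts.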
186
-
187
- ---
188
-
189
- ## License
190
-
191
- This project is licensed under the MIT License.
192
-
193
- ---
194
-
195
- ## Acknowledgments
196
-
197
- - **Hugging Face** – For hosting the model and dataset repositories
198
- - **Wan-AI** – For providing base diffusion models
199
- - **svjack** – For adapting and sharing LoRA weights for various styles
200
-
201
- For support or feedback, please open an issue in this repository.