yugeng-amd committed on
Commit af6f8ce · verified · 1 Parent(s): 05c4d51

Update README.md

  1. README.md +235 -2
README.md CHANGED
@@ -7,8 +7,241 @@ For action injection, we favor adaLN for its lightweight parameter footprint, an
 
  Note that released T2W model is trained using ControlNet architecture.
 
- More info please refer to [code](https://github.com/AMD-AGI/Micro-World).
+ More info please refer to [GitHub Repo](https://github.com/AMD-AGI/Micro-World).
+
+ # Model Architecture
  <div style="margin: 0; padding: 0; text-align: center;">
  <img src="https://github.com/user-attachments/assets/680b87ac-0c95-4a27-b4fd-fcafb9fdf609" alt="model architecture" title="model architecture" class="model architecture">
  <img src="https://github.com/user-attachments/assets/c9cd8d9e-9555-42d3-b884-04705d1e329c" alt="model architecture" title="model architecture" class="model architecture">
- </div>
+ </div>
+
+ # Video Result
+ ## T2W Model
+ ### In Domain
+ <table border="0" style="width: 100%; table-layout: fixed; text-align: center; margin-top: 20px;">
+ <tr>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/01ecff57-5fc8-40c0-b7c1-1c72525b598c" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; overflow:hidden; font-size: 14px;">
+ W
+ </div>
+ </td>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/0156af1f-5fe2-4276-9cec-ba97b2476018" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; font-size: 14px;">
+ S
+ </div>
+ </td>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/d27268e5-9fbc-49f7-b3ca-882fb58f21b6" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; font-size: 14px;">
+ A
+ </div>
+ </td>
+ </tr>
+ </table>
+
+ <table border="0" style="width: 100%; table-layout: fixed; text-align: center; margin-top: 20px;">
+ <tr>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/aff52ef1-0c9c-4a03-961f-6aa5361b636d" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; font-size: 14px;">
+ D
+ </div>
+ </td>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/b5d37d89-5cf0-40a5-8504-61f68e944fb9" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; font-size: 14px;">
+ W+Ctrl
+ </div>
+ </td>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/1b0d50c8-a037-4671-a146-b77672260322" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; font-size: 14px;">
+ W+Shift
+ </div>
+ </td>
+ </tr>
+ </table>
+
+ <table border="0" style="width: 100%; table-layout: fixed; text-align: center; margin-top: 20px;">
+ <tr>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/b13a14d7-5882-42dd-872b-8f61b9ab7060" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; font-size: 14px;">
+ Multiple control
+ </div>
+ </td>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/1218bbda-7993-4075-881b-2e16002acda8" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; font-size: 14px;">
+ Mouse down and up
+ </div>
+ </td>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/31471313-d94b-4936-b23b-12e7f89fda87" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; font-size: 14px;">
+ Mouse right and left
+ </div>
+ </td>
+ </tr>
+ </table>
+
+ ### Open Domain
+ <table border="0" style="width: 100%; table-layout: fixed; text-align: center; margin-top: 20px;">
+ <tr>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/25ec4ba8-4f65-4b26-8966-13437647f240" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; text-align: left;">
+ <details>
+ <summary style="cursor: pointer; font-size: 13px;">View Prompt</summary>
+ <div style="font-size: 12px; margin-top: 5px; color: #555;">
+ A cozy living room with sunlight streaming through window, vintage furniture, soft shadows.
+ </div>
+ </details>
+ </div>
+ </td>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/a92149a8-6c4d-4b9a-8ada-81b47b4c81e7" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; text-align: left;">
+ <details>
+ <summary style="cursor: pointer; font-size: 13px;">View Prompt</summary>
+ <div style="font-size: 12px; margin-top: 5px; color: #555;">
+ A cozy living room with sunlight streaming through window, vintage furniture, soft shadows.
+ </div>
+ </details>
+ </div>
+ </td>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/67b35842-04fd-4a0f-9a5c-6914d9f77e66" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; text-align: left;">
+ <details>
+ <summary style="cursor: pointer; font-size: 13px;">View Prompt</summary>
+ <div style="font-size: 12px; margin-top: 5px; color: #555;">
+ Running along a cliffside path in a tropical island in first person perspective, with turquoise waters crashing against the rocks far below, the salty scent of the ocean carried by the breeze, and the sound of distant waves blending with the calls of seagulls as the path twists and turns along the jagged cliffs.
+ </div>
+ </details>
+ </div>
+ </td>
+ </tr>
+ </table>
+
+ <table border="0" style="width: 100%; table-layout: fixed; text-align: center; margin-top: 20px;">
+ <tr>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/d4a46b8b-022d-4fca-964f-c1d477111f4e" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; text-align: left;">
+ <details>
+ <summary style="cursor: pointer; font-size: 13px;">View Prompt</summary>
+ <div style="font-size: 12px; margin-top: 5px; color: #555;">
+ A young bear stands next to a large tree in a grassy meadow, its dark fur catching the soft daylight. The bear seems poised, observing its surroundings in a tranquil landscape, with rolling hills and sparse trees dotting the background under a pale blue sky.
+ </div>
+ </details>
+ </div>
+ </td>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/b6f77a1a-58ce-43db-b6c5-efe09b7a9142" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; text-align: left;">
+ <details>
+ <summary style="cursor: pointer; font-size: 13px;">View Prompt</summary>
+ <div style="font-size: 12px; margin-top: 5px; color: #555;">
+ A giant panda rests peacefully under a blooming cherry blossom tree, its black and white fur contrasting beautifully with the delicate pink petals. The ground is lightly sprinkled with fallen blossoms, and the tranquil setting is framed by the soft hues of the blossoms and the grassy field surrounding the tree.
+ </div>
+ </details>
+ </div>
+ </td>
+ <td style="vertical-align: top; width: 33%;">
+ <video src="https://github.com/user-attachments/assets/c9225344-8b0b-4249-ab77-c8e5c4dddacc" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; text-align: left;">
+ <details>
+ <summary style="cursor: pointer; font-size: 13px;">View Prompt</summary>
+ <div style="font-size: 12px; margin-top: 5px; color: #555;">
+ Exploring an ancient jungle ruin in first person perspective surrounded by towering stone statues covered in moss and vines.
+ </div>
+ </details>
+ </div>
+ </td>
+ </tr>
+ </table>
+
+ ## I2W Model
+ <table border="0" style="width: 100%; table-layout: fixed; text-align: center; margin-top: 20px;">
+ <tr>
+ <td style="vertical-align: top; width: 50%;">
+ <video src="https://github.com/user-attachments/assets/f135d5c9-0379-4ace-bf22-1671cef261af" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; text-align: left;">
+ <details>
+ <summary style="cursor: pointer; font-size: 13px;">View Prompt</summary>
+ <div style="font-size: 12px; margin-top: 5px; color: #555;">
+ First-person perspective walking down a lively city street at night. Neon signs and bright billboards glow on both sides, cars drive past with headlights and taillights streaking slightly. camera motion directly aligned with user actions, immersive urban night scene.
+ </div>
+ </details>
+ </div>
+ </td>
+ <td style="vertical-align: top; width: 50%;">
+ <video src="https://github.com/user-attachments/assets/2088d2da-95a6-4908-b7a2-f60458281b5e" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; text-align: left;">
+ <details>
+ <summary style="cursor: pointer; font-size: 13px;">View Prompt</summary>
+ <div style="font-size: 12px; margin-top: 5px; color: #555;">
+ First-person perspective standing in front of an ornate traditional Chinese temple. The symmetrical facade features red lanterns, intricate carvings, and a curved tiled roof decorated with dragons. Bright daytime lighting, consistent environment, camera motion directly aligned with user actions, immersive and interactive exploration.
+ </div>
+ </details>
+ </div>
+ </td>
+ </tr>
+ </table>
+
+ <table border="0" style="width: 100%; table-layout: fixed; text-align: center; margin-top: 20px;">
+ <tr>
+ <td style="vertical-align: top; width: 50%;">
+ <video src="https://github.com/user-attachments/assets/9e1185cc-5480-4059-8643-7b6e08fff0c1" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; text-align: left;">
+ <details>
+ <summary style="cursor: pointer; font-size: 13px;">View Prompt</summary>
+ <div style="font-size: 12px; margin-top: 5px; color: #555;">
+ First-person perspective of standing in a rocky desert valley, looking at a camel a few meters ahead. The camel stands calmly on uneven stones, its long legs and single hump clearly visible. Bright midday sunlight, dry air, muted earth tones, distant barren mountains. Natural handheld camera feeling, camera motion controlled by user actions, smooth movement, cinematic realism.
+ </div>
+ </details>
+ </div>
+ </td>
+ <td style="vertical-align: top; width: 50%;">
+ <video src="https://github.com/user-attachments/assets/c75d7344-7016-494e-be00-103d28e43738" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; text-align: left;">
+ <details>
+ <summary style="cursor: pointer; font-size: 13px;">View Prompt</summary>
+ <div style="font-size: 12px; margin-top: 5px; color: #555;">
+ First-person perspective walking through a narrow urban alley, old red brick industrial buildings on both sides, cobblestone street stretching forward with strong depth, metal walkways connecting buildings above, overcast daylight, soft diffused lighting, cool and muted color tones, quiet and empty environment, no people, camera motion controlled by user actions, smooth movement, stable horizon, realistic scale and geometry, high realism, cinematic urban scene.
+ </div>
+ </details>
+ </div>
+ </td>
+ </tr>
+ </table>
+
+ <table border="0" style="width: 100%; table-layout: fixed; text-align: center; margin-top: 20px;">
+ <tr>
+ <td style="vertical-align: top; width: 50%;">
+ <video src="https://github.com/user-attachments/assets/f6da97af-0d3a-4b6a-b80f-5ae3c03ccbf6" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; text-align: left;">
+ <details>
+ <summary style="cursor: pointer; font-size: 13px;">View Prompt</summary>
+ <div style="font-size: 12px; margin-top: 5px; color: #555;">
+ First-person perspective coastal exploration scene, walking along a cliffside stone path with wooden railings, green bushes lining the walkway, ocean to the left with gentle waves, distant islands visible under a clear sky, realistic head-mounted camera view, smooth forward motion, stable horizon, natural human eye level, high realism, consistent environment, camera motion directly aligned with user actions, immersive and interactive exploration.
+ </div>
+ </details>
+ </div>
+ </td>
+ <td style="vertical-align: top; width: 50%;">
+ <video src="https://github.com/user-attachments/assets/b76a8aca-d1da-47ba-88e9-3da36f64429d" width="100%" controls autoplay loop></video>
+ <div style="margin-top: 8px; text-align: left;">
+ <details>
+ <summary style="cursor: pointer; font-size: 13px;">View Prompt</summary>
+ <div style="font-size: 12px; margin-top: 5px; color: #555;">
+ First-person perspective inside a cozy living room, walking around a warm fireplace, soft carpet underfoot, furniture arranged neatly, bookshelves, plants, and warm table lamps on both sides, warm indoor lighting, calm and quiet atmosphere, natural head-level camera movement, camera motion driven by user actions, realistic scale and depth, high realism, cinematic lighting, no people, no distortion.
+ </div>
+ </details>
+ </div>
+ </td>
+ </tr>
+ </table>