Spaces: Running on CPU Upgrade

Commit · 4fa21de
1 Parent(s): 51535d8
init

Browse files
- __lib__/app.py  +47 -16
- __lib__/i18n/en.pyc  +0 -0
- __lib__/util.pyc  +0 -0
__lib__/app.py
CHANGED
@@ -1726,22 +1726,6 @@ def create_app():
 ">🚀 {t('seo_unlimited_button', lang)}</a>
 </div>
 </div>
-
-<!-- OmniAI Model Introduction -->
-<div style="background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%); padding: 35px; border-radius: 20px; box-shadow: 0 10px 30px rgba(0,0,0,0.1); margin: 40px 0;">
-<h2 style="color: #2c3e50; margin: 0 0 20px 0; font-size: 1.9em; font-weight: 700; text-align: center;">
-🤖 About Omni Creator 2.0 - 8B Multi-Modal Generation Model
-</h2>
-<div style="line-height: 1.8; font-size: 1.05em;">
-<p style="color: #555; margin-bottom: 20px;">
-<strong style="color: #667eea;">OmniAI</strong> is an 8-billion parameter multi-modal generation model developed by the OmniCreator research team.
-Through extensive training on massive video datasets, we discovered that the network can simultaneously acquire text, image, and video editing and generation capabilities.
-</p>
-<p style="color: #555; margin-bottom: 20px;">
-That's why we call it the <strong style="color: #e74c3c;">Omni Creator</strong> - it handles everything from text-to-image, image editing, to video generation with a single unified architecture.
-</p>
-</div>
-</div>
 
 <!-- Performance Highlights -->
 <div style="background: linear-gradient(135deg, #11998e 0%, #38ef7d 100%); color: white; padding: 40px; border-radius: 20px; margin: 40px 0;">

@@ -1782,6 +1766,53 @@ def create_app():
 No fine-tuning required - just modify the prompt and input parameters!
 </p>
 </div>
+
+<!-- OmniAI Model Introduction -->
+<div style="background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%); padding: 35px; border-radius: 20px; box-shadow: 0 10px 30px rgba(0,0,0,0.1); margin: 40px 0;">
+<h2 style="color: #2c3e50; margin: 0 0 20px 0; font-size: 1.9em; font-weight: 700; text-align: center;">
+🤖 Omni Creator 2.0: 8B Unified Multi-Modal Diffusion Transformer
+</h2>
+<div style="line-height: 1.8; font-size: 1.05em;">
+<p style="color: #555; margin-bottom: 20px;">
+<strong style="color: #667eea;">Omni Creator 2.0</strong> is an <strong>8-billion parameter</strong> native <strong>Multi-Modal Diffusion Transformer (MM-DiT)</strong>.
+It unifies <strong>Text-to-Image</strong>, <strong>high-fidelity pixel-level editing</strong>, and <strong>Image-to-Video</strong> generation inside a single differentiable architecture.
+</p>
+<p style="color: #555; margin-bottom: 20px;">
+Existing approaches often split into specialized systems: DiT-style generators can be strong for static synthesis but struggle with temporal coherence, while MLLM-based editors follow instructions well but may lose pixel-level fidelity.
+<strong style="color: #e74c3c;">Omni Creator 2.0</strong> bridges both by design.
+</p>
+<div style="background: rgba(255,255,255,0.55); border: 1px solid rgba(44,62,80,0.15); padding: 18px 20px; border-radius: 14px; margin: 22px 0;">
+<div style="color: #2c3e50; font-weight: 700; margin-bottom: 10px;">Core Architectural Innovations</div>
+<div style="color: #555;">
+<div style="margin: 8px 0;"><strong>1) Spatio-Temporal 3D Shifted Window Attention</strong> — window partitioning with temporal shifting (Swin-style, adapted for diffusion) to improve long-range temporal dependency modeling and reduce flickering.</div>
+<div style="margin: 8px 0;"><strong>2) Native Multi-Image Conditioning + Adaptive Fusion</strong> — built-in visual projector supports up to <strong>3</strong> concurrent reference images; <strong>Adaptive Multi-Modal Gating</strong> dynamically balances text, image references, and temporal context.</div>
+<div style="margin: 8px 0;"><strong>3) HPC-Ready Optimization</strong> — production-oriented inference with FP8 support, RoPE, and RMSNorm for stable scaling to longer sequences and higher resolutions.</div>
+</div>
+</div>
+<div style="background: rgba(44,62,80,0.06); padding: 16px 18px; border-radius: 12px; font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, 'Liberation Mono', 'Courier New', monospace; font-size: 0.95em; overflow-x: auto;">
+<div style="color: #2c3e50; font-weight: 700; margin-bottom: 10px; font-family: inherit;">Mechanism Sketch</div>
+<div style="color: #444; white-space: pre; margin: 0;"># Fourier timestep embedding (DiT)
+t_freq = [cos(t · ω_1..m), sin(t · ω_1..m)]
+t_emb = MLP(t_freq)
+
+# RoPE + RMSNorm stabilized attention
+Q,K,V = Wq·RMSNorm(x), Wk·RMSNorm(x), Wv·RMSNorm(x)
+Q,K = RoPE(Q), RoPE(K)
+Attn = Softmax((QK^T)/sqrt(d) + B_rel) V
+
+# AdaLN-Zero modulation (diffusion conditioning)
+shift,scale,gate = Linear(t_emb)
+x = x + gate * Attn( (1+scale) ⊙ RMSNorm(x) + shift )
+
+# Multi-modal fusion (text + up to 3 images)
+ctx = concat(E_text, P_visual(I_1..I_3))
+x = x + CrossAttn(RMSNorm(x), ctx)
+
+# High-throughput FFN
+x = x + gate_mlp * SwiGLU(RMSNorm(x))</div>
+</div>
+</div>
+</div>
 </div>
 """
 
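The "Mechanism Sketch" added by this commit is pseudocode embedded in the page HTML. Below is a minimal PyTorch sketch of the same block structure (Fourier timestep embedding, RMSNorm, AdaLN-Zero timestep modulation, cross-attention fusion over text and reference-image tokens, SwiGLU FFN). Module names, shapes, and sizes are illustrative assumptions, not code from this Space; RoPE and the FP8 path mentioned in the copy are omitted to keep the example short.

# Hypothetical PyTorch sketch of the block described in the added "Mechanism Sketch".
# All names, shapes, and sizes are assumptions for illustration only.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def timestep_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    # Fourier features of the diffusion timestep, as in DiT: [cos(t*w), sin(t*w)]
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    angles = t[:, None].float() * freqs[None, :]
    return torch.cat([torch.cos(angles), torch.sin(angles)], dim=-1)

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class MMDiTBlock(nn.Module):
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.norm_attn = RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_cross = RMSNorm(dim)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_ffn = RMSNorm(dim)
        self.ffn_in = nn.Linear(dim, 2 * 4 * dim)   # SwiGLU: gate half and value half
        self.ffn_out = nn.Linear(4 * dim, dim)
        # AdaLN-Zero: per-timestep shift/scale/gate for attention and for the FFN,
        # zero-initialized so every block starts out as the identity function.
        self.modulation = nn.Linear(dim, 6 * dim)
        nn.init.zeros_(self.modulation.weight)
        nn.init.zeros_(self.modulation.bias)

    def forward(self, x, t_emb, ctx):
        # x: (B, N, D) latent tokens, t_emb: (B, D), ctx: (B, M, D) text + reference-image tokens
        s_a, c_a, g_a, s_f, c_f, g_f = self.modulation(t_emb)[:, None, :].chunk(6, dim=-1)
        h = (1 + c_a) * self.norm_attn(x) + s_a
        x = x + g_a * self.attn(h, h, h, need_weights=False)[0]
        # Multi-modal fusion: latent tokens cross-attend to the conditioning context
        h = self.norm_cross(x)
        x = x + self.cross(h, ctx, ctx, need_weights=False)[0]
        h = (1 + c_f) * self.norm_ffn(x) + s_f
        gate, value = self.ffn_in(h).chunk(2, dim=-1)
        return x + g_f * self.ffn_out(F.silu(gate) * value)

# Example shapes: 256 latent tokens, a text prompt plus up to 3 reference images as context
block = MMDiTBlock(dim=512, heads=8)
x = torch.randn(2, 256, 512)
ctx = torch.randn(2, 77 + 3 * 16, 512)
t_emb = timestep_embedding(torch.tensor([10, 500]), 512)
out = block(x, t_emb, ctx)   # (2, 256, 512)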
__lib__/i18n/en.pyc
CHANGED
Binary files a/__lib__/i18n/en.pyc and b/__lib__/i18n/en.pyc differ

__lib__/util.pyc
CHANGED
Binary files a/__lib__/util.pyc and b/__lib__/util.pyc differ
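The added copy also credits "Spatio-Temporal 3D Shifted Window Attention" for long-range temporal modeling. A small illustrative sketch of the Swin-style shifted-window partition it alludes to, under assumed latent shapes (not code from this repository):

# Illustrative sketch of Swin-style 3D window partitioning with a temporal shift.
# The shapes and helper name are assumptions for illustration only.
import torch

def shifted_window_partition(x, window, shift):
    """x: (B, T, H, W, C) video latents -> (num_windows * B, wt*wh*ww, C) tokens.

    T, H, W are assumed divisible by the window sizes wt, wh, ww.
    """
    B, T, H, W, C = x.shape
    wt, wh, ww = window
    # Cyclic shift (torch.roll): in alternating blocks the windows straddle the
    # previous partition boundaries, so information flows across window borders.
    x = torch.roll(x, shifts=(-shift[0], -shift[1], -shift[2]), dims=(1, 2, 3))
    x = x.view(B, T // wt, wt, H // wh, wh, W // ww, ww, C)
    x = x.permute(0, 1, 3, 5, 2, 4, 6, 7).reshape(-1, wt * wh * ww, C)
    return x  # self-attention is then computed independently inside each window

# Example: 16-frame, 32x32 latent grid, 2x8x8 windows, shifted by one frame in time
tokens = shifted_window_partition(torch.randn(1, 16, 32, 32, 64),
                                  window=(2, 8, 8), shift=(1, 0, 0))
print(tokens.shape)  # torch.Size([128, 128, 64])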