Spaces: Running on CPU Upgrade

Commit · 4fa21de
1 Parent(s): 51535d8
init

Browse files
- __lib__/app.py  +47 -16
- __lib__/i18n/en.pyc  +0 -0
- __lib__/util.pyc  +0 -0
__lib__/app.py
CHANGED
@@ -1726,22 +1726,6 @@ def create_app():
 ">🚀 {t('seo_unlimited_button', lang)}</a>
 </div>
 </div>
-
-<!-- OmniAI Model Introduction -->
-<div style="background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%); padding: 35px; border-radius: 20px; box-shadow: 0 10px 30px rgba(0,0,0,0.1); margin: 40px 0;">
-<h2 style="color: #2c3e50; margin: 0 0 20px 0; font-size: 1.9em; font-weight: 700; text-align: center;">
-🤖 About Omni Creator 2.0 - 8B Multi-Modal Generation Model
-</h2>
-<div style="line-height: 1.8; font-size: 1.05em;">
-<p style="color: #555; margin-bottom: 20px;">
-<strong style="color: #667eea;">OmniAI</strong> is an 8-billion parameter multi-modal generation model developed by the OmniCreator research team.
-Through extensive training on massive video datasets, we discovered that the network can simultaneously acquire text, image, and video editing and generation capabilities.
-</p>
-<p style="color: #555; margin-bottom: 20px;">
-That's why we call it the <strong style="color: #e74c3c;">Omni Creator</strong> - it handles everything from text-to-image, image editing, to video generation with a single unified architecture.
-</p>
-</div>
-</div>
 
 <!-- Performance Highlights -->
 <div style="background: linear-gradient(135deg, #11998e 0%, #38ef7d 100%); color: white; padding: 40px; border-radius: 20px; margin: 40px 0;">

@@ -1782,6 +1766,53 @@ def create_app():
 No fine-tuning required - just modify the prompt and input parameters!
 </p>
 </div>
+
+<!-- OmniAI Model Introduction -->
+<div style="background: linear-gradient(135deg, #f5f7fa 0%, #c3cfe2 100%); padding: 35px; border-radius: 20px; box-shadow: 0 10px 30px rgba(0,0,0,0.1); margin: 40px 0;">
+<h2 style="color: #2c3e50; margin: 0 0 20px 0; font-size: 1.9em; font-weight: 700; text-align: center;">
+🤖 Omni Creator 2.0: 8B Unified Multi-Modal Diffusion Transformer
+</h2>
+<div style="line-height: 1.8; font-size: 1.05em;">
+<p style="color: #555; margin-bottom: 20px;">
+<strong style="color: #667eea;">Omni Creator 2.0</strong> is an <strong>8-billion parameter</strong> native <strong>Multi-Modal Diffusion Transformer (MM-DiT)</strong>.
+It unifies <strong>Text-to-Image</strong>, <strong>high-fidelity pixel-level editing</strong>, and <strong>Image-to-Video</strong> generation inside a single differentiable architecture.
+</p>
+<p style="color: #555; margin-bottom: 20px;">
+Existing approaches often split into specialized systems: DiT-style generators can be strong for static synthesis but struggle with temporal coherence, while MLLM-based editors follow instructions well but may lose pixel-level fidelity.
+<strong style="color: #e74c3c;">Omni Creator 2.0</strong> bridges both by design.
+</p>
+<div style="background: rgba(255,255,255,0.55); border: 1px solid rgba(44,62,80,0.15); padding: 18px 20px; border-radius: 14px; margin: 22px 0;">
+<div style="color: #2c3e50; font-weight: 700; margin-bottom: 10px;">Core Architectural Innovations</div>
+<div style="color: #555;">
+<div style="margin: 8px 0;"><strong>1) Spatio-Temporal 3D Shifted Window Attention</strong> — window partitioning with temporal shifting (Swin-style, adapted for diffusion) to improve long-range temporal dependency modeling and reduce flickering.</div>
+<div style="margin: 8px 0;"><strong>2) Native Multi-Image Conditioning + Adaptive Fusion</strong> — built-in visual projector supports up to <strong>3</strong> concurrent reference images; <strong>Adaptive Multi-Modal Gating</strong> dynamically balances text, image references, and temporal context.</div>
+<div style="margin: 8px 0;"><strong>3) HPC-Ready Optimization</strong> — production-oriented inference with FP8 support, RoPE, and RMSNorm for stable scaling to longer sequences and higher resolutions.</div>
+</div>
+</div>
+<div style="background: rgba(44,62,80,0.06); padding: 16px 18px; border-radius: 12px; font-family: ui-monospace, SFMono-Regular, Menlo, Monaco, Consolas, 'Liberation Mono', 'Courier New', monospace; font-size: 0.95em; overflow-x: auto;">
+<div style="color: #2c3e50; font-weight: 700; margin-bottom: 10px; font-family: inherit;">Mechanism Sketch</div>
+<div style="color: #444; white-space: pre; margin: 0;"># Fourier timestep embedding (DiT)
+t_freq = [cos(t · ω_1..m), sin(t · ω_1..m)]
+t_emb = MLP(t_freq)
+
+# RoPE + RMSNorm stabilized attention
+Q,K,V = Wq·RMSNorm(x), Wk·RMSNorm(x), Wv·RMSNorm(x)
+Q,K = RoPE(Q), RoPE(K)
+Attn = Softmax((QK^T)/sqrt(d) + B_rel) V
+
+# AdaLN-Zero modulation (diffusion conditioning)
+shift,scale,gate = Linear(t_emb)
+x = x + gate * Attn( (1+scale) ⊙ RMSNorm(x) + shift )
+
+# Multi-modal fusion (text + up to 3 images)
+ctx = concat(E_text, P_visual(I_1..I_3))
+x = x + CrossAttn(RMSNorm(x), ctx)
+
+# High-throughput FFN
+x = x + gate_mlp * SwiGLU(RMSNorm(x))</div>
+</div>
+</div>
+</div>
 </div>
 """
 
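The "Mechanism Sketch" added by this commit is pseudocode embedded in the page HTML. Below is a minimal PyTorch sketch of the same block structure (Fourier timestep embedding, RMSNorm, AdaLN-Zero timestep modulation, cross-attention fusion over text and reference-image tokens, SwiGLU FFN). Module names, shapes, and sizes are illustrative assumptions, not code from this Space; RoPE and the FP8 path mentioned in the copy are omitted to keep the example short.

# Hypothetical PyTorch sketch of the block described in the added "Mechanism Sketch".
# All names, shapes, and sizes are assumptions for illustration only.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def timestep_embedding(t: torch.Tensor, dim: int) -> torch.Tensor:
    # Fourier features of the diffusion timestep, as in DiT: [cos(t*w), sin(t*w)]
    half = dim // 2
    freqs = torch.exp(-math.log(10000.0) * torch.arange(half, dtype=torch.float32) / half)
    angles = t[:, None].float() * freqs[None, :]
    return torch.cat([torch.cos(angles), torch.sin(angles)], dim=-1)

class RMSNorm(nn.Module):
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.weight * x * torch.rsqrt(x.pow(2).mean(-1, keepdim=True) + self.eps)

class MMDiTBlock(nn.Module):
    def __init__(self, dim: int, heads: int):
        super().__init__()
        self.norm_attn = RMSNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_cross = RMSNorm(dim)
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_ffn = RMSNorm(dim)
        self.ffn_in = nn.Linear(dim, 2 * 4 * dim)   # SwiGLU: gate half and value half
        self.ffn_out = nn.Linear(4 * dim, dim)
        # AdaLN-Zero: per-timestep shift/scale/gate for attention and for the FFN,
        # zero-initialized so every block starts out as the identity function.
        self.modulation = nn.Linear(dim, 6 * dim)
        nn.init.zeros_(self.modulation.weight)
        nn.init.zeros_(self.modulation.bias)

    def forward(self, x, t_emb, ctx):
        # x: (B, N, D) latent tokens, t_emb: (B, D), ctx: (B, M, D) text + reference-image tokens
        s_a, c_a, g_a, s_f, c_f, g_f = self.modulation(t_emb)[:, None, :].chunk(6, dim=-1)
        h = (1 + c_a) * self.norm_attn(x) + s_a
        x = x + g_a * self.attn(h, h, h, need_weights=False)[0]
        # Multi-modal fusion: latent tokens cross-attend to the conditioning context
        h = self.norm_cross(x)
        x = x + self.cross(h, ctx, ctx, need_weights=False)[0]
        h = (1 + c_f) * self.norm_ffn(x) + s_f
        gate, value = self.ffn_in(h).chunk(2, dim=-1)
        return x + g_f * self.ffn_out(F.silu(gate) * value)

# Example shapes: 256 latent tokens, a text prompt plus up to 3 reference images as context
block = MMDiTBlock(dim=512, heads=8)
x = torch.randn(2, 256, 512)
ctx = torch.randn(2, 77 + 3 * 16, 512)
t_emb = timestep_embedding(torch.tensor([10, 500]), 512)
out = block(x, t_emb, ctx)   # (2, 256, 512)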
__lib__/i18n/en.pyc
CHANGED
Binary files a/__lib__/i18n/en.pyc and b/__lib__/i18n/en.pyc differ

__lib__/util.pyc
CHANGED
Binary files a/__lib__/util.pyc and b/__lib__/util.pyc differ
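The added copy also credits "Spatio-Temporal 3D Shifted Window Attention" for long-range temporal modeling. A small illustrative sketch of the Swin-style shifted-window partition it alludes to, under assumed latent shapes (not code from this repository):

# Illustrative sketch of Swin-style 3D window partitioning with a temporal shift.
# The shapes and helper name are assumptions for illustration only.
import torch

def shifted_window_partition(x, window, shift):
    """x: (B, T, H, W, C) video latents -> (num_windows * B, wt*wh*ww, C) tokens.

    T, H, W are assumed divisible by the window sizes wt, wh, ww.
    """
    B, T, H, W, C = x.shape
    wt, wh, ww = window
    # Cyclic shift (torch.roll): in alternating blocks the windows straddle the
    # previous partition boundaries, so information flows across window borders.
    x = torch.roll(x, shifts=(-shift[0], -shift[1], -shift[2]), dims=(1, 2, 3))
    x = x.view(B, T // wt, wt, H // wh, wh, W // ww, ww, C)
    x = x.permute(0, 1, 3, 5, 2, 4, 6, 7).reshape(-1, wt * wh * ww, C)
    return x  # self-attention is then computed independently inside each window

# Example: 16-frame, 32x32 latent grid, 2x8x8 windows, shifted by one frame in time
tokens = shifted_window_partition(torch.randn(1, 16, 32, 32, 64),
                                  window=(2, 8, 8), shift=(1, 0, 0))
print(tokens.shape)  # torch.Size([128, 128, 64])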