# Model card @ step 12125

Uses CLIP weights from `AbstractPhil/clips`:
- **Fusion Strategy**: adaptive_cantor
- **Latent Dimension**: 2048
- **Training Steps**: 12,125
- **Best Loss**: 0.0377
- **Prompt Source**: booru
## T5 Input Format

T5 receives a different input than CLIP to enable richer semantic understanding:

```
CLIP sees: "masterpiece, 1girl, blue hair, school uniform, smile"
T5 sees: "masterpiece, 1girl, blue hair, school uniform, smile ¶ A cheerful schoolgirl with blue hair smiling warmly"
```

The pilcrow (`¶`) separator acts as a mode-switch token:

- **Before ¶**: Structured booru tags (shuffled for robustness)
- **After ¶**: Natural language description from the Qwen summarizer

This enables deviant T5 usage during inference without affecting CLIP behavior.
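As a concrete illustration, prompt construction along these lines might look like the sketch below. The `build_prompts` helper and its exact formatting are assumptions for illustration, not the repo's actual data pipeline:

```python
import random

def build_prompts(tags, description, seed=None):
    """Hypothetical helper: CLIP gets shuffled booru tags only;
    T5 gets the same tags plus a pilcrow-separated description."""
    rng = random.Random(seed)
    shuffled = list(tags)
    rng.shuffle(shuffled)  # tag shuffling for robustness
    clip_prompt = ", ".join(shuffled)
    t5_prompt = f"{clip_prompt} ¶ {description}"
    return clip_prompt, t5_prompt

clip_prompt, t5_prompt = build_prompts(
    ["masterpiece", "1girl", "blue hair"],
    "A cheerful girl with blue hair",
    seed=0,
)
```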
## Learned Parameters

**Alpha (Visibility):**

- clip_g: 0.7316
- clip_l: 0.7316
- t5_xl_g: 0.7339
- t5_xl_l: 0.7451

**Beta (Capacity):**

- clip_l_t5_xl_l: 0.5709
- clip_g_t5_xl_g: 0.5763
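The card does not document how these gates are applied inside the adaptive_cantor fusion. Purely as an illustrative sketch, one could imagine alpha acting as a per-modality scaling weight; the names and mechanics below are assumptions, not the model's actual code:

```python
# Illustrative only: treat alpha as a per-modality visibility weight.
# The real adaptive_cantor fusion is not documented on this card.
ALPHAS = {"clip_g": 0.7316, "clip_l": 0.7316, "t5_xl_g": 0.7339, "t5_xl_l": 0.7451}

def visibility_scale(embedding, modality):
    """Scale one embedding (as a plain list of floats) by its learned alpha."""
    alpha = ALPHAS[modality]
    return [alpha * x for x in embedding]

scaled = visibility_scale([1.0, 2.0], "clip_l")  # → [0.7316, 1.4632]
```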
## Usage
```python
from lyra_xl_multimodal import load_lyra_from_hub

model = load_lyra_from_hub("AbstractPhil/vae-lyra-xl-adaptive-cantor-illustrious")
model.eval()

# Inputs (use Illustrious CLIP encoders with clip_skip=2 for best results)
inputs = {
    "clip_l": clip_l_embeddings,  # [batch, 77, 768]
    "clip_g": clip_g_embeddings,  # [batch, 77, 1280]
    # ...
}

recons, mu, logvar, _ = model(inputs, target_modalities=["clip_l", "clip_g"])
```
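The model returns `mu` and `logvar` alongside the reconstructions. If you want to draw a latent sample rather than use the posterior mean, the standard VAE reparameterization applies; this is a generic sketch, not an API this repo necessarily exposes:

```python
import torch

def sample_latent(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Standard VAE reparameterization: z = mu + sigma * eps."""
    std = torch.exp(0.5 * logvar)  # logvar = log(sigma^2)
    eps = torch.randn_like(std)    # eps ~ N(0, I)
    return mu + eps * std

# Latent dimension is 2048 per the card above.
z = sample_latent(torch.zeros(1, 2048), torch.zeros(1, 2048))
```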
## Quick Load (Safetensors)

```python
from safetensors.torch import load_file

# Load just the weights (fast)
state_dict = load_file("weights/lyra_illustrious_best.safetensors")

# Or a specific step
state_dict = load_file("weights/lyra_illustrious_step_5000.safetensors")
```
## Files

- `model.pt` - Full checkpoint (model + optimizer + scheduler)
- `checkpoint_lyra_illustrious_XXXX.pt` - Step checkpoints
- `config.json` - Training configuration
- `weights/lyra_illustrious_best.safetensors` - Best model weights only
- `weights/lyra_illustrious_step_XXXX.safetensors` - Step checkpoints (weights only)