AbstractPhil committed on
Commit aae8fcc · verified · 1 Parent(s): deff791

Model card @ step 12125

Files changed (1)
  1. README.md +29 -14
README.md CHANGED
@@ -27,11 +27,23 @@ Uses CLIP weights from `AbstractPhil/clips`:
 
  - **Fusion Strategy**: adaptive_cantor
  - **Latent Dimension**: 2048
- - **Training Steps**: 1,564
- - **Best Loss**: 0.0710
+ - **Training Steps**: 12,125
+ - **Best Loss**: 0.0377
  - **Prompt Source**: booru
 
 
+ ## Quick Load (Safetensors)
+
+ ```python
+ from safetensors.torch import load_file
+
+ # Load just the weights (fast)
+ state_dict = load_file("weights/lyra_illustrious_best.safetensors")
+
+ # Or a specific step
+ state_dict = load_file("weights/lyra_illustrious_step_5000.safetensors")
+ ```
+
  ## T5 Input Format
 
  T5 receives a different input than CLIP to enable richer semantic understanding:
@@ -41,24 +53,20 @@ CLIP sees: "masterpiece, 1girl, blue hair, school uniform, smile"
  T5 sees: "masterpiece, 1girl, blue hair, school uniform, smile ¶ A cheerful schoolgirl with blue hair smiling warmly"
  ```
 
- The pilcrow (`¶`) separator acts as a mode-switch token, allowing T5 to learn:
- - **Before ¶**: Structured booru tags (shuffled for robustness)
- - **After ¶**: Natural language description from Qwen summarizer
-
- This enables deviant T5 usage during inference without affecting CLIP behavior.
+ The pilcrow (`¶`) separator acts as a mode-switch token.
 
 
  ## Learned Parameters
 
  **Alpha (Visibility):**
- - clip_g: 0.7309
- - clip_l: 0.7309
- - t5_xl_g: 0.7320
- - t5_xl_l: 0.7319
+ - clip_g: 0.7316
+ - clip_l: 0.7316
+ - t5_xl_g: 0.7339
+ - t5_xl_l: 0.7451
 
  **Beta (Capacity):**
- - clip_l_t5_xl_l: 0.5757
- - clip_g_t5_xl_g: 0.5744
+ - clip_l_t5_xl_l: 0.5709
+ - clip_g_t5_xl_g: 0.5763
 
 
  ## Usage
@@ -69,7 +77,6 @@ from lyra_xl_multimodal import load_lyra_from_hub
  model = load_lyra_from_hub("AbstractPhil/vae-lyra-xl-adaptive-cantor-illustrious")
  model.eval()
 
- # Inputs (use Illustrious CLIP encoders with clip_skip=2 for best results)
  inputs = {
      "clip_l": clip_l_embeddings,  # [batch, 77, 768]
      "clip_g": clip_g_embeddings,  # [batch, 77, 1280]
@@ -79,3 +86,11 @@ inputs = {
 
  recons, mu, logvar, _ = model(inputs, target_modalities=["clip_l", "clip_g"])
  ```
+
+ ## Files
+
+ - `model.pt` - Full checkpoint (model + optimizer + scheduler)
+ - `checkpoint_lyra_illustrious_XXXX.pt` - Step checkpoints
+ - `config.json` - Training configuration
+ - `weights/lyra_illustrious_best.safetensors` - Best model weights only
+ - `weights/lyra_illustrious_step_XXXX.safetensors` - Step checkpoints (weights only)
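The T5 input format in the diff above (tags, then `¶`, then a natural-language summary) can be sketched in plain Python. The helper name `build_t5_prompt` and the example tags are hypothetical illustrations, not part of the repository; the shuffling flag mirrors the card's note that tags were shuffled during training for robustness.

```python
import random

PILCROW = "\u00b6"  # the "¶" separator between the tag and summary segments

def build_t5_prompt(tags, summary, shuffle=False, seed=None):
    """Compose the dual T5 input: booru tags, then '¶', then a summary.

    Hypothetical helper illustrating the format from the model card.
    CLIP would receive only the comma-joined tags, without the summary.
    """
    tags = list(tags)
    if shuffle:  # tags were shuffled during training for robustness
        random.Random(seed).shuffle(tags)
    return f"{', '.join(tags)} {PILCROW} {summary}"

tags = ["masterpiece", "1girl", "blue hair", "school uniform", "smile"]
clip_prompt = ", ".join(tags)  # what CLIP sees
t5_prompt = build_t5_prompt(
    tags, "A cheerful schoolgirl with blue hair smiling warmly"
)
# t5_prompt == "masterpiece, 1girl, blue hair, school uniform, smile ¶ "
#              "A cheerful schoolgirl with blue hair smiling warmly"
```

Keeping the segment before `¶` byte-identical to the CLIP prompt is what lets the summary extend T5's input without affecting CLIP behavior.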