AbstractPhil committed on
Commit aae8fcc · verified · 1 Parent(s): deff791

Model card @ step 12125

Files changed (1)
  1. README.md +29 -14
README.md CHANGED
@@ -27,11 +27,23 @@ Uses CLIP weights from `AbstractPhil/clips`:
 
  - **Fusion Strategy**: adaptive_cantor
  - **Latent Dimension**: 2048
- - **Training Steps**: 1,564
- - **Best Loss**: 0.0710
+ - **Training Steps**: 12,125
+ - **Best Loss**: 0.0377
  - **Prompt Source**: booru
 
 
+ ## Quick Load (Safetensors)
+
+ ```python
+ from safetensors.torch import load_file
+
+ # Load just the weights (fast)
+ state_dict = load_file("weights/lyra_illustrious_best.safetensors")
+
+ # Or a specific step
+ state_dict = load_file("weights/lyra_illustrious_step_5000.safetensors")
+ ```
+
  ## T5 Input Format
 
  T5 receives a different input than CLIP to enable richer semantic understanding:
@@ -41,24 +53,20 @@ CLIP sees: "masterpiece, 1girl, blue hair, school uniform, smile"
  T5 sees: "masterpiece, 1girl, blue hair, school uniform, smile ¶ A cheerful schoolgirl with blue hair smiling warmly"
  ```
 
- The pilcrow (`¶`) separator acts as a mode-switch token, allowing T5 to learn:
- - **Before ¶**: Structured booru tags (shuffled for robustness)
- - **After ¶**: Natural language description from Qwen summarizer
-
- This enables deviant T5 usage during inference without affecting CLIP behavior.
+ The pilcrow (`¶`) separator acts as a mode-switch token.
 
 
  ## Learned Parameters
 
  **Alpha (Visibility):**
- - clip_g: 0.7309
- - clip_l: 0.7309
- - t5_xl_g: 0.7320
- - t5_xl_l: 0.7319
+ - clip_g: 0.7316
+ - clip_l: 0.7316
+ - t5_xl_g: 0.7339
+ - t5_xl_l: 0.7451
 
  **Beta (Capacity):**
- - clip_l_t5_xl_l: 0.5757
- - clip_g_t5_xl_g: 0.5744
+ - clip_l_t5_xl_l: 0.5709
+ - clip_g_t5_xl_g: 0.5763
 
 
  ## Usage
@@ -69,7 +77,6 @@ from lyra_xl_multimodal import load_lyra_from_hub
  model = load_lyra_from_hub("AbstractPhil/vae-lyra-xl-adaptive-cantor-illustrious")
  model.eval()
 
- # Inputs (use Illustrious CLIP encoders with clip_skip=2 for best results)
  inputs = {
      "clip_l": clip_l_embeddings,  # [batch, 77, 768]
      "clip_g": clip_g_embeddings,  # [batch, 77, 1280]
@@ -79,3 +86,11 @@ inputs = {
 
  recons, mu, logvar, _ = model(inputs, target_modalities=["clip_l", "clip_g"])
  ```
+
+ ## Files
+
+ - `model.pt` - Full checkpoint (model + optimizer + scheduler)
+ - `checkpoint_lyra_illustrious_XXXX.pt` - Step checkpoints
+ - `config.json` - Training configuration
+ - `weights/lyra_illustrious_best.safetensors` - Best model weights only
+ - `weights/lyra_illustrious_step_XXXX.safetensors` - Step checkpoints (weights only)
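The T5 input format in the diff above (tags, then `¶`, then a natural-language summary) can be sketched in plain Python. The helper name `build_t5_prompt` and the example tags are hypothetical illustrations, not part of the repository; the shuffling flag mirrors the card's note that tags were shuffled during training for robustness.

```python
import random

PILCROW = "\u00b6"  # the "¶" separator between the tag and summary segments

def build_t5_prompt(tags, summary, shuffle=False, seed=None):
    """Compose the dual T5 input: booru tags, then '¶', then a summary.

    Hypothetical helper illustrating the format from the model card.
    CLIP would receive only the comma-joined tags, without the summary.
    """
    tags = list(tags)
    if shuffle:  # tags were shuffled during training for robustness
        random.Random(seed).shuffle(tags)
    return f"{', '.join(tags)} {PILCROW} {summary}"

tags = ["masterpiece", "1girl", "blue hair", "school uniform", "smile"]
clip_prompt = ", ".join(tags)  # what CLIP sees
t5_prompt = build_t5_prompt(
    tags, "A cheerful schoolgirl with blue hair smiling warmly"
)
# t5_prompt == "masterpiece, 1girl, blue hair, school uniform, smile ¶ "
#              "A cheerful schoolgirl with blue hair smiling warmly"
```

Keeping the segment before `¶` byte-identical to the CLIP prompt is what lets the summary extend T5's input without affecting CLIP behavior.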