ProgramerSalar committed on
Commit 3435aa3 · verified · 1 Parent(s): 9dbcccd

Update README.md

Files changed (1): README.md (+57 −3)

README.md CHANGED
(The previous README contained only the `license: mit` frontmatter.)
---
license: apache-2.0
library_name: diffusers
tags:
- text-to-video
- dit
- diffusion-transformer
- education
- zulense
---

# 🧠 DiT (Diffusion Transformer) Fine-Tuning Experiments

**Core Backbone for the [Zulense Z1 Foundation Model](https://huggingface.co/zulense/z1)**

This repository hosts the **Diffusion Transformer (DiT)** checkpoints trained to generate educational video content. These models operate in the latent space of our [Causal VAE](https://huggingface.co/ProgramerSalar/causal_vae_checkpoint) and are responsible for the temporal consistency and logical flow of the generated math lectures.

## 📂 Model Ledger & Performance

We are releasing the training logs to demonstrate the optimization curve of the "Imagination Engine."

### 1. `finetune_2_pytorch_model.bin` (🌟 Production Candidate)
* **Role:** **The Z1 Foundation Backbone**
* **Status:** ✅ **Converged / High Fidelity**
* **Performance:**
    * This checkpoint represents our stable run. It successfully learned to align temporal attention with the "teacher's movement" and "blackboard writing" logic.
    * **Metrics:** Achieved the target validation loss on the Class 5 & 8 Math dataset.
    * **Behavior:** Shows strong temporal coherence (objects don't disappear randomly) and adheres to the physics of writing on a board.
* **Recommendation:** **Use this file** for all inference tasks related to Zulense Z1.

### 2. `finetune_1_pytorch_model.bin` (Experimental / Deprecated)
* **Role:** **Initial Warmup Run**
* **Status:** ⚠️ **Underfitted / High Noise**
* **Performance:**
    * This was an early checkpoint in which the model struggled to decouple the background (classroom) from the foreground (teacher).
    * **Issues:** Produced "flickering" artifacts and poor text alignment.
    * **Archived:** Kept here for research comparison, to show the impact of the improved data scheduling used in `finetune_2`.
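
The "flickering" failure mode can be quantified with a simple frame-difference proxy: coherent video changes little between adjacent frames, while a flickery output jumps around. The sketch below uses illustrative values, not real model outputs, just to show the idea:

```python
# Toy temporal-coherence check: mean absolute change between
# consecutive frames. High values are a rough proxy for the
# "flickering" artifacts seen in finetune_1.
def mean_frame_delta(frames):
    deltas = [
        sum(abs(a - b) for a, b in zip(f1, f2)) / len(f1)
        for f1, f2 in zip(frames, frames[1:])
    ]
    return sum(deltas) / len(deltas)

stable = [[0.1, 0.2], [0.11, 0.21], [0.12, 0.22]]   # drifts smoothly
flicker = [[0.1, 0.2], [0.9, 0.0], [0.05, 0.8]]      # jumps wildly
print(mean_frame_delta(stable) < mean_frame_delta(flicker))  # True
```

In practice one would run this over decoded frames (or latents) from each checkpoint; here it only illustrates why `finetune_2` scores better on temporal coherence.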

## 🏗️ Architecture Context

The Zulense Video Pipeline follows a two-stage generation process:
1. **Stage 1 (VAE):** Compresses video into latents (see: `causal_vae_checkpoint`).
2. **Stage 2 (DiT):** This model (`finetune_2`) acts as the denoising backbone, predicting the latent patches over time from text prompts (e.g., *"Draw a triangle with 3 angles"*).
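
The two-stage flow can be sketched with toy stand-ins; `vae_encode`, `dit_step`, and `vae_decode` below are illustrative placeholders, not the real Zulense modules or their API:

```python
import random

# Conceptual sketch of the two-stage pipeline (toy stand-ins only).
def vae_encode(video):
    # Stage 1: compress each frame to a single latent value
    return [sum(f) / len(f) for f in video]

def dit_step(latents, t, prompt):
    # Toy "denoiser": nudge each latent toward its temporal neighbours,
    # a stand-in for where the DiT's temporal attention would act
    smoothed = []
    for i, l in enumerate(latents):
        left = latents[max(i - 1, 0)]
        right = latents[min(i + 1, len(latents) - 1)]
        smoothed.append(0.5 * l + 0.25 * (left + right))
    return smoothed

def vae_decode(latents):
    # Stage 1 again, in reverse: latents back to frames
    return latents

prompt = "Draw a triangle with 3 angles"
latents = [random.random() for _ in range(8)]   # start from noise
for t in reversed(range(4)):                    # a few denoising steps
    latents = dit_step(latents, t, prompt)
frames = vae_decode(latents)
print(len(frames))  # 8
```

The real pipeline iterates a learned noise predictor over VAE latent patches conditioned on the text prompt; this sketch only mirrors the encode → denoise → decode structure.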

## 💻 Usage (Loading Weights)

```python
import torch

# Path to the best-performing checkpoint
model_path = "finetune_2_pytorch_model.bin"

# Load the raw state dict on CPU. This yields a dict of tensors only;
# a DiT module must be instantiated separately and populated via
# model.load_state_dict(state_dict).
state_dict = torch.load(model_path, map_location="cpu")

print(f"✅ Loaded DiT Backbone: {model_path}")
print(f"Tensor keys found: {len(state_dict)}")
```
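
Beyond counting keys, a quick sanity check after loading is to group tensor names by their top-level module. The key names below are hypothetical stand-ins (the checkpoint's real key layout isn't documented here); with an actual state dict you would pass `state_dict.keys()` instead:

```python
from collections import Counter

# Hypothetical DiT-style parameter names, standing in for the keys
# of a real loaded state dict
keys = [
    "pos_embed.proj.weight",
    "pos_embed.proj.bias",
    "blocks.0.attn.qkv.weight",
    "blocks.0.mlp.fc1.weight",
    "blocks.1.attn.qkv.weight",
    "final_layer.linear.weight",
]

# Group by top-level module to get a quick picture of the architecture
# before instantiating anything
prefixes = Counter(k.split(".")[0] for k in keys)
print(dict(prefixes))  # {'pos_embed': 2, 'blocks': 3, 'final_layer': 1}
```

This kind of inspection helps confirm the checkpoint matches the DiT module you plan to call `load_state_dict` on before committing to a full load.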