---
license: apache-2.0
tags:
- vae
- video-generation
- education
- fine-tuning
- pytorch
---

# 🎓 Causal VAE Fine-Tuning Experiments (Indian Math Curriculum)

**Developing the "Imagination Engine" for [Zulense](https://huggingface.co/zulense)**

This repository contains experimental checkpoints for a **Causal VAE (Variational Autoencoder)** fine-tuned specifically on Indian educational content (NCERT Math).

The goal of these experiments is to adapt standard video-generation VAEs to better reconstruct the "blackboard-style" line art, diagrams, and text-heavy footage typical of educational videos, which often suffer from artifacts in general-purpose models.

## 📂 Checkpoint Manifest

We are releasing two distinct checkpoints representing different stages of our training curriculum.

### 1. `FineTune_2_checkpoint.pth` (Recommended)
* **Target Domain:** Class 5 Numeracy & Foundation
* **Status:** ✅ Improved Stability
* **Experiment Notes:** This run focused on simpler, foundational concepts (the Class 5 curriculum) to stabilize the loss.
* **Improvements:** Significantly reduced `kl_divergence` and reconstruction loss compared to the V1 baseline.
* **Use Case:** Better at handling the basic shapes and slower temporal movements typical of primary-school teaching.

### 2. `checkpoint-0.pth` (Legacy / Research Artifact)
* **Target Domain:** Class 8 Geometry & Algebra
* **Status:** ⚠️ Unstable / High Loss
* **Experiment Notes:** This was our initial attempt at modeling complex Class 8 geometry.
* **Known Issues:** The model struggled with high-frequency details (text and grid lines), resulting in higher `vae_loss` and unstable KL divergence.
* **Why we kept it:** Retained for comparative analysis, to show the difficulty jump between primary- and middle-school visual complexity.

## 🔬 Technical Context

Standard video VAEs are optimized for photorealism. Our experiments suggest that for **educational video synthesis**:

1. **Text Preservation:** Standard VAEs struggle to reconstruct the sharp text found in math explanations.
2. **Curriculum Learning:** Fine-tuning on simpler visual concepts (Class 5) before complex ones (Class 8) yields better convergence.

## 💻 Usage (PyTorch)

```python
import torch

# Load the Causal VAE checkpoint
checkpoint_path = "FineTune_2_checkpoint.pth"  # use the stable Class 5 checkpoint
state_dict = torch.load(checkpoint_path, map_location="cpu")

print(f"Loaded checkpoint: {checkpoint_path}")

# Note: loading these weights requires the matching Causal VAE
# architecture definition, which is not included in this repo, e.g.:
#   model = CausalVAE(...)  # class name is illustrative
#   model.load_state_dict(state_dict)
```