---
license: apache-2.0
tags:
- vae
- video-generation
- education
- fine-tuning
- pytorch
---

# 🎓 Causal VAE Fine-Tuning Experiments (Indian Math Curriculum)

**Developing the "Imagination Engine" for [Zulense](https://huggingface.co/zulense)**

This repository contains experimental checkpoints for a **Causal VAE (Variational Autoencoder)** fine-tuned specifically on Indian educational content (NCERT Math).

The goal of these experiments is to adapt standard video-generation VAEs to better reconstruct the "blackboard-style" line art, diagrams, and text-heavy footage typical of educational videos, which often suffer from artifacts in general-purpose models.

## 📂 Checkpoint Manifest

We are releasing two distinct checkpoints representing different stages of our training curriculum.

### 1. `FineTune_2_checkpoint.pth` (Recommended)
* **Target Domain:** Class 5 Numeracy & Foundation
* **Status:** ✅ Improved Stability
* **Experiment Notes:** This run focused on simpler, foundational concepts (the Class 5 curriculum) to stabilize the loss.
* **Improvements:** Significantly reduced `kl_divergence` and reconstruction loss compared to the V1 baseline.
* **Use Case:** Better at handling the basic shapes and slower temporal movements typical of primary-school teaching.

### 2. `checkpoint-0.pth` (Legacy / Research Artifact)
* **Target Domain:** Class 8 Geometry & Algebra
* **Status:** ⚠️ Unstable / High Loss
* **Experiment Notes:** This was our initial attempt at modeling complex Class 8 geometry.
* **Known Issues:** The model struggled with high-frequency details (text and grid lines), resulting in higher `vae_loss` and unstable KL divergence.
* **Why we kept it:** Retained for comparative analysis, to show the difficulty jump between primary- and middle-school visual complexity.

## 🔬 Technical Context

Standard video VAEs are optimized for photorealism. Our experiments suggest that for **educational video synthesis**:

1. **Text Preservation:** Standard VAEs struggle to reconstruct the sharp text found in math explanations.
2. **Curriculum Learning:** Fine-tuning on simpler visual concepts (Class 5) before complex ones (Class 8) yields better convergence.

## 💻 Usage (PyTorch)

```python
import torch

# Load the Causal VAE checkpoint
checkpoint_path = "FineTune_2_checkpoint.pth"  # use the stable Class 5 checkpoint
state_dict = torch.load(checkpoint_path, map_location="cpu")

print(f"Loaded checkpoint: {checkpoint_path}")

# Note: loading these weights requires the matching Causal VAE
# architecture definition, which is not included in this repo, e.g.:
#   model = CausalVAE(...)  # class name is illustrative
#   model.load_state_dict(state_dict)
```