Animesh-null/STATERA

@@ -31,11 +31,11 @@ We provide two primary versions of the model in the root directory.
 > *(Note: Your local inference script must still connect to PyTorch Hub to fetch Meta's underlying Python class definitions to build the architecture graph in memory before our modified 1.25 GB weights are loaded into it).*
 ### 1. `STATERA-50K-Crescent.pth`
-* **Training Constraint:** Trained with phase-aware spatial targets (a crescent shape matching the rotational angle, which then decays to a high-frequency point).
+* **Training Constraint:** Trained with phase-aware spatial targets (a crescent shape matching the rotational angle, which then decays to a high-frequency point), selected with the `--target_type crescent` flag.
 * **Behavior:** Achieves the highest physical disentanglement (**40.96% Physics Capture Ratio**) by actively hunting the offset mass and securing the highest overall **Unified HiddenMass Score (HMS)**. However, its strict precision makes it subject to Visual-Kinematic Aliasing (bimodal splits) during complex bounces. This is due to a "settling-state dataset bias" where the heavy face of the object naturally falls closer to the floor most of the time during simulated resting phases, occasionally causing the model to guess the gravitational bottom instead of the true mass mid-air.
 ### 2. `STATERA-50K-Sigma.pth`
-* **Training Constraint:** Trained with phase-agnostic Isotropic Gaussian targets (curriculum label smoothing) to cure the gravitational settling bias.
+* **Training Constraint:** Trained with phase-agnostic Isotropic Gaussian targets (using the `--target_type dot` flag alongside curriculum label smoothing) to cure the gravitational settling bias.
 * **Behavior:** Highly robust with exceptionally low spatial error (N-CoME) and Normalized Kinematic Jitter. However, spatial entropy analysis reveals this is driven by **"Expectation Collapse"** predicting high-entropy probability masses near the geometric center to minimize Euclidean risk and artificially safely score well on standard metrics. Its prediction heatmap is very diffuse, making it less physically accurate at capturing the true offset mass direction compared to the Crescent model.
 ---
@@ -54,7 +54,7 @@ In addition to the main models, we provide our baseline comparison models inside
 ## Architecture Details
 - **Base Model:** Meta V-JEPA 2.1 (ViT-Large [16-frame sequence compressed to T=8], initialized via PyTorch Hub: `vjepa2_1_vit_large_384`).
-- **Partial Fine-Tuning:** The final two transformer blocks of the V-JEPA backbone were unfrozen during training (utilizing gradient accumulation) to adapt the visual latent space to Newtonian mechanics.
+- **Partial Fine-Tuning:** The final two transformer blocks of the V-JEPA backbone were unfrozen during training (utilizing `--finetune_blocks 2` and gradient accumulation) to adapt the visual latent space to Newtonian mechanics.
 - **Time Processing:** Uses a 1D Convolution to mix the temporal tubelets together in the high-dimensional latent space, allowing the model to smoothly preserve momentum.
 - **Output:** A ~2.5M parameter Spatial Preservation Decoder outputs a smooth, continuous 2D heatmap (extracted via a Temperature-Scaled Soft-Argmax) guessing where the hidden mass is, alongside a 1D Z-Depth estimator to act as a perspective-invariant geometric regularizer.
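The Sigma model's `--target_type dot` option trains against isotropic Gaussian targets with curriculum label smoothing. The following is a minimal sketch of such a target. It assumes pixel-unit heatmaps normalized to sum to 1 and a hypothetical linear schedule that shrinks sigma from a wide (smoothed) target to a sharp one. `gaussian_target`, `curriculum_sigma`, and the default sigma values are illustrative and are not taken from the STATERA training code. The phase-aware crescent target is not reproduced here because its exact shape is not specified.

```python
import math


def gaussian_target(h, w, cx, cy, sigma):
    """Isotropic Gaussian heatmap centred on (cx, cy), indexed as grid[y][x], summing to 1."""
    grid = [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2)) for x in range(w)]
        for y in range(h)
    ]
    total = sum(sum(row) for row in grid)
    return [[v / total for v in row] for row in grid]


def curriculum_sigma(step, total_steps, sigma_start=4.0, sigma_end=1.0):
    """Hypothetical linear curriculum: heavily smoothed targets early, sharp targets late."""
    frac = min(max(step / total_steps, 0.0), 1.0)
    return sigma_start + (sigma_end - sigma_start) * frac
```

Early in training, `gaussian_target(16, 16, 5.0, 9.0, curriculum_sigma(0, 1000))` spreads probability widely around the mass location. By the final step the same call yields a much sharper peak at `grid[9][5]`, which is the smoothing-to-precision progression the curriculum is meant to provide.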
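The partial fine-tuning setup (last two backbone blocks trainable, gradient accumulation) can be sketched in PyTorch as shown here. This assumes the backbone exposes its transformer layers as an `nn.ModuleList` named `blocks`, as timm-style ViTs do. That layout has not been verified against the hub `vjepa2_1_vit_large_384` model. A hypothetical `ToyViT` stand-in is used so the example runs without downloading weights. `freeze_all_but_last_blocks` and `train_steps` are illustrative names, not the repository's `--finetune_blocks` implementation.

```python
import torch
import torch.nn as nn


def freeze_all_but_last_blocks(backbone: nn.Module, k: int) -> None:
    """Freeze every backbone parameter, then re-enable gradients for the last k blocks.

    Assumes (hypothetically) the backbone has an nn.ModuleList attribute named `blocks`.
    """
    for p in backbone.parameters():
        p.requires_grad = False
    if k <= 0:  # guard: blocks[-0:] would select every block
        return
    for block in backbone.blocks[-k:]:
        for p in block.parameters():
            p.requires_grad = True


class ToyViT(nn.Module):
    """Hypothetical stand-in backbone: a stack of linear 'blocks'."""

    def __init__(self, depth: int = 6, dim: int = 32):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for blk in self.blocks:
            x = blk(x)
        return x


def train_steps(model: nn.Module, batches, accum_steps: int = 4, lr: float = 1e-4) -> None:
    """Gradient accumulation: one optimizer step per `accum_steps` micro-batches.

    Gradients from a trailing partial group are discarded in this sketch.
    """
    opt = torch.optim.AdamW([p for p in model.parameters() if p.requires_grad], lr=lr)
    opt.zero_grad()
    for i, (x, target) in enumerate(batches, start=1):
        # Divide so the accumulated gradient equals the mean over the group.
        loss = nn.functional.mse_loss(model(x), target) / accum_steps
        loss.backward()
        if i % accum_steps == 0:
            opt.step()
            opt.zero_grad()
```

Handing the optimizer only the `requires_grad` parameters keeps weight decay and optimizer state off the frozen blocks. Accumulation gives a larger effective batch without holding all activations in memory at once.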
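The 1D temporal convolution over tubelet tokens might look like the sketch below. It assumes backbone features shaped `(B, T, N, D)` (batch, T=8 temporal tokens, spatial patches, channels) and convolves along T independently for each spatial patch. The `TemporalMixer` name, the kernel size of 3, and the residual connection are assumptions, not details confirmed by the README.

```python
import torch
import torch.nn as nn


class TemporalMixer(nn.Module):
    """Hypothetical temporal mixer: Conv1d over the T axis, applied per spatial patch."""

    def __init__(self, dim: int, kernel_size: int = 3):
        super().__init__()
        # Odd kernel with padding k // 2 keeps the sequence length T unchanged.
        self.conv = nn.Conv1d(dim, dim, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, T, N, D) -> fold patches into the batch so Conv1d sees (B*N, D, T).
        b, t, n, d = x.shape
        y = x.permute(0, 2, 3, 1).reshape(b * n, d, t)
        y = self.conv(y)
        # Restore (B, T, N, D) and add a residual so backbone features pass through.
        y = y.reshape(b, n, d, t).permute(0, 3, 1, 2)
        return x + y
```

Folding the spatial patches into the batch dimension means each patch's time series is mixed with its own neighbours in time. Spatial patches are never mixed with each other here.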
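A temperature-scaled soft-argmax turns heatmap logits into a differentiable (x, y) estimate by taking the expected pixel coordinate under a softmax. The stdlib-only sketch below also computes the heatmap's Shannon entropy (in nats). A measure of this kind exposes the "Expectation Collapse" described for the Sigma model: as the distribution flattens, the expected coordinate drifts toward the grid center wherever the peak actually is. The function names and the choice of Shannon entropy are assumptions for illustration, not the repository's implementation.

```python
import math


def softmax2d(logits, temperature):
    """Temperature-scaled softmax over all H*W cells (lower temperature gives a sharper map)."""
    flat = [v / temperature for row in logits for v in row]
    m = max(flat)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in flat]
    z = sum(exps)
    w = len(logits[0])
    probs = [e / z for e in exps]
    return [probs[r * w:(r + 1) * w] for r in range(len(logits))]


def soft_argmax(logits, temperature=1.0):
    """Expected (x, y) pixel coordinate under the softmax, plus the probability map."""
    probs = softmax2d(logits, temperature)
    ex = sum(p * col for row in probs for col, p in enumerate(row))
    ey = sum(p * r for r, row in enumerate(probs) for p in row)
    return ex, ey, probs


def spatial_entropy(probs):
    """Shannon entropy in nats; large values indicate a diffuse, center-hedging heatmap."""
    return -sum(p * math.log(p) for row in probs for p in row if p > 0.0)
```

Consider a 9×9 grid of zero logits with a value of 5.0 at column 7, row 2. `soft_argmax(logits, 0.1)` returns approximately (7, 2) with near-zero entropy. `soft_argmax(logits, 10.0)` returns a point close to the center (4, 4), with entropy approaching the uniform maximum of ln 81.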