---
license: mit
language:
- en
---
# Physics Foundation Vision Transformer (PhysicsViT-ExtendedVersion)
A Vision Transformer model trained on multi-physics simulation data for scientific computing applications. This model is specifically designed for understanding and analyzing physics simulations across multiple domains.
**Model Version:** Extended Version - Trained for 195,930 steps
## Model Details
### Model Description
- **Developed by:** PhysicsAlchemists Research Team
- **Model type:** Vision Transformer (ViT-Huge)
- **License:** MIT License
- **Finetuned from model:** None (trained from scratch on physics simulation data)
- **Training Steps:** 195,930 steps
### Model Architecture
- **Architecture:** ViT-Huge (Feature Extraction)
- **Hidden size:** 1280
- **Number of layers:** 32
- **Number of attention heads:** 16
- **Intermediate size:** 5120
- **Image size:** 224×224
- **Patch size:** 16×16
- **Embedding dimension:** 1280
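As a quick sanity check on the figures above, the token count and per-head dimension follow from the image size, patch size, hidden size, and head count. The helper name `vit_sequence_length` is illustrative and not part of the model's code. It assumes a single CLS token is prepended to the patch tokens, which matches the `[1, 196, 1280]` patch-embedding shape shown under Usage.

```python
def vit_sequence_length(image_size: int = 224, patch_size: int = 16) -> tuple[int, int]:
    """Return (num_patches, sequence_length) for a square ViT input.

    Assumes one CLS token is prepended to the patch tokens.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    num_patches = patches_per_side ** 2
    return num_patches, num_patches + 1  # +1 for the CLS token


num_patches, seq_len = vit_sequence_length()
head_dim = 1280 // 16  # hidden size divided by number of attention heads
print(num_patches, seq_len, head_dim)  # 196 197 80
```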
## Training Details
### Training Data
The model was trained on a comprehensive dataset of physics simulations including:
- Acoustic scattering (inclusions, discontinuous, maze)
- Active matter simulations
- Euler equations (multi-quadrants with open/periodic BC)
- Gray-Scott reaction-diffusion
- Helmholtz staircase
- Planetary shallow water equations
- Rayleigh-Bénard convection (standard and uniform)
- Shear flow dynamics
- Turbulent radiative layer (2D)
- Viscoelastic instability
### Training Configuration
- **Training regime:** 195,930 steps
- **Batch size:** 1,470
- **Learning rate:** 0.0005 (with warmup and cosine decay)
- **Optimizer:** Adam (β₁=0.9, β₂=0.999, weight_decay=0.0003)
- **Mixed precision:** bfloat16
- **Hardware:** Cerebras CS-X systems
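The schedule described above (warmup followed by cosine decay) can be sketched as follows. Only the peak learning rate (0.0005) and the total step count (195,930) come from this card. The warmup length, the linear warmup shape, and the final learning rate of zero are assumptions chosen for illustration, and `lr_at_step` is a hypothetical helper, not the training code.

```python
import math


def lr_at_step(step, peak_lr=5e-4, total_steps=195_930,
               warmup_steps=2_000, min_lr=0.0):
    """Learning rate at a given step: linear warmup, then cosine decay.

    warmup_steps and min_lr are illustrative assumptions; the model card
    does not state them.
    """
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr
        return peak_lr * step / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1]
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


for s in (0, 1_000, 2_000, 98_965, 195_930):
    print(s, lr_at_step(s))
```

Under these assumptions the rate peaks at the end of warmup and reaches half of the peak midway through the decay phase.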
### Data Augmentation
- Random colormap application (viridis, plasma, inferno, coolwarm)
- Grayscale conversion (30% probability)
- Temporal trajectory preservation during training
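One possible reading of the list above is that a single augmentation choice is sampled per trajectory and shared by all of its frames, so that consecutive frames stay visually consistent. The sketch below illustrates that reading only. The function names are hypothetical, and treating grayscale and colormap as mutually exclusive is an assumption that the card does not confirm. It samples the augmentation decision and leaves the actual image rendering out.

```python
import random

COLORMAPS = ("viridis", "plasma", "inferno", "coolwarm")
GRAYSCALE_PROB = 0.3  # 30% probability, as listed above


def sample_trajectory_augmentation(rng):
    """Pick one augmentation setting for a whole trajectory (assumed policy)."""
    if rng.random() < GRAYSCALE_PROB:
        return {"grayscale": True, "colormap": None}
    return {"grayscale": False, "colormap": rng.choice(COLORMAPS)}


def augment_plan(num_frames, rng):
    """Repeat the same setting for every frame of the trajectory."""
    choice = sample_trajectory_augmentation(rng)
    return [choice] * num_frames


rng = random.Random(0)  # seeded for reproducibility
plan = augment_plan(4, rng)
print(plan[0])
```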
## Usage
⚠️ **Important:** This model requires specific preprocessing that differs from standard ViT models.
### Basic Usage
```python
from transformers import AutoModel, AutoImageProcessor
from torchvision import transforms
from PIL import Image
import torch

# Load model and processor
model = AutoModel.from_pretrained("JessicaE/physics-vit-full")
processor = AutoImageProcessor.from_pretrained("JessicaE/physics-vit-full")

# Load your physics image
image = Image.open("physics_simulation.png").convert('RGB')

# Apply custom preprocessing
# (expand_to_square is defined under "Required Preprocessing Function" below)
image = expand_to_square(image, background_color=(128, 128, 128))
image = image.resize((224, 224), Image.BILINEAR)

# Convert to tensor and add batch dimension
tensor = transforms.ToTensor()(image).unsqueeze(0)

# Extract physics-aware embeddings
with torch.no_grad():
    outputs = model(pixel_values=tensor)

# CLS token embedding (best for classification tasks)
cls_embedding = outputs.last_hidden_state[:, 0, :]  # Shape: [1, 1280]

# Average pooled embedding (good for trajectory prediction)
pooled_embedding = outputs.last_hidden_state.mean(dim=1)  # Shape: [1, 1280]

# Patch embeddings (for spatial analysis)
patch_embeddings = outputs.last_hidden_state[:, 1:, :]  # Shape: [1, 196, 1280]

print(f"CLS embedding shape: {cls_embedding.shape}")
```
### Required Preprocessing Function
```python
from PIL import Image


def expand_to_square(pil_img, background_color):
    """
    Pad image to square with background color, keeping image centered.
    REQUIRED for Physics ViT - this preprocessing was used during training.
    """
    background_color = tuple(background_color)
    width, height = pil_img.size
    if width == height:
        return pil_img
    elif width > height:
        result = Image.new(pil_img.mode, (width, width), background_color)
        result.paste(pil_img, (0, (width - height) // 2))
        return result
    else:
        result = Image.new(pil_img.mode, (height, height), background_color)
        result.paste(pil_img, ((height - width) // 2, 0))
        return result
```
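To make the padding arithmetic concrete, the sketch below computes the square side and the paste offset that `expand_to_square` would use for a given image size, without needing an image. The helper `square_padding` is illustrative only and is not part of the model's code.

```python
def square_padding(width: int, height: int) -> tuple[int, tuple[int, int]]:
    """Return (side, (x_offset, y_offset)) for centering an image on a square canvas.

    Mirrors the arithmetic in expand_to_square: the canvas side is the longer
    edge, and the shorter edge is centered using integer division.
    """
    side = max(width, height)
    offset = ((side - width) // 2, (side - height) // 2)
    return side, offset


print(square_padding(512, 256))  # (512, (0, 128))
print(square_padding(300, 401))  # (401, (50, 0))
print(square_padding(224, 224))  # (224, (0, 0))
```

When the difference between the two edges is odd, as in the 300×401 case, integer division leaves one extra pixel of padding on the right or bottom side.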
### Downstream Tasks
This model produces rich 1280-dimensional embeddings optimized for:
- **Physics Domain Classification:** Use CLS token embeddings
- **Temporal Forecasting:** Use pooled embeddings for trajectory prediction
- **Clustering & Similarity:** Use CLS or pooled embeddings
- **Spatial Analysis:** Use patch embeddings
- **Transfer Learning:** Fine-tune embeddings for new physics domains
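For the clustering and similarity use case, a minimal sketch of cosine-similarity retrieval over embeddings is shown below. It works on plain Python lists (for example, an embedding converted with `cls_embedding[0].tolist()`), and the toy 2-dimensional vectors stand in for real 1280-dimensional embeddings. The function names are illustrative, and cosine similarity is one reasonable choice of metric rather than something this card prescribes.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Return the index of the candidate most similar to the query."""
    scores = [cosine_similarity(query, c) for c in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy stand-ins for embeddings of three reference simulations
references = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
query = [1.0, 0.1]
print(most_similar(query, references))  # 1
```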
## Performance
The model has been evaluated against DINOv2 and CLIP on physics-specific tasks:
- **Classification:** Superior performance on physics domain classification
- **Temporal Forecasting:** Better prediction of physics evolution
- **Clustering:** Clearer separation of physics simulation types
- **Transfer Learning:** Robust features for new physics applications
*Detailed benchmarks available in the original research.*
## Model Versions
- **Standard Version:** 78,372 training steps. Good balance of performance and training efficiency.
- **Extended Version:** 195,930 training steps. Maximum performance, longer training.
## Installation
```bash
pip install transformers torch torchvision pillow
```
## Limitations
- **Domain Specific:** Optimized for physics simulations; may not generalize to natural images
- **Preprocessing Required:** Must use expand_to_square preprocessing for correct results
- **Resolution:** Optimized for 224×224 input images
- **Physics Domains:** Trained on specific simulation types listed above
## Citation
```bibtex
@misc{physics-vit-2025,
title={PhySiViT: A Physics Simulation Vision Transformer},
author={Jessica Ezemba and James Afful and Mei-Yu Wang},
year={2025},
howpublished={HuggingFace Model Hub},
url={https://huggingface.co/JessicaE/physics-vit-full}
}
```
## Acknowledgments
- Built using [Cerebras ModelZoo](https://github.com/Cerebras/modelzoo)
- Trained on Cerebras CS-X systems and Bridges-2 GPUs (Pittsburgh Supercomputing Center)
- Based on Vision Transformer architecture
- This work was made possible by the ByteBoost cybertraining program, which is funded by National Science Foundation Cybertraining awards 2320990, 2320991, and 2320992, together with the Neocortex project, the ACES platform, and the Ookami cluster.
- The Neocortex project is supported by National Science Foundation award number 2005597.
- The ACES (Accelerating Computing for Emerging Sciences) platform was funded by National Science Foundation award number 2112356.
- The Ookami cluster is supported by National Science Foundation award number 1927880.