---
license: mit
language:
- en
---
# Physics Foundation Vision Transformer (PhysicsViT-ExtendedVersion)

A Vision Transformer model trained on multi-physics simulation data for scientific computing applications. This model is specifically designed for understanding and analyzing physics simulations across multiple domains.

**Model Version:** Extended Version - Trained for 195,930 steps

## Model Details

### Model Description

- **Developed by:** PhysicsAlchemists Research Team
- **Model type:** Vision Transformer (ViT-Huge)
- **License:** MIT
- **Finetuned from model:** None (trained from scratch on physics simulation data)
- **Training Steps:** 195,930 steps

### Model Architecture

- **Architecture:** ViT-Huge (Feature Extraction)
- **Hidden size:** 1280
- **Number of layers:** 32
- **Number of attention heads:** 16
- **Intermediate size:** 5120
- **Image size:** 224×224
- **Patch size:** 16×16
- **Embedding dimension:** 1280
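
These dimensions determine the transformer's token sequence length. A quick sanity check (not part of the model code):

```python
# Token count implied by the architecture above: 224x224 input, 16x16 patches.
image_size = 224
patch_size = 16
patches_per_side = image_size // patch_size   # 14
num_patches = patches_per_side ** 2           # 196 patch tokens
seq_len = num_patches + 1                     # +1 for the CLS token -> 197
print(num_patches, seq_len)
```

This matches the `[1, 196, 1280]` patch-embedding shape shown in the usage example below.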

## Training Details

### Training Data

The model was trained on a comprehensive dataset of physics simulations including:

- Acoustic scattering (inclusions, discontinuous, maze)
- Active matter simulations  
- Euler equations (multi-quadrants with open/periodic BC)
- Gray-Scott reaction-diffusion
- Helmholtz staircase
- Planetary shallow water equations
- Rayleigh-Bénard convection (standard and uniform)
- Shear flow dynamics
- Turbulent radiative layer (2D)
- Viscoelastic instability

### Training Configuration

- **Training regime:** 195,930 steps
- **Batch size:** 1,470
- **Learning rate:** 0.0005 (with warmup and cosine decay)
- **Optimizer:** Adam (β₁=0.9, β₂=0.999, weight_decay=0.0003)
- **Mixed precision:** bfloat16
- **Hardware:** Cerebras CS-X systems
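
A warmup-then-cosine schedule like the one described can be sketched as follows. The warmup length here is an illustrative assumption; the card only states that warmup and cosine decay were used:

```python
import math

def lr_at_step(step, total_steps=195_930, warmup_steps=10_000, peak_lr=5e-4):
    """Linear warmup to peak_lr, then cosine decay to zero.

    warmup_steps is an assumed value for illustration only.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```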

### Data Augmentation

- Random colormap application (viridis, plasma, inferno, coolwarm)
- Grayscale conversion (30% probability)
- Temporal trajectory preservation during training
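
The colormap augmentation can be sketched as below. The actual augmentation code is not published; this only illustrates the idea of mapping a scalar simulation field through a randomly chosen colormap, or converting it to grayscale with 30% probability:

```python
import random
import numpy as np
from matplotlib import colormaps

def colorize_field(field, cmap_names=("viridis", "plasma", "inferno", "coolwarm"),
                   grayscale_prob=0.3, rng=random):
    """Normalize a 2-D scalar field to [0, 1], then either convert it to
    3-channel grayscale (with probability grayscale_prob) or map it through
    a randomly chosen colormap. Returns an RGB uint8 array."""
    lo, hi = float(field.min()), float(field.max())
    norm = (field - lo) / (hi - lo + 1e-12)
    if rng.random() < grayscale_prob:
        rgb = np.stack([norm] * 3, axis=-1)                     # grayscale
    else:
        rgb = colormaps[rng.choice(cmap_names)](norm)[..., :3]  # drop alpha
    return (rgb * 255).astype(np.uint8)
```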

## Usage

⚠️ **Important:** This model requires specific preprocessing that differs from standard ViT models.

### Basic Usage

```python
from transformers import AutoModel
from PIL import Image
import torch

# Load the model (its standard image processor is not used: this model
# requires the custom preprocessing shown below)
model = AutoModel.from_pretrained("JessicaE/physics-vit-full")

# Load your physics image
image = Image.open("physics_simulation.png").convert('RGB')

# Apply the custom preprocessing (expand_to_square is defined below)
image = expand_to_square(image, background_color=(128, 128, 128))
image = image.resize((224, 224), Image.BILINEAR)

# Convert to tensor and add batch dimension
from torchvision import transforms
tensor = transforms.ToTensor()(image).unsqueeze(0)

# Extract physics-aware embeddings
with torch.no_grad():
    outputs = model(pixel_values=tensor)
    
    # CLS token embedding (best for classification tasks)
    cls_embedding = outputs.last_hidden_state[:, 0, :]  # Shape: [1, 1280]
    
    # Average pooled embedding (good for trajectory prediction)  
    pooled_embedding = outputs.last_hidden_state.mean(dim=1)  # Shape: [1, 1280]
    
    # Patch embeddings (for spatial analysis)
    patch_embeddings = outputs.last_hidden_state[:, 1:, :]  # Shape: [1, 196, 1280]

print(f"CLS embedding shape: {cls_embedding.shape}")
```
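
Embeddings extracted as above can be compared directly, for example with cosine similarity for retrieval or clustering. This sketch uses random tensors as stand-ins for real CLS embeddings:

```python
import torch
import torch.nn.functional as F

# Stand-ins for two CLS embeddings extracted as shown above (shape [1, 1280]).
emb_a = torch.randn(1, 1280)
emb_b = torch.randn(1, 1280)

# Cosine similarity is a common way to compare simulation frames or domains.
similarity = F.cosine_similarity(emb_a, emb_b, dim=-1)
print(similarity.item())  # a value in [-1, 1]
```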

### Required Preprocessing Function

```python
from PIL import Image

def expand_to_square(pil_img, background_color):
    """
    Pad image to square with background color, keeping image centered.
    
    REQUIRED for Physics ViT - this preprocessing was used during training.
    """
    background_color = tuple(background_color)
    width, height = pil_img.size
    if width == height:
        return pil_img
    elif width > height:
        result = Image.new(pil_img.mode, (width, width), background_color)
        result.paste(pil_img, (0, (width - height) // 2))
        return result
    else:
        result = Image.new(pil_img.mode, (height, height), background_color)
        result.paste(pil_img, ((height - width) // 2, 0))
        return result
```

### Downstream Tasks

This model produces rich 1280-dimensional embeddings optimized for:

- **Physics Domain Classification:** Use CLS token embeddings
- **Temporal Forecasting:** Use pooled embeddings for trajectory prediction
- **Clustering & Similarity:** Use CLS or pooled embeddings
- **Spatial Analysis:** Use patch embeddings
- **Transfer Learning:** Fine-tune embeddings for new physics domains
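
For domain classification, a minimal linear probe over frozen CLS embeddings is a common recipe. The class count below is illustrative, not part of the released model:

```python
import torch
import torch.nn as nn

class PhysicsLinearProbe(nn.Module):
    """Frozen-backbone classifier: a single linear layer over CLS embeddings."""
    def __init__(self, embed_dim=1280, num_classes=11):  # num_classes is illustrative
        super().__init__()
        self.head = nn.Linear(embed_dim, num_classes)

    def forward(self, cls_embedding):       # [batch, 1280]
        return self.head(cls_embedding)     # [batch, num_classes]

probe = PhysicsLinearProbe()
logits = probe(torch.randn(4, 1280))
print(logits.shape)  # torch.Size([4, 11])
```

Only the linear head is trained; the ViT backbone stays frozen, which keeps the probe cheap and makes it a fair measure of embedding quality.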

## Performance

The model has been evaluated against DINO v2 and CLIP on physics-specific tasks:

- **Classification:** Superior performance on physics domain classification
- **Temporal Forecasting:** Better prediction of physics evolution
- **Clustering:** Clearer separation of physics simulation types
- **Transfer Learning:** Robust features for new physics applications

*Detailed benchmarks available in the original research.*

## Model Versions

- **Standard Version:** 78,372 training steps: a good balance of performance and training efficiency
- **Extended Version:** 195,930 training steps: maximum performance at the cost of longer training

## Installation

```bash
pip install transformers torch torchvision pillow
```

## Limitations

- **Domain Specific:** Optimized for physics simulations, may not generalize to natural images
- **Preprocessing Required:** Must use expand_to_square preprocessing for correct results
- **Resolution:** Optimized for 224×224 input images
- **Physics Domains:** Trained on specific simulation types listed above

## Citation

```bibtex
@misc{physics-vit-2025,
  title={PhySiViT: A Physics Simulation Vision Transformer},
  author={Ezemba, Jessica and Afful, James and Wang, Mei-Yu},
  year={2025},
  howpublished={HuggingFace Model Hub},
  url={https://huggingface.co/JessicaE/physics-vit-full}
}
```

## Acknowledgments

- Built using [Cerebras ModelZoo](https://github.com/Cerebras/modelzoo)
- Trained on Cerebras CS-X systems and Bridges-2 GPUs (Pittsburgh Supercomputing Center)
- Based on Vision Transformer architecture
- This work was made possible by the ByteBoost cybertraining program, funded by National Science Foundation Cybertraining awards 2320990, 2320991, and 2320992, and by the Neocortex project, the ACES platform, and the Ookami cluster.
- The Neocortex project is supported by National Science Foundation award number 2005597.
- The ACES (Accelerating Computing for Emerging Sciences) platform was funded by National Science Foundation award number 2112356.
- The Ookami cluster is supported by National Science Foundation award number 1927880.