---
license: mit
language:
- en
---

# Physics Foundation Vision Transformer (PhysicsViT-ExtendedVersion)

A Vision Transformer model trained on multi-physics simulation data for scientific computing applications. This model is specifically designed for understanding and analyzing physics simulations across multiple domains.

**Model Version:** Extended Version, trained for 195,930 steps

## Model Details

### Model Description

- **Developed by:** PhysicsAlchemists Research Team
- **Model type:** Vision Transformer (ViT-Huge)
- **License:** MIT
- **Finetuned from model:** None (trained from scratch on physics simulation data)
- **Training Steps:** 195,930

### Model Architecture

- **Architecture:** ViT-Huge (feature extraction)
- **Hidden size:** 1280
- **Number of layers:** 32
- **Number of attention heads:** 16
- **Intermediate size:** 5120
- **Image size:** 224×224
- **Patch size:** 16×16
- **Embedding dimension:** 1280

## Training Details

### Training Data

The model was trained on a comprehensive dataset of physics simulations including:

- Acoustic scattering (inclusions, discontinuous, maze)
- Active matter simulations
- Euler equations (multi-quadrants with open/periodic BC)
- Gray-Scott reaction-diffusion
- Helmholtz staircase
- Planetary shallow water equations
- Rayleigh-Bénard convection (standard and uniform)
- Shear flow dynamics
- Turbulent radiative layer (2D)
- Viscoelastic instability

### Training Configuration

- **Training regime:** 195,930 steps
- **Batch size:** 1,470
- **Learning rate:** 0.0005 (with warmup and cosine decay)
- **Optimizer:** Adam (β₁=0.9, β₂=0.999, weight_decay=0.0003)
- **Mixed precision:** bfloat16
- **Hardware:** Cerebras CS-X systems

### Data Augmentation

- Random colormap application (viridis, plasma, inferno, coolwarm)
- Grayscale conversion (30% probability)
- Temporal trajectory preservation during training

## Usage

⚠️ **Important:** This model requires specific preprocessing that differs from standard ViT models; do not use a standard `AutoImageProcessor`.

### Basic Usage

```python
from transformers import AutoModel
from torchvision import transforms
from PIL import Image
import torch

# Load the model (feature extractor only; no classification head)
model = AutoModel.from_pretrained("JessicaE/physics-vit-full")
model.eval()

# Load your physics image
image = Image.open("physics_simulation.png").convert('RGB')

# Apply custom preprocessing (expand_to_square is defined in the next section)
image = expand_to_square(image, background_color=(128, 128, 128))
image = image.resize((224, 224), Image.BILINEAR)

# Convert to tensor and add batch dimension
tensor = transforms.ToTensor()(image).unsqueeze(0)

# Extract physics-aware embeddings
with torch.no_grad():
    outputs = model(pixel_values=tensor)

# CLS token embedding (best for classification tasks)
cls_embedding = outputs.last_hidden_state[:, 0, :]  # Shape: [1, 1280]

# Average pooled embedding (good for trajectory prediction)
pooled_embedding = outputs.last_hidden_state.mean(dim=1)  # Shape: [1, 1280]

# Patch embeddings (for spatial analysis)
patch_embeddings = outputs.last_hidden_state[:, 1:, :]  # Shape: [1, 196, 1280]

print(f"CLS embedding shape: {cls_embedding.shape}")
```

### Required Preprocessing Function

```python
from PIL import Image

def expand_to_square(pil_img, background_color):
    """
    Pad image to square with background color, keeping image centered.
    REQUIRED for Physics ViT - this preprocessing was used during training.
    """
    background_color = tuple(background_color)
    width, height = pil_img.size
    if width == height:
        return pil_img
    elif width > height:
        result = Image.new(pil_img.mode, (width, width), background_color)
        result.paste(pil_img, (0, (width - height) // 2))
        return result
    else:
        result = Image.new(pil_img.mode, (height, height), background_color)
        result.paste(pil_img, ((height - width) // 2, 0))
        return result
```

### Downstream Tasks

This model produces rich 1280-dimensional embeddings optimized for:

- **Physics Domain Classification:** Use CLS token embeddings
- **Temporal Forecasting:** Use pooled embeddings for trajectory prediction
- **Clustering & Similarity:** Use CLS or pooled embeddings (see the similarity sketch below)
- **Spatial Analysis:** Use patch embeddings
- **Transfer Learning:** Fine-tune embeddings for new physics domains (see the linear-probe sketch below)
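As one illustration of the clustering and similarity use case, the following is a minimal sketch that compares two simulation frames via cosine similarity of their CLS embeddings. It reuses the model and the `expand_to_square` function from the sections above; the `embed_frame` helper and the file names are illustrative, not part of the released code.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from torchvision import transforms
from transformers import AutoModel

model = AutoModel.from_pretrained("JessicaE/physics-vit-full")
model.eval()

def embed_frame(path):
    """Hypothetical helper: preprocess one frame and return its CLS embedding."""
    image = Image.open(path).convert('RGB')
    image = expand_to_square(image, background_color=(128, 128, 128))
    image = image.resize((224, 224), Image.BILINEAR)
    tensor = transforms.ToTensor()(image).unsqueeze(0)
    with torch.no_grad():
        outputs = model(pixel_values=tensor)
    return outputs.last_hidden_state[:, 0, :]  # CLS token, shape [1, 1280]

# Illustrative file names; replace with your own simulation frames
emb_a = embed_frame("rayleigh_benard_frame.png")
emb_b = embed_frame("shear_flow_frame.png")

# Cosine similarity near 1 suggests frames from similar physics regimes
similarity = F.cosine_similarity(emb_a, emb_b).item()
print(f"Cosine similarity: {similarity:.3f}")
```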
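For the classification and transfer-learning use cases, a common pattern is a linear probe: a single linear layer trained on frozen embeddings. The sketch below is generic PyTorch under that assumption; `embeddings`, `labels`, and `num_domains` are placeholders for your own data, not artifacts shipped with this model.

```python
import torch
import torch.nn as nn

# Placeholders: precomputed CLS embeddings [N, 1280] and integer domain labels [N]
embeddings = torch.randn(256, 1280)    # replace with real CLS embeddings
labels = torch.randint(0, 10, (256,))  # replace with real domain labels
num_domains = 10                       # placeholder: number of physics domains

# Linear probe: the ViT backbone stays frozen; only this layer is trained
probe = nn.Linear(1280, num_domains)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

for epoch in range(10):
    optimizer.zero_grad()
    logits = probe(embeddings)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    print(f"epoch {epoch}: loss {loss.item():.4f}")
```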
""" background_color = tuple(background_color) width, height = pil_img.size if width == height: return pil_img elif width > height: result = Image.new(pil_img.mode, (width, width), background_color) result.paste(pil_img, (0, (width - height) // 2)) return result else: result = Image.new(pil_img.mode, (height, height), background_color) result.paste(pil_img, ((height - width) // 2, 0)) return result ``` ### Downstream Tasks This model produces rich 1280-dimensional embeddings optimized for: - **Physics Domain Classification:** Use CLS token embeddings - **Temporal Forecasting:** Use pooled embeddings for trajectory prediction - **Clustering & Similarity:** Use CLS or pooled embeddings - **Spatial Analysis:** Use patch embeddings - **Transfer Learning:** Fine-tune embeddings for new physics domains ## Performance The model has been evaluated against DINO v2 and CLIP on physics-specific tasks: - **Classification:** Superior performance on physics domain classification - **Temporal Forecasting:** Better prediction of physics evolution - **Clustering:** Clearer separation of physics simulation types - **Transfer Learning:** Robust features for new physics applications *Detailed benchmarks available in the original research.* ## Model Versions - **Standard Version:** 78,372 training steps- Good balance of performance and training efficiency - **Extended Version:** 195,930 training steps- Maximum performance, longer training ## Installation ```bash pip install transformers torch torchvision pillow ``` ## Limitations - **Domain Specific:** Optimized for physics simulations, may not generalize to natural images - **Preprocessing Required:** Must use expand_to_square preprocessing for correct results - **Resolution:** Optimized for 224×224 input images - **Physics Domains:** Trained on specific simulation types listed above ## Citation ```bibtex @misc{physics-vit-2025, title={PhySiViT : A Physics Simulation Vision Transformer}, author={Jessica Ezemba, James Afful, Mei-Yu Wang}, year={2025}, howpublished={HuggingFace Model Hub}, url={https://huggingface.co/JessicaE/physics-vit-full} } ``` ## Acknowledgments - Built using [Cerebras ModelZoo](https://github.com/Cerebras/modelzoo) - Trained on Cerebras CS-X systems and Bridges-2 GPUs (Pittsburgh Supercomputing Center) - Based on Vision Transformer architecture - This work was made possible thanks to the ByteBoost cybertraining program which is funded by the National Science Foundation Cybertraining awards: 2320990, 2320991, and 2320992, and the Neocortex project, the ACES platform, and the Ookami cluster. - The Neocortex project is supported by National Science Foundation award number 2005597. - The ACES (Accelerating Computing for Emerging Sciences) platform was funded by National Science Foundation award number 2112356. - The Ookami cluster is supported by National Science Foundation award number 1927880.