---
license: mit
language:
- en
---
|
|
# Physics Foundation Vision Transformer (PhysicsViT-ExtendedVersion) |
|
|
|
|
|
A Vision Transformer model trained on multi-physics simulation data for scientific computing applications. This model is specifically designed for understanding and analyzing physics simulations across multiple domains. |
|
|
|
|
|
**Model Version:** Extended Version - Trained for 195,930 steps |
|
|
|
|
|
## Model Details |
|
|
|
|
|
### Model Description |
|
|
|
|
|
- **Developed by:** PhysicsAlchemists Research Team |
|
|
- **Model type:** Vision Transformer (ViT-Huge) |
|
|
- **License:** MIT
|
|
- **Finetuned from model:** None; trained from scratch on physics simulation data
|
|
- **Training Steps:** 195,930 steps |
|
|
|
|
|
### Model Architecture |
|
|
|
|
|
- **Architecture:** ViT-Huge (Feature Extraction) |
|
|
- **Hidden size:** 1280 |
|
|
- **Number of layers:** 32 |
|
|
- **Number of attention heads:** 16 |
|
|
- **Intermediate size:** 5120 |
|
|
- **Image size:** 224×224 |
|
|
- **Patch size:** 16×16 |
|
|
- **Embedding dimension:** 1280 |
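These numbers fix the model's token layout: a 224×224 image split into 16×16 patches yields a 14×14 grid of patches, plus one CLS token. A quick check:

```python
image_size, patch_size = 224, 16

# 224 / 16 = 14 patches per side
num_patches = (image_size // patch_size) ** 2
seq_len = num_patches + 1  # +1 for the CLS token

print(num_patches, seq_len)  # 196 197
```

This matches the embedding shapes in the usage example below: 196 patch tokens plus the CLS token, each 1280-dimensional.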
|
|
|
|
|
## Training Details |
|
|
|
|
|
### Training Data |
|
|
|
|
|
The model was trained on a comprehensive dataset of physics simulations including: |
|
|
|
|
|
- Acoustic scattering (inclusions, discontinuous, maze) |
|
|
- Active matter simulations |
|
|
- Euler equations (multi-quadrants with open/periodic BC) |
|
|
- Gray-Scott reaction-diffusion |
|
|
- Helmholtz staircase |
|
|
- Planetary shallow water equations |
|
|
- Rayleigh-Bénard convection (standard and uniform) |
|
|
- Shear flow dynamics |
|
|
- Turbulent radiative layer (2D) |
|
|
- Viscoelastic instability |
|
|
|
|
|
### Training Configuration |
|
|
|
|
|
- **Training regime:** 195,930 steps |
|
|
- **Batch size:** 1,470 |
|
|
- **Learning rate:** 0.0005 (with warmup and cosine decay) |
|
|
- **Optimizer:** Adam (β₁=0.9, β₂=0.999, weight_decay=0.0003) |
|
|
- **Mixed precision:** bfloat16 |
|
|
- **Hardware:** Cerebras CS-X systems |
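For a sense of scale, and assuming each step processes one full batch, the extended run sees roughly batch size × steps samples:

```python
batch_size = 1_470
steps = 195_930

# Total samples processed over the extended training run
total_samples = batch_size * steps
print(total_samples)  # 288017100 (~288M samples)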
|
|
|
|
|
### Data Augmentation |
|
|
|
|
|
- Random colormap application (viridis, plasma, inferno, coolwarm) |
|
|
- Grayscale conversion (30% probability) |
|
|
- Temporal trajectory preservation during training |
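The exact augmentation pipeline is not published here, but the colormap step can be sketched as mapping a normalized 2D scalar field through one of the listed colormaps (this uses `matplotlib`, which is an extra dependency beyond the install list below):

```python
import numpy as np
from matplotlib import colormaps

# Hypothetical sketch: render a normalized scalar field with viridis,
# one of the colormaps listed above.
rng = np.random.default_rng(0)
field = rng.random((224, 224))            # scalar field in [0, 1]

rgba = colormaps["viridis"](field)        # shape (224, 224, 4), floats in [0, 1]
rgb = (rgba[..., :3] * 255).astype(np.uint8)  # drop alpha, convert to uint8
print(rgb.shape)  # (224, 224, 3)
```

The grayscale branch would then convert `rgb` to a single channel with 30% probability before feeding the image to the model.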
|
|
|
|
|
## Usage |
|
|
|
|
|
⚠️ **Important:** This model requires specific preprocessing that differs from standard ViT models. |
|
|
|
|
|
### Basic Usage |
|
|
|
|
|
```python
from transformers import AutoModel, AutoImageProcessor
from torchvision import transforms
from PIL import Image
import torch

# Load model and processor
model = AutoModel.from_pretrained("JessicaE/physics-vit-full")
processor = AutoImageProcessor.from_pretrained("JessicaE/physics-vit-full")

# Load your physics image
image = Image.open("physics_simulation.png").convert("RGB")

# Apply custom preprocessing (expand_to_square is defined below)
image = expand_to_square(image, background_color=(128, 128, 128))
image = image.resize((224, 224), Image.BILINEAR)

# Convert to tensor and add batch dimension
tensor = transforms.ToTensor()(image).unsqueeze(0)

# Extract physics-aware embeddings
with torch.no_grad():
    outputs = model(pixel_values=tensor)

# CLS token embedding (best for classification tasks)
cls_embedding = outputs.last_hidden_state[:, 0, :]  # Shape: [1, 1280]

# Average-pooled embedding (good for trajectory prediction)
pooled_embedding = outputs.last_hidden_state.mean(dim=1)  # Shape: [1, 1280]

# Patch embeddings (for spatial analysis)
patch_embeddings = outputs.last_hidden_state[:, 1:, :]  # Shape: [1, 196, 1280]

print(f"CLS embedding shape: {cls_embedding.shape}")
```
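For spatial analysis, the 196 patch embeddings can be reshaped back into their 14×14 grid layout. A sketch with a random stand-in tensor in place of real model output:

```python
import torch

# Stand-in for outputs.last_hidden_state[:, 1:, :] from the model
patch_embeddings = torch.randn(1, 196, 1280)

# 224 / 16 = 14 patches per side: recover the spatial layout
grid = patch_embeddings.reshape(1, 14, 14, 1280)
print(grid.shape)  # torch.Size([1, 14, 14, 1280])
```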
|
|
|
|
|
### Required Preprocessing Function |
|
|
|
|
|
```python
from PIL import Image

def expand_to_square(pil_img, background_color):
    """
    Pad image to square with background color, keeping image centered.

    REQUIRED for Physics ViT - this preprocessing was used during training.
    """
    background_color = tuple(background_color)
    width, height = pil_img.size
    if width == height:
        return pil_img
    elif width > height:
        result = Image.new(pil_img.mode, (width, width), background_color)
        result.paste(pil_img, (0, (width - height) // 2))
        return result
    else:
        result = Image.new(pil_img.mode, (height, height), background_color)
        result.paste(pil_img, ((height - width) // 2, 0))
        return result
```
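A standalone sanity check of the padding behavior, using a synthetic image (the function is repeated here so the snippet runs on its own):

```python
from PIL import Image

def expand_to_square(pil_img, background_color):
    # Same logic as the function above, repeated for a self-contained demo
    background_color = tuple(background_color)
    width, height = pil_img.size
    if width == height:
        return pil_img
    elif width > height:
        result = Image.new(pil_img.mode, (width, width), background_color)
        result.paste(pil_img, (0, (width - height) // 2))
        return result
    else:
        result = Image.new(pil_img.mode, (height, height), background_color)
        result.paste(pil_img, ((height - width) // 2, 0))
        return result

# A 300x200 test image becomes 300x300, with gray bands top and bottom
img = Image.new("RGB", (300, 200), (255, 0, 0))
padded = expand_to_square(img, background_color=(128, 128, 128))
print(padded.size)              # (300, 300)
print(padded.getpixel((0, 0)))  # (128, 128, 128): padding band
```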
|
|
|
|
|
### Downstream Tasks |
|
|
|
|
|
This model produces rich 1280-dimensional embeddings optimized for: |
|
|
|
|
|
- **Physics Domain Classification:** Use CLS token embeddings |
|
|
- **Temporal Forecasting:** Use pooled embeddings for trajectory prediction |
|
|
- **Clustering & Similarity:** Use CLS or pooled embeddings |
|
|
- **Spatial Analysis:** Use patch embeddings |
|
|
- **Transfer Learning:** Fine-tune embeddings for new physics domains |
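For the clustering and similarity use case, a similarity search over a library of precomputed embeddings reduces to a dot product on normalized vectors. A sketch with a hypothetical random `embeddings` tensor standing in for real CLS outputs:

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-in: in practice these would be CLS embeddings
# produced by the model for a library of simulation snapshots.
torch.manual_seed(0)
embeddings = F.normalize(torch.randn(100, 1280), dim=1)  # [N, 1280], unit norm
query = embeddings[0]

# Cosine similarity is a dot product once vectors are normalized
scores = embeddings @ query              # [N]
top5 = torch.topk(scores, k=5).indices   # indices of the 5 nearest snapshots
print(top5[0].item())  # 0 - the query is most similar to itself
```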
|
|
|
|
|
## Performance |
|
|
|
|
|
The model has been evaluated against DINO v2 and CLIP on physics-specific tasks: |
|
|
|
|
|
- **Classification:** Superior performance on physics domain classification |
|
|
- **Temporal Forecasting:** Better prediction of physics evolution |
|
|
- **Clustering:** Clearer separation of physics simulation types |
|
|
- **Transfer Learning:** Robust features for new physics applications |
|
|
|
|
|
*Detailed benchmarks available in the original research.* |
|
|
|
|
|
## Model Versions |
|
|
|
|
|
- **Standard Version:** 78,372 training steps; good balance of performance and training efficiency

- **Extended Version:** 195,930 training steps; maximum performance at the cost of longer training
|
|
|
|
|
## Installation |
|
|
|
|
|
```bash |
|
|
pip install transformers torch torchvision pillow |
|
|
``` |
|
|
|
|
|
## Limitations |
|
|
|
|
|
- **Domain Specific:** Optimized for physics simulations, may not generalize to natural images |
|
|
- **Preprocessing Required:** Must use expand_to_square preprocessing for correct results |
|
|
- **Resolution:** Optimized for 224×224 input images |
|
|
- **Physics Domains:** Trained on specific simulation types listed above |
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex
@misc{physics-vit-2025,
  title={PhySiViT: A Physics Simulation Vision Transformer},
  author={Ezemba, Jessica and Afful, James and Wang, Mei-Yu},
  year={2025},
  howpublished={HuggingFace Model Hub},
  url={https://huggingface.co/JessicaE/physics-vit-full}
}
```
|
|
|
|
|
## Acknowledgments |
|
|
|
|
|
- Built using [Cerebras ModelZoo](https://github.com/Cerebras/modelzoo) |
|
|
- Trained on Cerebras CS-X systems and Bridges-2 GPUs (Pittsburgh Supercomputing Center) |
|
|
- Based on Vision Transformer architecture |
|
|
- This work was made possible by the ByteBoost cybertraining program, funded by National Science Foundation Cybertraining awards 2320990, 2320991, and 2320992, and by the Neocortex project, the ACES platform, and the Ookami cluster.
|
|
- The Neocortex project is supported by National Science Foundation award number 2005597. |
|
|
- The ACES (Accelerating Computing for Emerging Sciences) platform was funded by National Science Foundation award number 2112356. |
|
|
- The Ookami cluster is supported by National Science Foundation award number 1927880. |