System Architecture (Hardware Agnostic)
Deep technical reference for the SciMLx autonomous research loop components, optimized for both NVIDIA GPUs (PyTorch) and Apple Silicon (MLX).
3-Tier Scientific Implementation (SI) Layer
SciMLx utilizes a modular SI layer in core/ to decouple scientific logic from underlying hardware and compute frameworks. The layer is organized into three tiers:
Tier 1: Foundations
Core math and device abstractions that provide a stable, hardware-agnostic base.
- device.py (Hardware Agnostic Dispatch): Automatically detects the best available backend (CUDA, MLX, MPS, or CPU) and provides a single API for framework-agnostic tensor creation (to_array()) and device placement (to_device()).
- units.py (SciMLTensor): Ensures mathematical and physical groundedness by performing dimensional analysis on every operation using the pint unit registry.
- lie_math.py & heat_kernels.py (Geometric Math): Implement Lie algebra foundations and mesh-based heat kernel signatures for geometry-aware operators.
- oracle_constants.py (Buckingham Pi Theorem): Identifies dimensionless groups (like Reynolds or Péclet numbers) to aid in feature discovery and similarity analysis.
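The backend selection described for device.py can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the names probe_backends and pick_backend are hypothetical, and the priority order (CUDA, then MLX, then MPS, then CPU) is assumed from the order in which the backends are listed above.

```python
import importlib.util

# Assumed priority, taken from the order the backends are listed in the text.
PRIORITY = ("cuda", "mlx", "mps", "cpu")


def probe_backends() -> set[str]:
    """Return the backends that look usable on this machine (hypothetical helper)."""
    found = {"cpu"}  # CPU is always kept as the fallback
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            found.add("cuda")
        # Older PyTorch builds have no torch.backends.mps, so look it up defensively.
        mps = getattr(torch.backends, "mps", None)
        if mps is not None and mps.is_available():
            found.add("mps")
    # Only checks that the mlx package is installed, not that a GPU is present.
    if importlib.util.find_spec("mlx") is not None:
        found.add("mlx")
    return found


def pick_backend(available: set[str]) -> str:
    """Pick the highest-priority backend from the available set."""
    for name in PRIORITY:
        if name in available:
            return name
    raise RuntimeError("no usable backend found")


backend = pick_backend(probe_backends())
```

Separating the probe from the choice keeps the priority rule testable on machines that have none of the accelerators installed.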
Tier 2: Models
Production-grade neural operators and the infrastructure to build them.
- Dual-Backend Operators: All new models are implemented with both PyTorch (_torch.py) and MLX (_mlx.py) backends, managed by a central dispatcher (e.g., models/mff.py).
- scaffold.py (Automated Scaffolding): Generates dual-backend model stubs from research proposals, ensuring feature parity across hardware.
- losses.py (Physics-Informed Operators): Framework-agnostic implementations of Sobolev ($H^1$, $H^2$) and Spectral losses that penalize unphysical oscillations.
- spectral_governor.py (Frequency Governance): Dynamically monitors the Fourier spectrum of residuals across backends and adjusts loss weighting to ensure high-frequency features are captured.
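To illustrate why a Sobolev loss penalizes oscillations, here is a minimal $H^1$ sketch in plain Python. It assumes a uniform 1D grid and forward differences. The function name h1_loss is hypothetical, and the actual losses.py may compute derivatives differently (for example spectrally).

```python
def h1_loss(pred: list[float], target: list[float], dx: float = 1.0) -> float:
    """Discrete squared H^1 norm of the error on a uniform 1D grid."""
    err = [p - t for p, t in zip(pred, target)]
    # L^2 part: penalizes the pointwise error.
    l2_term = sum(e * e for e in err) * dx
    # H^1 seminorm part: penalizes the slope of the error (forward differences).
    slopes = [(err[i + 1] - err[i]) / dx for i in range(len(err) - 1)]
    grad_term = sum(s * s for s in slopes) * dx
    return l2_term + grad_term


target = [0.0, 0.0, 0.0, 0.0]
smooth = h1_loss([1.0, 1.0, 1.0, 1.0], target)    # constant offset
wiggly = h1_loss([1.0, -1.0, 1.0, -1.0], target)  # same L^2 error, but oscillating
```

Both predictions have the same pointwise error magnitude, yet the oscillating one scores higher because its error has a large gradient.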
Tier 3: Production
Systems for scaling, deploying, and automating the research cycle.
- deployment.py: Integration with Google Cloud Platform (Vertex AI, Compute Engine) for serverless GPU training.
- model_versioning.py: A centralized model registry and lineage tracking system for managing champion models.
- hpo.py (Bayesian Optimization): Automated hyperparameter search that adapts to hardware-specific constraints.
- dp_federated.py (Differential Privacy): Logic for secure, privacy-preserving federated training of scientific models.
- arxiv_agent.py (ASIL Pipeline): Orchestrates the Agentic Scientist Ideation Loop, from literature review to automated model scaffolding.
Two-Mode Operation
The system supports two operating modes that can be mixed within a session:
Mode A: Human-Guided
A human (or AI agent) reads RESEARCH_BRAIN.md, interprets results, edits experiments.yaml directly, and invokes autorun.py to execute the queue. The system handles execution, retry, and logging; the human handles strategy.
Mode B: Fully Autonomous (agent_loop.py)
agent_loop.py performs one full autonomous cycle:
- Calls tracker.analyze_lineage() to build per-benchmark summaries.
- Calls HypothesisEngine.analyze_benchmark() for each priority benchmark.
- Calls BayesianHPO.ask() to sample hyperparameters.
- Generates new ExperimentConfig entries and appends them to experiments.yaml.
- Triggers autorun.py to process the queue on NVIDIA GPUs.
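The cycle can be sketched with stand-in objects. Everything beyond the method names quoted above is an assumption made for illustration: the argument lists, return types, ExperimentConfig fields, benchmark names, and the stub classes are all hypothetical. Writing experiments.yaml and launching autorun.py are left out so the sketch stays self-contained.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentConfig:
    # Field names are illustrative; the real ExperimentConfig is not shown here.
    benchmark: str
    hypothesis: str
    params: dict = field(default_factory=dict)


class StubTracker:
    def analyze_lineage(self) -> dict:
        # Hypothetical per-benchmark summaries.
        return {"burgers": {"best_val_l2_rel": 0.12}, "darcy": {"best_val_l2_rel": 0.05}}


class StubHypothesisEngine:
    def analyze_benchmark(self, name: str, summary: dict) -> str:
        # Toy rule standing in for the real classification logic.
        return "spectral_bias" if summary["best_val_l2_rel"] > 0.1 else "capacity_limited"


class StubHPO:
    def ask(self) -> dict:
        # A real Bayesian optimizer would sample from a fitted surrogate.
        return {"hidden_dim": 64, "lr": 1e-3}


def run_cycle(tracker, engine, hpo, priority: list[str]) -> list[ExperimentConfig]:
    summaries = tracker.analyze_lineage()                       # step 1
    queue = []
    for name in priority:
        mode = engine.analyze_benchmark(name, summaries[name])  # step 2
        params = hpo.ask()                                      # step 3
        queue.append(ExperimentConfig(name, mode, params))      # step 4
    # Step 5 (append to experiments.yaml, trigger autorun.py) is omitted here.
    return queue


queue = run_cycle(StubTracker(), StubHypothesisEngine(), StubHPO(), ["burgers", "darcy"])
```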
Unified Trainer (core/trainer.py)
The trainer is designed to be high-performance while remaining flexible across backends:
Compute Optimizations
- NVIDIA/PyTorch: Utilizes torch.compile() for kernel fusion and torch.amp for mixed precision training.
- Apple/MLX: Leverages MLX's lazy evaluation and unified memory for efficient processing on M-series chips.
- Precision Management: Configurable precision levels (float32, bfloat16) mapped to hardware-specific best practices.
Training Logic
- EMA (Exponential Moving Average): Maintains a shadow copy of model weights for more stable evaluation.
- Dynamic Budget Extension: Automatically extends training time by 20% if the loss is still decreasing significantly at the end of the budget.
- Snapshot Ensembling: Optionally saves and averages multiple model states throughout the run.
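The EMA update and the budget extension rule can be sketched as below. This is a simplified illustration on plain lists, not the trainer's code: the function names, the default decay, and the min_rel_drop threshold (a stand-in for "decreasing significantly", which is not quantified above) are assumptions. Only the 20% extension comes from the text.

```python
def ema_update(shadow: list[float], params: list[float], decay: float = 0.999) -> list[float]:
    """Blend current weights into the shadow copy: s <- decay*s + (1-decay)*p."""
    return [decay * s + (1.0 - decay) * p for s, p in zip(shadow, params)]


def extended_budget(budget_s: float, losses: list[float], min_rel_drop: float = 0.01) -> float:
    """Return the time budget, extended by 20% if the loss is still falling.

    min_rel_drop is an assumed threshold; comparing only the last two
    recorded losses is a deliberately crude trend estimate.
    """
    if len(losses) < 2 or losses[-2] <= 0:
        return budget_s
    rel_drop = (losses[-2] - losses[-1]) / losses[-2]
    return budget_s * 1.2 if rel_drop > min_rel_drop else budget_s


shadow = ema_update([1.0], [2.0], decay=0.9)
still_improving = extended_budget(100.0, [1.0, 0.8])
plateaued = extended_budget(100.0, [1.0, 0.999])
```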
Hardware-Accelerated PDE Solvers (data/simulations/)
All PDE solvers are implemented using framework-native spectral methods to ensure high-speed simulation on the active device:
- Spectral Methods: Utilize fast Fourier transforms (torch.fft or mlx.core.fft) for high-speed spectral derivatives and integration.
- Zero-Copy Data: Solvers execute directly on the DEVICE, producing tensors that never leave high-speed device memory during training.
- Batch Processing: All simulations are vectorized to solve multiple initial conditions in parallel, maximizing device throughput.
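A spectral derivative works by transforming to Fourier space, multiplying each coefficient by $i k$, and transforming back. The sketch below shows that idea on a periodic 1D grid. To stay dependency-free it uses a naive $O(N^2)$ DFT from the standard library. An actual solver would call the framework FFT instead. The function names are hypothetical.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform (for illustration only, O(N^2))."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for j, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * math.pi * j * k / n)
        out.append(acc / n if inverse else acc)
    return out


def spectral_derivative(samples, length=2 * math.pi):
    """Differentiate periodic samples by multiplying Fourier modes by i*k."""
    n = len(samples)
    coeffs = dft(samples)
    scaled = []
    for k, c in enumerate(coeffs):
        if n % 2 == 0 and k == n // 2:
            freq = 0  # zero the Nyquist mode for an odd-order derivative
        elif k <= (n - 1) // 2:
            freq = k  # non-negative frequencies
        else:
            freq = k - n  # negative frequencies
        scaled.append(1j * (2 * math.pi * freq / length) * c)
    return [z.real for z in dft(scaled, inverse=True)]


n = 16
xs = [2 * math.pi * j / n for j in range(n)]
deriv = spectral_derivative([math.sin(x) for x in xs])
max_err = max(abs(d - math.cos(x)) for d, x in zip(deriv, xs))
```

For a smooth periodic input such as sin(x), the result matches cos(x) to near machine precision even on a coarse grid, which is the main appeal of spectral methods for PDE solvers.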
High-Throughput Data Pipeline
I/O stalls are minimized through:
- PDEDataset: An IterableDataset that interfaces with cached .npz files or on-the-fly solvers.
- DataLoader: Standard PyTorch implementation with:
  - pin_memory=True: For faster Host-to-Device transfer.
  - num_workers > 0: For multi-process data pre-fetching.
  - prefetch_factor: To keep the GPU saturated.
HypothesisEngine (core/hypothesis.py)
The engine classifies experiment outcomes to guide follow-up logic:
| Mode | Detection Logic |
|---|---|
| gradient_collapse | val_l2_rel ≥ 1.0, NaN loss, or CUDA launch errors. |
| spectral_bias | High-frequency error > 0.3 in spectral diagnostic. |
| capacity_limited | Small model (hidden_dim < 64) with high error. |
| cuda_oom | Log analysis detects "Out of Memory" on GPU. |
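The table can be read as a small rule-based classifier. The sketch below is one possible encoding, not the engine's code: the function signature, the order in which rules are checked, the HIGH_ERROR cutoff for "high error", and the "unclassified" fallback label are all assumptions. The thresholds 1.0, 0.3, and 64 come from the table.

```python
import math

HIGH_ERROR = 0.1  # assumed cutoff for "high error"; not given in the table


def classify(val_l2_rel: float, hf_error: float, hidden_dim: int,
             log: str = "", cuda_launch_error: bool = False) -> str:
    # Rule order is an assumption; the table does not state a precedence.
    if "out of memory" in log.lower():
        return "cuda_oom"
    if math.isnan(val_l2_rel) or val_l2_rel >= 1.0 or cuda_launch_error:
        return "gradient_collapse"
    if hf_error > 0.3:
        return "spectral_bias"
    if hidden_dim < 64 and val_l2_rel > HIGH_ERROR:
        return "capacity_limited"
    return "unclassified"  # hypothetical label for runs matching no rule


diverged = classify(float("nan"), 0.0, 128)
blurry = classify(0.05, 0.4, 128)
too_small = classify(0.2, 0.1, 32)
oom = classify(0.2, 0.1, 128, log="RuntimeError: CUDA out of memory")
```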
Retry Escalation (autorun.py)
autorun.py escalates through three recovery levels in NVIDIA environments:
- r1 (smart_fix()): Detects CUDA OOM and automatically halves hidden_dim.
- r2: Aggressive reduction of hidden_dim, n_layers, and n_modes.
- r3: Minimal viable fallback (h=32, l=2, lr=1e-4).
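The three levels can be sketched as a pure function over a config dictionary. Several details here are assumptions: the function name escalate, the reduction factors and floors used at r2 (the text only says "aggressive"), and the reading of h and l as hidden_dim and n_layers. The sketch also applies the r1 halving unconditionally, whereas smart_fix() is described as acting on a detected CUDA OOM.

```python
def escalate(config: dict, level: int) -> dict:
    """Return a reduced copy of config for retry level 1, 2, or 3."""
    cfg = dict(config)  # never mutate the caller's config
    if level == 1:
        # r1: halve the hidden dimension.
        cfg["hidden_dim"] = max(1, cfg["hidden_dim"] // 2)
    elif level == 2:
        # r2: assumed factors and floors; the source does not specify them.
        cfg["hidden_dim"] = max(32, cfg["hidden_dim"] // 4)
        cfg["n_layers"] = max(2, cfg["n_layers"] // 2)
        cfg["n_modes"] = max(4, cfg["n_modes"] // 2)
    else:
        # r3: minimal viable fallback from the text (h=32, l=2, lr=1e-4).
        cfg.update(hidden_dim=32, n_layers=2, lr=1e-4)
    return cfg


base = {"hidden_dim": 256, "n_layers": 8, "n_modes": 16, "lr": 1e-3}
r1, r2, r3 = (escalate(base, level) for level in (1, 2, 3))
```

Each level starts again from the original config rather than compounding on the previous retry, which keeps the fallback path predictable.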
Cloud Infrastructure (GCP)
Configured for project gdpr-494411:
- Vertex AI: Custom container execution using the project's Artifact Registry.
- Compute Engine: G2-standard instances with NVIDIA L4 GPUs for development.