Rtx09 committed on
Commit
8a82d34
·
0 Parent(s):

TRIADS — 6-benchmark weights + model code + Gradio app


Benchmarks:
- matbench_steels: 91.20 MPa (HybridTRIADS V13A, 225K, 5-fold 5-seed avg)
- matbench_expt_gap: 0.3068 eV (HybridTRIADS V3, 100K)
- matbench_expt_ismetal: 0.9655 AUC (HybridTRIADS, 44K, best comp-only)
- matbench_glass: 0.9285 AUC (HybridTRIADS, 44K, 5-seed)
- matbench_jdft2d: 35.89 meV (HybridTRIADS V4, 75K, 5-fold 5-seed avg)
- matbench_phonons: 41.91 cm-1 (GraphTRIADS V6, 247K, gate-halt)

.gitattributes ADDED
@@ -0,0 +1,2 @@
*.pt filter=lfs diff=lfs merge=lfs -text
weights/** filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,160 @@
---
license: mit
language: en
tags:
- materials-science
- machine-learning
- pytorch
- matbench
- small-data
- attention
- recursive
- crystal
- gradio
datasets:
- matbench
metrics:
- mae
- roc_auc
model-index:
- name: TRIADS
  results:
  - task:
      type: regression
      name: Yield Strength Prediction (MPa)
    dataset:
      name: matbench_steels
      type: matbench
    metrics:
    - type: mae
      value: 91.20
      name: MAE (MPa)
  - task:
      type: regression
      name: Band Gap Prediction (eV)
    dataset:
      name: matbench_expt_gap
      type: matbench
    metrics:
    - type: mae
      value: 0.3068
      name: MAE (eV)
  - task:
      type: classification
      name: Metallicity Classification
    dataset:
      name: matbench_expt_ismetal
      type: matbench
    metrics:
    - type: roc_auc
      value: 0.9655
      name: ROC-AUC
  - task:
      type: classification
      name: Glass Forming Ability
    dataset:
      name: matbench_glass
      type: matbench
    metrics:
    - type: roc_auc
      value: 0.9285
      name: ROC-AUC
  - task:
      type: regression
      name: Exfoliation Energy (meV/atom)
    dataset:
      name: matbench_jdft2d
      type: matbench
    metrics:
    - type: mae
      value: 35.89
      name: MAE (meV/atom)
  - task:
      type: regression
      name: Peak Phonon Frequency (cm⁻¹)
    dataset:
      name: matbench_phonons
      type: matbench
    metrics:
    - type: mae
      value: 41.91
      name: MAE (cm⁻¹)
---

# TRIADS — Materials Property Prediction Across 6 Matbench Benchmarks

**TRIADS (Tiny Recursive Information-Attention with Deep Supervision)** is a parameter-efficient recursive architecture for materials property prediction, purpose-built for the **small-data regime** (312–5,680 samples).

[![GitHub](https://img.shields.io/badge/GitHub-Code-black?logo=github)](https://github.com/Rtx09x/TRIADS)
[![Paper](https://img.shields.io/badge/Paper-PDF-red)](https://github.com/Rtx09x/TRIADS/raw/main/TRIADS_Final.pdf)

## Live Demo

Try the interactive demo with all 6 benchmarks → **[Launch App](https://huggingface.co/spaces/Rtx09/TRIADS)**

## Results Summary

| Task | N | TRIADS | Params | Rank |
|---|---|---|---|---|
| `matbench_steels` (yield strength) | 312 | **91.20 MPa** | 225K | #3 |
| `matbench_expt_gap` (band gap) | 4,604 | **0.3068 eV** | 100K | #2 composition-only |
| `matbench_expt_ismetal` (metal?) | 4,921 | **0.9655 ROC-AUC** | 100K | **#1** composition-only |
| `matbench_glass` (glass forming) | 5,680 | **0.9285 ROC-AUC** | 44K | #2 |
| `matbench_jdft2d` (exfol. energy) | 636 | **35.89 meV/atom** | 75K | **#1** no-pretraining |
| `matbench_phonons` (phonon freq.) | 1,265 | **41.91 cm⁻¹** | 247K | **#1** no-pretraining |

## Two Model Variants

### HybridTRIADS (composition tasks: steels, gap, ismetal, glass, jdft2d)

- **Input:** chemical formula → Magpie + Mat2Vec (composition tokens)
- **Core:** 2-layer self-attention cell, iterated T = 16–20 times with shared weights
- **Training:** per-cycle deep supervision (w_t ∝ t)

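The per-cycle deep supervision scheme can be sketched in a few lines. This is a minimal illustration, not the repo's training code; the function name `deep_supervision_loss` and the use of squared error are assumptions for the sketch:

```python
import numpy as np

# Sketch of per-cycle deep supervision with w_t proportional to t:
# every recursion step's prediction is supervised, and later steps
# carry proportionally more weight in the total loss.
def deep_supervision_loss(step_preds, target):
    T = len(step_preds)
    w = np.arange(1, T + 1, dtype=float)   # w_t ∝ t
    w /= w.sum()                           # normalize weights to sum to 1
    per_step = [(p - target) ** 2 for p in step_preds]  # per-cycle squared error
    return float(sum(wt * l for wt, l in zip(w, per_step)))
```

With this weighting, early cycles are still trained but the final cycles, whose output is actually returned at inference, dominate the loss.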
### GraphTRIADS (structural task: phonons)

- **Input:** CIF/structure → 3-order hierarchical crystal graph (atoms, bonds, triplet angles, dihedral angles)
- **Core:** hierarchical GNN message-passing stack inside the shared recursive cell
- **Halting:** gate-based adaptive halting (4–16 cycles per sample)

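Gate-based adaptive halting can be sketched as follows. This is a hedged illustration of the control flow only; `run_with_gate_halt`, `step_fn`, and `gate_fn` are hypothetical names, not the repo's GraphTRIADS implementation:

```python
# Sketch of gate-based adaptive halting: iterate a shared-weight cell and
# stop once a learned gate's halting score crosses a threshold, with the
# cycle count clamped to the 4-16 range described above.
def run_with_gate_halt(step_fn, gate_fn, state, min_cycles=4, max_cycles=16,
                       threshold=0.5):
    for t in range(1, max_cycles + 1):
        state = step_fn(state)                       # one recursive cycle
        if t >= min_cycles and gate_fn(state) > threshold:
            break                                    # gate says: halt here
    return state, t
```

Easy samples halt near the minimum cycle count, while harder samples use the full budget; this is what makes the per-sample compute adaptive.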
## Pretrained Checkpoints

Weights are organized by benchmark. Download via `huggingface_hub`:

```python
from huggingface_hub import hf_hub_download
import torch

# Download one benchmark's weights (contains all folds compiled)
ckpt = torch.load(
    hf_hub_download("Rtx09/TRIADS", "steels/weights.pt"),
    map_location="cpu",
)
# ckpt['folds']   -> list of fold dicts, each with 'model_state' and 'test_mae'
# ckpt['n_extra'] -> int (needed for model init)
# ckpt['config']  -> dict (d_attn, d_hidden, ff_dim, dropout, max_steps)
```

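Since each `weights.pt` stores a per-fold `test_mae`, you can, for example, select the strongest fold instead of ensembling. A tiny illustrative helper (not part of the repo) that relies only on the checkpoint layout documented above:

```python
# Illustrative helper: pick the fold with the lowest held-out MAE from
# the ckpt['folds'] layout (keys as documented for weights.pt).
def best_fold(ckpt):
    return min(ckpt["folds"], key=lambda f: f["test_mae"])
```

Note that for reporting, Matbench scores average over all five folds; single-fold selection is only useful for quick experiments.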
### Checkpoint Index

| Benchmark | File | Folds | Notes |
|---|---|---|---|
| matbench_steels | `steels/weights.pt` | 5 | HybridTRIADS V13A · 225K · 5-seed ensemble compiled |
| matbench_expt_gap | `expt_gap/weights.pt` | 5 | HybridTRIADS V3 · 100K |
| matbench_expt_ismetal | `is_metal/weights.pt` | 5 | HybridTRIADS · 100K |
| matbench_glass | `glass/weights.pt` | 5 | HybridTRIADS · 44K |
| matbench_jdft2d | `jdft2d/weights.pt` | 5 | HybridTRIADS V4 · 75K · 5-seed ensemble compiled |
| matbench_phonons | `phonons/weights.pt` | 5 | GraphTRIADS V6 · 247K · also needs `phonons/dataset.pt` |

## Citation

```bibtex
@misc{tiwari2026triads,
  author = {Rudra Tiwari},
  title  = {TRIADS: Tiny Recursive Information-Attention with Deep Supervision},
  year   = {2026},
  url    = {https://github.com/Rtx09x/TRIADS}
}
```

## License

MIT License — see [GitHub repository](https://github.com/Rtx09x/TRIADS/blob/main/LICENSE).
app.py ADDED
@@ -0,0 +1,658 @@
"""
TRIADS — Multi-Benchmark Materials Property Prediction
HuggingFace Gradio App

Covers all 6 Matbench benchmarks:
1. matbench_steels — Yield Strength (MPa)
2. matbench_expt_gap — Band Gap (eV)
3. matbench_expt_ismetal — Metallicity (ROC-AUC)
4. matbench_glass — Glass Forming Ability
5. matbench_jdft2d — Exfoliation Energy (meV/atom)
6. matbench_phonons — Peak Phonon Frequency (cm⁻¹)
"""

import os
import warnings
import urllib.request

warnings.filterwarnings("ignore")

import numpy as np
import torch
import torch.nn as nn
import gradio as gr
from huggingface_hub import hf_hub_download

# ─────────────────────────────────────────────────────────────────
# CONFIG
# ─────────────────────────────────────────────────────────────────

REPO_ID = "Rtx09/TRIADS"

BENCHMARK_INFO = {
    "steels": {
        "title": "🔩 Steel Yield Strength",
        "description": "Predict yield strength (MPa) of steel alloys from composition.",
        "unit": "MPa",
        "example": "Fe0.7Cr0.15Ni0.15",
        "examples": ["Fe0.7Cr0.15Ni0.15", "Fe0.8C0.02Mn0.1Si0.05Cr0.03", "Fe0.6Ni0.25Mo0.1Cr0.05"],
        "task": "regression",
        "result": "91.20 ± 12.23 MPa MAE (5-fold, 5-seed ensemble)",
    },
    "expt_gap": {
        "title": "⚡ Experimental Band Gap",
        "description": "Predict experimental electronic band gap (eV) from composition.",
        "unit": "eV",
        "example": "TiO2",
        "examples": ["TiO2", "GaN", "ZnO", "Si", "CdS"],
        "task": "regression",
        "result": "0.3068 ± 0.0082 eV MAE (5-fold, composition-only)",
    },
    "ismetal": {
        "title": "🔮 Metallicity Classifier",
        "description": "Predict whether a material is metallic or non-metallic from composition.",
        "unit": "probability (1 = metal)",
        "example": "Cu",
        "examples": ["Cu", "SiO2", "Fe3O4", "BaTiO3", "Al"],
        "task": "classification",
        "result": "0.9655 ± 0.0029 ROC-AUC (5-fold, composition-only)",
    },
    "glass": {
        "title": "🪟 Glass Forming Ability",
        "description": "Predict metallic glass forming ability from alloy composition.",
        "unit": "probability (1 = glass former)",
        "example": "Cu46Zr54",
        "examples": ["Cu46Zr54", "Fe80B20", "Al86Ni7La6Y1", "Pd40Cu30Ni10P20"],
        "task": "classification",
        "result": "0.9285 ± 0.0063 ROC-AUC (5-fold, 5-seed ensemble)",
    },
    "jdft2d": {
        "title": "📐 Exfoliation Energy",
        "description": "Predict exfoliation energy (meV/atom) of 2D materials from structure+composition.",
        "unit": "meV/atom",
        "example": "MoS2",
        "examples": ["MoS2", "WSe2", "BN", "graphene (C)", "MoTe2"],
        "task": "regression",
        "result": "35.89 ± 12.40 meV/atom MAE (5-fold, 5-seed ensemble)",
    },
    "phonons": {
        "title": "🎵 Phonon Peak Frequency",
        "description": "Predict peak phonon frequency (cm⁻¹) from crystal structure.",
        "unit": "cm⁻¹",
        "example": "Si (diamond cubic)",
        "examples": ["Si", "GaAs", "MgO", "BN (wurtzite)", "TiO2 (rutile)"],
        "task": "regression",
        "result": "41.91 ± 4.04 cm⁻¹ MAE (5-fold, gate-halt GraphTRIADS)",
    },
}


# ─────────────────────────────────────────────────────────────────
# MODEL DEFINITIONS (inlined for self-contained app)
# ─────────────────────────────────────────────────────────────────

class DeepHybridTRM(nn.Module):
    """
    HybridTRIADS — composition-only tasks.
    Shared across: steels, expt_gap, ismetal, glass, jdft2d.
    """
    def __init__(self, n_props=22, stat_dim=6, n_extra=0, mat2vec_dim=200,
                 d_attn=64, nhead=4, d_hidden=96, ff_dim=150,
                 dropout=0.2, max_steps=20, **kw):
        super().__init__()
        self.max_steps, self.D = max_steps, d_hidden
        self.n_props, self.stat_dim, self.n_extra = n_props, stat_dim, n_extra

        self.tok_proj = nn.Sequential(
            nn.Linear(stat_dim, d_attn), nn.LayerNorm(d_attn), nn.GELU())
        self.m2v_proj = nn.Sequential(
            nn.Linear(mat2vec_dim, d_attn), nn.LayerNorm(d_attn), nn.GELU())

        self.sa1 = nn.MultiheadAttention(d_attn, nhead, dropout=dropout, batch_first=True)
        self.sa1_n = nn.LayerNorm(d_attn)
        self.sa1_ff = nn.Sequential(
            nn.Linear(d_attn, d_attn * 2), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(d_attn * 2, d_attn))
        self.sa1_fn = nn.LayerNorm(d_attn)

        self.sa2 = nn.MultiheadAttention(d_attn, nhead, dropout=dropout, batch_first=True)
        self.sa2_n = nn.LayerNorm(d_attn)
        self.sa2_ff = nn.Sequential(
            nn.Linear(d_attn, d_attn * 2), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(d_attn * 2, d_attn))
        self.sa2_fn = nn.LayerNorm(d_attn)

        self.ca = nn.MultiheadAttention(d_attn, nhead, dropout=dropout, batch_first=True)
        self.ca_n = nn.LayerNorm(d_attn)

        pool_in = d_attn + (n_extra if n_extra > 0 else 0)
        self.pool = nn.Sequential(
            nn.Linear(pool_in, d_hidden), nn.LayerNorm(d_hidden), nn.GELU())

        self.z_up = nn.Sequential(
            nn.Linear(d_hidden * 3, ff_dim), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(ff_dim, d_hidden), nn.LayerNorm(d_hidden))
        self.y_up = nn.Sequential(
            nn.Linear(d_hidden * 2, ff_dim), nn.GELU(), nn.Dropout(dropout),
            nn.Linear(ff_dim, d_hidden), nn.LayerNorm(d_hidden))
        self.head = nn.Linear(d_hidden, 1)
        self._init()

    def _init(self):
        for m in self.modules():
            if isinstance(m, nn.Linear):
                nn.init.xavier_uniform_(m.weight)
                if m.bias is not None:
                    nn.init.zeros_(m.bias)

    def _attention(self, x):
        B = x.size(0)
        mg_dim = self.n_props * self.stat_dim
        if self.n_extra > 0:
            extra = x[:, mg_dim:mg_dim + self.n_extra]
            m2v = x[:, mg_dim + self.n_extra:]
        else:
            extra, m2v = None, x[:, mg_dim:]

        tok = self.tok_proj(x[:, :mg_dim].view(B, self.n_props, self.stat_dim))
        ctx = self.m2v_proj(m2v).unsqueeze(1)

        tok = self.sa1_n(tok + self.sa1(tok, tok, tok)[0])
        tok = self.sa1_fn(tok + self.sa1_ff(tok))
        tok = self.sa2_n(tok + self.sa2(tok, tok, tok)[0])
        tok = self.sa2_fn(tok + self.sa2_ff(tok))
        tok = self.ca_n(tok + self.ca(tok, ctx, ctx)[0])

        pooled = tok.mean(dim=1)
        if extra is not None:
            pooled = torch.cat([pooled, extra], dim=-1)
        return self.pool(pooled)

    def forward(self, x, deep_supervision=False):
        B = x.size(0)
        xp = self._attention(x)
        z = torch.zeros(B, self.D, device=x.device)
        y = torch.zeros(B, self.D, device=x.device)
        step_preds = []
        for _ in range(self.max_steps):
            z = z + self.z_up(torch.cat([xp, y, z], -1))
            y = y + self.y_up(torch.cat([y, z], -1))
            step_preds.append(self.head(y).squeeze(1))
        return step_preds if deep_supervision else step_preds[-1]


# ─────────────────────────────────────────────────────────────────
# FEATURIZER (composition-only, shared across HybridTRIADS tasks)
# ─────────────────────────────────────────────────────────────────

_featurizer_cache = {}
_mat2vec_cache = {}


def _get_featurizer():
    """Lazy-load the ExpandedFeaturizer (downloads Mat2Vec once)."""
    if "main" in _featurizer_cache:
        return _featurizer_cache["main"]

    try:
        from matminer.featurizers.composition import (
            ElementProperty, ElementFraction, Stoichiometry,
            ValenceOrbital, IonProperty, BandCenter
        )
        from matminer.featurizers.base import MultipleFeaturizer
        from gensim.models import Word2Vec

        GCS = "https://storage.googleapis.com/mat2vec/"
        M2V_FILES = [
            "pretrained_embeddings",
            "pretrained_embeddings.wv.vectors.npy",
            "pretrained_embeddings.trainables.syn1neg.npy",
        ]
        os.makedirs("mat2vec_cache", exist_ok=True)
        for f in M2V_FILES:
            p = os.path.join("mat2vec_cache", f)
            if not os.path.exists(p):
                urllib.request.urlretrieve(GCS + f, p)

        ep = ElementProperty.from_preset("magpie")
        m2v = Word2Vec.load("mat2vec_cache/pretrained_embeddings")
        emb = {w: m2v.wv[w] for w in m2v.wv.index_to_key}
        extra = MultipleFeaturizer([ElementFraction(), Stoichiometry(),
                                    ValenceOrbital(), IonProperty(), BandCenter()])

        _featurizer_cache["main"] = (ep, m2v, emb, extra)
        return _featurizer_cache["main"]

    except Exception:
        return None


def featurize_composition(formula: str):
    """Featurize a chemical formula into the TRIADS feature vector."""
    from pymatgen.core import Composition

    result = _get_featurizer()
    if result is None:
        return None, "Featurizer not available (matminer/gensim missing or Mat2Vec download failed)"

    ep, m2v, emb, extra = result

    try:
        comp = Composition(formula)
    except Exception:
        return None, f"Invalid formula: '{formula}'"

    try:
        mg = np.array(ep.featurize(comp), np.float32)
    except Exception:
        mg = np.zeros(len(ep.feature_labels()), np.float32)

    try:
        ex = np.array(extra.featurize(comp), np.float32)
        ex = np.nan_to_num(ex, nan=0.0)
    except Exception:
        ex = np.zeros(50, np.float32)

    # Mat2Vec pooled: composition-weighted mean of element embeddings
    v, t = np.zeros(200, np.float32), 0.0
    for s, f in comp.get_el_amt_dict().items():
        if s in emb:
            v += f * emb[s]
        t += f
    m2v_vec = v / max(t, 1e-8)

    mg = np.nan_to_num(mg, nan=0.0)
    feat = np.concatenate([mg, ex, m2v_vec])
    return feat.astype(np.float32), None


# ─────────────────────────────────────────────────────────────────
# WEIGHT LOADING (lazy, cached)
# ─────────────────────────────────────────────────────────────────

# weights.pt format (one file per benchmark on HuggingFace):
# {
#   'folds': [ {'model_state': OrderedDict, 'test_mae': float}, ... ],  # len == n_folds
#   'n_extra': int,
#   'config': {'d_attn': int, 'd_hidden': int, 'ff_dim': int,
#              'dropout': float, 'max_steps': int},
#   'benchmark': str,
# }

_fold_models = {}  # benchmark -> list[nn.Module] (one entry per fold)

_MODEL_CONFIGS = {
    # These MUST match the architecture configs baked into the saved weights.pt files.
    # Values verified by inspecting ckpt['config'] from each weights.pt directly.
    "steels":   dict(d_attn=64, d_hidden=96, ff_dim=150, dropout=0.20, max_steps=20),
    "expt_gap": dict(d_attn=64, d_hidden=96, ff_dim=150, dropout=0.20, max_steps=20),  # V3 s42 (actual weights)
    "ismetal":  dict(d_attn=24, d_hidden=48, ff_dim=72,  dropout=0.20, max_steps=16),  # 100K actual
    "glass":    dict(d_attn=24, d_hidden=48, ff_dim=72,  dropout=0.20, max_steps=16),  # actual weights
    "jdft2d":   dict(d_attn=32, d_hidden=64, ff_dim=96,  dropout=0.20, max_steps=16),  # V4-75K actual
}

_HF_PATHS = {
    "steels": "steels/weights.pt",
    "expt_gap": "expt_gap/weights.pt",
    "ismetal": "is_metal/weights.pt",
    "glass": "glass/weights.pt",
    "jdft2d": "jdft2d/weights.pt",
    "phonons": "phonons/weights.pt",
}


def _load_benchmark_models(benchmark: str):
    """
    Download benchmark/weights.pt once, build one nn.Module per fold,
    cache the list in _fold_models[benchmark].
    Returns list[nn.Module] or None on failure.
    """
    if benchmark in _fold_models:
        return _fold_models[benchmark]

    if benchmark == "phonons":
        # Phonons needs structure input — no composition-only inference
        return None

    try:
        path = hf_hub_download(repo_id=REPO_ID, filename=_HF_PATHS[benchmark])
        ckpt = torch.load(path, map_location="cpu", weights_only=False)

        # Accept both old per-fold dicts and the new compiled format
        fold_entries = ckpt.get("folds", [ckpt])  # fallback: single-fold legacy
        n_extra = ckpt.get("n_extra", 0)
        cfg = {**_MODEL_CONFIGS[benchmark], "n_extra": n_extra}

        models = []
        for entry in fold_entries:
            m = DeepHybridTRM(**cfg)
            # Each entry is either {'model_state': state_dict, ...} or a raw state_dict
            sd = entry.get("model_state", entry) if isinstance(entry, dict) else entry
            m.load_state_dict(sd)
            m.eval()
            models.append(m)

        _fold_models[benchmark] = models
        return models

    except Exception:
        return None


def _ensemble_predict(benchmark: str, x: np.ndarray,
                      is_classification: bool = False):
    """Run inference through all fold models, return averaged prediction."""
    models = _load_benchmark_models(benchmark)
    if not models:
        return None, "Could not load model weights. Are they uploaded to HuggingFace?"

    xt = torch.tensor(x[None], dtype=torch.float32)
    preds = []
    for m in models:
        with torch.no_grad():
            out = m(xt).item()
        if is_classification:
            out = 1.0 / (1.0 + np.exp(-out))  # logit -> probability
        preds.append(out)

    return float(np.mean(preds)), None


# ─────────────────────────────────────────────────────────────────
# PREDICTION FUNCTIONS (one per benchmark tab)
# ─────────────────────────────────────────────────────────────────

def _status_bar(benchmark_key: str):
    info = BENCHMARK_INFO[benchmark_key]
    return (f"📊 **Benchmark result:** {info['result']}\n\n"
            "*Weights will be downloaded from HuggingFace on first prediction.*")


def predict_steels(formula: str):
    feat, err = featurize_composition(formula)
    if err:
        return f"❌ Error: {err}", ""

    pred, err = _ensemble_predict("steels", feat, is_classification=False)
    if err:
        return f"❌ {err}", ""

    context = (f"**{pred:.1f} MPa** yield strength\n\n"
               "> TRIADS benchmark MAE: 91.20 MPa | "
               "CrabNet: 107.32 MPa | Darwin: 123.29 MPa")
    return f"### {pred:.1f} MPa", context


def predict_expt_gap(formula: str):
    feat, err = featurize_composition(formula)
    if err:
        return f"❌ Error: {err}", ""

    pred, err = _ensemble_predict("expt_gap", feat, is_classification=False)
    if err:
        return f"❌ {err}", ""

    metal_class = ("Likely metallic (Eg ≈ 0)" if pred < 0.3 else
                   "Small gap semiconductor" if pred < 1.5 else
                   "Wide-gap semiconductor/insulator")
    context = (f"**{pred:.3f} eV** band gap\n\n"
               f"Classification: {metal_class}\n\n"
               "> TRIADS benchmark MAE: 0.3068 eV | Darwin: 0.2865 eV")
    return f"### {pred:.3f} eV", context


def predict_ismetal(formula: str):
    feat, err = featurize_composition(formula)
    if err:
        return f"❌ Error: {err}", ""

    pred, err = _ensemble_predict("ismetal", feat, is_classification=True)
    if err:
        return f"❌ {err}", ""

    label = "🔩 **METALLIC**" if pred > 0.5 else "💎 **NON-METALLIC**"
    pct = pred * 100 if pred > 0.5 else (1 - pred) * 100
    confidence = "high" if pct > 80 else "moderate" if pct > 60 else "uncertain"
    context = (f"{label} (confidence: {confidence}, p={pred:.3f})\n\n"
               "> TRIADS benchmark ROC-AUC: 0.9655 (best composition-only model)")
    return f"### {pred:.3f} probability of being metallic", context


def predict_glass(formula: str):
    feat, err = featurize_composition(formula)
    if err:
        return f"❌ Error: {err}", ""

    pred, err = _ensemble_predict("glass", feat, is_classification=True)
    if err:
        return f"❌ {err}", ""

    label = "🪟 **Likely glass-former**" if pred > 0.5 else "❌ **Unlikely glass-former**"
    context = (f"{label} (p={pred:.3f})\n\n"
               "> TRIADS benchmark ROC-AUC: 0.9285 | MODNet: 0.9603")
    return f"### {pred:.3f} glass-forming probability", context


def predict_jdft2d(formula: str):
    feat, err = featurize_composition(formula)
    if err:
        return f"❌ Error: {err}", ""

    pred, err = _ensemble_predict("jdft2d", feat, is_classification=False)
    if err:
        return f"❌ {err}", ""

    ease = "Easy to exfoliate" if pred < 50 else "Moderate" if pred < 150 else "Hard to exfoliate"
    context = (f"**{pred:.1f} meV/atom** exfoliation energy\n\n"
               f"Exfoliatability: {ease}\n\n"
               "> TRIADS benchmark MAE: 35.89 meV/atom (best no-pretraining)")
    return f"### {pred:.1f} meV/atom", context


def predict_phonons_placeholder(formula: str):
    return ("### ⚠️ Phonons — Structure Required",
            "GraphTRIADS for phonons requires a crystal structure (CIF file), "
            "not just a formula. The pretrained weights are available at "
            "`huggingface.co/Rtx09/TRIADS` under `phonons/`.\n\n"
            "> Benchmark MAE: 41.91 cm⁻¹ (gate-halt GraphTRIADS V6, 247K params)")
463
+ # ─────────────────────────────────────────────────────────────────
464
+ # GRADIO INTERFACE
465
+ # ─────────────────────────────────────────────────────────────────
466
+
467
+ CSS = """
468
+ .gr-box { border-radius: 12px !important; }
469
+ .tab-nav button { font-weight: 600; font-size: 14px; }
470
+ #result_text { font-size: 1.5rem; font-weight: 700; color: #6366f1; }
471
+ .benchmark-badge {
472
+ background: #1e293b; color: #94a3b8; border-radius: 8px;
473
+ padding: 8px 14px; font-family: monospace; font-size: 12px;
474
+ }
475
+ footer { display: none !important; }
476
+ """
477
+
478
+ def build_interface():
479
+ with gr.Blocks(css=CSS, title="TRIADS — Materials Property Prediction") as demo:
480
+
481
+ gr.Markdown("""
482
+ # ⚡ TRIADS — Materials Property Prediction
483
+ **Tiny Recursive Information-Attention with Deep Supervision**
484
+ Six Matbench benchmarks · Parameter-efficient · Small-data specialist
485
+
486
+ Select a benchmark tab below to predict a material property.
487
+ """)
488
+
489
+ with gr.Tabs():
490
+
491
+ # ── TAB 1: STEELS ───────────────────────────────────���─────────
492
+ with gr.Tab("🔩 Steel Yield"):
493
+ with gr.Row():
494
+ with gr.Column(scale=1):
495
+ gr.Markdown("### Alloy Yield Strength (MPa)")
496
+ gr.Markdown("Input an alloy composition (elemental fractions must sum to 1).")
497
+ formula_s = gr.Textbox(
498
+ label="Alloy formula",
499
+ placeholder="e.g. Fe0.7Cr0.15Ni0.15",
500
+ value="Fe0.7Cr0.15Ni0.15"
501
+ )
502
+ gr.Examples(
503
+ examples=["Fe0.7Cr0.15Ni0.15", "Fe0.8C0.02Mn0.1Si0.05Cr0.03",
504
+ "Fe0.6Ni0.25Mo0.1Cr0.05"],
505
+ inputs=formula_s
506
+ )
507
+ btn_s = gr.Button("Predict Yield Strength", variant="primary")
508
+ with gr.Column(scale=1):
509
+ out_s = gr.Markdown(elem_id="result_text")
510
+ ctx_s = gr.Markdown()
511
+ gr.Markdown(
512
+ "📊 TRIADS V13A · 225K params · 5-seed ensemble · **91.20 MPa MAE**",
513
+ elem_classes="benchmark-badge"
514
+ )
515
+ btn_s.click(predict_steels, inputs=formula_s, outputs=[out_s, ctx_s])
516
+
517
+ # ── TAB 2: BAND GAP ───────────────────────────────────────────
518
+ with gr.Tab("⚡ Band Gap"):
519
+ with gr.Row():
520
+ with gr.Column(scale=1):
521
+ gr.Markdown("### Experimental Band Gap (eV)")
522
+ gr.Markdown("Input a chemical composition formula.")
523
+ formula_g = gr.Textbox(
524
+ label="Composition",
525
+ placeholder="e.g. TiO2",
526
+ value="TiO2"
527
+ )
528
+ gr.Examples(
529
+ examples=["TiO2", "GaN", "ZnO", "Si", "CdS", "SrTiO3"],
530
+ inputs=formula_g
531
+ )
532
+ btn_g = gr.Button("Predict Band Gap", variant="primary")
533
+ with gr.Column(scale=1):
534
+ out_g = gr.Markdown(elem_id="result_text")
535
+ ctx_g = gr.Markdown()
536
+ gr.Markdown(
537
+ "📊 TRIADS V3 · 100K params · **0.3068 eV MAE** (best comp-only)",
538
+ elem_classes="benchmark-badge"
539
+ )
540
+ btn_g.click(predict_expt_gap, inputs=formula_g, outputs=[out_g, ctx_g])
541
+
542
+ # ── TAB 3: METALLICITY ────────────────────────────────────────
543
+ with gr.Tab("🔮 Metallicity"):
544
+ with gr.Row():
545
+ with gr.Column(scale=1):
546
+ gr.Markdown("### Metal vs. Non-metal Classifier")
547
+ gr.Markdown("Predicts electronic metallicity from composition.")
548
+ formula_m = gr.Textbox(
549
+ label="Composition",
550
+ placeholder="e.g. Cu",
551
+ value="Cu"
552
+ )
553
+ gr.Examples(
554
+ examples=["Cu", "SiO2", "Fe3O4", "BaTiO3", "Al", "MgO", "NiO"],
555
+ inputs=formula_m
556
+ )
557
+ btn_m = gr.Button("Classify Metallicity", variant="primary")
558
+ with gr.Column(scale=1):
559
+ out_m = gr.Markdown(elem_id="result_text")
560
+ ctx_m = gr.Markdown()
561
+ gr.Markdown(
562
+ "📊 TRIADS 100K · **0.9655 ROC-AUC** · Best composition-only (beats GPTChem 1B+)",
563
+ elem_classes="benchmark-badge"
564
+ )
565
+ btn_m.click(predict_ismetal, inputs=formula_m, outputs=[out_m, ctx_m])
566
+
567
+ # ── TAB 4: GLASS FORMING ──────────────────────────────────────
568
+ with gr.Tab("🪟 Glass Forming"):
569
+ with gr.Row():
570
+ with gr.Column(scale=1):
571
+ gr.Markdown("### Metallic Glass Forming Ability")
572
+ gr.Markdown("Predicts glass forming probability from alloy composition.")
573
+ formula_gf = gr.Textbox(
574
+ label="Alloy composition",
575
+ placeholder="e.g. Cu46Zr54",
576
+ value="Cu46Zr54"
577
+ )
578
+ gr.Examples(
579
+ examples=["Cu46Zr54", "Fe80B20", "Al86Ni7La6Y1", "Pd40Cu30Ni10P20"],
580
+ inputs=formula_gf
581
+ )
582
+ btn_gf = gr.Button("Predict Glass Forming", variant="primary")
583
+ with gr.Column(scale=1):
584
+ out_gf = gr.Markdown(elem_id="result_text")
585
+ ctx_gf = gr.Markdown()
586
+ gr.Markdown(
587
+ "📊 TRIADS 44K · 5-seed ensemble · **0.9285 ROC-AUC**",
588
+ elem_classes="benchmark-badge"
589
+ )
590
+ btn_gf.click(predict_glass, inputs=formula_gf, outputs=[out_gf, ctx_gf])
591
+
592
+ # ── TAB 5: JDFT2D ─────────────────────────────────────────────
593
+ with gr.Tab("📐 JDFT2D"):
594
+ with gr.Row():
595
+ with gr.Column(scale=1):
596
+ gr.Markdown("### 2D Material Exfoliation Energy (meV/atom)")
597
+                     gr.Markdown("Predicts how easily a layered 2D material can be exfoliated.")
+                     formula_j = gr.Textbox(
+                         label="Composition",
+                         placeholder="e.g. MoS2",
+                         value="MoS2"
+                     )
+                     gr.Examples(
+                         examples=["MoS2", "WSe2", "BN", "MoTe2", "WS2"],
+                         inputs=formula_j
+                     )
+                     btn_j = gr.Button("Predict Exfoliation Energy", variant="primary")
+                 with gr.Column(scale=1):
+                     out_j = gr.Markdown(elem_id="result_text")
+                     ctx_j = gr.Markdown()
+                     gr.Markdown(
+                         "📊 TRIADS V4 · 75K params · 5-seed ensemble · **35.89 meV/atom MAE**",
+                         elem_classes="benchmark-badge"
+                     )
+             btn_j.click(predict_jdft2d, inputs=formula_j, outputs=[out_j, ctx_j])
+
+         # ── TAB 6: PHONONS ────────────────────────────────────────────
+         with gr.Tab("🎵 Phonons"):
+             with gr.Row():
+                 with gr.Column(scale=1):
+                     gr.Markdown("### Peak Phonon Frequency (cm⁻¹)")
+                     gr.Markdown(
+                         "GraphTRIADS V6 predicts phonon peak frequency from crystal structure.\n\n"
+                         "⚠️ **Structure required.** This model requires a full crystal "
+                         "structure (CIF) rather than composition alone. Enter a composition "
+                         "below to get benchmark context, or see the GitHub repo for full "
+                         "structure-based inference."
+                     )
+                     formula_ph = gr.Textbox(
+                         label="Formula (for context only)",
+                         placeholder="e.g. Si",
+                         value="Si"
+                     )
+                     btn_ph = gr.Button("Show Benchmark Info", variant="primary")
+                 with gr.Column(scale=1):
+                     out_ph = gr.Markdown(elem_id="result_text")
+                     ctx_ph = gr.Markdown()
+                     gr.Markdown(
+                         "📊 GraphTRIADS V6 · 247K params · Gate-halt · **41.91 cm⁻¹ MAE**",
+                         elem_classes="benchmark-badge"
+                     )
+             btn_ph.click(predict_phonons_placeholder, inputs=formula_ph, outputs=[out_ph, ctx_ph])
+
+         # ── FOOTER ──────────────────────────────────────────────────────
+         gr.Markdown("""
+         ---
+         **TRIADS** · [GitHub](https://github.com/Rtx09x/TRIADS) · MIT License · Rudra Tiwari, 2026
+
+         *All benchmarks use the exact Matbench 5-fold CV protocol (random\_state=18012019).
+         Predictions are ensemble averages across 5 folds (fold-specific scalers approximated at inference).*
+         """)
+
+     return demo
+
+
+ if __name__ == "__main__":
+     demo = build_interface()
+     demo.launch(share=False)
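The footer's claim about the fold protocol can be checked directly. A minimal sketch (not part of the commit) of the Matbench v0.1 fold generation the app describes, using the dataset size 4,921 from matbench_expt_is_metal as listed in the commit message:

```python
import numpy as np
from sklearn.model_selection import KFold

# Exact Matbench v0.1 fold generation: 5 shuffled folds, fixed seed
n_samples = 4921  # matbench_expt_is_metal size (from the commit message)
kfold = KFold(n_splits=5, shuffle=True, random_state=18012019)
folds = list(kfold.split(np.arange(n_samples)))

# Sanity checks mirroring the leakage verification in classification_model.py
test_idx = np.concatenate([te for _, te in folds])
assert len(folds) == 5
assert len(set(test_idx)) == n_samples  # every sample appears in exactly one test fold
```

Because the seed is fixed, these folds are byte-for-byte reproducible across machines, which is what allows the per-fold checkpoints in `weights/` to be compared against the official leaderboard splits.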
model_code/__init__.py ADDED
@@ -0,0 +1,9 @@
+ # TRIADS model_code package
+ # Import the model classes for convenience
+
+ from .steels_model import DeepHybridTRM as SteelsModel
+ from .expt_gap_model import DeepHybridTRM as ExptGapModel
+ from .classification_model import DeepHybridTRM as ClassificationModel
+ from .jdft2d_model import DeepHybridTRM as Jdft2dModel
+
+ __all__ = ["SteelsModel", "ExptGapModel", "ClassificationModel", "Jdft2dModel"]
model_code/classification_model.py ADDED
@@ -0,0 +1,734 @@
+ """
+ +=============================================================+
+ | TRIADS — Classification Benchmarks (Combined)               |
+ |                                                             |
+ | 1. matbench_expt_is_metal (4,921) — Metal vs Non-metal      |
+ | 2. matbench_glass (5,680) — Metallic Glass Forming          |
+ |                                                             |
+ | 44K model | BCEWithLogitsLoss | ROCAUC | Single Seed        |
+ | Seeds: [42, 123, 456, 789, 1024]                            |
+ | Folds: KFold(5, shuffle=True, random_state=18012019)        |
+ | ^^^ exact matbench v0.1 fold generation ^^^                 |
+ +=============================================================+
+
+ DEPENDENCIES (run before executing):
+     pip install matminer pymatgen gensim tqdm scikit-learn torch
+
+ USAGE:
+     python classification_benchmarks.py   # runs both sequentially
+ """
+
+ import os, copy, json, time, logging, warnings, urllib.request, shutil
+ warnings.filterwarnings('ignore')
+
+ import numpy as np
+ import pandas as pd
+ from tqdm import tqdm
+ from sklearn.metrics import roc_auc_score
+
+ import torch
+ import torch.nn as nn
+ import torch.nn.functional as F
+ from torch.optim.swa_utils import AveragedModel, SWALR, update_bn
+
+ from sklearn.model_selection import KFold
+ from sklearn.preprocessing import StandardScaler
+ from pymatgen.core import Composition
+ from matminer.featurizers.composition import ElementProperty
+ from gensim.models import Word2Vec
+
+ logging.basicConfig(level=logging.INFO, format='%(name)s | %(message)s')
+ log = logging.getLogger("TRIADS-CLS")
+
+ BATCH_SIZE = 64
+ # Single seed first — test before committing to the full ensemble
+ SEEDS = [42]
+ # Uncomment below for the 5-seed ensemble once the single seed looks good:
+ # SEEDS = [42, 123, 456, 789, 1024]
+
+ # ~44K config — smaller to prevent overfitting
+ MODEL_CFG = dict(
+     d_attn=24, nhead=4, d_hidden=48, ff_dim=72,
+     dropout=0.20, max_steps=16,
+ )
+
+ # Matbench v0.1 exact fold seed — DO NOT CHANGE
+ MATBENCH_FOLD_SEED = 18012019
+
+
+ # ======================================================================
+ # FAST TENSOR DATALOADER
+ # ======================================================================
+
+ class FastTensorDataLoader:
+     def __init__(self, *tensors, batch_size=64, shuffle=False):
+         assert all(t.shape[0] == tensors[0].shape[0] for t in tensors)
+         self.tensors = tensors
+         self.dataset_len = tensors[0].shape[0]
+         self.batch_size = batch_size
+         self.shuffle = shuffle
+         self.n_batches = (self.dataset_len + batch_size - 1) // batch_size
+
+     def __iter__(self):
+         if self.shuffle:
+             idx = torch.randperm(self.dataset_len, device=self.tensors[0].device)
+             self.tensors = tuple(t[idx] for t in self.tensors)
+         self.i = 0
+         return self
+
+     def __next__(self):
+         if self.i >= self.dataset_len:
+             raise StopIteration
+         batch = tuple(t[self.i:self.i + self.batch_size] for t in self.tensors)
+         self.i += self.batch_size
+         return batch
+
+     def __len__(self):
+         return self.n_batches
+
+
+ # ======================================================================
+ # FEATURIZERS
+ # ======================================================================
+
+ _ORBITAL_ENERGIES = {
+     'H': {'1s': -13.6}, 'He': {'1s': -24.6},
+     'Li': {'2s': -5.4}, 'Be': {'2s': -9.3},
+     'B': {'2s': -14.0, '2p': -8.3}, 'C': {'2s': -19.4, '2p': -11.3},
+     'N': {'2s': -25.6, '2p': -14.5}, 'O': {'2s': -32.4, '2p': -13.6},
+     'F': {'2s': -40.2, '2p': -17.4}, 'Ne': {'2s': -48.5, '2p': -21.6},
+     'Na': {'3s': -5.1}, 'Mg': {'3s': -7.6},
+     'Al': {'3s': -11.3, '3p': -6.0}, 'Si': {'3s': -15.0, '3p': -8.2},
+     'P': {'3s': -18.7, '3p': -10.5}, 'S': {'3s': -22.7, '3p': -10.4},
+     'Cl': {'3s': -25.3, '3p': -13.0}, 'Ar': {'3s': -29.2, '3p': -15.8},
+     'K': {'4s': -4.3}, 'Ca': {'4s': -6.1},
+     'Sc': {'4s': -6.6, '3d': -8.0}, 'Ti': {'4s': -6.8, '3d': -8.5},
+     'V': {'4s': -6.7, '3d': -8.3}, 'Cr': {'4s': -6.8, '3d': -8.7},
+     'Mn': {'4s': -7.4, '3d': -9.5}, 'Fe': {'4s': -7.9, '3d': -10.0},
+     'Co': {'4s': -7.9, '3d': -10.0}, 'Ni': {'4s': -7.6, '3d': -10.0},
+     'Cu': {'4s': -7.7, '3d': -11.7}, 'Zn': {'4s': -9.4, '3d': -17.3},
+     'Ga': {'4s': -12.6, '4p': -6.0}, 'Ge': {'4s': -15.6, '4p': -7.9},
+     'As': {'4s': -18.6, '4p': -9.8}, 'Se': {'4s': -21.1, '4p': -9.8},
+     'Br': {'4s': -24.0, '4p': -11.8}, 'Kr': {'4s': -27.5, '4p': -14.0},
+     'Rb': {'5s': -4.2}, 'Sr': {'5s': -5.7},
+     'Y': {'5s': -6.5, '4d': -7.4}, 'Zr': {'5s': -6.8, '4d': -8.3},
+     'Nb': {'5s': -6.9, '4d': -8.5}, 'Mo': {'5s': -7.1, '4d': -8.9},
+     'Ru': {'5s': -7.4, '4d': -8.7}, 'Rh': {'5s': -7.5, '4d': -8.8},
+     'Pd': {'4d': -8.3}, 'Ag': {'5s': -7.6, '4d': -12.3},
+     'Cd': {'5s': -9.0, '4d': -16.7}, 'In': {'5s': -12.0, '5p': -5.8},
+     'Sn': {'5s': -14.6, '5p': -7.3}, 'Sb': {'5s': -16.5, '5p': -8.6},
+     'Te': {'5s': -19.0, '5p': -9.0}, 'I': {'5s': -21.1, '5p': -10.5},
+     'Xe': {'5s': -23.4, '5p': -12.1}, 'Cs': {'6s': -3.9}, 'Ba': {'6s': -5.2},
+     'La': {'6s': -5.6, '5d': -7.5},
+     'Ce': {'6s': -5.5, '5d': -7.3, '4f': -7.0},
+     'Hf': {'6s': -7.0, '5d': -8.1}, 'Ta': {'6s': -7.9, '5d': -9.6},
+     'W': {'6s': -8.0, '5d': -9.8}, 'Re': {'6s': -7.9, '5d': -9.2},
+     'Os': {'6s': -8.4, '5d': -10.0}, 'Ir': {'6s': -9.1, '5d': -10.7},
+     'Pt': {'6s': -9.0, '5d': -10.5}, 'Au': {'6s': -9.2, '5d': -12.8},
+     'Pb': {'6s': -15.0, '6p': -7.4}, 'Bi': {'6s': -16.7, '6p': -7.3},
+ }
+
+
+ def _compute_homo_lumo_gap(comp):
+     elements = comp.get_el_amt_dict()
+     highest_occ, all_energies = [], []
+     for el, frac in elements.items():
+         if el not in _ORBITAL_ENERGIES:
+             return np.array([0.0, 0.0, 0.0], dtype=np.float32)
+         orbs = _ORBITAL_ENERGIES[el]
+         highest_occ.append((max(orbs.values()), frac))
+         all_energies.extend(orbs.values())
+     if not highest_occ:
+         return np.array([0.0, 0.0, 0.0], dtype=np.float32)
+     homo = sum(e * f for e, f in highest_occ) / sum(f for _, f in highest_occ)
+     above = [e for e in all_energies if e > homo]
+     lumo = min(above) if above else homo + 1.0
+     return np.array([homo, lumo, lumo - homo], dtype=np.float32)
+
+
+ class _BaseFeaturizer:
+     """Shared Mat2Vec loading and Magpie featurization."""
+     GCS = "https://storage.googleapis.com/mat2vec/"
+     FILES = ["pretrained_embeddings",
+              "pretrained_embeddings.wv.vectors.npy",
+              "pretrained_embeddings.trainables.syn1neg.npy"]
+
+     def __init__(self, cache="mat2vec_cache"):
+         self.ep_magpie = ElementProperty.from_preset("magpie")
+         self.n_mg = len(self.ep_magpie.feature_labels())
+         self.n_extra = None
+         self.scaler = None
+
+         os.makedirs(cache, exist_ok=True)
+         for f in self.FILES:
+             p = os.path.join(cache, f)
+             if not os.path.exists(p):
+                 log.info(f"  Downloading {f}...")
+                 urllib.request.urlretrieve(self.GCS + f, p)
+         self.m2v = Word2Vec.load(os.path.join(cache, "pretrained_embeddings"))
+         self.emb = {w: self.m2v.wv[w] for w in self.m2v.wv.index_to_key}
+
+     def _pool(self, c):
+         v, t = np.zeros(200, np.float32), 0.0
+         for s, f in c.get_el_amt_dict().items():
+             if s in self.emb:
+                 v += f * self.emb[s]; t += f
+         return v / max(t, 1e-8)
+
+     def featurize_all(self, comps):
+         out = []
+         test_ex = self._featurize_extra(comps[0])
+         self.n_extra = len(test_ex)
+         total = self.n_mg + self.n_extra + 200
+         log.info(f"Features: {self.n_mg} Magpie + "
+                  f"{self.n_extra} Extra + 200 Mat2Vec = {total}d")
+         for c in tqdm(comps, desc="  Featurizing", leave=False):
+             try:
+                 mg = np.array(self.ep_magpie.featurize(c), np.float32)
+             except Exception:
+                 mg = np.zeros(self.n_mg, np.float32)
+             ex = self._featurize_extra(c)
+             out.append(np.concatenate([
+                 np.nan_to_num(mg, nan=0.0),
+                 np.nan_to_num(ex, nan=0.0),
+                 self._pool(c)
+             ]))
+         return np.array(out)
+
+     def fit_scaler(self, X):
+         self.scaler = StandardScaler().fit(X)
+
+     def transform(self, X):
+         if not self.scaler:
+             return X
+         return np.nan_to_num(self.scaler.transform(X), nan=0.0).astype(np.float32)
+
+
+ class MetallicityFeaturizer(_BaseFeaturizer):
+     """354d — keeps HOMO/LUMO gap + BandCenter (relevant to metallicity)."""
+     def __init__(self, cache="mat2vec_cache"):
+         super().__init__(cache)
+         from matminer.featurizers.composition import (
+             Stoichiometry, ValenceOrbital, IonProperty, BandCenter
+         )
+         from matminer.featurizers.composition.element import TMetalFraction
+         self.extra_featurizers = [
+             ("Stoichiometry", Stoichiometry()),
+             ("ValenceOrbital", ValenceOrbital()),
+             ("IonProperty", IonProperty()),
+             ("BandCenter", BandCenter()),
+             ("TMetalFraction", TMetalFraction()),
+         ]
+         self._extra_sizes = {}
+         for name, ftzr in self.extra_featurizers:
+             try:
+                 self._extra_sizes[name] = len(ftzr.feature_labels())
+             except Exception:
+                 self._extra_sizes[name] = None
+
+     def _featurize_extra(self, comp):
+         parts = []
+         for name, ftzr in self.extra_featurizers:
+             try:
+                 vals = np.array(ftzr.featurize(comp), np.float32)
+                 parts.append(np.nan_to_num(vals, nan=0.0))
+                 if self._extra_sizes.get(name) is None:
+                     self._extra_sizes[name] = len(vals)
+             except Exception:
+                 sz = self._extra_sizes.get(name, 0) or 1
+                 parts.append(np.zeros(sz, np.float32))
+         parts.append(_compute_homo_lumo_gap(comp))
+         return np.concatenate(parts)
+
+
+ class GlassFeaturizer(_BaseFeaturizer):
+     """~351d — removes BandCenter & HOMO/LUMO (irrelevant to glass forming)."""
+     def __init__(self, cache="mat2vec_cache"):
+         super().__init__(cache)
+         from matminer.featurizers.composition import (
+             Stoichiometry, ValenceOrbital, IonProperty
+         )
+         from matminer.featurizers.composition.element import TMetalFraction
+         self.extra_featurizers = [
+             ("Stoichiometry", Stoichiometry()),
+             ("ValenceOrbital", ValenceOrbital()),
+             ("IonProperty", IonProperty()),
+             ("TMetalFraction", TMetalFraction()),
+         ]
+         self._extra_sizes = {}
+         for name, ftzr in self.extra_featurizers:
+             try:
+                 self._extra_sizes[name] = len(ftzr.feature_labels())
+             except Exception:
+                 self._extra_sizes[name] = None
+
+     def _featurize_extra(self, comp):
+         parts = []
+         for name, ftzr in self.extra_featurizers:
+             try:
+                 vals = np.array(ftzr.featurize(comp), np.float32)
+                 parts.append(np.nan_to_num(vals, nan=0.0))
+                 if self._extra_sizes.get(name) is None:
+                     self._extra_sizes[name] = len(vals)
+             except Exception:
+                 sz = self._extra_sizes.get(name, 0) or 1
+                 parts.append(np.zeros(sz, np.float32))
+         return np.concatenate(parts)
+
+
+ # ======================================================================
+ # MODEL — DeepHybridTRM (~44K params with MODEL_CFG above)
+ # ======================================================================
+
+ class DeepHybridTRM(nn.Module):
+     def __init__(self, n_props=22, stat_dim=6, n_extra=0, mat2vec_dim=200,
+                  d_attn=32, nhead=4, d_hidden=64, ff_dim=96,
+                  dropout=0.15, max_steps=16, **kw):
+         super().__init__()
+         self.max_steps, self.D = max_steps, d_hidden
+         self.n_props, self.stat_dim, self.n_extra = n_props, stat_dim, n_extra
+
+         self.tok_proj = nn.Sequential(
+             nn.Linear(stat_dim, d_attn), nn.LayerNorm(d_attn), nn.GELU())
+         self.m2v_proj = nn.Sequential(
+             nn.Linear(mat2vec_dim, d_attn), nn.LayerNorm(d_attn), nn.GELU())
+
+         self.sa1 = nn.MultiheadAttention(d_attn, nhead, dropout=dropout, batch_first=True)
+         self.sa1_n = nn.LayerNorm(d_attn)
+         self.sa1_ff = nn.Sequential(
+             nn.Linear(d_attn, d_attn*2), nn.GELU(), nn.Dropout(dropout),
+             nn.Linear(d_attn*2, d_attn))
+         self.sa1_fn = nn.LayerNorm(d_attn)
+
+         self.sa2 = nn.MultiheadAttention(d_attn, nhead, dropout=dropout, batch_first=True)
+         self.sa2_n = nn.LayerNorm(d_attn)
+         self.sa2_ff = nn.Sequential(
+             nn.Linear(d_attn, d_attn*2), nn.GELU(), nn.Dropout(dropout),
+             nn.Linear(d_attn*2, d_attn))
+         self.sa2_fn = nn.LayerNorm(d_attn)
+
+         self.ca = nn.MultiheadAttention(d_attn, nhead, dropout=dropout, batch_first=True)
+         self.ca_n = nn.LayerNorm(d_attn)
+
+         pool_in = d_attn + (n_extra if n_extra > 0 else 0)
+         self.pool = nn.Sequential(
+             nn.Linear(pool_in, d_hidden), nn.LayerNorm(d_hidden), nn.GELU())
+
+         self.z_up = nn.Sequential(
+             nn.Linear(d_hidden*3, ff_dim), nn.GELU(), nn.Dropout(dropout),
+             nn.Linear(ff_dim, d_hidden), nn.LayerNorm(d_hidden))
+         self.y_up = nn.Sequential(
+             nn.Linear(d_hidden*2, ff_dim), nn.GELU(), nn.Dropout(dropout),
+             nn.Linear(ff_dim, d_hidden), nn.LayerNorm(d_hidden))
+         self.head = nn.Linear(d_hidden, 1)
+         self._init()
+
+     def _init(self):
+         for m in self.modules():
+             if isinstance(m, nn.Linear):
+                 nn.init.xavier_uniform_(m.weight)
+                 if m.bias is not None:
+                     nn.init.zeros_(m.bias)
+
+     def _attention(self, x):
+         B = x.size(0)
+         mg_dim = self.n_props * self.stat_dim
+         if self.n_extra > 0:
+             extra = x[:, mg_dim:mg_dim + self.n_extra]
+             m2v = x[:, mg_dim + self.n_extra:]
+         else:
+             extra, m2v = None, x[:, mg_dim:]
+         tok = self.tok_proj(x[:, :mg_dim].view(B, self.n_props, self.stat_dim))
+         ctx = self.m2v_proj(m2v).unsqueeze(1)
+         tok = self.sa1_n(tok + self.sa1(tok, tok, tok)[0])
+         tok = self.sa1_fn(tok + self.sa1_ff(tok))
+         tok = self.sa2_n(tok + self.sa2(tok, tok, tok)[0])
+         tok = self.sa2_fn(tok + self.sa2_ff(tok))
+         tok = self.ca_n(tok + self.ca(tok, ctx, ctx)[0])
+         pooled = tok.mean(dim=1)
+         if extra is not None:
+             pooled = torch.cat([pooled, extra], dim=-1)
+         return self.pool(pooled)
+
+     def forward(self, x, deep_supervision=False):
+         B = x.size(0)
+         xp = self._attention(x)
+         z = torch.zeros(B, self.D, device=x.device)
+         y = torch.zeros(B, self.D, device=x.device)
+         step_preds = []
+         for s in range(self.max_steps):
+             z = z + self.z_up(torch.cat([xp, y, z], -1))
+             y = y + self.y_up(torch.cat([y, z], -1))
+             step_preds.append(self.head(y).squeeze(1))
+         return step_preds if deep_supervision else step_preds[-1]
+
+     def count_parameters(self):
+         return sum(p.numel() for p in self.parameters() if p.requires_grad)
+
+
+ # ======================================================================
+ # LOSS + UTILS
+ # ======================================================================
+
+ def deep_supervision_loss_bce(step_preds, targets):
+     preds = torch.stack(step_preds)
+     n = preds.shape[0]
+     w = torch.arange(1, n + 1, device=preds.device, dtype=preds.dtype)
+     w = w / w.sum()
+     per_step = torch.stack([
+         F.binary_cross_entropy_with_logits(preds[i], targets, reduction='mean')
+         for i in range(n)
+     ])
+     return (w * per_step).sum()
+
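The per-step weights used by `deep_supervision_loss_bce` above form a normalized linear ramp, so later recursion steps dominate the loss while early steps still receive gradient. A torch-free restatement of that schedule (sketch only, not part of the commit):

```python
import numpy as np

def step_weights(n_steps):
    # Same schedule as deep_supervision_loss_bce: weights 1..n,
    # normalized to sum to 1, so step n carries n times the weight of step 1.
    w = np.arange(1, n_steps + 1, dtype=float)
    return w / w.sum()

w = step_weights(16)  # max_steps=16 in MODEL_CFG
```

With 16 steps the final prediction carries 16/136 of the total loss weight versus 1/136 for the first step, which biases training toward the fully refined output without discarding intermediate supervision.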
+
+ def strat_split_cls(targets, val_size=0.15, seed=42):
+     tr, vl = [], []
+     rng = np.random.RandomState(seed)
+     for cls in [0, 1]:
+         m = np.where(targets == cls)[0]
+         if len(m) == 0:
+             continue
+         n = max(1, int(len(m) * val_size))
+         c = rng.choice(m, n, replace=False)
+         vl.extend(c.tolist()); tr.extend(np.setdiff1d(m, c).tolist())
+     return np.array(tr), np.array(vl)
+
+
+ @torch.inference_mode()
+ def predict_proba(model, dl):
+     model.eval()
+     preds = []
+     for bx, _ in dl:
+         preds.append(torch.sigmoid(model(bx)).cpu())
+     return torch.cat(preds)
+
+
+ # ======================================================================
+ # TRAINING
+ # ======================================================================
+
+ def train_fold(model, tr_dl, vl_dl, device,
+                epochs=300, swa_start=200, fold=1, seed=42, label="100K"):
+     opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
+     sch = torch.optim.lr_scheduler.CosineAnnealingLR(
+         opt, T_max=swa_start, eta_min=1e-4)
+     swa_m = AveragedModel(model)
+     swa_s = SWALR(opt, swa_lr=5e-4)
+     swa_on = False
+     best_v, best_w = float('-inf'), None
+
+     pbar = tqdm(range(epochs), desc=f"  [{label}|s{seed}] F{fold}/5",
+                 leave=False, ncols=120)
+     for ep in pbar:
+         model.train()
+         epoch_loss, n_batches = 0.0, 0
+         for bx, by in tr_dl:
+             sp = model(bx, deep_supervision=True)
+             loss = deep_supervision_loss_bce(sp, by)
+             opt.zero_grad(set_to_none=True)
+             loss.backward()
+             torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
+             opt.step()
+             epoch_loss += loss.item()
+             n_batches += 1
+
+         model.eval()
+         vp_list, vt_list = [], []
+         with torch.inference_mode():
+             for bx, by in vl_dl:
+                 vp_list.append(torch.sigmoid(model(bx)).cpu())
+                 vt_list.append(by.cpu())
+         vp = torch.cat(vp_list).numpy()
+         vt = torch.cat(vt_list).numpy()
+         try:
+             val_auc = roc_auc_score(vt, vp)
+         except ValueError:
+             val_auc = 0.5
+
+         if ep < swa_start:
+             sch.step()
+             if val_auc > best_v:
+                 best_v = val_auc
+                 best_w = copy.deepcopy(model.state_dict())
+         else:
+             if not swa_on:
+                 swa_on = True
+             swa_m.update_parameters(model); swa_s.step()
+
+         if ep % 10 == 0 or ep == epochs - 1:
+             pbar.set_postfix(Best=f'{best_v:.4f}', Ph='SWA' if swa_on else 'COS',
+                              Loss=f'{epoch_loss/max(n_batches,1):.4f}',
+                              AUC=f'{val_auc:.4f}')
+
+     if swa_on:
+         update_bn(tr_dl, swa_m, device=device)
+         model.load_state_dict(swa_m.module.state_dict())
+     else:
+         model.load_state_dict(best_w)
+     return best_v, model
+
+
+ # ======================================================================
+ # GENERIC BENCHMARK RUNNER
+ # ======================================================================
+
+ def run_classification_benchmark(
+     dataset_name, target_col, featurizer_cls,
+     model_dir, summary_file, baseline_name, baseline_auc,
+     device
+ ):
+     """Run a full 5-seed ensemble classification benchmark."""
+     t0 = time.time()
+
+     # ── LOAD ─────────────────────────────────────────────────────────
+     print(f"\n Loading {dataset_name}...")
+     from matminer.datasets import load_dataset
+     df = load_dataset(dataset_name)
+
+     targets_all = np.array(df[target_col].astype(float).tolist(), np.float32)
+
+     # Handle different column names
+     if 'composition' in df.columns:
+         comps_all = [Composition(c) for c in df['composition'].tolist()]
+     elif 'structure' in df.columns:
+         comps_all = [s.composition for s in df['structure'].tolist()]
+     elif 'formula' in df.columns:
+         comps_all = [Composition(str(f)) for f in df['formula'].tolist()]
+     else:
+         raise ValueError(f"Cannot find composition column in {df.columns.tolist()}")
+
+     n_pos = int(targets_all.sum())
+     n_neg = len(targets_all) - n_pos
+     print(f" Dataset: {len(comps_all)} samples ({n_pos} positive, {n_neg} negative)")
+     print(f" Class balance: {n_pos/len(targets_all)*100:.1f}% positive")
+
+     # ── FEATURIZE (once) ─────────────────────────────────────────────
+     t_feat = time.time()
+     feat = featurizer_cls()
+     X_all = feat.featurize_all(comps_all)
+     n_extra = feat.n_extra
+     print(f" Features: {X_all.shape} (n_extra={n_extra})")
+     print(f" Featurization: {time.time()-t_feat:.1f}s")
+
+     # ── FOLDS — exact matbench v0.1 splits ───────────────────────────
+     kfold = KFold(n_splits=5, shuffle=True, random_state=MATBENCH_FOLD_SEED)
+     folds = list(kfold.split(comps_all))
+
+     # Verify zero leakage
+     all_test_indices = []
+     for fi, (tv, te) in enumerate(folds):
+         assert len(set(tv) & set(te)) == 0, f"Fold {fi}: train/test overlap!"
+         all_test_indices.extend(te.tolist())
+     assert len(set(all_test_indices)) == len(comps_all), "Not all samples covered!"
+     assert len(all_test_indices) == len(comps_all), "Duplicate test samples!"
+     print(" 5 folds verified: zero leakage, full coverage, no duplicates ✓\n")
+
+     # ── MODEL INFO ───────────────────────────────────────────────────
+     model_kw = dict(n_props=22, stat_dim=6, n_extra=n_extra,
+                     mat2vec_dim=200, **MODEL_CFG)
+     test_model = DeepHybridTRM(**model_kw)
+     n_params = test_model.count_parameters()
+     del test_model
+     print(f" Model: {n_params:,} params (44K config)")
+
+     # ── TRAIN ALL SEEDS ──────────────────────────────────────────────
+     os.makedirs(model_dir, exist_ok=True)
+     all_seed_aucs = {}
+     all_fold_probs = {}
+     all_fold_targets = {}
+
+     for seed in SEEDS:
+         print(f"\n {'─'*3} Seed {seed} {'─'*40}")
+         t_seed = time.time()
+         seed_aucs = {}
+
+         for fi, (tv_i, te_i) in enumerate(folds):
+             tri, vli = strat_split_cls(targets_all[tv_i], 0.15, seed + fi)
+             feat.fit_scaler(X_all[tv_i][tri])
+
+             tr_x = torch.tensor(feat.transform(X_all[tv_i][tri]), dtype=torch.float32).to(device)
+             tr_y = torch.tensor(targets_all[tv_i][tri], dtype=torch.float32).to(device)
+             vl_x = torch.tensor(feat.transform(X_all[tv_i][vli]), dtype=torch.float32).to(device)
+             vl_y = torch.tensor(targets_all[tv_i][vli], dtype=torch.float32).to(device)
+             te_x = torch.tensor(feat.transform(X_all[te_i]), dtype=torch.float32).to(device)
+             te_y = torch.tensor(targets_all[te_i], dtype=torch.float32).to(device)
+
+             tr_dl = FastTensorDataLoader(tr_x, tr_y, batch_size=BATCH_SIZE, shuffle=True)
+             vl_dl = FastTensorDataLoader(vl_x, vl_y, batch_size=BATCH_SIZE, shuffle=False)
+             te_dl = FastTensorDataLoader(te_x, te_y, batch_size=BATCH_SIZE, shuffle=False)
+
+             torch.manual_seed(seed + fi)
+             np.random.seed(seed + fi)
+             if device.type == 'cuda':
+                 torch.cuda.manual_seed(seed + fi)
+
+             model = DeepHybridTRM(**model_kw).to(device)
+             bv, model = train_fold(model, tr_dl, vl_dl, device,
+                                    epochs=300, swa_start=200,
+                                    fold=fi+1, seed=seed, label="44K")
+
+             probs = predict_proba(model, te_dl)
+             auc = roc_auc_score(te_y.cpu().numpy(), probs.numpy())
+             seed_aucs[fi] = auc
+
+             if fi not in all_fold_probs:
+                 all_fold_probs[fi] = {}
+                 all_fold_targets[fi] = te_y.cpu()
+             all_fold_probs[fi][seed] = probs
+
+             torch.save({
+                 'model_state': model.state_dict(),
+                 'test_auc': auc, 'fold': fi+1, 'seed': seed,
+                 'n_extra': n_extra,
+             }, f'{model_dir}/{dataset_name}_100K_s{seed}_f{fi+1}.pt')
+
+             del model, tr_x, tr_y, vl_x, vl_y, te_x, te_y
+             if device.type == 'cuda':
+                 torch.cuda.empty_cache()
+
+         avg_s = np.mean(list(seed_aucs.values()))
+         all_seed_aucs[seed] = seed_aucs
+         dt = time.time() - t_seed
+         print(f"\n Seed {seed}: avg={avg_s:.4f} | "
+               f"{[f'{seed_aucs[i]:.4f}' for i in range(5)]} ({dt:.0f}s)")
+
+     # ── ENSEMBLE ─────────────────────────────────────────────────────
+     ens_aucs = {}
+     for fi in range(5):
+         probs_stack = torch.stack([all_fold_probs[fi][s] for s in SEEDS])
+         ens_prob = probs_stack.mean(dim=0)
+         ens_aucs[fi] = roc_auc_score(
+             all_fold_targets[fi].numpy(), ens_prob.numpy())
+
+     single_avgs = [np.mean(list(all_seed_aucs[s].values())) for s in SEEDS]
+     single_mean = np.mean(single_avgs)
+     single_std = np.std(single_avgs)
+     ens_mean = np.mean(list(ens_aucs.values()))
+     ens_std = np.std(list(ens_aucs.values()))
+
+     tt = time.time() - t0
+
+     print(f"""
+ {'='*72}
+  FINAL RESULTS — TRIADS on {dataset_name} (ROCAUC)
+ {'='*72}
+
+  Per-seed results:""")
+     for seed in SEEDS:
+         sm = all_seed_aucs[seed]
+         avg_s = np.mean(list(sm.values()))
+         print(f"   Seed {seed:>4}: {avg_s:.4f} | "
+               f"{[f'{sm[i]:.4f}' for i in range(5)]}")
+
+     print(f"""
+  Single-seed avg: {single_mean:.4f} ± {single_std:.4f}
+  5-Seed Ensemble: {ens_mean:.4f} ± {ens_std:.4f}
+  Per-fold ens:    {[f'{ens_aucs[i]:.4f}' for i in range(5)]}
+
+  {'Model':<40} {'ROCAUC':>10}
+  {'─'*53}
+  {baseline_name:<40} {baseline_auc:>10}
+  {'TRIADS (44K, 5-seed ens)':<40} {f'{ens_mean:.4f}':>10}  ← US
+  {'─'*53}
+
+  Total time: {tt/60:.1f} min
+  Saved: {model_dir}/
+ """)
+
+     summary = {
+         'dataset': dataset_name,
+         'task': 'classification',
+         'metric': 'ROCAUC',
+         'samples': len(comps_all),
+         'class_balance': f'{n_pos} positive / {n_neg} negative',
+         'model_config': MODEL_CFG,
+         'params': n_params,
+         'seeds': SEEDS,
+         'fold_seed': MATBENCH_FOLD_SEED,
+         'per_seed': {str(s): {str(k): round(v, 4) for k, v in m.items()}
+                      for s, m in all_seed_aucs.items()},
+         'single_seed_avg': round(single_mean, 4),
+         'single_seed_std': round(single_std, 4),
+         'ensemble_aucs': {str(k): round(v, 4) for k, v in ens_aucs.items()},
+         'ensemble_avg': round(ens_mean, 4),
+         'ensemble_std': round(ens_std, 4),
+         'total_time_min': round(tt/60, 1),
+     }
+     with open(summary_file, 'w') as f:
+         json.dump(summary, f, indent=2)
+     print(f" Saved: {summary_file}")
+
+     shutil.make_archive(model_dir, 'zip', '.', model_dir)
+     print(f" Saved: {model_dir}.zip")
+
+     return ens_mean
+
+
+ # ======================================================================
+ # MAIN — RUN BOTH SEQUENTIALLY
+ # ======================================================================
+
+ if __name__ == '__main__':
+     device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+     if device.type == 'cuda':
+         gm = torch.cuda.get_device_properties(0).total_memory / 1e9
+         print(f" GPU: {torch.cuda.get_device_name(0)} ({gm:.1f} GB)")
+         torch.backends.cuda.matmul.allow_tf32 = True
+         torch.backends.cudnn.benchmark = True
+
+     print(f"""
+ ╔══════════════════════════════════════════════════════════╗
+ ║  TRIADS Classification Benchmarks                        ║
+ ║  44K model | 5-Seed Ensemble | BCEWithLogitsLoss         ║
+ ║  Fold seed: {MATBENCH_FOLD_SEED} (matbench v0.1 standard)            ║
+ ╠══════════════════════════════════════════════════════════╣
+ ║  1. matbench_expt_is_metal (4,921 samples)               ║
+ ║  2. matbench_glass        (5,680 samples)                ║
+ ╚══════════════════════════════════════════════════════════╝
+ """)
+
+     t_total = time.time()
+     results = {}
+
+     # ── BENCHMARK 1: expt_is_metal ───────────────────────────────────
+     print("\n" + "█"*72)
+     print(" BENCHMARK 1/2: matbench_expt_is_metal")
+     print("█"*72)
+
+     auc1 = run_classification_benchmark(
+         dataset_name="matbench_expt_is_metal",
+         target_col="is_metal",
+         featurizer_cls=MetallicityFeaturizer,
+         model_dir="is_metal_models",
+         summary_file="is_metal_summary.json",
+         baseline_name="AMMExpress v2020",
+         baseline_auc="0.9209",
+         device=device,
+     )
+     results['is_metal'] = auc1
+
+     # ── BENCHMARK 2: glass ───────────────────────────────────────────
+     print("\n" + "█"*72)
+     print(" BENCHMARK 2/2: matbench_glass")
+     print("█"*72)
+
+     auc2 = run_classification_benchmark(
+         dataset_name="matbench_glass",
+         target_col="gfa",
+         featurizer_cls=GlassFeaturizer,
+         model_dir="glass_models",
+         summary_file="glass_summary.json",
+         baseline_name="MODNet v0.1.12",
+         baseline_auc="0.9603",
+         device=device,
+     )
+     results['glass'] = auc2
+
+     # ── COMBINED SUMMARY ─────────────────────────────────────────────
+     tt = time.time() - t_total
+     print(f"""
+
+ {'='*72}
+  COMBINED RESULTS — ALL CLASSIFICATION BENCHMARKS
+ {'='*72}
+
+  {'Dataset':<30} {'Baseline':>10} {'TRIADS':>10}
+  {'─'*53}
+  {'matbench_expt_is_metal':<30} {'0.9209':>10} {f'{auc1:.4f}':>10}
+  {'matbench_glass':<30} {'0.9603':>10} {f'{auc2:.4f}':>10}
+  {'─'*53}
+
+  Grand total time: {tt/60:.1f} min ({tt/3600:.1f} hrs)
+
+  ALL TRIADS BENCHMARKS:
+  ─────────────────────
+  steels:   91.20 MPa (#1-2)
+  expt_gap: 0.3068 eV (#2)
+  jdft2d:   35.89 meV/atom (#3)
+  is_metal: {auc1:.4f} ROCAUC
+  glass:    {auc2:.4f} ROCAUC
+ """)
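The runner above writes one checkpoint per seed-fold pair via `torch.save`. A hypothetical helper (not in the commit; names are illustrative) that enumerates the resulting file layout, using the same f-string pattern as the `torch.save` call:

```python
# Hypothetical helper: enumerate the checkpoint paths written by
# run_classification_benchmark, mirroring its pattern
# '{model_dir}/{dataset_name}_100K_s{seed}_f{fold}.pt'.
def checkpoint_paths(model_dir, dataset, seeds=(42, 123, 456, 789, 1024), n_folds=5):
    return [f"{model_dir}/{dataset}_100K_s{s}_f{f}.pt"
            for s in seeds for f in range(1, n_folds + 1)]

paths = checkpoint_paths("is_metal_models", "matbench_expt_is_metal")
assert len(paths) == 25  # 5 seeds x 5 folds = 25 checkpoints per benchmark
assert paths[0] == "is_metal_models/matbench_expt_is_metal_100K_s42_f1.pt"
```

With the full 5-seed ensemble this yields 25 checkpoints per classification benchmark; inference averages the sigmoid probabilities across seeds within each fold, as the ensemble block above does.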
model_code/expt_gap_model.py ADDED
@@ -0,0 +1,579 @@
1
+ """
2
+ +=============================================================+
3
+ | TRIADS V3 on matbench_expt_gap |
4
+ | 2x T4 GPU Parallel Training (auto-fallback to 1 GPU) |
5
+ | 4 Models: Steps(16,20) x Dropout(0.15,0.20) |
6
+ | Proven arch: d_attn=64, d_hidden=96 | batch_size=64 |
7
+ | FastTensorDataLoader | Clean output |
8
+ +=============================================================+
9
+ """
10
+
11
+ import os, copy, json, time, logging, warnings, urllib.request
12
+ warnings.filterwarnings('ignore')
13
+
14
+ import numpy as np
15
+ import pandas as pd
16
+
17
+ import matplotlib
18
+ matplotlib.use('Agg')
19
+ import matplotlib.pyplot as plt
20
+
21
+ from tqdm import tqdm
22
+
23
+ import torch
24
+ import torch.nn as nn
25
+ import torch.nn.functional as F
26
+ from torch.optim.swa_utils import AveragedModel, SWALR, update_bn
27
+
28
+ from sklearn.model_selection import KFold
29
+ from sklearn.preprocessing import StandardScaler
30
+ from pymatgen.core import Composition
31
+ from matminer.featurizers.composition import ElementProperty
32
+ from gensim.models import Word2Vec
33
+
34
+ logging.basicConfig(level=logging.INFO, format='%(name)s | %(message)s')
35
+ log = logging.getLogger("TRIADS-V3")
36
+
37
+ SEEDS = [42]
38
+ BATCH_SIZE = 64
39
+
40
+ BASELINES = {
41
+ 'Darwin': 0.2865,
42
+ 'Ax/SAASBO CrabNet': 0.3310,
43
+ 'MODNet v0.1.12': 0.3327,
44
+ 'AMMExpress v2020': 0.4161,
45
+ 'CrabNet': 0.4427,
46
+ 'RF-SCM/Magpie': 0.5205,
47
+ 'Dummy': 1.0280,
48
+ }
49
+ V1_BEST = {'EG-A (V1)': 0.3510, 'EG-B (V1)': 0.3616}
50
+
51
+ # Use all available Kaggle CPU cores (4 vCPUs) for PyTorch operations
52
+ torch.set_num_threads(4) # 4 vCPUs on Kaggle
53
+ torch.set_num_interop_threads(2) # 2 physical cores
54
+
55
+
56
+ # ======================================================================
57
+ # FAST TENSOR DATALOADER
58
+ # ======================================================================
59
+
60
+ class FastTensorDataLoader:
61
+ """Zero-CPU DataLoader. Entire dataset in GPU VRAM."""
62
+ def __init__(self, *tensors, batch_size=64, shuffle=False):
63
+ assert all(t.shape[0] == tensors[0].shape[0] for t in tensors)
64
+ self.tensors = tensors
65
+ self.dataset_len = tensors[0].shape[0]
66
+ self.batch_size = batch_size
67
+ self.shuffle = shuffle
68
+ self.n_batches = (self.dataset_len + batch_size - 1) // batch_size
69
+
70
+ def __iter__(self):
71
+ if self.shuffle:
72
+ idx = torch.randperm(self.dataset_len, device=self.tensors[0].device)
73
+ self.tensors = tuple(t[idx] for t in self.tensors)
74
+ self.i = 0
75
+ return self
76
+
77
+ def __next__(self):
78
+ if self.i >= self.dataset_len:
79
+ raise StopIteration
80
+ batch = tuple(t[self.i:self.i + self.batch_size] for t in self.tensors)
81
+ self.i += self.batch_size
82
+ return batch
83
+
84
+ def __len__(self):
85
+ return self.n_batches
86
+
87
+
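The loader above computes its batch count by ceil division and lets tensor slicing produce a short final batch. A torch-free sketch of that same arithmetic (the helper name `batch_sizes` is illustrative, not from the file):

```python
def batch_sizes(n_rows, batch_size):
    # Mirror FastTensorDataLoader: n_batches = ceil(n_rows / batch_size).
    n_batches = (n_rows + batch_size - 1) // batch_size
    # Slicing t[i:i + batch_size] yields a short last batch when
    # n_rows is not a multiple of batch_size.
    sizes = [min(batch_size, n_rows - b * batch_size) for b in range(n_batches)]
    return n_batches, sizes

n_batches, sizes = batch_sizes(150, 64)
# 150 rows at batch_size=64 -> 3 batches of sizes 64, 64, 22
```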
88
+ # ======================================================================
89
+ # FEATURIZER
90
+ # ======================================================================
91
+
92
+ class ExpandedFeaturizer:
93
+ GCS = "https://storage.googleapis.com/mat2vec/"
94
+ FILES = ["pretrained_embeddings",
95
+ "pretrained_embeddings.wv.vectors.npy",
96
+ "pretrained_embeddings.trainables.syn1neg.npy"]
97
+
98
+ def __init__(self, cache="mat2vec_cache"):
99
+ from matminer.featurizers.composition import (
100
+ ElementFraction, Stoichiometry, ValenceOrbital,
101
+ IonProperty, BandCenter
102
+ )
103
+ from matminer.featurizers.base import MultipleFeaturizer
104
+ self.ep_magpie = ElementProperty.from_preset("magpie")
105
+ self.n_mg = len(self.ep_magpie.feature_labels())
106
+ self.extra_feats = MultipleFeaturizer([
107
+ ElementFraction(), Stoichiometry(), ValenceOrbital(),
108
+ IonProperty(), BandCenter(),
109
+ ])
110
+ self.n_extra = None
111
+ self.scaler = None
112
+ os.makedirs(cache, exist_ok=True)
113
+ for f in self.FILES:
114
+ p = os.path.join(cache, f)
115
+ if not os.path.exists(p):
116
+ log.info(f" Downloading {f}...")
117
+ urllib.request.urlretrieve(self.GCS + f, p)
118
+ self.m2v = Word2Vec.load(os.path.join(cache, "pretrained_embeddings"))
119
+ self.emb = {w: self.m2v.wv[w] for w in self.m2v.wv.index_to_key}
120
+
121
+ def _pool(self, c):
122
+ v, t = np.zeros(200, np.float32), 0.0
123
+ for s, f in c.get_el_amt_dict().items():
124
+ if s in self.emb: v += f * self.emb[s]; t += f
125
+ return v / max(t, 1e-8)
126
+
127
+ def featurize_all(self, comps):
128
+ out = []
129
+ for c in tqdm(comps, desc=" Featurizing", leave=False):
130
+ try: mg = np.array(self.ep_magpie.featurize(c), np.float32)
131
+ except: mg = np.zeros(self.n_mg, np.float32)
132
+ try: ex = np.array(self.extra_feats.featurize(c), np.float32)
133
+ except: ex = np.zeros(self.n_extra or 200, np.float32)
134
+ if self.n_extra is None:
135
+ self.n_extra = len(ex)
136
+ log.info(f"Features: {self.n_mg} Magpie + {self.n_extra} Extra + 200 Mat2Vec")
137
+ out.append(np.concatenate([
138
+ np.nan_to_num(mg, nan=0.0),
139
+ np.nan_to_num(ex, nan=0.0),
140
+ self._pool(c)
141
+ ]))
142
+ return np.array(out)
143
+
144
+ def fit_scaler(self, X): self.scaler = StandardScaler().fit(X)
145
+ def transform(self, X):
146
+ if not self.scaler: return X
147
+ return np.nan_to_num(self.scaler.transform(X), nan=0.0).astype(np.float32)
148
+
149
+
150
+ # ======================================================================
151
+ # MODEL — DeepHybridTRM (V13A proven architecture)
152
+ # ======================================================================
153
+
154
+ class DeepHybridTRM(nn.Module):
155
+ def __init__(self, n_props=22, stat_dim=6, n_extra=0, mat2vec_dim=200,
156
+ d_attn=64, nhead=4, d_hidden=96, ff_dim=150,
157
+ dropout=0.2, max_steps=20, **kw):
158
+ super().__init__()
159
+ self.max_steps, self.D = max_steps, d_hidden
160
+ self.n_props, self.stat_dim, self.n_extra = n_props, stat_dim, n_extra
161
+
162
+ self.tok_proj = nn.Sequential(
163
+ nn.Linear(stat_dim, d_attn), nn.LayerNorm(d_attn), nn.GELU())
164
+ self.m2v_proj = nn.Sequential(
165
+ nn.Linear(mat2vec_dim, d_attn), nn.LayerNorm(d_attn), nn.GELU())
166
+
167
+ self.sa1 = nn.MultiheadAttention(d_attn, nhead, dropout=dropout, batch_first=True)
168
+ self.sa1_n = nn.LayerNorm(d_attn)
169
+ self.sa1_ff = nn.Sequential(
170
+ nn.Linear(d_attn, d_attn*2), nn.GELU(), nn.Dropout(dropout),
171
+ nn.Linear(d_attn*2, d_attn))
172
+ self.sa1_fn = nn.LayerNorm(d_attn)
173
+
174
+ self.sa2 = nn.MultiheadAttention(d_attn, nhead, dropout=dropout, batch_first=True)
175
+ self.sa2_n = nn.LayerNorm(d_attn)
176
+ self.sa2_ff = nn.Sequential(
177
+ nn.Linear(d_attn, d_attn*2), nn.GELU(), nn.Dropout(dropout),
178
+ nn.Linear(d_attn*2, d_attn))
179
+ self.sa2_fn = nn.LayerNorm(d_attn)
180
+
181
+ self.ca = nn.MultiheadAttention(d_attn, nhead, dropout=dropout, batch_first=True)
182
+ self.ca_n = nn.LayerNorm(d_attn)
183
+
184
+ pool_in = d_attn + (n_extra if n_extra > 0 else 0)
185
+ self.pool = nn.Sequential(
186
+ nn.Linear(pool_in, d_hidden), nn.LayerNorm(d_hidden), nn.GELU())
187
+
188
+ self.z_up = nn.Sequential(
189
+ nn.Linear(d_hidden*3, ff_dim), nn.GELU(), nn.Dropout(dropout),
190
+ nn.Linear(ff_dim, d_hidden), nn.LayerNorm(d_hidden))
191
+ self.y_up = nn.Sequential(
192
+ nn.Linear(d_hidden*2, ff_dim), nn.GELU(), nn.Dropout(dropout),
193
+ nn.Linear(ff_dim, d_hidden), nn.LayerNorm(d_hidden))
194
+ self.head = nn.Linear(d_hidden, 1)
195
+ self._init()
196
+
197
+ def _init(self):
198
+ for m in self.modules():
199
+ if isinstance(m, nn.Linear):
200
+ nn.init.xavier_uniform_(m.weight)
201
+ if m.bias is not None: nn.init.zeros_(m.bias)
202
+
203
+ def _attention(self, x):
204
+ B = x.size(0)
205
+ mg_dim = self.n_props * self.stat_dim
206
+ if self.n_extra > 0:
207
+ extra = x[:, mg_dim:mg_dim + self.n_extra]
208
+ m2v = x[:, mg_dim + self.n_extra:]
209
+ else:
210
+ extra, m2v = None, x[:, mg_dim:]
211
+
212
+ tok = self.tok_proj(x[:, :mg_dim].view(B, self.n_props, self.stat_dim))
213
+ ctx = self.m2v_proj(m2v).unsqueeze(1)
214
+
215
+ tok = self.sa1_n(tok + self.sa1(tok, tok, tok)[0])
216
+ tok = self.sa1_fn(tok + self.sa1_ff(tok))
217
+ tok = self.sa2_n(tok + self.sa2(tok, tok, tok)[0])
218
+ tok = self.sa2_fn(tok + self.sa2_ff(tok))
219
+ tok = self.ca_n(tok + self.ca(tok, ctx, ctx)[0])
220
+
221
+ pooled = tok.mean(dim=1)
222
+ if extra is not None:
223
+ pooled = torch.cat([pooled, extra], dim=-1)
224
+ return self.pool(pooled)
225
+
226
+ def forward(self, x, deep_supervision=False):
227
+ B = x.size(0)
228
+ xp = self._attention(x)
229
+ z = torch.zeros(B, self.D, device=x.device)
230
+ y = torch.zeros(B, self.D, device=x.device)
231
+ step_preds = []
232
+ for s in range(self.max_steps):
233
+ z = z + self.z_up(torch.cat([xp, y, z], -1))
234
+ y = y + self.y_up(torch.cat([y, z], -1))
235
+ step_preds.append(self.head(y).squeeze(1))
236
+ return step_preds if deep_supervision else step_preds[-1]
237
+
238
+ def count_parameters(self):
239
+ return sum(p.numel() for p in self.parameters() if p.requires_grad)
240
+
241
+
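The core of `forward()` is a two-stream recursion: `z` integrates `(xp, y, z)`, `y` integrates `(y, z)`, and the head reads `y` after every step, so the loop emits one prediction per step for deep supervision. A scalar toy version (the `0.5 * (...)` updates are hypothetical stand-ins for the real MLPs):

```python
def recurse(x, max_steps=4):
    # z integrates (x, y, z); y integrates (y, z); the head reads y each step.
    z, y, preds = 0.0, 0.0, []
    for _ in range(max_steps):
        z = z + 0.5 * (x + y + z)   # stand-in for z_up(cat(xp, y, z))
        y = y + 0.5 * (y + z)       # stand-in for y_up(cat(y, z))
        preds.append(y)             # stand-in for head(y)
    return preds

preds = recurse(1.0, max_steps=4)
# len(preds) == max_steps; the last entry is what forward() returns
# when deep_supervision=False
```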
242
+ # ======================================================================
243
+ # LOSS + UTILS
244
+ # ======================================================================
245
+
246
+ def deep_supervision_loss(step_preds, targets):
247
+ n = len(step_preds)
248
+ weights = [(i+1) for i in range(n)]
249
+ tw = sum(weights)
250
+ return sum((w/tw) * F.l1_loss(p, targets) for p, w in zip(step_preds, weights))
251
+
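`deep_supervision_loss` weights step i (1-based) by i / (1 + 2 + ... + n), so later recursion steps dominate. A small check of that weighting, with plain floats standing in for the per-step L1 terms:

```python
def step_weights(n):
    # Linearly increasing weights, normalized by the sum 1 + 2 + ... + n.
    total = n * (n + 1) // 2
    return [(i + 1) / total for i in range(n)]

w = step_weights(4)
# [0.1, 0.2, 0.3, 0.4]: weights sum to 1 and grow toward the final step
loss = sum(wi * li for wi, li in zip(w, [1.0, 0.8, 0.6, 0.5]))
```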
252
+
253
+ def strat_split(targets, val_size=0.15, seed=42):
254
+ bins = np.percentile(targets, [25, 50, 75])
255
+ lbl = np.digitize(targets, bins)
256
+ tr, vl = [], []
257
+ rng = np.random.RandomState(seed)
258
+ for b in range(4):
259
+ m = np.where(lbl == b)[0]
260
+ if len(m) == 0: continue
261
+ n = max(1, int(len(m) * val_size))
262
+ c = rng.choice(m, n, replace=False)
263
+ vl.extend(c.tolist()); tr.extend(np.setdiff1d(m, c).tolist())
264
+ return np.array(tr), np.array(vl)
265
+
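`strat_split` stratifies on target quartiles and samples `val_size` of each bin for validation. A numpy-free mirror of that logic (the helper `strat_split_sketch` is illustrative only) makes the disjointness and per-bin sampling easy to verify:

```python
import random

def strat_split_sketch(targets, val_size=0.15, seed=42):
    # Quartile cut points, then sample val_size of each bin for validation.
    srt = sorted(targets)
    cuts = [srt[int(q * (len(srt) - 1))] for q in (0.25, 0.5, 0.75)]
    bins = {}
    for i, t in enumerate(targets):
        bins.setdefault(sum(t > c for c in cuts), []).append(i)
    rng = random.Random(seed)
    tr, vl = [], []
    for idxs in bins.values():
        chosen = set(rng.sample(idxs, max(1, int(len(idxs) * val_size))))
        vl += [i for i in idxs if i in chosen]
        tr += [i for i in idxs if i not in chosen]
    return tr, vl

tr, vl = strat_split_sketch(list(range(100)))
# train/val are disjoint and together cover every index
```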
266
+
267
+ def predict(model, dl):
268
+ model.eval(); preds = []
269
+ with torch.no_grad():
270
+ for bx, _ in dl:
271
+ preds.append(model(bx).cpu())
272
+ return torch.cat(preds)
273
+
274
+
275
+ # ======================================================================
276
+ # TRAINING — clean, simple, V1-style
277
+ # ======================================================================
278
+
279
+ def train_fold(model, tr_dl, vl_dl, device,
280
+ epochs=300, swa_start=200, fold=1, name="", gpu_tag=""):
281
+ opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
282
+ sch = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=swa_start, eta_min=1e-4)
283
+ swa_m = AveragedModel(model)
284
+ swa_s = SWALR(opt, swa_lr=5e-4)
285
+ swa_on = False
286
+ best_v, best_w = float('inf'), copy.deepcopy(model.state_dict())
287
+ hist = {'train': [], 'val': []}
288
+ use_amp = (device.type == 'cuda')
289
+ scaler = torch.amp.GradScaler('cuda', enabled=use_amp)
290
+
291
+ pbar = tqdm(range(epochs), desc=f" {gpu_tag}[{name}] F{fold}/5",
292
+ leave=False, ncols=120)
293
+ for ep in pbar:
294
+ model.train(); tl = 0.0
295
+ for bx, by in tr_dl:
296
+ with torch.amp.autocast('cuda', enabled=use_amp):
297
+ sp = model(bx, deep_supervision=True)
298
+ loss = deep_supervision_loss(sp, by)
299
+ opt.zero_grad(set_to_none=True)
300
+ scaler.scale(loss).backward()
301
+ scaler.unscale_(opt)
302
+ torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
303
+ scaler.step(opt)
304
+ scaler.update()
305
+ tl += F.l1_loss(sp[-1], by).item() * len(by)
306
+ tl /= tr_dl.dataset_len
307
+
308
+ model.eval(); vl = 0.0
309
+ with torch.no_grad():
310
+ with torch.amp.autocast('cuda', enabled=use_amp):
311
+ for bx, by in vl_dl:
312
+ vl += F.l1_loss(model(bx), by).item() * len(by)
313
+ vl /= vl_dl.dataset_len
314
+ hist['train'].append(tl); hist['val'].append(vl)
315
+
316
+ if ep < swa_start:
317
+ sch.step()
318
+ if vl < best_v:
319
+ best_v = vl
320
+ best_w = copy.deepcopy(model.state_dict())
321
+ else:
322
+ if not swa_on: swa_on = True
323
+ swa_m.update_parameters(model); swa_s.step()
324
+
325
+ pbar.set_postfix(Best=f'{best_v:.4f}', Ph='SWA' if swa_on else 'COS',
326
+ Tr=f'{tl:.4f}', Val=f'{vl:.4f}')
327
+
328
+ if swa_on:
329
+ update_bn(tr_dl, swa_m, device=device)
330
+ model.load_state_dict(swa_m.module.state_dict())
331
+ else:
332
+ model.load_state_dict(best_w)
333
+ return best_v, model, hist
334
+
335
+
336
+ # ======================================================================
337
+ # GPU WORKER — trains assigned models on one GPU
338
+ # ======================================================================
339
+
340
+ def gpu_worker(gpu_id, config_list, X_all, targets_all, folds, n_extra,
341
+ result_file):
342
+ device = torch.device(f'cuda:{gpu_id}')
343
+ torch.cuda.set_device(gpu_id)
344
+ tag = f"[GPU{gpu_id}] "
345
+
346
+ print(f"\n {tag}Started on {torch.cuda.get_device_name(gpu_id)}")
347
+ print(f" {tag}Models: {[c[0] for c in config_list]}")
348
+
349
+ feat = ExpandedFeaturizer()
350
+ results = {}
351
+
352
+ for ci, (cname, model_kw) in enumerate(config_list):
353
+ print(f"\n {tag}{'='*50}")
354
+ print(f" {tag}[{ci+1}/{len(config_list)}] {cname}")
355
+ print(f" {tag}{'='*50}")
356
+
357
+ seed = SEEDS[0]
358
+ fold_maes = []
359
+
360
+ for fi, (tv_i, te_i) in enumerate(folds):
361
+ print(f"\n {tag}-- [{cname}] Fold {fi+1}/5 " + "-"*20)
362
+
363
+ tri, vli = strat_split(targets_all[tv_i], 0.15, seed + fi)
364
+ feat.fit_scaler(X_all[tv_i][tri])
365
+
366
+ tr_x = torch.tensor(feat.transform(X_all[tv_i][tri]), dtype=torch.float32).to(device)
367
+ tr_y = torch.tensor(targets_all[tv_i][tri], dtype=torch.float32).to(device)
368
+ vl_x = torch.tensor(feat.transform(X_all[tv_i][vli]), dtype=torch.float32).to(device)
369
+ vl_y = torch.tensor(targets_all[tv_i][vli], dtype=torch.float32).to(device)
370
+ te_x = torch.tensor(feat.transform(X_all[te_i]), dtype=torch.float32).to(device)
371
+ te_y = torch.tensor(targets_all[te_i], dtype=torch.float32).to(device)
372
+
373
+ tr_dl = FastTensorDataLoader(tr_x, tr_y, batch_size=BATCH_SIZE, shuffle=True)
374
+ vl_dl = FastTensorDataLoader(vl_x, vl_y, batch_size=BATCH_SIZE, shuffle=False)
375
+ te_dl = FastTensorDataLoader(te_x, te_y, batch_size=BATCH_SIZE, shuffle=False)
376
+
377
+ torch.manual_seed(seed + fi)
378
+ np.random.seed(seed + fi)
379
+ torch.cuda.manual_seed(seed + fi)
380
+
381
+ model = DeepHybridTRM(**model_kw).to(device)
382
+ if fi == 0:
383
+ print(f" {tag}Params: {model.count_parameters():,}")
384
+
385
+ bv, model, hist = train_fold(
386
+ model, tr_dl, vl_dl, device,
387
+ epochs=300, swa_start=200, fold=fi+1, name=cname, gpu_tag=tag)
388
+
389
+ pred = predict(model, te_dl)
390
+ mae = F.l1_loss(pred, te_y.cpu()).item()
391
+ print(f" {tag}Fold {fi+1} TEST: {mae:.4f} eV (val best: {bv:.4f})")
392
+
393
+ fold_maes.append(mae)
394
+ os.makedirs('expt_gap_models_v3', exist_ok=True)
395
+ torch.save({
396
+ 'model_state': model.state_dict(),
397
+ 'test_mae': mae, 'config': cname, 'seed': seed,
398
+ 'fold': fi+1, 'n_extra': n_extra,
399
+ }, f'expt_gap_models_v3/{cname}_s{seed}_f{fi+1}.pt')
400
+
401
+ del model, tr_x, tr_y, vl_x, vl_y, te_x, te_y
402
+ torch.cuda.empty_cache()
403
+
404
+ avg = float(np.mean(fold_maes))
405
+ std = float(np.std(fold_maes))
406
+ results[cname] = {'avg': avg, 'std': std, 'folds': fold_maes}
407
+
408
+ print(f"\n {tag}=== {cname} ===")
409
+ print(f" {tag} 5-Fold Avg MAE: {avg:.4f} +/- {std:.4f} eV")
410
+ print(f" {tag} Per-fold: {[f'{m:.4f}' for m in fold_maes]}")
411
+
412
+ with open(result_file, 'w') as f:
413
+ json.dump(results, f)
414
+ print(f"\n {tag}DONE. Saved to {result_file}")
415
+
416
+
417
+ # ======================================================================
418
+ # MAIN
419
+ # ======================================================================
420
+
421
+ def run_benchmark():
422
+ t0 = time.time()
423
+
424
+ print(f"""
425
+ +==========================================================+
426
+ | TRIADS V3 -- P100 | FastTensorDataLoader |
427
+ | 4 Models: Steps(16,20) x Dropout(0.15,0.20) |
428
+ | d_attn=64, d_hidden=96 (proven V1 arch) |
429
+ | batch_size={BATCH_SIZE} | All CPU cores active |
430
+ +==========================================================+
431
+ """)
432
+
433
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
434
+ if device.type == 'cuda':
435
+ try: gm = torch.cuda.get_device_properties(0).total_memory / 1e9
436
+ except: gm = 0
437
+ print(f" GPU: {torch.cuda.get_device_name(0)} ({gm:.1f} GB)")
438
+ print(f" CPU threads: {torch.get_num_threads()} | Interop: {torch.get_num_interop_threads()}")
439
+ torch.backends.cuda.matmul.allow_tf32 = True
440
+ torch.backends.cudnn.benchmark = True
441
+
442
+ # ---- LOAD + FEATURIZE ----
443
+ print("\n Loading matbench_expt_gap...")
444
+ from matminer.datasets import load_dataset
445
+ df = load_dataset("matbench_expt_gap")
446
+ targets_all = np.array(df['gap expt'].tolist(), np.float32)
447
+ comps_all = [Composition(c) for c in df['composition'].tolist()]
448
+ print(f" Dataset: {len(comps_all)} samples")
449
+
450
+ feat = ExpandedFeaturizer()
451
+ X_all = feat.featurize_all(comps_all)
452
+ n_extra = feat.n_extra
453
+ print(f" Features: {X_all.shape}")
454
+
455
+ kfold = KFold(n_splits=5, shuffle=True, random_state=18012019)
456
+ folds = list(kfold.split(comps_all))
457
+ for fi, (tv, te) in enumerate(folds):
458
+ assert len(set(tv) & set(te)) == 0
459
+ print(" 5 folds verified: zero leakage")
460
+
461
+ # ---- CONFIGS ----
462
+ base = dict(n_props=22, stat_dim=6, n_extra=n_extra, mat2vec_dim=200,
463
+ d_attn=64, nhead=4, d_hidden=96, ff_dim=150)
464
+
465
+ all_configs = [
466
+ ('V3-S16-D15', {**base, 'max_steps': 16, 'dropout': 0.15}),
467
+ ('V3-S16-D20', {**base, 'max_steps': 16, 'dropout': 0.20}),
468
+ ('V3-S20-D15', {**base, 'max_steps': 20, 'dropout': 0.15}),
469
+ ('V3-S20-D20', {**base, 'max_steps': 20, 'dropout': 0.20}),
470
+ ]
471
+
472
+ print(f"\n {'Config':<16} {'Params':>10} {'Steps':>6} {'Drop':>6}")
473
+ for cn, kw in all_configs:
474
+ m = DeepHybridTRM(**kw); print(f" {cn:<16} {m.count_parameters():>10,} {kw['max_steps']:>6} {kw['dropout']:>6.2f}"); del m
475
+
476
+ # ---- TRAIN ----
477
+ all_results = {}
478
+
479
+ for ci, (cname, model_kw) in enumerate(all_configs):
480
+ print(f"\n {'='*60}")
481
+ print(f" [{ci+1}/4] {cname}")
482
+ print(f" {'='*60}")
483
+
484
+ seed = SEEDS[0]
485
+ fold_maes = []
486
+
487
+ for fi, (tv_i, te_i) in enumerate(folds):
488
+ print(f"\n -- [{cname}] Fold {fi+1}/5 " + "-"*30)
489
+ tri, vli = strat_split(targets_all[tv_i], 0.15, seed + fi)
490
+ feat.fit_scaler(X_all[tv_i][tri])
491
+
492
+ tr_x = torch.tensor(feat.transform(X_all[tv_i][tri]), dtype=torch.float32).to(device)
493
+ tr_y = torch.tensor(targets_all[tv_i][tri], dtype=torch.float32).to(device)
494
+ vl_x = torch.tensor(feat.transform(X_all[tv_i][vli]), dtype=torch.float32).to(device)
495
+ vl_y = torch.tensor(targets_all[tv_i][vli], dtype=torch.float32).to(device)
496
+ te_x = torch.tensor(feat.transform(X_all[te_i]), dtype=torch.float32).to(device)
497
+ te_y = torch.tensor(targets_all[te_i], dtype=torch.float32).to(device)
498
+
499
+ tr_dl = FastTensorDataLoader(tr_x, tr_y, batch_size=BATCH_SIZE, shuffle=True)
500
+ vl_dl = FastTensorDataLoader(vl_x, vl_y, batch_size=BATCH_SIZE, shuffle=False)
501
+ te_dl = FastTensorDataLoader(te_x, te_y, batch_size=BATCH_SIZE, shuffle=False)
502
+
503
+ torch.manual_seed(seed + fi); np.random.seed(seed + fi)
504
+ if device.type == 'cuda': torch.cuda.manual_seed(seed + fi)
505
+
506
+ model = DeepHybridTRM(**model_kw).to(device)
507
+ if fi == 0: print(f" Params: {model.count_parameters():,}")
508
+
509
+ bv, model, hist = train_fold(model, tr_dl, vl_dl, device,
510
+ epochs=300, swa_start=200, fold=fi+1, name=cname)
511
+
512
+ pred = predict(model, te_dl)
513
+ mae = F.l1_loss(pred, te_y.cpu()).item()
514
+ print(f" Fold {fi+1} TEST: {mae:.4f} eV (val: {bv:.4f})")
515
+ fold_maes.append(mae)
516
+
517
+ os.makedirs('expt_gap_models_v3', exist_ok=True)
518
+ torch.save({
519
+ 'model_state': model.state_dict(),
520
+ 'test_mae': mae, 'config': cname, 'seed': seed,
521
+ 'fold': fi+1, 'n_extra': n_extra,
522
+ }, f'expt_gap_models_v3/{cname}_s{seed}_f{fi+1}.pt')
523
+
524
+ del model, tr_x, tr_y, vl_x, vl_y, te_x, te_y
525
+ if device.type == 'cuda': torch.cuda.empty_cache()
526
+
527
+ avg = float(np.mean(fold_maes))
528
+ std = float(np.std(fold_maes))
529
+ all_results[cname] = {'avg': avg, 'std': std, 'folds': fold_maes}
530
+ print(f"\n === {cname}: {avg:.4f} +/- {std:.4f} eV ===")
531
+
532
+ # ======== FINAL RESULTS ========
533
+ tt = time.time() - t0
534
+ print(f"\n{'='*72}")
535
+ print(f" FINAL LEADERBOARD -- TRIADS V3 (5-Fold Avg MAE, eV)")
536
+ print(f"{'='*72}")
537
+ print(f" {'Model':<20} {'MAE':>10} {'Std':>8} Notes")
538
+ print(f" {'-'*60}")
539
+
540
+ for n, r in sorted(all_results.items(), key=lambda x: x[1]['avg']):
541
+ tag = (" <-- DARWIN BEATEN!" if r['avg'] < 0.2865 else
542
+ " <-- Top 3!" if r['avg'] < 0.3327 else
543
+ " <-- Beats V1!" if r['avg'] < 0.3510 else
544
+ " <-- Beats AMMExp" if r['avg'] < 0.4161 else "")
545
+ print(f" {n:<20} {r['avg']:>10.4f} {r['std']:>8.4f}{tag}")
546
+
547
+ print(f" {'-'*60}")
548
+ for vn, vm in sorted(V1_BEST.items(), key=lambda x: x[1]):
549
+ print(f" {vn:<20} {vm:>10.4f} (V1)")
550
+ for bn, bv in sorted(BASELINES.items(), key=lambda x: x[1]):
551
+ print(f" {bn:<20} {bv:>10.4f}")
552
+
553
+ # Per-fold
554
+ names = sorted(all_results.keys())
555
+ print(f"\n PER-FOLD:")
556
+ hdr = f" {'Fold':<6}"; [hdr := hdr + f" {cn:>14}" for cn in names]
557
+ print(hdr)
558
+ for fi in range(5):
559
+ row = f" F{fi+1:<5}"; [row := row + f" {all_results[cn]['folds'][fi]:>14.4f}" for cn in names]
560
+ print(row)
561
+
562
+ print(f"\n HP GRID: {'D=0.15':>10} {'D=0.20':>10}")
563
+ for s in [16, 20]:
564
+ d15 = all_results.get(f'V3-S{s}-D15', {}).get('avg', 0)
565
+ d20 = all_results.get(f'V3-S{s}-D20', {}).get('avg', 0)
566
+ print(f" S={s:>2} {d15:>10.4f} {d20:>10.4f}")
567
+
568
+ print(f"\n Total: {tt/60:.1f} min")
569
+
570
+ s = {'version': 'EG-V3', 'batch_size': BATCH_SIZE,
571
+ 'total_min': round(tt/60, 1), 'models': all_results,
572
+ 'baselines': BASELINES, 'v1': V1_BEST}
573
+ with open('expt_gap_summary_v3.json', 'w') as f:
574
+ json.dump(s, f, indent=2)
575
+ print(" Saved: expt_gap_summary_v3.json")
576
+
577
+
578
+ if __name__ == '__main__':
579
+ run_benchmark()
model_code/jdft2d_model.py ADDED
@@ -0,0 +1,589 @@
1
+ """
2
+ +=============================================================+
3
+ | TRIADS V4 on matbench_jdft2d — 5-Seed Ensemble |
4
+ | Exfoliation Energy (meV/atom) — 636 samples |
5
+ | |
6
+ | Structural + Composition features (~361d) |
7
+ | 75K model (d_attn=32, d_hidden=64) | dropout=0.20 |
8
+ | Seeds: [42, 123, 456, 789, 1024] |
9
+ | Target: Kaggle P100 | ~30 min |
10
+ +=============================================================+
11
+ """
12
+
13
+ import os, copy, json, time, logging, warnings, urllib.request, shutil
14
+ warnings.filterwarnings('ignore')
15
+
16
+ import numpy as np
17
+ import pandas as pd
18
+ from tqdm import tqdm
19
+
20
+ import torch
21
+ import torch.nn as nn
22
+ import torch.nn.functional as F
23
+ from torch.optim.swa_utils import AveragedModel, SWALR, update_bn
24
+
25
+ from sklearn.model_selection import KFold
26
+ from sklearn.preprocessing import StandardScaler
27
+ from pymatgen.core import Composition
28
+ from pymatgen.symmetry.analyzer import SpacegroupAnalyzer
29
+ from matminer.featurizers.composition import ElementProperty
30
+ from gensim.models import Word2Vec
31
+
32
+ logging.basicConfig(level=logging.INFO, format='%(name)s | %(message)s')
33
+ log = logging.getLogger("TRIADS-jdft2d")
34
+
35
+ BATCH_SIZE = 64
36
+ SEEDS = [42, 123, 456, 789, 1024]
37
+
38
+ # 75K config — best for 636 samples
39
+ MODEL_CFG = dict(
40
+ d_attn=32, nhead=4, d_hidden=64, ff_dim=96,
41
+ dropout=0.20, max_steps=16,
42
+ )
43
+
44
+ V1_BEST = {'V1 (100K, comp-only)': 45.8045}
45
+ V2_BEST = {'V2 (44K, comp-only)': 46.5889}
46
+ V3_BEST = {'V3 (75K, +struct, single)': 37.0033}
47
+
48
+
49
+ # ======================================================================
50
+ # FAST TENSOR DATALOADER
51
+ # ======================================================================
52
+
53
+ class FastTensorDataLoader:
54
+ def __init__(self, *tensors, batch_size=64, shuffle=False):
55
+ assert all(t.shape[0] == tensors[0].shape[0] for t in tensors)
56
+ self.tensors = tensors
57
+ self.dataset_len = tensors[0].shape[0]
58
+ self.batch_size = batch_size
59
+ self.shuffle = shuffle
60
+ self.n_batches = (self.dataset_len + batch_size - 1) // batch_size
61
+
62
+ def __iter__(self):
63
+ if self.shuffle:
64
+ idx = torch.randperm(self.dataset_len, device=self.tensors[0].device)
65
+ self.tensors = tuple(t[idx] for t in self.tensors)
66
+ self.i = 0
67
+ return self
68
+
69
+ def __next__(self):
70
+ if self.i >= self.dataset_len:
71
+ raise StopIteration
72
+ batch = tuple(t[self.i:self.i + self.batch_size] for t in self.tensors)
73
+ self.i += self.batch_size
74
+ return batch
75
+
76
+ def __len__(self):
77
+ return self.n_batches
78
+
79
+
80
+ # ======================================================================
81
+ # FEATURIZER — Composition + Structural (~361d)
82
+ # ======================================================================
83
+
84
+ def _extract_structural_features(structure):
85
+ feats = []
86
+ try:
87
+ lat = structure.lattice
88
+ feats.extend([lat.a, lat.b, lat.c, lat.alpha, lat.beta, lat.gamma])
89
+ feats.append(structure.volume / max(len(structure), 1))
90
+ feats.append(structure.density)
91
+ feats.append(float(len(structure)))
92
+ try:
93
+ sga = SpacegroupAnalyzer(structure, symprec=0.1)
94
+ feats.append(float(sga.get_space_group_number()))
95
+ except:
96
+ feats.append(0.0)
97
+ try:
98
+ total_vol = sum(
99
+ (4/3) * np.pi * site.specie.atomic_radius**3
100
+ for site in structure if hasattr(site.specie, 'atomic_radius')
101
+ and site.specie.atomic_radius is not None
102
+ )
103
+ feats.append(total_vol / structure.volume if structure.volume > 0 else 0.0)
104
+ except:
105
+ feats.append(0.0)
106
+ except:
107
+ feats = [0.0] * 11
108
+ return np.array(feats, dtype=np.float32)
109
+
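The last structural feature above is an atomic packing fraction: summed atomic sphere volumes (4/3·π·r³) over the cell volume. A standalone check of that formula (radii and cell volume below are made-up numbers, not dataset values):

```python
import math

def packing_fraction(radii, cell_volume):
    # Sum of atomic sphere volumes (4/3 * pi * r^3) over the cell volume,
    # with the same zero-volume guard as the featurizer above.
    spheres = sum((4.0 / 3.0) * math.pi * r ** 3 for r in radii)
    return spheres / cell_volume if cell_volume > 0 else 0.0

pf = packing_fraction([1.0, 1.0], 20.0)
# two unit-radius atoms in a 20 A^3 cell -> roughly 0.419
```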
110
+
111
+ class ExfoliationFeaturizer:
112
+ GCS = "https://storage.googleapis.com/mat2vec/"
113
+ FILES = ["pretrained_embeddings",
114
+ "pretrained_embeddings.wv.vectors.npy",
115
+ "pretrained_embeddings.trainables.syn1neg.npy"]
116
+
117
+ def __init__(self, cache="mat2vec_cache"):
118
+ from matminer.featurizers.composition import (
119
+ Stoichiometry, ValenceOrbital, IonProperty
120
+ )
121
+ from matminer.featurizers.composition.element import TMetalFraction
122
+
123
+ self.ep_magpie = ElementProperty.from_preset("magpie")
124
+ self.n_mg = len(self.ep_magpie.feature_labels())
125
+
126
+ self.extra_featurizers = [
127
+ ("Stoichiometry", Stoichiometry()),
128
+ ("ValenceOrbital", ValenceOrbital()),
129
+ ("IonProperty", IonProperty()),
130
+ ("TMetalFraction", TMetalFraction()),
131
+ ]
132
+
133
+ self._extra_sizes = {}
134
+ for name, ftzr in self.extra_featurizers:
135
+ try: self._extra_sizes[name] = len(ftzr.feature_labels())
136
+ except: self._extra_sizes[name] = None
137
+
138
+ self.n_extra = None
139
+ self.scaler = None
140
+
141
+ os.makedirs(cache, exist_ok=True)
142
+ for f in self.FILES:
143
+ p = os.path.join(cache, f)
144
+ if not os.path.exists(p):
145
+ log.info(f" Downloading {f}...")
146
+ urllib.request.urlretrieve(self.GCS + f, p)
147
+ self.m2v = Word2Vec.load(os.path.join(cache, "pretrained_embeddings"))
148
+ self.emb = {w: self.m2v.wv[w] for w in self.m2v.wv.index_to_key}
149
+
150
+ def _pool(self, c):
151
+ v, t = np.zeros(200, np.float32), 0.0
152
+ for s, f in c.get_el_amt_dict().items():
153
+ if s in self.emb: v += f * self.emb[s]; t += f
154
+ return v / max(t, 1e-8)
155
+
156
+ def _featurize_extra(self, comp, structure=None):
157
+ parts = []
158
+ for name, ftzr in self.extra_featurizers:
159
+ try:
160
+ vals = np.array(ftzr.featurize(comp), np.float32)
161
+ parts.append(np.nan_to_num(vals, nan=0.0))
162
+ if self._extra_sizes.get(name) is None:
163
+ self._extra_sizes[name] = len(vals)
164
+ except:
165
+ sz = self._extra_sizes.get(name, 0) or 1
166
+ parts.append(np.zeros(sz, np.float32))
167
+ if structure is not None:
168
+ parts.append(_extract_structural_features(structure))
169
+ else:
170
+ parts.append(np.zeros(11, np.float32))
171
+ return np.concatenate(parts)
172
+
173
+ def featurize_all(self, comps, structures=None):
174
+ out = []
175
+ test_struct = structures[0] if structures else None
176
+ test_ex = self._featurize_extra(comps[0], test_struct)
177
+ self.n_extra = len(test_ex)
178
+ total = self.n_mg + self.n_extra + 200
179
+ comp_extras = sum(self._extra_sizes.get(n, 0) or 0
180
+ for n, _ in self.extra_featurizers)
181
+ log.info(f"Features: {self.n_mg} Magpie + {comp_extras} CompExtra + "
182
+ f"11 Structural + 200 Mat2Vec = {total}d")
183
+ for i, c in enumerate(tqdm(comps, desc=" Featurizing", leave=False)):
184
+ struct = structures[i] if structures else None
185
+ try: mg = np.array(self.ep_magpie.featurize(c), np.float32)
186
+ except: mg = np.zeros(self.n_mg, np.float32)
187
+ ex = self._featurize_extra(c, struct)
188
+ out.append(np.concatenate([
189
+ np.nan_to_num(mg, nan=0.0),
190
+ np.nan_to_num(ex, nan=0.0),
191
+ self._pool(c)
192
+ ]))
193
+ return np.array(out)
194
+
195
+ def fit_scaler(self, X): self.scaler = StandardScaler().fit(X)
196
+ def transform(self, X):
197
+ if not self.scaler: return X
198
+ return np.nan_to_num(self.scaler.transform(X), nan=0.0).astype(np.float32)
199
+
+
+ # ======================================================================
+ # MODEL
+ # ======================================================================
+
+ class DeepHybridTRM(nn.Module):
+     def __init__(self, n_props=22, stat_dim=6, n_extra=0, mat2vec_dim=200,
+                  d_attn=32, nhead=4, d_hidden=64, ff_dim=96,
+                  dropout=0.15, max_steps=16, **kw):
+         super().__init__()
+         self.max_steps, self.D = max_steps, d_hidden
+         self.n_props, self.stat_dim, self.n_extra = n_props, stat_dim, n_extra
+
+         self.tok_proj = nn.Sequential(
+             nn.Linear(stat_dim, d_attn), nn.LayerNorm(d_attn), nn.GELU())
+         self.m2v_proj = nn.Sequential(
+             nn.Linear(mat2vec_dim, d_attn), nn.LayerNorm(d_attn), nn.GELU())
+
+         self.sa1 = nn.MultiheadAttention(d_attn, nhead, dropout=dropout, batch_first=True)
+         self.sa1_n = nn.LayerNorm(d_attn)
+         self.sa1_ff = nn.Sequential(
+             nn.Linear(d_attn, d_attn*2), nn.GELU(), nn.Dropout(dropout),
+             nn.Linear(d_attn*2, d_attn))
+         self.sa1_fn = nn.LayerNorm(d_attn)
+
+         self.sa2 = nn.MultiheadAttention(d_attn, nhead, dropout=dropout, batch_first=True)
+         self.sa2_n = nn.LayerNorm(d_attn)
+         self.sa2_ff = nn.Sequential(
+             nn.Linear(d_attn, d_attn*2), nn.GELU(), nn.Dropout(dropout),
+             nn.Linear(d_attn*2, d_attn))
+         self.sa2_fn = nn.LayerNorm(d_attn)
+
+         self.ca = nn.MultiheadAttention(d_attn, nhead, dropout=dropout, batch_first=True)
+         self.ca_n = nn.LayerNorm(d_attn)
+
+         pool_in = d_attn + (n_extra if n_extra > 0 else 0)
+         self.pool = nn.Sequential(
+             nn.Linear(pool_in, d_hidden), nn.LayerNorm(d_hidden), nn.GELU())
+
+         self.z_up = nn.Sequential(
+             nn.Linear(d_hidden*3, ff_dim), nn.GELU(), nn.Dropout(dropout),
+             nn.Linear(ff_dim, d_hidden), nn.LayerNorm(d_hidden))
+         self.y_up = nn.Sequential(
+             nn.Linear(d_hidden*2, ff_dim), nn.GELU(), nn.Dropout(dropout),
+             nn.Linear(ff_dim, d_hidden), nn.LayerNorm(d_hidden))
+         self.head = nn.Linear(d_hidden, 1)
+         self._init()
+
+     def _init(self):
+         for m in self.modules():
+             if isinstance(m, nn.Linear):
+                 nn.init.xavier_uniform_(m.weight)
+                 if m.bias is not None: nn.init.zeros_(m.bias)
+
+     def _attention(self, x):
+         B = x.size(0)
+         mg_dim = self.n_props * self.stat_dim
+         if self.n_extra > 0:
+             extra = x[:, mg_dim:mg_dim + self.n_extra]
+             m2v = x[:, mg_dim + self.n_extra:]
+         else:
+             extra, m2v = None, x[:, mg_dim:]
+
+         tok = self.tok_proj(x[:, :mg_dim].view(B, self.n_props, self.stat_dim))
+         ctx = self.m2v_proj(m2v).unsqueeze(1)
+
+         tok = self.sa1_n(tok + self.sa1(tok, tok, tok)[0])
+         tok = self.sa1_fn(tok + self.sa1_ff(tok))
+         tok = self.sa2_n(tok + self.sa2(tok, tok, tok)[0])
+         tok = self.sa2_fn(tok + self.sa2_ff(tok))
+         tok = self.ca_n(tok + self.ca(tok, ctx, ctx)[0])
+
+         pooled = tok.mean(dim=1)
+         if extra is not None:
+             pooled = torch.cat([pooled, extra], dim=-1)
+         return self.pool(pooled)
+
+     def forward(self, x, deep_supervision=False):
+         B = x.size(0)
+         xp = self._attention(x)
+         z = torch.zeros(B, self.D, device=x.device)
+         y = torch.zeros(B, self.D, device=x.device)
+         step_preds = []
+         for s in range(self.max_steps):
+             z = z + self.z_up(torch.cat([xp, y, z], -1))
+             y = y + self.y_up(torch.cat([y, z], -1))
+             step_preds.append(self.head(y).squeeze(1))
+         return step_preds if deep_supervision else step_preds[-1]
+
+     def count_parameters(self):
+         return sum(p.numel() for p in self.parameters() if p.requires_grad)
+
+
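The recursive refinement loop in `DeepHybridTRM.forward` can be sketched in isolation. This is a minimal sketch with toy dimensions and plain `nn.Linear` layers standing in for the real `z_up`/`y_up` MLPs and attention encoder; it shows the control flow only, not the actual model.

```python
import torch
import torch.nn as nn

D, B, STEPS = 8, 4, 3                # toy sizes (hypothetical, not MODEL_CFG)
z_up = nn.Linear(3 * D, D)           # stand-in for the z_up MLP
y_up = nn.Linear(2 * D, D)           # stand-in for the y_up MLP
head = nn.Linear(D, 1)

xp = torch.randn(B, D)               # stand-in for pooled attention features
z = torch.zeros(B, D)                # latent state
y = torch.zeros(B, D)                # answer state
step_preds = []
for _ in range(STEPS):
    z = z + z_up(torch.cat([xp, y, z], dim=-1))  # latent update sees input, answer, latent
    y = y + y_up(torch.cat([y, z], dim=-1))      # answer update sees answer, latent
    step_preds.append(head(y).squeeze(1))        # one scalar prediction per step
```

Every step emits a prediction, which is what `deep_supervision=True` exposes to the loss below.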
293
+ # ======================================================================
+ # LOSS + UTILS
+ # ======================================================================
+
+ def deep_supervision_loss(step_preds, targets):
+     preds = torch.stack(step_preds)
+     n = preds.shape[0]
+     w = torch.arange(1, n + 1, device=preds.device, dtype=preds.dtype)
+     w = w / w.sum()
+     per_step = (preds - targets.unsqueeze(0)).abs().mean(dim=1)
+     return (w * per_step).sum()
+
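A quick numeric sketch of the step weighting in `deep_supervision_loss`: the weights increase linearly so later recursion steps count more, and they are normalized to sum to 1, which keeps the loss on the same scale as a plain MAE. (Toy values, not benchmark data.)

```python
import torch

n = 4
w = torch.arange(1, n + 1, dtype=torch.float32)
w = w / w.sum()                      # [0.1, 0.2, 0.3, 0.4] — later steps weigh more

preds = torch.zeros(n, 5)            # n steps, batch of 5, all predicting 0
targets = torch.ones(5)              # true value 1 everywhere
per_step = (preds - targets.unsqueeze(0)).abs().mean(dim=1)  # MAE of 1.0 per step
loss = (w * per_step).sum()          # weighted sum → still 1.0, the plain MAE
```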
305
+
+ def strat_split(targets, val_size=0.15, seed=42):
+     bins = np.percentile(targets, [25, 50, 75])
+     lbl = np.digitize(targets, bins)
+     tr, vl = [], []
+     rng = np.random.RandomState(seed)
+     for b in range(4):
+         m = np.where(lbl == b)[0]
+         if len(m) == 0: continue
+         n = max(1, int(len(m) * val_size))
+         c = rng.choice(m, n, replace=False)
+         vl.extend(c.tolist()); tr.extend(np.setdiff1d(m, c).tolist())
+     return np.array(tr), np.array(vl)
+
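The quartile-stratified split above can be exercised on synthetic targets to check its two guarantees: every index lands in exactly one of train/val, and each quartile bin contributes to validation. This sketch repeats the same logic on dummy data (values chosen for illustration only):

```python
import numpy as np

targets = np.arange(100, dtype=np.float32)   # dummy regression targets
bins = np.percentile(targets, [25, 50, 75])  # quartile edges
lbl = np.digitize(targets, bins)             # bin label 0..3 per sample
rng = np.random.RandomState(42)
tr, vl = [], []
for b in range(4):
    m = np.where(lbl == b)[0]
    n = max(1, int(len(m) * 0.15))           # 15% of each bin to validation
    c = rng.choice(m, n, replace=False)
    vl.extend(c.tolist()); tr.extend(np.setdiff1d(m, c).tolist())
```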
319
+
+ @torch.inference_mode()
+ def predict(model, dl):
+     model.eval()
+     preds = []
+     for bx, _ in dl:
+         preds.append(model(bx).cpu())
+     return torch.cat(preds)
+
+
+ # ======================================================================
+ # TRAINING
+ # ======================================================================
+
+ def train_fold(model, tr_dl, vl_dl, device,
+                epochs=300, swa_start=200, fold=1, seed=42):
+     opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
+     sch = torch.optim.lr_scheduler.CosineAnnealingLR(
+         opt, T_max=swa_start, eta_min=1e-4)
+     swa_m = AveragedModel(model)
+     swa_s = SWALR(opt, swa_lr=5e-4)
+     swa_on = False
+     best_v, best_w = float('inf'), None
+
+     pbar = tqdm(range(epochs), desc=f" [75K|s{seed}] F{fold}/5",
+                 leave=False, ncols=120)
+     for ep in pbar:
+         model.train()
+         epoch_loss = torch.tensor(0.0, device=device)
+         n_samples = 0
+
+         for bx, by in tr_dl:
+             sp = model(bx, deep_supervision=True)
+             loss = deep_supervision_loss(sp, by)
+             opt.zero_grad(set_to_none=True)
+             loss.backward()
+             torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
+             opt.step()
+             with torch.no_grad():
+                 epoch_loss += (sp[-1] - by).abs().sum()
+                 n_samples += len(by)
+
+         model.eval()
+         val_loss = torch.tensor(0.0, device=device)
+         val_n = 0
+         with torch.inference_mode():
+             for bx, by in vl_dl:
+                 val_loss += (model(bx) - by).abs().sum()
+                 val_n += len(by)
+
+         tl = epoch_loss.item() / n_samples
+         vl = val_loss.item() / val_n
+
+         if ep < swa_start:
+             sch.step()
+             if vl < best_v:
+                 best_v = vl
+                 best_w = copy.deepcopy(model.state_dict())
+         else:
+             if not swa_on: swa_on = True
+             swa_m.update_parameters(model); swa_s.step()
+
+         if ep % 10 == 0 or ep == epochs - 1:
+             pbar.set_postfix(Best=f'{best_v:.2f}', Ph='SWA' if swa_on else 'COS',
+                              Tr=f'{tl:.2f}', Val=f'{vl:.2f}')
+
+     if swa_on:
+         update_bn(tr_dl, swa_m, device=device)
+         model.load_state_dict(swa_m.module.state_dict())
+     else:
+         model.load_state_dict(best_w)
+     return best_v, model
+
392
+
+ # ======================================================================
+ # MAIN — 5-SEED ENSEMBLE
+ # ======================================================================
+
+ def run_benchmark():
+     t0 = time.time()
+
+     print(f"""
+     +==========================================================+
+     | TRIADS V4 — matbench_jdft2d (5-Seed Ensemble)            |
+     | Structural + Composition features (~361d)                |
+     | 75K model | dropout=0.20                                 |
+     | Seeds: {SEEDS}                                           |
+     +==========================================================+
+     """)
+
+     device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+     if device.type == 'cuda':
+         gm = torch.cuda.get_device_properties(0).total_memory / 1e9
+         print(f" GPU: {torch.cuda.get_device_name(0)} ({gm:.1f} GB)")
+         torch.backends.cuda.matmul.allow_tf32 = True
+         torch.backends.cudnn.benchmark = True
+
+     # ── LOAD DATASET ──────────────────────────────────────────────────
+     print("\n Loading matbench_jdft2d...")
+     from matminer.datasets import load_dataset
+     df = load_dataset("matbench_jdft2d")
+     targets_all = np.array(df['exfoliation_en'].tolist(), np.float32)
+     structures_all = df['structure'].tolist()
+     comps_all = [s.composition for s in structures_all]
+     print(f" Dataset: {len(comps_all)} samples")
+
+     # ── FEATURIZE (once) ─────────────────────────────────────────────
+     t_feat = time.time()
+     feat = ExfoliationFeaturizer()
+     X_all = feat.featurize_all(comps_all, structures_all)
+     n_extra = feat.n_extra
+     print(f" Features: {X_all.shape} (n_extra={n_extra})")
+     print(f" Featurization: {time.time()-t_feat:.1f}s")
+
+     # ── FOLDS ────────────────────────────────────────────────────────
+     kfold = KFold(n_splits=5, shuffle=True, random_state=18012019)
+     folds = list(kfold.split(comps_all))
+     for fi, (tv, te) in enumerate(folds):
+         assert len(set(tv) & set(te)) == 0
+     print(" 5 folds verified: zero leakage\n")
439
+
+     # ── MODEL INFO ───────────────────────────────────────────────────
+     model_kw = dict(n_props=22, stat_dim=6, n_extra=n_extra,
+                     mat2vec_dim=200, **MODEL_CFG)
+     test_model = DeepHybridTRM(**model_kw)
+     n_params = test_model.count_parameters()
+     del test_model
+     print(f" Model: {n_params:,} params")
+     print(f" Config: d_attn={MODEL_CFG['d_attn']}, d_hidden={MODEL_CFG['d_hidden']}, "
+           f"ff_dim={MODEL_CFG['ff_dim']}, dropout={MODEL_CFG['dropout']}\n")
+
+     # ── TRAIN ALL SEEDS ──────────────────────────────────────────────
+     model_dir = 'jdft2d_models_v4'
+     os.makedirs(model_dir, exist_ok=True)
+
+     # Store predictions and MAEs per seed
+     all_seed_maes = {}     # {seed: {fold: mae}}
+     all_fold_preds = {}    # {fold: {seed: predictions}}
+     all_fold_targets = {}  # {fold: targets}
+
+     for seed in SEEDS:
+         print(f"\n {'─'*3} Seed {seed} {'─'*40}")
+         t_seed = time.time()
+         seed_maes = {}
+
+         for fi, (tv_i, te_i) in enumerate(folds):
+             tri, vli = strat_split(targets_all[tv_i], 0.15, seed + fi)
+             feat.fit_scaler(X_all[tv_i][tri])
+
+             tr_x = torch.tensor(feat.transform(X_all[tv_i][tri]), dtype=torch.float32).to(device)
+             tr_y = torch.tensor(targets_all[tv_i][tri], dtype=torch.float32).to(device)
+             vl_x = torch.tensor(feat.transform(X_all[tv_i][vli]), dtype=torch.float32).to(device)
+             vl_y = torch.tensor(targets_all[tv_i][vli], dtype=torch.float32).to(device)
+             te_x = torch.tensor(feat.transform(X_all[te_i]), dtype=torch.float32).to(device)
+             te_y = torch.tensor(targets_all[te_i], dtype=torch.float32).to(device)
+
+             tr_dl = FastTensorDataLoader(tr_x, tr_y, batch_size=BATCH_SIZE, shuffle=True)
+             vl_dl = FastTensorDataLoader(vl_x, vl_y, batch_size=BATCH_SIZE, shuffle=False)
+             te_dl = FastTensorDataLoader(te_x, te_y, batch_size=BATCH_SIZE, shuffle=False)
+
+             torch.manual_seed(seed + fi)
+             np.random.seed(seed + fi)
+             if device.type == 'cuda': torch.cuda.manual_seed(seed + fi)
+
+             model = DeepHybridTRM(**model_kw).to(device)
+             bv, model = train_fold(model, tr_dl, vl_dl, device,
+                                    epochs=300, swa_start=200,
+                                    fold=fi+1, seed=seed)
+
+             pred = predict(model, te_dl)
+             mae = F.l1_loss(pred, te_y.cpu()).item()
+             seed_maes[fi] = mae
+
+             # Store for ensemble
+             if fi not in all_fold_preds:
+                 all_fold_preds[fi] = {}
+                 all_fold_targets[fi] = te_y.cpu()
+             all_fold_preds[fi][seed] = pred
+
+             torch.save({
+                 'model_state': model.state_dict(),
+                 'test_mae': mae, 'fold': fi+1, 'seed': seed,
+                 'n_extra': n_extra,
+             }, f'{model_dir}/jdft2d_75K_s{seed}_f{fi+1}.pt')
+
+             del model, tr_x, tr_y, vl_x, vl_y, te_x, te_y
+             if device.type == 'cuda': torch.cuda.empty_cache()
+
+         avg_s = np.mean(list(seed_maes.values()))
+         all_seed_maes[seed] = seed_maes
+         dt = time.time() - t_seed
+         print(f"\n Seed {seed}: avg={avg_s:.4f} | "
+               f"{[f'{seed_maes[i]:.4f}' for i in range(5)]} ({dt:.0f}s)")
+
513
+     # ── ENSEMBLE ─────────────────────────────────────────────────────
+     ens_maes = {}
+     for fi in range(5):
+         preds_stack = torch.stack([all_fold_preds[fi][s] for s in SEEDS])
+         ens_pred = preds_stack.mean(dim=0)
+         ens_maes[fi] = F.l1_loss(ens_pred, all_fold_targets[fi]).item()
+
+     single_avgs = [np.mean(list(all_seed_maes[s].values())) for s in SEEDS]
+     single_mean = np.mean(single_avgs)
+     single_std = np.std(single_avgs)
+     ens_mean = np.mean(list(ens_maes.values()))
+     ens_std = np.std(list(ens_maes.values()))
+     ens_drop = (1 - ens_mean / single_mean) * 100
+
+     # ── RESULTS ──────────────────────────────────────────────────────
+     tt = time.time() - t0
+
+     print(f"""
+     {'='*72}
+     FINAL RESULTS — TRIADS V4 on matbench_jdft2d
+     {'='*72}
+
+     Per-seed results:""")
+
+     for seed in SEEDS:
+         sm = all_seed_maes[seed]
+         avg_s = np.mean(list(sm.values()))
+         print(f" Seed {seed:>4}: {avg_s:.4f} | "
+               f"{[f'{sm[i]:.4f}' for i in range(5)]}")
+
+     print(f"""
+     Single-seed avg: {single_mean:.4f} ± {single_std:.4f}
+     5-Seed Ensemble: {ens_mean:.4f} ± {ens_std:.4f} (↓{ens_drop:.1f}% from single)
+     Per-fold ens:    {[f'{ens_maes[i]:.4f}' for i in range(5)]}
+
+     {'Model':<40} {'MAE(meV/atom)':>15}
+     {'─'*58}
+     {'MODNet v0.1.12':<40} {'33.1918':>15}
+     {'TRIADS V3 (75K, +struct, single)':<40} {'37.0033':>15}
+     {'TRIADS V4 (75K, +struct, 5-seed ens)':<40} {f'{ens_mean:.4f}':>15}  ← NEW
+     {'TRIADS V1 (100K, comp-only)':<40} {'45.8045':>15}
+     {'─'*58}
+
+     Total time: {tt/60:.1f} min
+     Saved: {model_dir}/
+     """)
+
+     # ── SAVE ─────────────────────────────────────────────────────────
+     summary = {
+         'version': 'jdft2d-V4-ensemble',
+         'dataset': 'matbench_jdft2d',
+         'samples': len(comps_all),
+         'target_unit': 'meV/atom',
+         'model_config': MODEL_CFG,
+         'params': n_params,
+         'seeds': SEEDS,
+         'per_seed': {str(s): {str(k): round(v, 4) for k, v in m.items()}
+                      for s, m in all_seed_maes.items()},
+         'single_seed_avg': round(single_mean, 4),
+         'single_seed_std': round(single_std, 4),
+         'ensemble_maes': {str(k): round(v, 4) for k, v in ens_maes.items()},
+         'ensemble_avg': round(ens_mean, 4),
+         'ensemble_std': round(ens_std, 4),
+         'ensemble_improvement': f'{ens_drop:.1f}%',
+         'total_time_min': round(tt/60, 1),
+     }
+     with open('jdft2d_summary_v4.json', 'w') as f:
+         json.dump(summary, f, indent=2)
+     print(" Saved: jdft2d_summary_v4.json")
+
+     # Zip models
+     shutil.make_archive(model_dir, 'zip', '.', model_dir)
+     print(f" Saved: {model_dir}.zip (download this!)")
+
+
+ if __name__ == '__main__':
+     run_benchmark()
model_code/phonons_dataset_builder.py ADDED
@@ -0,0 +1,749 @@
+ """
+ +=============================================================+
+ | V6 Physics-Featurized Phonon Dataset Builder                |
+ | Architecture-Agnostic | Rich Physics | 3-Order Graphs       |
+ |                                                             |
+ | Features per atom: 18d (element physics + coords + local)   |
+ | Features per bond: 8d physics + 40d RBF + 3d direction      |
+ | Order 2 (angles): 8d angle RBF                              |
+ | Order 3 (dihedrals): 8d dihedral RBF                        |
+ | Composition: MAGPIE + mat2vec + matminer extras             |
+ | Global physics: Debye temp, force constants, etc.           |
+ |                                                             |
+ | ⚠ NO SCALING — raw features. Scale at training time only.   |
+ +=============================================================+
+
+ DEPENDENCIES:
+     pip install matminer pymatgen gensim tqdm scikit-learn torch numpy
+
+ USAGE:
+     python build_phonons_v6_dataset.py
+     -> Outputs: phonons_v6_dataset.pt
+ """
+
+ import os, time, math, warnings, urllib.request, logging
+ from collections import defaultdict
+ warnings.filterwarnings('ignore')
+
+ import numpy as np
+ import torch
+ from tqdm import tqdm
+ from sklearn.model_selection import KFold
+
+ logging.basicConfig(level=logging.INFO, format='%(name)s | %(message)s')
+ log = logging.getLogger("V6-BUILD")
+
36
+ # ═══════════════════════════════════════════════════════════════
+ # CONFIGURATION
+ # ═══════════════════════════════════════════════════════════════
+
+ CUTOFF = 8.0
+ MAX_NEIGHBORS = 12
+ N_RBF_DIST = 40
+ N_RBF_ANGLE = 8
+ N_RBF_DIHEDRAL = 8
+ MAX_QUADS = 50000        # cap dihedrals per crystal for memory
+ FOLD_SEED = 18012019     # matbench v0.1 protocol
+ N_FOLDS = 5
+
+ N_ELEM_FEAT = 12         # from lookup table
+ N_ATOM_COMPUTED = 6      # frac_coords(3) + coord_num(1) + avg_nn(1) + std_nn(1)
+ N_ATOM_FEAT = N_ELEM_FEAT + N_ATOM_COMPUTED  # 18
+ N_BOND_PHYSICS = 8
+ N_GLOBAL_PHYS = 15
+
+
56
+ # ═══════════════════════════════════════════════════════════════
+ # GAUSSIAN RADIAL BASIS FUNCTIONS
+ # ═══════════════════════════════════════════════════════════════
+
+ def gaussian_rbf(values, n_bins, vmin, vmax):
+     """Fixed Gaussian expansion. No learnable parameters."""
+     centers = torch.linspace(vmin, vmax, n_bins)
+     gamma = 1.0 / ((vmax - vmin) / n_bins) ** 2
+     return torch.exp(-gamma * (values.unsqueeze(-1) - centers.unsqueeze(0)) ** 2)
+
+
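A small sketch of what the RBF expansion produces: each scalar distance becomes an `n_bins`-dimensional vector that peaks at the nearest Gaussian center. The formula below repeats `gaussian_rbf` verbatim so the example is self-contained; the sample distances are made up.

```python
import torch

def gaussian_rbf(values, n_bins, vmin, vmax):
    centers = torch.linspace(vmin, vmax, n_bins)
    gamma = 1.0 / ((vmax - vmin) / n_bins) ** 2
    return torch.exp(-gamma * (values.unsqueeze(-1) - centers.unsqueeze(0)) ** 2)

d = torch.tensor([0.0, 4.0, 8.0])            # three bond lengths in Å (toy values)
rbf = gaussian_rbf(d, n_bins=40, vmin=0.0, vmax=8.0)
# rbf has shape [3, 40]; the 0 Å bond activates the first center,
# the 8 Å bond the last one.
```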
67
+ # ═══════════════════════════════════════════════════════════════
+ # ELEMENT PHYSICS LOOKUP TABLE
+ # ═══════════════════════════════════════════════════════════════
+
+ def build_element_table():
+     """
+     Build [103, 12] lookup table of per-element physical properties.
+     Z=0 is padding. Uses pymatgen Element data.
+
+     Columns: mass, 1/sqrt(mass), electronegativity, atomic_radius,
+              covalent_radius, ionization_energy, electron_affinity,
+              valence_electrons, group, period, block, is_metal
+     """
+     from pymatgen.core.periodic_table import Element
+
+     block_map = {'s': 0., 'p': 1., 'd': 2., 'f': 3.}
+     table = torch.zeros(103, N_ELEM_FEAT)
+
+     for z in range(1, 103):
+         try:
+             el = Element.from_Z(z)
+             mass = float(el.atomic_mass) if el.atomic_mass else 1.0
+             chi = float(el.X) if el.X is not None else 0.0
+             ar = float(el.atomic_radius) if el.atomic_radius is not None else 1.5
+             # Covalent radius proxy
+             try:
+                 cr = float(el.average_ionic_radius) if el.average_ionic_radius and float(el.average_ionic_radius) > 0 else ar
+             except Exception:
+                 cr = ar
+             # First ionization energy
+             ie = 0.0
+             try:
+                 ies = el.ionization_energies
+                 if isinstance(ies, dict) and 1 in ies and ies[1] is not None:
+                     ie = float(ies[1])
+                 elif isinstance(ies, (list, tuple)) and len(ies) > 1 and ies[1] is not None:
+                     ie = float(ies[1])
+             except Exception:
+                 pass
+             # Electron affinity
+             ea = 0.0
+             try:
+                 if el.electron_affinity is not None:
+                     ea = float(el.electron_affinity)
+             except Exception:
+                 pass
+             # Group, period, valence electrons
+             g = int(el.group) if el.group is not None else 0
+             p = int(el.row) if el.row is not None else 0
+             ve = g if g <= 2 else (g - 10 if g >= 13 else 2)
+             bl = block_map.get(el.block, 0.) if hasattr(el, 'block') and el.block else 0.
+             im = 1.0 if el.is_metal else 0.0
+
+             table[z] = torch.tensor([
+                 mass, 1.0 / math.sqrt(max(mass, 0.01)), chi, ar, cr,
+                 ie, ea, float(ve), float(g), float(p), bl, im
+             ])
+         except Exception:
+             table[z] = torch.tensor([1., 1., 0., 1.5, 1.5, 0., 0., 0., 0., 0., 0., 0.])
+
+     return table
+
129
+
+ # ═══════════════════════════════════════════════════════════════
+ # CRYSTAL GRAPH BUILDER (Orders 1, 2, 3)
+ # ═══════════════════════════════════════════════════════════════
+
+ def _empty_graph(atom_z, atom_features, n_atoms):
+     """Fallback for crystals with no neighbors found."""
+     return {
+         'atom_z': atom_z,
+         'atom_features': atom_features,
+         'n_atoms': n_atoms,
+         'edge_index': torch.zeros(2, 1, dtype=torch.long),
+         'edge_dist': torch.zeros(1),
+         'edge_rbf': torch.zeros(1, N_RBF_DIST),
+         'edge_vec': torch.zeros(1, 3),
+         'edge_physics': torch.zeros(1, N_BOND_PHYSICS),
+         'n_edges': 1,
+         'triplet_index': torch.zeros(2, 0, dtype=torch.long),
+         'angle_rbf': torch.zeros(0, N_RBF_ANGLE),
+         'n_triplets': 0,
+         'quad_index': torch.zeros(2, 0, dtype=torch.long),
+         'dihedral_rbf': torch.zeros(0, N_RBF_DIHEDRAL),
+         'n_quads': 0,
+     }
+
154
+
+ def build_crystal_graph(structure, elem_table):
+     """
+     Build a complete 3-order crystal graph for a single structure.
+
+     Returns dict with atom features, edge features + physics,
+     triplets (angles), and quads (dihedrals).
+
+     ✅ ZERO DATA LEAKAGE: uses ONLY this structure's geometry.
+     """
+     n_atoms = len(structure)
+     atom_z = torch.tensor([site.specie.Z for site in structure], dtype=torch.long)
+
+     # Element lookup features [N, 12]
+     atom_elem_feat = elem_table[atom_z.clamp(0, 102)]
+
+     # Fractional coordinates [N, 3]
+     frac_coords = torch.tensor(
+         [site.frac_coords for site in structure], dtype=torch.float32
+     )
+
+     # ── NEIGHBOR FINDING ──────────────────────────────────────
+     src_list, dst_list, dist_list, vec_list = [], [], [], []
+     nn_dists_per_atom = defaultdict(list)
+
+     try:
+         all_nbrs = structure.get_all_neighbors(CUTOFF)
+         for i, nbrs in enumerate(all_nbrs):
+             nbrs_sorted = sorted(nbrs, key=lambda x: x.nn_distance)[:MAX_NEIGHBORS]
+             for nbr in nbrs_sorted:
+                 src_list.append(i)
+                 dst_list.append(nbr.index)
+                 dist_list.append(nbr.nn_distance)
+                 vec_list.append(nbr.coords - structure[i].coords)
+                 nn_dists_per_atom[i].append(nbr.nn_distance)
+     except Exception as e:
+         log.warning(f" Neighbor finding failed: {e}")
+
+     # Per-atom coordination stats
+     coord_nums = torch.zeros(n_atoms)
+     avg_nn_dists = torch.zeros(n_atoms)
+     std_nn_dists = torch.zeros(n_atoms)
+     for i in range(n_atoms):
+         ds = nn_dists_per_atom.get(i, [])
+         coord_nums[i] = len(ds)
+         if ds:
+             avg_nn_dists[i] = np.mean(ds)
+             std_nn_dists[i] = np.std(ds) if len(ds) > 1 else 0.0
+
+     # Combined atom features [N, 18]
+     atom_features = torch.cat([
+         atom_elem_feat,               # [N, 12]
+         frac_coords,                  # [N, 3]
+         coord_nums.unsqueeze(-1),     # [N, 1]
+         avg_nn_dists.unsqueeze(-1),   # [N, 1]
+         std_nn_dists.unsqueeze(-1),   # [N, 1]
+     ], dim=-1)                        # [N, 18]
+
+     if len(src_list) == 0:
+         return _empty_graph(atom_z, atom_features, n_atoms)
214
+
+     # ── EDGE FEATURES (Order 1) ───────────────────────────────
+     edge_index = torch.tensor([src_list, dst_list], dtype=torch.long)
+     edge_dist = torch.tensor(dist_list, dtype=torch.float32)
+     raw_vecs = torch.tensor(np.array(vec_list), dtype=torch.float32)
+     n_edges = edge_index.shape[1]
+
+     edge_rbf = gaussian_rbf(edge_dist, N_RBF_DIST, 0.0, CUTOFF)
+     norms = raw_vecs.norm(dim=-1, keepdim=True).clamp(min=1e-8)
+     edge_vec = raw_vecs / norms
+
+     # ── BOND PHYSICS FEATURES [E, 8] ─────────────────────────
+     z_src = atom_z[edge_index[0]]  # [E]
+     z_dst = atom_z[edge_index[1]]  # [E]
+
+     m_src = elem_table[z_src.clamp(0, 102), 0]    # mass
+     m_dst = elem_table[z_dst.clamp(0, 102), 0]
+     chi_src = elem_table[z_src.clamp(0, 102), 2]  # electronegativity
+     chi_dst = elem_table[z_dst.clamp(0, 102), 2]
+     r_src = elem_table[z_src.clamp(0, 102), 3]    # atomic radius
+     r_dst = elem_table[z_dst.clamp(0, 102), 3]
+
+     d = edge_dist.clamp(min=0.01)
+
+     # Vectorized bond physics computation
+     chi_prod = (chi_src * chi_dst).clamp(min=0.01)
+     k_est = torch.sqrt(chi_prod) / (d * d)                   # force constant
+     mu = (m_src * m_dst) / (m_src + m_dst).clamp(min=0.01)   # reduced mass
+     omega = torch.sqrt(k_est / mu.clamp(min=0.01))           # Einstein freq
+     delta_chi = (chi_src - chi_dst).abs()                    # EN difference
+     ionicity = delta_chi * delta_chi                         # bond ionicity
+     r_ratio = (r_src + r_dst) / d                            # radius sum ratio
+     m_ratio = torch.min(m_src, m_dst) / torch.max(m_src, m_dst).clamp(min=0.01)
+     inv_d = 1.0 / d                                          # inverse distance
+
+     edge_physics = torch.stack([
+         k_est, mu, omega, delta_chi, ionicity, r_ratio, m_ratio, inv_d
+     ], dim=-1)  # [E, 8]
+
+     # ── TRIPLETS / ANGLES (Order 2) ───────────────────────────
+     dst_np = edge_index[1].numpy()
+     dest_to_edges = defaultdict(list)
+     for e_idx in range(n_edges):
+         dest_to_edges[int(dst_np[e_idx])].append(e_idx)
+
+     trip_ij, trip_kj = [], []
+     for j, edge_list in dest_to_edges.items():
+         for idx_ij in edge_list:
+             for idx_kj in edge_list:
+                 if idx_ij != idx_kj:
+                     trip_ij.append(idx_ij)
+                     trip_kj.append(idx_kj)
+
+     if trip_ij:
+         triplet_index = torch.tensor([trip_ij, trip_kj], dtype=torch.long)
+         v_ij = edge_vec[triplet_index[0]]
+         v_kj = edge_vec[triplet_index[1]]
+         cos_theta = (v_ij * v_kj).sum(-1).clamp(-1 + 1e-7, 1 - 1e-7)
+         angles = torch.acos(cos_theta)
+         angle_rbf_t = gaussian_rbf(angles, N_RBF_ANGLE, 0.0, math.pi)
+         n_triplets = triplet_index.shape[1]
+     else:
+         triplet_index = torch.zeros(2, 0, dtype=torch.long)
+         angle_rbf_t = torch.zeros(0, N_RBF_ANGLE)
+         n_triplets = 0
+
+     # ── QUADS / DIHEDRALS (Order 3) ───────────────────────────
+     quad_index, dihedral_rbf_t, n_quads = _compute_quads(
+         triplet_index, n_triplets, edge_vec, trip_ij, trip_kj
+     )
+
+     return {
+         'atom_z': atom_z,
+         'atom_features': atom_features,
+         'n_atoms': n_atoms,
+         'edge_index': edge_index,
+         'edge_dist': edge_dist,
+         'edge_rbf': edge_rbf,
+         'edge_vec': edge_vec,
+         'edge_physics': edge_physics,
+         'n_edges': n_edges,
+         'triplet_index': triplet_index,
+         'angle_rbf': angle_rbf_t,
+         'n_triplets': n_triplets,
+         'quad_index': quad_index,
+         'dihedral_rbf': dihedral_rbf_t,
+         'n_quads': n_quads,
+     }
302
+
+
+ def _compute_quads(triplet_index, n_triplets, edge_vec, trip_ij, trip_kj):
+     """Compute Order 3: pairs of triplets sharing a bond (dihedrals)."""
+     if n_triplets == 0:
+         return (torch.zeros(2, 0, dtype=torch.long),
+                 torch.zeros(0, N_RBF_DIHEDRAL), 0)
+
+     # For each edge, which triplets reference it?
+     edge_to_trips = defaultdict(list)
+     for t_idx in range(n_triplets):
+         edge_to_trips[trip_ij[t_idx]].append(t_idx)
+         edge_to_trips[trip_kj[t_idx]].append(t_idx)
+
+     quad_src, quad_dst = [], []
+     for edge_idx, tlist in edge_to_trips.items():
+         for i in range(len(tlist)):
+             for j in range(len(tlist)):
+                 if tlist[i] != tlist[j]:
+                     quad_src.append(tlist[i])
+                     quad_dst.append(tlist[j])
+                 if len(quad_src) >= MAX_QUADS:
+                     break
+             if len(quad_src) >= MAX_QUADS:
+                 break
+         if len(quad_src) >= MAX_QUADS:
+             break
+
+     if not quad_src:
+         return (torch.zeros(2, 0, dtype=torch.long),
+                 torch.zeros(0, N_RBF_DIHEDRAL), 0)
+
+     quad_index = torch.tensor([quad_src, quad_dst], dtype=torch.long)
+
+     # Dihedral angle = angle between planes of the two triplets
+     v_a1 = edge_vec[triplet_index[0, quad_index[0]]]
+     v_a2 = edge_vec[triplet_index[1, quad_index[0]]]
+     v_b1 = edge_vec[triplet_index[0, quad_index[1]]]
+     v_b2 = edge_vec[triplet_index[1, quad_index[1]]]
+
+     n_a = torch.cross(v_a1, v_a2, dim=-1)
+     n_b = torch.cross(v_b1, v_b2, dim=-1)
+     n_a = n_a / n_a.norm(dim=-1, keepdim=True).clamp(min=1e-8)
+     n_b = n_b / n_b.norm(dim=-1, keepdim=True).clamp(min=1e-8)
+
+     cos_dih = (n_a * n_b).sum(-1).clamp(-1 + 1e-7, 1 - 1e-7)
+     dihedrals = torch.acos(cos_dih)
+     dihedral_rbf_t = gaussian_rbf(dihedrals, N_RBF_DIHEDRAL, 0.0, math.pi)
+
+     return quad_index, dihedral_rbf_t, quad_index.shape[1]
+
+
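The per-bond physics proxies in `build_crystal_graph` are simple closed forms, so they can be checked by hand. This sketch repeats the three formulas on made-up scalar inputs (a symmetric light diatomic; the numbers are illustrative, not real element data):

```python
import math

def bond_physics(m1, m2, chi1, chi2, d):
    """Hypothetical scalar version of the vectorized bond physics above."""
    k = math.sqrt(max(chi1 * chi2, 0.01)) / (d * d)   # force constant proxy
    mu = m1 * m2 / max(m1 + m2, 0.01)                 # reduced mass
    omega = math.sqrt(k / max(mu, 0.01))              # Einstein frequency estimate
    return k, mu, omega

# Equal masses of 1 and equal electronegativity 2.2 at 1 Å separation:
# k = sqrt(2.2 * 2.2) / 1 = 2.2, mu = 0.5, omega = sqrt(2.2 / 0.5)
k, mu, omega = bond_physics(m1=1.0, m2=1.0, chi1=2.2, chi2=2.2, d=1.0)
```

Heavier atoms lower `mu` relative to the masses and so lower `omega`, which is the qualitative trend these features hand to the model.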
354
+ # ═══════════════════════════════════════════════════════════════
+ # GLOBAL PHYSICS FEATURES (per crystal)
+ # ═══════════════════════════════════════════════════════════════
+
+ def compute_global_physics(graph, structure, elem_table):
+     """
+     Compute 15 global physics features from a crystal graph.
+
+     Features:
+         0: avg_force_constant      7: avg_coordination
+         1: std_force_constant      8: density
+         2: avg_reduced_mass        9: volume_per_atom
+         3: mass_variance          10: packing_fraction
+         4: avg_einstein_freq      11: avg_bond_length
+         5: electronegativity_var  12: std_bond_length
+         6: debye_temp_estimate    13: max_atomic_mass
+                                   14: min_atomic_mass
+     """
+     ep = graph['edge_physics']  # [E, 8]
+     n_atoms = graph['n_atoms']
+     atom_z = graph['atom_z']
+
+     # From bond physics
+     k_vals = ep[:, 0]      # force constants
+     mu_vals = ep[:, 1]     # reduced masses
+     omega_vals = ep[:, 2]  # Einstein frequencies
+     dists = graph['edge_dist']
+
+     feats = torch.zeros(N_GLOBAL_PHYS)
+
+     if graph['n_edges'] > 0 and dists.shape[0] > 0:
+         feats[0] = k_vals.mean()
+         feats[1] = k_vals.std() if k_vals.shape[0] > 1 else 0.0
+         feats[2] = mu_vals.mean()
+         feats[4] = omega_vals.mean()
+         feats[11] = dists.mean()
+         feats[12] = dists.std() if dists.shape[0] > 1 else 0.0
+
+     # Mass statistics
+     masses = elem_table[atom_z.clamp(0, 102), 0]
+     feats[3] = masses.var() if n_atoms > 1 else 0.0
+     feats[13] = masses.max()
+     feats[14] = masses.min()
+
+     # Electronegativity variance
+     chis = elem_table[atom_z.clamp(0, 102), 2]
+     feats[5] = chis.var() if n_atoms > 1 else 0.0
+
+     # Debye temperature estimate: Θ_D ∝ sqrt(k_avg / m_avg)
+     m_avg = masses.mean()
+     k_avg = feats[0]
+     feats[6] = math.sqrt(float(k_avg / max(m_avg, 0.01)))
+
+     # Coordination
+     feats[7] = graph['atom_features'][:, N_ELEM_FEAT + 3].mean()  # coord_num column
+
+     # Structural
+     try:
+         feats[8] = structure.density
+         feats[9] = structure.volume / max(n_atoms, 1)
+         # Packing fraction
+         total_vol = sum(
+             (4 / 3) * math.pi * (float(site.specie.atomic_radius) ** 3)
+             for site in structure
+             if hasattr(site.specie, 'atomic_radius') and site.specie.atomic_radius is not None
+         )
+         feats[10] = total_vol / structure.volume if structure.volume > 0 else 0.0
+     except Exception:
+         pass
+
+     return feats
+
426
+
427
+ # ═══════════════════════════════════════════════════════════════
428
+ # STRUCTURAL FEATURES (per crystal)
429
+ # ═══════════════════════════════════════════════════════════════
430
+
431
+ def compute_structural_features(structure):
432
+ """
433
+ Compute 11 structural features: lattice params + symmetry.
434
+ Same as previous versions for backward compatibility.
435
+ """
436
+ from pymatgen.symmetry.analyzer import SpacegroupAnalyzer
437
+
438
+ feats = np.zeros(11, dtype=np.float32)
439
+ try:
440
+ lat = structure.lattice
441
+ feats[0:6] = [lat.a, lat.b, lat.c, lat.alpha, lat.beta, lat.gamma]
442
+ feats[6] = structure.volume / max(len(structure), 1)
443
+ feats[7] = structure.density
444
+ feats[8] = float(len(structure))
445
+ try:
446
+ sga = SpacegroupAnalyzer(structure, symprec=0.1)
447
+ feats[9] = float(sga.get_space_group_number())
448
+ except Exception:
449
+ feats[9] = 0.0
450
+ try:
451
+ total_vol = sum(
452
+ (4 / 3) * np.pi * site.specie.atomic_radius ** 3
453
+ for site in structure
454
+ if hasattr(site.specie, 'atomic_radius') and site.specie.atomic_radius is not None
455
+ )
456
+ feats[10] = total_vol / structure.volume if structure.volume > 0 else 0.0
457
+ except Exception:
458
+ feats[10] = 0.0
459
+ except Exception:
460
+ pass
461
+ return feats
462
+
463
+
464
+ # ═══════════════════════════════════════════════════════════════
465
+ # COMPOSITION FEATURIZER (MAGPIE + mat2vec + matminer extras)
466
+ # ═══════════════════════════════════════════════════════════════
467
+
468
+ class CompositionFeaturizer:
469
+ """
470
+ Builds rich composition features per crystal:
471
+ - MAGPIE elemental properties (132d: 22 props × 6 stats)
472
+ - Extra matminer (Stoichiometry, ValenceOrbital, IonProperty, TMetalFraction)
473
+ - Structural features (11d)
474
+ - mat2vec embeddings (200d)
475
+
476
+ ✅ ALL features are deterministic per-sample. No cross-sample info.
477
+ """
478
+ M2V_URL = "https://storage.googleapis.com/mat2vec/"
479
+ M2V_FILES = [
480
+ "pretrained_embeddings",
481
+ "pretrained_embeddings.wv.vectors.npy",
482
+ "pretrained_embeddings.trainables.syn1neg.npy",
483
+ ]
484
+
485
+ def __init__(self, cache="mat2vec_cache"):
486
+ from matminer.featurizers.composition import (
487
+ ElementProperty, Stoichiometry, ValenceOrbital, IonProperty
488
+ )
489
+ from matminer.featurizers.composition.element import TMetalFraction
490
+ from gensim.models import Word2Vec
491
+
492
+ self.ep_magpie = ElementProperty.from_preset("magpie")
493
+ self.n_magpie = len(self.ep_magpie.feature_labels())
494
+
495
+ self.extra_ftzrs = [
496
+ ("Stoichiometry", Stoichiometry()),
497
+ ("ValenceOrbital", ValenceOrbital()),
498
+ ("IonProperty", IonProperty()),
499
+ ("TMetalFraction", TMetalFraction()),
500
+ ]
501
+ self._extra_sizes = {}
502
+ for name, ft in self.extra_ftzrs:
503
+ try:
504
+ self._extra_sizes[name] = len(ft.feature_labels())
505
+ except Exception:
506
+ self._extra_sizes[name] = None
507
+
508
+ # Download mat2vec
509
+ os.makedirs(cache, exist_ok=True)
510
+ for f in self.M2V_FILES:
511
+ p = os.path.join(cache, f)
512
+ if not os.path.exists(p):
513
+ log.info(f" Downloading mat2vec: {f}...")
514
+ urllib.request.urlretrieve(self.M2V_URL + f, p)
515
+ m2v = Word2Vec.load(os.path.join(cache, "pretrained_embeddings"))
516
+ self.emb = {w: m2v.wv[w] for w in m2v.wv.index_to_key}
517
+
518
+ self.n_extra = None # determined on first call
519
+
520
+ def _pool_m2v(self, comp):
521
+ v, t = np.zeros(200, np.float32), 0.0
522
+ for s, f in comp.get_el_amt_dict().items():
523
+ if s in self.emb:
524
+ v += f * self.emb[s]
525
+ t += f
526
+ return v / max(t, 1e-8)
527
+
528
+ def _featurize_extras(self, comp):
529
+ parts = []
530
+ for name, ft in self.extra_ftzrs:
531
+ try:
532
+ vals = np.array(ft.featurize(comp), np.float32)
533
+ parts.append(np.nan_to_num(vals, nan=0.0))
534
+ if self._extra_sizes.get(name) is None:
535
+ self._extra_sizes[name] = len(vals)
536
+ except Exception:
537
+ sz = self._extra_sizes.get(name, 0) or 1
538
+ parts.append(np.zeros(sz, np.float32))
539
+ return np.concatenate(parts)
540
+
541
+ def featurize_all(self, compositions, structures):
542
+ """Return [N, D_comp] array of all composition features."""
543
+ # Determine dimensions from first sample
544
+ test_extras = self._featurize_extras(compositions[0])
545
+ self.n_extra = len(test_extras)
546
+ struct_feats_dim = 11
547
+ total_dim = self.n_magpie + self.n_extra + struct_feats_dim + 200
548
+
549
+ log.info(f" Composition features: {self.n_magpie} MAGPIE + "
550
+ f"{self.n_extra} Extras + 11 Structural + 200 mat2vec = {total_dim}d")
551
+
552
+ out = []
553
+ for i, comp in enumerate(tqdm(compositions, desc=" Featurizing compositions", leave=False)):
554
+ # MAGPIE
555
+ try:
556
+ mg = np.array(self.ep_magpie.featurize(comp), np.float32)
557
+ except Exception:
558
+ mg = np.zeros(self.n_magpie, np.float32)
559
+ mg = np.nan_to_num(mg, nan=0.0)
560
+
561
+ # Extra matminer
562
+ ex = self._featurize_extras(comp)
563
+
564
+ # Structural
565
+ sf = compute_structural_features(structures[i])
566
+
567
+ # mat2vec
568
+ m2v = self._pool_m2v(comp)
569
+
570
+ out.append(np.concatenate([mg, ex, sf, m2v]))
571
+
572
+ return np.array(out, dtype=np.float32)
573
+
574
+
575
+ # ═══════════════════════════════════════════════════════════════
576
+ # MAIN — BUILD AND SAVE
577
+ # ═══════════════════════════════════════════════════════════════
578
+
579
+ def main():
580
+ t0 = time.time()
581
+ print("""
582
+ +==========================================================+
583
+ | V6 Physics-Featurized Phonon Dataset Builder |
584
+ | 3-Order Graphs | Bond Physics | Architecture-Agnostic |
585
+ | ⚠ NO SCALING — raw features only |
586
+ +==========================================================+
587
+ """)
588
+
589
+ # ── LOAD MATBENCH DATA ────────────────────────────────────
590
+ print(" Loading matbench_phonons...")
591
+ from matminer.datasets import load_dataset
592
+ df = load_dataset("matbench_phonons")
593
+ targets = np.array(df['last phdos peak'].tolist(), np.float32)
594
+ structures = df['structure'].tolist()
595
+ compositions = [s.composition for s in structures]
596
+ N = len(structures)
597
+ print(f" Loaded: {N} samples")
598
+ print(f" Target range: {targets.min():.1f} – {targets.max():.1f} cm⁻¹")
599
+
600
+ # ── BUILD ELEMENT TABLE ───────────────────────────────────
601
+ print("\n Building element physics table...")
602
+ elem_table = build_element_table()
603
+ print(f" Element table: {elem_table.shape} (Z=0..102, {N_ELEM_FEAT} features)")
604
+
605
+ # ── BUILD CRYSTAL GRAPHS ─────────────────────────────────
606
+ print(f"\n Building 3-order crystal graphs ({MAX_NEIGHBORS}-NN, cutoff={CUTOFF}Å)...")
607
+ graphs = []
608
+ global_physics_list = []
609
+
610
+ for i, struct in enumerate(tqdm(structures, desc=" Building graphs")):
611
+ g = build_crystal_graph(struct, elem_table)
612
+ gp = compute_global_physics(g, struct, elem_table)
613
+ graphs.append(g)
614
+ global_physics_list.append(gp)
615
+
616
+ # Stats
617
+ n_atoms_list = [g['n_atoms'] for g in graphs]
618
+ n_edges_list = [g['n_edges'] for g in graphs]
619
+ n_trips_list = [g['n_triplets'] for g in graphs]
620
+ n_quads_list = [g['n_quads'] for g in graphs]
621
+ print(f" Graphs built:")
622
+ print(f" Atoms/crystal: min={min(n_atoms_list)}, max={max(n_atoms_list)}, "
623
+ f"mean={np.mean(n_atoms_list):.1f}")
624
+ print(f" Edges/crystal: min={min(n_edges_list)}, max={max(n_edges_list)}, "
625
+ f"mean={np.mean(n_edges_list):.1f}")
626
+ print(f" Triplets/crystal: min={min(n_trips_list)}, max={max(n_trips_list)}, "
627
+ f"mean={np.mean(n_trips_list):.1f}")
628
+ print(f" Quads/crystal: min={min(n_quads_list)}, max={max(n_quads_list)}, "
629
+ f"mean={np.mean(n_quads_list):.1f}")
630
+
631
+ global_physics = torch.stack(global_physics_list)
632
+ print(f" Global physics: {global_physics.shape}")
633
+
634
+ # ── COMPOSITION FEATURES ─────────────────────────────────
635
+ print("\n Computing composition features...")
636
+ feat = CompositionFeaturizer()
637
+ comp_features = feat.featurize_all(compositions, structures)
638
+ print(f" Composition features shape: {comp_features.shape}")
639
+
640
+ # ── FOLD INDICES (strict matbench protocol) ──────────────
641
+ print(f"\n Computing 5-fold split indices (seed={FOLD_SEED})...")
642
+ kf = KFold(N_FOLDS, shuffle=True, random_state=FOLD_SEED)
643
+ fold_indices = [(train_idx.tolist(), test_idx.tolist())
644
+ for train_idx, test_idx in kf.split(range(N))]
645
+
646
+ # Verify zero leakage
647
+ for fi, (tr, te) in enumerate(fold_indices):
648
+ overlap = set(tr) & set(te)
649
+ assert len(overlap) == 0, f"DATA LEAK in fold {fi}: {len(overlap)} shared indices!"
650
+ assert len(tr) + len(te) == N, f"Fold {fi}: missing samples!"
651
+ print(" ✅ All folds verified: ZERO data leakage")
652
+
653
+ # ── FEATURE DIMENSION INFO ───────────────────────────────
654
+ n_magpie = feat.n_magpie
655
+ n_extra = feat.n_extra
656
+ feature_info = {
657
+ 'atom_features_dim': N_ATOM_FEAT,
658
+ 'atom_features_layout': [
659
+ 'mass', '1/sqrt_mass', 'electronegativity', 'atomic_radius',
660
+ 'covalent_radius', 'ionization_energy', 'electron_affinity',
661
+ 'valence_electrons', 'group', 'period', 'block', 'is_metal',
662
+ 'frac_x', 'frac_y', 'frac_z',
663
+ 'coordination_num', 'avg_nn_dist', 'std_nn_dist',
664
+ ],
665
+ 'edge_physics_dim': N_BOND_PHYSICS,
666
+ 'edge_physics_layout': [
667
+ 'force_constant', 'reduced_mass', 'einstein_freq',
668
+ 'en_difference', 'ionicity', 'radius_sum_ratio',
669
+ 'mass_ratio', 'inverse_distance',
670
+ ],
671
+ 'edge_rbf_dim': N_RBF_DIST,
672
+ 'angle_rbf_dim': N_RBF_ANGLE,
673
+ 'dihedral_rbf_dim': N_RBF_DIHEDRAL,
674
+ 'global_physics_dim': N_GLOBAL_PHYS,
675
+ 'global_physics_layout': [
676
+ 'avg_force_constant', 'std_force_constant', 'avg_reduced_mass',
677
+ 'mass_variance', 'avg_einstein_freq', 'en_variance',
678
+ 'debye_temp_estimate', 'avg_coordination', 'density',
679
+ 'volume_per_atom', 'packing_fraction', 'avg_bond_length',
680
+ 'std_bond_length', 'max_atomic_mass', 'min_atomic_mass',
681
+ ],
682
+ 'comp_magpie_range': (0, n_magpie),
683
+ 'comp_extras_range': (n_magpie, n_magpie + n_extra),
684
+ 'comp_structural_range': (n_magpie + n_extra, n_magpie + n_extra + 11),
685
+ 'comp_mat2vec_range': (n_magpie + n_extra + 11, n_magpie + n_extra + 11 + 200),
686
+ 'comp_total_dim': comp_features.shape[1],
687
+ }
688
+
689
+ # ── SAVE ─────────────────────────────────────────────────
690
+ save_path = "phonons_v6_dataset.pt"
691
+ save_data = {
692
+ # Per-crystal data
693
+ 'graphs': graphs,
694
+ 'comp_features': torch.tensor(comp_features, dtype=torch.float32),
695
+ 'global_physics': global_physics,
696
+ 'targets': torch.tensor(targets, dtype=torch.float32),
697
+
698
+ # Fold indices
699
+ 'fold_indices': fold_indices,
700
+ 'fold_seed': FOLD_SEED,
701
+
702
+ # Metadata
703
+ 'n_samples': N,
704
+ 'feature_info': feature_info,
705
+ 'element_table': elem_table,
706
+ 'config': {
707
+ 'cutoff': CUTOFF,
708
+ 'max_neighbors': MAX_NEIGHBORS,
709
+ 'n_rbf_dist': N_RBF_DIST,
710
+ 'n_rbf_angle': N_RBF_ANGLE,
711
+ 'n_rbf_dihedral': N_RBF_DIHEDRAL,
712
+ 'max_quads': MAX_QUADS,
713
+ 'fold_seed': FOLD_SEED,
714
+ 'n_folds': N_FOLDS,
715
+ },
716
+ }
717
+ torch.save(save_data, save_path)
718
+
719
+ size_mb = os.path.getsize(save_path) / 1e6
720
+ dt = time.time() - t0
721
+ print(f"\n ✅ Saved: {save_path} ({size_mb:.1f} MB)")
722
+ print(f" Total time: {dt:.1f}s")
723
+
724
+ # ── SUMMARY ──────────────────────────────────────────────
725
+ print(f"""
726
+ ╔══════════════════════════════════════════════════════════╗
727
+ ║ Dataset Summary ║
728
+ ╠══════════════════════════════════════════════════════════╣
729
+ ║ Samples: {N:>6} ║
730
+ ║ Atom features: {N_ATOM_FEAT:>6}d (12 elem + 3 coord + 3 local) ║
731
+ ║ Bond RBF: {N_RBF_DIST:>6}d ║
732
+ ║ Bond physics: {N_BOND_PHYSICS:>6}d (k, μ, ω, Δχ, ...) ║
733
+ ║ Angle RBF: {N_RBF_ANGLE:>6}d ║
734
+ ║ Dihedral RBF: {N_RBF_DIHEDRAL:>6}d ║
735
+ ║ Composition: {comp_features.shape[1]:>6}d (MAGPIE+extras+struct+m2v)║
736
+ ║ Global physics: {N_GLOBAL_PHYS:>6}d ║
737
+ ║ Folds: {N_FOLDS:>6} (seed={FOLD_SEED}) ║
738
+ ║ File size: {size_mb:>5.1f} MB ║
739
+ ╚══════════════════════════════════════════════════════════╝
740
+
741
+ ⚠ Remember: NO scaling applied. Apply StandardScaler at
742
+ training time using ONLY train-fold indices!
743
+
744
+ Architecture-agnostic: plug ANY model on top of this dataset.
745
+ """)
746
+
747
+
748
+ if __name__ == '__main__':
749
+ main()
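As the builder's closing banner warns, no scaling is baked into the saved dataset: the scaler must be fit per fold on train indices only. A minimal leakage-free sketch (the helper name and the toy array are illustrative, not part of the saved dataset API):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

def scale_fold(features, train_idx, test_idx):
    """Fit the scaler on the train fold only, then transform both splits.

    `features` is any [N, D] array (e.g. the saved comp_features);
    fitting on train_idx alone keeps test statistics out of training.
    """
    scaler = StandardScaler().fit(features[train_idx])
    return scaler.transform(features[train_idx]), scaler.transform(features[test_idx])

# Toy usage with random data standing in for the saved features
X = np.random.RandomState(0).randn(10, 4)
Xtr, Xte = scale_fold(X, np.arange(8), np.arange(8, 10))
```

The test set is transformed with train-fold statistics, so its columns are generally not zero-mean: that asymmetry is exactly what the strict matbench protocol requires.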
model_code/phonons_model.py ADDED
@@ -0,0 +1,839 @@
1
+ """
2
+ +=============================================================+
3
+ | TRIADS V6 — Graph Attention TRM + Gate-Based Halting |
4
+ | |
5
+ | Single model: Gate-halt (4-16 adaptive cycles) |
6
+ | d=56, 4 heads, gated residuals, deep supervision |
7
+ | SWA last 50 ep | 200 epochs |
8
+ | |
9
+ | Loads: phonons_v6_dataset.pt |
10
+ +=============================================================+
11
+
12
+ DEPENDENCIES (dataset already pre-computed, no matminer needed):
13
+ pip install torch numpy scikit-learn tqdm
14
+ (all pre-installed on Kaggle)
15
+
16
+ USAGE:
17
+ python phonons_model.py
18
+ """
19
+
20
+ import os, copy, json, time, math, warnings, threading
21
+ from collections import defaultdict
22
+ warnings.filterwarnings('ignore')
23
+ import numpy as np
24
+ import torch
25
+ import torch.nn as nn
26
+ import torch.nn.functional as F
27
+ from torch.optim.swa_utils import AveragedModel, SWALR
28
+ from sklearn.preprocessing import StandardScaler
29
+
30
+ # Notebook dashboard (falls back to plain console logging outside IPython)
31
+ try:
32
+ from IPython.display import display, HTML, clear_output
33
+ IN_NOTEBOOK = True
34
+ except ImportError:
35
+ IN_NOTEBOOK = False
36
+
37
+
38
+ # ═══════════════════════════════════════════════════════════════
39
+ # CONFIG
40
+ # ═══════════════════════════════════════════════════════════════
41
+
42
+ D = 56
43
+ N_HEADS = 4
44
+ N_WARMUP = 1 # 1 unshared warm-up (param budget)
45
+ N_ANGLE_RBF = 8
46
+ DROPOUT = 0.1
47
+ BATCH_SIZE = 64
48
+ EPOCHS = 200
49
+ SWA_START = 150
50
+ LR = 5e-4
51
+ WD = 1e-4
52
+ SEEDS = [42]
53
+
54
+ # Gate-halt model
55
+ MIN_CYCLES = 4
56
+ MAX_CYCLES = 16
57
+ GATE_HALT_THR = 0.05 # halt when max gate < this
58
+ GATE_SPARSITY = 0.001 # encourage gates to close
59
+
60
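The two gate-halt constants above combine into a simple stopping rule inside the reasoning loop: run at least MIN_CYCLES, then halt as soon as the output gate saturates closed. A minimal sketch of that rule; `cycles_run` and its gate trace are hypothetical, for illustration only:

```python
# Halting rule: at least MIN_CYCLES cycles, then stop once the max
# y-gate activation drops below GATE_HALT_THR (gate has closed).
MIN_CYCLES, GATE_HALT_THR = 4, 0.05

def cycles_run(gate_maxima):
    """gate_maxima[t] = max y-gate activation observed at cycle t (toy trace)."""
    for cyc, gmax in enumerate(gate_maxima):
        if cyc >= MIN_CYCLES - 1 and gmax < GATE_HALT_THR:
            return cyc + 1          # halted early at this cycle
    return len(gate_maxima)         # ran the full budget

n = cycles_run([0.9, 0.5, 0.2, 0.04, 0.01])  # gate closes at cycle 4
```

The GATE_SPARSITY term in the loss nudges gates toward zero, so traces like the one above become more common as training progresses.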
+ BASELINES = {
61
+ 'MEGNet': 28.76, 'ALIGNN': 29.34, 'MODNet': 45.39,
62
+ 'CrabNet': 47.09, 'TRIADS V4': 56.33, 'TRIADS V3.1': 63.00,
63
+ 'TRIADS V1': 71.82, 'Dummy': 323.76,
64
+ }
65
+
66
+
67
+ # ═══════════════════════════════════════════════════════════════
68
+ # SCATTER
69
+ # ═══════════════════════════════════════════════════════════════
70
+
71
+ def scatter_sum(src, idx, dim_size):
72
+ out = torch.zeros(dim_size, src.shape[-1], dtype=src.dtype, device=src.device)
73
+ out.scatter_add_(0, idx.unsqueeze(-1).expand_as(src), src)
74
+ return out
75
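scatter_sum routes per-edge messages into per-node accumulators via `scatter_add_`; a tiny self-contained check (the function is reproduced from above so the snippet runs on its own):

```python
import torch

def scatter_sum(src, idx, dim_size):
    # Sum rows of src into dim_size slots selected by idx
    out = torch.zeros(dim_size, src.shape[-1], dtype=src.dtype, device=src.device)
    out.scatter_add_(0, idx.unsqueeze(-1).expand_as(src), src)
    return out

# Three messages routed to two destination nodes
src = torch.tensor([[1.0], [2.0], [3.0]])
idx = torch.tensor([0, 0, 1])
out = scatter_sum(src, idx, dim_size=2)  # node 0 gets 1+2, node 1 gets 3
```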
+
76
+
77
+ # ═══════════════════════════════════════════════════════════════
78
+ # COLLATION + DATALOADER
79
+ # ═══════════════════════════════════════════════════════════════
80
+
81
+ def collate(graphs, comp, glob_phys, targets, indices, device):
82
+ az, af = [], []
83
+ ei, rb, vc, ph = [], [], [], []
84
+ tr, an = [], []
85
+ ba, na_list = [], []
86
+ a_off, e_off = 0, 0
87
+
88
+ for k, i in enumerate(indices):
89
+ g = graphs[i]
90
+ na, ne = g['n_atoms'], g['n_edges']
91
+ az.append(g['atom_z'])
92
+ af.append(g['atom_features'])
93
+ ei.append(g['edge_index'] + a_off)
94
+ rb.append(g['edge_rbf']); vc.append(g['edge_vec']); ph.append(g['edge_physics'])
95
+ tr.append(g['triplet_index'] + e_off)
96
+ an.append(g['angle_rbf'])
97
+ ba.append(torch.full((na,), k, dtype=torch.long))
98
+ na_list.append(na)
99
+ a_off += na; e_off += ne
100
+
101
+ return (
102
+ comp[indices].to(device),
103
+ glob_phys[indices].to(device),
104
+ {
105
+ 'atom_z': torch.cat(az).to(device),
106
+ 'atom_feat': torch.cat(af).to(device),
107
+ 'ei': torch.cat(ei, 1).to(device),
108
+ 'rbf': torch.cat(rb).to(device),
109
+ 'vec': torch.cat(vc).to(device),
110
+ 'phys': torch.cat(ph).to(device),
111
+ 'triplets': torch.cat(tr, 1).to(device),
112
+ 'angle_feat': torch.cat(an).to(device),
113
+ 'batch': torch.cat(ba).to(device),
114
+ 'n_crystals': len(indices),
115
+ 'n_atoms': na_list,
116
+ },
117
+ targets[indices].to(device),
118
+ )
119
+
120
+
121
+ class Loader:
122
+ def __init__(self, graphs, comp, gp, tgt, idx, bs, dev, shuf=False):
123
+ self.g, self.c, self.gp, self.t = graphs, comp, gp, tgt
124
+ self.idx, self.bs, self.dev, self.shuf = np.array(idx), bs, dev, shuf
125
+
126
+ def __iter__(self):
127
+ i = self.idx.copy()
128
+ if self.shuf: np.random.shuffle(i)
129
+ self._b = [i[j:j+self.bs] for j in range(0, len(i), self.bs)]
130
+ self._p = 0; return self
131
+
132
+ def __next__(self):
133
+ if self._p >= len(self._b): raise StopIteration
134
+ b = self._b[self._p]; self._p += 1
135
+ return collate(self.g, self.c, self.gp, self.t, b, self.dev)
136
+
137
+ def __len__(self): return (len(self.idx) + self.bs - 1) // self.bs
138
+
139
+
140
+ # ═══════════════════════════════════════════════════════════════
141
+ # GRAPH MESSAGE PASSING LAYER (Line Graph style)
142
+ # ═══════════════════════════════════════════════════════════════
143
+
144
+ class GraphMPLayer(nn.Module):
145
+ """Bond update (line graph) + Atom update (edge-gated)."""
146
+
147
+ def __init__(self, d, n_angle=N_ANGLE_RBF, dropout=DROPOUT):
148
+ super().__init__()
149
+ # Phase 1: Bond update from angular neighbors
150
+ self.bond_msg = nn.Sequential(nn.Linear(d*2 + n_angle, d), nn.SiLU())
151
+ self.bond_gate = nn.Sequential(nn.Linear(d*2 + n_angle, d), nn.Sigmoid())
152
+ self.bond_up = nn.Sequential(nn.Linear(d*2, d), nn.LayerNorm(d), nn.SiLU(), nn.Dropout(dropout))
153
+ # Phase 2: Atom update from bonds
154
+ self.atom_msg = nn.Sequential(nn.Linear(d*3, d), nn.SiLU())
155
+ self.atom_gate = nn.Sequential(nn.Linear(d*3, d), nn.Sigmoid())
156
+ self.atom_up = nn.Sequential(nn.Linear(d*2, d), nn.LayerNorm(d), nn.SiLU(), nn.Dropout(dropout))
157
+
158
+ def forward(self, atoms, bonds, ei, triplets, angle_feat):
159
+ # Phase 1: bonds learn from angular neighbors
160
+ if triplets.shape[1] > 0:
161
+ b_ij, b_kj = bonds[triplets[0]], bonds[triplets[1]]
162
+ inp = torch.cat([b_ij, b_kj, angle_feat], -1)
163
+ msg = self.bond_msg(inp) * self.bond_gate(inp)
164
+ agg = torch.zeros(bonds.size(0), bonds.size(1), dtype=msg.dtype, device=msg.device)
165
+ agg.scatter_add_(0, triplets[0].unsqueeze(-1).expand_as(msg), msg)
166
+ bonds = bonds + self.bond_up(torch.cat([bonds, agg], -1))
167
+ # Phase 2: atoms aggregate from bonds
168
+ inp = torch.cat([atoms[ei[0]], atoms[ei[1]], bonds], -1)
169
+ msg = self.atom_msg(inp) * self.atom_gate(inp)
170
+ agg = scatter_sum(msg, ei[1], atoms.size(0))
171
+ atoms = atoms + self.atom_up(torch.cat([atoms, agg], -1))
172
+ return atoms, bonds
173
+
174
+
175
+ # ═══════════════════════════════════════════════════════════════
176
+ # PHONON V6 MODEL
177
+ # ═══════════════════════════════════════════════════════════════
178
+
179
+ class PhononV6(nn.Module):
180
+ """
181
+ Graph Attention TRM for phonon prediction.
182
+
183
+ mode='fixed': Fixed n_cycles TRM cycles (Model 1)
184
+ mode='gate_halt': Gate-based implicit halting (Model 2)
185
+ """
186
+
187
+ def __init__(self, comp_dim, global_phys_dim=15, d=D,
188
+ mode='gate_halt', n_cycles=MAX_CYCLES,
189
+ min_cycles=MIN_CYCLES, max_cycles=MAX_CYCLES,
190
+ n_warmup=N_WARMUP, n_heads=N_HEADS, dropout=DROPOUT):
191
+ super().__init__()
192
+ self.d = d
193
+ self.mode = mode
194
+ self.total_cycles = n_cycles if mode == 'fixed' else max_cycles
195
+ self.min_cycles = min_cycles if mode == 'gate_halt' else self.total_cycles
196
+
197
+ # Feature layout (from V6 dataset: 132 magpie + extras + 11 struct + 200 m2v)
198
+ self.n_magpie = 132
199
+ self.n_extra = comp_dim - 132 - 11 - 200
200
+ self.n_comp_tokens = 22 + 1 + 1 # 22 magpie + 1 extra + 1 m2v = 24
201
+
202
+ # ── Input Encoding ────────────────────────────────────
203
+ self.atom_embed = nn.Embedding(103, d)
204
+ self.atom_feat_proj = nn.Linear(18, d)
205
+ self.rbf_enc = nn.Linear(40, d)
206
+ self.vec_enc = nn.Linear(3, d)
207
+ self.phys_enc = nn.Linear(8, d)
208
+
209
+ # ── Composition Token Projections ─────────────────────
210
+ self.magpie_proj = nn.Linear(6, d)
211
+ self.extra_proj = nn.Linear(max(self.n_extra, 1), d)
212
+ self.m2v_proj = nn.Linear(200, d)
213
+
214
+ # ── Context (structural + global physics) ─────────────
215
+ self.ctx_proj = nn.Linear(11 + global_phys_dim, d)
216
+
217
+ # ── Token Type Embeddings ─────────────────────────────
218
+ self.type_embed = nn.Embedding(2, d)
219
+
220
+ # ── Warm-up Layers (unshared) ─────────────────────────
221
+ self.warmup = nn.ModuleList([GraphMPLayer(d, N_ANGLE_RBF, dropout) for _ in range(n_warmup)])
222
+ self.warmup_out = nn.Sequential(nn.Linear(d, d), nn.LayerNorm(d), nn.SiLU())
223
+
224
+ # ── Shared TRM Block ──────────────────────────────────
225
+ # Graph MP (shared)
226
+ self.trm_gnn = GraphMPLayer(d, N_ANGLE_RBF, dropout)
227
+
228
+ # Self-Attention
229
+ self.sa = nn.MultiheadAttention(d, n_heads, dropout=dropout, batch_first=True)
230
+ self.sa_n = nn.LayerNorm(d)
231
+ self.sa_ff = nn.Sequential(nn.Linear(d, d), nn.GELU(), nn.Dropout(dropout), nn.Linear(d, d))
232
+ self.sa_fn = nn.LayerNorm(d)
233
+
234
+ # Cross-Attention
235
+ self.ca = nn.MultiheadAttention(d, n_heads, dropout=dropout, batch_first=True)
236
+ self.ca_n = nn.LayerNorm(d)
237
+
238
+ # ── State Update (Gated Residuals) ───────────────────
239
+ self.z_proj = nn.Linear(d*3, d)
240
+ self.z_up = nn.Sequential(nn.Linear(d*2, d), nn.SiLU(), nn.Linear(d, d))
241
+ self.z_gate = nn.Sequential(nn.Linear(d*2, d), nn.Sigmoid())
242
+ self.y_up = nn.Sequential(nn.Linear(d*2, d), nn.SiLU(), nn.Linear(d, d))
243
+ self.y_gate = nn.Sequential(nn.Linear(d*2, d), nn.Sigmoid())
244
+
245
+ # ── Output Head ───────────────────────────────────────
246
+ self.head = nn.Sequential(nn.Linear(d, d//2), nn.SiLU(), nn.Linear(d//2, 1))
247
+
248
+ self._init_weights()
249
+
250
+ def _init_weights(self):
251
+ for m in self.modules():
252
+ if isinstance(m, nn.Linear):
253
+ nn.init.xavier_uniform_(m.weight)
254
+ if m.bias is not None: nn.init.zeros_(m.bias)
255
+
256
+ def forward(self, comp, glob_phys, g, deep_supervision=False):
257
+ B = g['n_crystals']
258
+ ei = g['ei']
259
+ dev = comp.device
260
+
261
+ # ══════════════════════════════════════════════════════
262
+ # INPUT ENCODING
263
+ # ══════════════════════════════════════════════════════
264
+
265
+ # Atom features
266
+ atoms = self.atom_embed(g['atom_z'].clamp(0, 102)) + self.atom_feat_proj(g['atom_feat'])
267
+
268
+ # Bond features: distance (direction-gated) + physics
269
+ bonds = self.rbf_enc(g['rbf']) * torch.tanh(self.vec_enc(g['vec'])) + self.phys_enc(g['phys'])
270
+
271
+ triplets = g['triplets']
272
+ angle_feat = g['angle_feat']
273
+
274
+ # ══════════════════════════════════════════════════════
275
+ # WARM-UP (N_WARMUP unshared graph layers)
276
+ # ══════════════════════════════════════════════════════
277
+
278
+ for layer in self.warmup:
279
+ atoms, bonds = layer(atoms, bonds, ei, triplets, angle_feat)
280
+ atoms = self.warmup_out(atoms)
281
+
282
+ # ══════════════════════════════════════════════════════
283
+ # COMPOSITION TOKENS (24 total)
284
+ # ══════════════════════════════════════════════════════
285
+
286
+ magpie = comp[:, :132].view(B, 22, 6)
287
+ extras = comp[:, 132:132+self.n_extra]
288
+ s_meta = comp[:, 132+self.n_extra:132+self.n_extra+11]
289
+ m2v = comp[:, -200:]
290
+
291
+ mag_tok = self.magpie_proj(magpie) # [B, 22, d]
292
+ ext_tok = self.extra_proj(extras).unsqueeze(1) # [B, 1, d]
293
+ m2v_tok = self.m2v_proj(m2v).unsqueeze(1) # [B, 1, d]
294
+ comp_tok = torch.cat([mag_tok, ext_tok, m2v_tok], 1) # [B, 24, d]
295
+
296
+ comp_tok = comp_tok + self.type_embed.weight[0]
297
+
298
+ # Context vector (structural + global physics)
299
+ ctx = self.ctx_proj(torch.cat([s_meta, glob_phys], -1)) # [B, d]
300
+
301
+ # ══════════════════════════════════════════════════════
302
+ # TRM REASONING LOOP
303
+ # ══════════════════════════════════════════════════════
304
+
305
+ z = torch.zeros(B, self.d, device=dev)
306
+ y = torch.zeros(B, self.d, device=dev)
307
+ preds = []
308
+ n_atoms = g['n_atoms']
309
+ self._gate_sparsity = 0. # track gate magnitudes for regularizer
310
+
311
+ for cyc in range(self.total_cycles):
312
+ # ── Phase 1+2: Graph MP (shared weights) ──────────
313
+ atoms, bonds = self.trm_gnn(atoms, bonds, ei, triplets, angle_feat)
314
+
315
+ # ── Pad atoms for attention ───────────────────────
316
+ ma = max(n_atoms)
317
+ atom_tok = atoms.new_zeros(B, ma, self.d)
318
+ atom_mask = torch.ones(B, ma, dtype=torch.bool, device=dev)
319
+ off = 0
320
+ for i, na in enumerate(n_atoms):
321
+ atom_tok[i, :na] = atoms[off:off+na]
322
+ atom_mask[i, :na] = False
323
+ off += na
324
+ atom_tok = atom_tok + self.type_embed.weight[1]
325
+
326
+ # ── Phase 3: Joint Self-Attention ─────────────────
327
+ all_tok = torch.cat([comp_tok, atom_tok], 1)
328
+ full_mask = torch.cat([
329
+ torch.zeros(B, self.n_comp_tokens, dtype=torch.bool, device=dev),
330
+ atom_mask
331
+ ], 1)
332
+
333
+ sa_out = self.sa(all_tok, all_tok, all_tok, key_padding_mask=full_mask)[0]
334
+ all_tok = self.sa_n(all_tok + sa_out)
335
+ all_tok = self.sa_fn(all_tok + self.sa_ff(all_tok))
336
+
337
+ comp_tok = all_tok[:, :self.n_comp_tokens]
338
+ atom_tok = all_tok[:, self.n_comp_tokens:]
339
+
340
+ # ── Phase 4: Cross-Attention (comp queries atoms) ─
341
+ ca_out = self.ca(comp_tok, atom_tok, atom_tok, key_padding_mask=atom_mask)[0]
342
+ comp_tok = self.ca_n(comp_tok + ca_out)
343
+
344
+ # ── Unpad atoms back to flat ──────────────────────
345
+ parts = [atom_tok[i, :n_atoms[i]] for i in range(B)]
346
+ atoms = torch.cat(parts, 0)
347
+
348
+ # ── Phase 5: State Update (Gated Residuals) ───────
349
+ xp = comp_tok.mean(dim=1) # [B, d]
350
+
351
+ z_inp = self.z_proj(torch.cat([xp, ctx, y], -1))
352
+ z_cand = self.z_up(torch.cat([z_inp, z], -1))
353
+ z_g = self.z_gate(torch.cat([z_inp, z], -1))
354
+ z = z + z_g * z_cand
355
+
356
+ y_cand = self.y_up(torch.cat([y, z], -1))
357
+ y_g = self.y_gate(torch.cat([y, z], -1))
358
+ y = y + y_g * y_cand
359
+ # Track gate sparsity (mean of all gate activations)
360
+ self._gate_sparsity = self._gate_sparsity + (z_g.mean() + y_g.mean()) / 2
361
+
362
+ preds.append(self.head(y).squeeze(-1))
363
+
364
+ # ── Phase 6: Gate-Based Halting ────────────────────
365
+ if self.mode == 'gate_halt' and cyc >= self.min_cycles - 1:
366
+ if y_g.max().item() < GATE_HALT_THR:
367
+ break
368
+
369
+ # Normalize gate sparsity by number of cycles actually run
370
+ n_run = len(preds)
371
+ self._gate_sparsity = self._gate_sparsity / max(n_run, 1)
372
+
373
+ return preds if deep_supervision else preds[-1]
374
+
375
+ def count_parameters(self):
376
+ return sum(p.numel() for p in self.parameters() if p.requires_grad)
377
+
378
+
379
+ # ═══════════════════════════════════════════════════════════════
380
+ # LOSS FUNCTIONS
381
+ # ═══════════════════════════════════════════════════════════════
382
+
383
+ def deep_sup_loss(preds_list, targets):
384
+ """Linearly-weighted deep supervision: later cycles get more weight."""
385
+ p = torch.stack(preds_list)
386
+ w = torch.arange(1, p.shape[0]+1, device=p.device, dtype=p.dtype)
387
+ w = w / w.sum()
388
+ return (w * (p - targets.unsqueeze(0)).abs().mean(1)).sum()
389
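The linear weighting in deep_sup_loss gives cycle t a weight of t divided by the sum 1 + 2 + ... + T, so later cycles dominate the loss. The weights for a 4-cycle run, shown with plain NumPy for clarity:

```python
import numpy as np

# Linear deep-supervision weights for T reasoning cycles
T = 4
w = np.arange(1, T + 1, dtype=float)
w /= w.sum()  # normalized so the weights sum to 1
```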
+
390
+
391
+ def gate_halt_loss(preds_list, targets, gate_sparsity):
392
+ """Deep supervision + gate sparsity to encourage early halting."""
393
+ return deep_sup_loss(preds_list, targets) + GATE_SPARSITY * gate_sparsity
394
+
395
+
396
+ # ═══════════════════════════════════════════════════════════════
397
+ # STRATIFIED SPLIT (within train fold → train/val)
398
+ # ═══════════════════════════════════════════════════════════════
399
+
400
+ def strat_split(t, vf=0.15, seed=42):
401
+ bins = np.digitize(t, np.percentile(t, [25, 50, 75]))
402
+ tr, vl = [], []
403
+ rng = np.random.RandomState(seed)
404
+ for b in range(4):
405
+ m = np.where(bins == b)[0]
406
+ if len(m) == 0: continue
407
+ n = max(1, int(len(m) * vf))
408
+ c = rng.choice(m, n, replace=False)
409
+ vl.extend(c.tolist())
410
+ tr.extend(np.setdiff1d(m, c).tolist())
411
+ return np.array(tr), np.array(vl)
412
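strat_split bins targets by quartile and holds out vf of each bin, so the validation set mirrors the target distribution. A self-contained sanity check on synthetic targets (the function is reproduced from above; the 200-sample array is illustrative):

```python
import numpy as np

def strat_split(t, vf=0.15, seed=42):
    """Quartile-stratified train/val split (mirrors the function above)."""
    bins = np.digitize(t, np.percentile(t, [25, 50, 75]))
    tr, vl = [], []
    rng = np.random.RandomState(seed)
    for b in range(4):
        m = np.where(bins == b)[0]
        if len(m) == 0:
            continue
        n = max(1, int(len(m) * vf))
        c = rng.choice(m, n, replace=False)
        vl.extend(c.tolist())
        tr.extend(np.setdiff1d(m, c).tolist())
    return np.array(tr), np.array(vl)

targets = np.random.RandomState(0).uniform(0, 100, 200)
tr, vl = strat_split(targets)  # disjoint, covers all 200 samples
```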
+
413
+
414
+ # ═══════════════════════════════════════════════════════════════
415
+ # LIVE DASHBOARD (IPython HTML — works in Kaggle/Jupyter)
416
+ # ═══════════════════════════════════════════════════════════════
417
+
418
+ _print_lock = threading.Lock()
419
+
420
+ # Shared state updated by training threads, read by dashboard
421
+ _dash_state = {
422
+ 'GH': {'fold': 0, 'ep': 0, 'tr': float('inf'), 'val': float('inf'),
423
+ 'best': float('inf'), 'best_ep': 0, 'lr': 0., 'eta_m': 0,
424
+ 'ep_s': 0., 'swa': False, 'done': False, 'test_mae': None},
425
+ }
426
+ _dash_log = [] # Accumulates milestone messages
427
+
428
+
429
+ def _log(msg):
430
+ with _print_lock:
431
+ _dash_log.append(msg)
432
+ if not IN_NOTEBOOK:
433
+ print(msg, flush=True)
434
+
435
+
436
+ def _render_html():
437
+ """Build an HTML table from _dash_state + recent log lines."""
438
+ css = (
439
+ '<style>'
440
+ '.tv6{font-family:monospace;font-size:13px;border-collapse:collapse;width:100%}'
441
+ '.tv6 th{background:#1a1a2e;color:#e94560;padding:6px 10px;text-align:right;border-bottom:2px solid #e94560}'
442
+ '.tv6 td{padding:5px 10px;text-align:right;border-bottom:1px solid #333}'
443
+ '.tv6 tr:nth-child(odd){background:#16213e}'
444
+ '.tv6 tr:nth-child(even){background:#0f3460}'
445
+ '.tv6 td:first-child,.tv6 th:first-child{text-align:left;font-weight:bold;color:#e9c46a}'
446
+ '.tv6 .best{color:#2ecc71;font-weight:bold}'
447
+ '.tv6 .done{color:#2ecc71}'
448
+ '.tv6 .swa{color:#9b59b6}'
449
+ '.tv6 .training{color:#f39c12}'
450
+ '.tv6 .waiting{color:#636e72}'
451
+ '.logbox{font-family:monospace;font-size:12px;color:#dfe6e9;background:#0a0a0a;'
452
+ 'padding:8px 12px;margin-top:8px;border-radius:6px;max-height:200px;overflow-y:auto}'
453
+ '</style>'
454
+ )
455
+ rows = ''
456
+ for name, s in _dash_state.items():
457
+ if s['done'] and s['test_mae']:
458
+ status = f'<span class="done">✅ {s["test_mae"]:.1f}</span>'
459
+ elif s['swa']:
460
+ status = '<span class="swa">SWA</span>'
461
+ elif s['ep'] == 0:
462
+ status = '<span class="waiting">Waiting</span>'
463
+ else:
464
+ status = '<span class="training">▶ Training</span>'
465
+ ep_str = f"{s['ep']}/{EPOCHS}" if s['ep'] else '-'
466
+ tr_str = f"{s['tr']:.1f}" if s['tr'] < 1e6 else '-'
467
+ val_str = f"{s['val']:.1f}" if s['val'] < 1e6 else '-'
468
+ best_str = f'<span class="best">{s["best"]:.1f}@{s["best_ep"]}</span>' if s['best'] < 1e6 else '-'
469
+ lr_str = f"{s['lr']:.0e}" if s['lr'] > 0 else '-'
470
+ eps_str = f"{s['ep_s']:.1f}" if s['ep_s'] > 0 else '-'
471
+ eta_str = f"{s['eta_m']:.0f}m" if s['eta_m'] > 0 else '-'
472
+ fold_str = str(s['fold']) if s['fold'] else '-'
473
+ rows += (f'<tr><td>{name}</td><td>{fold_str}</td><td>{ep_str}</td>'
474
+ f'<td>{tr_str}</td><td>{val_str}</td><td>{best_str}</td>'
475
+ f'<td>{lr_str}</td><td>{eps_str}</td><td>{eta_str}</td>'
476
+ f'<td>{status}</td></tr>')
477
+ table = (
478
+ f'{css}<h3 style="color:#e94560;font-family:monospace;margin:4px 0">⚡ TRIADS V6 — Live Dashboard</h3>'
479
+ f'<table class="tv6"><tr><th>Model</th><th>Fold</th><th>Epoch</th>'
480
+ f'<th>Train MAE</th><th>Val MAE</th><th>Best MAE</th>'
481
+ f'<th>LR</th><th>s/ep</th><th>ETA</th><th>Status</th></tr>{rows}</table>'
482
+ )
483
+ # Show last 8 log messages
484
+ if _dash_log:
485
+ log_html = '<br>'.join(_dash_log[-8:])
486
+ table += f'<div class="logbox">{log_html}</div>'
487
+ return table
488
+
489
+
490
+ class Dashboard:
491
+ """Background thread that re-renders an HTML table every 5s in-place."""
492
+ def __init__(self):
493
+ self._stop = threading.Event()
494
+ self._thread = None
495
+
496
+ def start(self):
497
+ if not IN_NOTEBOOK:
498
+ return
499
+ self._stop.clear()
500
+ self._thread = threading.Thread(target=self._run, daemon=True)
501
+ self._thread.start()
502
+
503
+ def stop(self):
504
+ if not IN_NOTEBOOK or self._thread is None:
505
+ return
506
+ self._stop.set()
507
+ self._thread.join(timeout=10)
508
+ # Final render
509
+ clear_output(wait=True)
510
+ display(HTML(_render_html()))
511
+
512
+ def _run(self):
513
+ while not self._stop.is_set():
514
+ try:
515
+ clear_output(wait=True)
516
+ display(HTML(_render_html()))
517
+ except Exception:
518
+ pass
519
+ self._stop.wait(5)
520
+
521
+
522
+ _dashboard = Dashboard()
523
+
524
+
525
+ def train_fold_core(model, tr_loader, vl_loader, device, fold, seed,
526
+ model_name, tgt_mean=0., tgt_std=1., log_every=10):
527
+ """
528
+ Train one model on one device, with structured line logging.
529
+ Returns (best_val_mae, model_with_best_weights).
530
+ """
531
+ opt = torch.optim.AdamW(model.parameters(), lr=LR, weight_decay=WD)
532
+ # Cosine scheduler with 10-epoch linear warmup
533
+ WARMUP_EP = 10
534
+ def lr_lambda(ep):
535
+ if ep < WARMUP_EP: return (ep + 1) / WARMUP_EP
536
+ progress = (ep - WARMUP_EP) / max(1, EPOCHS - WARMUP_EP)
537
+ return 0.5 * (1 + math.cos(math.pi * progress)) * (1 - 1e-5/LR) + 1e-5/LR
538
+ sch = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
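The warmup-plus-cosine multiplier above can be sketched and sanity-checked in isolation; `LR`, `EPOCHS`, and `WARMUP_EP` below are illustrative stand-ins for the script's constants, not the actual training configuration:

```python
import math

# Illustrative stand-ins for the script's LR / EPOCHS / WARMUP_EP constants
LR, EPOCHS, WARMUP_EP = 1e-3, 100, 10

def lr_lambda(ep):
    if ep < WARMUP_EP:
        return (ep + 1) / WARMUP_EP          # linear warmup: 0.1, 0.2, ..., 1.0
    progress = (ep - WARMUP_EP) / max(1, EPOCHS - WARMUP_EP)
    floor = 1e-5 / LR                        # multiplier floor so lr never drops below 1e-5
    return 0.5 * (1 + math.cos(math.pi * progress)) * (1 - floor) + floor

assert abs(lr_lambda(0) - 0.1) < 1e-12              # first warmup epoch
assert abs(lr_lambda(WARMUP_EP - 1) - 1.0) < 1e-12  # warmup peak
assert lr_lambda(EPOCHS - 1) >= 1e-5 / LR           # cosine decays toward the floor
```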
539
+
540
+ swa_model = AveragedModel(model)
541
+ swa_sch = SWALR(opt, swa_lr=1e-4)
542
+
543
+ bv, bw, best_ep = float('inf'), None, 0
544
+ fold_start = time.time()
545
+
546
+ for ep in range(EPOCHS):
547
+ ep_start = time.time()
548
+ use_swa = ep >= SWA_START
549
+
550
+ # ── TRAIN ─────────────────────────────────────────────
551
+ model.train()
552
+ te, tn = 0., 0
553
+ for cb, gb, g_batch, tb in tr_loader:
554
+ sp = model(cb, gb, g_batch, True)
555
+ if model.mode == 'gate_halt':
556
+ loss = gate_halt_loss(sp, tb, model._gate_sparsity)
557
+ else:
558
+ loss = deep_sup_loss(sp, tb)
559
+ opt.zero_grad(set_to_none=True)
560
+ loss.backward()
561
+ torch.nn.utils.clip_grad_norm_(model.parameters(), 0.5)
562
+ opt.step()
563
+ with torch.no_grad():
564
+ te += ((sp[-1] * tgt_std + tgt_mean) - (tb * tgt_std + tgt_mean)).abs().sum().item()
565
+ tn += len(tb)
566
+
567
+ if use_swa:
568
+ swa_model.update_parameters(model)
569
+ swa_sch.step()
570
+ else:
571
+ sch.step()
572
+
573
+ # ── VALIDATE ──────────────────────────────────────────
574
+ eval_m = swa_model if use_swa and ep == EPOCHS - 1 else model
575
+ eval_m.eval()
576
+ ve, vn = 0., 0
577
+ with torch.inference_mode():
578
+ for cb, gb, g_batch, tb in vl_loader:
579
+ pred = eval_m(cb, gb, g_batch)
580
+ ve += ((pred * tgt_std + tgt_mean) - (tb * tgt_std + tgt_mean)).abs().sum().item()
581
+ vn += len(tb)
582
+
583
+ train_mae = te / max(tn, 1)
584
+ val_mae = ve / max(vn, 1)
585
+ ep_time = time.time() - ep_start
586
+
587
+ if val_mae < bv:
588
+ bv = val_mae
589
+ # Save the weights that were actually evaluated (unwrap SWA's AveragedModel)
+ src = eval_m.module if isinstance(eval_m, AveragedModel) else eval_m
+ bw = copy.deepcopy(src.state_dict())
590
+ best_ep = ep + 1
591
+
592
+ # ── UPDATE DASHBOARD STATE (every epoch) ────────────
593
+ lr_now = opt.param_groups[0]['lr']
594
+ eta_m = (EPOCHS - ep - 1) * ep_time / 60
595
+ _dash_state[model_name].update({
596
+ 'fold': fold, 'ep': ep + 1,
597
+ 'tr': train_mae, 'val': val_mae,
598
+ 'best': bv, 'best_ep': best_ep,
599
+ 'lr': lr_now, 'ep_s': ep_time,
600
+ 'eta_m': eta_m, 'swa': use_swa,
601
+ })
602
+
603
+ # ── PLAIN LOG (fallback / milestone prints) ───────────
604
+ if not IN_NOTEBOOK and ((ep + 1) % log_every == 0 or ep == 0 or ep == EPOCHS - 1):
605
+ swa_tag = ' SWA' if use_swa else ''
606
+ _log(f" [{model_name}|F{fold}] ep {ep+1:>3}/{EPOCHS}"
607
+ f" │ Tr={train_mae:>6.1f} Val={val_mae:>6.1f}"
608
+ f" Best={bv:>6.1f}@{best_ep:<3}"
609
+ f" │ lr={lr_now:.0e}{swa_tag}"
610
+ f" │ {ep_time:.1f}s/ep ETA {eta_m:.0f}m")
611
+
612
+ model.load_state_dict(bw)
613
+ total_time = time.time() - fold_start
614
+ _log(f" [{model_name}|F{fold}] ✅ Done in {total_time/60:.1f}m │ Best Val MAE = {bv:.2f} @ epoch {best_ep}")
615
+
616
+ return bv, model
617
+
618
+
619
+ def evaluate_model(model, test_loader, device, tgt_mean=0., tgt_std=1.):
620
+ """Evaluate model MAE on test set (returns MAE in original scale)."""
621
+ model.eval()
622
+ ee, en_ = 0., 0
623
+ with torch.inference_mode():
624
+ for cb, gb, g_batch, tb in test_loader:
625
+ pred = model(cb, gb, g_batch) * tgt_std + tgt_mean
626
+ real = tb * tgt_std + tgt_mean
627
+ ee += (pred - real).abs().sum().item()
628
+ en_ += len(tb)
629
+ return ee / max(en_, 1)
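Because targets are normalized for training, `evaluate_model` de-normalizes both prediction and target before taking the absolute error; the resulting MAE is exactly `tgt_std` times the normalized-scale MAE. A small numpy check of that identity (all values below are synthetic):

```python
import numpy as np

rng = np.random.default_rng(0)
pred_n = rng.normal(size=100)      # synthetic normalized predictions
tgt_n = rng.normal(size=100)       # synthetic normalized targets
mean, std = 600.0, 150.0           # illustrative target mean / std

# De-normalized MAE, as computed inside evaluate_model
mae_orig = np.abs((pred_n * std + mean) - (tgt_n * std + mean)).mean()
# The shared mean cancels, so this equals std times the normalized MAE
assert np.isclose(mae_orig, std * np.abs(pred_n - tgt_n).mean())
```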
630
+
631
+
632
+ # ═══════════════════════════════════════════════════════════════
633
+ # DUAL-GPU PARALLEL TRAINING
634
+ # ═══════════════════════════════════════════════════════════════
635
+
636
+ def _train_worker(model, tr_loader, vl_loader, te_loader, device,
637
+ fold, seed, model_name, result_dict, key,
638
+ tgt_mean=0., tgt_std=1.):
639
+ """Thread worker: train + evaluate one model on one GPU."""
640
+ try:
641
+ _, best_model = train_fold_core(
642
+ model, tr_loader, vl_loader, device, fold, seed, model_name,
643
+ tgt_mean=tgt_mean, tgt_std=tgt_std
644
+ )
645
+ mae = evaluate_model(best_model, te_loader, device, tgt_mean, tgt_std)
646
+ result_dict[key] = mae
647
+ _dash_state[model_name]['test_mae'] = mae
648
+ _dash_state[model_name]['done'] = True
649
+ _log(f" [{model_name}|F{fold}] 🏆 Test MAE = {mae:.2f} cm⁻¹")
650
+ del best_model
651
+ except Exception as e:
652
+ import traceback
653
+ _log(f" [{model_name}|F{fold}] ❌ ERROR: {e}\n{traceback.format_exc()}")
654
+ result_dict[key] = float('inf')
655
+ _dash_state[model_name]['done'] = True
656
+ finally:
657
+ if device.type == 'cuda':
658
+ torch.cuda.empty_cache()
659
+
660
+
661
+ # ═══════════════════════════════════════════════════════════════
662
+ # MAIN
663
+ # ═══════════════════════════════════════════════════════════════
664
+
665
+ def main():
666
+ t0 = time.time()
667
+
668
+ n_gpus = torch.cuda.device_count() if torch.cuda.is_available() else 0
669
+
670
+ print(f"""
671
+ ╔══════════════════════════════════════════════════════════╗
672
+ ║ TRIADS V6 — Graph-TRM + Gate-Based Halting ║
673
+ ║ ║
674
+ ║ Gate-halt: {MIN_CYCLES}-{MAX_CYCLES} adaptive cycles, d={D} ║
675
+ ║ Deep supervision │ SWA (last {EPOCHS-SWA_START} ep) │ {EPOCHS} ep ║
676
+ ╚══════════════════════════════════════════════════════════╝
677
+ """)
678
+
679
+ device = torch.device('cuda:0' if n_gpus > 0 else 'cpu')
680
+ if n_gpus > 0:
681
+ name = torch.cuda.get_device_name(0)
682
+ mem = torch.cuda.get_device_properties(0).total_memory / 1e9
683
+ print(f" GPU: {name} ({mem:.1f} GB)")
684
+ torch.backends.cuda.matmul.allow_tf32 = True
685
+ torch.backends.cudnn.benchmark = True
686
+ else:
687
+ print(" ⚠ No GPU — training will be slow")
688
+
689
+ # ── LOAD DATASET ──────────────────────────────────────────
690
+ kaggle_path = "/kaggle/input/datasets/rudratiwari0099x/phonons-training-dataset/phonons_v6_dataset.pt"
691
+ local_path = "phonons_v6_dataset.pt"
692
+ ds_path = kaggle_path if os.path.exists(kaggle_path) else local_path
693
+ print(f" Loading {ds_path}...")
694
+ data = torch.load(ds_path, weights_only=False)
695
+ graphs = data['graphs']
696
+ comp_all = data['comp_features']
697
+ glob_phys = data['global_physics']
698
+ tgt_all = data['targets']
699
+ fold_indices = data['fold_indices']
700
+ N = data['n_samples']
701
+ comp_dim = comp_all.shape[1]
702
+ gp_dim = glob_phys.shape[1]
703
+ print(f" Dataset: {N} samples | comp_dim: {comp_dim} | global_phys: {gp_dim}")
704
+
705
+ # ── VERIFY FOLDS ──────────────────────────────────────────
706
+ for fi, (tr, te) in enumerate(fold_indices):
707
+ assert len(set(tr) & set(te)) == 0, f"LEAK in fold {fi}!"
708
+ print(" 5 folds: zero leakage ✓")
709
+
710
+ # ── MODEL SIZE CHECK ─────────────────────────────────────
711
+ m_test = PhononV6(comp_dim, gp_dim, mode='gate_halt',
712
+ min_cycles=MIN_CYCLES, max_cycles=MAX_CYCLES)
713
+ n_params = m_test.count_parameters()
714
+ print(f" Model (Gate-Halt TRM): {n_params:,} params")
715
+ del m_test
716
+ print()
717
+
718
+ # ── TRAINING ──────────────────────────────────────────────
719
+ tnp = tgt_all.numpy()
720
+ results = {}
721
+
722
+ _dashboard.start()
723
+ try:
724
+ for seed in SEEDS:
725
+ print(f" {'═'*3} Seed {seed} {'═'*55}")
726
+ ts = time.time()
727
+ fold_maes = {}
728
+
729
+ for fi, (tv_idx, te_idx) in enumerate(fold_indices):
730
+ tv_idx, te_idx = np.array(tv_idx), np.array(te_idx)
731
+ print(f"\n ┌─ Fold {fi+1}/5 {'─'*50}")
732
+
733
+ # Train/val split within train fold
734
+ tri, vli = strat_split(tnp[tv_idx], 0.15, seed + fi)
735
+
736
+ # Normalize targets (from train split ONLY — zero leakage)
737
+ tgt_mean = float(tgt_all[tv_idx[tri]].mean())
738
+ tgt_std = float(tgt_all[tv_idx[tri]].std()) + 1e-8
739
+ tgt_norm = (tgt_all - tgt_mean) / tgt_std
740
+ print(f" │ Target norm: mean={tgt_mean:.1f} std={tgt_std:.1f}")
741
+
742
+ # Scale features (ONLY from train split — zero leakage)
743
+ sc = StandardScaler().fit(comp_all[tv_idx[tri]].numpy())
744
+ cs = torch.tensor(
745
+ np.nan_to_num(sc.transform(comp_all.numpy()), nan=0.).astype(np.float32)
746
+ )
747
+ sc_gp = StandardScaler().fit(glob_phys[tv_idx[tri]].numpy())
748
+ gps = torch.tensor(
749
+ np.nan_to_num(sc_gp.transform(glob_phys.numpy()), nan=0.).astype(np.float32)
750
+ )
751
+
752
+ # Seed for reproducibility
753
+ torch.manual_seed(seed + fi)
754
+ np.random.seed(seed + fi)
755
+ if n_gpus > 0:
756
+ torch.cuda.manual_seed_all(seed + fi)
757
+
758
+ # Create model
759
+ model = PhononV6(comp_dim, gp_dim, mode='gate_halt',
760
+ min_cycles=MIN_CYCLES,
761
+ max_cycles=MAX_CYCLES).to(device)
762
+
763
+ # Build loaders with NORMALIZED targets
764
+ trl = Loader(graphs, cs, gps, tgt_norm, tv_idx[tri], BATCH_SIZE, device, True)
765
+ vll = Loader(graphs, cs, gps, tgt_norm, tv_idx[vli], BATCH_SIZE, device, False)
766
+ tel = Loader(graphs, cs, gps, tgt_norm, te_idx, BATCH_SIZE, device, False)
767
+
768
+ # Reset dashboard
769
+ _dash_state['GH']['done'] = False
770
+
771
+ # Train
772
+ _, best_model = train_fold_core(
773
+ model, trl, vll, device, fi+1, seed, "GH",
774
+ tgt_mean=tgt_mean, tgt_std=tgt_std
775
+ )
776
+ mae = evaluate_model(best_model, tel, device, tgt_mean, tgt_std)
777
+ fold_maes[fi] = mae
778
+ _dash_state['GH']['test_mae'] = mae
779
+ _dash_state['GH']['done'] = True
780
+ _log(f" [GH|F{fi+1}] 🏆 Test MAE = {mae:.2f} cm⁻¹")
781
+
782
+ # ── SAVE WEIGHTS ─────────────────────────────────────
783
+ os.makedirs('phonons_models_v6', exist_ok=True)
784
+ torch.save({
785
+ 'model_state': best_model.state_dict(),
786
+ 'test_mae': mae,
787
+ 'fold': fi + 1,
788
+ 'seed': seed,
789
+ 'comp_dim': comp_dim,
790
+ 'gp_dim': gp_dim,
791
+ }, f'phonons_models_v6/phonons_v6_s{seed}_f{fi+1}.pt')
792
+ _log(f" [GH|F{fi+1}] 💾 Saved phonons_models_v6/phonons_v6_s{seed}_f{fi+1}.pt")
793
+ # ─────────────────────────────────────────────────────
794
+
795
+ print(f" └─ Fold {fi+1} done │ MAE = {fold_maes[fi]:.2f} cm⁻¹")
796
+
797
+ del model, best_model
798
+ if n_gpus > 0: torch.cuda.empty_cache()
799
+
800
+ avg = np.mean(list(fold_maes.values()))
801
+ results[seed] = fold_maes
802
+ elapsed = time.time() - ts
803
+ print(f"\n Seed {seed} │ Avg MAE: {avg:.2f} │ {elapsed/60:.1f} min")
804
+
805
+ finally:
806
+ _dashboard.stop()
807
+
808
+ # ── FINAL RESULTS ─────────────────────────────────────────
809
+ fa = np.mean([np.mean(list(v.values())) for v in results.values()])
810
+
811
+ print(f"""
812
+ {'='*62}
813
+ FINAL RESULTS — V6 Gate-Halt TRM
814
+ {'='*62}
815
+
816
+ {'Model':<45} {'MAE':>10}
817
+ {'─'*57}""")
818
+ for n, v in sorted(BASELINES.items(), key=lambda x: x[1]):
819
+ beaten = ' ← BEATEN!' if fa < v else ''
820
+ print(f" {n:<45} {v:>10.2f}{beaten}")
821
+ print(f" {'V6 Gate-Halt TRM ('+str(n_params//1000)+'K, '+str(MIN_CYCLES)+'-'+str(MAX_CYCLES)+' cycles)':<45} {fa:>10.2f} ← OURS")
822
+ print(f" {'─'*57}")
823
+ print(f" Total time: {(time.time()-t0)/60:.1f} min")
824
+
825
+ # ── SAVE ──────────────────────────────────────────────────
826
+ res = {
827
+ 'model': 'V6-Gate-Halt-TRM', 'params': n_params,
828
+ 'cycles': f'{MIN_CYCLES}-{MAX_CYCLES}',
829
+ 'avg_mae': round(fa, 2),
830
+ 'per_fold': {str(s): {str(k): round(v, 2) for k,v in m.items()}
831
+ for s,m in results.items()},
832
+ }
833
+ with open('phonons_v6_results.json', 'w') as f:
834
+ json.dump(res, f, indent=2)
835
+ print(" Saved: phonons_v6_results.json\n")
836
+
837
+
838
+ if __name__ == '__main__':
839
+ main()
model_code/steels_model.py ADDED
@@ -0,0 +1,1056 @@
1
+ """
2
+ ╔══════════════════════════════════════════════════════════════════════╗
3
+ ║ TRM-MatSci V13 — 2-Layer SA + Multi-Seed Ensemble ║
4
+ ║ Dataset: matbench_steels │ 312 samples │ 5-Fold Nested CV ║
5
+ ║ ║
6
+ ║ V13A 2-Layer Self-Attention + Standard Deep Supervision ║
7
+ ║ d_attn=64, nhead=4, d_hidden=96, ff_dim=150, 20 steps ║
8
+ ║ Expanded features (Magpie + Mat2Vec + Extra descriptors) ║
9
+ ║ 2nd SA layer for higher-order property interactions ║
10
+ ║ 5-seed ensemble (avg predictions across seeds) ║
11
+ ║ ║
12
+ ║ V13B Same 2-Layer SA + Confidence-Weighted Deep Supervision ║
13
+ ║ 22 steps, confidence_head learns which step to trust ║
14
+ ║ 5-seed ensemble (avg predictions across seeds) ║
15
+ ║ ║
16
+ ║ All models: Deep Supervision + SWA + AdamW + 300 epochs ║
17
+ ║ Baseline: V12A = 95.99 MPa (current best) ║
18
+ ╚══════════════════════════════════════════════════════════════════════╝
19
+ """
20
+
21
+ import os, copy, json, time, logging, warnings, shutil, urllib.request
22
+ warnings.filterwarnings('ignore')
23
+
24
+ import numpy as np
25
+ import pandas as pd
26
+
27
+ import matplotlib
28
+ matplotlib.use('Agg')
29
+ import matplotlib.pyplot as plt
30
+ import matplotlib.gridspec as gridspec
31
+
32
+ from tqdm import tqdm
33
+
34
+ import torch
35
+ import torch.nn as nn
36
+ import torch.nn.functional as F
37
+ from torch.utils.data import Dataset, DataLoader
38
+ import torch.optim as optim
39
+ from torch.optim.swa_utils import AveragedModel, SWALR, update_bn
40
+
41
+ from sklearn.model_selection import KFold
42
+ from sklearn.preprocessing import StandardScaler
43
+ from pymatgen.core import Composition
44
+ from matminer.featurizers.composition import ElementProperty
45
+ from gensim.models import Word2Vec
46
+
47
+ logging.basicConfig(level=logging.INFO, format='%(name)s │ %(message)s')
48
+ log = logging.getLogger("TRM13")
49
+
50
+ # Seeds for multi-seed ensemble
51
+ SEEDS = [42, 123, 7, 0, 99]
52
+ N_SEEDS = len(SEEDS)
53
+
54
+ BASELINES = {
55
+ 'TPOT-Mat': 79.9468,
56
+ 'AutoML-Mat': 82.3043,
57
+ 'MODNet': 87.7627,
58
+ 'RF-SCM/Magpie': 103.5125,
59
+ 'V12A (best)': 95.9900,
60
+ 'V11B': 102.3003,
61
+ 'V10A': 103.2867,
62
+ 'CrabNet': 107.3160,
63
+ 'Darwin': 123.2932,
64
+ }
65
+
66
+
67
+ # ══════════════════════════════════════════════════════════════════════
68
+ # 1. FEATURIZER + DATASET
69
+ # ══════════════════════════════════════════════════════════════════════
70
+
71
+ class ExpandedFeaturizer:
72
+ """Magpie (22 props × 6 stats) + Extra matminer descriptors + Mat2Vec (200d).
73
+
74
+ Extra descriptors: ElementFraction, Stoichiometry, ValenceOrbital,
75
+ IonProperty, BandCenter — all concatenated as a flat vector between
76
+ the Magpie block and Mat2Vec.
77
+ """
78
+ GCS = "https://storage.googleapis.com/mat2vec/"
79
+ FILES = ["pretrained_embeddings",
80
+ "pretrained_embeddings.wv.vectors.npy",
81
+ "pretrained_embeddings.trainables.syn1neg.npy"]
82
+
83
+ def __init__(self, cache="mat2vec_cache"):
84
+ from matminer.featurizers.composition import (
85
+ ElementFraction, Stoichiometry, ValenceOrbital,
86
+ IonProperty, BandCenter
87
+ )
88
+ from matminer.featurizers.base import MultipleFeaturizer
89
+
90
+ self.ep_magpie = ElementProperty.from_preset("magpie")
91
+ self.n_mg = len(self.ep_magpie.feature_labels())
92
+
93
+ self.extra_feats = MultipleFeaturizer([
94
+ ElementFraction(),
95
+ Stoichiometry(),
96
+ ValenceOrbital(),
97
+ IonProperty(),
98
+ BandCenter(),
99
+ ])
100
+ self.n_extra = None # detected at featurize time
101
+
102
+ self.scaler = None
103
+ os.makedirs(cache, exist_ok=True)
104
+ for f in self.FILES:
105
+ p = os.path.join(cache, f)
106
+ if not os.path.exists(p):
107
+ log.info(f" Downloading {f}...")
108
+ urllib.request.urlretrieve(self.GCS + f, p)
109
+ self.m2v = Word2Vec.load(os.path.join(cache, "pretrained_embeddings"))
110
+ self.emb = {w: self.m2v.wv[w] for w in self.m2v.wv.index_to_key}
111
+
112
+ def _pool(self, c):
113
+ v, t = np.zeros(200, np.float32), 0.0
114
+ for s, f in c.get_el_amt_dict().items():
115
+ if s in self.emb: v += f * self.emb[s]; t += f
116
+ return v / max(t, 1e-8)
117
+
118
+ def featurize_all(self, comps):
119
+ out = []
120
+ for c in tqdm(comps, desc=" Featurizing (expanded)", leave=False):
121
+ try: mg = np.array(self.ep_magpie.featurize(c), np.float32)
122
+ except Exception: mg = np.zeros(self.n_mg, np.float32)
123
+
124
+ try:
125
+ ex = np.array(self.extra_feats.featurize(c), np.float32)
126
+ except Exception:
127
+ # 200 is only a guess if the very first composition fails before n_extra is known
+ ex = np.zeros(self.n_extra or 200, np.float32)
128
+
129
+ if self.n_extra is None:
130
+ self.n_extra = len(ex)
131
+ log.info(f"Expanded features: {self.n_mg} Magpie + "
132
+ f"{self.n_extra} Extra + 200 Mat2Vec = "
133
+ f"{self.n_mg + self.n_extra + 200}d")
134
+
135
+ out.append(np.concatenate([
136
+ np.nan_to_num(mg, nan=0.0),
137
+ np.nan_to_num(ex, nan=0.0),
138
+ self._pool(c)
139
+ ]))
140
+ return np.array(out)
141
+
142
+ def fit_scaler(self, X): self.scaler = StandardScaler().fit(X)
143
+ def transform(self, X):
144
+ if self.scaler is None: return X
145
+ return np.nan_to_num(self.scaler.transform(X), nan=0.0).astype(np.float32)
146
+
147
+
148
+ class DSData(Dataset):
149
+ def __init__(self, X, y):
150
+ self.X = torch.tensor(X, dtype=torch.float32)
151
+ self.y = torch.tensor(np.array(y, np.float32))
152
+ def __len__(self): return len(self.y)
153
+ def __getitem__(self, i): return self.X[i], self.y[i]
154
+
155
+
156
+ # ══════════════════════════════════════════════════════════════════════
157
+ # 2. MODELS — with 2-Layer Self-Attention
158
+ # ══════════════════════════════════════════════════════════════════════
159
+
160
+ class DeepHybridTRM(nn.Module):
161
+ """V13A: 2-Layer SA Hybrid-TRM with Standard Deep Supervision.
162
+
163
+ Key difference from V12A's HybridTRM:
164
+ - TWO self-attention layers (SA1 → FF1 → SA2 → FF2 → CA)
165
+ - Each SA layer has its own residual + LayerNorm + FF block
166
+ - This enables higher-order property interaction modeling
167
+ (e.g., "correlation between electronegativity-range AND
168
+ atomic-radius-mean" requires composing two rounds of attention)
169
+ - Cross-attention (CA) to Mat2Vec context remains after SA stack
170
+
171
+ Everything else (MLP reasoning loop, deep supervision, SWA)
172
+ is identical to V12A.
173
+ """
174
+ def __init__(self, n_props=22, stat_dim=6, n_extra=0, mat2vec_dim=200,
175
+ d_attn=64, nhead=4, d_hidden=96, ff_dim=150,
176
+ dropout=0.2, max_steps=20, **kw):
177
+ super().__init__()
178
+ self.max_steps, self.D = max_steps, d_hidden
179
+ self.n_props, self.stat_dim = n_props, stat_dim
180
+ self.n_extra = n_extra
181
+
182
+ # ── Attention feature extractor (2-Layer SA) ──────────────────
183
+ self.tok_proj = nn.Sequential(
184
+ nn.Linear(stat_dim, d_attn), nn.LayerNorm(d_attn), nn.GELU())
185
+ self.m2v_proj = nn.Sequential(
186
+ nn.Linear(mat2vec_dim, d_attn), nn.LayerNorm(d_attn), nn.GELU())
187
+
188
+ # Self-Attention Layer 1
189
+ self.sa1 = nn.MultiheadAttention(
190
+ d_attn, nhead, dropout=dropout, batch_first=True)
191
+ self.sa1_n = nn.LayerNorm(d_attn)
192
+ self.sa1_ff = nn.Sequential(
193
+ nn.Linear(d_attn, d_attn*2), nn.GELU(), nn.Dropout(dropout),
194
+ nn.Linear(d_attn*2, d_attn))
195
+ self.sa1_fn = nn.LayerNorm(d_attn)
196
+
197
+ # Self-Attention Layer 2 (NEW — captures higher-order interactions)
198
+ self.sa2 = nn.MultiheadAttention(
199
+ d_attn, nhead, dropout=dropout, batch_first=True)
200
+ self.sa2_n = nn.LayerNorm(d_attn)
201
+ self.sa2_ff = nn.Sequential(
202
+ nn.Linear(d_attn, d_attn*2), nn.GELU(), nn.Dropout(dropout),
203
+ nn.Linear(d_attn*2, d_attn))
204
+ self.sa2_fn = nn.LayerNorm(d_attn)
205
+
206
+ # Cross-Attention to Mat2Vec context (after SA stack)
207
+ self.ca = nn.MultiheadAttention(
208
+ d_attn, nhead, dropout=dropout, batch_first=True)
209
+ self.ca_n = nn.LayerNorm(d_attn)
210
+
211
+ # Pool with optional extra feature injection
212
+ pool_in = d_attn + (n_extra if n_extra > 0 else 0)
213
+ self.pool = nn.Sequential(
214
+ nn.Linear(pool_in, d_hidden), nn.LayerNorm(d_hidden), nn.GELU())
215
+
216
+ # MLP-TRM recursive reasoning (shared weights)
217
+ self.z_up = nn.Sequential(
218
+ nn.Linear(d_hidden*3, ff_dim), nn.GELU(), nn.Dropout(dropout),
219
+ nn.Linear(ff_dim, d_hidden), nn.LayerNorm(d_hidden))
220
+ self.y_up = nn.Sequential(
221
+ nn.Linear(d_hidden*2, ff_dim), nn.GELU(), nn.Dropout(dropout),
222
+ nn.Linear(ff_dim, d_hidden), nn.LayerNorm(d_hidden))
223
+ self.head = nn.Linear(d_hidden, 1)
224
+ self._init()
225
+
226
+ def _init(self):
227
+ for m in self.modules():
228
+ if isinstance(m, nn.Linear):
229
+ nn.init.xavier_uniform_(m.weight)
230
+ if m.bias is not None: nn.init.zeros_(m.bias)
231
+
232
+ def _attention(self, x):
233
+ B = x.size(0)
234
+ mg_dim = self.n_props * self.stat_dim
235
+ mg = x[:, :mg_dim]
236
+
237
+ if self.n_extra > 0:
238
+ extra = x[:, mg_dim:mg_dim + self.n_extra]
239
+ m2v = x[:, mg_dim + self.n_extra:]
240
+ else:
241
+ extra = None
242
+ m2v = x[:, mg_dim:]
243
+
244
+ tok = self.tok_proj(mg.view(B, self.n_props, self.stat_dim))
245
+ ctx = self.m2v_proj(m2v).unsqueeze(1)
246
+
247
+ # SA Layer 1: learn pairwise property interactions
248
+ tok = self.sa1_n(tok + self.sa1(tok, tok, tok)[0])
249
+ tok = self.sa1_fn(tok + self.sa1_ff(tok))
250
+
251
+ # SA Layer 2: learn higher-order property interactions
252
+ tok = self.sa2_n(tok + self.sa2(tok, tok, tok)[0])
253
+ tok = self.sa2_fn(tok + self.sa2_ff(tok))
254
+
255
+ # Cross-Attention to Mat2Vec chemistry context
256
+ tok = self.ca_n(tok + self.ca(tok, ctx, ctx)[0])
257
+
258
+ pooled = tok.mean(dim=1) # [B, d_attn]
259
+
260
+ if extra is not None:
261
+ pooled = torch.cat([pooled, extra], dim=-1)
262
+
263
+ return self.pool(pooled) # [B, d_hidden]
264
+
265
+ def forward(self, x, deep_supervision=False, return_trajectory=False):
266
+ B = x.size(0)
267
+ xp = self._attention(x)
268
+ z = torch.zeros(B, self.D, device=x.device)
269
+ y = torch.zeros(B, self.D, device=x.device)
270
+ step_preds = []
271
+ for _ in range(self.max_steps):
272
+ z = z + self.z_up(torch.cat([xp, y, z], -1))
273
+ y = y + self.y_up(torch.cat([y, z], -1))
274
+ step_preds.append(self.head(y).squeeze(1))
275
+ if deep_supervision:
276
+ return step_preds
277
+ elif return_trajectory:
278
+ return step_preds[-1], step_preds
279
+ else:
280
+ return step_preds[-1]
281
+
282
+ def count_parameters(self):
283
+ return sum(p.numel() for p in self.parameters() if p.requires_grad)
284
+
285
+
286
+ class DeepConfidenceHybridTRM(nn.Module):
287
+ """V13B: 2-Layer SA Hybrid-TRM with Confidence-Weighted Deep Supervision.
288
+
289
+ Same 2-layer SA feature extractor as DeepHybridTRM, but with:
290
+ - confidence_head that learns which recursion step to trust
291
+ - Final prediction = softmax(confidence) · step_preds
292
+ - No ponder cost (avoids V11C's failure)
293
+ - 22 recursion steps (vs 20 for V13A)
294
+ """
295
+ def __init__(self, n_props=22, stat_dim=6, n_extra=0, mat2vec_dim=200,
296
+ d_attn=64, nhead=4, d_hidden=96, ff_dim=150,
297
+ dropout=0.2, max_steps=22, **kw):
298
+ super().__init__()
299
+ self.max_steps, self.D = max_steps, d_hidden
300
+ self.n_props, self.stat_dim = n_props, stat_dim
301
+ self.n_extra = n_extra
302
+
303
+ # ── Attention feature extractor (2-Layer SA) ──────────────────
304
+ self.tok_proj = nn.Sequential(
305
+ nn.Linear(stat_dim, d_attn), nn.LayerNorm(d_attn), nn.GELU())
306
+ self.m2v_proj = nn.Sequential(
307
+ nn.Linear(mat2vec_dim, d_attn), nn.LayerNorm(d_attn), nn.GELU())
308
+
309
+ # Self-Attention Layer 1
310
+ self.sa1 = nn.MultiheadAttention(
311
+ d_attn, nhead, dropout=dropout, batch_first=True)
312
+ self.sa1_n = nn.LayerNorm(d_attn)
313
+ self.sa1_ff = nn.Sequential(
314
+ nn.Linear(d_attn, d_attn*2), nn.GELU(), nn.Dropout(dropout),
315
+ nn.Linear(d_attn*2, d_attn))
316
+ self.sa1_fn = nn.LayerNorm(d_attn)
317
+
318
+ # Self-Attention Layer 2 (higher-order interactions)
319
+ self.sa2 = nn.MultiheadAttention(
320
+ d_attn, nhead, dropout=dropout, batch_first=True)
321
+ self.sa2_n = nn.LayerNorm(d_attn)
322
+ self.sa2_ff = nn.Sequential(
323
+ nn.Linear(d_attn, d_attn*2), nn.GELU(), nn.Dropout(dropout),
324
+ nn.Linear(d_attn*2, d_attn))
325
+ self.sa2_fn = nn.LayerNorm(d_attn)
326
+
327
+ # Cross-Attention to Mat2Vec context
328
+ self.ca = nn.MultiheadAttention(
329
+ d_attn, nhead, dropout=dropout, batch_first=True)
330
+ self.ca_n = nn.LayerNorm(d_attn)
331
+
332
+ # Pool with optional extra feature injection
333
+ pool_in = d_attn + (n_extra if n_extra > 0 else 0)
334
+ self.pool = nn.Sequential(
335
+ nn.Linear(pool_in, d_hidden), nn.LayerNorm(d_hidden), nn.GELU())
336
+
337
+ # MLP-TRM recursive reasoning (shared weights)
338
+ self.z_up = nn.Sequential(
339
+ nn.Linear(d_hidden*3, ff_dim), nn.GELU(), nn.Dropout(dropout),
340
+ nn.Linear(ff_dim, d_hidden), nn.LayerNorm(d_hidden))
341
+ self.y_up = nn.Sequential(
342
+ nn.Linear(d_hidden*2, ff_dim), nn.GELU(), nn.Dropout(dropout),
343
+ nn.Linear(ff_dim, d_hidden), nn.LayerNorm(d_hidden))
344
+ self.head = nn.Linear(d_hidden, 1)
345
+
346
+ # ── Confidence head: learns which step to trust ──────────────
347
+ self.confidence_head = nn.Sequential(
348
+ nn.Linear(d_hidden, d_hidden // 2), nn.GELU(),
349
+ nn.Linear(d_hidden // 2, 1)) # raw logit, softmaxed later
350
+
351
+ self._init()
352
+
353
+ def _init(self):
354
+ for m in self.modules():
355
+ if isinstance(m, nn.Linear):
356
+ nn.init.xavier_uniform_(m.weight)
357
+ if m.bias is not None: nn.init.zeros_(m.bias)
358
+ with torch.no_grad():
359
+ nn.init.zeros_(self.confidence_head[-1].bias)
360
+
361
+ def _attention(self, x):
362
+ B = x.size(0)
363
+ mg_dim = self.n_props * self.stat_dim
364
+ mg = x[:, :mg_dim]
365
+
366
+ if self.n_extra > 0:
367
+ extra = x[:, mg_dim:mg_dim + self.n_extra]
368
+ m2v = x[:, mg_dim + self.n_extra:]
369
+ else:
370
+ extra = None
371
+ m2v = x[:, mg_dim:]
372
+
373
+ tok = self.tok_proj(mg.view(B, self.n_props, self.stat_dim))
374
+ ctx = self.m2v_proj(m2v).unsqueeze(1)
375
+
376
+ # SA Layer 1
377
+ tok = self.sa1_n(tok + self.sa1(tok, tok, tok)[0])
378
+ tok = self.sa1_fn(tok + self.sa1_ff(tok))
379
+
380
+ # SA Layer 2
381
+ tok = self.sa2_n(tok + self.sa2(tok, tok, tok)[0])
382
+ tok = self.sa2_fn(tok + self.sa2_ff(tok))
383
+
384
+ # Cross-Attention
385
+ tok = self.ca_n(tok + self.ca(tok, ctx, ctx)[0])
386
+
387
+ pooled = tok.mean(dim=1)
388
+
389
+ if extra is not None:
390
+ pooled = torch.cat([pooled, extra], dim=-1)
391
+
392
+ return self.pool(pooled)
393
+
394
+ def forward(self, x, deep_supervision=False, return_confidence=False):
395
+ """Forward pass.
396
+
397
+ Returns:
398
+ deep_supervision=True: (step_preds, confidence_logits)
399
+ deep_supervision=False, return_confidence=False:
400
+ weighted_pred: [B] confidence-weighted prediction
401
+ deep_supervision=False, return_confidence=True:
402
+ (weighted_pred, confidence_weights): [B], [B, max_steps]
403
+ """
404
+ B = x.size(0)
405
+ xp = self._attention(x)
406
+ z = torch.zeros(B, self.D, device=x.device)
407
+ y = torch.zeros(B, self.D, device=x.device)
408
+
409
+ step_preds = []
410
+ conf_logits = []
411
+
412
+ for _ in range(self.max_steps):
413
+ z = z + self.z_up(torch.cat([xp, y, z], -1))
414
+ y = y + self.y_up(torch.cat([y, z], -1))
415
+ step_preds.append(self.head(y).squeeze(1))
416
+ conf_logits.append(self.confidence_head(y).squeeze(1))
417
+
418
+ conf_logits = torch.stack(conf_logits, dim=1) # [B, max_steps]
419
+
420
+ if deep_supervision:
421
+ return step_preds, conf_logits
422
+
423
+ # Confidence-weighted prediction
424
+ conf_weights = F.softmax(conf_logits, dim=1) # [B, max_steps]
425
+ preds_stack = torch.stack(step_preds, dim=1) # [B, max_steps]
426
+ weighted_pred = (preds_stack * conf_weights).sum(dim=1) # [B]
427
+
428
+ if return_confidence:
429
+ return weighted_pred, conf_weights
430
+ return weighted_pred
431
+
432
+ def count_parameters(self):
433
+ return sum(p.numel() for p in self.parameters() if p.requires_grad)
434
+
435
+
436
+ # ══════════════════════════════════════════════════════════════════════
437
+ # 3. LOSS FUNCTIONS
438
+ # ══════════════════════════════════════════════════════════════════════
439
+
440
+ def deep_supervision_loss(step_preds, targets):
441
+ """Linear-weighted L1 loss across all recursion steps."""
442
+ n = len(step_preds)
443
+ weights = [(i + 1) for i in range(n)]
444
+ total_w = sum(weights)
445
+ loss = 0.0
446
+ for pred, w in zip(step_preds, weights):
447
+ loss += (w / total_w) * F.l1_loss(pred, targets)
448
+ return loss
449
+
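As a sanity check, the linear weighting above can be reproduced with plain Python (a torch-free sketch of the same math, not part of the training script):

```python
# Torch-free sketch of the linear-weighted deep-supervision loss:
# step i (1-indexed) gets weight i, normalized so the weights sum to 1,
# so later recursion steps dominate the loss.
def l1(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)

def ds_loss(step_preds, targets):
    n = len(step_preds)
    weights = [i + 1 for i in range(n)]          # 1, 2, ..., n
    total = sum(weights)
    return sum((w / total) * l1(p, targets) for p, w in zip(step_preds, weights))

targets = [1.0, 2.0]
steps = [[2.0, 3.0], [1.5, 2.5], [1.25, 2.25]]   # per-step L1 = 1.0, 0.5, 0.25
print(ds_loss(steps, targets))                    # (1*1.0 + 2*0.5 + 3*0.25)/6 ≈ 0.4583
```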
450
+
451
+ def confidence_ds_loss(step_preds, targets, conf_logits):
452
+ """Advanced Deep Supervision: standard DS + confidence-weighted L1.
453
+
454
+ Components:
455
+ 1. Standard linear-weighted deep supervision on all steps
456
+ 2. L1 loss on the confidence-weighted final prediction
457
+ """
458
+ ds = deep_supervision_loss(step_preds, targets)
459
+
460
+ conf_weights = F.softmax(conf_logits, dim=1) # [B, max_steps]
461
+ preds_stack = torch.stack(step_preds, dim=1) # [B, max_steps]
462
+ weighted_pred = (preds_stack * conf_weights).sum(dim=1)
463
+ conf_loss = F.l1_loss(weighted_pred, targets)
464
+
465
+ return ds + conf_loss
466
+
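The confidence-weighted prediction that the second loss term supervises can be worked through on toy numbers (a standalone sketch with made-up logits, for one sample):

```python
import math

# Torch-free sketch of the confidence-weighted prediction: softmax over
# per-step confidence logits, then a weighted sum of the step predictions.
def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

step_preds = [900.0, 1010.0, 1000.0]   # one sample, three recursion steps (MPa)
conf_logits = [-2.0, 1.0, 3.0]         # model trusts the later steps more
w = softmax(conf_logits)
weighted_pred = sum(p * wi for p, wi in zip(step_preds, w))
print(round(weighted_pred, 1))         # dominated by the two trusted steps
```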
467
+
468
+ # ══════════════════════════════════════════════════════════════════════
469
+ # 4. UTILS + TRAINING
470
+ # ══════════════════════════════════════════════════════════════════════
471
+
472
+ def strat_split(targets, val_size=0.15, seed=42):
473
+ bins = np.percentile(targets, [25, 50, 75])
474
+ lbl = np.digitize(targets, bins)
475
+ tr, vl = [], []
476
+ rng = np.random.RandomState(seed)
477
+ for b in range(4):
478
+ m = np.where(lbl == b)[0]
479
+ if len(m) == 0: continue
480
+ n = max(1, int(len(m) * val_size))
481
+ c = rng.choice(m, n, replace=False)
482
+ vl.extend(c.tolist()); tr.extend(np.setdiff1d(m, c).tolist())
483
+ return np.array(tr), np.array(vl)
484
+
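Usage sketch for the quartile-stratified split above, on a synthetic skewed target (the function body is copied from the script; the lognormal data is a stand-in):

```python
import numpy as np

# Quartile-stratified train/val split: bin targets at the 25/50/75th
# percentiles, then sample ~val_size of each bin into validation.
def strat_split(targets, val_size=0.15, seed=42):
    bins = np.percentile(targets, [25, 50, 75])
    lbl = np.digitize(targets, bins)              # 4 quartile bins: 0..3
    tr, vl = [], []
    rng = np.random.RandomState(seed)
    for b in range(4):
        m = np.where(lbl == b)[0]
        if len(m) == 0:
            continue
        n = max(1, int(len(m) * val_size))
        c = rng.choice(m, n, replace=False)
        vl.extend(c.tolist())
        tr.extend(np.setdiff1d(m, c).tolist())
    return np.array(tr), np.array(vl)

y = np.random.RandomState(0).lognormal(size=200)  # synthetic skewed target
tr, vl = strat_split(y, val_size=0.15)
print(len(tr), len(vl))  # 172 28 — each 50-sample quartile gives int(50*0.15)=7 val
```

Because the validation fraction is drawn per bin, every quartile of the target distribution is represented in both splits even when the target is heavily skewed.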
485
+
486
+ def train_fold_standard(model, tr_dl, vl_dl, device,
487
+ epochs=300, swa_start=200, fold=1, name=""):
488
+ """Training with standard deep supervision."""
489
+ opt = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
490
+ sch = optim.lr_scheduler.CosineAnnealingLR(opt, T_max=swa_start, eta_min=1e-4)
491
+ swa_m = AveragedModel(model)
492
+ swa_s = SWALR(opt, swa_lr=5e-4)
493
+ swa_on = False
494
+ best_v, best_w = float('inf'), copy.deepcopy(model.state_dict())
495
+ hist = {'train': [], 'val': []}
496
+
497
+ pbar = tqdm(range(epochs), desc=f" [{name}] F{fold}/5",
498
+ leave=False, ncols=120)
499
+ for ep in pbar:
500
+ model.train(); tl = 0.0
501
+ for bx, by in tr_dl:
502
+ bx, by = bx.to(device), by.to(device)
503
+ step_preds = model(bx, deep_supervision=True)
504
+ loss = deep_supervision_loss(step_preds, by)
505
+ opt.zero_grad(set_to_none=True); loss.backward()
506
+ torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
507
+ opt.step()
508
+ tl += F.l1_loss(step_preds[-1], by).item() * len(by)
509
+ tl /= len(tr_dl.dataset)
510
+
511
+ model.eval(); vl = 0.0
512
+ with torch.no_grad():
513
+ for bx, by in vl_dl:
514
+ bx, by = bx.to(device), by.to(device)
515
+ pred = model(bx)
516
+ vl += F.l1_loss(pred, by).item() * len(by)
517
+ vl /= len(vl_dl.dataset)
518
+ hist['train'].append(tl); hist['val'].append(vl)
519
+
520
+ if ep < swa_start:
521
+ sch.step()
522
+ if vl < best_v: best_v, best_w = vl, copy.deepcopy(model.state_dict())
523
+ else:
524
+ if not swa_on: swa_on = True
525
+ swa_m.update_parameters(model); swa_s.step()
526
+
527
+ pbar.set_postfix(Tr=f'{tl:.1f}', Val=f'{vl:.1f}',
528
+ Best=f'{best_v:.1f}', Ph='SWA' if swa_on else 'COS')
529
+
530
+ if swa_on:
531
+ update_bn(tr_dl, swa_m, device=device)
532
+ model.load_state_dict(swa_m.module.state_dict())
533
+ else:
534
+ model.load_state_dict(best_w)
535
+ return best_v, model, hist
536
+
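The two-phase learning-rate schedule used above (cosine annealing for the first `swa_start` epochs, then SWA) can be sketched as a closed-form function. This is a simplification: `SWALR` actually anneals toward `swa_lr` over a few epochs rather than jumping to it.

```python
import math

# Sketch of the schedule in train_fold_standard: cosine annealing from
# lr=1e-3 down to eta_min=1e-4 over swa_start epochs, then a (roughly)
# constant SWA learning rate of 5e-4.
def lr_at(epoch, lr0=1e-3, eta_min=1e-4, swa_start=200, swa_lr=5e-4):
    if epoch < swa_start:
        # CosineAnnealingLR with T_max=swa_start
        return eta_min + 0.5 * (lr0 - eta_min) * (1 + math.cos(math.pi * epoch / swa_start))
    return swa_lr

print(lr_at(0))    # 0.001   (start of cosine phase)
print(lr_at(100))  # 0.00055 (halfway: midpoint of lr0 and eta_min)
print(lr_at(250))  # 0.0005  (SWA phase)
```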
537
+
538
+ def train_fold_confidence(model, tr_dl, vl_dl, device,
539
+ epochs=300, swa_start=200, fold=1, name=""):
540
+ """Training with confidence-weighted deep supervision."""
541
+ opt = optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
542
+ sch = optim.lr_scheduler.CosineAnnealingLR(opt, T_max=swa_start, eta_min=1e-4)
543
+ swa_m = AveragedModel(model)
544
+ swa_s = SWALR(opt, swa_lr=5e-4)
545
+ swa_on = False
546
+ best_v, best_w = float('inf'), copy.deepcopy(model.state_dict())
547
+ hist = {'train': [], 'val': []}
548
+
549
+ pbar = tqdm(range(epochs), desc=f" [{name}] F{fold}/5",
550
+ leave=False, ncols=120)
551
+ for ep in pbar:
552
+ model.train(); tl = 0.0
553
+ for bx, by in tr_dl:
554
+ bx, by = bx.to(device), by.to(device)
555
+ step_preds, conf_logits = model(bx, deep_supervision=True)
556
+ loss = confidence_ds_loss(step_preds, by, conf_logits)
557
+ opt.zero_grad(set_to_none=True); loss.backward()
558
+ torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
559
+ opt.step()
560
+ # Track confidence-weighted MAE for display
561
+ with torch.no_grad():
562
+ cw = F.softmax(conf_logits, dim=1)
563
+ ps = torch.stack(step_preds, dim=1)
564
+ wp = (ps * cw).sum(dim=1)
565
+ tl += F.l1_loss(wp, by).item() * len(by)
566
+ tl /= len(tr_dl.dataset)
567
+
568
+ model.eval(); vl = 0.0
569
+ with torch.no_grad():
570
+ for bx, by in vl_dl:
571
+ bx, by = bx.to(device), by.to(device)
572
+ pred = model(bx)  # confidence-weighted prediction by default
573
+ vl += F.l1_loss(pred, by).item() * len(by)
574
+ vl /= len(vl_dl.dataset)
575
+ hist['train'].append(tl); hist['val'].append(vl)
576
+
577
+ if ep < swa_start:
578
+ sch.step()
579
+ if vl < best_v: best_v, best_w = vl, copy.deepcopy(model.state_dict())
580
+ else:
581
+ if not swa_on: swa_on = True
582
+ swa_m.update_parameters(model); swa_s.step()
583
+
584
+ pbar.set_postfix(Tr=f'{tl:.1f}', Val=f'{vl:.1f}',
585
+ Best=f'{best_v:.1f}', Ph='SWA' if swa_on else 'COS')
586
+
587
+ if swa_on:
588
+ update_bn(tr_dl, swa_m, device=device)
589
+ model.load_state_dict(swa_m.module.state_dict())
590
+ else:
591
+ model.load_state_dict(best_w)
592
+ return best_v, model, hist
593
+
594
+
595
+ def predict(model, dl, device):
596
+ model.eval(); preds = []
597
+ with torch.no_grad():
598
+ for bx, _ in dl:
599
+ preds.append(model(bx.to(device)).cpu())
600
+ return torch.cat(preds)
601
+
602
+
603
+ def predict_confidence(model, dl, device):
604
+ """Predict using confidence model, also return per-step weights."""
605
+ model.eval()
606
+ all_preds, all_weights = [], []
607
+ with torch.no_grad():
608
+ for bx, _ in dl:
609
+ pred, weights = model(bx.to(device), return_confidence=True)
610
+ all_preds.append(pred.cpu())
611
+ all_weights.append(weights.cpu())
612
+ return torch.cat(all_preds), torch.cat(all_weights)
613
+
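The per-step weights returned here feed the "avg peak step" statistic logged during training. A torch-free sketch of that summary, on made-up weights for two samples:

```python
# Given [n_samples, max_steps] softmax confidence weights, the "peak step"
# is the per-sample argmax (reported 1-indexed in the logs), and the
# average peak step summarizes where the model trusts its predictions.
weights = [
    [0.05, 0.15, 0.80],   # this sample trusts step 3
    [0.10, 0.60, 0.30],   # this sample trusts step 2
]
peaks = [max(range(len(w)), key=w.__getitem__) + 1 for w in weights]
avg_peak = sum(peaks) / len(peaks)
print(peaks, avg_peak)  # [3, 2] 2.5
```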
614
+
615
+ def get_targets(dl):
616
+ tgts = []
617
+ for _, by in dl: tgts.append(by)
618
+ return torch.cat(tgts)
619
+
620
+
621
+ # ══════════════════════════════════════════════════════════════════════
622
+ # 5. MAIN BENCHMARK — Multi-Seed Ensemble
623
+ # ══════════════════════════════════════════════════════════════════════
624
+
625
+ def run_benchmark():
626
+ t0 = time.time()
627
+ print("\n" + "═"*72)
628
+ print(" TRM-MatSci V13 │ 2-Layer SA + Multi-Seed Ensemble │ matbench_steels")
629
+ print(" V13A: 2-Layer SA + expanded features + standard DS (5-seed ensemble)")
630
+ print(" V13B: 2-Layer SA + expanded features + confidence DS (5-seed ensemble)")
631
+ print(f" Seeds: {SEEDS}")
632
+ print("═"*72 + "\n")
633
+
634
+ device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
635
+ if device.type == 'cuda':
636
+ log.info(f"GPU: {torch.cuda.get_device_name(0)} "
637
+ f"({torch.cuda.get_device_properties(0).total_memory/1e9:.1f} GB)")
638
+ torch.backends.cuda.matmul.allow_tf32 = True
639
+ torch.backends.cudnn.benchmark = True
640
+
641
+ log.info("Loading matbench_steels...")
642
+ from matminer.datasets import load_dataset
643
+ df = load_dataset("matbench_steels")
644
+ comps_raw = df['composition'].tolist()
645
+ targets_all = np.array(df['yield strength'].tolist(), np.float32)
646
+ comps_all = [Composition(c) for c in comps_raw]
647
+
648
+ # ── FEATURIZE ─────────────────────────────────────────────────────
649
+ log.info("Computing EXPANDED features...")
650
+ feat = ExpandedFeaturizer()
651
+ X_all = feat.featurize_all(comps_all)
652
+ n_extra = feat.n_extra
653
+ log.info(f"Features: {X_all.shape} (n_extra={n_extra})")
654
+
655
+ kfold = KFold(n_splits=5, shuffle=True, random_state=18012019)
656
+ folds = list(kfold.split(comps_all))
657
+ os.makedirs('trm_models_v13', exist_ok=True)
658
+ dl_kw = dict(batch_size=32, num_workers=0)
659
+
660
+ # ── CONFIGS ───────────────────────────────────────────────────────
661
+ shared_kw = dict(n_props=22, stat_dim=6, n_extra=n_extra,
662
+ mat2vec_dim=200, d_attn=64, nhead=4,
663
+ d_hidden=96, ff_dim=150, dropout=0.2)
664
+
665
+ configs = {
666
+ 'V13A-2xSA-StdDS': {
667
+ 'model_cls': DeepHybridTRM,
668
+ 'model_kw': {**shared_kw, 'max_steps': 20},
669
+ 'train_fn': train_fold_standard,
670
+ 'predict_fn': predict,
671
+ 'is_confidence': False,
672
+ },
673
+ 'V13B-2xSA-ConfDS': {
674
+ 'model_cls': DeepConfidenceHybridTRM,
675
+ 'model_kw': {**shared_kw, 'max_steps': 22},
676
+ 'train_fn': train_fold_confidence,
677
+ 'predict_fn': None, # uses predict_confidence
678
+ 'is_confidence': True,
679
+ },
680
+ }
681
+
682
+ # Print param counts
683
+ print(f"\n {'Config':<24} {'Params':>10} {'Steps':>8} {'Seeds':>6}")
684
+ print(f" {'─'*54}")
685
+ for cname, cfg in configs.items():
686
+ _m = cfg['model_cls'](**cfg['model_kw'])
687
+ np_ = _m.count_parameters(); del _m
688
+ cfg['n_params'] = np_
689
+ steps = cfg['model_kw']['max_steps']
690
+ print(f" {cname:<24} {np_:>10,} {steps:>8} {N_SEEDS:>6}")
691
+ print()
692
+
693
+ # ── TRAIN + EVALUATE (Multi-Seed) ─────────────────────────────────
694
+ all_results = {}
695
+ all_hists = {}
696
+ all_conf_weights = {}
697
+
698
+ for cname, cfg in configs.items():
699
+ print(f"\n{'▓'*72}")
700
+ print(f" {cname} — {N_SEEDS}-Seed Ensemble")
701
+ print(f"{'▓'*72}")
702
+
703
+ # Store per-seed, per-fold predictions and MAEs
704
+ seed_fold_preds = {s: {} for s in SEEDS} # seed -> {fold_idx: preds_tensor}
705
+ seed_fold_maes = {s: [] for s in SEEDS} # seed -> [mae_f1, ..., mae_f5]
706
+ fold_hists = [] # collect from first seed only
707
+ fold_conf_w = [] # collect from first seed only
708
+
709
+ for si, seed in enumerate(SEEDS):
710
+ print(f"\n ╔═══ Seed {seed} ({si+1}/{N_SEEDS}) ═══╗")
711
+
712
+ for fi, (tv_i, te_i) in enumerate(folds):
713
+ print(f"\n ── [{cname} seed={seed}] Fold {fi+1}/5 {'─'*30}")
714
+
715
+ tri, vli = strat_split(targets_all[tv_i], 0.15, seed+fi)
716
+ feat.fit_scaler(X_all[tv_i][tri])
717
+ tr_s = feat.transform(X_all[tv_i][tri])
718
+ vl_s = feat.transform(X_all[tv_i][vli])
719
+ te_s = feat.transform(X_all[te_i])
720
+
721
+ pin = device.type == 'cuda'
722
+ tr_dl = DataLoader(DSData(tr_s, targets_all[tv_i][tri]), shuffle=True,
723
+ pin_memory=pin, **dl_kw)
724
+ vl_dl = DataLoader(DSData(vl_s, targets_all[tv_i][vli]), shuffle=False,
725
+ pin_memory=pin, **dl_kw)
726
+ te_dl = DataLoader(DSData(te_s, targets_all[te_i]), shuffle=False,
727
+ pin_memory=pin, **dl_kw)
728
+ te_tgt = get_targets(te_dl)
729
+
730
+ torch.manual_seed(seed + fi); np.random.seed(seed + fi)
731
+ if device.type == 'cuda': torch.cuda.manual_seed(seed + fi)
732
+
733
+ model = cfg['model_cls'](**cfg['model_kw']).to(device)
734
+ bv, model, hist = cfg['train_fn'](model, tr_dl, vl_dl, device,
735
+ fold=fi+1,
736
+ name=f"{cname}[s{seed}]")
737
+
738
+ # Save hist only for first seed
739
+ if si == 0:
740
+ fold_hists.append(hist)
741
+
742
+ # Predict
743
+ if cfg['is_confidence']:
744
+ pred, conf_w = predict_confidence(model, te_dl, device)
745
+ if si == 0:
746
+ fold_conf_w.append(conf_w)
747
+ avg_peak = conf_w.argmax(dim=1).float().mean().item() + 1
748
+ mae = F.l1_loss(pred, te_tgt).item()
749
+ log.info(f" [s{seed}] F{fi+1}: MAE={mae:.2f} "
750
+ f"(val {bv:.2f}, avg peak step={avg_peak:.1f})")
751
+ else:
752
+ pred = cfg['predict_fn'](model, te_dl, device)
753
+ mae = F.l1_loss(pred, te_tgt).item()
754
+ log.info(f" [s{seed}] F{fi+1}: MAE={mae:.2f} (val {bv:.2f})")
755
+
756
+ seed_fold_preds[seed][fi] = pred
757
+ seed_fold_maes[seed].append(mae)
758
+
759
+ torch.save({'model_state': model.state_dict(), 'test_mae': mae,
760
+ 'config': cname, 'seed': seed},
761
+ f'trm_models_v13/{cname}_seed{seed}_fold{fi+1}.pt')
762
+
763
+ # Free GPU memory
764
+ del model
+ if device.type == 'cuda': torch.cuda.empty_cache()
765
+
766
+ seed_avg = float(np.mean(seed_fold_maes[seed]))
767
+ print(f" ╚═══ Seed {seed} avg: {seed_avg:.2f} MPa ═══╝")
768
+
769
+ # ── Compute ensemble predictions ──────────────────────────────
770
+ ensemble_fold_maes = []
771
+ for fi, (tv_i, te_i) in enumerate(folds):
772
+ te_tgt_np = targets_all[te_i]
773
+ te_tgt_t = torch.tensor(te_tgt_np, dtype=torch.float32)
774
+
775
+ # Average predictions across all seeds for this fold
776
+ all_seed_preds = torch.stack([seed_fold_preds[s][fi] for s in SEEDS])
777
+ ensemble_pred = all_seed_preds.mean(dim=0)
778
+
779
+ ens_mae = F.l1_loss(ensemble_pred, te_tgt_t).item()
780
+ ensemble_fold_maes.append(ens_mae)
781
+
782
+ ens_avg = float(np.mean(ensemble_fold_maes))
783
+ ens_std = float(np.std(ensemble_fold_maes))
784
+
785
+ # Also compute per-seed averages for reporting
786
+ per_seed_avgs = {s: float(np.mean(seed_fold_maes[s])) for s in SEEDS}
787
+ best_single_seed = min(per_seed_avgs.items(), key=lambda x: x[1])
788
+
789
+ all_results[cname] = {
790
+ 'avg': ens_avg, 'std': ens_std, 'folds': ensemble_fold_maes,
791
+ 'params': cfg['n_params'],
792
+ 'per_seed_avgs': per_seed_avgs,
793
+ 'per_seed_folds': {str(s): seed_fold_maes[s] for s in SEEDS},
794
+ 'best_single_seed': best_single_seed[0],
795
+ 'best_single_mae': best_single_seed[1],
796
+ }
797
+ all_hists[cname] = fold_hists
798
+ if fold_conf_w:
799
+ all_conf_weights[cname] = fold_conf_w
800
+
801
+ print(f"\n ═══ {cname} ═══")
802
+ print(f" Ensemble ({N_SEEDS}-seed avg): {ens_avg:.4f} ±{ens_std:.4f} MPa")
803
+ print(f" Best single seed ({best_single_seed[0]}): "
804
+ f"{best_single_seed[1]:.4f} MPa")
805
+ for s in SEEDS:
806
+ print(f" Seed {s:>3}: {per_seed_avgs[s]:.2f} MPa "
807
+ f"folds={[f'{m:.1f}' for m in seed_fold_maes[s]]}")
808
+
809
+ # ══════════════════════════════════════════════════════════════════
810
+ # FINAL RESULTS
811
+ # ══════════════════════════════════════════════════════════════════
812
+
813
+ tt = time.time() - t0
814
+ print(f"\n{'═'*72}")
815
+ print(f" FINAL LEADERBOARD — matbench_steels V13 (5-Fold Avg MAE)")
816
+ print(f"{'═'*72}")
817
+ print(f" {'Model':<26} {'Params':>10} {'MAE(MPa)':>10} {'±Std':>8} Notes")
818
+ print(f" {'─'*72}")
819
+ for n, r in sorted(all_results.items(), key=lambda x: x[1]['avg']):
820
+ tag = (" ← BEATS MODNet 🏆" if r['avg'] < 87.76 else
821
+ " ← BEATS V12A ✓" if r['avg'] < 95.99 else
822
+ " ← BEATS RF-SCM ✓" if r['avg'] < 103.51 else
823
+ " ← BEATS DARWIN ✓" if r['avg'] < 123.29 else "")
824
+ print(f" {n+' (ens)':<26} {r['params']:>9,} "
825
+ f"{r['avg']:>10.4f} {r['std']:>8.4f}{tag}")
826
+ print(f" {n+' (best 1)':<26} {'':>10} "
827
+ f"{r['best_single_mae']:>10.4f} {'':>8} seed={r['best_single_seed']}")
828
+ print(f" {'─'*72}")
829
+ for bn, bv in sorted(BASELINES.items(), key=lambda x: x[1]):
830
+ print(f" {bn:<26} {'baseline':>10} {bv:>10.4f}")
831
+ print(f"\n Total time: {tt/60:.1f} minutes ({N_SEEDS} seeds × 2 configs × 5 folds)")
832
+
833
+ # Per-fold ensemble breakdown
834
+ print(f"\n{'═'*72}")
835
+ print(f" PER-FOLD ENSEMBLE BREAKDOWN")
836
+ print(f"{'═'*72}")
837
+ cnames = list(all_results.keys())
838
+ header = f" {'Fold':<6}"
839
+ for cn in cnames:
840
+ header += f" {cn:>20}"
841
+ print(header)
842
+ print(f" {'─'*52}")
843
+ for fi in range(5):
844
+ row = f" {fi+1:<6}"
845
+ for cn in cnames:
846
+ row += f" {all_results[cn]['folds'][fi]:>20.2f}"
847
+ print(row)
848
+
849
+ # Per-seed breakdown
850
+ print(f"\n{'═'*72}")
851
+ print(f" PER-SEED BREAKDOWN")
852
+ print(f"{'═'*72}")
853
+ for cn in cnames:
854
+ r = all_results[cn]
855
+ print(f"\n {cn}:")
856
+ header = f" {'Seed':<6}"
857
+ for fi in range(5):
858
+ header += f" {'F'+str(fi+1):>8}"
859
+ header += f" {'Avg':>8}"
860
+ print(header)
861
+ print(f" {'─'*52}")
862
+ for s in SEEDS:
863
+ row = f" {s:<6}"
864
+ for mae in r['per_seed_folds'][str(s)]:
865
+ row += f" {mae:>8.2f}"
866
+ row += f" {r['per_seed_avgs'][s]:>8.2f}"
867
+ print(row)
868
+ print(f" {'─'*52}")
869
+ row = f" {'ENS':<6}"
870
+ for mae in r['folds']:
871
+ row += f" {mae:>8.2f}"
872
+ row += f" {r['avg']:>8.2f}"
873
+ print(row)
874
+
875
+ # Confidence stats
876
+ if all_conf_weights:
877
+ print(f"\n Confidence Step Selection Summary:")
878
+ for cn, fw_list in all_conf_weights.items():
879
+ all_w = torch.cat(fw_list, dim=0)
880
+ avg_w = all_w.mean(dim=0)
881
+ peak_step = avg_w.argmax().item() + 1
882
+ avg_peak = all_w.argmax(dim=1).float().mean().item() + 1
883
+ print(f" {cn}: avg peak step={avg_peak:.1f}, "
884
+ f"population peak=step {peak_step}")
885
+ print()
886
+
887
+ generate_plots(all_results, all_hists, all_conf_weights)
888
+ save_summary(all_results, all_hists, all_conf_weights, tt)
889
+ return all_results
890
+
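Why seed-averaging in `run_benchmark` tends to beat the best single seed: averaging several noisy predictors before scoring cancels independent error, so the ensemble MAE is lower than the mean per-seed MAE. A synthetic toy (not matbench data):

```python
import random

# Synthetic demonstration of seed-ensembling: each "seed" predicts the
# target plus independent zero-mean noise; averaging the predictions
# shrinks the noise by roughly 1/sqrt(n_seeds).
random.seed(0)
target = [100.0] * 1000
seeds = 5
preds = [[100.0 + random.gauss(0, 10) for _ in range(1000)] for _ in range(seeds)]

def mae(p, t):
    return sum(abs(a - b) for a, b in zip(p, t)) / len(t)

per_seed = sum(mae(p, target) for p in preds) / seeds
ens = [sum(col) / seeds for col in zip(*preds)]   # average across seeds
print(round(per_seed, 2), round(mae(ens, target), 2))  # ensemble MAE is markedly lower
```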
891
+
892
+ # ══════════════════════════════════════════════════════════════════════
893
+ # 6. PLOTS
894
+ # ══════════════════════════════════════════════════════════════════════
895
+
896
+ PAL = {'V13A-2xSA-StdDS': '#1565C0', 'V13B-2xSA-ConfDS': '#E65100'}
897
+
898
+ def generate_plots(all_results, all_hists, all_conf_weights):
899
+ names = list(all_results.keys())
900
+ avgs = [all_results[n]['avg'] for n in names]
901
+ stds = [all_results[n]['std'] for n in names]
902
+ cols = [PAL.get(n, '#888') for n in names]
903
+
904
+ fig = plt.figure(figsize=(22, 18))
905
+ gs = gridspec.GridSpec(2, 2, figure=fig, hspace=0.35, wspace=0.30)
906
+
907
+ # ── Plot 1: Bar chart vs baselines ────────────────────────────────
908
+ ax1 = fig.add_subplot(gs[0, 0])
909
+
910
+ # Show both ensemble and best-single-seed bars
911
+ x_pos = np.arange(len(names))
912
+ w = 0.35
913
+ ens_bars = ax1.bar(x_pos - w/2, avgs, w, yerr=stds, capsize=6,
914
+ color=cols, alpha=0.88, edgecolor='white',
915
+ linewidth=1.5, label='Ensemble')
916
+ best_singles = [all_results[n]['best_single_mae'] for n in names]
917
+ single_bars = ax1.bar(x_pos + w/2, best_singles, w, capsize=6,
918
+ color=cols, alpha=0.45, edgecolor='white',
919
+ linewidth=1.5, label='Best Single Seed',
920
+ hatch='//')
921
+
922
+ for bv, c, ls, lb in [
923
+ (87.76, '#F57F17', '--', 'MODNet (87.76)'),
924
+ (95.99, '#4CAF50', '-.', 'V12A (95.99)'),
925
+ (102.30, '#9E9E9E', '-.', 'V11B (102.30)'),
926
+ (103.51, '#B0BEC5', ':', 'RF-SCM (103.51)'),
927
+ (107.32, '#FF9800', ':', 'CrabNet (107.32)'),
928
+ ]:
929
+ ax1.axhline(bv, color=c, linestyle=ls, linewidth=1.8, label=lb, alpha=0.85)
930
+ for bar, m, s in zip(ens_bars, avgs, stds):
931
+ ax1.text(bar.get_x()+bar.get_width()/2, bar.get_height()+s+1,
932
+ f'{m:.1f}', ha='center', fontsize=11, fontweight='bold')
933
+ for bar, m in zip(single_bars, best_singles):
934
+ ax1.text(bar.get_x()+bar.get_width()/2, bar.get_height()+1,
935
+ f'{m:.1f}', ha='center', fontsize=9, fontstyle='italic',
936
+ alpha=0.7)
937
+
938
+ ax1.set_xticks(x_pos)
939
+ ax1.set_xticklabels(names, fontsize=8)
940
+ ax1.legend(fontsize=6, loc='upper right')
941
+ ax1.set_ylabel('MAE (MPa)'); ax1.set_ylim(0, max(avgs)*1.6)
942
+ ax1.set_title('V13 Results vs Baselines (Ensemble + Best Single)',
943
+ fontsize=11, fontweight='bold')
944
+ ax1.grid(axis='y', alpha=0.3)
945
+
946
+ # ── Plot 2: Per-fold grouped bars ─────────────────────────────────
947
+ ax2 = fig.add_subplot(gs[0, 1])
948
+ x = np.arange(1, 6)
949
+ w = 0.35
950
+ for i, (n, col) in enumerate(zip(names, cols)):
951
+ fold_vals = all_results[n]['folds']
952
+ ax2.bar(x + (i - 0.5) * w, fold_vals, w, color=col, alpha=0.8,
953
+ label=n + ' (ens)', edgecolor='white')
954
+ ax2.axhline(95.99, color='#4CAF50', ls='-.', lw=1.5, label='V12A (95.99)')
955
+ ax2.axhline(87.76, color='#F57F17', ls='--', lw=1.5, label='MODNet (87.76)')
956
+ ax2.set_xlabel('Fold'); ax2.set_ylabel('MAE (MPa)')
957
+ ax2.set_xticks(x); ax2.set_xticklabels([f'F{i}' for i in range(1,6)])
958
+ ax2.set_title('Per-Fold Ensemble Breakdown', fontweight='bold')
959
+ ax2.legend(fontsize=7); ax2.grid(axis='y', alpha=0.2)
960
+
961
+ # ── Plot 3: Training/Val loss curves ──────────────────────────────
962
+ ax3 = fig.add_subplot(gs[1, 0])
963
+ for cname, col in PAL.items():
964
+ if cname not in all_hists: continue
965
+ for fi, h in enumerate(all_hists[cname]):
966
+ lb_tr = f'{cname} train' if fi == 0 else None
967
+ lb_vl = f'{cname} val' if fi == 0 else None
968
+ ax3.plot(h['train'], alpha=0.3, lw=0.8, color=col, label=lb_tr)
969
+ ax3.plot(h['val'], alpha=0.7, lw=1.2, color=col, label=lb_vl,
970
+ linestyle='--')
971
+ ax3.axhline(95.99, color='#4CAF50', ls='-.', lw=1.2, label='V12A (95.99)')
972
+ ax3.axvline(200, color='#4CAF50', ls='--', lw=1.2, alpha=0.6, label='SWA start')
973
+ ax3.set_xlabel('Epoch'); ax3.set_ylabel('MAE (MPa)')
974
+ ax3.set_title('Training Curves (seed 0, all folds)', fontweight='bold')
975
+ ax3.legend(fontsize=6, ncol=2); ax3.grid(alpha=0.2)
976
+ ax3.set_ylim(0, 300)
977
+
978
+ # ── Plot 4: Per-seed scatter / Confidence ─────────────────────────
979
+ ax4 = fig.add_subplot(gs[1, 1])
980
+ if all_conf_weights:
981
+ for cn, fw_list in all_conf_weights.items():
982
+ all_w = torch.cat(fw_list, dim=0)
983
+ avg_w = all_w.mean(dim=0).numpy()
984
+ steps = np.arange(1, len(avg_w)+1)
985
+ ax4.bar(steps, avg_w, color=PAL.get(cn, '#E65100'), alpha=0.8,
986
+ label=f'{cn} avg confidence', edgecolor='white')
987
+ std_w = all_w.std(dim=0).numpy()
988
+ ax4.errorbar(steps, avg_w, yerr=std_w, fmt='none',
989
+ ecolor='#333', capsize=2, alpha=0.5)
990
+ ax4.set_xlabel('Recursion Step')
991
+ ax4.set_ylabel('Confidence Weight (softmax)')
992
+ ax4.set_title('V13B: Where the Model Trusts Its Predictions',
993
+ fontweight='bold')
994
+ ax4.legend(fontsize=8)
995
+ ax4.grid(axis='y', alpha=0.2)
996
+ else:
997
+ # Show per-seed MAE scatter if no confidence model
998
+ for i, (cn, col) in enumerate(zip(names, cols)):
999
+ r = all_results[cn]
1000
+ seed_avgs = [r['per_seed_avgs'][s] for s in SEEDS]
1001
+ ax4.scatter(SEEDS, seed_avgs, s=80, c=col, alpha=0.8,
1002
+ label=f'{cn} per-seed', zorder=5,
1003
+ edgecolors='white', linewidth=1)
1004
+ ax4.axhline(r['avg'], color=col, ls='--', lw=1.5, alpha=0.6,
1005
+ label=f'{cn} ensemble={r["avg"]:.2f}')
1006
+ ax4.axhline(95.99, color='#4CAF50', ls=':', lw=1, alpha=0.5, label='V12A')
1007
+ ax4.set_xlabel('Random Seed')
1008
+ ax4.set_ylabel('5-Fold Avg MAE (MPa)')
1009
+ ax4.set_title('Per-Seed vs Ensemble Performance', fontweight='bold')
1010
+ ax4.legend(fontsize=7); ax4.grid(alpha=0.2)
1011
+
1012
+ fig.suptitle('TRM-MatSci V13 │ 2-Layer SA + Multi-Seed Ensemble │ matbench_steels',
1013
+ fontsize=14, fontweight='bold', y=1.01)
1014
+ fig.savefig('trm_results_v13.png', dpi=150, bbox_inches='tight')
1015
+ plt.close(fig); log.info("✓ Saved: trm_results_v13.png")
1016
+
1017
+
1018
+ def save_summary(all_results, all_hists, all_conf_weights, total_s):
1019
+ # Prepare confidence info
1020
+ conf_info = {}
1021
+ for cn, fw_list in all_conf_weights.items():
1022
+ all_w = torch.cat(fw_list, dim=0)
1023
+ conf_info[cn] = {
1024
+ 'avg_weights': all_w.mean(dim=0).numpy().round(4).tolist(),
1025
+ 'avg_peak_step': float(all_w.argmax(dim=1).float().mean().item() + 1),
1026
+ }
1027
+
1028
+ s = {
1029
+ 'version': 'V13', 'task': 'matbench_steels',
1030
+ 'strategy': '2-Layer SA + Multi-Seed Ensemble',
1031
+ 'seeds': SEEDS,
1032
+ 'n_seeds': N_SEEDS,
1033
+ 'total_min': round(total_s/60, 1),
1034
+ 'models': {},
1035
+ 'confidence': conf_info,
1036
+ }
1037
+ for n, r in all_results.items():
1038
+ s['models'][n] = {
1039
+ 'ensemble_avg': round(r['avg'], 4),
1040
+ 'ensemble_std': round(r['std'], 4),
1041
+ 'ensemble_folds': [round(x, 4) for x in r['folds']],
1042
+ 'params': r['params'],
1043
+ 'best_single_seed': r['best_single_seed'],
1044
+ 'best_single_mae': round(r['best_single_mae'], 4),
1045
+ 'per_seed_avgs': {str(k): round(v, 4) for k, v in r['per_seed_avgs'].items()},
1046
+ }
1047
+
1048
+ with open('trm_models_v13/summary_v13.json', 'w') as f:
1049
+ json.dump(s, f, indent=2, default=str)
1050
+ log.info("✓ Saved: summary_v13.json")
1051
+
1052
+
1053
+ if __name__ == '__main__':
1054
+ results = run_benchmark()
1055
+ shutil.make_archive("trm_v13_all", "zip", "trm_models_v13")
1056
+ log.info("✓ Created trm_v13_all.zip")
requirements.txt ADDED
@@ -0,0 +1,10 @@
1
+ torch>=2.0
2
+ pymatgen>=2024.1.1
3
+ matminer>=0.9.0
4
+ gensim>=4.0.0
5
+ scikit-learn>=1.3.0
6
+ numpy>=1.24.0
7
+ pandas>=2.0.0
8
+ tqdm>=4.65.0
9
+ huggingface_hub>=0.20.0
10
+ gradio>=4.0.0
weights/README.md ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:d2a5bec16529a25e4d500eea32ec1c9aaff2d12b3a014220f4c0303a75fffa04
3
+ size 1165
weights/expt_gap/weights.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:2f4658f262e0f3501e5716c35184fbcc86a4bc28765fbcbcc34756ce1ebf0976
3
+ size 2111183
weights/glass/weights.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:f6f173c5a305bcee0ec837e7b6a58802b9f88b3745349913721757dc7d1e2c77
3
+ size 966543
weights/is_metal/weights.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:06ed5d20f532f9193aed736f92cb94a7b181ea6b347e959dc7612100f3ff073c
3
+ size 970383
weights/jdft2d/weights.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:21d3e4c4728e18e473b4860d81bec77a2a1633540f89c35160811ec9625c4569
3
+ size 1598799
weights/phonons/weights.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:47fe2ab26addf64bfc1e78c6f6e9b02e408ee290b4f51fb91d85d5b270c51193
3
+ size 6170267
weights/steels/weights.pt ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:64ee164db899d44365bea3a67ef258d7e122144d4088357dd56af10a1c0af838
3
+ size 4574159