# PolyFusionAgent: a multimodal foundation model and an autonomous AI assistant for polymer informatics

**PolyFusionAgent** is an interactive framework that couples a **multimodal polymer foundation model (PolyFusion)** with a **tool-augmented, literature-grounded design agent (PolyAgent)** for polymer property prediction, inverse design, and evidence-linked scientific reasoning.

> **PolyFusion** aligns complementary polymer views—**PSMILES sequence**, **2D topology**, **3D structural proxies**, and **chemical fingerprints**—into a shared latent space that transfers across chemistries and data regimes.
>
> **PolyAgent** closes the design loop by connecting **prediction + generation + retrieval + visualization** so recommendations are contextualized with explicit supporting precedent.

---
## Authors & Affiliation

**Manpreet Kaur**¹, **Qian Liu**¹\*

¹ Department of Applied Computer Science, The University of Winnipeg, Winnipeg, MB, Canada

### Contact

- **Qian Liu** — qi.liu@uwinnipeg.ca

---
## Abstract

Polymers underpin technologies from energy storage to biomedicine, yet discovery remains constrained by an astronomically large design space and fragmented representations of polymer structure, properties, and prior knowledge. Although machine learning has advanced property prediction and candidate generation, most models remain disconnected from the physical and experimental context needed for actionable materials design.

Here we introduce **PolyFusionAgent**, an interactive framework that couples a multimodal polymer foundation model (**PolyFusion**) with a tool-augmented, literature-grounded design agent (**PolyAgent**). PolyFusion aligns complementary polymer views—sequence, topology, three-dimensional structural proxies, and chemical fingerprints—across millions of polymers to learn a shared latent space that transfers across chemistries and data regimes. Using this unified representation, PolyFusion improves prediction of key thermophysical properties and enables property-conditioned generation of chemically valid, structurally novel polymers that extend beyond the reference design space.

PolyAgent closes the design loop by coupling prediction and inverse design to evidence retrieval from the polymer literature, so that hypotheses are proposed, evaluated, and contextualized with explicit supporting precedent in a single workflow. Together, **PolyFusionAgent** establishes a route toward interactive, evidence-linked polymer discovery that combines large-scale representation learning, multimodal chemical knowledge, and verifiable scientific reasoning.

---
<p align="center">
  <img src="assets/PolyFusionAgent_overview.png" alt="PolyFusionAgent overview" width="850"/>
</p>

## Contents

- [1. Repository Overview](#1-repository-overview)
- [2. Dependencies & Environment](#2-dependencies--environment)
  - [2.1 Installation](#21-installation)
  - [2.2 Optional Chemistry & GPU Notes](#22-optional-chemistry--gpu-notes)
- [3. Data, Modalities, and Preprocessing](#3-data-modalities-and-preprocessing)
  - [3.1 Input CSV schema](#31-input-csv-schema)
  - [3.2 Generate multimodal columns (graph/geometry/fingerprints)](#32-generate-multimodal-columns-graphgeometryfingerprints)
  - [3.3 What “graph”, “geometry”, and “fingerprints” look like](#33-what-graph-geometry-and-fingerprints-look-like)
- [4. Models & Artifacts](#4-models--artifacts)
- [5. Running the Code](#5-running-the-code)
  - [5.1 Multimodal contrastive pretraining (PolyFusion)](#51-multimodal-contrastive-pretraining-polyfusion)
  - [5.2 Downstream property prediction](#52-downstream-property-prediction)
  - [5.3 Inverse design / polymer generation](#53-inverse-design--polymer-generation)
  - [5.4 PolyAgent (Gradio UI)](#54-polyagent-gradio-ui)
- [6. Results & Reproducibility](#6-results--reproducibility)
- [7. Citation](#7-citation)
- [8. Contact](#8-contact)
- [9. License & Disclaimer](#9-license--disclaimer)

---
## 1. Repository Overview

This repository contains three major components:

### **(A) PolyFusion** — multimodal polymer foundation model

PolyFusion learns a shared embedding space by aligning polymer modalities with **multimodal contrastive learning**:

- **PSMILES encoder**: DeBERTaV2-style sequence encoder (`PolyFusion/DeBERTav2.py`)
- **2D graph encoder**: GINE (Graph Isomorphism Network with edge features) (`PolyFusion/GINE.py`)
- **3D proxy encoder**: SchNet (`PolyFusion/SchNet.py`)
- **Fingerprint encoder**: Transformer encoder for Morgan bits (`PolyFusion/Transformer.py`)
- **Pretraining script**: `PolyFusion/CL.py`

### **(B) Downstream Tasks** — prediction + inverse design

- **Property prediction** (multi-property evaluation with per-property CV): `Downstream Tasks/Property_Prediction.py`
- **Inverse design / generation** (property-conditioned generation using SELFIES-TED decoding + latent guidance): `Downstream Tasks/Polymer_Generation.py`

### **(C) PolyAgent** — tool-augmented design assistant

A modular orchestrator that can:

- extract multimodal polymer data
- encode PolyFusion embeddings
- predict properties using the best downstream heads
- generate candidates via an inverse-design generator
- retrieve literature via local RAG + web search
- visualize polymer renderings and explainability maps
- compose a grounded, citation-linked final response

Files:

- `PolyAgent/orchestrator.py`
- `PolyAgent/rag_pipeline.py`
- `PolyAgent/gradio_interface.py`

---
## 2. Dependencies & Environment

### 2.1 Installation

```bash
git clone https://github.com/manpreet88/PolyFusionAgent.git
cd PolyFusionAgent

# Recommended: create a fresh environment (conda or venv), then:
pip install -r requirements.txt
```

### 2.2 Optional Chemistry & GPU Notes

**RDKit (recommended).** `Data_Modalities.py` and many of the optional visualization/validation steps in the generation and agent workflows work best with RDKit installed:

```bash
conda install -c conda-forge rdkit
```

**GPU (recommended for training & large runs).** Your PyTorch + CUDA versions should match your GPU driver. If you use `torch-geometric`, install it following the official wheels for your CUDA/PyTorch build.

---

## 3. Data, Modalities, and Preprocessing

### 3.1 Input CSV schema

At minimum, your dataset CSV should include a polymer string column:

- `psmiles` (required): polymer SMILES / PSMILES string (often contains `[*]` endpoints)

Optional columns:

- `source`: any identifier/source tag
- property columns: e.g., density, Tg, Tm, Td, etc. (names vary—see the downstream scripts’ column matching)

Example:

```
psmiles,source,density,glass transition,melting,thermal decomposition
[*]CC(=O)OCCO[*],PI1M,1.21,55,155,350
...
```

**Wildcard handling:** this code replaces `*` (atomic number 0) with astatine (`[At]`, Z = 85) internally for RDKit robustness, while preserving endpoint semantics.
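The substitution above can be sketched in a few lines. This is only an illustration of the idea; the helper names below are hypothetical, not the repository's actual API:

```python
# Minimal sketch of the [*] <-> [At] wildcard substitution described above.
# The real logic lives in the repository's preprocessing code; the helper
# names here are illustrative assumptions.

def to_rdkit_safe(psmiles: str) -> str:
    """Replace polymer endpoint wildcards [*] with [At] so RDKit parses them."""
    return psmiles.replace("[*]", "[At]")

def from_rdkit_safe(smiles: str) -> str:
    """Restore [At] placeholders back to polymer endpoints [*]."""
    return smiles.replace("[At]", "[*]")

safe = to_rdkit_safe("[*]CC(=O)OCCO[*]")
print(safe)                   # [At]CC(=O)OCCO[At]
print(from_rdkit_safe(safe))  # [*]CC(=O)OCCO[*]
```

The round trip is lossless because `[At]` does not otherwise occur in typical polymer PSMILES strings.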

### 3.2 Generate multimodal columns (graph/geometry/fingerprints)

Use `Data_Modalities.py` to process a CSV and append JSON blobs for:

- `graph`
- `geometry`
- `fingerprints`

```bash
python Data_Modalities.py \
  --csv_file /path/to/your/polymers.csv \
  --chunk_size 1000 \
  --num_workers 24
```

Outputs:

- `/path/to/your/polymers_processed.csv` (same rows + new modality columns)
- `/path/to/your/polymers_failures.jsonl` (failures with index/smiles/error)
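To triage the failure log, a stdlib-only sketch like the following can summarize error counts. The `index`/`smiles`/`error` field names follow the description above; anything beyond that is an assumption:

```python
# Sketch: summarize the JSONL failure log written by Data_Modalities.py.
# Assumes one JSON object per line with an "error" field, per the docs above.
import json
from collections import Counter

def summarize_failures(jsonl_text: str) -> Counter:
    """Count failure records per error message."""
    errors = Counter()
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue
        rec = json.loads(line)
        errors[rec.get("error", "unknown")] += 1
    return errors

sample = '{"index": 7, "smiles": "[*]CC[*]", "error": "embed_failed"}\n'
print(summarize_failures(sample))  # Counter({'embed_failed': 1})
```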

### 3.3 What “graph”, “geometry”, and “fingerprints” look like

Each processed row stores modalities as JSON strings.

`graph` contains:

- `node_features`: atomic_num, degree, formal_charge, hybridization, aromatic/ring flags, chirality, etc.
- `edge_indices` + `edge_features` (bond_type, stereo, conjugation, etc.)
- `adjacency_matrix`
- `graph_features` (MolWt, LogP, TPSA, rings, rotatable bonds, HBA/HBD, ...)

`geometry` contains:

- ETKDG-generated conformers, optimized via MMFF/UFF (best energy chosen)
- `best_conformer`: atomic_numbers + coordinates + energy + optional 3D descriptors
- fallback to 2D coordinates if 3D embedding fails

`fingerprints` contains:

- Morgan fingerprints (bitstrings + counts) for radii up to 3 (default)
- e.g., `morgan_r3_bits`, `morgan_r3_counts`, plus smaller radii
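Because modalities are stored as JSON strings, downstream consumers can decode them with the standard library alone. A minimal sketch for recovering the set bits of a stored fingerprint, assuming `morgan_r3_bits` is serialized as a plain "0101..." string (an assumption about the serialization, not a documented guarantee):

```python
# Sketch: decode a stored Morgan bitstring into on-bit indices.
# Assumes "0101..."-style serialization of morgan_r3_bits (an assumption).
import json

def on_bits(bitstring: str) -> list[int]:
    """Return the indices of set bits in a fingerprint bitstring."""
    return [i for i, b in enumerate(bitstring) if b == "1"]

row_fp = json.loads('{"morgan_r3_bits": "01001000"}')
print(on_bits(row_fp["morgan_r3_bits"]))  # [1, 4]
```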

---

## 4. Models & Artifacts

This repo is organized so you can train and export artifacts for:

**PolyFusion (pretraining)**

- multimodal CL checkpoint bundle (e.g., `multimodal_output/best/...`)
- unimodal encoder checkpoints (optional, used by some scripts)

**Downstream (best weights per property)**

- saved best checkpoint per property (CV selection)
- directory example: `multimodal_downstream_bestweights/...`

**Inverse design generator artifacts**

- decoder bundles + scalers + (optionally) SentencePiece tokenizer assets
- directory example: `multimodal_inverse_design_output/.../best_models`

> **Important:** several scripts include placeholder paths at the top (e.g., `/path/to/...`). You must update them for your filesystem.

---

## 5. Running the Code

### 5.1 Multimodal contrastive pretraining (PolyFusion)

Main entry: `PolyFusion/CL.py`

What it does (high-level):

- Streams a large CSV (`CSV_PATH`) and writes per-sample `.pt` files to avoid RAM spikes.
- Encodes polymer modalities with DeBERTaV2 (PSMILES), GINE (2D), SchNet (3D), and a Transformer (fingerprints).
- Projects each modality embedding into a shared space.
- Trains with contrastive alignment (InfoNCE) + optional reconstruction objectives.

**Steps**

1. Edit path placeholders in `PolyFusion/CL.py`, e.g.:
   - `CSV_PATH`
   - `SPM_MODEL`
   - `PREPROC_DIR`
   - `OUTPUT_DIR` and `BEST_*_DIR` locations (if used)
2. Run:

   ```bash
   python PolyFusion/CL.py
   ```

> **Tip:** start with a smaller `TARGET_ROWS` (e.g., 100k) to validate pipeline correctness before scaling.
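For intuition about the contrastive alignment objective, here is a dependency-free toy version of InfoNCE over two aligned modality batches. The real `CL.py` training loop operates on batched tensors and may differ in details such as symmetrization and temperature:

```python
# Toy InfoNCE sketch for two aligned modalities (e.g., PSMILES vs. graph
# embeddings). Illustrates the loss shape only, not the actual CL.py code.
import math

def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def info_nce(anchors, positives, temperature=0.1):
    """Mean cross-entropy of matching anchor i to positive i against
    all positives in the batch (in-batch negatives)."""
    losses = []
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        log_denom = math.log(sum(math.exp(l) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

a = [[1.0, 0.0], [0.0, 1.0]]  # modality-A embeddings
b = [[0.9, 0.1], [0.1, 0.9]]  # modality-B embeddings, aligned pairwise
print(info_nce(a, b) < info_nce(a, b[::-1]))  # aligned pairs give lower loss: True
```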

### 5.2 Downstream property prediction

Script: `Downstream Tasks/Property_Prediction.py`

This script:

- loads your dataset CSV with modalities (e.g., `polyinfo_with_modalities.csv`)
- loads the pretrained encoders / CL fused backbone
- trains a fusion + regression head for each requested property
- evaluates using true K-fold cross-validation (`NUM_RUNS = 5`) and saves the best weights

**Steps**

1. Update placeholders near the top of the script:
   - `POLYINFO_PATH`
   - `PRETRAINED_MULTIMODAL_DIR`
   - optional: `BEST_*_DIR` (if needed)
   - output paths: `OUTPUT_RESULTS`, `BEST_WEIGHTS_DIR`
2. Run:

   ```bash
   python "Downstream Tasks/Property_Prediction.py"
   ```

**Requested properties (default)**

```python
REQUESTED_PROPERTIES = [
    "density",
    "glass transition",
    "melting",
    "specific volume",
    "thermal decomposition",
]
```

The script includes a robust column-matching function that tries to map these names to your dataframe’s actual column headers.
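The script defines its own matching logic; purely as an illustration of the idea, a fuzzy matcher over normalized headers could look like this (the function name and cutoff value are assumptions, not the script's actual implementation):

```python
# Illustrative fuzzy column matcher, NOT the repository's actual code.
import difflib

def match_column(requested: str, columns: list[str], cutoff: float = 0.6):
    """Map a requested property name to the closest CSV header, or None."""
    norm = {c.strip().lower(): c for c in columns}
    hits = difflib.get_close_matches(requested.lower(), list(norm), n=1, cutoff=cutoff)
    return norm[hits[0]] if hits else None

cols = ["psmiles", "source", "Density", "Glass Transition (K)"]
print(match_column("density", cols))           # Density
print(match_column("glass transition", cols))  # Glass Transition (K)
```

Normalizing case and whitespace before comparing makes the mapping tolerant of header variants like `Density` vs. `density`.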

### 5.3 Inverse design / polymer generation

Script: `Downstream Tasks/Polymer_Generation.py`

Core idea:

- condition a SELFIES-TED-style decoder on PolyFusion embeddings
- guide sampling toward target property values (with optional latent noise and verification)

**Steps**

1. Update placeholders in the `Config` dataclass:
   - `POLYINFO_PATH`
   - pretrained weights directories (CL + downstream + tokenizer)
   - output directory `OUTPUT_DIR`
2. Run:

   ```bash
   python "Downstream Tasks/Polymer_Generation.py"
   ```

**Notes**

If RDKit and SELFIES are installed, the script can:

- validate chemistry constraints more robustly
- convert polymer endpoints safely (e.g., `[*]` ↔ `[At]` internal representation)

### 5.4 PolyAgent (Gradio UI)

Files:

- `PolyAgent/orchestrator.py` (core engine)
- `PolyAgent/gradio_interface.py` (UI)
- `PolyAgent/rag_pipeline.py` (local RAG utilities)

**What you configure**

In `PolyAgent/orchestrator.py`, update the `PathsConfig` placeholders, e.g.:

- `cl_weights_path`
- `downstream_bestweights_5m_dir`
- `inverse_design_5m_dir`
- `spm_model_path`, `spm_vocab_path`
- `chroma_db_path` (if using a local RAG store)

**Environment variables**

Required:

- `OPENAI_API_KEY` (required for planning/composition)

Optional (improves retrieval coverage):

- `OPENAI_MODEL` (defaults set in config)
- `HF_TOKEN` (if pulling HF artifacts)
- `SPRINGER_NATURE_API_KEY`, `SEMANTIC_SCHOLAR_API_KEY`
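Before launching the UI, it can help to verify that the variables above are set. This preflight check is a convenience sketch, not part of the repository; the variable names mirror the lists above:

```python
# Optional preflight check before launching the UI (convenience sketch,
# not part of the repository). Variable names mirror the README lists.
import os

REQUIRED = ["OPENAI_API_KEY"]
OPTIONAL = ["OPENAI_MODEL", "HF_TOKEN",
            "SPRINGER_NATURE_API_KEY", "SEMANTIC_SCHOLAR_API_KEY"]

def preflight(env=os.environ):
    """Return (missing_required, missing_optional) env var names."""
    missing_req = [k for k in REQUIRED if not env.get(k)]
    missing_opt = [k for k in OPTIONAL if not env.get(k)]
    return missing_req, missing_opt

req, opt = preflight({"OPENAI_API_KEY": "sk-..."})
print(req)  # [] -> safe to launch
```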

**Run the UI**

```bash
cd PolyAgent
python gradio_interface.py --server-name 0.0.0.0 --server-port 7860
```

**Prompting tips**

- To trigger inverse design, include “generate” / “inverse design” and a target value, e.g. `target_value=60` or `Tg 60`.
- Provide a seed polymer pSMILES in a code block:

  ```
  [*]CC(=O)OCCOCCOC(=O)C[*]
  ```

- If you need more citations, ask explicitly: “cite 10 papers”.

---

## 6. Results & Reproducibility

- PolyFusion is designed for scalable multimodal alignment across large polymer corpora.
- Downstream scripts perform K-fold evaluation per property and save the best weights.
- PolyAgent produces evidence-linked answers with tool outputs and DOI-style links (when available).

> **Reproducibility reminder:** several scripts currently use in-file configuration constants (placeholders). For a clean workflow, keep a consistent folder layout for datasets and checkpoints and update paths in one place (or refactor into a shared config module).

---

## 7. Citation

If you use this repository in your work, please cite the accompanying manuscript:

```bibtex
@article{kaur2026polyfusionagent,
  title  = {PolyFusionAgent: a multimodal foundation model and autonomous AI assistant for polymer informatics},
  author = {Kaur, Manpreet and Liu, Qian},
  year   = {2026},
  note   = {Manuscript / preprint},
}
```

Replace the BibTeX entry above with the final venue DOI/citation when available.

---

## 8. Contact

- Corresponding author: Qian Liu — qi.liu@uwinnipeg.ca
- Contributing author: Manpreet Kaur — kaur-m43@webmail.uwinnipeg.ca

---

## 9. License & Disclaimer

**License:** (Add your license file here; e.g., MIT / Apache-2.0 / CC BY-NC for models)

**Disclaimer:** this codebase is provided for research and development use. Polymer generation outputs and suggested candidates should be validated with domain expertise, safety constraints, and experimental verification before deployment.