kaurm43 committed on
Commit 1aea056 · verified · 1 Parent(s): de7a31c

Update README.md

Files changed (1)
  1. README.md +242 -0
README.md CHANGED
@@ -4,3 +4,245 @@ sdk: gradio
4
  python_version: "3.11"
5
  app_file: PolyAgent/gradio_interface.py
6
  ---
7
+
8
+ # PolyFusionAgent: a multimodal foundation model and autonomous AI assistant for polymer property prediction and inverse design
9
+
10
+ **PolyFusionAgent** is an interactive framework that couples a **multimodal polymer foundation model (PolyFusion)** with a **tool-augmented, literature-grounded design agent (PolyAgent)** for polymer property prediction, inverse design, and evidence-linked scientific reasoning.
11
+
12
+ > **PolyFusion** aligns complementary polymer views (**PSMILES sequence**, **2D topology**, **3D structural proxies**, and **chemical fingerprints**) into a shared latent space that transfers across chemistries and data regimes.
13
+ > **PolyAgent** closes the design loop by connecting **prediction + generation + retrieval + visualization** so recommendations are contextualized with explicit supporting precedent.
14
+
15
+ ## Links
16
+ - **Live Space:** [kaurm43/PolyFusionAgent](https://huggingface.co/spaces/kaurm43/PolyFusionAgent)
17
+ - **Weights repo:** [kaurm43/polyfusionagent-weights](https://huggingface.co/kaurm43/polyfusionagent-weights)
18
+ - **Weights file browser:** [weights/tree/main](https://huggingface.co/kaurm43/polyfusionagent-weights/tree/main)
19
+
20
+ ---
21
+
22
+ ## Authors & Affiliation
23
+
24
+ **Manpreet Kaur**¹, **Qian Liu**¹*
25
+ ¹ Department of Applied Computer Science, The University of Winnipeg, Winnipeg, MB, Canada
26
+
27
+ ### Contact
28
+ - **Qian Liu**: qi.liu@uwinnipeg.ca
29
+
30
+ ---
31
+
32
+ ## Abstract
33
+
34
+ Polymers underpin technologies from energy storage to biomedicine, yet discovery remains constrained by an astronomically large design space and fragmented representations of polymer structure, properties, and prior knowledge. Although machine learning has advanced property prediction and candidate generation, most models remain disconnected from the physical and experimental context needed for actionable materials design.
35
+
36
+ Here we introduce **PolyFusionAgent**, an interactive framework that couples a multimodal polymer foundation model (**PolyFusion**) with a tool-augmented, literature-grounded design agent (**PolyAgent**). PolyFusion aligns complementary polymer views (sequence, topology, three-dimensional structural proxies, and chemical fingerprints) across millions of polymers to learn a shared latent space that transfers across chemistries and data regimes. Using this unified representation, PolyFusion improves prediction of key thermophysical properties and enables property-conditioned generation of chemically valid, structurally novel polymers that extend beyond the reference design space.
37
+
38
+ PolyAgent closes the design loop by coupling prediction and inverse design to evidence retrieval from the polymer literature, so that hypotheses are proposed, evaluated, and contextualized with explicit supporting precedent in a single workflow. Together, **PolyFusionAgent** establishes a route toward interactive, evidence-linked polymer discovery that combines large-scale representation learning, multimodal chemical knowledge, and verifiable scientific reasoning.
39
+
40
+ ---
41
+
42
+ ## Repository Structure
43
+
44
+ ```text
45
+ .
46
+ ├── PolyAgent/
47
+ │   ├── gradio_interface.py # Gradio UI (Console / Tools / Other LLMs)
48
+ │   ├── orchestrator.py # Controller: planning + tool registry + execution
49
+ │   └── rag_pipeline.py # Local KB + web retrieval + PDF ingestion utilities
50
+ ├── PolyFusion/
51
+ │   ├── CL.py # Multimodal contrastive learning utilities
52
+ │   ├── DeBERTav2.py # PSMILES encoder wrapper (HF Transformers)
53
+ │   ├── GINE.py # 2D graph encoder (PyTorch Geometric)
54
+ │   ├── SchNet.py # 3D geometry encoder (PyTorch Geometric SchNet)
55
+ │   └── Transformer.py # Fingerprint transformer encoder
56
+ ├── Downstream Tasks/
57
+ │   ├── Polymer_Generation.py # Inverse design / generation utilities
58
+ │   └── Property_Prediction.py # Property prediction utilities
59
+ ├── Data_Modalities.py # CSV→multimodal extraction (2D graph / 3D geometry / fingerprints) + wildcard handling
60
+ ├── requirements.txt
61
+ └── README.md
62
+ ```
63
+
64
+ ## What PolyFusionAgent can do
65
+
66
+ ### PolyFusion
67
+
68
+ #### 1) Multimodal extraction from PSMILES (Data_Modalities.py)
69
+ - Builds PSMILES sequence inputs for the language encoder
70
+ - Constructs RDKit-based 2D atom/bond graphs (node/edge features + connectivity)
71
+ - Generates ETKDG 3D conformer proxies with force-field relaxation fallback
72
+ - Computes Morgan (ECFP-style) fingerprints (fixed-length, radius-configurable)
73
+ - Wildcard handling: attachment points [*] are mapped to a rare marker (e.g., [At]) for stable RDKit featurization, then converted back for display/generation outputs
74
+
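The wildcard round trip described in the last bullet can be sketched in plain Python. This is a minimal illustration, not the code in `Data_Modalities.py`: the helper names `mask_wildcards` and `unmask_wildcards` are hypothetical, only the bracketed `[*]` form is handled, and `[At]` is used because the README names it as an example marker.

```python
# Hypothetical helpers illustrating the [*] <-> [At] round trip.
# The names below are assumptions for illustration, not the repository's API.
WILDCARD = "[*]"
PLACEHOLDER = "[At]"  # astatine: assumed rare enough not to occur in real polymers


def mask_wildcards(psmiles: str) -> str:
    """Replace attachment points with the placeholder atom before featurization."""
    if PLACEHOLDER in psmiles:
        # A genuine At atom would be indistinguishable from a masked wildcard.
        raise ValueError(f"input already contains {PLACEHOLDER}")
    return psmiles.replace(WILDCARD, PLACEHOLDER)


def unmask_wildcards(smiles: str) -> str:
    """Convert placeholder atoms back to attachment points for display."""
    return smiles.replace(PLACEHOLDER, WILDCARD)


psmiles = "[*]CC([*])c1ccccc1"  # polystyrene repeat unit
masked = mask_wildcards(psmiles)
restored = unmask_wildcards(masked)
print(masked)
print(restored)
```

The masked string is an ordinary SMILES string, so it can be passed to a standard parser without special handling for attachment points.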
75
+ #### 2) Multimodal foundation embedding (PolyFusion/*)
76
+ Encoders per modality:
77
+ - PSMILES Transformer (PolyFusion/DeBERTav2.py)
78
+ - GINE for 2D graphs (PolyFusion/GINE.py)
79
+ - SchNet for 3D geometry (PolyFusion/SchNet.py)
80
+ - Fingerprint Transformer (PolyFusion/Transformer.py)
81
+
82
+ - Projects each modality into a shared, unit-normalized latent space
83
+ - Uses contrastive alignment where a fused structural anchor (PSMILES + 2D + 3D) is aligned with a fingerprint target (PolyFusion/CL.py)
84
+
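To make the alignment step concrete, here is a minimal pure-Python sketch of unit normalization followed by a one-directional InfoNCE-style loss between anchor and target embeddings. The README does not state which contrastive loss `PolyFusion/CL.py` uses, so the InfoNCE form, the temperature value, and the function names are all assumptions.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def info_nce(anchors, targets, temperature=0.07):
    """Mean cross-entropy of matching anchor i to target i among all targets.

    Assumed loss form; the temperature default is illustrative only.
    """
    a = [l2_normalize(v) for v in anchors]
    t = [l2_normalize(v) for v in targets]
    total = 0.0
    for i, ai in enumerate(a):
        logits = [sum(x * y for x, y in zip(ai, tj)) / temperature for tj in t]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(a)


# Toy 2-D embeddings: matched pairs should score a lower loss than swapped pairs.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(anchors, anchors)
shuffled = info_nce(anchors, anchors[::-1])
print(aligned < shuffled)
```

In this sketch the anchors stand in for the fused structural embedding (PSMILES + 2D + 3D) and the targets for the fingerprint embedding.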
85
+ ### Downstream Tasks
86
+
87
+ #### 3) Forward property prediction (structure → properties) (Downstream Tasks/Property_Prediction.py)
88
+ - Lightweight regressors on top of PolyFusion embeddings for thermophysical property prediction
89
+ - Returns predictions in original units (with standardization handled internally)
90
+
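"Standardization handled internally" usually means the regressor is trained on z-scored targets and its outputs are mapped back to physical units. The sketch below shows that round trip under that assumption; the `TargetScaler` class and the sample values are illustrative and are not taken from `Property_Prediction.py` or any dataset.

```python
from statistics import mean, pstdev


class TargetScaler:
    """Hypothetical z-score scaler for regression targets."""

    def fit(self, values):
        self.mu = mean(values)
        self.sigma = pstdev(values) or 1.0  # guard against constant targets
        return self

    def transform(self, values):
        """Original units -> standardized targets used for training."""
        return [(v - self.mu) / self.sigma for v in values]

    def inverse_transform(self, values):
        """Standardized model outputs -> original units."""
        return [v * self.sigma + self.mu for v in values]


# Made-up glass-transition-like values in kelvin, for illustration only.
raw = [350.0, 400.0, 450.0]
scaler = TargetScaler().fit(raw)
z = scaler.transform(raw)
recovered = scaler.inverse_transform(z)
print(recovered)
```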
91
+ #### 4) Inverse design / polymer generation (targets → candidates) (Downstream Tasks/Polymer_Generation.py)
92
+ - Property-conditioned candidate generation using PolyFusion embeddings as the conditioning interface
93
+ - Supports optional seeding/biasing (e.g., start from a reference polymer family)
94
+ - Produces candidate lists suitable for generate → filter → validate workflows
95
+
96
+ ### PolyAgent
97
+
98
+ #### Goal
99
+ Convert open-ended polymer design prompts into grounded, constraint-consistent, evidence-linked outputs by coupling PolyFusion with tool-mediated verification and retrieval.
100
+
101
+ #### What PolyAgent does (system-level)
102
+ - Decomposes a user request into typed sub-tasks (prediction, generation, retrieval, visualization)
103
+ - Calls tools for prediction, inverse design, retrieval (local RAG + web), and visualization
104
+ - Returns a final response with explicit evidence/citations and an experiment-ready validation plan
105
+
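The decompose-then-dispatch pattern in the list above can be illustrated with a small tool registry. Everything here is a hypothetical sketch: the `register` decorator, the stub tools, and the hand-written plan are assumptions and do not reflect how `PolyAgent/orchestrator.py` is implemented.

```python
from typing import Callable, Dict, List, Tuple

# Maps a sub-task type to the tool that handles it (hypothetical registry).
TOOLS: Dict[str, Callable[[str], str]] = {}


def register(kind: str):
    """Decorator that adds a tool to the registry under a sub-task type."""
    def wrap(fn: Callable[[str], str]) -> Callable[[str], str]:
        TOOLS[kind] = fn
        return fn
    return wrap


@register("prediction")
def predict(psmiles: str) -> str:
    return f"predicted properties for {psmiles}"  # stub, no model call


@register("retrieval")
def retrieve(query: str) -> str:
    return f"retrieved evidence for {query}"  # stub, no database call


def run_plan(plan: List[Tuple[str, str]]) -> List[str]:
    """Execute typed sub-tasks in order, failing loudly on unknown types."""
    results = []
    for kind, argument in plan:
        if kind not in TOOLS:
            raise KeyError(f"no tool registered for sub-task type {kind!r}")
        results.append(TOOLS[kind](argument))
    return results


# A plan that a planner might emit for a single user request.
plan = [("prediction", "[*]CC[*]"), ("retrieval", "polyethylene glass transition")]
outputs = run_plan(plan)
print(outputs)
```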
106
+ #### Main files
107
+ - PolyAgent/orchestrator.py: planning + tool routing (controller)
108
+ - PolyAgent/rag_pipeline.py: local retrieval utilities (PDF → chunks → embeddings → vector store)
109
+ - PolyAgent/gradio_interface.py: Gradio UI entrypoint
110
+
111
+ ---
112
+
113
+ ## Running on Hugging Face Spaces
114
+
115
+ This repository is configured as a Gradio Space via the YAML header at the top of this README.
116
+
117
+ Entry point: app_file: PolyAgent/gradio_interface.py
118
+
119
+ Space URL: https://huggingface.co/spaces/kaurm43/PolyFusionAgent
120
+
121
+ ---
122
+
123
+ ## Model weights and artifacts
124
+
125
+ The orchestrator downloads required artifacts (tokenizers, pretrained encoders, downstream heads, inverse-design models) from a Hugging Face model repo at runtime using `snapshot_download` from `huggingface_hub`.
126
+
127
+ ### Default weights repo
128
+ By default, the Space expects:
129
+
130
+ POLYFUSION_WEIGHTS_REPO=kaurm43/polyfusionagent-weights
131
+
132
+ POLYFUSION_WEIGHTS_REPO_TYPE=model
133
+
134
+ Weights repo: kaurm43/polyfusionagent-weights
135
+
136
+ Weights files: https://huggingface.co/kaurm43/polyfusionagent-weights/tree/main
137
+
138
+ ### Override via environment variables (local or Space secrets)
139
+ POLYFUSION_WEIGHTS_REPO=your-org/your-weights-repo
140
+ POLYFUSION_WEIGHTS_REPO_TYPE=model
141
+ POLYFUSION_WEIGHTS_DIR=/path/to/cache
142
+ HF_TOKEN=... # only if private
143
+
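One way these variables could be turned into a `snapshot_download` call is sketched below. The helper names are hypothetical, and mapping `POLYFUSION_WEIGHTS_DIR` to the `local_dir` argument is an assumption; the orchestrator may instead use `cache_dir` or a different scheme.

```python
import os


def weights_download_kwargs(env=None):
    """Build keyword arguments for snapshot_download from environment variables."""
    env = os.environ if env is None else env
    kwargs = {
        "repo_id": env.get("POLYFUSION_WEIGHTS_REPO", "kaurm43/polyfusionagent-weights"),
        "repo_type": env.get("POLYFUSION_WEIGHTS_REPO_TYPE", "model"),
    }
    local_dir = env.get("POLYFUSION_WEIGHTS_DIR")
    if local_dir:
        kwargs["local_dir"] = local_dir  # assumed mapping, see the note above
    token = env.get("HF_TOKEN")
    if token:
        kwargs["token"] = token  # only needed for private repos
    return kwargs


def download_weights(env=None):
    """Download the weights snapshot and return its local path."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download
    return snapshot_download(**weights_download_kwargs(env))


defaults = weights_download_kwargs({})
overridden = weights_download_kwargs({
    "POLYFUSION_WEIGHTS_REPO": "your-org/your-weights-repo",
    "POLYFUSION_WEIGHTS_DIR": "/path/to/cache",
})
print(defaults)
print(overridden)
```

Keeping the argument construction separate from the network call makes the override logic easy to check without downloading anything.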
144
+ ### Expected weights layout (inside the weights repo)
145
+ The orchestrator expects these folders/files:
146
+
147
+ - tokenizer_spm_5m/**
148
+ - polyfusion_cl_5m/**
149
+ - downstream_heads_5m/**
150
+ - inverse_design_5m/**
151
+ - MANIFEST.txt
152
+
153
+ If you are building your own weights repo, mirror this structure.
154
+
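If you build your own weights repo, a quick layout check before launching the app can catch missing folders early. The function below is a hypothetical helper rather than part of the orchestrator; it only checks that the top-level entries listed above exist, not that their contents are valid.

```python
import tempfile
from pathlib import Path

# Top-level entries copied from the expected layout above.
EXPECTED_DIRS = (
    "tokenizer_spm_5m",
    "polyfusion_cl_5m",
    "downstream_heads_5m",
    "inverse_design_5m",
)
EXPECTED_FILES = ("MANIFEST.txt",)


def missing_weight_entries(root):
    """Return the expected folders/files that are absent under root."""
    root = Path(root)
    missing = [d + "/" for d in EXPECTED_DIRS if not (root / d).is_dir()]
    missing += [f for f in EXPECTED_FILES if not (root / f).is_file()]
    return missing


# Demo on a throwaway directory that contains only the first two folders.
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED_DIRS[:2]:
        (Path(tmp) / name).mkdir()
    report = missing_weight_entries(tmp)

print(report)
```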
155
+ ---
156
+
157
+ ## Local knowledge base (RAG)
158
+
159
+ ### Chroma DB path
160
+ The orchestrator defaults to a folder path (relative or absolute):
161
+
162
+ CHROMA_DB_PATH=chroma_polymer_db_big
163
+
164
+ ### Options
165
+ - Ship a Chroma DB folder in this repo (good for small KBs)
166
+ - Host a KB as a separate dataset/model repo and download it similarly to weights
167
+
168
+ ---
169
+
170
+ ## Local Quickstart (optional)
171
+
172
+ ### 1) Create environment
173
+ ```bash
174
+ python -m venv .venv
175
+ # Windows:
176
+ # .venv\Scripts\activate
177
+ # macOS/Linux:
178
+ source .venv/bin/activate
179
+ python -m pip install --upgrade pip
180
+ ```
181
+
182
+ ### 2) Install dependencies
183
+ ```bash
184
+ pip install -r requirements.txt
185
+ ```
186
+
187
+ ### 3) Run the Gradio app
188
+ ```bash
189
+ python PolyAgent/gradio_interface.py
190
+ ```
191
+
192
+ ---
193
+
194
+ ## Configuration (optional)
195
+
196
+ ### Common environment variables
197
+ ```bash
198
+ # Weights / cache
199
+ POLYFUSION_WEIGHTS_REPO=kaurm43/polyfusionagent-weights
200
+ POLYFUSION_WEIGHTS_REPO_TYPE=model
201
+ POLYFUSION_WEIGHTS_DIR=.cache/polyfusion_weights
202
+
203
+ # Retrieval DB path
204
+ CHROMA_DB_PATH=chroma_polymer_db_big
205
+
206
+ # If you fork this repo or run it locally and need these APIs, create your own keys and set them as
207
+ # environment variables (or add them under your own Space's Settings → Secrets).
208
+
209
+ # Hugging Face token:
210
+ # In a Hugging Face Space: Settings → Secrets → add key "HF_TOKEN" with your token value.
211
+ HF_TOKEN=hf_...
212
+
213
+ # OpenAI credentials:
214
+ # Create your own key, then set it as an environment variable or Space secret.
215
+ OPENAI_API_KEY=sk-...
216
+ OPENAI_MODEL=gpt-4.1
217
+ ```
218
+ ## Reproducibility
219
+
220
+ This repo separates **(1) representation learning**, **(2) downstream tasks**, and **(3) the interactive agent/UI** so results can be reproduced end-to-end.
221
+
222
+ ### What to run (in execution order)
223
+
224
+ - **Data extraction / featurization**
225
+ - Code: `Data_Modalities.py`
226
+
227
+ - **PolyFusion pretraining (multimodal contrastive learning)**
228
+ - Individual Encoders:
229
+ - `PolyFusion/DeBERTav2.py`
230
+ - `PolyFusion/GINE.py`
231
+ - `PolyFusion/SchNet.py`
232
+ - `PolyFusion/Transformer.py`
233
+ - Code: `PolyFusion/CL.py`
234
+
235
+ - **Downstream evaluation (property prediction)**
236
+ - Code: `Downstream Tasks/Property_Prediction.py`
237
+
238
+ - **Inverse design (property-conditioned generation)**
239
+ - Code: `Downstream Tasks/Polymer_Generation.py`
240
+
241
+ - **Agent + UI (PolyAgent + Gradio Space)**
242
+ - Retrieval utilities: `PolyAgent/rag_pipeline.py`
243
+ - Controller / tool routing: `PolyAgent/orchestrator.py`
244
+ - Entry point: `PolyAgent/gradio_interface.py`
245
+
246
+ ### Weights / artifacts (exact versions)
247
+ All pretrained checkpoints, tokenizers, and downstream heads are stored in the weights repository:
248
+ - https://huggingface.co/kaurm43/polyfusionagent-weights