---
title: Polymer Discovery Platform
sdk: streamlit
python_version: "3.10"
app_file: Home.py
---

# Polymer Discovery Platform

An integrated Streamlit platform for polymer screening and candidate discovery. The application combines property lookup, machine-learning prediction, molecular visualization, multi-objective discovery, AI-assisted query translation, novel polymer SMILES generation, and export to an automated molecular dynamics workflow.

## What The Platform Does

The platform is organized into eight modules:

- `Property Probe`: query a single polymer by SMILES or name and retrieve available database values with prediction fallback.
- `Batch Prediction`: run multi-property prediction for pasted, uploaded, or built-in polymer sets.
- `Molecular View`: render 2D and 3D molecular structures and export structure assets.
- `Discovery (Manual)`: perform explicit constraint-based and multi-objective polymer screening.
- `Discovery (AI)`: translate natural-language design requests into structured discovery settings with bring-your-own-key LLM support.
- `Novel SMILES Generation`: sample new polymer candidates with the pretrained RNN and filter against local datasets.
- `Literature Search`: search papers, stage evidence records, and review structured material-property extraction before promotion.
- `Feedback`: submit issue reports and feature requests through a webhook-backed form.

## Core Capabilities

- Multi-source property lookup from `EXP`, `MD`, `DFT`, `GC`, and `POLYINFO`
- Property prediction across 28 polymer properties
- Large-scale screening over real and virtual candidate libraries
- Exact Pareto ranking with trust and diversity-aware selection
- AI-assisted prompt-to-spec generation for discovery workflows
- Novelty-filtered polymer SMILES generation
- Material-aware literature retrieval, evidence staging, and reviewer workflow
- ADEPT handoff for downstream molecular dynamics workflow packaging
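
To illustrate the screening step, the exact Pareto ranking listed above can be sketched as a non-dominance check over candidate property vectors. This is a simplified, hypothetical sketch, not the platform's actual implementation; the function name and the maximize-all-objectives convention are assumptions:

```python
def pareto_front(points):
    """Return indices of points that no other point dominates.

    Each point is a tuple of objective values; all objectives are
    assumed to be maximized. Point q dominates p when q is at least
    as good in every objective and strictly better in at least one.
    """
    front = []
    for i, p in enumerate(points):
        dominated = any(
            all(q[k] >= p[k] for k in range(len(p)))
            and any(q[k] > p[k] for k in range(len(p)))
            for j, q in enumerate(points)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front

# Candidate property vectors, e.g. (thermal stability score, modulus score)
candidates = [(1.0, 2.0), (2.0, 1.0), (0.5, 0.5), (2.0, 2.0)]
print(pareto_front(candidates))  # -> [3]
```

The O(nΒ²) pairwise check shown here is exact but slow; large candidate libraries typically use sorting-based front extraction instead.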

## Repository Layout

```text
.
β”œβ”€β”€ Home.py                      # Main Streamlit homepage
β”œβ”€β”€ app.py                       # Compatibility entrypoint
β”œβ”€β”€ pages/                       # User-facing application modules
β”œβ”€β”€ src/                         # Prediction, discovery, lookup, and UI logic
β”œβ”€β”€ literature/                  # Literature-mining pipeline components
β”œβ”€β”€ scripts/                     # Utility and workflow scripts
β”œβ”€β”€ data/                        # Lookup tables, discovery datasets, ADEPT files
β”œβ”€β”€ models/                      # Trained prediction and generation assets
β”œβ”€β”€ RNN/                         # Generator training/inference code
└── icons/                       # Application icons and branding assets
```

## Data And Model Assets

This repository expects pretrained models and local data tables to be present. The application uses:

- source datasets such as `EXP.csv`, `MD.csv`, `DFT.csv`, `GC.csv`, `POLYINFO.csv`, and `PI1M.csv`
- derived property tables such as `POLYINFO_PROPERTY.parquet` and `PI1M_PROPERTY.parquet`
- trained checkpoint files under `models/`
- pretrained RNN assets under `RNN/pretrained_model/` and `models/rnn/pretrained_model/`

If you clone only the code without the large assets, several app modules will not run correctly.

## Local Development

Use Python `3.10`.

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
streamlit run Home.py
```

Open `http://localhost:8501`.

## Literature Dependencies

The production app's `requirements.txt` already includes the literature workflow's dependencies.
If you are working on the literature pipeline in isolation, you can instead install only its dependencies:

```bash
pip install -r requirements-literature.txt
```

## Environment Configuration

Create a local `.env` file if needed. The template is provided in `.env.example`.

Key variables used by the platform include:

### LLM / Discovery AI

- `CRC_OPENWEBUI_API_KEY`
- `OPENWEBUI_API_KEY`
- `OPENAI_API_KEY`
- `CRC_OPENWEBUI_BASE_URL`
- `OPENWEBUI_BASE_URL`
- `CRC_OPENWEBUI_MODEL`
- `OPENWEBUI_MODEL`
- `OPENAI_MODEL`

The Discovery AI page also supports direct bring-your-own-key usage against supported providers from the UI.

### Literature Pipeline

- `PUBMED_EMAIL`
- `PUBMED_API_KEY`
- `SEMANTIC_SCHOLAR_API_KEY`
- `PAGEINDEX_API_KEY`
- `LITERATURE_MODEL_OPTIONS`

### Feedback / Analytics

- `FEEDBACK_WEBHOOK_URL`
- `FEEDBACK_WEBHOOK_TOKEN`
- `APP_DEPLOYMENT_SOURCE`
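
As an illustration, a minimal `.env` might look like the following. All values are placeholders; `.env.example` remains the authoritative template:

```bash
# LLM / Discovery AI
OPENAI_API_KEY=your-key-here
OPENWEBUI_BASE_URL=https://your-openwebui-host/api

# Literature pipeline
PUBMED_EMAIL=you@example.org

# Feedback / analytics
FEEDBACK_WEBHOOK_URL=https://your-webhook-endpoint.example.com
APP_DEPLOYMENT_SOURCE=local
```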

## Running With Docker

```bash
docker build -t polymer-discovery .
docker run --rm -p 8501:8501 polymer-discovery
```

The container launches:

```bash
streamlit run Home.py --server.port=8501 --server.address=0.0.0.0 --server.headless=true
```

## Notes For Deployment

- The app is designed to run as a Streamlit web application.
- Heavy modules depend on local datasets and pretrained checkpoints being available at the expected paths.
- The AI-assisted discovery page requires a valid API key when using in-app LLM generation.
- The feedback page requires a configured webhook to receive submissions.

## Citation And Use

If you use this platform in research or build on top of it, cite the associated paper once published. Until then, reference the repository and the MONSTER Lab platform description.