---
title: CpptrajAI
emoji: 🧬
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 8502
pinned: false
license: mit
---

# CpptrajAI

An AI-powered IDE for molecular dynamics (MD) trajectory analysis using **cpptraj** and large language models with Retrieval-Augmented Generation (RAG).

> **Type a prompt like "Calculate RMSD of the protein backbone" and CpptrajAI writes the cpptraj script, runs it, and reports the results.**

---

## Table of Contents

- [Features](#features)
- [Quick Start](#quick-start)
- [AI Backend Setup](#ai-backend-setup)
- [Uploading Files](#uploading-files)
- [Using the AI Agent](#using-the-ai-agent)
- [Script Editor](#script-editor)
- [Python Editor](#python-editor)
- [Results & Plots](#results--plots)
- [3D Viewer](#3d-viewer)
- [Supported Analyses](#supported-analyses)
- [Supported File Formats](#supported-file-formats)
- [Architecture](#architecture)
- [Agent Execution](#agent-execution)
- [Docker](#docker)

---

## Features

| Feature | Description |
|---------|-------------|
| **AI Agent** | Natural-language prompt → cpptraj script → execution → result interpretation |
| **RAG over cpptraj manual** | On-demand TF-IDF retrieval from the cached cpptraj manual; the AI searches documentation only when it needs exact syntax |
| **Multi-provider AI** | Claude (Anthropic), GPT-4o (OpenAI), Gemini (Google), or any Ollama local model |
| **Local model support** | Run any Ollama model (qwen3, llama3, deepseek, etc.) on your own hardware, no API key needed |
| **Script Editor** | Write/edit cpptraj scripts manually with one-click execution |
| **Python Editor** | Post-process output files with Python/pandas/matplotlib inline |
| **Interactive Plots** | Plotly charts auto-generated from output `.dat` files |
| **3D Viewer** | Visualize topology and trajectory frames with 3Dmol.js |
| **Command Reference** | Searchable left-panel listing all cpptraj commands with syntax |
| **Multi-user** | Fully session-isolated β€” multiple users can run simultaneously |
| **Reset All** | One-click session reset to start fresh |

---

## Quick Start

### 1. Clone the repository

```bash
git clone https://github.com/nagarh/CpptrajAI.git
cd CpptrajAI
```

### 2. Install Python dependencies

```bash
pip install -r requirements.txt
```

### 3. Install cpptraj

cpptraj must be installed and available on your PATH.

**Via conda (recommended):**
```bash
conda install -c conda-forge ambertools
```

> `ambertools` includes cpptraj. Requires Python 3.11.

**From source:**
```bash
git clone https://github.com/Amber-MD/cpptraj.git
cd cpptraj && ./configure gnu && make -j4 install
```

**Custom path:** If cpptraj is not on your PATH, set the environment variable:
```bash
export CPPTRAJ_PATH=/path/to/cpptraj
```
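
The lookup order described above (an explicit `CPPTRAJ_PATH` first, then the regular `PATH`) can be sketched in Python as follows. This is a minimal illustration, not the code in `runner.py`; the function name `resolve_cpptraj` and the precedence rule are assumptions.

```python
import os
import shutil


def resolve_cpptraj(env=None):
    """Return the path to the cpptraj binary, or None if it cannot be found.

    Hypothetical helper: assumes CPPTRAJ_PATH takes priority over PATH.
    """
    env = os.environ if env is None else env
    # An explicit CPPTRAJ_PATH wins over whatever is on PATH.
    explicit = env.get("CPPTRAJ_PATH")
    if explicit:
        return explicit
    # Otherwise fall back to a normal PATH lookup.
    return shutil.which("cpptraj", path=env.get("PATH"))
```

Calling `resolve_cpptraj()` with no arguments reads the real process environment; passing a dictionary makes the behaviour easy to check in isolation.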

### 4. Start the server

```bash
python server.py
```

Open your browser at **http://localhost:8502**

---

## AI Backend Setup

CpptrajAI supports cloud AI providers and local models via Ollama.

### Cloud Providers

| Provider | Models | Where to get key |
|----------|--------|-----------------|
| **Anthropic (Claude)** | Haiku 4.5, Sonnet 4.6, Opus 4.6 | [console.anthropic.com](https://console.anthropic.com) |
| **OpenAI** | GPT-4o, GPT-4o Mini | [platform.openai.com](https://platform.openai.com) |
| **Google (Gemini)** | Gemini 2.5 Flash | [aistudio.google.com](https://aistudio.google.com) |

### Local Models via Ollama (Free, No API Key)

Run any model locally using [Ollama](https://ollama.com):

```bash
# Install Ollama, then pull a model
ollama pull qwen3:14b

# Start Ollama server
ollama serve
```

In CpptrajAI Settings:
- Provider → **Ollama**
- Base URL → `http://localhost:11434/v1`
- Model → `qwen3:14b` (or any model you pulled)

> Recommended local models: `qwen3:14b`, `qwen3:32b`, `qwen3:30b-a3b` (MoE). These have strong tool-calling support essential for the agentic workflow.

### Model Recommendations

| Model | Best for | Notes |
|-------|----------|-------|
| **Claude Sonnet 4.6** | Complex multi-step analyses: PCA, DCCM, 2D PMF, free energy landscapes | Most reliable for chained tool calls and multi-script workflows. Recommended for production use. |
| **GPT-4o** | Moderate complexity: RMSD, RMSF, Rg, clustering, hydrogen bonds | Reliable and accurate. Watch rate limits (TPM) on long sessions. |
| **Gemini 2.5 Flash** | Light to moderate analyses | Fast and cost-effective for routine tasks. |
| **Qwen3:14b / 32b (Ollama)** | Simple to moderate analyses: RMSD, Rg, strip/image, distance | Free and runs locally. Handles common analyses well but can hallucinate on complex multi-step workflows. Use `qwen3:32b` for best local results. |

> **Recommendation:** Use Claude Sonnet 4.6 for anything involving PCA, correlation matrices, or free energy. Use Qwen3 locally for quick exploratory analyses.

**How to configure any provider:**
1. Click **⚙ Settings** (top-right of the IDE)
2. Select your provider
3. Paste your API key (not needed for Ollama)
4. Choose a model
5. Click **Save**

> **Privacy:** API keys are stored only in your browser session and are never written to disk or logged.

---

## Uploading Files

Before running any analysis, upload your MD files using the **right panel**:

1. **Topology file**: drag and drop or click to upload (`.prmtop`, `.parm7`, `.psf`, `.gro`, `.mol2`)
2. **Trajectory file(s)**: upload one or more trajectory files (`.nc`, `.ncdf`, `.dcd`, `.xtc`, `.trr`, `.crd`)

Once uploaded, the IDE displays:
- Topology filename
- Total atoms, residues
- Protein residues, ligand residues (auto-detected)
- Trajectory file(s) loaded

> **Test data:** Click **Load Test Data** to load the built-in sample topology and trajectory to try the app without your own files.

### File type detection

- `.prmtop`, `.parm7`, `.psf`, `.gro`, `.mol2` → always topology
- `.nc`, `.ncdf`, `.dcd`, `.xtc`, `.trr`, `.crd`, `.mdcrd` → always trajectory
- `.pdb` → auto-detected:
  - If a proper topology (`.prmtop` etc.) is already loaded → treated as trajectory
  - Otherwise → scanned for multi-MODEL records to determine if trajectory or single structure
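
The detection rules above can be expressed as a small classifier. The sketch below is illustrative only: the function name `classify_upload`, its parameters, and the `"structure"` label for a single-model PDB are assumptions, and counting lines that start with `MODEL` is one plausible way to scan for multi-MODEL records.

```python
from pathlib import Path

TOPOLOGY_EXTS = {".prmtop", ".parm7", ".psf", ".gro", ".mol2"}
TRAJECTORY_EXTS = {".nc", ".ncdf", ".dcd", ".xtc", ".trr", ".crd", ".mdcrd"}


def classify_upload(filename, topology_loaded=False, pdb_text=""):
    """Return 'topology', 'trajectory', 'structure', or 'unknown'."""
    ext = Path(filename).suffix.lower()
    if ext in TOPOLOGY_EXTS:
        return "topology"
    if ext in TRAJECTORY_EXTS:
        return "trajectory"
    if ext == ".pdb":
        # With a real topology already loaded, a PDB is treated as coordinates.
        if topology_loaded:
            return "trajectory"
        # Otherwise count MODEL records: more than one means multiple frames.
        models = sum(1 for line in pdb_text.splitlines() if line.startswith("MODEL"))
        return "trajectory" if models > 1 else "structure"
    return "unknown"
```

For example, `classify_upload("frames.pdb", topology_loaded=True)` returns `"trajectory"`, while the same file with no topology loaded and a single model is reported as a single structure.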

---

## Using the AI Agent

The AI Chat tab is the primary interface. Type your analysis request in plain English.

### Example prompts

```
Calculate RMSD of protein backbone over all frames
```
```
Plot radius of gyration of the ligand
```
```
Calculate the dynamic cross-correlation matrix of the Cα atoms and plot it as a heatmap
```
```
Strip water molecules and save a new trajectory
```
```
Calculate the radius of gyration of the protein and plot a 2D free energy landscape (PMF) as a function of RMSD vs Rg
```

### How it works

1. Your prompt is enriched with file context (topology name, atom/residue counts, ligand info)
2. The AI calls `search_cpptraj_docs` when it needs exact command syntax from the manual
3. The AI writes a cpptraj script using verified commands and syntax
4. The script is executed automatically
5. Output files are read back and the AI summarizes key results
6. Plots are generated automatically for `.dat` output files

### Stop a running analysis

Click the **Stop** button (appears while the AI is thinking/running) to cancel mid-stream.

### Conversation history

The AI maintains conversation history within your session, so you can ask follow-up questions:
```
Now do the same analysis but only for residues 50-150
```
```
Can you also calculate the dihedral angles for these residues?
```

---

## Script Editor

The **Script** tab lets you write cpptraj scripts manually.

- Use the **Command Reference** (left panel) to look up syntax; click any command to insert it
- Scripts are pre-filled with `parm` and `trajin` lines pointing to your uploaded files
- Click **Run Script** to execute
- The `go` command is appended automatically if missing
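
The automatic `go` handling can be pictured as a small pre-processing step before the script is handed to cpptraj. This is a hedged sketch: `ensure_go` is a hypothetical name, and treating "missing" as "the last non-comment line is not `go`" is an assumption about the actual rule.

```python
def ensure_go(script):
    """Append a final 'go' line unless the last command already is 'go'."""
    lines = [ln.strip() for ln in script.splitlines() if ln.strip()]
    # Ignore comment lines when looking for the last command.
    commands = [ln for ln in lines if not ln.startswith("#")]
    if commands and commands[-1].lower() == "go":
        return script
    return script.rstrip("\n") + "\ngo\n"
```

A script that already ends in `go` is returned unchanged, so running the helper twice has no further effect.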

### Example script

```
parm protein.prmtop
trajin mdin_prod.nc
strip :WAT
autoimage
rmsd backbone :1-200@CA,C,N,O first out rmsd_backbone.dat
radgyr :203 out ligand_rg.dat mass
go
```

---

## Python Editor

The **Python** tab provides an inline Python environment for post-processing output files.

- Output files from cpptraj are available in the working directory
- Use `pandas`, `numpy`, `matplotlib`, `scipy`, `scikit-learn` to process and plot results
- Results and plots appear in the output panel

### Example

```python
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("rmsd_backbone.dat", sep=r"\s+", comment="#",
                 names=["frame", "rmsd"])
print(df.describe())
print(f"Mean RMSD: {df['rmsd'].mean():.3f} Å")
```

---

## Results & Plots

After each analysis run, output files appear in the **right panel**:

- `.dat` files → automatically plotted as interactive Plotly line charts
- Multiple datasets in a single file → plotted as multi-line chart
- Click any file name to view its raw content
- Click **Download** to save a file locally
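
To show how a whitespace-delimited `.dat` file maps onto one or more plotted lines, here is a minimal standard-library parser. It is a sketch, not the IDE's plotting code: `parse_dat` is a hypothetical name, and it assumes the first `#` line is the column header (as in typical cpptraj output) and that every data row has the same number of columns.

```python
def parse_dat(text):
    """Parse whitespace-delimited output into {column_name: [floats]}."""
    names, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Take the first comment line as the header, e.g. "#Frame RMSD".
            if names is None:
                names = line.lstrip("#").split()
            continue
        rows.append([float(tok) for tok in line.split()])
    if not rows:
        return {}
    width = len(rows[0])
    # Fall back to generic names if the header is absent or does not match.
    if names is None or len(names) != width:
        names = [f"col{i}" for i in range(width)]
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}
```

The first column would serve as the x-axis and each remaining column as one line, which is how a file with several datasets becomes a multi-line chart.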

---

## 3D Viewer

The right panel includes a **3D molecular viewer** powered by 3Dmol.js:

- Automatically displays your uploaded topology (`.prmtop`, `.pdb`, etc.)
- If a trajectory was processed and a PDB output exists, it can be loaded for frame animation
- Supports standard visualization styles: cartoon, stick, sphere, surface

---

## Supported Analyses

CpptrajAI supports all cpptraj analyses. Common categories:

| Category | Examples |
|----------|---------|
| **Structural metrics** | RMSD, RMSF, radius of gyration, distance, angle, dihedral |
| **Correlation analysis** | Dynamic cross-correlation matrix (DCCM), pairwise Cα distance matrix |
| **Solvent / surface** | SASA, water shell analysis, volumetric density |
| **Dynamics** | Atomic fluctuations, diffusion/MSD, B-factors |
| **Clustering** | Hierarchical, K-means, DBSCAN |
| **Dimensionality reduction** | PCA (covariance matrix → diagonalization → projection) |
| **Interactions** | Hydrogen bonds, native contacts (Q-value), salt bridges |
| **Secondary structure** | DSSP per-residue per-frame |
| **Trajectory manipulation** | Strip atoms/solvent, imaging, centering, autoimage |
| **Free energy** | 2D PMF landscape, dihedral entropy |

---

## Supported File Formats

| Type | Extensions |
|------|------------|
| **Topology** | `.prmtop` `.parm7` `.psf` `.pdb` `.gro` `.mol2` |
| **Trajectory** | `.nc` `.ncdf` `.dcd` `.xtc` `.trr` `.crd` `.mdcrd` `.rst7` |
| **Output data** | `.dat` (whitespace-delimited, auto-plotted) |

---

## Architecture

```
CpptrajAI/
├── server.py               # Flask backend: REST API + SSE streaming
├── agent_ide.html          # Single-page frontend: HTML/CSS/JS
├── core/
│   ├── agent.py            # AI agent: tool calling, conversation history, RAG
│   ├── knowledge_base.py   # cpptraj manual RAG (TF-IDF) + command registry
│   ├── llm_backends.py     # Claude / OpenAI / Gemini / Ollama backends
│   └── runner.py           # cpptraj subprocess execution + file management
├── CpptrajManual.pdf       # Source PDF for RAG
├── cpptraj_manual_cache.json  # Pre-parsed PDF chunks (213 chunks)
├── test_data/              # Sample .prmtop and .nc for quick testing
├── Dockerfile              # For HuggingFace Spaces deployment
└── requirements.txt
```

## Agent Execution

This section explains exactly how CpptrajAI processes a user prompt from start to finish.

### Execution flow

![Agent Execution Flow](agent_flow.svg)

### Agent tools

The AI agent has access to the following tools it can call autonomously:

| Tool | Description |
|------|-------------|
| `search_cpptraj_docs` | Search the cpptraj manual (TF-IDF RAG) for exact command names and syntax. Called on demand before writing scripts. |
| `run_cpptraj_script` | Write and execute a cpptraj script. Returns stdout, stderr, elapsed time, and output files generated. |
| `run_python_script` | Write and execute a Python script for post-processing, plotting, or statistics on cpptraj output files. |
| `read_output_file` | Read the content of an output file produced by a previous cpptraj run. |
| `list_output_files` | List all output files currently in the working directory. |

### Multi-step workflow handling

Each `run_cpptraj_script` call is a **fresh cpptraj process**, so in-memory datasets do not persist between calls. The agent handles this by:

1. Writing every intermediate result to disk with `out filename`
2. Reloading data in subsequent scripts using `readdata filename name datasetname`
3. Passing computed results (e.g. eigenvectors from PCA) to Python for post-processing

**Example (PCA workflow):**
```
Step 1 → run_cpptraj_script  : compute covariance matrix → write evecs.dat
Step 2 → run_cpptraj_script  : readdata evecs.dat → project trajectory → write pca.dat
Step 3 → run_python_script   : load pca.dat → plot PC1 vs PC2 free energy landscape
```
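
Step 3 turns projected coordinates into a free energy surface. A common way to do this, assumed here rather than taken from the project's code, is Boltzmann inversion of a 2D histogram, F = -kT ln(P / P_max). The sketch below uses only the standard library; the function name, bin count, and temperature are illustrative choices.

```python
import math
from collections import Counter

KB_KCAL = 0.0019872041  # Boltzmann constant in kcal/(mol*K)


def free_energy_2d(x, y, bins=20, temperature=300.0):
    """Return {(ix, iy): F} in kcal/mol, with the most populated bin at 0."""
    x_min, y_min = min(x), min(y)
    # Guard against zero bin width when all values are identical.
    x_width = (max(x) - x_min) / bins or 1.0
    y_width = (max(y) - y_min) / bins or 1.0
    counts = Counter(
        (min(int((xi - x_min) / x_width), bins - 1),
         min(int((yi - y_min) / y_width), bins - 1))
        for xi, yi in zip(x, y)
    )
    c_max = max(counts.values())
    kt = KB_KCAL * temperature
    # F = -kT ln(P / P_max); unvisited bins are simply absent from the result.
    return {cell: -kt * math.log(n / c_max) for cell, n in counts.items()}
```

Here `x` and `y` would be the PC1 and PC2 columns read from `pca.dat`; the resulting dictionary can be reshaped into a grid for a heatmap or contour plot.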

### RAG pipeline

1. `CpptrajManual.pdf` is parsed into 213 chunks at startup (cached to JSON)
2. A TF-IDF index is built over all chunks
3. The AI agent has a `search_cpptraj_docs` tool it calls on demand when it needs exact command syntax
4. The top-2 most relevant manual chunks are returned to the model
5. Cloud models (Claude, GPT-4o, Gemini) call the tool only when uncertain, whereas local models call it before every script for reliability
6. The AI writes scripts using exact command names from the retrieved documentation
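
The retrieval step (index the chunks, score a query, return the top 2) can be sketched in a few lines. The project lists scikit-learn for its TF-IDF pipeline; the version below is a dependency-free stand-in written for illustration, so the class name, tokenizer, and IDF smoothing are all assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


class TfidfIndex:
    def __init__(self, chunks):
        self.chunks = chunks
        docs = [Counter(tokenize(c)) for c in chunks]
        # Document frequency: number of chunks containing each term.
        df = Counter(term for doc in docs for term in doc)
        n = len(chunks)
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
        self.vectors = [self._normalize(doc) for doc in docs]

    def _normalize(self, counts):
        # Weight known terms by TF * IDF and scale to unit length.
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        return {t: w / norm for t, w in vec.items()}

    def search(self, query, top_k=2):
        """Return up to top_k chunks ranked by cosine similarity."""
        q = self._normalize(Counter(tokenize(query)))
        scores = [sum(w * vec.get(t, 0.0) for t, w in q.items()) for vec in self.vectors]
        ranked = sorted(range(len(self.chunks)), key=lambda i: scores[i], reverse=True)
        return [self.chunks[i] for i in ranked[:top_k] if scores[i] > 0]
```

A `search_cpptraj_docs` tool would wrap `TfidfIndex.search` and return the matching manual text to the model; chunks with zero similarity are dropped rather than padded in.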

### Token cost optimisation

Running an AI agent with tool calls can be expensive if not carefully managed. CpptrajAI applies several techniques to minimise token usage:

| Technique | Saving |
|-----------|--------|
| **On-demand RAG** | `search_cpptraj_docs` is a tool the model calls only when it needs syntax; it is not injected into every message. Saves ~1500 tokens/request vs always-on RAG. |
| **No cheatsheet in system prompt** | The full command cheatsheet was removed from the system prompt. The model uses the search tool instead. Saves ~1500 tokens/request. |
| **Sliding conversation window** | Only the last 3 user turns are sent to the API, not the full history. Older turns are dropped. |
| **Compressed tool results** | Large cpptraj stdout is trimmed to the first 8 lines + line count before storing in history. |
| **Concise responses enforced** | The system prompt enforces 1-2 sentence summaries, with no markdown tables, headers, or interpretation sections in replies. |
| **No max_tokens for local models** | Ollama models run without an output token cap and are free to generate as much as needed. Cloud models are capped at 4096 output tokens to control cost. |
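
Two of the techniques in the table, the sliding conversation window and compressed tool results, are simple enough to sketch. The helper names and the message format (a list of dictionaries with a `role` key) are assumptions for illustration; the limits of 3 user turns and 8 lines come from the table.

```python
def trim_history(messages, max_user_turns=3):
    """Keep only the messages from the last `max_user_turns` user turns onward."""
    user_positions = [i for i, m in enumerate(messages) if m["role"] == "user"]
    if len(user_positions) <= max_user_turns:
        return list(messages)
    # Slice from the oldest user turn that is still inside the window.
    return messages[user_positions[-max_user_turns]:]


def compress_output(stdout, keep_lines=8):
    """Keep the first `keep_lines` lines and note how many lines there were."""
    lines = stdout.splitlines()
    if len(lines) <= keep_lines:
        return stdout
    head = "\n".join(lines[:keep_lines])
    return f"{head}\n... [{len(lines)} lines total]"
```

Slicing at a user-turn boundary keeps each retained user message together with the assistant and tool messages that followed it.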

### Multi-user isolation

Each browser session gets a unique UUID cookie. All state (uploaded files, agent history, working directory, stop events) is stored per-session and automatically cleaned up after 2 hours of inactivity.
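
The per-session bookkeeping can be pictured as a small in-memory store keyed by UUID with an inactivity timeout. This is a sketch under stated assumptions, not the code in `server.py`: the `SessionStore` class, its state layout, and the injectable clock are hypothetical; only the UUID key and the 2-hour timeout come from the description above.

```python
import time
import uuid

SESSION_TTL_SECONDS = 2 * 60 * 60  # two hours of inactivity


class SessionStore:
    def __init__(self, ttl=SESSION_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock  # injectable so expiry can be tested without waiting
        self._sessions = {}

    def create(self):
        """Create a new session and return its UUID string."""
        sid = str(uuid.uuid4())
        self._sessions[sid] = {
            "state": {"files": [], "history": []},
            "last_seen": self._clock(),
        }
        return sid

    def get(self, sid):
        """Return the session state and refresh its activity timestamp."""
        entry = self._sessions.get(sid)
        if entry is None:
            return None
        entry["last_seen"] = self._clock()
        return entry["state"]

    def cleanup(self):
        """Drop sessions idle for longer than the TTL; return how many were removed."""
        now = self._clock()
        expired = [sid for sid, e in self._sessions.items()
                   if now - e["last_seen"] > self._ttl]
        for sid in expired:
            del self._sessions[sid]
        return len(expired)
```

In a web server, `create` would run when a request arrives without a session cookie, and `cleanup` would run periodically; a real implementation would also delete the session's working directory on expiry.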

---

## Docker

```bash
docker pull nagarh/cpptraj-ai:latest
docker run -p 8502:8502 nagarh/cpptraj-ai:latest
```

Open **http://localhost:8502**

### Environment variables

| Variable | Default | Description |
|----------|---------|-------------|
| `CPPTRAJ_PATH` | bundled via ambertools | Path to cpptraj binary |
| `PORT` | `8502` | Server port |
| `FLASK_SECRET_KEY` | default | Change in production |

---

## License

MIT License. See [LICENSE](LICENSE) for details.

---

## Tools Used

| Tool | Purpose |
|------|---------|
| [cpptraj](https://github.com/Amber-MD/cpptraj) | MD trajectory analysis engine |
| [Anthropic Claude](https://anthropic.com) | AI backend (cloud) |
| [OpenAI GPT-4o](https://openai.com) | AI backend (cloud) |
| [Google Gemini](https://ai.google.dev) | AI backend (cloud) |
| [Ollama](https://ollama.com) | Local model inference |
| [3Dmol.js](https://3dmol.csb.pitt.edu) | 3D molecular visualization |
| [Plotly](https://plotly.com) | Interactive plots |
| [Flask](https://flask.palletsprojects.com) | Backend web framework |
| [scikit-learn](https://scikit-learn.org) | TF-IDF RAG pipeline |

---

## Citation

If you use CpptrajAI in your work, please cite:

```bibtex
@software{CpptrajAI,
  title  = {CpptrajAI: AI-Powered IDE for Molecular Dynamics Trajectory Analysis},
  author = {Nagar, Hemant},
  year   = {2025},
  url    = {https://github.com/nagarh/CpptrajAI}
}
```

Please also cite **cpptraj**:

> Roe, D. R., & Cheatham III, T. E. (2013). PTRAJ and CPPTRAJ: software for processing and analysis of molecular dynamics trajectory data. *Journal of Chemical Theory and Computation*, *9*(7), 3084–3095.

---

## Contact

- **Author**: Hemant Nagar
- **Email**: hn533621@ohio.edu
- **GitHub**: [github.com/nagarh](https://github.com/nagarh)