Spaces:

hemantn
/

CpptrajAI

Running

App Files Files Community

CpptrajAI / README.md

hemantn

Add app_port: 8502 to fix HuggingFace launch timeout

86f7d1b verified 18 days ago

preview code

raw

history blame contribute delete

16.6 kB

metadata

title: CpptrajAI
emoji: 🧬
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 8502
pinned: false
license: mit

CpptrajAI

An AI-powered IDE for molecular dynamics (MD) trajectory analysis using cpptraj and large language models with Retrieval-Augmented Generation (RAG).

Type a prompt like "Calculate RMSD of the protein backbone" — CpptrajAI writes the cpptraj script, runs it, and reports the results.

Features
Quick Start
AI Backend Setup
Uploading Files
Using the AI Agent
Script Editor
Python Editor
Results & Plots
3D Viewer
Supported Analyses
Supported File Formats
Architecture
Agent Execution
Docker

Features

Feature	Description
AI Agent	Natural-language prompt → cpptraj script → execution → result interpretation
RAG over cpptraj manual	On-demand TF-IDF retrieval from cached cpptraj syntax — the AI searches documentation only when it needs exact syntax
Multi-provider AI	Claude (Anthropic), GPT-4o (OpenAI), Gemini (Google), or any Ollama local model
Local model support	Run any Ollama model (qwen3, llama3, deepseek, etc.) on your own hardware — no API key needed
Script Editor	Write/edit cpptraj scripts manually with one-click execution
Python Editor	Post-process output files with Python/pandas/matplotlib inline
Interactive Plots	Plotly charts auto-generated from output `.dat` files
3D Viewer	Visualize topology and trajectory frames with 3Dmol.js
Command Reference	Searchable left-panel listing all cpptraj commands with syntax
Multi-user	Fully session-isolated — multiple users can run simultaneously
Reset All	One-click session reset to start fresh

Quick Start

1. Clone the repository

git clone https://github.com/nagarh/CpptrajAI.git
cd CpptrajAI

2. Install Python dependencies

pip install -r requirements.txt

3. Install cpptraj

cpptraj must be installed and available on your PATH.

Via conda (recommended):

conda install -c conda-forge ambertools

ambertools includes cpptraj. Requires Python 3.11.

From source:

git clone https://github.com/Amber-MD/cpptraj.git
cd cpptraj && ./configure gnu && make -j4 install

Custom path: If cpptraj is not on your PATH, set the environment variable:

export CPPTRAJ_PATH=/path/to/cpptraj

4. Start the server

python server.py

Open your browser at http://localhost:8502

AI Backend Setup

CpptrajAI supports cloud AI providers and local models via Ollama.

Cloud Providers

Provider	Models	Where to get key
Anthropic (Claude)	Haiku 4.5, Sonnet 4.6, Opus 4.6	console.anthropic.com
OpenAI	GPT-4o, GPT-4o Mini	platform.openai.com
Google (Gemini)	Gemini 2.5 Flash	aistudio.google.com

Local Models via Ollama (Free, No API Key)

Run any model locally using Ollama:

# Install Ollama, then pull a model
ollama pull qwen3:14b

# Start Ollama server
ollama serve

In CpptrajAI Settings:

Provider → Ollama
Base URL → http://localhost:11434/v1
Model → qwen3:14b (or any model you pulled)

Recommended local models: qwen3:14b, qwen3:32b, qwen3:30b-a3b (MoE). These have strong tool-calling support essential for the agentic workflow.

Model Recommendations

Model	Best for	Notes
Claude Sonnet 4.6	Complex multi-step analyses — PCA, DCCM, 2D PMF, free energy landscapes	Most reliable for chained tool calls and multi-script workflows. Recommended for production use.
GPT-4o	Moderate complexity — RMSD, RMSF, Rg, clustering, hydrogen bonds	Reliable and accurate. Watch rate limits (TPM) on long sessions.
Gemini 2.5 Flash	Light to moderate analyses	Fast and cost-effective for routine tasks.
Qwen3:14b / 32b (Ollama)	Simple to moderate analyses — RMSD, Rg, strip/image, distance	Free and runs locally. Handles common analyses well but can hallucinate on complex multi-step workflows. Use `qwen3:32b` for best local results.

Recommendation: Use Claude Sonnet 4.6 for anything involving PCA, correlation matrices, or free energy. Use Qwen3 locally for quick exploratory analyses.

How to configure any provider:

Click ⚙ Settings (top-right of the IDE)
Select your provider
Paste your API key (not needed for Ollama)
Choose a model
Click Save

Privacy: API keys are stored only in your browser session and are never written to disk or logged.

Uploading Files

Before running any analysis, upload your MD files using the right panel:

Topology file — drag and drop or click to upload (.prmtop, .parm7, .psf, .gro, .mol2)
Trajectory file(s) — upload one or more trajectory files (.nc, .ncdf, .dcd, .xtc, .trr, .crd)

Once uploaded, the IDE displays:

Topology filename
Total atoms, residues
Protein residues, ligand residues (auto-detected)
Trajectory file(s) loaded

Test data: Click Load Test Data to load the built-in sample topology and trajectory to try the app without your own files.

File type detection

.prmtop, .parm7, .psf, .gro, .mol2 → always topology
.nc, .ncdf, .dcd, .xtc, .trr, .crd, .mdcrd → always trajectory
.pdb → auto-detected:
- If a proper topology (.prmtop etc.) is already loaded → treated as trajectory
- Otherwise → scanned for multi-MODEL records to determine if trajectory or single structure

Using the AI Agent

The AI Chat tab is the primary interface. Type your analysis request in plain English.

Example prompts

Calculate RMSD of protein backbone over all frames

Plot radius of gyration of the ligand

Calculate the dynamic cross-correlation matrix of the Cα atoms and plot it as a heatmap

Strip water molecules and save a new trajectory

Calculate the radius of gyration of the protein and plot a 2D free energy landscape (PMF) as a function of RMSD vs Rg

How it works

Your prompt is enriched with file context (topology name, atom/residue counts, ligand info)
The AI calls search_cpptraj_docs when it needs exact command syntax from the manual
The AI writes a cpptraj script using verified commands and syntax
The script is executed automatically
Output files are read back and the AI summarizes key results
Plots are generated automatically for .dat output files

Stop a running analysis

Click the Stop button (appears while the AI is thinking/running) to cancel mid-stream.

Conversation history

The AI maintains conversation history within your session, so you can ask follow-up questions:

Now do the same analysis but only for residues 50-150

Can you also calculate the dihedral angles for these residues?

Script Editor

The Script tab lets you write cpptraj scripts manually.

Use the Command Reference (left panel) to look up syntax — click any command to insert it
Scripts are pre-filled with parm and trajin lines pointing to your uploaded files
Click Run Script to execute
The go command is appended automatically if missing

Example script

parm protein.prmtop
trajin mdin_prod.nc
strip :WAT
autoimage
rmsd backbone :1-200@CA,C,N,O first out rmsd_backbone.dat
radgyr :203 out ligand_rg.dat mass
go

Python Editor

The Python tab provides an inline Python environment for post-processing output files.

Output files from cpptraj are available in the working directory
Use pandas, numpy, matplotlib, scipy, scikit-learn to process and plot results
Results and plots appear in the output panel

Example

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("rmsd_backbone.dat", sep=r"\s+", comment="#",
                 names=["frame", "rmsd"])
print(df.describe())
print(f"Mean RMSD: {df['rmsd'].mean():.3f} Å")

Results & Plots

After each analysis run, output files appear in the right panel:

.dat files → automatically plotted as interactive Plotly line charts
Multiple datasets in a single file → plotted as multi-line chart
Click any file name to view its raw content
Click Download to save a file locally

3D Viewer

The right panel includes a 3D molecular viewer powered by 3Dmol.js:

Automatically displays your uploaded topology (.prmtop, .pdb, etc.)
If a trajectory was processed and a PDB output exists, it can be loaded for frame animation
Supports standard visualization styles: cartoon, stick, sphere, surface

Supported Analyses

CpptrajAI supports all cpptraj analyses. Common categories:

Category	Examples
Structural metrics	RMSD, RMSF, radius of gyration, distance, angle, dihedral
Correlation analysis	Dynamic cross-correlation matrix (DCCM), pairwise Cα distance matrix
Solvent / surface	SASA, water shell analysis, volumetric density
Dynamics	Atomic fluctuations, diffusion/MSD, B-factors
Clustering	Hierarchical, K-means, DBSCAN
Dimensionality reduction	PCA (covariance matrix → diagonalization → projection)
Interactions	Hydrogen bonds, native contacts (Q-value), salt bridges
Secondary structure	DSSP per-residue per-frame
Trajectory manipulation	Strip atoms/solvent, imaging, centering, autoimage
Free energy	2D PMF landscape, dihedral entropy

Supported File Formats

Type	Extensions
Topology	`.prmtop` `.parm7` `.psf` `.pdb` `.gro` `.mol2`
Trajectory	`.nc` `.ncdf` `.dcd` `.xtc` `.trr` `.crd` `.mdcrd` `.rst7`
Output data	`.dat` (whitespace-delimited, auto-plotted)

Architecture

CpptrajAI/
├── server.py               # Flask backend — REST API + SSE streaming
├── agent_ide.html          # Single-page frontend — HTML/CSS/JS
├── core/
│   ├── agent.py            # AI agent: tool calling, conversation history, RAG
│   ├── knowledge_base.py   # cpptraj manual RAG (TF-IDF) + command registry
│   ├── llm_backends.py     # Claude / OpenAI / Gemini / Ollama backends
│   └── runner.py           # cpptraj subprocess execution + file management
├── CpptrajManual.pdf       # Source PDF for RAG
├── cpptraj_manual_cache.json  # Pre-parsed PDF chunks (213 chunks)
├── test_data/              # Sample .prmtop and .nc for quick testing
├── Dockerfile              # For HuggingFace Spaces deployment
└── requirements.txt

Agent Execution

This section explains exactly how CpptrajAI processes a user prompt from start to finish.

Execution flow

Agent tools

The AI agent has access to the following tools it can call autonomously:

Tool	Description
`search_cpptraj_docs`	Search the cpptraj manual (TF-IDF RAG) for exact command names and syntax. Called on demand before writing scripts.
`run_cpptraj_script`	Write and execute a cpptraj script. Returns stdout, stderr, elapsed time, and output files generated.
`run_python_script`	Write and execute a Python script for post-processing, plotting, or statistics on cpptraj output files.
`read_output_file`	Read the content of an output file produced by a previous cpptraj run.
`list_output_files`	List all output files currently in the working directory.

Multi-step workflow handling

Each run_cpptraj_script call is a fresh cpptraj process — in-memory datasets do not persist between calls. The agent handles this by:

Writing every intermediate result to disk with out filename
Reloading data in subsequent scripts using readdata filename name datasetname
Passing computed results (e.g. eigenvectors from PCA) to Python for post-processing

Example — PCA workflow:

Step 1 → run_cpptraj_script  : compute covariance matrix → write evecs.dat
Step 2 → run_cpptraj_script  : readdata evecs.dat → project trajectory → write pca.dat
Step 3 → run_python_script   : load pca.dat → plot PC1 vs PC2 free energy landscape

RAG pipeline

CpptrajManual.pdf is parsed into 213 chunks at startup (cached to JSON)
A TF-IDF index is built over all chunks
The AI agent has a search_cpptraj_docs tool it calls on demand when it needs exact command syntax
The top-2 most relevant manual chunks are returned to the model
Cloud models (Claude, GPT-4o, Gemini) call the tool only when uncertain — local models call it before every script for reliability
The AI writes scripts using exact command names from the retrieved documentation

Token cost optimisation

Running an AI agent with tool calls can be expensive if not carefully managed. CpptrajAI applies several techniques to minimise token usage:

Technique	Saving
On-demand RAG	`search_cpptraj_docs` is a tool the model calls only when it needs syntax — not injected into every message. Saves ~1500 tokens/request vs always-on RAG.
No cheatsheet in system prompt	The full command cheatsheet was removed from the system prompt. The model uses the search tool instead. Saves ~1500 tokens/request.
Sliding conversation window	Only the last 3 user turns are sent to the API — not the full history. Older turns are dropped.
Compressed tool results	Large cpptraj stdout is trimmed to the first 8 lines + line count before storing in history.
Concise responses enforced	The system prompt enforces 1-2 sentence summaries — no markdown tables, headers, or interpretation sections in replies.
No max_tokens for local models	Ollama models run without an output token cap — free to generate as much as needed. Cloud models are capped at 4096 output tokens to control cost.

Multi-user isolation

Each browser session gets a unique UUID cookie. All state (uploaded files, agent history, working directory, stop events) is stored per-session and automatically cleaned up after 2 hours of inactivity.

Docker

docker pull nagarh/cpptraj-ai:latest
docker run -p 8502:8502 nagarh/cpptraj-ai:latest

Open http://localhost:8502

Environment variables

Variable	Default	Description
`CPPTRAJ_PATH`	bundled via ambertools	Path to cpptraj binary
`PORT`	`8502`	Server port
`FLASK_SECRET_KEY`	default	Change in production

License

MIT License. See LICENSE for details.

Tools Used

Tool	Purpose
cpptraj	MD trajectory analysis engine
Anthropic Claude	AI backend (cloud)
OpenAI GPT-4o	AI backend (cloud)
Google Gemini	AI backend (cloud)
Ollama	Local model inference
3Dmol.js	3D molecular visualization
Plotly	Interactive plots
Flask	Backend web framework
scikit-learn	TF-IDF RAG pipeline

Citation

If you use CpptrajAI in your work, please cite:

@software{CpptrajAI,
  title  = {CpptrajAI: AI-Powered IDE for Molecular Dynamics Trajectory Analysis},
  author = {Nagar, Hemant},
  year   = {2025},
  url    = {https://github.com/nagarh/CpptrajAI}
}

Please also cite cpptraj:

Roe, D. R., & Cheatham III, T. E. (2013). PTRAJ and CPPTRAJ: software for processing and analysis of molecular dynamics trajectory data. Journal of Chemical Theory and Computation, 9(7), 3084–3095.

Contact

Author: Hemant Nagar
Email: hn533621@ohio.edu
GitHub: github.com/nagarh