PhDFlo committed on
Commit
08452fe
·
2 Parent(s): 6e0de72 491e245

add doc to README

  1. README.md +160 -2
README.md CHANGED
@@ -16,7 +16,69 @@ tags:
16
 
17
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
18
 
19
- ### Environment creation with uv
20
  Run the following in a bash shell:
21
  ```bash
22
  uv venv
@@ -24,8 +86,104 @@ source .venv/bin/activate
24
  uv pip install gradio[mcp] modal gemmi gradio_molecule3d
25
  ```
26
 
27
- ### Run the app
28
  Run in a bash shell:
29
  ```bash
30
  gradio app.py
31
  ```
16
 
17
  Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
18
 
19
+ ![Logo project](images/logo1.png)
20
+
21
+ # Stakes
22
+ The industry is undergoing a profound transformation due to the development of Large Language Models (LLMs) and the recent advancements that enable them to access external tools.
23
+ For years, companies have used simulation tools to accelerate product development and reduce its cost.
24
+ One of the primary challenges in the coming years will be to create agents capable of setting up, running, and processing simulations to further expedite innovation.
25
+ Engineers will focus on analysis rather than simulation setup, allowing them to concentrate on the most critical aspects of their work.
26
+
27
+ # Objective
28
+
29
+ This project is a first step towards developing AI agents that can perform simulations using existing engineering software.
30
+ Key domains of application include:
31
+ - **CFD** (Computational Fluid Dynamics) simulations
32
+ - **Biology** (Protein Folding, Molecular Dynamics, etc.)
33
+ - **Neural network applications**
34
+
35
+ While this project focuses on biomolecule folding, the principles employed can be extended to other domains.
36
+ Specifically, it uses [Chai-1](https://www.chaidiscovery.com/blog/introducing-chai-1), a multi-modal foundation model for molecular structure prediction that achieves state-of-the-art performance across various benchmarks.
37
+ Chai-1 enables unified prediction of proteins, small molecules, DNA, RNA, glycosylations, and more.
38
+
39
+ Industrial computations frequently require substantial resources (large numbers of CPUs and GPUs) and are typically run on High-Performance Computing (HPC) clusters.
40
+ To this end, the project uses [Modal Labs](https://modal.com/), a serverless platform that offers a straightforward way to run any application on the latest CPU and GPU hardware.
41
+
42
+ MCP (Model Context Protocol) servers are an efficient way to connect LLMs to real-world engineering applications by giving them access to a set of tools.
43
+ The purpose of this project is to enable users to run biomolecule folding simulations using the Chai-1 model through any LLM chat or with a Gradio interface.
44
+
45
+
46
+ # Benefits
47
+
48
+ 1. **Efficiency**: The MCP server's connection to high-performance computing resources ensures that simulations run quickly and efficiently.
49
+
50
+ 2. **Ease of Use**: Only the necessary parameters are exposed to the user, which simplifies setting up and running complex simulations.
51
+
52
+ 3. **Integration**: The seamless integration between the LLM's chat interface and the MCP server allows for a streamlined workflow, from simulation setup to results analysis.
53
+
54
+ The following video illustrates a practical use of the MCP server to run a biomolecule folding simulation using the Chai-1 model.
55
+ In this scenario, Copilot is used in Agent mode with Claude 3.5 Sonnet to leverage the tools provided by the MCP server.
56
+
57
+ # MCP tools
58
+ 1. `create_fasta_file`: Create a FASTA file from a biomolecule sequence string with a unique name.
59
+ 2. `create_json_config`: Create a JSON configuration file from the Gradio interface inputs.
60
+ 3. `compute_Chai1`: Run a Chai-1 simulation on a Modal Labs server. Returns a DataFrame with the predicted scores: aggregated, pTM and ipTM.
61
+ 4. `plot_protein`: Plot the 3D structure of a biomolecule using the DataFrame from `compute_Chai1` (used by the Gradio interface).
62
+ 5. `show_cif_file`: Plot a 3D structure from a CIF file with the Molecule3D library (used by the Gradio interface).
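
To make the first tool in the list above more concrete, here is a minimal sketch of what a `create_fasta_file`-style helper could look like. It is not the project's implementation: the signature, the `directory` argument, and the rule that appends `.fasta` to a user-supplied name are assumptions made for illustration. The generated name follows the `chai_{unique_id}_input.fasta` pattern mentioned in this README.

```python
import uuid
from pathlib import Path


def create_fasta_file(content, name=None, directory="."):
    """Write FASTA content to disk and return the path of the new file.

    Hypothetical sketch: the real tool may use a different signature.
    """
    if name is None:
        # No name given: generate one with a short unique identifier.
        name = f"chai_{uuid.uuid4().hex[:8]}_input.fasta"
    elif not name.endswith(".fasta"):
        # Assumption: user-supplied names get the .fasta extension added.
        name = f"{name}.fasta"
    path = Path(directory) / name
    # Normalise surrounding whitespace and end the file with one newline.
    path.write_text(content.strip() + "\n")
    return path
```

Returning the path rather than only the file name lets a caller, or an LLM using the tool, pass the result directly to a later step.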
63
+
64
+ # Result example
65
+ The following image shows an example of a protein folding simulation using the Chai-1 model.
66
+ The simulation was run with the default configuration, and the image is a 3D view from the Gradio interface.
67
+
68
+ ![Protein folding example](images/protein.png)
69
+
70
+
71
+ # What's next?
72
+ 1. Expose additional tools to post-process the results of the simulations.
73
+ The current post-processing tools are suited for the Gradio interface (e.g., plotting images of the molecule structure from a file).
74
+ 2. Extend the pipeline with software such as [OpenMM](https://openmm.org/) or [GROMACS](https://www.gromacs.org/) for molecular dynamics simulations.
75
+ 3. Run complete simulation plans, including loops over parameters, fully automated by the LLM.
76
+
77
+ # Contact
78
+ For any issues or questions, please contact the developer or refer to the documentation.
79
+
80
+
81
+ # Environment creation with uv
82
  Run the following in a bash shell:
83
  ```bash
84
  uv venv
 
86
  uv pip install gradio[mcp] modal gemmi gradio_molecule3d
87
  ```
88
 
89
+ # Connect to Modal
90
+ Create an account on the [Modal website](https://modal.com) and run the following in your local terminal:
91
+ ```bash
92
+ python -m modal setup
93
+ ```
94
+
95
+
96
+ # Run the app
97
  Run in a bash shell:
98
  ```bash
99
  gradio app.py
100
  ```
101
+
102
+
103
+
104
+ # Gradio interface instructions
105
+
106
+ <div style="background-color:#f5f5f5; border-radius:8px; padding:18px 24px; margin-bottom:24px; border:1px solid #cccccc;">
107
+
108
+ ### 1. <span style="color:#e98935;">Create your JSON configuration file (Optional)</span>
109
+ <small>Default configuration is available if you skip this step.</small>
110
+
111
+ - In the `Configuration 📦` window, set your simulation parameters and generate the JSON config file. You can enter a file name in the dedicated box; it will appear in the list of available configuration files. If you don't, a unique identifier is assigned (e.g., `chai_{unique_id}_config.json`).
112
+ - **Parameters:**
113
+ - <b>Number of diffusion time steps:</b> 1 to 500
114
+ - <b>Number of trunk recycles:</b> 1 to 5
115
+ - <b>Seed:</b> 1 to 100
116
+ - <b>ESM_embeddings:</b> Include or not
117
+ - <b>MSA_server:</b> Include or not
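
The parameter ranges above can be checked before a config file is written. The sketch below shows one way to do this in plain Python. The JSON key names (`num_diffn_timesteps`, `num_trunk_recycles`, `use_esm_embeddings`, `use_msa_server`) and both function names are hypothetical; the file written by the Gradio interface may use different keys.

```python
import json
import uuid
from pathlib import Path

# Ranges taken from the parameter list above; the key names are hypothetical.
LIMITS = {
    "num_diffn_timesteps": (1, 500),
    "num_trunk_recycles": (1, 5),
    "seed": (1, 100),
}


def build_config(num_diffn_timesteps, num_trunk_recycles, seed,
                 use_esm_embeddings=True, use_msa_server=False):
    """Return a config dict after checking each integer against its range."""
    values = {
        "num_diffn_timesteps": num_diffn_timesteps,
        "num_trunk_recycles": num_trunk_recycles,
        "seed": seed,
    }
    for name, value in values.items():
        low, high = LIMITS[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside [{low}, {high}]")
    values["use_esm_embeddings"] = bool(use_esm_embeddings)
    values["use_msa_server"] = bool(use_msa_server)
    return values


def write_config(config, directory=".", name=None):
    """Write the config as JSON; fall back to a generated unique name."""
    if name is None:
        name = f"chai_{uuid.uuid4().hex[:8]}_config.json"
    path = Path(directory) / name
    path.write_text(json.dumps(config, indent=2))
    return path
```

Rejecting out-of-range values up front avoids submitting a job that would only fail after it has been sent to the remote server.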
118
+
119
+ ### 2. <span style="color:#e98935;">Upload a FASTA file with your molecule sequence (Optional)</span>
120
+ <small>Default FASTA files are available if you skip this step.</small>
121
+
122
+ - In the `Configuration 📦` window, write your FASTA content and create the file. You can enter a file name in the dedicated box; it will appear in the list of available files. If you don't provide a file name, a unique identifier is assigned (e.g., `chai_{unique_id}_input.fasta`). If you don't provide any FASTA content, a default sequence is written to the file.
123
+ - <b style="color:#b91c1c;">Warning:</b> The header must be well formatted for Chai-1 to process it.
124
+
125
+ **FASTA template:**
126
+ <div style="background-color:#ffffff; border-radius:8px; padding:18px 24px; margin-bottom:24px; border:1px solid #cccccc;">
127
+
128
+ ```fasta
129
+ >{molecule_type}|{molecule_name}
130
+ Sequence (for protein/RNA/DNA) or SMILES for ligand
131
+ ```
132
+
133
+ </div>
134
+
135
+ **Accepted molecule types:**
136
+ `protein` / `rna` / `dna` / `ligand`
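
Because a badly formatted header is the most likely source of errors, it can help to check a file against the template and the accepted molecule types before running a simulation. The following is a minimal sketch of such a check. The function names are hypothetical, and the rules only mirror the template shown above; Chai-1's own parser may accept or reject other inputs.

```python
ACCEPTED_TYPES = {"protein", "rna", "dna", "ligand"}


def parse_fasta(text):
    """Split FASTA text into (molecule_type, molecule_name, sequence) tuples."""
    records = []
    header = None
    chunks = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before any '>' header")
        else:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


def _make_record(header, chunks):
    """Validate one header of the form 'type|name' and join its sequence lines."""
    molecule_type, sep, molecule_name = header.partition("|")
    if not sep or not molecule_name:
        raise ValueError(f"header {header!r} must look like 'type|name'")
    if molecule_type not in ACCEPTED_TYPES:
        raise ValueError(f"unknown molecule type {molecule_type!r}")
    sequence = "".join(chunks)
    if not sequence:
        raise ValueError(f"record {molecule_name!r} has no sequence")
    return molecule_type, molecule_name, sequence
```

Calling `parse_fasta` on the content typed into the interface raises a `ValueError` that names the offending header, which is easier to act on than a failure reported after the remote job has started.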
137
+
138
+ **Default input (provided by Chai-1):**
139
+ <div style="background-color:#ffffff; border-radius:8px; padding:18px 24px; margin-bottom:24px; border:1px solid #cccccc;">
140
+
141
+ ```fasta
142
+ >protein|name=example-of-long-protein
143
+ AGSHSMRYFSTSVSRPGRGEPRFIAVGYVDDTQFVRFDSDAASPRGEPRAPWVEQEGPEYWDRETQKYKRQAQTDRVSLRNLRGYYNQSEAGSHTLQWMFGCDLGPDGRLLRGYDQSAYDGKDYIALNEDLRSWTAADTAAQITQRKWEAAREAEQRRAYLEGTCVEWLRRYLENGKETLQRAEHPKTHVTHHPVSDHEATLRCWALGFYPAEITLTWQWDGEDQTQDTELVETRPAGDGTFQKWAAVVVPSGEEQRYTCHVQHEGLPEPLTLRWEP
144
+
145
+ >protein|name=example-of-short-protein
146
+ AIQRTPKIQVYSRHPAENGKSNFLNCYVSGFHPSDIEVDLLKNGERIEKVEHSDLSFSKDWSFYLLYYTEFTPTEKDEYACRVNHVTLSQPKIVKWDRDM
147
+
148
+ >protein|name=example-peptide
149
+ GAAL
150
+
151
+ >ligand|name=example-ligand-as-smiles
152
+ CCCCCCCCCCCCCC(=O)O
153
+ ```
154
+
155
+ </div>
156
+ <small>For a peptide, use `protein` as the molecule type.</small>
157
+
158
+ **Other examples:**
159
+ <div style="background-color:#ffffff; border-radius:8px; padding:18px 24px; margin-bottom:24px; border:1px solid #cccccc;">
160
+
161
+ ```fasta
162
+ >protein|lysozyme
163
+ MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPDLNAAKSELDKAIGRNCNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRCAAINQVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPDRAKRVITTFRTGTWDAYKNL
164
+ ```
165
+
166
+ ```fasta
167
+ >rna|Chain B
168
+ UUAGGCGGCCACAGCGGUGGGGUUGCCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCACCAGCGUUCCGGGGAGUACUGGAGUGCGCGAGCCUCUGGGAAACCCGGUUCGCCGCCACC
169
+ MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPDLNAAKSELDKAIGRNCNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRCAAINQVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPDRAKRVITTFRTGTWDAYKNL
170
+ ```
171
+
172
+ </div>
173
+
174
+ ### 3. <span style="color:#e98935;">Select your config and FASTA files</span>
175
+ <small>Files are stored in your working directory as you create them.</small>
176
+
177
+ In the `Run folding simulation 🚀` window, refresh the file list by clicking the `Refresh available files` button. Then select the configuration and FASTA files you want.
178
+
179
+ ### 4. <span style="color:#e98935;">Run the simulation</span>
180
+
181
+ Press the `Run Simulation` button to start the folding simulation. Five folding simulations are performed; this number is hard-coded in Chai-1. Expect the run to take from 2 to 10 minutes, depending on the molecule.
182
+
183
+ ### 5. <span style="color:#e98935;">Analyse the results of your simulation</span>
184
+
185
+ To analyse the results of the simulation, two outputs are provided:
186
+ - A table showing the scores of the five folding simulations performed
187
+ - Interactive 3D visualization of the molecule
188
+
189
+ Finally, you can go to the `Plot CIF file 💻` window to view CIF files. This is mainly useful for visualizing CIF files produced when using this tool as an MCP server.