PhDFlo committed
Commit 6e0de72 · 1 Parent(s): 1135d83

Text update

Files changed (2)
  1. app.py +21 -17
  2. introduction_page.md +8 -2
app.py CHANGED
@@ -64,7 +64,7 @@ def select_best_model(
  # Definition of the tools for the MCP server
  # Function to return a fasta file
  def create_fasta_file(file_content: str, name: Optional[str] = None, seq_name: Optional[str] = None) -> str:
- """Create a FASTA file from a protein sequence string with a unique name.
+ """Create a FASTA file from a biomolecule sequence string with a unique name.

  Args:
  file_content (str): The content of the FASTA file required with optional line breaks
@@ -232,7 +232,7 @@ def compute_Chai1(

  # Function to plot the 3D protein structure
  def plot_protein(result_df) -> str:
- """Plot the 3D structure of a protein using the DataFrame from compute_Chai1.
+ """Plot the 3D structure of a biomolecule using the DataFrame from compute_Chai1.

  Args:
  result_df (pd.DataFrame): DataFrame containing model information and scores
@@ -262,7 +262,7 @@ def show_cif_file(cif_file):
  """Plot a 3D structure from a CIF file with the Molecule3D library.

  Args:
- cif_file: A protein structure file in CIF format. This can be a file uploaded by the user.
+ cif_file: A biomolecule structure file in CIF format. This can be a file uploaded by the user.
  If None, the function will return None.

  Returns:
@@ -288,12 +288,12 @@ with gr.Blocks(theme=theme) as demo:
  gr.Markdown(
  """
  # Protein Folding Simulation Interface
- This interface provides the tools to fold FASTA chains based on Chai-1 model. Also, this is a MCP server to provide all the tools to automate the process of folding proteins with LLMs.
+ This interface provides the tools to fold FASTA chains based on Chai-1 model. Also, this is a MCP server to provide all the tools to automate the process of folding biomolecules with LLMs.
  """)

  with gr.Tab("Introduction 🔭"):

- gr.Image("images/logo1.png", show_label=False, width=600, show_download_button=False, show_fullscreen_button=False)
+ gr.Image("images/logo1.png", show_label=False, width=600, show_download_button=False, show_fullscreen_button=False, show_share_button=False)

  gr.Markdown(
  """
@@ -302,22 +302,26 @@ with gr.Blocks(theme=theme) as demo:
  The industry is undergoing a profound transformation due to the development of Large Language Models (LLMs) and the recent advancements that enable them to access external tools.
  For years, companies have leveraged simulation tools to accelerate and reduce the costs of product development.
  One of the primary challenges in the coming years will be to create agents capable of setting up, running, and processing simulations to further expedite innovation.
+ Engineers will focus on analysis rather than simulation setup, allowing them to concentrate on the most critical aspects of their work.

  # Objective

- This project represents an initial step towards developing AI agents that can perform simulations using existing engineer softwares. It enables engineers to focus on analysis rather than setup.
+ This project represents a first step towards developing AI agents that can perform simulations using existing engineering softwares.
  Key domains of application include:
  - **CFD** (Computational Fluid Dynamics) simulations
  - **Biology** (Protein Folding, Molecular Dynamics, etc.)
  - **Neural network applications**

- While this project focuses on protein folding, the principles employed can be extended to other domains.
- Specifically, it utilizes [Chai-1](https://www.chaidiscovery.com/blog/introducing-chai-1), a multi-modal foundation model for molecular structure prediction that achieves state-of-the-art performance across various benchmarks.
+ While this project focuses on biomolecules folding, the principles employed can be extended to other domains.
+ Specifically, it uses [Chai-1](https://www.chaidiscovery.com/blog/introducing-chai-1), a multi-modal foundation model for molecular structure prediction that achieves state-of-the-art performance across various benchmarks.
  Chai-1 enables unified prediction of proteins, small molecules, DNA, RNA, glycosylations, and more.

- Industrial computations are frequently performed on High-Performance Computing (HPC) clusters with substantial resources, necessitating that simulations typically run on separate servers.
- To provide comprehensive answers to users, the LLM must be able to access simulation results. To this end, [Modal Labs](https://modal.com/), a serverless platform that offers a straightforward method to run any application with the latest CPU and GPU hardware, will be used.
+ Industrial computations frequently require substantial resources (large number of CPUs and GPUs) that are performed on High-Performance Computing (HPC) clusters.
+ To this end, [Modal Labs](https://modal.com/), a serverless platform that offers a straightforward method to run any application with the latest CPU and GPU hardware, will be used.

+ MCP servers are an efficient solution to connect LLMs to real world engineering applications by providing access to a set of tools.
+ The purpose of this project is to enable users to run biomolecule folding simulations using the Chai-1 model through any LLM chat or with a Gradio interface.
+
  # Benefits

  1. **Efficiency**: The MCP server's connected to high-performance computing capabilities ensure that simulations are run quickly and efficiently.
@@ -326,7 +330,7 @@ with gr.Blocks(theme=theme) as demo:

  3. **Integration**: The seamless integration between the LLM's chat interface and the MCP server allows for a streamlined workflow, from simulation setup to results analysis.

- The following video illustrates a practical use of the MCP server to run a protein folding simulation using the Chai-1 model.
+ The following video illustrates a practical use of the MCP server to run a biomolecule folding simulation using the Chai-1 model.
  In this scenario, Copilot is used in Agent mode with Claude 3.5 Sonnet to leverage the tools provided by the MCP server.

  """
@@ -348,10 +352,10 @@ with gr.Blocks(theme=theme) as demo:
  gr.Markdown(
  """
  # MCP tools
- 1. `create_fasta_file`: Create a FASTA file from a protein sequence string with a unique name.
+ 1. `create_fasta_file`: Create a FASTA file from a biomolecule sequence string with a unique name.
  2. `create_json_config`: Create a JSON configuration file from the Gradio interface inputs.
  3. `compute_Chai1`: Compute a Chai-1 simulation on Modal labs server. Return a DataFrame with protein scores.
- 4. `plot_protein`: Plot the 3D structure of a protein using the DataFrame from `compute_Chai1` (Use for Gradio interface).
+ 4. `plot_protein`: Plot the 3D structure of a biomolecule using the DataFrame from `compute_Chai1` (Use for Gradio interface).
  5. `show_cif_file`: Plot a 3D structure from a CIF file with the Molecule3D library (Use for the Gradio interface).
  """)

@@ -366,15 +370,15 @@ with gr.Blocks(theme=theme) as demo:
  The simulation was run with the default configuration and the image is 3D view from the Gradio interface.
  """)

- gr.Image("images/protein.png", show_label=True, width=400, label="Protein Folding example", show_download_button=False, show_fullscreen_button=False)
+ gr.Image("images/protein.png", show_label=True, width=400, label="Protein Folding example", show_download_button=False, show_fullscreen_button=False, show_share_button=False)

  gr.Markdown(
  """
  # What's next?
- 1. Expose additional tools to post-process the results of the simulations.
- The current post-processong tools are suited for the Gradio interface (ex: Plot images of the molecule structure from a file).
+ 1. Expose additional tools to post-process the results of the simulations (ex: Plot images of the molecule structure from a file).
+ The current post-processing tools are suited for the Gradio interface.
  2. Continue the pipeline by adding softawres like [OpenMM](https://openmm.org/) or [Gromacs](https://www.gromacs.org/) for molecular dynamics simulations.
- 3. Perform complete simulation plans including loops over parameters fully automated by the LLM.
+ 3. Perform full simulation plans including loops over parameters fully automated by the LLM.

  # Contact
  For any issues or questions, please contact the developer or refer to the documentation.
introduction_page.md CHANGED
@@ -60,7 +60,7 @@ CCCCCCCCCCCCCC(=O)O
  </div>
  <small>For a peptide, use `protein` as the molecule type.</small>

- **Other example:**
+ **Other examples:**
  <div style="background-color:#ffffff; border-radius:8px; padding:18px 24px; margin-bottom:24px; border:1px solid #cccccc;">

  ```fasta
@@ -68,6 +68,12 @@ CCCCCCCCCCCCCC(=O)O
  MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPDLNAAKSELDKAIGRNCNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRCAAINQVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPDRAKRVITTFRTGTWDAYKNL
  ```

+ ```fasta
+ >rna|Chain B
+ UUAGGCGGCCACAGCGGUGGGGUUGCCUCCCGUACCCAUCCCGAACACGGAAGAUAAGCCCACCAGCGUUCCGGGGAGUACUGGAGUGCGCGAGCCUCUGGGAAACCCGGUUCGCCGCCACC
+ MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPDLNAAKSELDKAIGRNCNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRCAAINQVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPDRAKRVITTFRTGTWDAYKNL
+ ```
+
  </div>

  ### 3. <span style="color:#e98935;">Select your config and FASTA files</span>
@@ -77,7 +83,7 @@ In the `Run folding simulation 🚀` window, refresh the file list by clicking o

  ### 4. <span style="color:#e98935;">Run the simulation</span>

- Press the `Run Simulation` button to start de folding Simulation. Five proteins folding simulations will be performed. This parameter is hard coded in Chai-1. The simulation time is expected to be from 2min to 10min depending on the molecule.
+ Press the `Run Simulation` button to start the folding simulation. Five biomolecules folding simulations will be performed. This parameter is hard coded in Chai-1. The simulation time is expected to be from 2min to 10min depending on the molecule.

  ### 5. <span style="color:#e98935;">Analyse the results of your simulation</span>

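
The FASTA examples in `introduction_page.md` use headers of the form `>type|name`, such as `>rna|Chain B`. A minimal sketch of how content in that shape could be split into records is given below, assuming every record begins with its own `>` header. `parse_fasta_records` is a hypothetical helper for illustration and is not part of `app.py` or Chai-1.

```python
from typing import List, Tuple


def parse_fasta_records(text: str) -> List[Tuple[str, str, str]]:
    """Return (molecule_type, name, sequence) for each '>type|name' record."""
    records: List[Tuple[str, str, str]] = []
    header = None            # (molecule_type, name) of the record being read
    chunks: List[str] = []   # sequence lines collected for that record

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((*header, "".join(chunks)))
            mol_type, sep, name = line[1:].partition("|")
            if not sep:
                raise ValueError(f"Header lacks 'type|name' form: {line!r}")
            header = (mol_type.strip(), name.strip())
            chunks = []
        elif header is None:
            raise ValueError("Sequence data found before any header")
        else:
            # Sequence lines without their own header join the current record.
            chunks.append(line)

    if header is not None:
        records.append((*header, "".join(chunks)))
    return records
```

With a parser of this kind, a sequence line that has no header of its own is appended to the preceding record. In the example added by this commit, the protein sequence follows the RNA sequence directly, so it would be read as part of `Chain B`; a separate `>protein|...` header line may have been intended there.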