metadata
title: MoDrAg2
emoji: 🚀
colorFrom: gray
colorTo: pink
sdk: gradio
sdk_version: 6.4.0
python_version: '3.10'
app_file: app.py
pinned: false
license: mit
short_description: Drug Design Agent with semantic parsing

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

MoDrAg2

A Modular Drug-design AI Agent.

This page describes the semantic version of the agent. For the LangGraph-based agent, see this page.

Modes

MoDrAg2 has several modes:

  • AI mode: runs the selected tools and has the AI interpret the data for you.
  • Manual mode: returns raw tool results without passing them through the AI.
  • Review History mode: runs tools while searching your entire chat history for input data. Use it only when you need to retrieve a molecule, protein, etc. from two or more turns ago.
  • Chat mode: lets you chat freely with the AI, which can access any information in the chat history. Useful for revisiting a result the AI gave previously.
  • Web search mode: a literature search. It returns a list of relevant papers, articles, books, etc., ranked by relevance to your query.

Abilities:

A nicely featured drug-design helper pipeline, including:

Premium Features

  • Dock a molecule in a protein using AutoDock Vina, get the score and the pose.
  • Use LightGBM to create a model that predicts IC50 values of novel molecules. The model is trained on a ChEMBL dataset, which MoDrAg can find for you.
  • Use a GPT, fine-tuned on a ChEMBL dataset (also found by MoDrAg!), to generate novel ligands for a protein.
*These features take a bit longer than the standard features. The last two require a ChEMBL dataset ID, which you can find using MoDrAg.

Standard Features

  • Find targets for a disease.
  • Find UniProt IDs for a protein.
  • Find ChEMBL IDs for a given UniProt ID.
  • Find bioactive molecules for a given ChEMBL ID.
  • Find PDB IDs for a given protein.
  • Find sequences, ligands, and number of chains for a PDB ID.
  • Get SMILES strings for molecules, or get names for SMILES strings.
  • Search PubChem for similar molecules or generate analogues.
  • Find Lipinski properties of molecules.
  • Find pharmacophore overlap between two molecules.

Philosophy

  • Strong human-in-the-loop functionality: You approve the tools before deployment; you check the data going to the tools before sending, and you can edit the data before sending!
  • Let each model do what it does best: employ small models for specific tasks, larger models to handle conversation, and leave the rest to deterministic code!
  • Uses an embedding model (Embedding Gemma), a named entity recognition model, and REGEX for input parsing.
  • LLM is used for chat and for interpretation of tool results (Gemma 3 1B).
  • Everything is 'open': paid services (e.g. the OpenAI or Anthropic APIs) are avoided where possible.
  • Uses a small LM so it can be deployed almost anywhere!
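The regex part of the input parsing can be pictured with a minimal sketch like the one below. The patterns and the `extract_ids` helper are illustrative assumptions, not the ones used in MoDrAg2's `input_parsing.py`: the ChEMBL pattern relies on the `CHEMBL` prefix followed by digits, the UniProt pattern follows the accession format published in the UniProt help pages, and the PDB pattern covers only the classic four-character IDs (so a plain four-digit number would also match).

```python
import re

# Illustrative patterns only; the real agent may use different ones.
CHEMBL_ID = re.compile(r"\bCHEMBL\d+\b", re.IGNORECASE)
# UniProt accession format (6 or 10 characters).
UNIPROT_ID = re.compile(
    r"\b(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})\b"
)
# Classic PDB ID: one non-zero digit followed by three alphanumerics.
PDB_ID = re.compile(r"\b[1-9][A-Za-z0-9]{3}\b")


def extract_ids(text: str) -> dict:
    """Hypothetical helper: collect database IDs mentioned in a user query."""
    return {
        "chembl": CHEMBL_ID.findall(text),
        "uniprot": UNIPROT_ID.findall(text),
        "pdb": PDB_ID.findall(text),
    }


if __name__ == "__main__":
    query = "Look up CHEMBL25, PDB entry 1HSG and UniProt accession P03366."
    print(extract_ids(query))
```

Deterministic patterns like these suit identifiers with a fixed format; free-text entities such as molecule or disease names are better left to the named entity recognition model.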

How to add a function to the Agent

  • A function can be easily added if it only requires input in the form of: SMILES, molecule names, protein names, disease names, ChEMBL IDs, UniProt accession codes, or PDB IDs.
  • The function should be of the form (substitute the function task for the word function):
def function_node(arg1: type, arg2: type):
  '''
  Docstring with args and returns
  '''
  [function body]

  return a_list, a_string, an_image_list

The function can take any number of arguments but must return exactly three values: a list (can be nested), a string containing the function results in text form (to be given to the LLM as context), and an optional image or list of images. A single image should still be returned as a list.
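As a concrete illustration of this contract, here is a small sketch of a function that follows the template. The name `fragment_count_node` and its task are hypothetical and not part of MoDrAg2; it relies only on the fact that `.` separates disconnected components in a SMILES string. Returning an empty list when there are no images is an assumption, and the app may expect a different placeholder.

```python
def fragment_count_node(smiles_list: list[str]):
    '''
    Count the disconnected fragments in each SMILES string.
      Args:
        smiles_list: a list of SMILES strings
      Returns:
        results_list: nested list of [smiles, fragment_count]
        results_string: the same results as text, for the LLM context
        images: list of images (empty here; this tool draws nothing)
    '''
    results_list = []
    lines = []
    for smiles in smiles_list:
        # A '.' in SMILES separates disconnected components (e.g. salts).
        n_fragments = len(smiles.split('.'))
        results_list.append([smiles, n_fragments])
        lines.append(f"{smiles} has {n_fragments} fragment(s).")

    results_string = "\n".join(lines)
    images = []  # assumption: an empty list stands for "no images"
    return results_list, results_string, images
```

Keeping the text result separate from the list lets the Manual mode show raw data while the AI mode passes only the string to the LLM.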

  • The function can be added to either modrag_protein_functions, modrag_property_functions, or modrag_molecule_functions. Dependencies should be added to the top of the file.
  • Dependencies should also be added to the requirements.txt file for installation in the container.
  • A description should be added to the full_tool_descriptions dictionary in app.py (intake_function.py for Colab). This is the description shown to the user when the AI has selected a tool (for human-in-the-loop approval).
  • A description should be added to the tool_descriptions dictionary in input_parsing.py. The embedding model compares this description against the embedding of the user's query to select the tool.
  • The function name and arguments list should be added to the define_tool_hash function in input_parsing.py. This hash table is used to run the selected tool.
  • The function name, arguments list, and human-readable version of the arguments should be added to the define_tool_reqs function in input_parsing.py. This hash table is used to check for the required data before running a tool and to ask the user for any missing data.
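The registration steps above can be sketched end to end as follows. This is a toy model under stated assumptions, not the actual code in `input_parsing.py`: the dictionary layouts, the helpers `build_tool_hash`, `build_tool_reqs`, `select_tool`, and `run_tool`, and both example tools are hypothetical, and a bag-of-words vector stands in for the real embedding model.

```python
import math
from collections import Counter


# Two hypothetical tools that follow the three-value return contract.
def fragment_count_node(smiles_list: list[str]):
    results = [[s, len(s.split('.'))] for s in smiles_list]
    text = "\n".join(f"{s}: {n} fragment(s)" for s, n in results)
    return results, text, []


def smiles_length_node(smiles_list: list[str]):
    results = [[s, len(s)] for s in smiles_list]
    text = "\n".join(f"{s}: {n} characters" for s, n in results)
    return results, text, []


# Short descriptions compared against the user query (layout assumed).
tool_descriptions = {
    "fragment_count_node": "count the disconnected fragments in a molecule",
    "smiles_length_node": "report the length of a smiles string",
}


def toy_embed(text: str) -> Counter:
    # Stand-in for a real embedding model: word-count vector.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_tool(query: str) -> str:
    # Pick the tool whose description is most similar to the query.
    q = toy_embed(query)
    return max(tool_descriptions,
               key=lambda name: cosine(q, toy_embed(tool_descriptions[name])))


def build_tool_hash(parsed: dict) -> dict:
    # tool name -> [callable, argument list]
    smiles = parsed.get("smiles_list")
    return {
        "fragment_count_node": [fragment_count_node, [smiles]],
        "smiles_length_node": [smiles_length_node, [smiles]],
    }


def build_tool_reqs() -> dict:
    # tool name -> (argument names, human-readable argument names)
    return {
        "fragment_count_node": (["smiles_list"], ["SMILES strings"]),
        "smiles_length_node": (["smiles_list"], ["SMILES strings"]),
    }


def run_tool(query: str, parsed: dict):
    name = select_tool(query)
    arg_names, readable = build_tool_reqs()[name]
    # Ask for anything the parser did not find before running the tool.
    missing = [r for a, r in zip(arg_names, readable) if not parsed.get(a)]
    if missing:
        return f"Please provide: {', '.join(missing)}"
    func, args = build_tool_hash(parsed)[name]
    return func(*args)
```

Calling `run_tool("how many fragments are in this molecule", {"smiles_list": ["[Na+].[Cl-]"]})` selects the fragment counter and runs it, while passing an empty dictionary returns a request for the missing SMILES strings instead.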

That should be it! The code should now be able to select and deploy the new function.