Added multiple SMILES compatibility, edited gradio API and updated README.md
README.md
CHANGED
````diff
@@ -12,6 +12,8 @@ pinned: false
 
 This model predicts drug-target interactions using a novel cross-attention architecture that combines RNA sequence understanding with molecular representation learning. The model processes RNA target sequences and drug SMILES representations to predict binding affinity scores (pKd values).
 
+**Full model repository**: [https://github.com/IlPakoZ/dlrnaberta-dti-prediction](https://github.com/IlPakoZ/dlrnaberta-dti-prediction)
+
 ## Architecture
 
 The model consists of several key components:
@@ -53,6 +55,16 @@ from updated_app import demo
 demo.launch()
 ```
 
+The Gradio interface supports **batch predictions** by allowing multiple SMILES entries. Simply enter each SMILES string on a new line in the drug SMILES input field:
+
+```
+CC(C)CC1=CC=C(C=C1)C(C)C(=O)O
+C1CCCCC1O
+C1=CC=C(C=C1)NC(=O)C2=CC=CC=N2
+```
+
+The model will predict binding affinity for each drug-target pair sequentially. For visualizations, the results will display only the last SMILES entry.
+
 ### Programmatic usage
 
 ```python
@@ -101,7 +113,7 @@ with torch.no_grad():
 ## Model inputs
 
 - **Target sequence**: RNA sequence using nucleotides A, U, G, C (string)
-- **Drug SMILES**: Simplified Molecular Input Line Entry System notation (string)
+- **Drug SMILES**: Simplified Molecular Input Line Entry System notation (string or multiple strings, one per line)
 
 ## Model outputs
 
@@ -196,4 +208,4 @@ This model is released under the MIT License.
   year={2024},
   publisher={Oxford University Press}
 }
-```
+```
````
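The README's one-SMILES-per-line convention can be sketched as a small parsing helper. This is a minimal sketch: `parse_smiles_field` is a hypothetical name, not part of the repository, and unlike the app's plain `split("\n")` it also drops blank lines.

```python
# Hypothetical helper illustrating the one-SMILES-per-line convention the
# README describes; blank lines are dropped, and each entry is stripped of
# surrounding whitespace (as the app does before tokenizing).
def parse_smiles_field(text: str) -> list:
    """Split a drug-SMILES textbox value into one entry per non-empty line."""
    return [line.strip() for line in text.split("\n") if line.strip()]

field = "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O\nC1CCCCC1O\nC1=CC=C(C=C1)NC(=O)C2=CC=CC=N2"
print(parse_smiles_field(field))  # three SMILES entries, in input order
```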
app.py
CHANGED
````diff
@@ -118,42 +118,52 @@ class DrugTargetInteractionApp:
             logger.error(f"Error loading model: {str(e)}")
             return False
 
-    def predict_interaction(self, target_sequence, drug_smiles):
-        """Predict drug-target interaction"""
-        if self.model is None:
-            return "Error: Model not loaded. Please load a model first."
-
-        try:
-            # Tokenize inputs
-            target_inputs = self.target_tokenizer(
-                target_sequence,
-                padding="max_length",
-                truncation=True,
-                max_length=512,
-                return_tensors="pt"
-            ).to(self.device)
-
+    def get_target_and_smiles(self, target_sequence, drug_smiles):
+        # Tokenize inputs
+        target_inputs = self.target_tokenizer(
+            target_sequence,
+            padding="max_length",
+            truncation=True,
+            max_length=512,
+            return_tensors="pt"
+        ).to(self.device)
+
+        all_smiles = []
+        for smiles in drug_smiles:
             drug_inputs = self.drug_tokenizer(
-                drug_smiles,
+                smiles.strip(),
                 padding="max_length",
                 truncation=True,
                 max_length=512,
                 return_tensors="pt"
             ).to(self.device)
+            all_smiles.append(drug_inputs)
+
+        return target_inputs, all_smiles
+
+    def predict_interaction(self, target_sequence, drug_smiles):
+        """Predict drug-target interaction"""
+        if self.model is None:
+            return "Error: Model not loaded. Please load a model first."
+
+        try:
+            target_inputs, all_drug_inputs = self.get_target_and_smiles(target_sequence, drug_smiles)
+            to_return = []
 
             # Make prediction
             self.model.INTERPR_DISABLE_MODE()
-            with torch.no_grad():
-                prediction = self.model(target_inputs, drug_inputs)
-
-            # Unscale if scaler exists
-            if self.model.scaler is not None:
-                prediction = self.model.unscale(prediction)
-
-            prediction_value = prediction.cpu().numpy()[0][0]
-
-            return f"Predicted Binding Affinity: {prediction_value:.4f}"
-
+            for smile_name, drug_inputs in zip(drug_smiles, all_drug_inputs):
+                with torch.no_grad():
+                    prediction = self.model(target_inputs, drug_inputs)
+
+                # Unscale if scaler exists
+                if self.model.scaler is not None:
+                    prediction = self.model.unscale(prediction)
+
+                prediction_value = prediction.cpu().numpy()[0][0]
+                to_return.append(f"{smile_name} predicted pKd: {prediction_value:.4f}")
+            return "\n".join(to_return)
+
         except Exception as e:
             logger.error(f"Prediction error: {str(e)}")
             return f"Error during prediction: {str(e)}"
@@ -173,48 +183,34 @@ class DrugTargetInteractionApp:
             return None, None, None, "Error: Model not loaded. Please load a model first."
 
         try:
-            # Tokenize inputs
-            target_inputs = self.target_tokenizer(
-                target_sequence,
-                padding="max_length",
-                truncation=True,
-                max_length=512,
-                return_tensors="pt"
-            ).to(self.device)
-
-            drug_inputs = self.drug_tokenizer(
-                drug_smiles,
-                padding="max_length",
-                truncation=True,
-                max_length=512,
-                return_tensors="pt"
-            ).to(self.device)
-
-            # Enable interpretation mode
+            target_inputs, all_drug_inputs = self.get_target_and_smiles(target_sequence, drug_smiles)
+            to_return = []
+
+            # Make prediction
             self.model.INTERPR_ENABLE_MODE()
+            for smile_name, drug_inputs in zip(drug_smiles, all_drug_inputs):
+                # Make prediction and extract visualization data
+                with torch.no_grad():
+                    prediction = self.model(target_inputs, drug_inputs)
+
+                # Unscale if scaler exists
+                if self.model.scaler is not None:
+                    prediction = self.model.unscale(prediction)
+
+                prediction_value = prediction.cpu().numpy()[0][0]
+
+                # Extract data needed for visualizations
+                presum_values = self.model.model.presum_layer  # Shape: (1, seq_len)
+                cross_attention_weights = self.model.model.crossattention_weights  # Shape: (batch, heads, seq_len, seq_len)
+
+                # Get model parameters for scaling
+                w = self.model.model.w.squeeze(1)
+                b = self.model.model.b
+                scaler = self.model.model.scaler
+                to_return.append(f"{smile_name} predicted pKd: {prediction_value:.4f}")
+
+            status_msg = "\n".join(to_return)
 
-            # Make prediction and extract visualization data
-            with torch.no_grad():
-                prediction = self.model(target_inputs, drug_inputs)
-
-            # Unscale if scaler exists
-            if self.model.scaler is not None:
-                prediction = self.model.unscale(prediction)
-
-            prediction_value = prediction.cpu().numpy()[0][0]
-
-            # Extract data needed for visualizations
-            presum_values = self.model.model.presum_layer  # Shape: (1, seq_len)
-            cross_attention_weights = self.model.model.crossattention_weights  # Shape: (batch, heads, seq_len, seq_len)
-
-            # Get model parameters for scaling
-            w = self.model.model.w.squeeze(1)
-            b = self.model.model.b
-            scaler = self.model.model.scaler
-
-            logger.info(f"Target inputs shape: {target_inputs['input_ids'].shape}")
-            logger.info(f"Drug inputs shape: {drug_inputs['input_ids'].shape}")
-
             # Generate visualizations
             try:
                 # 1. Cross-attention heatmap
@@ -318,7 +314,6 @@ class DrugTargetInteractionApp:
                 text="Raw Contribution\nSkipped (pKd ≤ 0)"
             )
 
-            status_msg = f"Predicted Binding Affinity: {prediction_value:.4f}"
             if prediction_value <= 0:
                 status_msg += " (Raw contribution visualization skipped due to non-positive pKd)"
             if cross_attention_weights is None:
@@ -344,14 +339,14 @@ def predict_wrapper(target_seq, drug_smiles):
     if not target_seq.strip() or not drug_smiles.strip():
         return "Please provide both target sequence and drug SMILES."
 
-    return app.predict_interaction(target_seq, drug_smiles)
+    return app.predict_interaction(target_seq, drug_smiles.split("\n"))
 
 def visualize_wrapper(target_seq, drug_smiles):
     """Wrapper function for visualization"""
     if not target_seq.strip() or not drug_smiles.strip():
        return None, None, None, "Please provide both target sequence and drug SMILES."
 
-    return app.visualize_interaction(target_seq, drug_smiles)
+    return app.visualize_interaction(target_seq, drug_smiles.split("\n"))
 
 def load_model_wrapper(model_path):
     """Wrapper function to load model"""
@@ -390,8 +385,11 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
 
             drug_input = gr.Textbox(
                 label="Drug SMILES",
-                placeholder="Enter SMILES notation…
-
+                placeholder="Enter SMILES notation for one or more drugs.\n"
+                "For multiple SMILES, enter each on a new line:\n"
+                "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O\n"
+                "C1CCCCC1O",
+                lines=3
             )
 
             with gr.Row():
@@ -437,12 +435,16 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
         img1, img2, img3, status = visualize_wrapper(target_seq, drug_smiles)
         # Combine prediction result with visualization status
        combined_status = status + "\n\nVisualization analysis complete. Please navigate to the Visualizations tab to view the generated images."
+        if len(drug_smiles) > 1:
+            combined_status += "\nVisualizations are shown only for the last SMILES entry."
+
         return img1, img2, img3, combined_status
 
     visualize_btn.click(
         fn=visualize_and_update,
         inputs=[target_input, drug_input],
-        outputs=[viz_state1, viz_state2, viz_state3, prediction_output]
+        outputs=[viz_state1, viz_state2, viz_state3, prediction_output],
+        api_name="visualize_and_update"  # Make this API accessible
     )
 
     with gr.Tab("📊 Visualizations"):
@@ -587,7 +589,7 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
         gr.Markdown("""
         ## About this application
 
-        This application implements DLRNA-BERTa, a Dual…
+        This application implements DLRNA-BERTa, a Dual Language RoBERTa Transformer model for predicting drug to RNA target interactions. The model architecture includes:
 
         - **Target encoder**: Processes RNA sequences using RNA-BERTa
        - **Drug encoder**: Processes molecular SMILES notation using ChemBERTa
@@ -597,18 +599,22 @@ with gr.Blocks(title="Drug-Target Interaction Predictor", theme=gr.themes.Soft()
         ### Input requirements:
         - **Target sequence**: RNA sequence of the target (nucleotide sequences: A, U, G, C)
         - **Drug SMILES**: Simplified Molecular Input Line Entry System notation
+        - **Batch prediction**: Enter multiple SMILES strings, one per line, to predict binding affinity for multiple drugs against the same target
 
         ### Model features:
         - Cross-attention for drug-target interaction modeling
         - Dropout for regularization
         - Layer normalization for stable training
         - Interpretability mode for contribution and attention visualization
+        - Support for batch predictions with multiple SMILES entries
 
         ### Usage tips:
         1. Load a trained model using the Model Settings tab (optional)
         2. Enter a RNA sequence and drug SMILES in the Prediction & Analysis tab
-        3. …
-        4. Click "…
+        3. For batch predictions, enter multiple SMILES strings (one per line) in the drug SMILES field
+        4. Click "Predict Interaction" for binding affinity prediction only
+        5. Click "Generate Visualizations" to create detailed interaction analysis - results will appear in the Visualizations tab
+        6. Note: Visualizations are generated only for the last SMILES entry when using batch mode
 
         For best results, ensure your input sequences are properly formatted and within reasonable length limits (max 512 tokens).
 
````
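The per-SMILES loop this commit adds to `predict_interaction` can be exercised without loading any model by stubbing the scoring call. A sketch under that assumption: `format_predictions` and `predict_fn` are hypothetical stand-ins for the method and the (unscaled) model output, but the report format matches the `predicted pKd` lines the diff introduces.

```python
# Stand-alone sketch of the batch-prediction report added in this commit.
# format_predictions and predict_fn are hypothetical stand-ins for
# DrugTargetInteractionApp.predict_interaction and the model forward pass.
def format_predictions(drug_smiles, predict_fn):
    """Score each SMILES against one target and join the results, one line per drug."""
    to_return = []
    for smiles in drug_smiles:
        value = predict_fn(smiles)  # stands in for model(target_inputs, drug_inputs) + unscale
        to_return.append(f"{smiles} predicted pKd: {value:.4f}")
    return "\n".join(to_return)

report = format_predictions(["C1CCCCC1O", "CCO"], predict_fn=lambda s: 5.0 + 0.1 * len(s))
print(report)
```

As in the committed code, each drug is scored against the same tokenized target, so batch mode scales linearly in the number of SMILES entries.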