---
license: gpl-3.0
language:
- en
base_model:
- facebook/esm2_t33_650M_UR50D
tags:
- Ion Channel Impairing Proteins
- neurotoxicity
- protein-classification
- therapeutic-peptides
- bioinformatics
- esm2
- transformer
---

# 🧠 IonNTxPred: LLM-based Prediction and Designing of Ion Channel Impairing Proteins

IonNTxPred is a fine-tuned transformer model built on top of the [esm2_t33_650M_UR50D](https://huggingface.co/facebook/esm2_t33_650M_UR50D) protein language model. It is trained for **binary classification** of peptide sequences, predicting whether a peptide is **ion channel modulating** or **non-modulating**.

🎯 **Use Case:** Accelerating the identification and design of safe peptide therapeutics by filtering out ion channel impairing/modulating candidates early in the drug development pipeline.

---

### 🖼️ IonNTxPred Workflow
![IonNTxPred Workflow](https://webs.iiitd.edu.in/raghava/ionntxpred/images/IonNTxPred.png)

---

## 🧬 Model Highlights

- **Base Model:** Facebook's ESM2-t33 (650M parameters)
- **Fine-Tuning Task:** Prediction of ion channel modulating proteins (binary classification)
- **Input:** Protein/peptide sequences
- **Output:** Binary label: `1` (ion channel modulating), `0` (non-modulating)
- **Architecture:** ESM2 encoder + classification head

---

## 🗂️ Files Included

Each channel-specific model folder (e.g., `saved_model_t33_na`) contains:

- `config.json` – Configuration settings for the model architecture, hyperparameters, and training details.
- `model.safetensors` – The trained model weights in the SafeTensors format, which is safer and faster to load than traditional `.bin` files.
- `special_tokens_map.json` – Mappings for special tokens such as `[CLS]` and `[SEP]`, plus any custom tokens used by the tokenizer.
- `tokenizer_config.json` – Tokenizer settings (vocabulary size, tokenization method, etc.).
- `vocab.txt` – All tokens and their corresponding IDs; essential for sequence tokenization.
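The `transformers` loaders in the next section fetch these files automatically. If you want a local mirror of one model folder (for offline use, for example), here is a minimal sketch, not an official utility, using `huggingface_hub`:

```python
# Minimal sketch: download one channel-specific model folder locally.
from huggingface_hub import snapshot_download

# allow_patterns restricts the download to the sodium-channel folder;
# swap in saved_model_t33_k / _ca / _other for the other models.
local_dir = snapshot_download(
    repo_id="anandr88/IonNTxPred",
    allow_patterns=["saved_model_t33_na/*"],
)
print(f"Files saved under: {local_dir}")
```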
---

# 🚀 How to Use

---

# Predict Sodium channel modulating proteins

---

### 🔧 Install Dependencies

```bash
pip install torch transformers safetensors huggingface_hub biopython
```

### Loading the Model from Hugging Face

```python
import torch
from transformers import AutoTokenizer, EsmForSequenceClassification

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("Downloading fine-tuned model & weights...")
repo_id = "anandr88/IonNTxPred"
subfolder = "saved_model_t33_na"

# Load the tokenizer and fine-tuned classification model from Hugging Face.
# from_pretrained reads the model.safetensors weights stored in the subfolder,
# so no manual weight download or head reconstruction is needed.
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
model = EsmForSequenceClassification.from_pretrained(repo_id, subfolder=subfolder)

# Move to device and set to eval mode
model.to(device)
model.eval()

print(f"\nModel successfully loaded on {device} and ready for inference!")
```

### 🧪 Example Usage (Optional)

```python
# Continues from the loading snippet above (tokenizer, model, device).

# Function to make predictions: returns the probability of the positive
# ("ion channel modulating") class for each sequence in the batch
def make_predictions(model, inputs):
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.softmax(logits, dim=1)[:, 1].cpu().numpy()
    return probs

# Example protein sequence
protein_sequence = "MKASTLVVIFIVIFITISSFSIHDVQASGVEKREQKDCLKKLKLCKENKDCCSKSCKRRGTNIEKRCR"

# Tokenize the input sequence and move tensors to the device
inputs = tokenizer(protein_sequence, return_tensors="pt", truncation=True, padding=True)
inputs = {key: value.to(device) for key, value in inputs.items()}

# Make predictions
prediction = make_predictions(model, inputs)

# Apply threshold for final classification
threshold = 0.5
final_prediction = "Na+ channel modulating" if prediction[0] > threshold else "Not Na+ channel modulating"

print(f"📊 Prediction Probability: {prediction[0]:.4f}")
print(f"🏷️ Final Prediction: {final_prediction}")
```
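### 📂 Batch Screening a FASTA File (Optional)

The single-sequence example extends naturally to padded batches. Below is a minimal, self-contained sketch (not part of the official pipeline) that reads sequences with Biopython and scores them all with the Na+ model; the input file name `peptides.fasta` is a placeholder for your own data.

```python
import torch
from Bio import SeqIO
from transformers import AutoTokenizer, EsmForSequenceClassification

repo_id = "anandr88/IonNTxPred"
subfolder = "saved_model_t33_na"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
model = EsmForSequenceClassification.from_pretrained(repo_id, subfolder=subfolder)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
model.eval()

# Read all records from a FASTA file (placeholder path)
records = list(SeqIO.parse("peptides.fasta", "fasta"))
sequences = [str(record.seq) for record in records]

# Tokenize the whole batch with padding so all sequences share one tensor
inputs = tokenizer(sequences, return_tensors="pt", truncation=True, padding=True)
inputs = {key: value.to(device) for key, value in inputs.items()}

with torch.no_grad():
    probs = torch.softmax(model(**inputs).logits, dim=1)[:, 1].cpu().numpy()

for record, prob in zip(records, probs):
    label = "Na+ channel modulating" if prob > 0.5 else "Not Na+ channel modulating"
    print(f"{record.id}\t{prob:.4f}\t{label}")
```

For large files, split `sequences` into chunks (e.g., 32 at a time) to keep memory usage in check.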
---

# Predict Potassium channel modulating proteins

---

### 🔧 Install Dependencies

```bash
pip install torch transformers safetensors huggingface_hub biopython
```

### Loading the Model from Hugging Face

```python
import torch
from transformers import AutoTokenizer, EsmForSequenceClassification

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("Downloading fine-tuned model & weights...")
repo_id = "anandr88/IonNTxPred"
subfolder = "saved_model_t33_k"

# Load the tokenizer and fine-tuned classification model from Hugging Face;
# the K+ channel weights live in the saved_model_t33_k subfolder.
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
model = EsmForSequenceClassification.from_pretrained(repo_id, subfolder=subfolder)

# Move to device and set to eval mode
model.to(device)
model.eval()

print(f"\nModel successfully loaded on {device} and ready for inference!")
```

### 🧪 Example Usage (Optional)

```python
# Continues from the loading snippet above (tokenizer, model, device).

# Function to make predictions: returns the probability of the positive class
def make_predictions(model, inputs):
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.softmax(logits, dim=1)[:, 1].cpu().numpy()
    return probs

# Example protein sequence
protein_sequence = "MKASTLVVIFIVIFITISSFSIHDVQASGVEKREQKDCLKKLKLCKENKDCCSKSCKRRGTNIEKRCR"

# Tokenize the input sequence and move tensors to the device
inputs = tokenizer(protein_sequence, return_tensors="pt", truncation=True, padding=True)
inputs = {key: value.to(device) for key, value in inputs.items()}

# Make predictions
prediction = make_predictions(model, inputs)

# Apply threshold for final classification
threshold = 0.5
final_prediction = "K+ channel modulating" if prediction[0] > threshold else "Not K+ channel modulating"

print(f"📊 Prediction Probability: {prediction[0]:.4f}")
print(f"🏷️ Final Prediction: {final_prediction}")
```
---

# Predict Calcium channel modulating proteins

---

### 🔧 Install Dependencies

```bash
pip install torch transformers safetensors huggingface_hub biopython
```

### Loading the Model from Hugging Face

```python
import torch
from transformers import AutoTokenizer, EsmForSequenceClassification

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("Downloading fine-tuned model & weights...")
repo_id = "anandr88/IonNTxPred"
subfolder = "saved_model_t33_ca"

# Load the tokenizer and fine-tuned classification model from Hugging Face;
# the Ca++ channel weights live in the saved_model_t33_ca subfolder.
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
model = EsmForSequenceClassification.from_pretrained(repo_id, subfolder=subfolder)

# Move to device and set to eval mode
model.to(device)
model.eval()

print(f"\nModel successfully loaded on {device} and ready for inference!")
```

### 🧪 Example Usage (Optional)

```python
# Continues from the loading snippet above (tokenizer, model, device).

# Function to make predictions: returns the probability of the positive class
def make_predictions(model, inputs):
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.softmax(logits, dim=1)[:, 1].cpu().numpy()
    return probs

# Example protein sequence
protein_sequence = "MKASTLVVIFIVIFITISSFSIHDVQASGVEKREQKDCLKKLKLCKENKDCCSKSCKRRGTNIEKRCR"

# Tokenize the input sequence and move tensors to the device
inputs = tokenizer(protein_sequence, return_tensors="pt", truncation=True, padding=True)
inputs = {key: value.to(device) for key, value in inputs.items()}

# Make predictions
prediction = make_predictions(model, inputs)

# Apply threshold for final classification
threshold = 0.5
final_prediction = "Ca++ channel modulating" if prediction[0] > threshold else "Not Ca++ channel modulating"

print(f"📊 Prediction Probability: {prediction[0]:.4f}")
print(f"🏷️ Final Prediction: {final_prediction}")
```
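### 🔁 Alternative: `pipeline` API (Optional)

The per-channel models also work with the high-level `transformers` pipeline interface. This is an optional sketch, not the documented IonNTxPred interface; note that the displayed label names come from the model's config (typically `LABEL_0`/`LABEL_1` unless `id2label` is set there).

```python
from transformers import AutoTokenizer, EsmForSequenceClassification, pipeline

repo_id = "anandr88/IonNTxPred"
subfolder = "saved_model_t33_ca"
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
model = EsmForSequenceClassification.from_pretrained(repo_id, subfolder=subfolder)

# Wrap the loaded model and tokenizer in a text-classification pipeline
classifier = pipeline("text-classification", model=model, tokenizer=tokenizer)
print(classifier("MKASTLVVIFIVIFITISSFSIHDVQASGVEKREQKDCLKKLKLCKENKDCCSKSCKRRGTNIEKRCR"))
```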
---

# Predict other ion channel modulating proteins

---

### 🔧 Install Dependencies

```bash
pip install torch transformers safetensors huggingface_hub biopython
```

### Loading the Model from Hugging Face

```python
import torch
from transformers import AutoTokenizer, EsmForSequenceClassification

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

print("Downloading fine-tuned model & weights...")
repo_id = "anandr88/IonNTxPred"
subfolder = "saved_model_t33_other"

# Load the tokenizer and fine-tuned classification model from Hugging Face;
# the other-channel weights live in the saved_model_t33_other subfolder.
tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
model = EsmForSequenceClassification.from_pretrained(repo_id, subfolder=subfolder)

# Move to device and set to eval mode
model.to(device)
model.eval()

print(f"\nModel successfully loaded on {device} and ready for inference!")
```

### 🧪 Example Usage (Optional)

```python
# Continues from the loading snippet above (tokenizer, model, device).

# Function to make predictions: returns the probability of the positive class
def make_predictions(model, inputs):
    with torch.no_grad():
        outputs = model(**inputs)
        logits = outputs.logits
        probs = torch.softmax(logits, dim=1)[:, 1].cpu().numpy()
    return probs

# Example protein sequence
protein_sequence = "MKASTLVVIFIVIFITISSFSIHDVQASGVEKREQKDCLKKLKLCKENKDCCSKSCKRRGTNIEKRCR"

# Tokenize the input sequence and move tensors to the device
inputs = tokenizer(protein_sequence, return_tensors="pt", truncation=True, padding=True)
inputs = {key: value.to(device) for key, value in inputs.items()}

# Make predictions
prediction = make_predictions(model, inputs)

# Apply threshold for final classification
threshold = 0.5
final_prediction = "Other ion channel modulating" if prediction[0] > threshold else "Not other ion channel modulating"

print(f"📊 Prediction Probability: {prediction[0]:.4f}")
print(f"🏷️ Final Prediction: {final_prediction}")
```
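### 🧭 Screening a Peptide Against All Four Models (Optional)

Since the four classifiers share one repository, a peptide can be screened against all of them in a single loop. This is a minimal sketch assuming the four subfolder names used above; each model outputs an independent probability for its own channel class.

```python
import torch
from transformers import AutoTokenizer, EsmForSequenceClassification

repo_id = "anandr88/IonNTxPred"
subfolders = {
    "Na+": "saved_model_t33_na",
    "K+": "saved_model_t33_k",
    "Ca++": "saved_model_t33_ca",
    "Other": "saved_model_t33_other",
}
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
sequence = "MKASTLVVIFIVIFITISSFSIHDVQASGVEKREQKDCLKKLKLCKENKDCCSKSCKRRGTNIEKRCR"

for channel, subfolder in subfolders.items():
    tokenizer = AutoTokenizer.from_pretrained(repo_id, subfolder=subfolder)
    model = EsmForSequenceClassification.from_pretrained(repo_id, subfolder=subfolder)
    model.to(device)
    model.eval()

    inputs = tokenizer(sequence, return_tensors="pt", truncation=True)
    inputs = {key: value.to(device) for key, value in inputs.items()}

    with torch.no_grad():
        prob = torch.softmax(model(**inputs).logits, dim=1)[0, 1].item()
    print(f"{channel} channel modulating probability: {prob:.4f}")
```

Rebinding `model` on each iteration lets Python release the previous model, so roughly one ESM2-650M model needs to fit in memory at a time.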
---

## 📊 Applications

- **Filtering of ion channel impairing proteins** in therapeutic design
- **Toxicity scanning** of synthetic peptides
- **Dataset annotation** for bioactivity studies
- **Educational use** in bioinformatics and deep learning for proteins

---

## 🌐 Related Links

- 🔬 Project Web Server: [IonNTxPred Web Tool](http://webs.iiitd.edu.in/raghava/ionntxpred)
- 🧾 Documentation & Source: [GitHub – raghavagps/IonNTxPred](https://github.com/raghavagps/ionntxpred)

---

## 🧠 Citation

📖 Rathore et al. _A Large Language Model for Predicting Neurotoxic Peptides and Neurotoxins._ **#Coming Soon#**

---

👨‍🔬 Start using **IonNTxPred** today to enhance your protein/peptide screening pipeline with the power of **transformer-based intelligence**!