hemantn committed on
Commit
0e2f128
·
1 Parent(s): 3b6231c

deployment file added

Files changed (5)
  1. LICENSE +21 -0
  2. README_Spaces.md +55 -0
  3. adapter.py +306 -0
  4. app.py +330 -0
  5. requirements.txt +6 -0
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2024 hemantn
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README_Spaces.md ADDED
@@ -0,0 +1,55 @@
+ # 🧬 AbLang2 Sequence Restorer - Hugging Face Spaces
+
+ This is a Gradio web application that provides the AbLang2 sequence restoration utility through Hugging Face Spaces.
+
+ ## 🎯 What it does
+
+ The AbLang2 Sequence Restorer allows you to:
+ - **Restore masked residues** (`*`) in antibody sequences
+ - **Work with paired sequences** (heavy and light chains)
+ - **Handle single chains** (heavy or light chain only)
+ - **Use alignment** for variable missing lengths
+
+ ## 🚀 How to use
+
+ 1. **Enter sequences**: Provide the heavy chain, the light chain, or both
+ 2. **Mask residues**: Use `*` to mark the residues you want to restore
+ 3. **Choose alignment**: Enable "Use Alignment" for variable missing lengths
+ 4. **Get results**: Click "Restore Sequences" to get the restored antibody sequences
+
+ ## 📝 Example Usage
+
+ ### Example 1: Both chains with masked residues
+ - **Heavy Chain**: `EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS`
+ - **Light Chain**: `DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK`
+
+ ### Example 2: Heavy chain only
+ - **Heavy Chain**: `EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMGWVRQAPGKGLEWVSAISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCARDY**GMDVWGQGTTVTVSS`
+ - **Light Chain**: (leave empty)
+
+ ## 🔧 Technical Details
+
+ - **Model**: AbLang2 from the Hugging Face Hub (`hemantn/ablang2`)
+ - **Framework**: Gradio for the web interface
+ - **Backend**: PyTorch with the Transformers library
+ - **Processing**: Automatic GPU acceleration when available
+
+ ## 📚 Related Resources
+
+ - **Original AbLang2**: [https://github.com/TobiasHeOl/AbLang2](https://github.com/TobiasHeOl/AbLang2)
+ - **Model Repository**: [https://huggingface.co/hemantn/ablang2](https://huggingface.co/hemantn/ablang2)
+ - **Full Documentation**: See the main README.md for comprehensive usage examples
+
+ ## 🤝 Citation
+
+ If you use this tool in your research, please cite the original AbLang2 paper:
+
+ ```
+ @article{Olsen2024,
+   title={Addressing the antibody germline bias and its effect on language models for improved antibody design},
+   author={Olsen, Tobias H. and Moal, Iain H. and Deane, Charlotte M.},
+   journal={bioRxiv},
+   doi={10.1101/2024.02.02.578678},
+   year={2024}
+ }
+ ```
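As a quick sanity check on the masking convention described above, this short standalone snippet (illustrative only, not part of the Space) locates the `*` placeholders in Example 1's heavy chain:

```python
# Example 1 heavy chain from the README, with '*' marking residues to restore.
heavy = (
    "EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKF"
    "QGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS"
)
masked_positions = [i for i, aa in enumerate(heavy) if aa == "*"]
print(masked_positions)  # five masked positions, beginning at index 3
```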
adapter.py ADDED
@@ -0,0 +1,306 @@
+ from ablang2.pretrained_utils.restoration import AbRestore
+ from ablang2.pretrained_utils.encodings import AbEncoding
+ from ablang2.pretrained_utils.alignment import AbAlignment
+ from ablang2.pretrained_utils.scores import AbScores
+ import torch
+ import numpy as np
+ from ablang2.pretrained_utils.extra_utils import res_to_seq, res_to_list
+
+
+ class HuggingFaceTokenizerAdapter:
+     def __init__(self, tokenizer, device):
+         self.tokenizer = tokenizer
+         self.device = device
+         self.pad_token_id = tokenizer.pad_token_id
+         self.mask_token_id = getattr(tokenizer, 'mask_token_id', None) or tokenizer.convert_tokens_to_ids(tokenizer.mask_token)
+         self.vocab = tokenizer.get_vocab() if hasattr(tokenizer, 'get_vocab') else tokenizer.vocab
+         self.inv_vocab = {v: k for k, v in self.vocab.items()}
+         self.all_special_tokens = tokenizer.all_special_tokens
+
+     def __call__(self, seqs, pad=True, w_extra_tkns=False, device=None, mode=None):
+         if mode == 'decode':
+             # In decode mode, seqs is a tensor (or array) of token ids, so it
+             # must be decoded directly rather than passed to the tokenizer.
+             if isinstance(seqs, torch.Tensor):
+                 seqs = seqs.cpu().numpy()
+             decoded = []
+             for seq in seqs:
+                 chars = []
+                 for t in seq:
+                     token = self.inv_vocab.get(int(t), '')
+                     if token and token not in {'-', '*', '<', '>'}:
+                         chars.append(token)
+                 # Use res_to_seq for formatting, passing a (sequence, length)
+                 # pair as in the original code; the true length is not always
+                 # available, so len(chars) serves as a fallback.
+                 decoded.append(res_to_seq([''.join(chars), len(chars)], mode='restore'))
+             return decoded
+         tokens = self.tokenizer(seqs, padding=True, return_tensors='pt')
+         return tokens['input_ids'].to(self.device if device is None else device)
+
+
+ class HFAbRestore(AbRestore):
+     def __init__(self, hf_model, hf_tokenizer, spread=11, device='cpu', ncpu=1):
+         super().__init__(spread=spread, device=device, ncpu=ncpu)
+         self.used_device = device
+         self._hf_model = hf_model
+         self.tokenizer = HuggingFaceTokenizerAdapter(hf_tokenizer, device)
+
+     @property
+     def AbLang(self):
+         def model_call(x):
+             output = self._hf_model(x)
+             if hasattr(output, 'last_hidden_state'):
+                 return output.last_hidden_state
+             return output
+         return model_call
+
+
+ def add_angle_brackets(seq):
+     # Assumes input is 'VH|VL', 'VH|' or '|VL'.
+     if '|' in seq:
+         vh, vl = seq.split('|', 1)
+     else:
+         vh, vl = seq, ''
+     return f"<{vh}>|<{vl}>"
+
+
+ class AbLang2PairedHuggingFaceAdapter(AbEncoding, AbRestore, AbAlignment, AbScores):
+     """
+     Adapter for using the pretrained utilities with a HuggingFace-loaded
+     ablang2_paired model and tokenizer.
+     Automatically uses CUDA if available, otherwise CPU.
+     """
+     def __init__(self, model, tokenizer, device=None, ncpu=1):
+         super().__init__()
+         if device is None:
+             self.used_device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
+         else:
+             self.used_device = torch.device(device)
+         self.AbLang = model  # HuggingFace model instance
+         self.tokenizer = tokenizer
+         self.AbLang.to(self.used_device)
+         self.AbLang.eval()
+         # Always get AbRep from the underlying model.
+         if hasattr(self.AbLang, 'model') and hasattr(self.AbLang.model, 'AbRep'):
+             self.AbRep = self.AbLang.model.AbRep
+         else:
+             raise AttributeError("Could not find AbRep in the HuggingFace model or its underlying model.")
+         self.ncpu = ncpu
+         self.spread = 11  # For compatibility with the original utilities.
+
+     def freeze(self):
+         self.AbLang.eval()
+
+     def unfreeze(self):
+         self.AbLang.train()
+
+     def _encode_sequences(self, seqs):
+         # Use HuggingFace-style padding and return PyTorch tensors.
+         tokens = self.tokenizer(seqs, padding=True, return_tensors='pt')
+         tokens = extract_input_ids(tokens, self.used_device)
+         return self.AbRep(tokens).last_hidden_states.detach()
+
+     def _predict_logits(self, seqs):
+         tokens = self.tokenizer(seqs, padding=True, return_tensors='pt')
+         tokens = extract_input_ids(tokens, self.used_device)
+         output = self.AbLang(tokens)
+         if hasattr(output, 'last_hidden_state'):
+             return output.last_hidden_state.detach()
+         return output.detach()
+
+     def _preprocess_labels(self, labels):
+         return extract_input_ids(labels, self.used_device)
+
+     def _format_paired(self, seqs):
+         # Join VH and VL with '|'; pass already-formatted strings through.
+         return ['|'.join(s) if isinstance(s, (list, tuple)) else s for s in seqs]
+
+     def _special_token_ids(self):
+         # all_special_tokens may hold strings or already-converted ids.
+         if isinstance(self.tokenizer.all_special_tokens[0], int):
+             return set(self.tokenizer.all_special_tokens)
+         return set(self.tokenizer.convert_tokens_to_ids(tok) for tok in self.tokenizer.all_special_tokens)
+
+     def __call__(self, seqs, mode='seqcoding', align=False, stepwise_masking=False, fragmented=False, batch_size=50):
+         """
+         Dispatch to the different modes, mimicking the original pretrained class.
+         """
+         from ablang2.pretrained import format_seq_input
+
+         valid_modes = [
+             'rescoding', 'seqcoding', 'restore', 'likelihood', 'probability',
+             'pseudo_log_likelihood', 'confidence'
+         ]
+         if mode not in valid_modes:
+             raise ValueError(f"Given mode doesn't exist. Please select one of the following: {valid_modes}.")
+
+         seqs, chain = format_seq_input(seqs, fragmented=fragmented)
+
+         if align:
+             numbered_seqs, seqs, number_alignment = self.number_sequences(
+                 seqs, chain=chain, fragmented=fragmented
+             )
+         else:
+             numbered_seqs = None
+             number_alignment = None
+
+         subset_list = []
+         for subset in [seqs[x:x + batch_size] for x in range(0, len(seqs), batch_size)]:
+             subset_list.append(getattr(self, mode)(subset, align=align, stepwise_masking=stepwise_masking))
+
+         return self.reformat_subsets(
+             subset_list,
+             mode=mode,
+             align=align,
+             numbered_seqs=numbered_seqs,
+             seqs=seqs,
+             number_alignment=number_alignment,
+         )
+
+     def pseudo_log_likelihood(self, seqs, **kwargs):
+         """
+         Original (non-vectorized) pseudo log-likelihood computation matching
+         the notebook behavior: each residue position is masked in turn and the
+         log-probability of the true token is averaged over the sequence.
+         """
+         formatted_seqs = self._format_paired(seqs)
+
+         # Tokenize all sequences in a single batch.
+         labels = self.tokenizer(formatted_seqs, padding=True, return_tensors='pt')
+         labels = extract_input_ids(labels, self.used_device)
+
+         special_token_ids = self._special_token_ids()
+         pad_token_id = self.tokenizer.pad_token_id
+
+         mask_token_id = getattr(self.tokenizer, 'mask_token_id', None)
+         if mask_token_id is None:
+             mask_token_id = self.tokenizer.convert_tokens_to_ids(self.tokenizer.mask_token)
+
+         plls = []
+         with torch.no_grad():
+             for seq_label in labels:
+                 seq_pll = []
+                 for j, token_id in enumerate(seq_label):
+                     if token_id.item() in special_token_ids or token_id.item() == pad_token_id:
+                         continue
+                     masked = seq_label.clone()
+                     masked[j] = mask_token_id
+                     logits = self.AbLang(masked.unsqueeze(0))
+                     if hasattr(logits, 'last_hidden_state'):
+                         logits = logits.last_hidden_state
+                     logits = logits[0, j]
+                     nll = torch.nn.functional.cross_entropy(
+                         logits.unsqueeze(0), token_id.unsqueeze(0), reduction="none"
+                     )
+                     seq_pll.append(-nll.item())
+                 plls.append(np.mean(seq_pll) if seq_pll else float('nan'))
+         return np.array(plls)
+
+     def confidence(self, seqs, **kwargs):
+         """Confidence calculation matching the original ablang2 implementation:
+         all special tokens are excluded from the loss."""
+         formatted_seqs = self._format_paired(seqs)
+
+         plls = []
+         for seq in formatted_seqs:
+             tokens = self.tokenizer([seq], padding=True, return_tensors='pt')
+             input_ids = extract_input_ids(tokens, self.used_device)
+
+             with torch.no_grad():
+                 output = self.AbLang(input_ids)
+                 logits = output.last_hidden_state if hasattr(output, 'last_hidden_state') else output
+
+                 # Remove the batch dimension.
+                 logits = logits[0]        # [seq_len, vocab_size]
+                 input_ids = input_ids[0]  # [seq_len]
+
+                 # Exclude all special tokens (pad, mask, etc.).
+                 special_token_ids = self._special_token_ids()
+                 valid_mask = ~torch.isin(input_ids, torch.tensor(list(special_token_ids), device=input_ids.device))
+
+                 if valid_mask.sum() > 0:
+                     # Mean cross-entropy over the residue positions.
+                     nll = torch.nn.functional.cross_entropy(
+                         logits[valid_mask], input_ids[valid_mask], reduction="mean"
+                     )
+                     pll = -nll.item()
+                 else:
+                     pll = 0.0
+
+             plls.append(pll)
+
+         return np.array(plls, dtype=np.float32)
+
+     def probability(self, seqs, align=False, stepwise_masking=False, **kwargs):
+         """
+         Probability of mutations: applies softmax to the logits to obtain
+         per-residue probabilities.
+         """
+         formatted_seqs = self._format_paired(seqs)
+
+         # Stepwise masking is not implemented separately here; both paths use
+         # a single forward pass over the unmasked sequences.
+         logits = self._predict_logits(formatted_seqs)
+
+         # Apply softmax to get probabilities.
+         probs = logits.softmax(-1).cpu().numpy()
+
+         if align:
+             return probs
+         # Return residue-level probabilities (excluding special tokens).
+         return [res_to_list(state, seq) for state, seq in zip(probs, formatted_seqs)]
+
+     def restore(self, seqs, align=False, **kwargs):
+         hf_abrestore = HFAbRestore(self.AbLang, self.tokenizer, spread=self.spread, device=self.used_device, ncpu=self.ncpu)
+         restored = hf_abrestore.restore(seqs, align=align)
+         # Apply angle-bracket formatting.
+         if isinstance(restored, np.ndarray):
+             return np.array([add_angle_brackets(seq) for seq in restored])
+         return [add_angle_brackets(seq) for seq in restored]
+
+
+ def extract_input_ids(tokens, device):
+     if hasattr(tokens, 'input_ids'):
+         return tokens.input_ids.to(device)
+     if isinstance(tokens, dict):
+         if 'input_ids' in tokens:
+             return tokens['input_ids'].to(device)
+         for v in tokens.values():
+             if torch.is_tensor(v):
+                 return v.to(device)
+     elif torch.is_tensor(tokens):
+         return tokens.to(device)
+     raise ValueError("Could not extract input_ids from tokenizer output")
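The per-position masking loop at the heart of `pseudo_log_likelihood` can be isolated into a small self-contained sketch. The `toy_logits` stand-in below is hypothetical (any callable returning `[batch, seq_len, vocab]` logits would do); it is not the AbLang2 model:

```python
import torch
import numpy as np

def masked_pll(token_ids, logits_fn, mask_id, special_ids):
    """Mask each non-special position in turn and average the
    log-probability the model assigns to the true token."""
    per_token = []
    for j, tok in enumerate(token_ids):
        if tok.item() in special_ids:
            continue
        masked = token_ids.clone()
        masked[j] = mask_id
        logits = logits_fn(masked.unsqueeze(0))[0, j]
        nll = torch.nn.functional.cross_entropy(
            logits.unsqueeze(0), tok.unsqueeze(0), reduction="none"
        )
        per_token.append(-nll.item())
    return float(np.mean(per_token)) if per_token else float("nan")

# Toy logits: the "model" is always certain the token is id 2 (vocab size 5).
def toy_logits(x):
    out = torch.full((x.shape[0], x.shape[1], 5), -10.0)
    out[..., 2] = 10.0
    return out

ids = torch.tensor([0, 2, 2, 1])  # 0 and 1 play the role of special tokens
pll = masked_pll(ids, toy_logits, mask_id=4, special_ids={0, 1})
# pll is close to 0 because the toy model is certain of the true tokens
```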
app.py ADDED
@@ -0,0 +1,330 @@
+ import gradio as gr
+ import sys
+ import os
+ from transformers import AutoModel, AutoTokenizer
+ from transformers.utils import cached_file
+
+ # Load the model and tokenizer from the Hugging Face Hub.
+ model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
+ tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)
+
+ # Find the cached model directory and make the bundled adapter importable.
+ adapter_path = cached_file("hemantn/ablang2", "adapter.py")
+ cached_model_dir = os.path.dirname(adapter_path)
+ sys.path.insert(0, cached_model_dir)
+
+ # Import and create the adapter.
+ from adapter import AbLang2PairedHuggingFaceAdapter
+ ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
+
+ def restore_sequences(heavy_chain, light_chain, use_align=False):
+     """
+     Restore masked residues in antibody sequences.
+
+     Args:
+         heavy_chain (str): Heavy chain sequence with masked residues (*)
+         light_chain (str): Light chain sequence with masked residues (*)
+         use_align (bool): Whether to use alignment for variable missing lengths
+
+     Returns:
+         tuple: (heavy_html, light_html) with restored residues highlighted
+     """
+     try:
+         # Prepare the input sequences.
+         if heavy_chain.strip() and light_chain.strip():
+             # Both chains provided.
+             sequences = [[heavy_chain.strip(), light_chain.strip()]]
+         elif heavy_chain.strip():
+             # Only the heavy chain provided.
+             sequences = [[heavy_chain.strip(), ""]]
+         elif light_chain.strip():
+             # Only the light chain provided.
+             sequences = [["", light_chain.strip()]]
+         else:
+             return "Please provide at least one antibody chain sequence.", ""
+
+         # Perform the restoration.
+         restored = ablang(sequences, mode='restore', align=use_align)
+
+         # Format the output.
+         if hasattr(restored, '__len__') and len(restored) > 0:
+             result = restored[0]  # The first (and only) result.
+
+             # Parse the result to separate the heavy and light chains.
+             if '>|<' in result:
+                 # Both chains present.
+                 heavy_part = result.split('>|<')[0].replace('<', '').replace('>', '')
+                 light_part = result.split('>|<')[1].replace('<', '').replace('>', '')
+             elif result.startswith('<') and result.endswith('>'):
+                 # Only one chain present.
+                 if heavy_chain.strip():
+                     heavy_part = result.replace('<', '').replace('>', '')
+                     light_part = ""
+                 else:
+                     heavy_part = ""
+                     light_part = result.replace('<', '').replace('>', '')
+             else:
+                 return "Error: Unexpected result format.", ""
+
+             # Create the highlighted versions.
+             highlighted_heavy = highlight_restored_residues(heavy_chain.strip(), heavy_part)
+             highlighted_light = highlight_restored_residues(light_chain.strip(), light_part)
+
+             # Wrap in styled HTML; the text wraps instead of scrolling.
+             heavy_html = f'<div class="restored-sequence-box" style="padding: 10px; background-color: #f8f9fa; border: 1px solid #dee2e6; border-radius: 4px;">{highlighted_heavy}</div>'
+             light_html = f'<div class="restored-sequence-box" style="padding: 10px; background-color: #f8f9fa; border: 1px solid #dee2e6; border-radius: 4px;">{highlighted_light}</div>'
+
+             return heavy_html, light_html
+         else:
+             return "Error: No restoration result obtained.", ""
+
+     except Exception as e:
+         return f"Error during restoration: {str(e)}", ""
+
+ def highlight_restored_residues(original_seq, restored_seq):
+     """
+     Highlight restored residues in green.
+     """
+     if not original_seq or not restored_seq:
+         return restored_seq
+
+     highlighted = ""
+     for orig_char, rest_char in zip(original_seq, restored_seq):
+         if orig_char == '*' and rest_char != '*':
+             # This residue was restored.
+             highlighted += f'<span class="restored-highlight">{rest_char}</span>'
+         else:
+             highlighted += rest_char
+
+     # Append any extra characters from the restored sequence.
+     if len(restored_seq) > len(original_seq):
+         highlighted += restored_seq[len(original_seq):]
+
+     return highlighted
+
+ # Create the Gradio interface.
+ with gr.Blocks(title="AbLang2 Sequence Restorer", theme=gr.themes.Soft(), css="""
+ * {
+     font-family: 'Courier New', monospace !important;
+ }
+ .sequence-input, .sequence-output {
+     font-family: 'Courier New', monospace !important;
+     font-size: 14px !important;
+     letter-spacing: 0.5px !important;
+ }
+ .restored-highlight {
+     background-color: #90EE90 !important;
+     color: #000 !important;
+     font-weight: bold !important;
+ }
+ .examples {
+     font-family: 'Courier New', monospace !important;
+     font-size: 14px !important;
+     letter-spacing: 0.5px !important;
+ }
+ .restored-sequence-box {
+     font-family: 'Courier New', monospace !important;
+     font-size: 14px !important;
+     letter-spacing: 0.5px !important;
+     white-space: pre-wrap !important;
+     word-wrap: break-word !important;
+     overflow-wrap: break-word !important;
+ }
+ .restored-heading {
+     color: #2E8B57 !important;
+     font-weight: bold !important;
+     font-size: 18px !important;
+ }
+ .example-text {
+     font-family: 'Courier New', monospace !important;
+     font-size: 12px !important;
+     white-space: pre-wrap !important;
+     word-wrap: break-word !important;
+ }
+ .examples-table {
+     font-family: 'Courier New', monospace !important;
+     font-size: 12px !important;
+     white-space: pre-wrap !important;
+     word-wrap: break-word !important;
+     max-width: none !important;
+     overflow: visible !important;
+ }
+ .examples-table td {
+     font-family: 'Courier New', monospace !important;
+     font-size: 12px !important;
+     white-space: pre-wrap !important;
+     word-wrap: break-word !important;
+     max-width: none !important;
+     overflow: visible !important;
+     text-overflow: unset !important;
+ }
+ .sequence-output label {
+     font-weight: bold !important;
+     color: #495057 !important;
+     font-size: 14px !important;
+     margin-bottom: 5px !important;
+ }
+ /* Force full display of examples */
+ .examples-container {
+     font-family: 'Courier New', monospace !important;
+     font-size: 12px !important;
+ }
+ .examples-container table {
+     width: 100% !important;
+     table-layout: auto !important;
+ }
+ .examples-container td {
+     white-space: pre-wrap !important;
+     word-wrap: break-word !important;
+     overflow-wrap: break-word !important;
+     max-width: none !important;
+     text-overflow: unset !important;
+     padding: 8px !important;
+     vertical-align: top !important;
+ }
+ .examples-container th {
+     white-space: nowrap !important;
+     padding: 8px !important;
+ }
+ /* Override any Gradio default truncation */
+ .examples table td {
+     white-space: pre-wrap !important;
+     word-wrap: break-word !important;
+     overflow-wrap: break-word !important;
+     max-width: none !important;
+     text-overflow: unset !important;
+     overflow: visible !important;
+     font-family: 'Courier New', monospace !important;
+     font-size: 12px !important;
+ }
+ .examples table {
+     table-layout: auto !important;
+     width: 100% !important;
+ }
+ /* Target the specific examples component */
+ div[data-testid="examples"] table td {
+     white-space: pre-wrap !important;
+     word-wrap: break-word !important;
+     overflow-wrap: break-word !important;
+     max-width: none !important;
+     text-overflow: unset !important;
+     overflow: visible !important;
+     font-family: 'Courier New', monospace !important;
+     font-size: 12px !important;
+ }
+ /* Force examples to show full content */
+ .examples table, .examples table td, .examples table th {
+     white-space: pre-wrap !important;
+     word-wrap: break-word !important;
+     overflow-wrap: break-word !important;
+     max-width: none !important;
+     text-overflow: unset !important;
+     overflow: visible !important;
+     font-family: 'Courier New', monospace !important;
+     font-size: 12px !important;
+     table-layout: auto !important;
+     width: auto !important;
+     min-width: 100% !important;
+ }
+ /* Override any inline styles */
+ .examples * {
+     white-space: pre-wrap !important;
+     word-wrap: break-word !important;
+     overflow-wrap: break-word !important;
+     max-width: none !important;
+     text-overflow: unset !important;
+     overflow: visible !important;
+ }
+ /* Style output labels to match input labels exactly */
+ .output-label {
+     font-weight: 600 !important;
+     color: var(--label-text-color) !important;
+     font-size: 14px !important;
+     margin-bottom: 8px !important;
+     margin-top: 16px !important;
+     line-height: 1.4 !important;
+     display: block !important;
+ }
+ """) as demo:
+     gr.Markdown("""
+     # 🧬 AbLang2 Sequence Restorer
+
+     This app uses the AbLang2 model to restore masked residues (*) in antibody sequences.
+     You can provide a heavy chain, a light chain, or both.
+
+     **Instructions:**
+     - Use `*` to mask residues you want to restore
+     - Provide the heavy chain, the light chain, or both
+     - Enable "Use Alignment" for variable missing lengths
+     """)
+
+     with gr.Row():
+         with gr.Column():
+             heavy_input = gr.Textbox(
+                 label="Heavy Chain Sequence",
+                 placeholder="Enter heavy chain sequence with masked residues (*)...",
+                 lines=3,
+                 max_lines=5,
+                 elem_classes=["sequence-input"]
+             )
+
+             light_input = gr.Textbox(
+                 label="Light Chain Sequence",
+                 placeholder="Enter light chain sequence with masked residues (*)...",
+                 lines=3,
+                 max_lines=5,
+                 elem_classes=["sequence-input"]
+             )
+
+             align_checkbox = gr.Checkbox(
+                 label="Use Alignment (for variable missing lengths)",
+                 value=False
+             )
+
+             restore_btn = gr.Button("🔄 Restore Sequences", variant="primary")
+
+         with gr.Column():
+             gr.Markdown("### 🧬 Restored Sequences", elem_classes=["restored-heading"])
+             gr.Markdown("*Green highlighting shows restored residues*")
+
+             gr.Markdown("**Heavy Chain Sequence**", elem_classes=["output-label"])
+             heavy_output = gr.HTML(label="")
+
+             gr.Markdown("**Light Chain Sequence**", elem_classes=["output-label"])
+             light_output = gr.HTML(label="")
+
+     # Example sequences.
+     gr.Examples(
+         examples=[
+             [
+                 "EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS",
+                 "DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK"
+             ],
+             [
+                 "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMGWVRQAPGKGLEWVSAISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCARDY**GMDVWGQGTTVTVSS",
+                 ""
+             ],
+             [
+                 "",
+                 "DIQLTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIY*ASSLQSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSTP*TFGQGTKVEIK"
+             ]
+         ],
+         inputs=[heavy_input, light_input],
+         label="Example Sequences"
+     )
+
+     # Connect the button to the function.
+     restore_btn.click(
+         fn=restore_sequences,
+         inputs=[heavy_input, light_input, align_checkbox],
+         outputs=[heavy_output, light_output]
+     )
+
+     gr.Markdown("""
+     ---
+     **Note:** This app uses the AbLang2 model from the Hugging Face Hub.
+     Restoration may take a few seconds depending on sequence length and complexity.
+     """)
+
+ if __name__ == "__main__":
+     demo.launch()
requirements.txt ADDED
@@ -0,0 +1,6 @@
+ gradio>=4.0.0
+ transformers>=4.30.0
+ torch>=2.0.0
+ numpy>=1.21.0
+ pandas>=1.3.0
+ anarci>=1.3