hemantn commited on Jul 27, 2025

Commit

712d350

1 Parent(s): e1df3c0

Integrate utility files into main repository - make self-contained

Browse files

Files changed (28) hide show

README_Spaces.md +55 -0
__pycache__/__init__.cpython-310.pyc +0 -0
__pycache__/ablang_encodings.cpython-310.pyc +0 -0
__pycache__/ablang_encodings.cpython-312.pyc +0 -0
__pycache__/adapter.cpython-310.pyc +0 -0
__pycache__/adapter.cpython-312.pyc +0 -0
__pycache__/alignment.cpython-310.pyc +0 -0
__pycache__/alignment.cpython-312.pyc +0 -0
__pycache__/configuration_ablang2paired.cpython-310.pyc +0 -0
__pycache__/extra_utils.cpython-310.pyc +0 -0
__pycache__/extra_utils.cpython-312.pyc +0 -0
__pycache__/modeling_ablang2paired.cpython-310.pyc +0 -0
__pycache__/restoration.cpython-310.pyc +0 -0
__pycache__/restoration.cpython-312.pyc +0 -0
__pycache__/scores.cpython-310.pyc +0 -0
__pycache__/scores.cpython-312.pyc +0 -0
ablang_encodings.py +97 -0
adapter.py +5 -5
alignment.py +87 -0
app.py +336 -0
extra_utils.py +165 -0
requirements.txt +6 -0
restoration.py +96 -0
scores.py +98 -0
test_ablang2_HF_implementation.ipynb +35 -4
test_align.py +34 -0
test_app_output.py +56 -0
test_integrated_adapter.py +158 -0

README_Spaces.md ADDED Viewed

	@@ -0,0 +1,55 @@

+# 🧬 AbLang2 Sequence Restorer - Hugging Face Spaces
+This is a Gradio web application that provides the AbLang2 sequence restoration utility through Hugging Face Spaces.
+## 🎯 What it does
+The AbLang2 Sequence Restorer allows you to:
+- **Restore masked residues** (*) in antibody sequences
+- **Work with paired sequences** (heavy and light chains)
+- **Handle single chains** (heavy or light chain only)
+- **Use alignment** for variable missing lengths
+## 🚀 How to use
+1. **Enter sequences**: Provide heavy chain, light chain, or both sequences
+2. **Mask residues**: Use `*` to indicate residues you want to restore
+3. **Choose alignment**: Enable "Use Alignment" for variable missing lengths
+4. **Get results**: Click "Restore Sequences" to get the restored antibody sequences
+## 📝 Example Usage
+### Example 1: Both chains with masked residues
+- **Heavy Chain**: `EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS`
+- **Light Chain**: `DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK`
+### Example 2: Heavy chain only
+- **Heavy Chain**: `EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMGWVRQAPGKGLEWVSAISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCARDY**GMDVWGQGTTVTVSS`
+- **Light Chain**: (leave empty)
+## 🔧 Technical Details
+- **Model**: AbLang2 from Hugging Face Hub (`hemantn/ablang2`)
+- **Framework**: Gradio for the web interface
+- **Backend**: PyTorch with Transformers library
+- **Processing**: Automatic GPU acceleration when available
+## 📚 Related Resources
+- **Original AbLang2**: [https://github.com/TobiasHeOl/AbLang2](https://github.com/TobiasHeOl/AbLang2)
+- **Model Repository**: [https://huggingface.co/hemantn/ablang2](https://huggingface.co/hemantn/ablang2)
+- **Full Documentation**: See the main README.md for comprehensive usage examples
+## 🤝 Citation
+If you use this tool in your research, please cite the original AbLang2 paper:
+```
+@article{Olsen2024,
+  title={Addressing the antibody germline bias and its effect on language models for improved antibody design},
+  author={Tobias H. Olsen, Iain H. Moal and Charlotte M. Deane},
+  journal={bioRxiv},
+  doi={https://doi.org/10.1101/2024.02.02.578678},
+  year={2024}
+}
+```

__pycache__/__init__.cpython-310.pyc ADDED Viewed

Binary file (418 Bytes). View file

__pycache__/ablang_encodings.cpython-310.pyc ADDED Viewed

Binary file (3.73 kB). View file

__pycache__/ablang_encodings.cpython-312.pyc ADDED Viewed

Binary file (5.64 kB). View file

__pycache__/adapter.cpython-310.pyc ADDED Viewed

Binary file (10.3 kB). View file

__pycache__/adapter.cpython-312.pyc ADDED Viewed

Binary file (17 kB). View file

__pycache__/alignment.cpython-310.pyc ADDED Viewed

Binary file (2.98 kB). View file

__pycache__/alignment.cpython-312.pyc ADDED Viewed

Binary file (3.77 kB). View file

__pycache__/configuration_ablang2paired.cpython-310.pyc ADDED Viewed

Binary file (1.05 kB). View file

__pycache__/extra_utils.cpython-310.pyc ADDED Viewed

Binary file (5.9 kB). View file

__pycache__/extra_utils.cpython-312.pyc ADDED Viewed

Binary file (8.55 kB). View file

__pycache__/modeling_ablang2paired.cpython-310.pyc ADDED Viewed

Binary file (3.89 kB). View file

__pycache__/restoration.cpython-310.pyc ADDED Viewed

Binary file (4.19 kB). View file

__pycache__/restoration.cpython-312.pyc ADDED Viewed

Binary file (6.46 kB). View file

__pycache__/scores.cpython-310.pyc ADDED Viewed

Binary file (3.02 kB). View file

__pycache__/scores.cpython-312.pyc ADDED Viewed

Binary file (5.44 kB). View file

ablang_encodings.py ADDED Viewed

	@@ -0,0 +1,97 @@

+import numpy as np
+import torch
+from extra_utils import res_to_list, res_to_seq
+class AbEncoding:
+    def __init__(self, device = 'cpu', ncpu = 1):
+        self.device = device
+        self.ncpu = ncpu
+    def _initiate_abencoding(self, model, tokenizer):
+        self.AbLang = model
+        self.tokenizer = tokenizer
+    def _encode_sequences(self, seqs):
+        tokens = self.tokenizer(seqs, pad=True, w_extra_tkns=False, device=self.used_device)
+        with torch.no_grad():
+            return self.AbLang.AbRep(tokens).last_hidden_states
+    def _predict_logits(self, seqs):
+        tokens = self.tokenizer(seqs, pad=True, w_extra_tkns=False, device=self.used_device)
+        with torch.no_grad():
+            return self.AbLang(tokens)
+    def _predict_logits_with_step_masking(self, seqs):
+        tokens = self.tokenizer(seqs, pad=True, w_extra_tkns=False, device=self.used_device)
+        logits = []
+        for single_seq_tokens in tokens:
+            tkn_len = len(single_seq_tokens)
+            masked_tokens = single_seq_tokens.repeat(tkn_len, 1)
+            for num in range(tkn_len):
+                masked_tokens[num, num] = self.tokenizer.mask_token
+            with torch.no_grad():
+                logits_tmp = self.AbLang(masked_tokens)
+            logits_tmp = torch.stack([logits_tmp[num, num] for num in range(tkn_len)])
+            logits.append(logits_tmp)
+        return torch.stack(logits, dim=0)
+    def seqcoding(self, seqs, **kwargs):
+        """
+        Sequence specific representations
+        """
+        encodings = self._encode_sequences(seqs).cpu().numpy()
+        lens = np.vectorize(len)(seqs)
+        lens = np.tile(lens.reshape(-1,1,1), (encodings.shape[2], 1))
+        return np.apply_along_axis(res_to_seq, 2, np.c_[np.swapaxes(encodings,1,2), lens])
+    def rescoding(self, seqs, align=False, **kwargs):
+        """
+        Residue specific representations.
+        """
+        encodings = self._encode_sequences(seqs).cpu().numpy()
+        if align: return encodings
+        else: return [res_to_list(state, seq) for state, seq in zip(encodings, seqs)]
+    def likelihood(self, seqs, align=False, stepwise_masking=False, **kwargs):
+        """
+        Likelihood of mutations
+        """
+        if stepwise_masking:
+            logits = self._predict_logits_with_step_masking(seqs).cpu().numpy()
+        else:
+            logits = self._predict_logits(seqs).cpu().numpy()
+        if align: return logits
+        else: return [res_to_list(state, seq) for state, seq in zip(logits, seqs)]
+    def probability(self, seqs, align=False, stepwise_masking=False, **kwargs):
+        """
+        Probability of mutations
+        """
+        if stepwise_masking:
+            logits = self._predict_logits_with_step_masking(seqs)
+        else:
+            logits = self._predict_logits(seqs)
+        probs = logits.softmax(-1).cpu().numpy()
+        if align: return probs
+        else: return [res_to_list(state, seq) for state, seq in zip(probs, seqs)]

adapter.py CHANGED Viewed

@@ -1,10 +1,10 @@
-from ablang2.pretrained_utils.restoration import AbRestore
-from ablang2.pretrained_utils.encodings import AbEncoding
-from ablang2.pretrained_utils.alignment import AbAlignment
-from ablang2.pretrained_utils.scores import AbScores
 import torch
 import numpy as np
-from ablang2.pretrained_utils.extra_utils import res_to_seq, res_to_list
 class HuggingFaceTokenizerAdapter:
     def __init__(self, tokenizer, device):

+from restoration import AbRestore
+from ablang_encodings import AbEncoding
+from alignment import AbAlignment
+from scores import AbScores
 import torch
 import numpy as np
+from extra_utils import res_to_seq, res_to_list
 class HuggingFaceTokenizerAdapter:
     def __init__(self, tokenizer, device):

alignment.py ADDED Viewed

	@@ -0,0 +1,87 @@

+from dataclasses import dataclass
+import numpy as np
+import torch
+from extra_utils import paired_msa_numbering, unpaired_msa_numbering, create_alignment
+class AbAlignment:
+    def __init__(self, device = 'cpu', ncpu = 1):
+        self.device = device
+        self.ncpu = ncpu
+    def number_sequences(self, seqs, chain = 'H', fragmented = False):
+        if chain == 'HL':
+            numbered_seqs, seqs, number_alignment = paired_msa_numbering(seqs, fragmented = fragmented, n_jobs = self.ncpu)
+        else:
+            assert chain == 'HL', 'Currently "Align==True" only works for paired sequences. \nPlease use paired sequences or Align=False.'
+            numbered_seqs, seqs, number_alignment = unpaired_msa_numbering(
+                seqs, chain = chain, fragmented = fragmented, n_jobs = self.ncpu
+            )
+        return numbered_seqs, seqs, number_alignment
+    def align_encodings(self, encodings, numbered_seqs, seqs, number_alignment):
+        aligned_encodings = np.concatenate(
+            [[
+                create_alignment(
+                    res_embed, numbered_seq, seq, number_alignment
+                ) for res_embed, numbered_seq, seq in zip(encodings, numbered_seqs, seqs)
+            ]], axis=0
+        )
+        return aligned_encodings
+    def reformat_subsets(
+        self,
+        subset_list,
+        mode = 'seqcoding',
+        align = False,
+        numbered_seqs = None,
+        seqs = None,
+        number_alignment = None,
+    ):
+        if mode in [
+            'seqcoding',
+            'restore',
+            'pseudo_log_likelihood',
+            'confidence'
+        ]:
+            return np.concatenate(subset_list)
+        elif align:
+            subset_list = [
+                self.align_encodings(
+                    subset,
+                    numbered_seqs[num*len(subset):(num+1)*len(subset)],
+                    seqs[num*len(subset):(num+1)*len(subset)],
+                    number_alignment
+                ) for num, subset in enumerate(subset_list)
+            ]
+            subset = np.concatenate(subset_list)
+            return aligned_results(
+                aligned_seqs = [''.join(alist) for alist in subset[:,:,-1]],
+                aligned_embeds = subset[:,:,:-1].astype(float),
+                number_alignment=number_alignment.apply(lambda x: '{}{}'.format(*x[0]), axis=1).values
+            )
+        elif not align:
+            return sum(subset_list, [])
+        else:
+            return np.concatenate(subset_list) # this needs to be changed
+@dataclass
+class aligned_results():
+    """
+    Dataclass used to store output.
+    """
+    aligned_seqs: None
+    aligned_embeds: None
+    number_alignment: None

app.py ADDED Viewed

	@@ -0,0 +1,336 @@

+import gradio as gr
+import sys
+import os
+from transformers import AutoModel, AutoTokenizer
+from transformers.utils import cached_file
+# Load model and tokenizer from Hugging Face Hub
+model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)
+# Find the cached model directory and import adapter
+adapter_path = cached_file("hemantn/ablang2", "adapter.py")
+cached_model_dir = os.path.dirname(adapter_path)
+sys.path.insert(0, cached_model_dir)
+# Import and create the adapter
+from adapter import AbLang2PairedHuggingFaceAdapter
+ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
+def restore_sequences(heavy_chain, light_chain, use_align=False):
+    """
+    Restore masked residues in antibody sequences.
+    Args:
+        heavy_chain (str): Heavy chain sequence with masked residues (*)
+        light_chain (str): Light chain sequence with masked residues (*)
+        use_align (bool): Whether to use alignment for variable missing lengths
+    Returns:
+        tuple: (restored_heavy, restored_light, highlighted_heavy, highlighted_light)
+    """
+    try:
+        # Check if alignment is requested but not available
+        if use_align:
+            try:
+                import anarci
+            except ImportError:
+                return "Alignment feature requires 'anarci' package which is not available. Please disable alignment option.", "", ""
+        # Prepare input sequences
+        if heavy_chain.strip() and light_chain.strip():
+            # Both chains provided
+            sequences = [[heavy_chain.strip(), light_chain.strip()]]
+        elif heavy_chain.strip():
+            # Only heavy chain provided
+            sequences = [[heavy_chain.strip(), ""]]
+        elif light_chain.strip():
+            # Only light chain provided
+            sequences = [["", light_chain.strip()]]
+        else:
+            return "Please provide at least one antibody chain sequence.", "", "", ""
+        # Perform restoration
+        restored = ablang(sequences, mode='restore', align=use_align)
+        # Format output
+        if hasattr(restored, '__len__') and len(restored) > 0:
+            result = restored[0]  # Get the first (and only) result
+            # Parse the result to separate heavy and light chains
+            if '>|<' in result:
+                # Both chains present
+                heavy_part = result.split('>|<')[0].replace('<', '').replace('>', '')
+                light_part = result.split('>|<')[1].replace('<', '').replace('>', '')
+            elif result.startswith('<') and result.endswith('>'):
+                # Only one chain present
+                if heavy_chain.strip():
+                    heavy_part = result.replace('<', '').replace('>', '')
+                    light_part = ""
+                else:
+                    heavy_part = ""
+                    light_part = result.replace('<', '').replace('>', '')
+            else:
+                return "Error: Unexpected result format.", "", "", ""
+            # Create highlighted versions
+            highlighted_heavy = highlight_restored_residues(heavy_chain.strip(), heavy_part)
+            highlighted_light = highlight_restored_residues(light_chain.strip(), light_part)
+            # Create HTML outputs with proper styling - no scroll, wrap text
+            heavy_html = f'<div class="restored-sequence-box" style="padding: 10px; background-color: #f8f9fa; border: 1px solid #dee2e6; border-radius: 4px;">{highlighted_heavy}</div>'
+            light_html = f'<div class="restored-sequence-box" style="padding: 10px; background-color: #f8f9fa; border: 1px solid #dee2e6; border-radius: 4px;">{highlighted_light}</div>'
+            return heavy_html, light_html
+        else:
+            return "Error: No restoration result obtained.", "", ""
+    except Exception as e:
+        return f"Error during restoration: {str(e)}", "", ""
+def highlight_restored_residues(original_seq, restored_seq):
+    """
+    Highlight restored residues in green.
+    """
+    if not original_seq or not restored_seq:
+        return restored_seq
+    highlighted = ""
+    for i, (orig_char, rest_char) in enumerate(zip(original_seq, restored_seq)):
+        if orig_char == '*' and rest_char != '*':
+            # This residue was restored
+            highlighted += f'<span class="restored-highlight">{rest_char}</span>'
+        else:
+            highlighted += rest_char
+    # Add any remaining characters from restored sequence
+    if len(restored_seq) > len(original_seq):
+        highlighted += restored_seq[len(original_seq):]
+    return highlighted
+# Create Gradio interface
+with gr.Blocks(title="AbLang2 Sequence Restorer", theme=gr.themes.Soft(), css="""
+    * {
+        font-family: 'Courier New', monospace !important;
+    }
+    .sequence-input, .sequence-output {
+        font-family: 'Courier New', monospace !important;
+        font-size: 14px !important;
+        letter-spacing: 0.5px !important;
+    }
+    .restored-highlight {
+        background-color: #90EE90 !important;
+        color: #000 !important;
+        font-weight: bold !important;
+    }
+    .examples {
+        font-family: 'Courier New', monospace !important;
+        font-size: 14px !important;
+        letter-spacing: 0.5px !important;
+    }
+    .restored-sequence-box {
+        font-family: 'Courier New', monospace !important;
+        font-size: 14px !important;
+        letter-spacing: 0.5px !important;
+        white-space: pre-wrap !important;
+        word-wrap: break-word !important;
+        overflow-wrap: break-word !important;
+    }
+    .restored-heading {
+        color: #2E8B57 !important;
+        font-weight: bold !important;
+        font-size: 18px !important;
+    }
+    .example-text {
+        font-family: 'Courier New', monospace !important;
+        font-size: 12px !important;
+        white-space: pre-wrap !important;
+        word-wrap: break-word !important;
+    }
+    .examples-table {
+        font-family: 'Courier New', monospace !important;
+        font-size: 12px !important;
+        white-space: pre-wrap !important;
+        word-wrap: break-word !important;
+        max-width: none !important;
+        overflow: visible !important;
+    }
+    .examples-table td {
+        font-family: 'Courier New', monospace !important;
+        font-size: 12px !important;
+        white-space: pre-wrap !important;
+        word-wrap: break-word !important;
+        max-width: none !important;
+        overflow: visible !important;
+        text-overflow: unset !important;
+    }
+    .sequence-output label {
+        font-weight: bold !important;
+        color: #495057 !important;
+        font-size: 14px !important;
+        margin-bottom: 5px !important;
+    }
+    /* Force full display of examples */
+    .examples-container {
+        font-family: 'Courier New', monospace !important;
+        font-size: 12px !important;
+    }
+    .examples-container table {
+        width: 100% !important;
+        table-layout: auto !important;
+    }
+    .examples-container td {
+        white-space: pre-wrap !important;
+        word-wrap: break-word !important;
+        overflow-wrap: break-word !important;
+        max-width: none !important;
+        text-overflow: unset !important;
+        padding: 8px !important;
+        vertical-align: top !important;
+    }
+    .examples-container th {
+        white-space: nowrap !important;
+        padding: 8px !important;
+    }
+    /* Override any Gradio default truncation */
+    .examples table td {
+        white-space: pre-wrap !important;
+        word-wrap: break-word !important;
+        overflow-wrap: break-word !important;
+        max-width: none !important;
+        text-overflow: unset !important;
+        overflow: visible !important;
+        font-family: 'Courier New', monospace !important;
+        font-size: 12px !important;
+    }
+    .examples table {
+        table-layout: auto !important;
+        width: 100% !important;
+    }
+    /* Target the specific examples component */
+    div[data-testid="examples"] table td {
+        white-space: pre-wrap !important;
+        word-wrap: break-word !important;
+        overflow-wrap: break-word !important;
+        max-width: none !important;
+        text-overflow: unset !important;
+        overflow: visible !important;
+        font-family: 'Courier New', monospace !important;
+        font-size: 12px !important;
+    }
+    /* Force examples to show full content */
+    .examples table, .examples table td, .examples table th {
+        white-space: pre-wrap !important;
+        word-wrap: break-word !important;
+        overflow-wrap: break-word !important;
+        max-width: none !important;
+        text-overflow: unset !important;
+        overflow: visible !important;
+        font-family: 'Courier New', monospace !important;
+        font-size: 12px !important;
+        table-layout: auto !important;
+        width: auto !important;
+        min-width: 100% !important;
+    }
+    /* Override any inline styles */
+    .examples * {
+        white-space: pre-wrap !important;
+        word-wrap: break-word !important;
+        overflow-wrap: break-word !important;
+        max-width: none !important;
+        text-overflow: unset !important;
+        overflow: visible !important;
+    }
+    /* Style output labels to match input labels exactly */
+    .output-label {
+        font-weight: 600 !important;
+        color: var(--label-text-color) !important;
+        font-size: 14px !important;
+        margin-bottom: 8px !important;
+        margin-top: 16px !important;
+        line-height: 1.4 !important;
+        display: block !important;
+    }
+""") as demo:
+    gr.Markdown("""
+    # 🧬 AbLang2 Sequence Restorer
+    This app uses the AbLang2 model to restore masked residues (*) in antibody sequences.
+    You can provide either one or both heavy and light chain sequences.
+    **Instructions:**
+    - Use `*` to mask residues you want to restore
+    - Provide heavy chain, light chain, or both
+    - Enable "Use Alignment" for variable missing lengths
+    """)
+    with gr.Row():
+        with gr.Column():
+            heavy_input = gr.Textbox(
+                label="Heavy Chain Sequence",
+                placeholder="Enter heavy chain sequence with masked residues (*)...",
+                lines=3,
+                max_lines=5,
+                elem_classes=["sequence-input"]
+            )
+            light_input = gr.Textbox(
+                label="Light Chain Sequence",
+                placeholder="Enter light chain sequence with masked residues (*)...",
+                lines=3,
+                max_lines=5,
+                elem_classes=["sequence-input"]
+            )
+            align_checkbox = gr.Checkbox(
+                label="Use Alignment (for variable missing lengths) - Requires anarci package",
+                value=False
+            )
+            restore_btn = gr.Button("🔄 Restore Sequences", variant="primary")
+        with gr.Column():
+            gr.Markdown("### 🧬 Restored Sequences", elem_classes=["restored-heading"])
+            gr.Markdown("*Green highlighting shows restored residues*")
+            gr.Markdown("**Heavy Chain Sequence**", elem_classes=["output-label"])
+            heavy_output = gr.HTML(label="")
+            gr.Markdown("**Light Chain Sequence**", elem_classes=["output-label"])
+            light_output = gr.HTML(label="")
+    # Example sequences
+    gr.Examples(
+        examples=[
+            [
+                "EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS",
+                "DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK"
+            ],
+            [
+                "EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMGWVRQAPGKGLEWVSAISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCARDY**GMDVWGQGTTVTVSS",
+                ""
+            ],
+            [
+                "",
+                "DIQLTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIY*ASSLQSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSTP*TFGQGTKVEIK"
+            ]
+        ],
+        inputs=[heavy_input, light_input],
+        label="Example Sequences"
+    )
+    # Connect the button to the function
+    restore_btn.click(
+        fn=restore_sequences,
+        inputs=[heavy_input, light_input, align_checkbox],
+        outputs=[heavy_output, light_output]
+    )
+    gr.Markdown("""
+    ---
+    **Note:** This app uses the AbLang2 model from Hugging Face Hub.
+    The restoration process may take a few seconds depending on sequence length and complexity.
+    """)
+if __name__ == "__main__":
+    demo.launch()

extra_utils.py ADDED Viewed

	@@ -0,0 +1,165 @@

+import string, re
+import numpy as np
+def res_to_list(logits, seq):
+    return logits[:len(seq)]
+def res_to_seq(a, mode='mean'):
+    """
+    Function for how we go from n_values for each amino acid to n_values for each sequence.
+    We leave out padding tokens.
+    """
+    if mode=='sum':
+        return a[0:(int(a[-1]))].sum()
+    elif mode=='mean':
+        return a[0:(int(a[-1]))].mean()
+    elif mode=='restore':
+        return a[0][0:(int(a[-1]))]
+def get_number_alignment(numbered_seqs):
+    """
+    Creates a number alignment from the anarci results.
+    """
+    import pandas as pd
+    alist = [pd.DataFrame(aligned_seq, columns = [0,1,'resi']) for aligned_seq in numbered_seqs]
+    unsorted_alignment = pd.concat(alist).drop_duplicates(subset=0)
+    max_alignment = get_max_alignment()
+    return max_alignment.merge(unsorted_alignment.query("resi!='-'"), left_on=0, right_on=0)[[0,1]]
+def get_max_alignment():
+    """
+    Create maximum possible alignment for sorting
+    """
+    import pandas as pd
+    sortlist = [[("<", "")]]
+    for num in range(1, 128+1):
+        if num in [33,61,112]:
+            for char in string.ascii_uppercase[::-1]:
+                sortlist.append([(num, char)])
+            sortlist.append([(num,' ')])
+        else:
+            sortlist.append([(num,' ')])
+            for char in string.ascii_uppercase:
+                sortlist.append([(num, char)])
+    return pd.DataFrame(sortlist + [[(">", "")]])
+def paired_msa_numbering(ab_seqs, fragmented = False, n_jobs = 10):
+    import pandas as pd
+    tmp_seqs = [pairs.replace(">", "").replace("<", "").split("|") for pairs in ab_seqs]
+    numbered_seqs_heavy, seqs_heavy, number_alignment_heavy = unpaired_msa_numbering(
+        [i[0] for i in tmp_seqs], 'H', fragmented = fragmented, n_jobs = n_jobs
+    )
+    numbered_seqs_light, seqs_light, number_alignment_light = unpaired_msa_numbering(
+        [i[1] for i in tmp_seqs], 'L', fragmented = fragmented, n_jobs = n_jobs
+    )
+    number_alignment = pd.concat([
+        number_alignment_heavy,
+        pd.DataFrame([[("|",""), "|"]]),
+        number_alignment_light]
+    ).reset_index(drop=True)
+    seqs = [f"{heavy}|{light}" for heavy, light in zip(seqs_heavy, seqs_light)]
+    numbered_seqs = [
+        heavy + [(("|",""), "|", "|")] + light for heavy, light in zip(numbered_seqs_heavy, numbered_seqs_light)
+    ]
+    return numbered_seqs, seqs, number_alignment
+def unpaired_msa_numbering(seqs, chain = 'H', fragmented = False, n_jobs = 10):
+    numbered_seqs = number_with_anarci(seqs, chain = chain, fragmented = fragmented, n_jobs = n_jobs)
+    number_alignment = get_number_alignment(numbered_seqs)
+    number_alignment[1] = chain
+    seqs = [''.join([i[2] for i in numbered_seq]).replace('-','') for numbered_seq in numbered_seqs]
+    return numbered_seqs, seqs, number_alignment
+def number_with_anarci(seqs, chain = 'H', fragmented = False, n_jobs = 1):
+    import anarci
+    import pandas as pd
+    anarci_out = anarci.run_anarci(
+        pd.DataFrame(seqs).reset_index().values.tolist(),
+        ncpu=n_jobs,
+        scheme='imgt',
+        allowed_species=['human', 'mouse'],
+    )
+    numbered_seqs = []
+    for onarci in anarci_out[1]:
+        numbered_seq = []
+        for i in onarci[0][0]:
+            if i[1] != '-':
+                numbered_seq.append((i[0], chain, i[1]))
+        if fragmented:
+            numbered_seqs.append(numbered_seq)
+        else:
+            numbered_seqs.append([(("<",""), chain, "<")] + numbered_seq + [((">",""), chain, ">")])
+    return numbered_seqs
+def create_alignment(res_embeds, numbered_seqs, seq, number_alignment):
+    import pandas as pd
+    datadf = pd.DataFrame(numbered_seqs)
+    sequence_alignment = number_alignment.merge(datadf, how='left', on=[0, 1]).fillna('-')[2]
+    idxs = np.where(sequence_alignment.values == '-')[0]
+    idxs = [idx-num for num, idx in enumerate(idxs)]
+    aligned_embeds = pd.DataFrame(np.insert(res_embeds[:len(seq)], idxs , 0, axis=0))
+    return pd.concat([aligned_embeds, sequence_alignment], axis=1).values
+def get_spread_sequences(seq, spread, start_position):
+    """
+    Test sequences which are 8 positions shorter (position 10 + max CDR1 gap of 7) up to 2 positions longer (possible insertions).
+    """
+    spread_sequences = []
+    for diff in range(start_position-8, start_position+2+1):
+        spread_sequences.append('*'*diff+seq)
+    return np.array(spread_sequences)
+def get_sequences_from_anarci(out_anarci, max_position, spread):
+    """
+    Ensures correct masking on each side of sequence
+    """
+    if out_anarci == 'ANARCI_error':
+        return np.array(['ANARCI-ERR']*spread)
+    end_position = int(re.search(r'\d+', out_anarci[::-1]).group()[::-1])
+    # Fixes ANARCI error of poor numbering of the CDR1 region
+    start_position = int(re.search(r'\d+,\s\'.\'\),\s\'[^-]+\'\),\s\(\(\d+,\s\'.\'\),\s\'[^-]+\'\),\s\(\(\d+,\s\'.\'\),\s\'[^-]+\'\),\s\(\(\d+,\s\'.\'\),\s\'[^-]+',
+                                   out_anarci).group().split(',')[0]) - 1
+    sequence = "".join(re.findall(r"(?i)[A-Z*]", "".join(re.findall(r'\),\s\'[A-Z*]', out_anarci))))
+    sequence_j = ''.join(sequence).replace('-','').replace('X','*') + '*'*(max_position-int(end_position))
+    return get_spread_sequences(sequence_j, spread, start_position)

requirements.txt ADDED Viewed

	@@ -0,0 +1,6 @@

+gradio>=4.0.0
+transformers>=4.30.0
+torch>=2.0.0
+numpy>=1.21.0
+pandas>=1.3.0
+git+https://github.com/oxpig/ANARCI.git

restoration.py ADDED Viewed

	@@ -0,0 +1,96 @@

+import numpy as np
+import torch
+from extra_utils import res_to_seq, get_sequences_from_anarci
+class AbRestore:
+    def __init__(self, spread = 11, device = 'cpu', ncpu = 1):
+        self.spread = spread
+        self.device = device
+        self.ncpu = ncpu
+    def _initiate_abrestore(self, model, tokenizer):
+        self.AbLang = model
+        self.tokenizer = tokenizer
+    def restore(self, seqs, align = False, **kwargs):
+        """
+        Restore sequences
+        """
+        n_seqs = len(seqs)
+        if align:
+            seqs = self._sequence_aligning(seqs)
+            nr_seqs = len(seqs)//self.spread
+            tokens = self.tokenizer(seqs, pad=True, w_extra_tkns=False, device=self.used_device)
+            predictions = self.AbLang(tokens)[:,:,1:21]
+            # Reshape
+            tokens = tokens.reshape(nr_seqs, self.spread, -1)
+            predictions = predictions.reshape(nr_seqs, self.spread, -1, 20)
+            seqs = seqs.reshape(nr_seqs, -1)
+            # Find index of best predictions
+            best_seq_idx = torch.argmax(torch.max(predictions, -1).values[:,:,1:2].mean(2), -1)
+            # Select best predictions
+            tokens = tokens.gather(1, best_seq_idx.view(-1, 1).unsqueeze(1).repeat(1, 1, tokens.shape[-1])).squeeze(1)
+            predictions = predictions[range(predictions.shape[0]), best_seq_idx]
+            seqs = np.take_along_axis(seqs, best_seq_idx.view(-1, 1).cpu().numpy(), axis=1)
+        else:
+            tokens = self.tokenizer(seqs, pad=True, w_extra_tkns=False, device=self.used_device)
+            predictions = self.AbLang(tokens)[:,:,1:21]
+        predicted_tokens = torch.max(predictions, -1).indices + 1
+        restored_tokens = torch.where(tokens==23, predicted_tokens, tokens)
+        restored_seqs = self.tokenizer(restored_tokens, mode="decode")
+        if n_seqs < len(restored_seqs):
+            restored_seqs = [f"{h}|{l}".replace('-','') for h,l in zip(restored_seqs[:n_seqs], restored_seqs[n_seqs:])]
+            seqs = [f"{h}|{l}" for h,l in zip(seqs[:n_seqs], seqs[n_seqs:])]
+        return np.array([res_to_seq(seq, 'restore') for seq in np.c_[restored_seqs, np.vectorize(len)(seqs)]])
+    def _create_spread_of_sequences(self, seqs, chain = 'H'):
+        import pandas as pd
+        import anarci
+        chain_idx = 0 if chain == 'H' else 1
+        numbered_seqs = anarci.run_anarci(
+            pd.DataFrame([seq[chain_idx].replace('*', 'X') for seq in seqs]).reset_index().values.tolist(),
+            ncpu=self.ncpu,
+            scheme='imgt',
+            allowed_species=['human', 'mouse'],
+        )
+        anarci_data = pd.DataFrame(
+            [str(anarci[0][0]) if anarci else 'ANARCI_error' for anarci in numbered_seqs[1]],
+            columns=['anarci']
+        ).astype('<U90')
+        max_position = 128 if chain == 'H' else 127
+        seqs = anarci_data.apply(
+            lambda x: get_sequences_from_anarci(
+                x.anarci,
+                max_position,
+                self.spread
+            ), axis=1, result_type='expand'
+        ).to_numpy().reshape(-1)
+        return seqs
+    def _sequence_aligning(self, seqs):
+        tmp_seqs = [pairs.replace(">", "").replace("<", "").split("|") for pairs in seqs]
+        spread_heavy = [f"<{seq}>" for seq in self._create_spread_of_sequences(tmp_seqs, chain = 'H')]
+        spread_light = [f"<{seq}>" for seq in self._create_spread_of_sequences(tmp_seqs, chain = 'L')]
+        return np.concatenate([np.array(spread_heavy),np.array(spread_light)])

scores.py ADDED Viewed

	@@ -0,0 +1,98 @@

+import numpy as np
+import torch
+from extra_utils import res_to_list, res_to_seq
+class AbScores:
+    def __init__(self, device = 'cpu', ncpu = 1):
+        self.device = device
+        self.ncpu = ncpu
+    def _initiate_abencoding(self, model, tokenizer):
+        self.AbLang = model
+        self.tokenizer = tokenizer
+    def _encode_sequences(self, seqs):
+        tokens = self.tokenizer(seqs, pad=True, w_extra_tkns=False, device=self.used_device)
+        with torch.no_grad():
+            return self.AbLang.AbRep(tokens).last_hidden_states.numpy()
+    def _predict_logits(self, seqs):
+        tokens = self.tokenizer(seqs, pad=True, w_extra_tkns=False, device=self.used_device)
+        with torch.no_grad():
+            return self.AbLang(tokens), tokens
+    def pseudo_log_likelihood(self, seqs, **kwargs):
+        """
+        Pseudo log likelihood of sequences.
+        """
+        plls = []
+        for seq in seqs:
+            labels = self.tokenizer(
+                seq, pad=True, w_extra_tkns=False, device=self.used_device
+            )
+            idxs = (
+                ~torch.isin(labels, torch.Tensor(self.tokenizer.all_special_tokens).to(self.used_device))
+            ).nonzero()
+            masked_tokens = labels.repeat(len(idxs), 1)
+            for num, idx in enumerate(idxs):
+                masked_tokens[num, idx[1]] = self.tokenizer.mask_token
+            with torch.no_grad():
+                logits = self.AbLang(masked_tokens)
+            logits[:, :, self.tokenizer.all_special_tokens] = -float("inf")
+            logits = torch.stack([logits[num, idx[1]] for num, idx in enumerate(idxs)])
+            labels = labels[:,idxs[:,1:]].squeeze(2)[0]
+            nll = torch.nn.functional.cross_entropy(
+                    logits,
+                    labels,
+                    reduction="mean",
+                )
+            pll = -nll
+            plls.append(pll)
+        plls = torch.stack(plls, dim=0).cpu().numpy()
+        return plls
+    def confidence(self, seqs, **kwargs):
+        """
+        Log likelihood of sequences without masking.
+        """
+        labels = self.tokenizer(
+                seqs, pad=True, w_extra_tkns=False, device=self.used_device
+            )
+        with torch.no_grad():
+            logits = self.AbLang(labels)
+            logits[:, :, self.tokenizer.all_special_tokens] = -float("inf")
+        plls = []
+        for label, logit in zip(labels, logits):
+            idxs = (
+                ~torch.isin(label, torch.Tensor(self.tokenizer.all_special_tokens).to(self.used_device))
+            ).nonzero().squeeze(1)
+            nll = torch.nn.functional.cross_entropy(
+                        logit[idxs],
+                        label[idxs],
+                        reduction="mean",
+                    )
+            pll = -nll
+            plls.append(pll)
+        return torch.stack(plls, dim=0).cpu().numpy()

test_ablang2_HF_implementation.ipynb CHANGED Viewed

@@ -10,7 +10,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 11,
    "id": "7ae54cd0-6253-46dd-a316-4f20b12041e0",
    "metadata": {},
    "outputs": [],
@@ -40,7 +40,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
    "id": "99192978-a008-4a32-a80e-bba238e0ec7c",
    "metadata": {},
    "outputs": [],
@@ -82,10 +82,41 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
    "id": "6d66ad84",
    "metadata": {},
-   "outputs": [],
    "source": [
     "# Load model and tokenizer from Hugging Face Hub\n",
     "model = AutoModel.from_pretrained(\"hemantn/ablang2\", trust_remote_code=True)\n",

   },
   {
    "cell_type": "code",
+   "execution_count": 1,
    "id": "7ae54cd0-6253-46dd-a316-4f20b12041e0",
    "metadata": {},
    "outputs": [],
   },
   {
    "cell_type": "code",
+   "execution_count": 2,
    "id": "99192978-a008-4a32-a80e-bba238e0ec7c",
    "metadata": {},
    "outputs": [],
   },
   {
    "cell_type": "code",
+   "execution_count": 3,
    "id": "6d66ad84",
    "metadata": {},
+   "outputs": [
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "A new version of the following files was downloaded from https://huggingface.co/hemantn/ablang2:\n",
+      "- configuration_ablang2paired.py\n",
+      ". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n",
+      "A new version of the following files was downloaded from https://huggingface.co/hemantn/ablang2:\n",
+      "- modeling_ablang2paired.py\n",
+      ". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n",
+      "/home/hn533621/.conda/envs/lib_transformer/lib/python3.10/site-packages/huggingface_hub/file_download.py:943: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.\n",
+      "  warnings.warn(\n"
+     ]
+    },
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "✅ Loaded custom weights from: /home/hn533621/.cache/huggingface/hub/models--hemantn--ablang2/snapshots/e1df3c0a25269eaeb91c4891125dd9a8580a01b7/model.pt\n"
+     ]
+    },
+    {
+     "name": "stderr",
+     "output_type": "stream",
+     "text": [
+      "A new version of the following files was downloaded from https://huggingface.co/hemantn/ablang2:\n",
+      "- tokenizer_ablang2paired.py\n",
+      ". Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.\n"
+     ]
+    }
+   ],
    "source": [
     "# Load model and tokenizer from Hugging Face Hub\n",
     "model = AutoModel.from_pretrained(\"hemantn/ablang2\", trust_remote_code=True)\n",

test_align.py ADDED Viewed

	@@ -0,0 +1,34 @@

+import sys
+import os
+from transformers import AutoModel, AutoTokenizer
+from transformers.utils import cached_file
+# Load model and tokenizer from Hugging Face Hub
+print("Loading model and tokenizer...")
+model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)
+# Find the cached model directory and import adapter
+adapter_path = cached_file("hemantn/ablang2", "adapter.py")
+cached_model_dir = os.path.dirname(adapter_path)
+sys.path.insert(0, cached_model_dir)
+# Import and create the adapter
+from adapter import AbLang2PairedHuggingFaceAdapter
+ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
+# Test sequences from the notebook
+test_sequences = [
+    ['EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS',
+     'DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK']
+]
+print("Testing restore without alignment:")
+result_no_align = ablang(test_sequences, mode='restore', align=False)
+print(f"Result (no align): {result_no_align[0]}")
+print("\nTesting restore with alignment:")
+result_with_align = ablang(test_sequences, mode='restore', align=True)
+print(f"Result (with align): {result_with_align[0]}")
+print("\nBoth options work correctly!")

test_app_output.py ADDED Viewed

	@@ -0,0 +1,56 @@

+import sys
+import os
+from transformers import AutoModel, AutoTokenizer
+from transformers.utils import cached_file
+# Load model and tokenizer from Hugging Face Hub
+print("Loading model and tokenizer...")
+model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
+tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)
+# Find the cached model directory and import adapter
+adapter_path = cached_file("hemantn/ablang2", "adapter.py")
+cached_model_dir = os.path.dirname(adapter_path)
+sys.path.insert(0, cached_model_dir)
+# Import and create the adapter
+from adapter import AbLang2PairedHuggingFaceAdapter
+ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
+def restore_sequences(heavy_chain, light_chain, use_align=False):
+    try:
+        # Prepare input sequences
+        if heavy_chain.strip() and light_chain.strip():
+            sequences = [[heavy_chain.strip(), light_chain.strip()]]
+        elif heavy_chain.strip():
+            sequences = [[heavy_chain.strip(), ""]]
+        elif light_chain.strip():
+            sequences = [["", light_chain.strip()]]
+        else:
+            return "Please provide at least one antibody chain sequence."
+        # Perform restoration
+        restored = ablang(sequences, mode='restore', align=use_align)
+        # Format output
+        if hasattr(restored, '__len__') and len(restored) > 0:
+            result = restored[0]  # Get the first (and only) result
+            return result
+        else:
+            return "Error: No restoration result obtained."
+    except Exception as e:
+        return f"Error during restoration: {str(e)}"
+# Test the function
+heavy_chain = "EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS"
+light_chain = "DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK"
+result = restore_sequences(heavy_chain, light_chain, False)
+print("="*80)
+print("APP OUTPUT TEST:")
+print("="*80)
+print(result)
+print("="*80)
+print(f"Result length: {len(result)}")
+print(f"Result type: {type(result)}")

test_integrated_adapter.py ADDED Viewed

	@@ -0,0 +1,158 @@

+#!/usr/bin/env python3
+"""
+Test script for the integrated AbLang2 adapter functionality.
+This script tests that all the utility files are properly integrated and the adapter works correctly.
+"""
+import sys
+import os
+# Global variable to store the adapter class
+AbLang2PairedHuggingFaceAdapter = None
+def test_imports():
+    """Test that all imports work correctly"""
+    global AbLang2PairedHuggingFaceAdapter
+    print("🔍 Testing imports...")
+    try:
+        # Test utility imports
+        from restoration import AbRestore
+        print("✅ AbRestore imported successfully")
+        from ablang_encodings import AbEncoding
+        print("✅ AbEncoding imported successfully")
+        from alignment import AbAlignment
+        print("✅ AbAlignment imported successfully")
+        from scores import AbScores
+        print("✅ AbScores imported successfully")
+        from extra_utils import res_to_seq, res_to_list
+        print("✅ extra_utils functions imported successfully")
+        # Test adapter import
+        from adapter import AbLang2PairedHuggingFaceAdapter
+        print("✅ AbLang2PairedHuggingFaceAdapter imported successfully")
+        print("\n🎉 All imports successful!")
+        return True
+    except ImportError as e:
+        print(f"❌ Import error: {e}")
+        return False
+    except Exception as e:
+        print(f"❌ Unexpected error: {e}")
+        return False
+def test_model_loading():
+    """Test model loading from Hugging Face"""
+    global AbLang2PairedHuggingFaceAdapter
+    print("\n🔍 Testing model loading...")
+    try:
+        from transformers import AutoModel, AutoTokenizer
+        print("Loading model from Hugging Face...")
+        model = AutoModel.from_pretrained("hemantn/ablang2", trust_remote_code=True)
+        print("✅ Model loaded successfully")
+        print("Loading tokenizer from Hugging Face...")
+        tokenizer = AutoTokenizer.from_pretrained("hemantn/ablang2", trust_remote_code=True)
+        print("✅ Tokenizer loaded successfully")
+        print("Creating adapter...")
+        ablang = AbLang2PairedHuggingFaceAdapter(model=model, tokenizer=tokenizer)
+        print("✅ Adapter created successfully")
+        return True, ablang
+    except ImportError as e:
+        print(f"❌ Transformers not available: {e}")
+        print("   This is expected if transformers is not installed")
+        return False, None
+    except Exception as e:
+        print(f"❌ Model loading error: {e}")
+        return False, None
+def test_restore_functionality(ablang):
+    """Test the restore functionality"""
+    print("\n🔍 Testing restore functionality...")
+    try:
+        # Test sequences
+        test_sequences = [
+            ["EVQ***SGGEVKKPGASVKVSCRASGYTFRNYGLTWVRQAPGQGLEWMGWISAYNGNTNYAQKFQGRVTLTTDTSTSTAYMELRSLRSDDTAVYFCAR**PGHGAAFMDVWGTGTTVTVSS",
+             "DIQLTQSPLSLPVTLGQPASISCRSS*SLEASDTNIYLSWFQQRPGQSPRRLIYKI*NRDSGVPDRFSGSGSGTHFTLRISRVEADDVAVYYCMQGTHWPPAFGQGTKVDIK"]
+        ]
+        print("Testing restore without alignment...")
+        result = ablang(test_sequences, mode='restore', align=False)
+        print(f"✅ Restore result: {result}")
+        print("Testing restore with alignment...")
+        result_align = ablang(test_sequences, mode='restore', align=True)
+        print(f"✅ Restore with alignment result: {result_align}")
+        return True
+    except Exception as e:
+        print(f"❌ Restore functionality error: {e}")
+        return False
+def test_encoding_functionality(ablang):
+    """Test the encoding functionality"""
+    print("\n🔍 Testing encoding functionality...")
+    try:
+        test_sequences = [
+            ["EVQLVESGGGLVQPGGSLRLSCAASGFTFSSYAMGWVRQAPGKGLEWVSAISGSGGSTYYADSVKGRFTISRDNSKNTLYLQMNSLRAEDTAVYYCARDYPGHGAAFMDVWGQGTTVTVSS",
+             "DIQLTQSPSSLSASVGDRVTITCRASQSISSYLNWYQQKPGKAPKLLIYASSLQSGVPSRFSGSGSGTDFTLTISSLQPEDFATYYCQQSYSTPTTFGQGTKVEIK"]
+        ]
+        print("Testing sequence coding...")
+        result = ablang(test_sequences, mode='seqcoding')
+        print(f"✅ Sequence coding result shape: {result.shape if hasattr(result, 'shape') else len(result)}")
+        return True
+    except Exception as e:
+        print(f"❌ Encoding functionality error: {e}")
+        return False
+def main():
+    """Main test function"""
+    print("🧬 AbLang2 Integrated Adapter Test")
+    print("=" * 50)
+    # Test imports
+    if not test_imports():
+        print("\n❌ Import tests failed. Exiting.")
+        return
+    # Test model loading
+    model_loaded, ablang = test_model_loading()
+    if model_loaded and ablang is not None:
+        # Test functionality
+        test_restore_functionality(ablang)
+        test_encoding_functionality(ablang)
+        print("\n🎉 All tests completed successfully!")
+        print("\n📋 Summary:")
+        print("✅ All utility files integrated")
+        print("✅ Adapter imports working")
+        print("✅ Model loading successful")
+        print("✅ Restore functionality working")
+        print("✅ Encoding functionality working")
+    else:
+        print("\n⚠️  Model loading test skipped (transformers not available)")
+        print("✅ Core integration tests passed")
+        print("✅ Ready for deployment")
+if __name__ == "__main__":
+    main()