Space: thaidaev/zsp

mgtotaro committed on Nov 17, 2025

Commit 7d6838f (1 parent: 08446bb)

README update

Files changed (3)

LICENSE +1 -1
header.md +7 -3
instructions.md +18 -10

LICENSE CHANGED

@@ -1,4 +1,4 @@
-Copyright (c) 2024, Massimo G. Totaro All rights reserved.
 Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:


+Copyright (c) 2024-2025, Massimo G. Totaro All rights reserved.
 Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

header.md CHANGED

@@ -1,5 +1,9 @@
-Calculate the fitness of single amino acid substitutions on proteins, using a [zero-shot](https://doi.org/10.1101/2021.07.09.450648) [language model predictor](https://github.com/facebookresearch/esm)
-**WARNING:**
 Due to high server traffic, the tool might become slow or unresponsive.
-In this case, it is recommended to duplicate and clone the space in your personal HuggingFace account by clicking the top right menu.

+Calculate the fitness of single amino acid substitutions on proteins, using a [zero-shot](https://doi.org/10.1101/2021.07.09.450648) protein language predictor of the [ESM model family](https://huggingface.co/facebook/esm2_t6_8M_UR50D).
+**UPDATE:**
+[Profluent-Bio](https://huggingface.co/Profluent-Bio)'s [E1 model family](https://huggingface.co/Profluent-Bio/E1-150m) is now available for inference.
+**WARNING:**
 Due to high server traffic, the tool might become slow or unresponsive.
+In this case, it is recommended to duplicate and clone the space in your personal HuggingFace account by clicking [here](https://huggingface.co/spaces/thaidaev/zsp?duplicate=true).
+In the top right corner, there are options to run the app locally or clone the repository.

instructions.md CHANGED

@@ -7,7 +7,7 @@ If the server remains idle for a period, it will enter standby mode. Running a c
 ## Input
 **Sequence**: Enter the full amino acid sequence to be analyzed in the **Sequence** text box.
-  Note: While jolly characters (e.g., `-X.B`) can be included, they currently cannot be visualised.
 **Substitutions**: Specify the substitutions you wish to test in the **Substitutions** box. The tool supports three running modes based on your input:
@@ -16,22 +16,25 @@ If the server remains idle for a period, it will enter standby mode. Running a c
 - **Same-Length Sequence**: Analyze differing amino acid substitutions one by one within sequences of equal length.
 - **Different Inputs**: For any other input format, a deep mutational scan of the full sequence will be performed.
-**Model Selection**: Choose an ESM model for calculations from those available on Hugging Face Model Hub.
-  The model `esm2_t33_650M_UR50D` offers an optimal balance between cost and accuracy [*](https://doi.org/10.1126/science.ade2574).
 **Accuracy Option**: The **Use higher accuracy** option applies a masked-marginals scoring strategy, which considers sequence context during inference.
-  While this method is slower, it enhances accuracy. If you experience long runtimes, unchecking this option can significantly speed up calculations at the cost of some accuracy.
-**Deep Mutational Scan Recommendations**: When performing a deep mutational scan, it is advisable to use smaller models (8M, 35M, or 150M parameters) due to significant runtime concerns—especially with longer sequences or during peak server usage times.
-  For example, calculating a 300-residue-long sequence with larger models may require over 30 minutes.
-  Generally, accuracy is more affected by the scoring strategy than by model size; therefore, prioritise reducing model size when optimizing for runtime.
-  The computational cost of the scoring strategy scales with the number of substitutions tested, while model cost scales with wild-type sequence length.
-**Concurrent Substitutions**: To calculate the effect of multiple concurrent substitutions, you must manually change the input sequence and rerun the calculation. Accuracy is not guaranteed as this use case is yet untested.
 ## Output
-Results are displayed in a color-coded table, except for deep mutational scans, which produce a heatmap.
 In the table:
 - Beneficial substitutions are highlighted in green with positive values.
@@ -44,6 +47,11 @@ As a rule of thumb, score differences of *4* or more are considered significant.
 The **Download raw data** button lets you download the output in CSV format.
 **If you use this tool in your research, please cite**:

 ## Input
 **Sequence**: Enter the full amino acid sequence to be analyzed in the **Sequence** text box.
+Note: While jolly characters (e.g., `-X.B`) can be included, they currently cannot be visualized.
 **Substitutions**: Specify the substitutions you wish to test in the **Substitutions** box. The tool supports three running modes based on your input:
 - **Same-Length Sequence**: Analyze differing amino acid substitutions one by one within sequences of equal length.
 - **Different Inputs**: For any other input format, a deep mutational scan of the full sequence will be performed.
+**Model Selection**: Choose a model for calculations from those available on Hugging Face Model Hub.
+The `esm2_t33_650M_UR50D` model offers an optimal balance between cost and accuracy [*](https://doi.org/10.1126/science.ade2574).
 **Accuracy Option**: The **Use higher accuracy** option applies a masked-marginals scoring strategy, which considers sequence context during inference.
+While this method is slower, it enhances accuracy.
+If you experience long runtimes, unchecking this option can significantly speed up calculations at the cost of some accuracy.
+**Deep Mutational Scan Recommendations**: When performing a deep mutational scan, it is advisable to use smaller models (8M, 35M, or 150M parameters) due to significant runtime concerns, especially with longer sequences or during peak server usage times.
+For example, calculating a 300-residue-long sequence with larger models may require over 30 minutes.
+Generally, accuracy is more affected by the scoring strategy than by model size; therefore, prioritize reducing model size when optimizing for runtime.
+The computational cost of the scoring strategy scales with the number of substitutions tested, while model cost scales with wild-type sequence length.
+**Concurrent Substitutions**:
+To calculate the effect of multiple concurrent substitutions, you must manually change the input sequence and rerun the calculation.
+Accuracy is not guaranteed as this use case is yet untested.
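
A minimal sketch of the masked-marginals idea named under **Accuracy Option**, not the tool's actual code. It assumes each substitution is scored as the log-probability of the mutant residue minus that of the wild-type residue, both read at the position while it is masked. The `toy_log_probs` helper is a hypothetical stand-in for a protein language model, and all function names are illustrative only.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def masked_marginal_scores(sequence, masked_log_probs):
    """Score every single substitution as log p(mutant) - log p(wild type).

    masked_log_probs[i] maps each amino acid to its log-probability at
    position i, as predicted with position i masked (assumed input format).
    """
    scores = {}
    for i, wt in enumerate(sequence):
        for mt in AMINO_ACIDS:
            if mt == wt:
                continue  # skip the wild-type residue itself
            label = f"{wt}{i + 1}{mt}"  # 1-based label such as "K2A"
            scores[label] = masked_log_probs[i][mt] - masked_log_probs[i][wt]
    return scores


def toy_log_probs(sequence, wt_prob=0.62):
    """Hypothetical model stand-in that always favours the wild-type residue."""
    other = (1.0 - wt_prob) / (len(AMINO_ACIDS) - 1)
    return [
        {aa: math.log(wt_prob if aa == wt else other) for aa in AMINO_ACIDS}
        for wt in sequence
    ]


sequence = "MKT"
scores = masked_marginal_scores(sequence, toy_log_probs(sequence))
n_substitutions = len(scores)  # 19 alternatives per position
```

Under this sign convention a positive score favours the mutant, in line with the green highlighting described in the Output section. With 19 alternatives per position, a deep mutational scan of a 300-residue sequence covers 19 × 300 = 5,700 substitutions, which illustrates why runtime grows quickly with sequence length.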
 ## Output
+Results are displayed in a colour-coded table, except for deep mutational scans, which produce a heatmap.
 In the table:
 - Beneficial substitutions are highlighted in green with positive values.
 The **Download raw data** button lets you download the output in CSV format.
+## Debugging
+A basic error message will be displayed if the tool fails to process your input, but you can also check the server's [logs](https://huggingface.co/spaces/thaidaev/zsp?logs=container) for additional information.
+The logs also show a progress bar that indicates how far along the calculation is.
 **If you use this tool in your research, please cite**: