README update
- LICENSE +1 -1
- header.md +7 -3
- instructions.md +18 -10
LICENSE CHANGED
@@ -1,4 +1,4 @@
-Copyright (c) 2024, Massimo G. Totaro All rights reserved.
+Copyright (c) 2024-2025, Massimo G. Totaro All rights reserved.
 
 Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
 

header.md CHANGED
@@ -1,5 +1,9 @@
-Calculate the fitness of single amino acid substitutions on proteins, using a [zero-shot](https://doi.org/10.1101/2021.07.09.450648)
+Calculate the fitness of single amino acid substitutions on proteins, using a [zero-shot](https://doi.org/10.1101/2021.07.09.450648) protein language predictor of the [ESM model family](https://huggingface.co/facebook/esm2_t6_8M_UR50D).
 
-**
+**UPDATE:**
+[Profluent-Bio](https://huggingface.co/Profluent-Bio)'s [E1 model family](https://huggingface.co/Profluent-Bio/E1-150m) is now available for inference.
+
+**WARNING:**
 Due to high server traffic, the tool might become slow or unresponsive.
-In this case, it is recommended to duplicate and clone the space in your personal HuggingFace account by clicking
+In this case, it is recommended to duplicate and clone the space in your personal HuggingFace account by clicking [here](https://huggingface.co/spaces/thaidaev/zsp?duplicate=true).
+In the top right corner, there are options to run the app locally or clone the repository.

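The zero-shot scoring the header refers to, and the masked-marginals strategy behind the **Use higher accuracy** option in instructions.md, can be sketched as a log-odds score: the model's log-probability of the mutant residue minus that of the wild-type residue at the substituted position, in the spirit of the linked zero-shot preprint. This is a minimal sketch, not the Space's code: the `log_probs` interface, the `A12G`-style 1-based labels and the `toy_log_probs` stand-in model are all hypothetical, chosen so the example runs without downloading an ESM or E1 checkpoint.

```python
import math
from typing import Callable, Dict, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
MASK = "<mask>"

# Hypothetical model interface (NOT the ESM or E1 API): takes a list of tokens and
# returns, for every position, log-probabilities over the 20 standard amino acids.
LogProbFn = Callable[[List[str]], List[Dict[str, float]]]


def parse_substitution(sub: str) -> Tuple[str, int, str]:
    """Split an assumed 'A12G'-style label into (wild type, 0-based index, mutant)."""
    return sub[0], int(sub[1:-1]) - 1, sub[-1]


def _check(seq: str, sub: str) -> Tuple[str, int, str]:
    """Parse a label and make sure it agrees with the wild-type sequence."""
    wt, i, mt = parse_substitution(sub)
    if not 0 <= i < len(seq) or seq[i] != wt:
        raise ValueError(f"{sub} does not match the wild-type sequence")
    return wt, i, mt


def wt_marginal_scores(seq: str, subs: List[str], log_probs: LogProbFn) -> Dict[str, float]:
    """Faster strategy: one pass over the unmasked wild type, reused for every substitution."""
    lp = log_probs(list(seq))
    scores = {}
    for sub in subs:
        wt, i, mt = _check(seq, sub)
        scores[sub] = lp[i][mt] - lp[i][wt]  # log p(mutant) - log p(wild type)
    return scores


def masked_marginal_scores(seq: str, subs: List[str], log_probs: LogProbFn) -> Dict[str, float]:
    """Masked marginals: hide the substituted position, then take the same log-odds."""
    per_position: Dict[int, Dict[str, float]] = {}  # one pass per distinct masked position
    scores = {}
    for sub in subs:
        wt, i, mt = _check(seq, sub)
        if i not in per_position:
            tokens = list(seq)
            tokens[i] = MASK
            per_position[i] = log_probs(tokens)[i]
        scores[sub] = per_position[i][mt] - per_position[i][wt]
    return scores


def toy_log_probs(tokens: List[str]) -> List[Dict[str, float]]:
    """Hypothetical stand-in model: favours residues seen at the same or adjacent positions."""
    out = []
    for i in range(len(tokens)):
        logits = {a: 0.0 for a in AMINO_ACIDS}
        for j in (i - 1, i, i + 1):
            if 0 <= j < len(tokens) and tokens[j] in logits:  # MASK is skipped
                logits[tokens[j]] += 2.0 if j == i else 1.0
        log_norm = math.log(sum(math.exp(v) for v in logits.values()))
        out.append({a: v - log_norm for a, v in logits.items()})  # log-softmax
    return out


if __name__ == "__main__":
    seq = "MKV"
    wt_scores = wt_marginal_scores(seq, ["K2M"], toy_log_probs)
    masked_scores = masked_marginal_scores(seq, ["K2M"], toy_log_probs)
    print({k: round(v, 3) for k, v in wt_scores.items()})      # {'K2M': -1.0}
    print({k: round(v, 3) for k, v in masked_scores.items()})  # {'K2M': 1.0}
```

With the toy model, the unmasked pass sees the wild-type `K` at its own position and scores `K2M` at -1.0, while masking that position removes this self-bias and gives +1.0, a toy illustration of why the slower masked strategy can change results. In this sketch the masked strategy costs one forward pass per distinct substituted position, versus a single pass for wild-type marginals.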
instructions.md CHANGED
@@ -7,7 +7,7 @@ If the server remains idle for a period, it will enter standby mode. Running a c
 ## Input
 
 **Sequence**: Enter the full amino acid sequence to be analyzed in the **Sequence** text box.
-
+Note: While jolly characters (e.g., `-X.B`) can be included, they currently cannot be visualized.
 
 **Substitutions**: Specify the substitutions you wish to test in the **Substitutions** box. The tool supports three running modes based on your input:
 
@@ -16,22 +16,25 @@ If the server remains idle for a period, it will enter standby mode. Running a c
 - **Same-Length Sequence**: Analyze differing amino acid substitutions one by one within sequences of equal length.
 - **Different Inputs**: For any other input format, a deep mutational scan of the full sequence will be performed.
 
-**Model Selection**: Choose
-
+**Model Selection**: Choose a model for calculations from those available on Hugging Face Model Hub.
+The `esm2_t33_650M_UR50D` model offers an optimal balance between cost and accuracy [*](https://doi.org/10.1126/science.ade2574).
 
 **Accuracy Option**: The **Use higher accuracy** option applies a masked-marginals scoring strategy, which considers sequence context during inference.
-
+While this method is slower, it enhances accuracy.
+If you experience long runtimes, unchecking this option can significantly speed up calculations at the cost of some accuracy.
 
-**Deep Mutational Scan Recommendations**: When performing a deep mutational scan, it is advisable to use smaller models (8M, 35M, or 150M parameters) due to significant runtime concerns
-
-
-
+**Deep Mutational Scan Recommendations**: When performing a deep mutational scan, it is advisable to use smaller models (8M, 35M, or 150M parameters) due to significant runtime concerns, especially with longer sequences or during peak server usage times.
+For example, calculating a 300-residue-long sequence with larger models may require over 30 minutes.
+Generally, accuracy is more affected by the scoring strategy than by model size; therefore, prioritize reducing model size when optimizing for runtime.
+The computational cost of the scoring strategy scales with the number of substitutions tested, while model cost scales with wild-type sequence length.
 
-**Concurrent Substitutions**:
+**Concurrent Substitutions**:
+To calculate the effect of multiple concurrent substitutions, you must manually change the input sequence and rerun the calculation.
+Accuracy is not guaranteed as this use case is yet untested.
 
 ## Output
 
-Results are displayed in a
+Results are displayed in a colour-coded table, except for deep mutational scans, which produce a heatmap.
 In the table:
 
 - Beneficial substitutions are highlighted in green with positive values.
@@ -44,6 +47,11 @@ As a rule of thumb, score differences of *4* or more are considered significant.
 
 The **Download raw data** button lets you download the output in CSV format.
 
+## Debugging
+
+A basic error message will be displayed if the tool fails to process your input, but you can also check the server's [logs](https://huggingface.co/spaces/thaidaev/zsp?logs=container) for additional information.
+The logs also show a progress bar that indicates how far along the calculation is.
+
 
 **If you use this tool in your research, please cite**:
 

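The two input modes visible in this diff, **Same-Length Sequence** and the deep-mutational-scan fallback for **Different Inputs**, can be sketched as below. This is an illustrative sketch only: the `A12G`-style 1-based labels, the function names and the rule `plan_run` uses to pick a mode are assumptions, and the first of the three running modes is not part of this diff, so it is left out.

```python
from typing import List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def substitutions_from_same_length(wt: str, variant: str) -> List[str]:
    """'Same-Length Sequence' mode: one single substitution per differing position."""
    if len(wt) != len(variant):
        raise ValueError("sequences must have the same length")
    return [f"{a}{i}{b}" for i, (a, b) in enumerate(zip(wt, variant), start=1) if a != b]


def deep_mutational_scan(wt: str) -> List[str]:
    """Every substitution to a different standard amino acid, position by position."""
    return [f"{a}{i}{b}" for i, a in enumerate(wt, start=1) for b in AMINO_ACIDS if b != a]


def plan_run(wt: str, user_input: str) -> Tuple[str, List[str]]:
    """Hypothetical mode selection covering only the two modes shown in this diff."""
    variant = "".join(user_input.split()).upper()  # drop whitespace and newlines
    if len(variant) == len(wt):
        return "same-length", substitutions_from_same_length(wt, variant)
    return "dms", deep_mutational_scan(wt)  # any other input falls back to a full scan


if __name__ == "__main__":
    print(plan_run("MKV", "MRV"))                          # ('same-length', ['K2R'])
    print(len(plan_run("M" * 300, "not a sequence")[1]))   # 5700
```

A deep mutational scan grows by 19 substitutions per residue, so the 300-residue example from the instructions already means 5,700 substitutions to score, which fits the runtime warning attached to that example.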