# K-mer–based Sequence Predictor

This Space predicts the most likely group for each **unknown sequence** using the group-specific **unique k-mers** generated by the companion Space:

**Unique k-mer discovery Space:** `https://huggingface.co/spaces/<your-username>/<space-1-name>`

---

## Overview

This tool assigns each unknown sequence to a group by detecting group-specific k-mers in it and computing a confidence score for the assignment. It is designed to work directly with the `kmer_results.zip` archive produced by the Unique k-mer discovery Space.
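
As a rough illustration of the exact-matching idea, the Python sketch below counts how many of each group's unique k-mers occur in a sequence and picks the group with the most hits. It is not this Space's code: `extract_kmers` and `predict_group` are hypothetical names, `k` is assumed to equal the k-mer length used by the discovery Space, and the confidence metric here (the share of all matched unique k-mers that belong to the winning group) is an assumed stand-in for the metrics the Space actually reports.

```python
from __future__ import annotations

from collections import Counter


def extract_kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def predict_group(seq: str, group_kmers: dict[str, set[str]], k: int):
    """Assign seq to the group whose unique k-mers it contains most often.

    Returns (group, confidence, hits). Confidence is the fraction of all
    matched unique k-mers that belong to the winning group (assumed metric).
    """
    kmers = extract_kmers(seq, k)
    # Exact matching: set intersection with each group's unique k-mers.
    hits = Counter({g: len(kmers & ks) for g, ks in group_kmers.items()})
    total = sum(hits.values())
    if total == 0:
        return None, 0.0, dict(hits)  # no group-specific k-mer found
    group, best = hits.most_common(1)[0]
    return group, best / total, dict(hits)


# Toy example with two hypothetical groups and k = 4.
groups = {"A": {"ACGT", "CGTA"}, "B": {"TTTT", "GGGG"}}
group, confidence, hits = predict_group("ACGTACGTTTTT", groups, k=4)
print(group, round(confidence, 3), hits)  # A 0.667 {'A': 2, 'B': 1}
```

The toy sequence contains two of group `A`'s unique k-mers and one of group `B`'s, so it is assigned to `A` with confidence 2/3. Storing each group's k-mers as a set keeps every lookup cheap, which suits large batches of sequences.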

---

## Inputs

### 1. Unknown sequences

Upload one or more FASTA files containing the unknown sequences, with any of these extensions:

- `.fa`, `.fasta`, `.fas`, `.fna`
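
All four extensions denote the same plain-text FASTA format. For checking files locally before uploading, a minimal standard-library reader like the sketch below can help; it is illustrative only, not the parser this Space uses, and `read_fasta` and `FASTA_EXTENSIONS` are hypothetical names. It yields `(header, sequence)` pairs and joins sequences that span several lines.

```python
from __future__ import annotations

from pathlib import Path
from typing import Iterator

# Extensions listed as accepted in this README.
FASTA_EXTENSIONS = {".fa", ".fasta", ".fas", ".fna"}


def read_fasta(path: str | Path) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file.

    Multi-line sequences are joined, blank lines are skipped, and any
    sequence lines that appear before the first '>' header are ignored.
    """
    path = Path(path)
    if path.suffix.lower() not in FASTA_EXTENSIONS:
        raise ValueError(f"unsupported FASTA extension: {path.suffix!r}")
    header, chunks = None, []
    with path.open() as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:  # emit the previous record
                    yield header, "".join(chunks)
                header, chunks = line[1:].strip(), []
            elif header is not None:
                chunks.append(line)
    if header is not None:  # emit the last record
        yield header, "".join(chunks)
```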

### 2. K-mer results ZIP

Upload the **`kmer_results.zip`** archive generated by the Unique k-mer discovery Space.

> **Note:** To ensure compatibility and reproducibility, this Space accepts k-mers only as a ZIP archive.
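
This README does not document the internal layout of `kmer_results.zip`. Purely to illustrate how such an archive could be read with Python's standard `zipfile` module, the sketch below assumes a **hypothetical** layout of one `<group>.txt` file per group with one k-mer per line; `load_group_kmers` is likewise a made-up name, and the real archive may be organized differently.

```python
from __future__ import annotations

import zipfile
from pathlib import PurePosixPath


def load_group_kmers(source) -> dict[str, set[str]]:
    """Map group name -> set of unique k-mers read from a ZIP archive.

    `source` is a filesystem path or a binary file object.
    ASSUMED (hypothetical) layout: one '<group>.txt' member per group,
    one k-mer per line. Adapt this to the real kmer_results.zip layout.
    """
    groups: dict[str, set[str]] = {}
    with zipfile.ZipFile(source) as zf:
        for name in zf.namelist():
            member = PurePosixPath(name)
            if name.endswith("/") or member.suffix != ".txt":
                continue  # skip directory entries and non-k-mer files
            text = zf.read(name).decode("utf-8")
            kmers = {line.strip().upper() for line in text.splitlines() if line.strip()}
            groups.setdefault(member.stem, set()).update(kmers)
    return groups
```

Because `zipfile.ZipFile` accepts either a path or a binary file object, the same function works on a saved file or on uploaded bytes wrapped in `io.BytesIO`.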

---

## Parameters

- **Sequence type**
  - `dna` or `protein`
- **Mode**
  - **fast**: exact k-mer matching (recommended)
  - **full**: alignment-based matching + Fisher's exact test + FDR correction (slower)
- **Identity / Coverage / FDR**
  - Used only in *full* mode
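
The README names the *full*-mode statistics only as a Fisher test plus FDR. As a hedged sketch of what those two steps typically involve, the code below computes a one-sided Fisher's exact p-value directly from the hypergeometric distribution and then applies the Benjamini–Hochberg procedure, one common FDR correction. The 2×2 table layout, the one-sided alternative, and the choice of Benjamini–Hochberg are all assumptions, not details taken from this Space; `fisher_exact_greater` and `benjamini_hochberg` are hypothetical names.

```python
from __future__ import annotations

from math import comb


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test p-value for enrichment of cell a.

    Table layout (assumed):  [[a, b],
                              [c, d]]
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / comb(n, col1)


def benjamini_hochberg(pvalues: list[float]) -> list[float]:
    """Benjamini–Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Two hypothetical per-group tables, then the FDR adjustment.
pvals = [fisher_exact_greater(3, 0, 0, 3), fisher_exact_greater(1, 0, 0, 1)]
qvals = benjamini_hochberg(pvals)
```

A group's call would then be kept when its adjusted value is at or below the chosen FDR threshold, e.g. `[q <= 0.05 for q in qvals]`.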

---

## Outputs

- **predictions_by_alignment.csv**
  - One row per sequence
  - Predicted group and confidence metrics
- **predicted_results_summary.png**
  - Group counts and confidence distribution
- **prediction_outputs.zip**
  - ZIP archive containing all of the above outputs

---

## Performance notes

- The **fast** mode is recommended for large datasets.
- The **full** mode is computationally intensive and best suited to small validation sets.

---

## Citation

If you use this tool, please cite:

Muhamed-Kheir TAHA, Institut Pasteur, Paris, France.

---

## License

Others