# K-mer–based Sequence Predictor

This Space predicts the most likely group for each **unknown sequence** using group-specific **unique k-mers** generated by the companion Space:

**Unique k-mer discovery Space:**
https://huggingface.co/spaces/<your-username>/<space-1-name>

---

## Overview

This tool assigns each unknown sequence to a group by detecting group-specific k-mers in it and computing a confidence score for the assignment.
It is designed to work directly with the `kmer_results.zip` archive produced by the Unique k-mer discovery Space.

---

## Inputs

### 1. Unknown sequences

Upload one or more FASTA files containing the unknown sequences. Accepted extensions:

- `.fa`, `.fasta`, `.fas`, `.fna`

### 2. K-mer results ZIP

Upload the **`kmer_results.zip`** archive generated by the Unique k-mer discovery Space.

> This Space accepts k-mers only as a ZIP archive, to ensure compatibility
> and reproducibility.

---

## Parameters

- **Sequence type**
  - `dna` or `protein`
- **Mode**
  - **fast**: exact k-mer matching (recommended)
  - **full**: alignment-based matching + Fisher test + FDR correction (slower)
- **Identity / Coverage / FDR**
  - Thresholds used only in *full* mode

---

## Outputs

- **predictions_by_alignment.csv**
  - One row per sequence
  - Predicted group and confidence metrics
- **predicted_results_summary.png**
  - Plot of predicted group counts and the confidence distribution
- **prediction_outputs.zip**
  - ZIP archive containing all of the outputs above

---

## Performance notes

- The **fast** mode is recommended for large datasets.
- The **full** mode is computationally intensive and best suited to small validation sets.

---

## Citation

If you use this tool, please cite:

Muhamed-Kheir TAHA, Institut Pasteur, Paris, France.

---

## License

Others
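
---

## Example: fast-mode matching sketch

The sections above describe *fast* mode only at a high level: exact k-mer matching followed by a confidence score. The sketch below illustrates one way such matching could work; it is not this Space's implementation. It assumes the unique k-mers have already been loaded from `kmer_results.zip` into a dictionary mapping each group name to a set of k-mers (the ZIP layout is not documented here), that all k-mers share a known length `k`, and it uses a hypothetical confidence score: the best group's hit count divided by the total hits across all groups. The function names `extract_kmers` and `predict_group` are illustrative.

```python
from collections import Counter


def extract_kmers(seq: str, k: int) -> set[str]:
    """Return the set of all overlapping substrings of length k in seq.

    Sequences are upper-cased; the group k-mers are assumed to be upper-case too.
    """
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def predict_group(seq: str, group_kmers: dict[str, set[str]], k: int):
    """Assign seq to the group whose unique k-mers it contains most.

    Returns (best_group, confidence, hits). The confidence formula is a
    hypothetical choice: best-group hits / total hits over all groups.
    Returns (None, 0.0, hits) when no group k-mer is found.
    """
    seq_kmers = extract_kmers(seq, k)
    # Exact matching: count distinct unique k-mers of each group present in seq.
    hits = Counter({g: len(seq_kmers & kmers) for g, kmers in group_kmers.items()})
    total = sum(hits.values())
    if total == 0:
        return None, 0.0, hits
    best, best_hits = hits.most_common(1)[0]
    return best, best_hits / total, hits


if __name__ == "__main__":
    # Toy, made-up k-mer sets (k = 5) for two hypothetical groups.
    groups = {
        "groupA": {"ACGTA", "CGTAC"},
        "groupB": {"TTTTG", "TTTGC"},
    }
    group, conf, hits = predict_group("ACGTACGTTTTG", groups, k=5)
    print(group, round(conf, 3))  # groupA 0.667 (2 of 3 hits are groupA k-mers)
```

Counting distinct matched k-mers (a set intersection) rather than every occurrence keeps a repetitive sequence from inflating one group's score; whether the Space makes the same choice is not stated. This sketch also ignores reverse complements of DNA k-mers.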