# K-mer–based Sequence Predictor

This Space predicts the most likely group for each **unknown sequence** using group-specific **unique k-mers** generated by the companion Space:

**Unique k-mer discovery Space:** https://huggingface.co/spaces//

---

## Overview

This tool assigns each unknown sequence to a group by detecting group-specific k-mers in it and computing a confidence score for the assignment. It is designed to work directly with the `kmer_results.zip` produced by the Unique k-mer discovery Space.

---

## Inputs

### 1. Unknown sequences

Upload one or more FASTA files containing the unknown sequences. Accepted extensions:

- `.fa`, `.fasta`, `.fas`, `.fna`

### 2. K-mer results ZIP

Upload the **`kmer_results.zip`** generated by the Unique k-mer discovery Space.

> **Note:** This Space accepts k-mers only as a ZIP archive, to ensure compatibility
> and reproducibility.

---

## Parameters

- **Sequence type**
  - `dna` or `protein`
- **Mode**
  - **fast**: exact k-mer matching (recommended)
  - **full**: alignment-based matching + Fisher test + FDR (slower)
- **Identity / Coverage / FDR**
  - Used only in *full* mode

---

## Outputs

- **predictions_by_alignment.csv**
  - One row per sequence
  - Predicted group and confidence metrics
- **predicted_results_summary.png**
  - Group counts and confidence distribution
- **prediction_outputs.zip**
  - ZIP archive containing all outputs

---

## Performance notes

- **fast** mode is recommended for large datasets.
- **full** mode is computationally intensive and best suited to small validation sets.

---

## Citation

If you use this tool, please cite:

Muhamed-Kheir TAHA, Institut Pasteur, Paris, France.

---

## License

Others
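---

## Illustrative sketch: fast-mode matching

This README does not document how *fast* mode computes its confidence score or how `kmer_results.zip` is laid out. The snippet below is only a minimal sketch of the general idea of exact k-mer matching, assuming that each group's unique k-mers have already been loaded into a Python set and that all k-mers share a single length `k`. The function names `kmers` and `predict`, and the confidence definition (hits for the top group divided by hits across all groups), are hypothetical illustrations, not this Space's actual implementation.

```python
from __future__ import annotations

from collections import Counter


def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq (uppercased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def predict(seq: str, group_kmers: dict[str, set[str]], k: int):
    """Assign seq to the group whose unique k-mers it contains most often.

    Returns (group, hits, confidence); group is None when nothing matches.
    The confidence used here is a hypothetical score (top-group hits divided
    by hits summed over all groups), chosen only for illustration.
    """
    present = kmers(seq, k)
    # Count how many of each group's unique k-mers occur exactly in seq.
    hits = Counter({g: len(present & ks) for g, ks in group_kmers.items()})
    total = sum(hits.values())
    if total == 0:
        return None, 0, 0.0
    # On a tie, most_common(1) keeps the group listed first in group_kmers.
    group, top = hits.most_common(1)[0]
    return group, top, top / total


# Hypothetical toy data: two groups with 5-mer "unique" k-mers.
groups = {"A": {"ACGTA", "CGTAC"}, "B": {"TTTTG"}}
print(predict("GGACGTACGG", groups, k=5))
```

With the toy data, the query contains both of group A's k-mers and none of group B's, so it is assigned to A with a confidence of 1.0. A sequence that shares no k-mer with any group is left unassigned rather than forced into a group.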