# K-mer–based Sequence Predictor
This Space predicts the most likely group of **unknown sequences** using
group-specific **unique k-mers** generated by the companion Space:
**Unique k-mer discovery Space:**
https://huggingface.co/spaces/<your-username>/<space-1-name>
---
## Overview
This tool assigns each unknown sequence to a group by detecting
group-specific k-mers and computing a confidence score.
It is designed to work directly with the `kmer_results.zip`
produced by the Unique k-mer discovery Space.
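
As a rough illustration of the idea, the sketch below assigns a sequence to the group whose unique k-mers it matches most often. It is not the Space's actual code: the function names, the `group_kmers` dictionary layout, and the confidence definition (share of all hits that belong to the winning group) are assumptions made for this example.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (upper-cased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def predict_group(seq, group_kmers, k):
    """Return (best_group, confidence, hits) for one sequence.

    group_kmers maps a group name to its set of unique k-mers
    (hypothetical layout). Confidence is assumed here to be the
    fraction of all k-mer hits that fall in the winning group.
    """
    hits = Counter()
    for kmer in kmers(seq, k):
        for group, unique in group_kmers.items():
            if kmer in unique:
                hits[group] += 1
    total = sum(hits.values())
    if total == 0:
        # No group-specific k-mer found: leave the sequence unassigned.
        return None, 0.0, hits
    best, best_hits = hits.most_common(1)[0]
    return best, best_hits / total, hits


# Toy data, invented for the example.
group_kmers = {
    "groupA": {"ACGT", "CGTA"},
    "groupB": {"TTTT", "GGGG"},
}
group, conf, hits = predict_group("ACGTACGGGG", group_kmers, k=4)
print(group, round(conf, 2))  # groupA 0.67
```

A sequence with no matching k-mer returns `None` with confidence `0.0`, which keeps unassignable sequences distinct from low-confidence calls.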
---
## Inputs
### 1. Unknown sequences
Upload one or more FASTA files containing unknown sequences:
- `.fa`, `.fasta`, `.fas`, `.fna`
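
To check a file locally before uploading, a minimal FASTA reader along these lines may help. It is only a sketch: `read_fasta` and `FASTA_EXTENSIONS` are hypothetical names, and the Space may parse files differently.

```python
import io

# Extensions listed above; the constant name is an assumption.
FASTA_EXTENSIONS = (".fa", ".fasta", ".fas", ".fna")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


example = io.StringIO(">seq1 demo\nACGT\nACGT\n>seq2\nTTTT\n")
records = list(read_fasta(example))
print(records)  # [('seq1 demo', 'ACGTACGT'), ('seq2', 'TTTT')]
```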
### 2. K-mer results ZIP
Upload **`kmer_results.zip`** generated by the Unique k-mer discovery Space.
> **Note:** This Space only accepts ZIP input for k-mers to ensure compatibility
> and reproducibility.
---
## Parameters
- **Sequence type**
  - `dna` or `protein`
- **Mode**
  - **fast**: exact k-mer matching (recommended)
  - **full**: alignment-based matching + Fisher test + FDR (slower)
- **Identity / Coverage / FDR**
  - Used only in *full* mode
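
For readers unfamiliar with the statistics named in *full* mode, the sketch below shows a generic one-sided Fisher exact test followed by Benjamini-Hochberg FDR adjustment, using only the standard library. It is an illustration under assumptions, not the Space's implementation: the 2x2 table layout (matched vs. unmatched k-mers for one group against all other groups), the function names, and the example counts are all invented here.

```python
from math import comb


def fisher_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the table [[a, b], [c, d]].

    Probability of a top-left count of at least `a` with all margins
    fixed (hypergeometric upper tail).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(row1, x) * comb(n - row1, col1 - x)
        for x in range(a, upper + 1)
    )
    return tail / comb(n, col1)


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Assumed layout per group: (group k-mers matched, group k-mers not
# matched, other k-mers matched, other k-mers not matched).
tables = {
    "groupA": (8, 2, 1, 9),
    "groupB": (1, 9, 8, 2),
    "groupC": (3, 7, 3, 7),
}
pvals = [fisher_greater(*t) for t in tables.values()]
qvals = benjamini_hochberg(pvals)
fdr = 0.05  # corresponds to the FDR parameter
significant = [g for g, q in zip(tables, qvals) if q <= fdr]
print(significant)
```

Adjusting across groups keeps the expected share of false group assignments below the chosen FDR threshold when several groups are tested for the same sequence.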
---
## Outputs
- **predictions_by_alignment.csv**
  - One row per sequence
  - Predicted group and confidence metrics
- **predicted_results_summary.png**
  - Group counts and confidence distribution
- **prediction_outputs.zip**
  - ZIP containing all outputs
---
## Performance notes
- The **fast** mode is recommended for large datasets.
- The **full** mode is computationally intensive and best suited for
small validation sets.
---
## Citation
If you use this tool, please cite:
Muhamed-Kheir TAHA, Institut Pasteur, Paris, France.
---
## License
Others