FtsI_Classifier / README.txt

	# K-mer–based Sequence Predictor

	This Space predicts the most likely group of unknown sequences using
	group-specific unique k-mers generated by the companion Space:

	Unique k-mer discovery Space:
	https://huggingface.co/spaces/<your-username>/<space-1-name>

	---

	## Overview

	This tool assigns each unknown sequence to a group by detecting
	group-specific k-mers and computing a confidence score.
	It is designed to work directly with the `kmer_results.zip`
	produced by the Unique k-mer discovery Space.
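
The idea of assigning a sequence by its group-specific k-mers can be sketched as follows. This is a minimal illustration, not the Space's actual code: the function name `classify`, the in-memory `group_kmers` dictionary, and the confidence definition (share of all k-mer hits that belong to the winning group) are assumptions made for the example.

```python
from typing import Dict, Optional, Set, Tuple


def classify(sequence: str, group_kmers: Dict[str, Set[str]], k: int) -> Tuple[Optional[str], float]:
    """Return (best_group, confidence) from exact k-mer hits.

    Confidence here is an ASSUMED definition: hits for the best group
    divided by hits across all groups.
    """
    # Collect the distinct k-mers present in the unknown sequence.
    observed = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    # Count how many of each group's unique k-mers occur in the sequence.
    hits = {group: len(observed & kmers) for group, kmers in group_kmers.items()}
    total = sum(hits.values())
    if total == 0:
        return None, 0.0  # no group-specific k-mer found
    best_group = max(hits, key=hits.get)
    return best_group, hits[best_group] / total


# Hypothetical toy data: two groups with k = 4.
toy_kmers = {"A": {"ACGT", "CGTA"}, "B": {"TTGA"}}
group, confidence = classify("ACGTATTGA", toy_kmers, k=4)
unassigned = classify("GGGGGG", toy_kmers, k=4)
print(group, round(confidence, 3))
```

In this toy case the sequence carries two k-mers unique to group A and one unique to group B, so it is assigned to A with a confidence of 2/3 under the assumed definition.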

	---

	## Inputs

	### 1. Unknown sequences
	Upload one or more FASTA files containing unknown sequences:
	- `.fa`, `.fasta`, `.fas`, `.fna`
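
For checking a FASTA file before upload, a small standard-library parser is enough. The sketch below is illustrative only; `read_fasta` is a hypothetical helper, and taking the first whitespace-delimited token of the header as the record ID is an assumption rather than the Space's documented behavior.

```python
from typing import Dict, Iterable, List


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence} (sequences upper-cased)."""
    records: Dict[str, List[str]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:].split()
            # Assumed convention: ID is the first token of the header line.
            current = header[0] if header else f"record_{len(records) + 1}"
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {name: "".join(parts) for name, parts in records.items()}


example = """>seq1 sample A
ACGTAC
GTTA
>seq2
ttgaca
"""
sequences = read_fasta(example.splitlines())
print(sequences)
```

To read a real file, pass an open file handle instead of `example.splitlines()`.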

	### 2. K-mer results ZIP
	Upload `kmer_results.zip` generated by the Unique k-mer discovery Space.

	> Note: This Space only accepts ZIP input for k-mers to ensure compatibility
	> and reproducibility.
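
Before uploading, it can help to confirm that the file is a readable ZIP archive and to look at what it contains. The sketch below uses only Python's standard `zipfile` module. The internal layout of `kmer_results.zip` is not described here, so the member names in the demo (`groupA_kmers.txt`, `groupB_kmers.txt`) are hypothetical placeholders.

```python
import io
import zipfile
from typing import List


def list_zip_members(data: bytes) -> List[str]:
    """Return the file names inside a ZIP archive given as raw bytes."""
    buffer = io.BytesIO(data)
    if not zipfile.is_zipfile(buffer):
        raise ValueError("not a valid ZIP archive")
    buffer.seek(0)
    with zipfile.ZipFile(buffer) as archive:
        # Skip directory entries; keep regular files only.
        return [info.filename for info in archive.infolist() if not info.is_dir()]


# Build a small in-memory archive with HYPOTHETICAL member names for the demo.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w") as archive:
    archive.writestr("groupA_kmers.txt", "ACGTA\n")
    archive.writestr("groupB_kmers.txt", "TTGAC\n")

members = list_zip_members(demo.getvalue())
print(members)
```

For a file on disk, read it with `open(path, "rb").read()` and pass the bytes to `list_zip_members`.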

	---

	## Parameters

	- Sequence type
	  - `dna` or `protein`
	- Mode
	  - `fast`: exact k-mer matching (recommended)
	  - `full`: alignment-based matching + Fisher test + FDR (slower)
	- Identity / Coverage / FDR
	  - Used only in `full` mode
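
The statistical step of the full mode (Fisher test followed by FDR control) can be illustrated with a small standard-library sketch. How the Space builds its contingency tables and which FDR procedure it applies are not specified here, so the 2x2 layout below and the choice of the Benjamini-Hochberg procedure are assumptions; `fisher_enrichment_p` and `benjamini_hochberg` are hypothetical helper names.

```python
from math import comb
from typing import List


def fisher_enrichment_p(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    ASSUMED layout: a = k-mers of the candidate group that matched,
    b = k-mers of that group that did not match,
    c = matched k-mers of other groups, d = unmatched k-mers of other groups.
    Returns P(top-left cell >= a) with all margins fixed (hypergeometric tail).
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / comb(n, col1)


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


p_single = fisher_enrichment_p(3, 0, 0, 3)
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(p_single, [round(q, 4) for q in q_values])
```

A prediction would then be kept only if its adjusted p-value falls below the chosen FDR threshold.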

	---

	## Outputs

	- `predictions_by_alignment.csv`
	  - One row per sequence
	  - Predicted group and confidence metrics
	- `predicted_results_summary.png`
	  - Group counts and confidence distribution
	- `prediction_outputs.zip`
	  - ZIP containing all outputs

	---

	## Performance notes

	- The fast mode is recommended for large datasets.
	- The full mode is computationally intensive and best suited for
	small validation sets.

	---

	## Citation

	If you use this tool, please cite:

	Muhamed-Kheir TAHA, Institut Pasteur, Paris, France.

	---

	## License
	Others