Space: IBIBoW/FtsI_Classifier

Muhamed-Kheir committed on Dec 17, 2025

Commit

47a59ac

1 Parent(s): aa33062

Update README.txt

Files changed (1)

README.txt +75 -18

@@ -1,18 +1,75 @@
-# Multi-group unique k-mer analysis
-This tool compares multiple groups of FASTA sequences (one directory per group) and identifies **k-mers unique to each group** relative to all other groups. It outputs per-group TSV files, a summary Excel file, and two plots.
-## Install
-pip install -r requirements.txt
-## Run
-python kmer_unique.py \
-  --group-dirs path/to/groupA path/to/groupB path/to/groupC \
-  --k-min 15 --k-max 31 --min-freq 5 \
-  --outdir results
-## Outputs
-- `results/unique_k{k}_{group}.tsv` : unique k-mers and counts
-- `results/kmer_summary.xlsx` : summary table across k
-- `results/unique_kmers_per_group.png`
-- `results/total_freq_per_group.png`

+# K-mer–based Sequence Predictor
+This Space predicts the most likely group of **unknown sequences** using
+group-specific **unique k-mers** generated by the companion Space:
+**Unique k-mer discovery Space:**
+https://huggingface.co/spaces/<your-username>/<space-1-name>
+---
+## Overview
+This tool assigns each unknown sequence to a group by detecting
+group-specific k-mers and computing a confidence score.
+It is designed to work directly with the `kmer_results.zip`
+produced by the Unique k-mer discovery Space.
+---
+## Inputs
+### 1. Unknown sequences
+Upload one or more FASTA files containing unknown sequences:
+- `.fa`, `.fasta`, `.fas`, `.fna`
+### 2. K-mer results ZIP
+Upload **`kmer_results.zip`** generated by the Unique k-mer discovery Space.
+> This Space only accepts ZIP input for k-mers to ensure compatibility
+> and reproducibility.
+---
+## Parameters
+- **Sequence type**
+  - `dna` or `protein`
+- **Mode**
+  - **fast**: exact k-mer matching (recommended)
+  - **full**: alignment-based matching + Fisher test + FDR (slower)
+- **Identity / Coverage / FDR**
+  - Used only in *full* mode
+---
+## Outputs
+- **predictions_by_alignment.csv**
+  - One row per sequence
+  - Predicted group and confidence metrics
+- **predicted_results_summary.png**
+  - Group counts and confidence distribution
+- **prediction_outputs.zip**
+  - ZIP containing all outputs
+---
+## Performance notes
+- The **fast** mode is recommended for large datasets.
+- The **full** mode is computationally intensive and best suited for
+  small validation sets.
+---
+## Citation
+If you use this tool, please cite:
+Muhamed-Kheir TAHA, Institut Pasteur, Paris, France.
+---
+## License
+Others
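
The new README describes the **fast** mode only as "exact k-mer matching" and does not say how the confidence score is computed. The sketch below is one plausible reading, not the Space's actual code: it counts how many of a sequence's k-mers appear in each group's unique k-mer set and reports the winning group's share of all matches as the confidence. The names `kmers`, `predict_group`, and `group_kmers`, the dictionary layout, and the confidence formula are all assumptions made for illustration.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping k-mer of length k in seq (uppercased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def predict_group(seq, group_kmers, k):
    """Assign seq to the group whose unique k-mers it matches most often.

    group_kmers: dict mapping group name -> set of k-mers unique to that group.
    This structure is hypothetical; the README does not describe the ZIP layout.
    Returns (group, confidence), or (None, 0.0) when no k-mer matches.
    """
    hits = Counter()
    for kmer in kmers(seq, k):
        for group, unique in group_kmers.items():
            if kmer in unique:
                hits[group] += 1
    total = sum(hits.values())
    if total == 0:
        return None, 0.0
    best, best_hits = hits.most_common(1)[0]
    # Assumed confidence: share of all matched k-mers that support the winner.
    return best, best_hits / total


# Toy data: two groups with two unique 4-mers each.
group_kmers = {
    "groupA": {"ACGT", "CGTA"},
    "groupB": {"TTTT", "GGGG"},
}
result = predict_group("ACGTAGGGG", group_kmers, k=4)
```

With the toy data above, the sequence matches two k-mers from `groupA` and one from `groupB`, so it is assigned to `groupA`. Set lookups keep each k-mer check close to constant time, which is consistent with the README recommending the fast mode for large datasets.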