DeepActionPotential committed on
Commit 49a4bb8 · verified · 1 Parent(s): b773593

🚀 Initial upload of my app

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ demo/demo.mp4 filter=lfs diff=lfs merge=lfs -text
LICENSE ADDED
@@ -0,0 +1,21 @@
+ MIT License
+
+ Copyright (c) 2025 Eslam Tarek
+
+ Permission is hereby granted, free of charge, to any person obtaining a copy
+ of this software and associated documentation files (the "Software"), to deal
+ in the Software without restriction, including without limitation the rights
+ to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ copies of the Software, and to permit persons to whom the Software is
+ furnished to do so, subject to the following conditions:
+
+ The above copyright notice and this permission notice shall be included in all
+ copies or substantial portions of the Software.
+
+ THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+ AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+ SOFTWARE.
README.md CHANGED
@@ -1,12 +1,99 @@
  ---
- title: GeneTypeClassifier
- emoji: 😻
- colorFrom: blue
- colorTo: purple
- sdk: gradio
- sdk_version: 5.47.2
- app_file: app.py
- pinned: false
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+ # GeneTypeClassifier: Fast gene type prediction with a trained Gradient Boosting pipeline
+
+ [![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/) [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
+
+ A lightweight Gradio app to classify gene records using a pre-trained Gradient Boosting model. Point it at a nucleotide sequence and a short description, and get a predicted gene type.
+
+ ---
+
+ ## Table of Contents
+ - **[Demo](#demo)**
+ - **[Features](#features)**
+ - **[Installation / Setup](#installation--setup)**
+ - **[Usage](#usage)**
+ - **[Configuration / Options](#configuration--options)**
+ - **[Contributing](#contributing)**
+ - **[License](#license)**
+ - **[Acknowledgements / Credits](#acknowledgements--credits)**
+
+ ---
+
+ ## Demo
+
+ Below are real assets from `./demo/`:
+
+ ![App screenshot](./demo/demo.png)
+
+ <video src="./demo/demo.mp4" controls width="720" title="Demo video"></video>
+
+ ---
+
+ ## Features
+ - **Pretrained model**: Ships with `models/gradient_boosting_pipeline.pkl` and `models/label_encoder.pkl`.
+ - **Simple UI**: Gradio interface for quick local testing and sharing.
+ - **Deterministic preprocessing**: `get_kmers()` utility for k-mer tokenization baked into the pipeline serialization context.
+ - **Reproducible setup**: Minimal, pinned `requirements.txt`.
+
  ---
+
+ ## Installation / Setup
+
+ ```bash
+ # Create a virtual environment
+ python -m venv .venv
+
+ # Activate it
+ # On Linux/Mac:
+ source .venv/bin/activate
+ # On Windows:
+ .venv\Scripts\activate
+
+ # Install dependencies
+ pip install -r requirements.txt
+ ```
+
+ ---
+
+ ## Usage
+
+ Run the Gradio app locally:
+
+ ```bash
+ python app.py
+ ```
+
+ This launches the UI defined in `app.py`/`ui.py` and loads the pretrained artifacts from `models/`:
+ - `models/gradient_boosting_pipeline.pkl`
+ - `models/label_encoder.pkl`
+
+ In the UI, provide:
+ - `Nucleotide Sequence` (e.g., ATG...)
+ - `Description`
+
+ The app returns the predicted gene type (e.g., `PROTEIN_CODING`, `ncRNA`, etc.).
+
+ > Note: The pickled pipeline expects the helper `get_kmers()` from `utils.py`. Keep the file layout unchanged when running the app.
+
+ ---
+
+ ## Configuration / Options
+ - **Model paths**: The UI loads from `models/`. To swap models, replace the `.pkl` files with compatible artifacts and keep the filenames or update the paths in `ui.py` (`pipeline` and `label_encoder` loaders).
+ - **Gradio server options**: To customize host/port, edit `demo.launch()` in `app.py`, e.g. `demo.launch(server_name="0.0.0.0", server_port=7860)`.
+
+ ---
+
+ ## Contributing
+ - **Issues & ideas**: Open an issue describing the change and rationale.
+ - **PRs**: Keep changes focused, add clear descriptions, and update docs if behavior changes.
+ - **Style**: Prefer small, readable functions and explicit dependencies.
+
+ ---
+
+ ## License
+
+ This project is licensed under the [MIT License](LICENSE).
+
  ---
 
+ ## Acknowledgements / Credits
+ - Built with **Gradio** for the UI and **scikit-learn** for the model pipeline. 🧬🚀
__pycache__/app.cpython-311.pyc ADDED
Binary file (1.41 kB). View file
 
__pycache__/ui.cpython-311.pyc ADDED
Binary file (2.49 kB). View file
 
__pycache__/utils.cpython-311.pyc ADDED
Binary file (935 Bytes). View file
 
app.py ADDED
@@ -0,0 +1,12 @@
+ import joblib
+ import pandas as pd
+ from utils import get_kmers
+ from ui import build_ui
+
+
+
+
+ # 🔹 Entry point
+ if __name__ == "__main__":
+     demo = build_ui()
+     demo.launch()
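Note on `demo.launch()`: the README added in this commit suggests editing this call to change the host or port. As a minimal sketch of an alternative, the keyword arguments could be read from the environment so `app.py` does not need to be edited per deployment. The helper name `launch_kwargs_from_env` and the variable names `APP_HOST` and `APP_PORT` are hypothetical and not part of this commit; `server_name` and `server_port` are the `launch()` parameters the README itself mentions.

```python
import os


def launch_kwargs_from_env(env=None):
    """Build keyword arguments for demo.launch() from environment variables.

    APP_HOST and APP_PORT are hypothetical names chosen for this sketch.
    Unset or empty variables are skipped so Gradio's defaults apply.
    """
    env = os.environ if env is None else env
    kwargs = {}
    host = env.get("APP_HOST")
    if host:
        kwargs["server_name"] = host
    port = env.get("APP_PORT")
    if port:
        kwargs["server_port"] = int(port)  # raises ValueError on a non-numeric port
    return kwargs


# In app.py this could be used as:
#     demo = build_ui()
#     demo.launch(**launch_kwargs_from_env())
print(launch_kwargs_from_env({"APP_HOST": "0.0.0.0", "APP_PORT": "7860"}))
# → {'server_name': '0.0.0.0', 'server_port': 7860}
```

With neither variable set, the helper returns an empty dict and `demo.launch()` behaves exactly as in the committed code.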
demo/demo.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:057d6b5a9b4bfda588f0581ba98c3206e2bf3fc35922047259c278d22f7a05d3
+ size 236085
demo/demo.png ADDED
gene-type-classifier-using-gbc-f1-97.ipynb ADDED
The diff for this file is too large to render. See raw diff
 
models/gradient_boosting_pipeline.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9c58b51e8f3f3d0f190268f2db5b9e64320e0af7d78eecf969e913654a7e3b39
+ size 1221382
models/label_encoder.pkl ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:54eb9b27a26c49ad3f9ffe0318c29f5732977686d0334ad76a66b5c3a7b4874c
+ size 407
requirements.txt ADDED
@@ -0,0 +1,4 @@
+ gradio==4.44.0
+ pandas==2.2.2
+ scikit-learn==1.5.2
+ joblib==1.4.2
ui.py ADDED
@@ -0,0 +1,58 @@
+ import gradio as gr
+ from utils import get_kmers  # ensure this is accessible for joblib
+ import joblib
+ import pandas as pd
+
+ # 🔹 Load trained pipeline & label encoder
+ pipeline = joblib.load("models/gradient_boosting_pipeline.pkl")
+ label_encoder = joblib.load("models/label_encoder.pkl")
+
+ idx_to_label = {
+     0: "BIOLOGICAL_REGION",
+     1: "OTHER",
+     2: "PROTEIN_CODING",
+     3: "PSEUDO",
+     4: "ncRNA",
+     5: "rRNA",
+     6: "scRNA",
+     7: "snRNA",
+     8: "snoRNA",
+     9: "tRNA"
+ }
+
+
+ def predict_gene(sequence, description):
+     """
+     Run prediction for a single sample (called from UI).
+     """
+     data = {
+         "NucleotideSequence": sequence,
+         "GeneGroupMethod": 'NCBI Ortholog',
+         "Description": description,
+         "SequenceLength": int(len(sequence)),
+     }
+     df = pd.DataFrame([data])
+
+     pred = pipeline.predict(df)[0]
+     return idx_to_label.get(pred, "Unknown")
+
+
+
+
+ def build_ui():
+     with gr.Blocks() as demo:
+         gr.Markdown("# 🧬 Gene Type Classifier")
+         gr.Markdown("Enter gene details below and get predictions:")
+
+         sequence = gr.Textbox(label="Nucleotide Sequence", placeholder="Enter sequence...")
+         description = gr.Textbox(label="Description", placeholder="Enter description...")
+
+         output = gr.Textbox(label="Prediction Result")
+
+         gr.Button("Predict").click(
+             predict_gene,
+             inputs=[sequence, description],
+             outputs=output
+         )
+
+     return demo
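Note on `idx_to_label`: `ui.py` loads `label_encoder` but decodes predictions through a hand-written dict. The hard-coded indices happen to match plain sorted order of the class names, which is also how scikit-learn's `LabelEncoder` orders its `classes_`. The sketch below checks that with the standard library only. It assumes, without confirmation from this commit, that `models/label_encoder.pkl` holds a `LabelEncoder` fitted on exactly these ten names.

```python
# The ten class names hard-coded in ui.py, deliberately shuffled here.
labels = [
    "tRNA", "PSEUDO", "snoRNA", "OTHER", "ncRNA",
    "PROTEIN_CODING", "scRNA", "BIOLOGICAL_REGION", "snRNA", "rRNA",
]

# Assumption: indices were assigned in sorted (code point) order,
# so uppercase names come before lowercase ones.
derived = dict(enumerate(sorted(labels)))

# The mapping exactly as written in ui.py.
hard_coded = {
    0: "BIOLOGICAL_REGION", 1: "OTHER", 2: "PROTEIN_CODING", 3: "PSEUDO",
    4: "ncRNA", 5: "rRNA", 6: "scRNA", 7: "snRNA", 8: "snoRNA", 9: "tRNA",
}

print(derived == hard_coded)  # → True
```

If the assumption about the pickled encoder holds, `dict(enumerate(label_encoder.classes_))` would give the same mapping at runtime and keep it in sync when the model is retrained with a different label set.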
utils.py ADDED
@@ -0,0 +1,8 @@
+ def get_kmers(sequence, size=3):
+     """
+     Convert a nucleotide sequence into k-mers (default: 3-mers).
+     Example: 'ATGC' -> 'ATG TGC'
+     """
+     if not isinstance(sequence, str):
+         return ""
+     return " ".join([sequence[i:i+size] for i in range(len(sequence) - size + 1)])
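Note on `get_kmers()`: a short worked example of the sliding window may help when reviewing this helper. The function body below is copied from the diff so the snippet runs on its own. The sample sequences are made up for illustration.

```python
def get_kmers(sequence, size=3):
    """Copy of utils.get_kmers from this commit, repeated so the example is self-contained."""
    if not isinstance(sequence, str):
        return ""
    return " ".join([sequence[i:i+size] for i in range(len(sequence) - size + 1)])


# A sequence of length n yields n - size + 1 overlapping k-mers.
print(get_kmers("ATGCGT"))        # → ATG TGC GCG CGT
print(get_kmers("ATGC", size=2))  # → AT TG GC

# Edge cases: inputs shorter than `size`, or non-string inputs, give an empty string.
print(repr(get_kmers("AT")))      # → ''
print(repr(get_kmers(None)))      # → ''
```

The function does no case normalization or alphabet check, so lowercase or non-ACGT characters pass through unchanged into the k-mers.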