Space: DeepActionPotential/GeneTypeClassifier

DeepActionPotential committed on Sep 29, 2025

Commit 49a4bb8 · verified · 1 Parent(s): b773593

🚀 Initial upload of my app

Files changed (15)

.gitattributes +1 -0
LICENSE +21 -0
README.md +96 -9
__pycache__/app.cpython-311.pyc +0 -0
__pycache__/ui.cpython-311.pyc +0 -0
__pycache__/utils.cpython-311.pyc +0 -0
app.py +12 -0
demo/demo.mp4 +3 -0
demo/demo.png +0 -0
gene-type-classifier-using-gbc-f1-97.ipynb +0 -0
models/gradient_boosting_pipeline.pkl +3 -0
models/label_encoder.pkl +3 -0
requirements.txt +4 -0
ui.py +58 -0
utils.py +8 -0

.gitattributes CHANGED

@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+demo/demo.mp4 filter=lfs diff=lfs merge=lfs -text

LICENSE ADDED

	@@ -0,0 +1,21 @@

+MIT License
+Copyright (c) 2025 Eslam Tarek
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

README.md CHANGED

@@ -1,12 +1,99 @@
 ---
-title: GeneTypeClassifier
-emoji: 😻
-colorFrom: blue
-colorTo: purple
-sdk: gradio
-sdk_version: 5.47.2
-app_file: app.py
-pinned: false
 ---
-Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

+# GeneTypeClassifier — Fast gene type prediction with a trained Gradient Boosting pipeline
+[![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org/) [![License: MIT](https://img.shields.io/badge/License-MIT-green.svg)](LICENSE)
+A lightweight Gradio app to classify gene records using a pre-trained Gradient Boosting model. Point it at a nucleotide sequence and a short description, and get a predicted gene type.
+---
+## Table of Contents
+- **[Demo](#demo)**
+- **[Features](#features)**
+- **[Installation / Setup](#installation--setup)**
+- **[Usage](#usage)**
+- **[Configuration / Options](#configuration--options)**
+- **[Contributing](#contributing)**
+- **[License](#license)**
+- **[Acknowledgements / Credits](#acknowledgements--credits)**
+---
+## Demo
+Below are real assets from `./demo/`:
+![App screenshot](./demo/demo.png)
+<video src="./demo/demo.mp4" controls width="720" title="Demo video"></video>
+---
+## Features
+- **Pretrained model**: Ships with `models/gradient_boosting_pipeline.pkl` and `models/label_encoder.pkl`.
+- **Simple UI**: Gradio interface for quick local testing and sharing.
+- **Deterministic preprocessing**: The `get_kmers()` utility handles k-mer tokenization. The pickled pipeline references it, so it must be importable when the model is loaded.
+- **Reproducible setup**: Minimal, pinned `requirements.txt`.
 ---
+## Installation / Setup
+```bash
+# Create a virtual environment
+python -m venv .venv
+# Activate it
+# On Linux/Mac:
+source .venv/bin/activate
+# On Windows:
+.venv\Scripts\activate
+# Install dependencies
+pip install -r requirements.txt
+```
+---
+## Usage
+Run the Gradio app locally:
+```bash
+python app.py
+```
+This launches the UI built in `ui.py` (started from `app.py`) and loads the pretrained artifacts from `models/`:
+- `models/gradient_boosting_pipeline.pkl`
+- `models/label_encoder.pkl`
+In the UI, provide:
+- `Nucleotide Sequence` (e.g., ATG...)
+- `Description`
+The app returns the predicted gene type (e.g., `PROTEIN_CODING` or `ncRNA`).
+> Note: The pickled pipeline expects the helper `get_kmers()` from `utils.py`. Keep the file layout unchanged when running the app.
+---
+## Configuration / Options
+- **Model paths**: The UI loads from `models/`. To swap models, replace the `.pkl` files with compatible artifacts and keep the filenames or update the paths in `ui.py` (`pipeline` and `label_encoder` loaders).
+- **Gradio server options**: To customize host/port, edit `demo.launch()` in `app.py`, e.g. `demo.launch(server_name="0.0.0.0", server_port=7860)`.
+---
+## Contributing
+- **Issues & ideas**: Open an issue describing the change and rationale.
+- **PRs**: Keep changes focused, add clear descriptions, and update docs if behavior changes.
+- **Style**: Prefer small, readable functions and explicit dependencies.
+---
+## License
+This project is licensed under the [MIT License](LICENSE).
 ---
+## Acknowledgements / Credits
+- Built with **Gradio** for the UI and **scikit-learn** for the model pipeline. 🧬🚀

__pycache__/app.cpython-311.pyc ADDED

Binary file (1.41 kB).

__pycache__/ui.cpython-311.pyc ADDED

Binary file (2.49 kB).

__pycache__/utils.cpython-311.pyc ADDED

Binary file (935 Bytes).

app.py ADDED

	@@ -0,0 +1,12 @@

+import joblib
+import pandas as pd
+from utils import get_kmers
+from ui import build_ui
+# 🔹 Entry point
+if __name__ == "__main__":
+    demo = build_ui()
+    demo.launch()

demo/demo.mp4 ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:057d6b5a9b4bfda588f0581ba98c3206e2bf3fc35922047259c278d22f7a05d3
+size 236085

demo/demo.png ADDED

gene-type-classifier-using-gbc-f1-97.ipynb ADDED

The diff for this file is too large to render.

models/gradient_boosting_pipeline.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9c58b51e8f3f3d0f190268f2db5b9e64320e0af7d78eecf969e913654a7e3b39
+size 1221382

models/label_encoder.pkl ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:54eb9b27a26c49ad3f9ffe0318c29f5732977686d0334ad76a66b5c3a7b4874c
+size 407
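
The `.pkl` and `.mp4` entries in this commit are Git LFS pointer files rather than the binaries themselves: each holds a spec version, a SHA-256 object ID, and the size in bytes. Below is a small sketch of reading those three fields. The function name `parse_lfs_pointer` is an illustration, not a Git LFS API:

```python
def parse_lfs_pointer(text):
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        # Each line is "<key> <value>"; split on the first space only.
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid has the form "<hash algorithm>:<hex digest>".
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algorithm": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


# Pointer content copied from models/label_encoder.pkl in this commit.
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:54eb9b27a26c49ad3f9ffe0318c29f5732977686d0334ad76a66b5c3a7b4874c
size 407
"""

if __name__ == "__main__":
    print(parse_lfs_pointer(POINTER))
```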

requirements.txt ADDED

	@@ -0,0 +1,4 @@

+gradio==4.44.0
+pandas==2.2.2
+scikit-learn==1.5.2
+joblib==1.4.2

ui.py ADDED

	@@ -0,0 +1,58 @@

+import gradio as gr
+from utils import get_kmers  # ensure this is accessible for joblib
+import joblib
+import pandas as pd
+# 🔹 Load trained pipeline & label encoder
+pipeline = joblib.load("models/gradient_boosting_pipeline.pkl")
+label_encoder = joblib.load("models/label_encoder.pkl")
+idx_to_label = {
+    0: "BIOLOGICAL_REGION",
+    1: "OTHER",
+    2: "PROTEIN_CODING",
+    3: "PSEUDO",
+    4: "ncRNA",
+    5: "rRNA",
+    6: "scRNA",
+    7: "snRNA",
+    8: "snoRNA",
+    9: "tRNA"
+}
+def predict_gene(sequence, description):
+    """
+    Run prediction for a single sample (called from UI).
+    Returns the gene type name, or "Unknown" if the predicted index is unmapped.
+    """
+    # Single-row record passed to the pipeline.
+    data = {
+        "NucleotideSequence": sequence,
+        "GeneGroupMethod": 'NCBI Ortholog',  # fixed value; not exposed in the UI
+        "Description": description,
+        "SequenceLength": len(sequence),
+    }
+    df = pd.DataFrame([data])
+    pred = pipeline.predict(df)[0]
+    return idx_to_label.get(pred, "Unknown")
+def build_ui():
+    with gr.Blocks() as demo:
+        gr.Markdown("# 🧬 Gene Type Classifier")
+        gr.Markdown("Enter gene details below and get predictions:")
+        sequence = gr.Textbox(label="Nucleotide Sequence", placeholder="Enter sequence...")
+        description = gr.Textbox(label="Description", placeholder="Enter description...")
+        output = gr.Textbox(label="Prediction Result")
+        gr.Button("Predict").click(
+            predict_gene,
+            inputs=[sequence, description],
+            outputs=output
+        )
+    return demo
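
`predict_gene` in `ui.py` wraps the two text inputs in a single-row record and maps the predicted class index back to a name. The sketch below reproduces only that shaping and decoding in plain Python, so it can be checked without loading the model. `build_record` and `decode_prediction` are hypothetical helper names that do not appear in the commit:

```python
# Label map copied from ui.py in this commit.
IDX_TO_LABEL = {
    0: "BIOLOGICAL_REGION", 1: "OTHER", 2: "PROTEIN_CODING", 3: "PSEUDO",
    4: "ncRNA", 5: "rRNA", 6: "scRNA", 7: "snRNA", 8: "snoRNA", 9: "tRNA",
}


def build_record(sequence, description):
    """Shape the UI inputs into the record that predict_gene builds."""
    return {
        "NucleotideSequence": sequence,
        "GeneGroupMethod": "NCBI Ortholog",  # fixed value, as in ui.py
        "Description": description,
        "SequenceLength": len(sequence),
    }


def decode_prediction(index):
    """Map a class index to its label, falling back to 'Unknown'."""
    return IDX_TO_LABEL.get(index, "Unknown")


if __name__ == "__main__":
    print(build_record("ATGC", "example description"))
    print(decode_prediction(2))
```

Separating the shaping step from the model call makes it possible to unit-test the input format independently of the pickled artifacts.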

utils.py ADDED

	@@ -0,0 +1,8 @@

+def get_kmers(sequence, size=3):
+    """
+    Convert a nucleotide sequence into k-mers (default: 3-mers).
+    Example: 'ATGC' -> 'ATG TGC'
+    """
+    if not isinstance(sequence, str):
+        return ""
+    return " ".join([sequence[i:i+size] for i in range(len(sequence) - size + 1)])
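
A quick check of how `get_kmers` behaves at the edges, using a copy of the function from `utils.py` so the snippet runs on its own. A sequence shorter than `size` yields an empty string, as does any non-string input:

```python
def get_kmers(sequence, size=3):
    """Copy of utils.get_kmers: overlapping k-mers joined by spaces."""
    if not isinstance(sequence, str):
        return ""
    return " ".join(sequence[i:i + size] for i in range(len(sequence) - size + 1))


if __name__ == "__main__":
    print(get_kmers("ATGCGA"))          # ATG TGC GCG CGA
    print(get_kmers("ATGCGA", size=4))  # ATGC TGCG GCGA
    print(repr(get_kmers("AT")))        # ''
    print(repr(get_kmers(None)))        # ''
```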