yangliz5 committed on
Commit
896a01a
·
1 Parent(s): c55d8da

dev: first commit

Files changed (5)
  1. README.md +129 -7
  2. app.py +769 -0
  3. logo-pixel.svg +77 -0
  4. logo.png +0 -0
  5. requirements.txt +33 -0
README.md CHANGED
@@ -1,14 +1,136 @@
  ---
- title: ChimeraLM
- emoji: 🔥
- colorFrom: purple
  colorTo: purple
  sdk: gradio
- sdk_version: 5.49.1
  app_file: app.py
- pinned: false
  license: apache-2.0
- short_description: 'A genomic language model that distinguishes true structural '
  ---
 
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
  ---
+ title: ChimeraLM - Chimeric Read Detector
+ emoji: 🧬
+ colorFrom: blue
  colorTo: purple
  sdk: gradio
+ sdk_version: "5.0.0"
  app_file: app.py
+ pinned: true
  license: apache-2.0
+ tags:
+ - genomics
+ - bioinformatics
+ - deep-learning
+ - dna-sequence
+ - chimera-detection
+ - whole-genome-amplification
+ - pytorch
+ - lightning
  ---
 
+ # 🧬 ChimeraLM: Chimeric Read Detector
+
+ <div align="center">
+
+ ![ChimeraLM](https://img.shields.io/badge/ChimeraLM-v1.0.4-blue?style=for-the-badge)
+ ![Python](https://img.shields.io/badge/Python-3.10+-green?style=for-the-badge)
+ ![PyTorch](https://img.shields.io/badge/PyTorch-2.5-orange?style=for-the-badge)
+ ![License](https://img.shields.io/badge/License-Apache_2.0-red?style=for-the-badge)
+
+ **Advanced Chimeric Read Detection using Deep Learning**
+
+ [🏠 Homepage](https://github.com/ylab-hi/ChimeraLM) | [📚 Documentation](https://ylab-hi.github.io/ChimeraLM/) | [🤗 Model](https://huggingface.co/yangliz5/chimeralm) | [📦 PyPI](https://pypi.org/project/chimeralm/)
+
+ </div>
+
+ ---
+
+ ## 🚀 What is ChimeraLM?
+
+ ChimeraLM is a state-of-the-art genomic language model designed to identify **chimeric artifacts** introduced by whole genome amplification (WGA). Chimeric reads are artificial DNA sequences where fragments from different genomic locations are incorrectly joined together during the amplification process.
+
+ ### ⚡ Key Features
+
+ - **🎯 High Accuracy**: 98%+ accuracy in detecting chimeric vs biological reads
+ - **⚡ Fast Inference**: Optimized for both CPU and GPU (CUDA/MPS)
+ - **📏 Long Sequences**: Supports DNA sequences up to 32,768 nucleotides
+ - **🤖 Pre-trained Model**: Ready-to-use model from Hugging Face Hub
+ - **🔬 Research-Grade**: Trained on real WGA data from genomic studies
+
+ ### 🧬 How It Works
+
+ 1. **Input**: DNA sequence (A, C, G, T, N nucleotides)
+ 2. **Processing**: HyenaDNA-based transformer model analyzes the sequence
+ 3. **Output**: Binary classification (Biological vs Chimeric) with confidence scores
+
+ ---
+
+ ## 💡 Use Cases
+
+ - **Quality Control**: Filter chimeric artifacts from WGA sequencing data
+ - **Genomic Analysis**: Improve accuracy of variant calling and assembly
+ - **Research**: Study patterns in whole genome amplification
+ - **Education**: Learn about chimeric artifacts and deep learning in genomics
+
+ ---
+
+ ## 🛠️ Installation & CLI Usage
+
+ For batch processing and production use, install the CLI tool:
+
+ ```bash
+ # Install via pip
+ pip install chimeralm
+
+ # Predict chimeric reads
+ chimeralm predict your_data.bam --gpus 1 --batch-size 24
+
+ # Filter BAM to remove chimeric reads
+ chimeralm filter your_data.bam predictions/
+ ```
+
+ **Requirements**: Python 3.10, 3.11, or 3.12
+
+ ---
+
+ ## 📊 Model Architecture
+
+ - **Backbone**: HyenaDNA-small-32k (256-dim embeddings)
+ - **Head**: Binary Sequence Classifier with attention pooling
+ - **Loss**: CrossEntropyLoss
+ - **Optimizer**: AdamW (lr=1e-4, weight_decay=0.01)
+ - **Training Data**: Real WGA chimeric artifacts from genomic studies
+
+ **Model Size**: ~50M parameters
+ **Inference Speed**: ~1000 sequences/second (GPU)
+
+ ---
+
+ ## 📖 Citation
+
+ If you use ChimeraLM in your research, please cite:
+
+ ```bibtex
+ @software{chimeralm2025,
+ title={ChimeraLM: A genomic language model to identify chimera artifacts},
+ author={Li, Yangyang and Guo, Qingxiang and Yang, Rendong},
+ year={2025},
+ url={https://github.com/ylab-hi/ChimeraLM}
+ }
+ ```
+
+ ---
+
+ ## 🔗 Resources
+
+ - **GitHub**: [ylab-hi/ChimeraLM](https://github.com/ylab-hi/ChimeraLM)
+ - **Documentation**: [ylab-hi.github.io/ChimeraLM](https://ylab-hi.github.io/ChimeraLM/)
+ - **Model Hub**: [yangliz5/chimeralm](https://huggingface.co/yangliz5/chimeralm)
+ - **PyPI Package**: [pypi.org/project/chimeralm](https://pypi.org/project/chimeralm/)
+
+ ---
+
+ ## 📝 License
+
+ This project is licensed under the **Apache License 2.0** - see the [LICENSE](https://github.com/ylab-hi/ChimeraLM/blob/main/LICENSE) file for details.
+
+ ---
+
+ <div align="center">
+
+ **⭐ Star us on GitHub if you find this useful!**
+
+ [⭐ Star on GitHub](https://github.com/ylab-hi/ChimeraLM) | [🐛 Report Bug](https://github.com/ylab-hi/ChimeraLM/issues) | [💡 Request Feature](https://github.com/ylab-hi/ChimeraLM/issues)
+
+ </div>
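
The "How It Works" steps in the README end in a binary classification with confidence scores. A minimal sketch of that last step, assuming the model emits two raw logits ordered as (Biological, Chimeric Artifact), which is the order of the `class_names` list in app.py. `classify_logits` is a hypothetical helper written for illustration and is not part of the chimeralm package:

```python
import math

CLASS_NAMES = ["Biological", "Chimeric Artifact"]  # order taken from app.py


def classify_logits(logits: list[float]) -> tuple[str, float, dict[str, float]]:
    """Turn two raw logits into a label, its confidence, and per-class probabilities."""
    # Subtract the largest logit before exponentiating for numerical stability.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # The predicted class is the one with the highest probability.
    best = max(range(len(probs)), key=probs.__getitem__)
    return CLASS_NAMES[best], probs[best], dict(zip(CLASS_NAMES, probs))


label, confidence, breakdown = classify_logits([0.2, 2.4])
print(label, f"{confidence:.3f}", breakdown)
```

The confidence reported this way is simply the softmax probability of the winning class, so it is never below 0.5 for a two-class model.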
app.py ADDED
@@ -0,0 +1,769 @@
+ """Gradio Web UI for ChimeraLM - Hugging Face Spaces Version."""
+
+ import logging
+ import os
+
+ import gradio as gr
+ import plotly.graph_objects as go
+ import torch
+
+ import chimeralm
+ from chimeralm.data.tokenizer import load_tokenizer_from_hyena_model
+
+ # Set up logging
+ logging.basicConfig(
+     level=logging.INFO,
+     format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
+ )
+ logger = logging.getLogger(__name__)
+
+
+ class ChimeraLMPredictor:
+     """ChimeraLM predictor for web interface."""
+
+     def __init__(self):
+         self.model = None
+         self.tokenizer = None
+         self.device = "cuda" if torch.cuda.is_available() else "cpu"
+         logger.info(f"Using device: {self.device}")
+         self._load_model()
+
+     def _load_model(self):
+         """Load the ChimeraLM model and tokenizer."""
+         try:
+             logger.info("Loading ChimeraLM model from Hugging Face Hub...")
+             self.model = chimeralm.models.ChimeraLM.from_pretrained("yangliz5/chimeralm")
+             self.model.eval()
+             self.model.to(self.device)
+
+             logger.info("Loading tokenizer...")
+             self.tokenizer = load_tokenizer_from_hyena_model("hyenadna-small-32k-seqlen")
+             logger.info(f"✅ Model loaded successfully on {self.device}")
+         except Exception as e:
+             logger.error(f"❌ Failed to load model: {e}")
+             raise
+
+     def predict(self, sequence: str) -> tuple[str, float, dict]:
+         """Predict if a DNA sequence is chimeric or biological."""
+         if not sequence or not sequence.strip():
+             return "Please enter a DNA sequence", 0.0, {}
+
+         # Clean and validate sequence
+         sequence = sequence.strip().upper()
+         valid_chars = set("ACGTNacgtn")
+         if not all(c in valid_chars for c in sequence):
+             return "Invalid characters in sequence. Only A, C, G, T, N are allowed.", 0.0, {}
+
+         sequence = sequence.upper()
+
+         try:
+             # Tokenize sequence
+             tokenized = self.tokenizer(
+                 sequence,
+                 truncation=True,
+                 padding=True,
+                 max_length=32768,
+                 return_tensors="pt",
+             )
+
+             # Extract input_ids and move to device
+             input_ids = tokenized["input_ids"].to(self.device)
+             input_quals = None  # We don't have quality scores for web input
+
+             # Make prediction
+             with torch.no_grad():
+                 logits = self.model(input_ids, input_quals)
+                 probabilities = torch.softmax(logits, dim=-1)
+                 predicted_class = torch.argmax(probabilities, dim=-1).item()
+                 confidence = probabilities[0][predicted_class].item()
+
+             # Interpret results
+             class_names = ["Biological", "Chimeric Artifact"]
+             prediction = class_names[predicted_class]
+
+             # Create confidence breakdown
+             confidence_breakdown = {
+                 "Biological": f"{probabilities[0][0].item():.3f}",
+                 "Chimeric Artifact": f"{probabilities[0][1].item():.3f}",
+             }
+
+             logger.info(f"Prediction: {prediction} (confidence: {confidence:.3f})")
+             return prediction, confidence, confidence_breakdown
+
+         except Exception as e:
+             logger.error(f"Prediction error: {e}")
+             return f"Prediction failed: {e}", 0.0, {}
+
+
+ def create_interface():
+     """Create the Gradio interface."""
+     predictor = ChimeraLMPredictor()
+
+     def predict_sequence(sequence):
+         prediction, confidence, breakdown = predictor.predict(sequence)
+
+         # Format output with enhanced styling
+         if "❌" in prediction or "⚠️" in prediction or "Please" in prediction or "Invalid" in prediction or "Prediction failed" in prediction:
+             result_text = f"### {prediction}"
+         else:
+             # Color-coded results with better styling
+             color = "#4CAF50" if prediction == "Biological" else "#F44336"
+             icon = "✅" if prediction == "Biological" else "⚠️"
+             result_text = f"""
+ ### {icon} Prediction Result
+
+ <div style="background: {color}; color: white; padding: 1.5rem; border-radius: 15px; text-align: center; margin: 1rem 0; box-shadow: 0 4px 15px rgba(0,0,0,0.15);">
+ <h2 style="margin: 0; font-size: 2rem; font-weight: 700; color: white;">{prediction}</h2>
+ <p style="margin: 0.5rem 0 0 0; font-size: 1.2rem; color: rgba(255,255,255,0.95);">Confidence: {confidence:.1%}</p>
+ </div>
+ """
+
+             if breakdown:
+                 result_text += "\n\n### 📊 Detailed Confidence Scores:\n"
+                 for class_name, prob in breakdown.items():
+                     emoji = "✅" if class_name == "Biological" else "⚠️"
+                     prob_value = float(prob)
+                     result_text += f"- {emoji} **{class_name}**: {prob_value:.1%}\n"
+
+         # Create bar plot with proper contrast
+         if breakdown:
+             classes = list(breakdown.keys())
+             probabilities = [float(prob) for prob in breakdown.values()]
+
+             # Create colors based on prediction with better contrast
+             colors = []
+             text_colors = []
+             for class_name in classes:
+                 if class_name == prediction:
+                     # Vibrant colors for predicted class with white text
+                     if prediction == "Biological":
+                         colors.append("#4CAF50")  # Green
+                     else:
+                         colors.append("#F44336")  # Red
+                     text_colors.append("white")
+                 else:
+                     # Medium gray for non-predicted class with dark text
+                     colors.append("#BDBDBD")
+                     text_colors.append("#424242")
+
+             # Create individual bars with appropriate text colors
+             bars = []
+             for i, (class_name, prob, color, text_color) in enumerate(zip(classes, probabilities, colors, text_colors)):
+                 bars.append(
+                     go.Bar(
+                         x=[class_name],
+                         y=[prob],
+                         marker_color=color,
+                         text=[f"{prob:.1%}"],
+                         textposition="auto",
+                         textfont={"size": 20, "color": text_color, "family": "Inter, sans-serif", "weight": 600},
+                         marker_line={"width": 2, "color": "rgba(255,255,255,0.3)"},
+                         width=0.5,
+                         opacity=0.95,
+                         name=class_name,
+                         showlegend=False,
+                     )
+                 )
+
+             fig = go.Figure(data=bars)
+
+             fig.update_layout(
+                 title={
+                     "text": "🎯 Prediction Confidence",
+                     "font": {"size": 20, "color": "#424242", "family": "Arial, sans-serif"},
+                     "x": 0.5,
+                     "xanchor": "center",
+                 },
+                 xaxis={
+                     "title": {"text": "Classification", "font": {"size": 14, "color": "#616161"}},
+                     "tickfont": {"size": 12, "color": "#424242"},
+                     "gridcolor": "rgba(0,0,0,0.05)",
+                     "linecolor": "rgba(0,0,0,0.1)",
+                     "showgrid": True,
+                     "zeroline": False,
+                 },
+                 yaxis={
+                     "title": {"text": "Probability", "font": {"size": 14, "color": "#616161"}},
+                     "tickfont": {"size": 12, "color": "#424242"},
+                     "range": [0, 1.1],
+                     "gridcolor": "rgba(0,0,0,0.05)",
+                     "linecolor": "rgba(0,0,0,0.1)",
+                     "showgrid": True,
+                     "zeroline": True,
+                     "zerolinecolor": "rgba(0,0,0,0.1)",
+                 },
+                 height=450,
+                 showlegend=False,
+                 plot_bgcolor="rgba(255,255,255,1)",
+                 paper_bgcolor="rgba(255,255,255,1)",
+                 margin={"l": 60, "r": 60, "t": 80, "b": 60},
+                 font={"family": "Arial, sans-serif"},
+             )
+
+             fig.update_traces(
+                 textfont_size=16,
+                 textfont_color="white",
+                 textfont_family="Arial, sans-serif",
+                 marker_line={"width": 1, "color": "rgba(255,255,255,0.8)"},
+                 width=0.6,
+                 opacity=0.9,
+             )
+         else:
+             # Create empty plot for error cases
+             fig = go.Figure()
+             fig.update_layout(
+                 title={
+                     "text": "🎯 Prediction Confidence",
+                     "font": {"size": 20, "color": "#424242", "family": "Arial, sans-serif"},
+                     "x": 0.5,
+                     "xanchor": "center",
+                 },
+                 xaxis={
+                     "title": {"text": "Classification", "font": {"size": 14, "color": "#616161"}},
+                     "tickfont": {"size": 12, "color": "#424242"},
+                     "gridcolor": "rgba(0,0,0,0.05)",
+                     "linecolor": "rgba(0,0,0,0.1)",
+                 },
+                 yaxis={
+                     "title": {"text": "Probability", "font": {"size": 14, "color": "#616161"}},
+                     "tickfont": {"size": 12, "color": "#424242"},
+                     "range": [0, 1.1],
+                     "gridcolor": "rgba(0,0,0,0.05)",
+                     "linecolor": "rgba(0,0,0,0.1)",
+                 },
+                 height=450,
+                 showlegend=False,
+                 plot_bgcolor="rgba(255,255,255,1)",
+                 paper_bgcolor="rgba(255,255,255,1)",
+                 margin={"l": 60, "r": 60, "t": 80, "b": 60},
+                 font={"family": "Arial, sans-serif"},
+             )
+
+         return result_text, fig
+
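
The per-class colour choice inside `predict_sequence` can be expressed as a small pure function, which keeps the mapping easy to test. A minimal sketch; `bar_colors` and the constant names are hypothetical and not part of app.py, while the hex values are the ones used in the diff:

```python
PREDICTED_BAR = {"Biological": "#4CAF50"}  # green; any other predicted class is red
FALLBACK_PREDICTED_BAR = "#F44336"
NEUTRAL_BAR, NEUTRAL_TEXT = "#BDBDBD", "#424242"


def bar_colors(classes: list[str], prediction: str) -> list[tuple[str, str]]:
    """Return one (bar colour, text colour) pair per class, highlighting the prediction."""
    pairs = []
    for name in classes:
        if name == prediction:
            # Predicted class: vivid bar with white text.
            pairs.append((PREDICTED_BAR.get(name, FALLBACK_PREDICTED_BAR), "white"))
        else:
            # Other class: medium gray bar with dark text.
            pairs.append((NEUTRAL_BAR, NEUTRAL_TEXT))
    return pairs


print(bar_colors(["Biological", "Chimeric Artifact"], "Chimeric Artifact"))
```

One thing worth checking in the committed code: `fig.update_traces(textfont_color="white", ...)` is called without a selector, so it appears to apply to every bar and would override the dark text colour chosen for the gray bar.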
+     # Example sequences - more realistic with varied patterns
+     # 1, 1, 0
+     examples = [
+ ["TTGTGTGCCTTCATTAGTTATATACTAGTTCCTGATAAATTCATTTATAGAACAGAAAGACCACAGATTCAATTATATGGAATAGATCTGCTGGTGAATGTAAGAAAGTCTTCTGAACTGCGAAGGGAAAATAAATGATTTAATTCCCACCACCTCTCAACAGCTACCTTCTGTTTTAGAGACACTGGTAAAACTTCTGGGGCTCTTACTTGACATACCTACATCGTATTATAGGCCTATTGGTTTTATCAGAATAATATGCTTTCCTCACATAAGTTATTTCTTTCTGTTACTTGCTTGCAGTACAGATTTAAAGGGGCATTCAGGCAGCCTCCAGATGCCATGATGGATTAACTCTCATGTTACACAGTAATGTAGAAGCTTCTCTTCATTCTCAGACTTTATCTGACAATGAAGAGAAGCTTCTAATTATACTGTGTAAGTTGATCATGTAACACATCTGGAGGCTGCCTGAATGCCCCTTAAATCTGTACTGCAAGCAAGTAACAGAAAGAAATAACTTATGTGAGGAAAGCATATTATTCTGATAAAACCAATACACCCTTATAATACGATGTAGGTATGTCAAGTAAGAGCCCCAGAAGTTTGCAGTATCTAAAACAAAGGTGTTTGTTGAGGTAGTGAGGAAAATAAATCATTTATTTTCCCTTCGCAGTTCAGGCAACACTTTCTTTACATTCACCACCAGATTCCATATAATCTGTGGGAGTCTTTGGCTGTTCTATAAAATGAATTTATCAGTAAATGA"],
+ ["CAATGGTAAATGAATTCAATAAATATTTGAGGTGATTAAATTTCCTTTCCTAACACATTTTATTTCAAATTCTATTTGAAAGAAAAAATGCTAACAACATAAGAGATCAAATTCAGCTACCTATTTTTTCAACATTCAAATATGCATTAATTGTCTACACTTTGCTAAGCTTGGGCTGATTTCTAGGGCTATAAACATAAATTAAATTTATTCATGGATCTTAAGTGGCTCATGAGCATTAGTACAGCATATTTATAAGCCGAGCATAGTGTCTCATACCTATAATCCCAACACTGGGAGGCTGAGGTGGGAGGATCTCTTGAAGCCGGGAGTTCAAGAACTGCCTGGAAAACATAGCAAGACCCTGTCTCTACCAAAAACAAACAAATAAAACTTAGCCGGGAGTGGCTGCACCTGTAGCTACTCAGGAGTCTGTGATTGGAGGGTAATTTGAACACAGGAGTTTGAGATAGCAGCAAGCTATGATCATGCCACTGTACTCCAGCCTAATTGACAGAACAAGAGCCTGTCTCTAAAATCATTCCATATGTCTATATATAGATATATATATCAAGAAAACTTTACTTTCTAGATTCTAGTTTGTTTTATTGCTCATTCTTTTCTAAATTTATTCATTAGGAGGTATATACAATGTGTTTCAGAGATATAAGAATAGTAAACTTAGAGTGAAAAGGGAAAGATATTTCTTGTTAAAATTCCTAAAATAAAGTATTAAACTTATCTATGAAAAGGCATACATTTCTGTCTGATATTTTATATAAAATAATGGGAACATAATCATATATAATATTTTCTATAAAATGCTTAACAGGTTTTCATAACTTAAATTGTACTTAATATTTTAGGAATTTTAACAATATTCTTCCCTTTTCACTCTAAGTTTACTGTCTTAACCCCCAAAAAACACATTGTCTGTACACCTCCTAATGAATAAATTTAGAAAAAGAAAAAATACAGCAATAAAACAAACTAGTAATACTGGAAGAGTCAAACTTTCTGATATTGTGTACCTCTTCTTATAAAGACATATGGAATGATTTTGAGGACAGGTATTGTTCTGATTAGGCTGGAGTACAGTGGCATGATCATAGCTTACTGCTATCTCGAACTCCTGTGTTAAATTCTCTCCAATCACAGACTCCTGAGTAGCTACAGGTGAGCCACTGCCCGGCTAAGTTTTATTTGTTTTGTTTTTGGTAGAGACAGGGTCTTGCTATGTTTGCCAGGCTGGTCTTGAACTCCCGGCTTCAAGAGATCCTCCCACCTCAGCCTCCCAGTGTTGGGATTATAGGTATGAGACACTATGCTCAGCTAACAAATATATAATGCTCATGAGCCACTAATCAAGTCAAGAATTTAAATTTATGTTTATAGCCCCATCAGCCCCAAGCTTAGCAAAGTGTAGACAATTAATGTAACATTTGAATGCTGAAAAAATAGGTATAGAAATTTGATCTTACCCTATATTGTAGCATTTTTTTCTTTCAAATAAATTTGAAATAAAATGTGTAGGGAAAAGGAAATTAAATCACCTCAACATTTTATAAAAATCATTTACCATTGGCTAT"],
+ ["ATGTTGTGTACCTGGTTCGGTTCGTCTATGGTATGCACCTTGGCTATCATCACCCGATGAGGCAACCAGCCGGGAGACACCTAAACCCATCATCTCCTGTACCACCCTAGTAGGCTCCCTTCCCCTACTCATCGCACTAATTTACACTCACAACACCCTAGGCTCACTAAACATTCTACTACTCACTCTCACTGCCCAACTAAACTCCTGGCCATCCCCTTATGAGCGGGCGCAGTGATTATAGGCTTTCGCTCTAAGATTAAAAATGCCCTAGCCCACTTCTTACCACAAGGCACACCTACACCCCTTATCCCCATACTGGCTGTTGTGAAAACCATAGCCTACTATCGTTCAACAATAGCCCTGGCCGTACGCCTAACCGCTAACATTACTGCAGGCCACCTACTCATGCACCTAATTGGAAGCGCCACCCTAGCAATATCAACCATTAACCTTACCTACACTTATAGTCTTTCACAATTCTAATTCTACTGACTATCCTAGAAATCGCTGTCGCCTTAATCCAAGCCTACGTTTTCACACTTCTAGTAAGCCTCTACCTGCACGACAACACATAATGACCCACCAATCACATGCCTATCATGGCTAAACCCAGCCCATGACCCCTAACAGGGGCCCTCTCAGCCCTCCTAATGACCTCCGGCCTAGCCATGTGATTTCACTTCCACTCCATAACGCTCCTCATACTAGGCCTACTAACCAACACACTAACCATATACCAATAATGGCAATGTAACGCAAAGCACATACCAAGGCCACCACACACCACCTCTATTAAAAAGGCC"],
+     ]
+
+     # Custom CSS for modern, visually appealing styling
+     custom_css = """
+     @import url('https://fonts.googleapis.com/css2?family=Inter:wght@400;500;600;700&display=swap');
+
+     * {
+         font-family: 'Inter', -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif;
+     }
+
+     /* Global text color improvements */
+     body {
+         background: linear-gradient(135deg, #f5f7fa 0%, #e3e7ed 100%);
+         min-height: 100vh;
+     }
+
+     /* Ensure all headings have good contrast */
+     h1, h2, h3, h4, h5, h6 {
+         color: #2C3E50 !important;
+         font-weight: 700 !important;
+     }
+
+     /* Ensure all paragraphs and text have good contrast */
+     p, li, span, div {
+         color: #37474F !important;
+     }
+
+     /* Universal text color fix for all content */
+     strong, b {
+         color: #2C3E50 !important;
+         font-weight: 700 !important;
+     }
+
+     /* Ensure all text in Gradio blocks has proper contrast */
+     .gradio-block p, .gradio-block li, .gradio-block span,
+     .gradio-block div, .gradio-block strong, .gradio-block b {
+         color: #37474F !important;
+     }
+
+     .gradio-block h1, .gradio-block h2, .gradio-block h3,
+     .gradio-block h4, .gradio-block h5, .gradio-block h6,
+     .gradio-block strong, .gradio-block b {
+         color: #2C3E50 !important;
+         font-weight: 700 !important;
+     }
+
+     /* Label styling */
+     label {
+         color: #2C3E50 !important;
+         font-weight: 600 !important;
+     }
+
+     .main-header {
+         background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
+         color: white;
+         padding: 3rem 2rem;
+         border-radius: 20px;
+         text-align: center;
+         margin-bottom: 2rem;
+         box-shadow: 0 15px 40px rgba(102, 126, 234, 0.3);
+         position: relative;
+         overflow: hidden;
+     }
+
+     .main-header::before {
+         content: '';
+         position: absolute;
+         top: -50%;
+         right: -50%;
+         bottom: -50%;
+         left: -50%;
+         background: linear-gradient(45deg, transparent, rgba(255,255,255,0.1), transparent);
+         transform: rotate(45deg);
+         animation: shine 3s infinite;
+     }
+
+     @keyframes shine {
+         0% { transform: translateX(-100%) rotate(45deg); }
+         100% { transform: translateX(100%) rotate(45deg); }
+     }
+
+     .dna-icon {
+         font-size: 4rem;
+         margin-bottom: 1rem;
+         animation: pulse 2s ease-in-out infinite;
+         display: inline-block;
+         filter: drop-shadow(0 4px 6px rgba(0,0,0,0.2));
+     }
+
+     @keyframes pulse {
+         0%, 100% { transform: scale(1); }
+         50% { transform: scale(1.08); }
+     }
+
+     .input-column {
+         background: white;
+         padding: 2.5rem;
+         border-radius: 20px;
+         box-shadow: 0 8px 30px rgba(0,0,0,0.1);
+         margin: 0.5rem;
+         border: 1px solid rgba(102, 126, 234, 0.1);
+         transition: transform 0.3s ease, box-shadow 0.3s ease;
+     }
+
+     .input-column:hover {
+         transform: translateY(-2px);
+         box-shadow: 0 12px 40px rgba(0,0,0,0.15);
+     }
+
+     /* Ensure input column text has good contrast */
+     .input-column h1, .input-column h2, .input-column h3,
+     .input-column h4, .input-column h5, .input-column h6 {
+         color: #2C3E50 !important;
+         font-weight: 700 !important;
+     }
+
+     .input-column p, .input-column li, .input-column span,
+     .input-column div, .input-column strong, .input-column b,
+     .input-column code, .input-column pre {
+         color: #37474F !important;
+     }
+
+     /* Ensure markdown content in input column has proper colors */
+     .input-column .markdown, .input-column .markdown *,
+     .input-column [class*="markdown"], .input-column [class*="markdown"] * {
+         color: #37474F !important;
+     }
+
+     .input-column .markdown h1, .input-column .markdown h2, .input-column .markdown h3,
+     .input-column [class*="markdown"] h1, .input-column [class*="markdown"] h2, .input-column [class*="markdown"] h3 {
+         color: #2C3E50 !important;
+         font-weight: 700 !important;
+     }
+
+     .input-column .markdown strong, .input-column .markdown b,
+     .input-column [class*="markdown"] strong, .input-column [class*="markdown"] b {
+         color: #2C3E50 !important;
+         font-weight: 700 !important;
+     }
+
+     .result-column {
+         background: linear-gradient(135deg, #ffffff 0%, #f8f9fa 100%);
+         padding: 2.5rem;
+         border-radius: 20px;
+         box-shadow: 0 8px 30px rgba(0,0,0,0.1);
+         margin: 0.5rem;
+         border: 1px solid rgba(102, 126, 234, 0.1);
+         min-height: 500px;
+     }
+
+     /* Ensure text readability in result column */
+     .result-column h1, .result-column h2, .result-column h3,
+     .result-column h4, .result-column h5, .result-column h6 {
+         color: #2C3E50 !important;
+         font-weight: 700 !important;
+     }
+
+     .result-column p, .result-column li, .result-column span,
+     .result-column div, .result-column markdown {
+         color: #37474F !important;
+     }
+
+     /* Markdown content styling - comprehensive */
+     .markdown, .markdown *, [class*="markdown"], [class*="prose"] {
+         color: #37474F !important;
+     }
+
+     .markdown h1, .markdown h2, .markdown h3,
+     .markdown h4, .markdown h5, .markdown h6,
+     [class*="markdown"] h1, [class*="markdown"] h2, [class*="markdown"] h3,
+     [class*="markdown"] h4, [class*="markdown"] h5, [class*="markdown"] h6 {
+         color: #2C3E50 !important;
+         font-weight: 700 !important;
+     }
+
+     .markdown p, .markdown li, .markdown span,
+     .markdown div, .markdown code, .markdown pre,
+     .markdown strong, .markdown b,
+     [class*="markdown"] p, [class*="markdown"] li, [class*="markdown"] span,
+     [class*="markdown"] div, [class*="markdown"] strong, [class*="markdown"] b {
+         color: #37474F !important;
+     }
+
+     .markdown code, [class*="markdown"] code {
+         background: #f5f7fa !important;
+         color: #2C3E50 !important;
+         padding: 2px 6px !important;
+         border-radius: 4px !important;
+     }
+
+     /* Target all Gradio markdown blocks */
+     .gradio-markdown, .gradio-markdown *,
+     div[class*="markdown"], div[class*="prose"] {
+         color: #37474F !important;
+     }
+
+     div[class*="markdown"] h1, div[class*="markdown"] h2, div[class*="markdown"] h3,
+     div[class*="markdown"] h4, div[class*="markdown"] h5, div[class*="markdown"] h6 {
+         color: #2C3E50 !important;
+         font-weight: 700 !important;
+     }
+
+     div[class*="markdown"] strong, div[class*="markdown"] b,
+     div[class*="markdown"] p, div[class*="markdown"] li,
+     div[class*="markdown"] span, div[class*="markdown"] div {
+         color: #37474F !important;
+     }
+
+     .footer-section {
+         background: linear-gradient(135deg, #ffffff 0%, #f8f9fa 100%);
+         padding: 2.5rem;
+         border-radius: 20px;
+         margin-top: 2rem;
+         border: 2px solid #dee2e6;
+         box-shadow: 0 5px 20px rgba(0,0,0,0.08);
+     }
+
+     /* Ensure footer text has good contrast */
+     .footer-section h1, .footer-section h2, .footer-section h3,
+     .footer-section h4, .footer-section h5, .footer-section h6 {
+         color: #2C3E50 !important;
+         font-weight: 700 !important;
+     }
+
+     .footer-section p, .footer-section li, .footer-section span,
+     .footer-section div, .footer-section a, .footer-section code,
+     .footer-section strong, .footer-section b {
+         color: #37474F !important;
+     }
+
+     /* Ensure markdown content in footer has proper colors */
+     .footer-section .markdown, .footer-section .markdown *,
+     .footer-section [class*="markdown"], .footer-section [class*="markdown"] * {
+         color: #37474F !important;
+     }
+
+     .footer-section .markdown h1, .footer-section .markdown h2, .footer-section .markdown h3,
+     .footer-section [class*="markdown"] h1, .footer-section [class*="markdown"] h2, .footer-section [class*="markdown"] h3 {
+         color: #2C3E50 !important;
+         font-weight: 700 !important;
+     }
+
+     .footer-section .markdown strong, .footer-section .markdown b,
+     .footer-section [class*="markdown"] strong, .footer-section [class*="markdown"] b {
+         color: #2C3E50 !important;
+         font-weight: 700 !important;
+     }
+
+     .footer-section a {
+         color: #667eea !important;
+         text-decoration: none !important;
+         font-weight: 600 !important;
+     }
+
+     .footer-section a:hover {
+         color: #764ba2 !important;
+         text-decoration: underline !important;
+     }
+
+     .footer-section code {
+         background: #f5f7fa !important;
+         color: #2C3E50 !important;
+         padding: 2px 6px !important;
+         border-radius: 4px !important;
+         border: 1px solid #e0e0e0 !important;
+     }
+
+     .gradio-container {
+         max-width: 1400px !important;
+         margin: 0 auto !important;
+         padding: 2rem 1rem !important;
+     }
+
+     .analyze-btn {
+         background: linear-gradient(135deg, #667eea 0%, #764ba2 100%) !important;
+         border: none !important;
+         border-radius: 30px !important;
+         padding: 18px 40px !important;
+         font-size: 17px !important;
+         font-weight: 600 !important;
+         color: white !important;
+         box-shadow: 0 6px 20px rgba(102, 126, 234, 0.4) !important;
+         transition: all 0.3s cubic-bezier(0.4, 0, 0.2, 1) !important;
+         text-transform: uppercase;
+         letter-spacing: 0.5px;
+     }
+
+     .analyze-btn:hover {
+         transform: translateY(-3px) scale(1.02) !important;
+         box-shadow: 0 12px 35px rgba(102, 126, 234, 0.6) !important;
+     }
+
+     .analyze-btn:active {
+         transform: translateY(-1px) scale(0.98) !important;
+     }
+
+     /* Enhanced textbox styling */
+     textarea {
+         border: 2px solid #e9ecef !important;
+         border-radius: 12px !important;
+         font-family: 'Courier New', monospace !important;
+         font-size: 14px !important;
+         line-height: 1.6 !important;
+         transition: border-color 0.3s ease !important;
+         background-color: #ffffff !important;
+         color: #2C3E50 !important;
+         padding: 14px !important;
+     }
+
+     textarea::placeholder {
+         color: #90A4AE !important;
+         opacity: 0.8 !important;
+     }
+
+     textarea:focus {
+         border-color: #667eea !important;
+         box-shadow: 0 0 0 3px rgba(102, 126, 234, 0.1) !important;
+         background-color: #ffffff !important;
+         outline: none !important;
+     }
+
+     /* Info cards */
+     .info-card {
+         background: white;
+         padding: 1.5rem;
+         border-radius: 15px;
+         box-shadow: 0 4px 15px rgba(0,0,0,0.08);
+         margin: 1rem 0;
+         border-left: 4px solid #667eea;
+         transition: transform 0.2s ease;
+     }
+
+     .info-card:hover {
+         transform: translateX(5px);
+     }
+
+     /* Examples styling */
+     #examples {
+         border-radius: 15px;
+         overflow: hidden;
+         margin-top: 1.5rem;
+     }
+
+     /* Enhanced examples button styling */
+     .example-btn {
+         background: #f8f9fa !important;
+         border: 2px solid #e0e0e0 !important;
+         color: #2C3E50 !important;
+         border-radius: 8px !important;
+         padding: 12px 20px !important;
+         transition: all 0.3s ease !important;
+     }
+
+     .example-btn:hover {
+         background: #667eea !important;
+         border-color: #667eea !important;
+         color: white !important;
+         transform: translateY(-2px) !important;
+         box-shadow: 0 4px 12px rgba(102, 126, 234, 0.3) !important;
+     }
+
+     /* Better spacing and visual hierarchy */
+     .gradio-block {
+         margin-bottom: 1.5rem;
+     }
+
+     /* Improved scrollbar styling */
+     ::-webkit-scrollbar {
+         width: 10px;
+         height: 10px;
+     }
+
+     ::-webkit-scrollbar-track {
+         background: #f1f1f1;
+         border-radius: 10px;
+     }
+
+     ::-webkit-scrollbar-thumb {
+         background: #667eea;
+         border-radius: 10px;
+     }
+
+     ::-webkit-scrollbar-thumb:hover {
+         background: #764ba2;
+     }
+     """
+
637
+ with gr.Blocks(
638
+ title="ChimeraLM - Chimeric Read Detector",
639
+ theme=gr.themes.Default(
640
+ primary_hue="blue",
641
+ secondary_hue="gray",
642
+ neutral_hue="slate",
643
+ ),
644
+ css=custom_css,
645
+ ) as interface:
646
+ # Header Section
647
+ with gr.Row():
648
+ gr.HTML("""
649
+ <div class="main-header">
650
+ <div class="dna-icon">🧬</div>
651
+ <h1 style="margin: 0; font-size: 3rem; font-weight: 700; position: relative; z-index: 1;">ChimeraLM</h1>
652
+ <p style="margin: 0.5rem 0 0 0; font-size: 1.3rem; opacity: 0.95; font-weight: 500; position: relative; z-index: 1;">
653
+ Advanced Chimeric Read Detection using Deep Learning
654
+ </p>
655
+ <p style="margin: 1rem 0 0 0; font-size: 1.05rem; opacity: 0.85; position: relative; z-index: 1;">
656
+ Identify chimeric artifacts from whole genome amplification with state-of-the-art accuracy
657
+ </p>
658
+ <div style="margin-top: 1.5rem; position: relative; z-index: 1;">
659
+ <span style="display: inline-block; background: rgba(255,255,255,0.2); padding: 0.5rem 1rem; border-radius: 20px; margin: 0.25rem; font-size: 0.9rem;">
660
+ ⚑ High Performance
661
+ </span>
662
+ <span style="display: inline-block; background: rgba(255,255,255,0.2); padding: 0.5rem 1rem; border-radius: 20px; margin: 0.25rem; font-size: 0.9rem;">
663
+ 🎯 98% Accuracy
664
+ </span>
665
+ <span style="display: inline-block; background: rgba(255,255,255,0.2); padding: 0.5rem 1rem; border-radius: 20px; margin: 0.25rem; font-size: 0.9rem;">
666
+ πŸš€ Pre-trained
667
+ </span>
668
+ </div>
669
+ </div>
670
+ """)
+
+         # Main Content
+         with gr.Row():
+             with gr.Column(scale=1, elem_classes="input-column"):
+                 # Input Section
+                 gr.Markdown("""
+                 ## 📝 DNA Sequence Input
+
+                 **Quick Start Guide:**
+                 1. 🧬 Enter your DNA sequence (supports up to 32,768 bp)
+                 2. ✅ Use standard nucleotides: **A**, **C**, **G**, **T**, **N**
+                 3. 🔬 Click "Analyze Sequence" for instant results
+                 4. 📊 View confidence scores and visualization below
+
+                 **What is Chimeric DNA?**
+                 Chimeric reads are artificial DNA sequences created during whole genome amplification (WGA),
+                 where fragments from different genomic locations are incorrectly joined together.
+                 """)
+
+                 sequence_input = gr.Textbox(
+                     label="🧬 DNA Sequence",
+                     placeholder="Enter your DNA sequence here...\nExample: ACGTACGTACGTACGT...",
+                     lines=8,
+                     max_lines=15,
+                     show_label=True,
+                     container=True,
+                     scale=2,
+                 )
+
+                 with gr.Row():
+                     predict_btn = gr.Button(
+                         "🔬 Analyze Sequence", variant="primary", size="lg", elem_classes=["analyze-btn"]
+                     )
+
+                 gr.Examples(
+                     examples=examples, inputs=[sequence_input], label="📚 Example Sequences", elem_id="examples"
+                 )
+
+             with gr.Column(scale=1, elem_classes="result-column"):
+                 # Results Section
+
+                 gr.Markdown("## 📊 Analysis Results")
+
+                 result_output = gr.Markdown(
+                     value="✨ Enter a sequence and click 'Analyze Sequence' to see detailed results and visualizations.",
+                     elem_id="results",
+                 )
+
+                 # Enhanced plot component
+                 plot_output = gr.Plot(label="📈 Probability Distribution", value=None, elem_id="probability-plot")
+
+         # Footer Section
+         with gr.Row():
+             gr.Markdown(
+                 """
+                 ## 🚀 About ChimeraLM
+
+                 **Advanced Features:**
+                 - ⚡ **High Performance**: Optimized for speed and accuracy
+                 - 🎯 **Binary Classification**: Distinguishes biological vs chimeric sequences
+                 - 📏 **Long Sequences**: Handles up to 32,768 nucleotides
+                 - 🤖 **Pre-trained Model**: Ready-to-use with `yangliz5/chimeralm`
+
+                 **Technical Specifications:**
+                 - **Model Type**: Binary Sequence Classifier
+                 - **Input**: DNA sequences with standard nucleotides
+                 - **Output**: Classification + confidence scores
+                 - **Training**: Whole genome amplification artifact detection
+
+                 ---
+
+                 **📖 Citation:**
+                 ```
+                 @software{chimeralm2025,
+                   title={ChimeraLM: A genomic language model to identify chimera artifacts},
+                   author={Li, Yangyang and Guo, Qingxiang and Yang, Rendong},
+                   year={2025},
+                   url={https://github.com/ylab-hi/ChimeraLM}
+                 }
+                 ```
+
+                 **🔗 Links:**
+                 - [GitHub Repository](https://github.com/ylab-hi/ChimeraLM)
+                 - [Model Hub](https://huggingface.co/yangliz5/chimeralm)
+                 - [Documentation](https://github.com/ylab-hi/ChimeraLM#readme)
+                 """,
+                 elem_classes="footer-section",
+             )
+
+         # Connect the button click
+         predict_btn.click(fn=predict_sequence, inputs=[sequence_input], outputs=[result_output, plot_output])
+
+     return interface
+
+
+ if __name__ == "__main__":
+     logger.info("🚀 Starting ChimeraLM Web Interface...")
+     interface = create_interface()
+     interface.launch(share=False)
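
The Quick Start text added in this diff states two input constraints: sequences of up to 32,768 bp, and the alphabet A, C, G, T, N. The part of the diff shown here does not reveal how `predict_sequence` enforces them, so the following is only a minimal sketch of what such a check could look like. The names `MAX_SEQUENCE_LENGTH`, `VALID_NUCLEOTIDES`, and `validate_sequence` are hypothetical and are not taken from app.py or the `chimeralm` package.

```python
# Hypothetical helper, not part of app.py: checks the two limits quoted in the UI text.
MAX_SEQUENCE_LENGTH = 32_768  # "supports up to 32,768 bp"
VALID_NUCLEOTIDES = frozenset("ACGTN")  # "standard nucleotides: A, C, G, T, N"


def validate_sequence(raw: str) -> str:
    """Normalize a pasted DNA sequence and raise ValueError if it breaks a limit."""
    # Remove all whitespace (multi-line pastes are common) and upper-case the rest.
    sequence = "".join(raw.split()).upper()
    if not sequence:
        raise ValueError("Sequence is empty.")
    if len(sequence) > MAX_SEQUENCE_LENGTH:
        raise ValueError(
            f"Sequence has {len(sequence)} bp; the limit is {MAX_SEQUENCE_LENGTH}."
        )
    invalid = sorted(set(sequence) - VALID_NUCLEOTIDES)
    if invalid:
        raise ValueError(f"Invalid characters: {', '.join(invalid)}")
    return sequence
```

Calling a check like this at the top of the click handler would let the interface return a readable message instead of passing malformed input to the model.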
logo-pixel.svg ADDED
logo.png ADDED
requirements.txt ADDED
@@ -0,0 +1,33 @@
+ # Core dependencies for ChimeraLM web UI
+ chimeralm>=1.0.4
+
+ # Deep Learning
+ torch==2.5.1
+ torchvision>=0.20.1
+ torchaudio>=2.5.0
+ lightning>=2.4.0
+ torchmetrics>=1.6.0
+
+ # ML/NLP
+ transformers>=4.47.1
+ datasets>=3.2.0
+ einops>=0.8.0
+ evaluate>=0.4.3
+
+ # Bioinformatics
+ pysam>=0.22.1
+ pyfastx>=2.2.0
+
+ # Web UI
+ gradio>=5.0.0
+ plotly>=5.24.0
+
+ # Configuration
+ hydra-core>=1.3.2
+ omegaconf>=2.3.0
+
+ # Utilities
+ rich>=13.9.4
+ typer>=0.15.1
+ joblib>=1.5.2
+ hf-xet>=1.1.10