TANTCHEU Noussi Cédric committed on
Commit 7603b2e · 1 Parent(s): 30d0e41

Initial space upload: Interactive PhytoAI Assistant

Files changed (3)
  1. README.md +146 -14
  2. app.py +220 -0
  3. requirements.txt +4 -3
README.md CHANGED
@@ -1,20 +1,152 @@
  ---
- title: Phytoai Assistant
- emoji: 🚀
- colorFrom: red
- colorTo: red
- sdk: docker
- app_port: 8501
- tags:
- - streamlit
  pinned: false
- short_description: Outil IA interactif pour la recherche en phytothérapie
- license: mit
  ---

- # Welcome to Streamlit!

- Edit `/src/streamlit_app.py` to customize this app to your heart's desire. :heart:

- If you have any questions, checkout our [documentation](https://docs.streamlit.io) and [community
- forums](https://discuss.streamlit.io).
  ---
+ title: PhytoAI Assistant
+ emoji: 🌿
+ colorFrom: green
+ colorTo: blue
+ sdk: streamlit
+ sdk_version: 1.28.0
+ app_file: app.py
  pinned: false
+ license: cc-by-4.0
+ tags:
+ - phytotherapy
+ - natural-compounds
+ - bioactivity
+ - drug-discovery
+ - ai-assistant
+ - research-tool
  ---

+ # PhytoAI Assistant 🌿
+
+ **Interactive AI Assistant for Phytotherapy Research** - Explore natural compounds and their documented bioactivities.
+
+ ## 🎯 Overview
+
+ The PhytoAI Assistant is an interactive web application built with Streamlit that gives researchers, students, and pharmaceutical professionals access to a database of natural compounds and their documented bioactivities. It uses the [PhytoAI MEGA Dataset](https://huggingface.co/datasets/Gatescrispy/phytoai-mega-dataset), which contains 352 unique natural compounds and 1,314 bioactivities.
+
+ ## ✨ Key Features
+
+ ### 🔍 Advanced Search Capabilities
+ - **Compound Name Search**: Find specific natural compounds by name (e.g., curcumin, resveratrol, quercetin)
+ - **Therapeutic Activity Search**: Discover compounds by their therapeutic properties:
+   - Anti-inflammatory
+   - Antioxidant
+   - Cardiovascular protective
+   - Neuroprotective
+   - Anti-cancer
+   - Antimicrobial
+
+ ### 📊 Interactive Data Visualization
+ - **Summary Statistics**: Compound count, bioactivity count, therapeutic areas, and PubChem coverage, computed from the loaded dataset
+ - **Distribution Charts**: Visual analysis of therapeutic activities using Plotly
+ - **Pie Charts**: Therapeutic area distribution for quick insights
+ - **Bar Charts**: Activity type frequency analysis
+
+ ### 🧬 Comprehensive Compound Information
+ - **Molecular Properties**: Chemical formulas, SMILES notation, molecular weights
+ - **Database Cross-references**: PubChem CID mappings for further research
+ - **Bioactivity Profiles**: Detailed activity descriptions with experimental context
+ - **Literature References**: Links to original research and validation studies
+
+ ## 🚀 How to Use
+
+ 1. **Launch the Application**: The interface loads automatically with dataset statistics
+ 2. **Search Compounds**: Use the sidebar to search by compound name or therapeutic activity
+ 3. **Explore Results**: Expand a compound card to see detailed molecular and bioactivity information
+ 4. **Analyze Data**: Review interactive charts to understand therapeutic distribution patterns
+ 5. **Cross-reference**: Use PubChem CIDs for additional research in external databases
+
+ ## 📈 Dataset Integration
+
+ This application loads the **PhytoAI MEGA Dataset** with `hf_hub_download` from the `huggingface_hub` library, which provides:
+
+ - **Current Data**: The latest dataset revision is fetched when the app first loads it
+ - **Efficient Loading**: The parsed data is cached with Streamlit's `st.cache_data`, so it is not re-read on every interaction
+ - **Graceful Failure**: If the download fails, the app shows the error and a static dataset summary
+
+ ## 🔬 Research Applications
+
+ ### Academic Research
+ - **Drug Discovery**: Identify promising natural compounds for pharmaceutical development
+ - **Ethnopharmacology**: Validate traditional medicine uses with modern bioactivity data
+ - **Chemical Biology**: Explore structure-activity relationships in natural products
+
+ ### Pharmaceutical Industry
+ - **Lead Compound Identification**: Screen natural products for specific therapeutic targets
+ - **Bioactivity Prediction**: Use existing data to guide synthetic chemistry efforts
+ - **Competitive Intelligence**: Monitor natural product research trends and opportunities
+
+ ### Educational Use
+ - **Teaching Tool**: Interactive exploration of phytochemistry and pharmacology concepts
+ - **Student Projects**: Real-world dataset for bioinformatics and cheminformatics training
+ - **Research Training**: Hands-on experience with pharmaceutical data analysis
+
+ ## 🛠️ Technical Architecture
+
+ ### Frontend
+ - **Streamlit**: Modern, responsive web interface
+ - **Plotly**: Interactive data visualizations
+ - **Pandas**: Efficient data manipulation and analysis
+
+ ### Backend
+ - **Hugging Face Hub**: Dataset storage and version control
+ - **JSON/CSV**: Structured data formats optimized for research
+ - **Caching**: Optimized performance with Streamlit's caching system
+
+ ### Data Pipeline
+ ```
+ Hugging Face Dataset → hf_hub_download → JSON Loading → Pandas Processing → Streamlit Interface → Interactive Visualizations
+ ```
+
+ ## 📊 Dataset Statistics
+
+ - **🧬 Compounds**: 352 unique natural products
+ - **🔬 Bioactivities**: 1,314 documented activities
+ - **🎯 Therapeutic Areas**: 6+ major categories
+ - **📚 Sources**: PubChem, ChEMBL, peer-reviewed literature
+ - **🔄 Updates**: Regularly maintained and expanded
+
+ ## 🌐 Related Resources
+
+ - **📊 Dataset**: [PhytoAI MEGA Dataset](https://huggingface.co/datasets/Gatescrispy/phytoai-mega-dataset)
+ - **🤖 Models**: [PhytoAI Discovery Models](https://huggingface.co/Gatescrispy/phytoai-discovery-models)
+ - **📖 Documentation**: Comprehensive API and usage documentation
+ - **💬 Community**: Research collaboration and support forum
+
+ ## 🏆 Impact & Recognition
+
+ This tool has been designed to bridge the gap between traditional phytotherapy knowledge and modern AI-driven drug discovery, providing researchers worldwide with:
+
+ - **Accessible Data**: User-friendly interface for complex bioactivity data
+ - **Research Acceleration**: Rapid compound screening and hypothesis generation
+ - **Global Collaboration**: Shared platform for international research initiatives
+ - **Educational Value**: Training resource for next-generation researchers
+
+ ## 📄 Citation & License
+
+ **License**: CC BY 4.0 - Free for academic and commercial use with attribution
+
+ **Citation**: If you use PhytoAI Assistant in your research, please cite:
+ ```bibtex
+ @software{phytoai_assistant_2025,
+   title={PhytoAI Assistant: Interactive AI Tool for Phytotherapy Research},
+   author={Tantcheu, Cedric},
+   year={2025},
+   url={https://huggingface.co/spaces/Gatescrispy/phytoai-assistant},
+   note={Interactive Streamlit application for natural compound bioactivity exploration}
+ }
+ ```
+
+ ## 👨‍💻 About the Developer
+
+ **Cedric Tantcheu** - AI & Phytotherapy Research Specialist
+ - 🎓 Expertise in cheminformatics, machine learning, and natural product research
+ - 🔬 Focus on AI-driven drug discovery from traditional medicine
+ - 🌍 Committed to open science and global health solutions
+
+ ---

+ **🌿 Advancing phytotherapy research through intelligent technology**

+ *Built with ❤️ for the global research community*
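The "JSON Loading → Pandas Processing" step of the README's data pipeline could look roughly like the sketch below. It is an illustration, not code from this commit. It assumes the dataset is a mapping of compound IDs to records with `compound_name`, `pubchem_cid`, and a `bioactivities` list whose items carry `activity_type`, which is the shape `app.py` reads. The function name `flatten_bioactivities` and the sample records are hypothetical.

```python
from typing import Any


def flatten_bioactivities(data: dict[str, dict[str, Any]]) -> list[dict[str, Any]]:
    """Turn the nested compound mapping into one flat row per bioactivity."""
    rows = []
    for compound_id, compound in data.items():
        for activity in compound.get("bioactivities", []):
            rows.append({
                "compound_id": compound_id,
                "compound": compound.get("compound_name", "N/A"),
                "cid": compound.get("pubchem_cid", "N/A"),
                "activity": activity.get("activity_type", "N/A"),
            })
    return rows


# Hypothetical records in the assumed schema (not taken from the real dataset).
sample = {
    "c1": {
        "compound_name": "Compound A",
        "pubchem_cid": 111,
        "bioactivities": [
            {"activity_type": "Anti-inflammatory"},
            {"activity_type": "Antioxidant"},
        ],
    },
    "c2": {"compound_name": "Compound B", "bioactivities": []},
}

rows = flatten_bioactivities(sample)
for row in rows:
    print(row)
```

Passing `rows` to `pandas.DataFrame` would give a table that can be filtered or grouped by activity; compounds without bioactivities produce no rows in this sketch.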
 
app.py ADDED
@@ -0,0 +1,220 @@
+ import streamlit as st
+ import json
+ import pandas as pd
+ from huggingface_hub import hf_hub_download
+ import plotly.express as px
+
+ st.set_page_config(
+     page_title="PhytoAI Assistant",
+     page_icon="🌿",
+     layout="wide"
+ )
+
+ @st.cache_data
+ def load_phytoai_data():
+     """Load PhytoAI data from HF dataset"""
+     try:
+         dataset_path = hf_hub_download(
+             repo_id="Gatescrispy/phytoai-mega-dataset",
+             filename="mega_final_dataset.json",
+             repo_type="dataset"
+         )
+         with open(dataset_path, 'r', encoding='utf-8') as f:
+             return json.load(f)
+     except Exception as e:
+         st.error(f"Data loading error: {e}")
+         return None
+
+ def main():
+     st.title("🌿 PhytoAI Assistant")
+     st.markdown("### AI Assistant for Phytotherapy Research")
+     st.markdown("---")
+
+     # Load data
+     with st.spinner("Loading PhytoAI data..."):
+         data = load_phytoai_data()
+
+     if data is None:
+         st.error("❌ Unable to load PhytoAI data")
+         st.info("The dataset will be available once uploaded to Hugging Face")
+
+         # Demo data
+         st.subheader("📊 PhytoAI Dataset Preview")
+         st.write("**Dataset content:**")
+         st.write("• 352 unique natural compounds")
+         st.write("• 1,314 documented bioactivities")
+         st.write("• Sources: PubChem, ChEMBL, scientific literature")
+
+         return
+
+     # Search interface
+     st.sidebar.header("🔍 Compound Search")
+
+     search_type = st.sidebar.selectbox(
+         "Search type:",
+         ["Compound name", "Therapeutic activity"]
+     )
+
+     if search_type == "Compound name":
+         compound_search = st.sidebar.text_input(
+             "Compound name",
+             placeholder="curcumin, resveratrol, quercetin..."
+         )
+
+         if compound_search:
+             search_compounds_by_name(data, compound_search)
+
+     elif search_type == "Therapeutic activity":
+         activity_search = st.sidebar.selectbox(
+             "Select an activity:",
+             ["", "anti-inflammatory", "antioxidant", "cardiovascular",
+              "neuroprotective", "anti-cancer", "antimicrobial"]
+         )
+
+         if activity_search:
+             search_by_therapeutic_activity(data, activity_search)
+
+     # Main statistics
+     display_main_statistics(data)
+
+     # Visualizations
+     create_visualizations(data)
+
+     # Footer
+     st.markdown("---")
+     st.markdown("**🌿 PhytoAI** - AI Assistant for Phytotherapy Research")
+     st.markdown("📊 [PhytoAI Dataset](https://huggingface.co/datasets/Gatescrispy/phytoai-mega-dataset) | 🔬 Research & Development")
+
+ def search_compounds_by_name(data, search_term):
+     """Search by compound name"""
+     st.subheader(f"🔍 Results for '{search_term}'")
+
+     results = []
+     for compound_id, compound_data in data.items():
+         compound_name = compound_data.get('compound_name', '').lower()
+         if search_term.lower() in compound_name:
+             results.append((compound_id, compound_data))
+
+     if results:
+         for compound_id, compound_data in results[:5]:
+             with st.expander(f"🧬 {compound_data.get('compound_name', 'Unknown compound')}"):
+                 col1, col2 = st.columns(2)
+
+                 with col1:
+                     st.write("**Molecular Properties:**")
+                     st.write(f"• Formula: `{compound_data.get('molecular_formula', 'N/A')}`")
+                     st.write(f"• SMILES: `{compound_data.get('smiles', 'N/A')}`")
+                     st.write(f"• PubChem CID: `{compound_data.get('pubchem_cid', 'N/A')}`")
+
+                 with col2:
+                     st.write("**Bioactivities:**")
+                     bioactivities = compound_data.get('bioactivities', [])
+                     for i, activity in enumerate(bioactivities[:5]):
+                         st.write(f"• {activity.get('activity_type', 'N/A')}")
+                         if i >= 4 and len(bioactivities) > 5:
+                             st.write(f"... and {len(bioactivities) - 5} others")
+                             break
+     else:
+         st.info("No compounds found for this search")
+
+ def search_by_therapeutic_activity(data, activity_type):
+     """Search by therapeutic activity"""
+     st.subheader(f"🎯 Compounds with activity: {activity_type}")
+
+     matching_compounds = []
+     for compound_id, compound_data in data.items():
+         bioactivities = compound_data.get('bioactivities', [])
+         for activity in bioactivities:
+             if activity_type.lower() in activity.get('activity_type', '').lower():
+                 matching_compounds.append({
+                     'Compound': compound_data.get('compound_name', 'N/A'),
+                     'Formula': compound_data.get('molecular_formula', 'N/A'),
+                     'Activity': activity.get('activity_type', 'N/A'),
+                     'CID': compound_data.get('pubchem_cid', 'N/A')
+                 })
+                 break
+
+     if matching_compounds:
+         df = pd.DataFrame(matching_compounds)
+         st.dataframe(df, use_container_width=True)
+         st.info(f"📊 {len(matching_compounds)} compounds found with this activity")
+     else:
+         st.warning("No compounds found for this activity")
+
+ def display_main_statistics(data):
+     """Display main statistics"""
+     st.header("📈 PhytoAI Dataset Statistics")
+
+     col1, col2, col3, col4 = st.columns(4)
+
+     with col1:
+         st.metric("🧬 Total compounds", len(data))
+
+     with col2:
+         total_bioactivities = sum(len(comp.get('bioactivities', [])) for comp in data.values())
+         st.metric("🔬 Total bioactivities", f"{total_bioactivities:,}")
+
+     with col3:
+         therapeutic_areas = set()
+         for compound_data in data.values():
+             for activity in compound_data.get('bioactivities', []):
+                 activity_type = activity.get('activity_type', '').lower()
+                 # Record each known therapeutic term once, however the activity type is phrased
+                 therapeutic_areas.update(term for term in ('anti-inflammatory', 'antioxidant', 'cardiovascular', 'neuroprotective', 'anti-cancer', 'antimicrobial') if term in activity_type)
+         st.metric("🎯 Therapeutic areas", len(therapeutic_areas))
+
+     with col4:
+         compounds_with_pubchem = sum(1 for comp in data.values() if comp.get('pubchem_cid'))
+         coverage = (compounds_with_pubchem / len(data)) * 100 if data else 0.0  # avoid dividing by zero on an empty dataset
+         st.metric("📊 PubChem coverage", f"{coverage:.1f}%")
+
+ def create_visualizations(data):
+     """Create interactive visualizations"""
+     st.header("📊 Interactive Visualizations")
+
+     # Therapeutic activity analysis
+     activity_counts = {}
+     for compound_data in data.values():
+         for activity in compound_data.get('bioactivities', []):
+             activity_type = activity.get('activity_type', '').lower()
+             # Categorize activities
+             if 'anti-inflammatory' in activity_type:
+                 activity_counts['Anti-inflammatory'] = activity_counts.get('Anti-inflammatory', 0) + 1
+             elif 'antioxidant' in activity_type:
+                 activity_counts['Antioxidant'] = activity_counts.get('Antioxidant', 0) + 1
+             elif 'cardiovascular' in activity_type:
+                 activity_counts['Cardiovascular'] = activity_counts.get('Cardiovascular', 0) + 1
+             elif 'neuroprotective' in activity_type:
+                 activity_counts['Neuroprotective'] = activity_counts.get('Neuroprotective', 0) + 1
+             elif 'anti-cancer' in activity_type or 'anticancer' in activity_type:
+                 activity_counts['Anti-cancer'] = activity_counts.get('Anti-cancer', 0) + 1
+             elif 'antimicrobial' in activity_type:
+                 activity_counts['Antimicrobial'] = activity_counts.get('Antimicrobial', 0) + 1
+
+     if activity_counts:
+         col1, col2 = st.columns(2)
+
+         with col1:
+             # Bar chart (counts are per bioactivity record, not per compound)
+             fig_bar = px.bar(
+                 x=list(activity_counts.keys()),
+                 y=list(activity_counts.values()),
+                 title="Distribution of Therapeutic Activities",
+                 labels={'x': 'Activity Type', 'y': 'Number of Bioactivities'},
+                 color=list(activity_counts.values()),
+                 color_continuous_scale="Viridis"
+             )
+             fig_bar.update_layout(showlegend=False)
+             st.plotly_chart(fig_bar, use_container_width=True)
+
+         with col2:
+             # Pie chart
+             fig_pie = px.pie(
+                 values=list(activity_counts.values()),
+                 names=list(activity_counts.keys()),
+                 title="Therapeutic Areas Distribution"
+             )
+             st.plotly_chart(fig_pie, use_container_width=True)
+
+ if __name__ == "__main__":
+     main()
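The `if`/`elif` chain in `create_visualizations` maps free-text `activity_type` values onto six display labels. One possible way to express the same mapping is a keyword table plus `collections.Counter`, sketched below. This is an alternative formulation, not the code in this commit: `CATEGORY_KEYWORDS`, `categorize`, and `count_activities` are hypothetical names, and the sample records are invented for illustration.

```python
from collections import Counter

# Display label -> substrings that map an activity_type onto that label.
# Dict order matters: the first matching label wins, like the if/elif chain.
CATEGORY_KEYWORDS = {
    "Anti-inflammatory": ("anti-inflammatory",),
    "Antioxidant": ("antioxidant",),
    "Cardiovascular": ("cardiovascular",),
    "Neuroprotective": ("neuroprotective",),
    "Anti-cancer": ("anti-cancer", "anticancer"),
    "Antimicrobial": ("antimicrobial",),
}


def categorize(activity_type):
    """Return the first matching category label, or None if nothing matches."""
    lowered = activity_type.lower()
    for label, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in lowered for keyword in keywords):
            return label
    return None


def count_activities(data):
    """Count bioactivity records per category across all compounds."""
    counts = Counter()
    for compound in data.values():
        for activity in compound.get("bioactivities", []):
            label = categorize(activity.get("activity_type", ""))
            if label is not None:
                counts[label] += 1
    return counts


# Hypothetical records in the schema app.py reads (not real dataset entries).
sample = {
    "c1": {"bioactivities": [{"activity_type": "Antioxidant"},
                             {"activity_type": "Potent anticancer effect"}]},
    "c2": {"bioactivities": [{"activity_type": "antioxidant capacity"},
                             {"activity_type": "Sedative"}]},
}

counts = count_activities(sample)
print(dict(counts))
```

Keeping the keywords in one table would also let the sidebar search and the statistics reuse the same definitions, so that a spelling such as "anticancer" is handled the same way everywhere.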
requirements.txt CHANGED
@@ -1,3 +1,4 @@
- altair
- pandas
- streamlit
+ streamlit>=1.28.0
+ huggingface_hub>=0.16.0
+ pandas>=1.5.0
+ plotly>=5.0.0
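To confirm that a local environment meets these minimum versions before running the app, a rough check along the following lines may help. It is a sketch using only the standard library, not a replacement for pip's dependency resolution: the helper names are hypothetical, and `version_tuple` compares only the leading numeric components, so pre-release and post-release suffixes are ignored.

```python
import re
from importlib import metadata

# Minimum versions copied from requirements.txt in this commit.
MINIMUMS = {
    "streamlit": "1.28.0",
    "huggingface_hub": "0.16.0",
    "pandas": "1.5.0",
    "plotly": "5.0.0",
}


def version_tuple(version: str) -> tuple[int, ...]:
    """Parse the leading numeric components, e.g. '1.28.0rc1' -> (1, 28, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_minimums(minimums: dict[str, str]) -> list[str]:
    """Return one message per package that is missing or older than required."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems.append(f"{name}: not installed (need >= {minimum})")
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems.append(f"{name}: found {installed}, need >= {minimum}")
    return problems


for message in check_minimums(MINIMUMS):
    print(message)
```

An empty result means every listed package is installed at or above its stated minimum, as far as this simplified comparison can tell.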