Spaces:
Sleeping
Sleeping
TANTCHEU Noussi Cédric commited on
Commit ·
7603b2e
1
Parent(s): 30d0e41
Initial space upload: Interactive PhytoAI Assistant
Browse files- README.md +146 -14
- app.py +220 -0
- requirements.txt +4 -3
README.md
CHANGED
|
@@ -1,20 +1,152 @@
|
|
| 1 |
---
|
| 2 |
-
title:
|
| 3 |
-
emoji:
|
| 4 |
-
colorFrom:
|
| 5 |
-
colorTo:
|
| 6 |
-
sdk:
|
| 7 |
-
|
| 8 |
-
|
| 9 |
-
- streamlit
|
| 10 |
pinned: false
|
| 11 |
-
|
| 12 |
-
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 13 |
---
|
| 14 |
|
| 15 |
-
#
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 16 |
|
| 17 |
-
|
| 18 |
|
| 19 |
-
|
| 20 |
-
forums](https://discuss.streamlit.io).
|
|
|
|
| 1 |
---
|
| 2 |
+
title: PhytoAI Assistant
|
| 3 |
+
emoji: 🌿
|
| 4 |
+
colorFrom: green
|
| 5 |
+
colorTo: blue
|
| 6 |
+
sdk: streamlit
|
| 7 |
+
sdk_version: 1.28.0
|
| 8 |
+
app_file: app.py
|
|
|
|
| 9 |
pinned: false
|
| 10 |
+
license: cc-by-4.0
|
| 11 |
+
tags:
|
| 12 |
+
- phytotherapy
|
| 13 |
+
- natural-compounds
|
| 14 |
+
- bioactivity
|
| 15 |
+
- drug-discovery
|
| 16 |
+
- ai-assistant
|
| 17 |
+
- research-tool
|
| 18 |
---
|
| 19 |
|
| 20 |
+
# PhytoAI Assistant 🌿
|
| 21 |
+
|
| 22 |
+
**Interactive AI Assistant for Phytotherapy Research** - Explore natural compounds and their bioactivities using cutting-edge AI technology.
|
| 23 |
+
|
| 24 |
+
## 🎯 Overview
|
| 25 |
+
|
| 26 |
+
The PhytoAI Assistant is an interactive web application built with Streamlit that provides researchers, students, and pharmaceutical professionals with easy access to a comprehensive database of natural compounds and their documented bioactivities. This tool leverages the [PhytoAI MEGA Dataset](https://huggingface.co/datasets/Gatescrispy/phytoai-mega-dataset) containing 352 unique natural compounds and 1,314 bioactivities.
|
| 27 |
+
|
| 28 |
+
## ✨ Key Features
|
| 29 |
+
|
| 30 |
+
### 🔍 Advanced Search Capabilities
|
| 31 |
+
- **Compound Name Search**: Find specific natural compounds by name (e.g., curcumin, resveratrol, quercetin)
|
| 32 |
+
- **Therapeutic Activity Search**: Discover compounds by their therapeutic properties:
|
| 33 |
+
- Anti-inflammatory
|
| 34 |
+
- Antioxidant
|
| 35 |
+
- Cardiovascular protective
|
| 36 |
+
- Neuroprotective
|
| 37 |
+
- Anti-cancer
|
| 38 |
+
- Antimicrobial
|
| 39 |
+
|
| 40 |
+
### 📊 Interactive Data Visualization
|
| 41 |
+
- **Real-time Statistics**: Live metrics on compound count, bioactivities, and therapeutic coverage
|
| 42 |
+
- **Distribution Charts**: Visual analysis of therapeutic activities using Plotly
|
| 43 |
+
- **Pie Charts**: Therapeutic area distribution for quick insights
|
| 44 |
+
- **Bar Charts**: Activity type frequency analysis
|
| 45 |
+
|
| 46 |
+
### 🧬 Comprehensive Compound Information
|
| 47 |
+
- **Molecular Properties**: Chemical formulas, SMILES notation, molecular weights
|
| 48 |
+
- **Database Cross-references**: PubChem CID mappings for further research
|
| 49 |
+
- **Bioactivity Profiles**: Detailed activity descriptions with experimental context
|
| 50 |
+
- **Literature References**: Links to original research and validation studies
|
| 51 |
+
|
| 52 |
+
## 🚀 How to Use
|
| 53 |
+
|
| 54 |
+
1. **Launch the Application**: The interface loads automatically with dataset statistics
|
| 55 |
+
2. **Search Compounds**: Use the sidebar to search by compound name or therapeutic activity
|
| 56 |
+
3. **Explore Results**: Click on compound cards to see detailed molecular and bioactivity information
|
| 57 |
+
4. **Analyze Data**: Review interactive charts to understand therapeutic distribution patterns
|
| 58 |
+
5. **Cross-reference**: Use PubChem CIDs for additional research in external databases
|
| 59 |
+
|
| 60 |
+
## 📈 Dataset Integration
|
| 61 |
+
|
| 62 |
+
This application seamlessly integrates with the **PhytoAI MEGA Dataset** through Hugging Face's `hf_hub_download` functionality, ensuring:
|
| 63 |
+
|
| 64 |
+
- **Always Up-to-date**: Automatic synchronization with the latest dataset version
|
| 65 |
+
- **Efficient Loading**: Cached data loading for optimal performance
|
| 66 |
+
- **Reliable Access**: Robust error handling and fallback mechanisms
|
| 67 |
+
|
| 68 |
+
## 🔬 Research Applications
|
| 69 |
+
|
| 70 |
+
### Academic Research
|
| 71 |
+
- **Drug Discovery**: Identify promising natural compounds for pharmaceutical development
|
| 72 |
+
- **Ethnopharmacology**: Validate traditional medicine uses with modern bioactivity data
|
| 73 |
+
- **Chemical Biology**: Explore structure-activity relationships in natural products
|
| 74 |
+
|
| 75 |
+
### Pharmaceutical Industry
|
| 76 |
+
- **Lead Compound Identification**: Screen natural products for specific therapeutic targets
|
| 77 |
+
- **Bioactivity Prediction**: Use existing data to guide synthetic chemistry efforts
|
| 78 |
+
- **Competitive Intelligence**: Monitor natural product research trends and opportunities
|
| 79 |
+
|
| 80 |
+
### Educational Use
|
| 81 |
+
- **Teaching Tool**: Interactive exploration of phytochemistry and pharmacology concepts
|
| 82 |
+
- **Student Projects**: Real-world dataset for bioinformatics and cheminformatics training
|
| 83 |
+
- **Research Training**: Hands-on experience with pharmaceutical data analysis
|
| 84 |
+
|
| 85 |
+
## 🛠️ Technical Architecture
|
| 86 |
+
|
| 87 |
+
### Frontend
|
| 88 |
+
- **Streamlit**: Modern, responsive web interface
|
| 89 |
+
- **Plotly**: Interactive data visualizations
|
| 90 |
+
- **Pandas**: Efficient data manipulation and analysis
|
| 91 |
+
|
| 92 |
+
### Backend
|
| 93 |
+
- **Hugging Face Hub**: Dataset storage and version control
|
| 94 |
+
- **JSON/CSV**: Structured data formats optimized for research
|
| 95 |
+
- **Caching**: Optimized performance with Streamlit's caching system
|
| 96 |
+
|
| 97 |
+
### Data Pipeline
|
| 98 |
+
```Hugging Face Dataset → hf_hub_download → JSON Loading →
|
| 99 |
+
Pandas Processing → Streamlit Interface → Interactive Visualizations
|
| 100 |
+
```
|
| 101 |
+
|
| 102 |
+
## 📊 Dataset Statistics
|
| 103 |
+
|
| 104 |
+
- **🧬 Compounds**: 352 unique natural products
|
| 105 |
+
- **🔬 Bioactivities**: 1,314 documented activities
|
| 106 |
+
- **🎯 Therapeutic Areas**: 6+ major categories
|
| 107 |
+
- **📚 Sources**: PubChem, ChEMBL, peer-reviewed literature
|
| 108 |
+
- **🔄 Updates**: Regularly maintained and expanded
|
| 109 |
+
|
| 110 |
+
## 🌐 Related Resources
|
| 111 |
+
|
| 112 |
+
- **📊 Dataset**: [PhytoAI MEGA Dataset](https://huggingface.co/datasets/Gatescrispy/phytoai-mega-dataset)
|
| 113 |
+
- **🤖 Models**: [PhytoAI Discovery Models](https://huggingface.co/Gatescrispy/phytoai-discovery-models)
|
| 114 |
+
- **📖 Documentation**: Comprehensive API and usage documentation
|
| 115 |
+
- **💬 Community**: Research collaboration and support forum
|
| 116 |
+
|
| 117 |
+
## 🏆 Impact & Recognition
|
| 118 |
+
|
| 119 |
+
This tool has been designed to bridge the gap between traditional phytotherapy knowledge and modern AI-driven drug discovery, providing researchers worldwide with:
|
| 120 |
+
|
| 121 |
+
- **Accessible Data**: User-friendly interface for complex bioactivity data
|
| 122 |
+
- **Research Acceleration**: Rapid compound screening and hypothesis generation
|
| 123 |
+
- **Global Collaboration**: Shared platform for international research initiatives
|
| 124 |
+
- **Educational Value**: Training resource for next-generation researchers
|
| 125 |
+
|
| 126 |
+
## 📄 Citation & License
|
| 127 |
+
|
| 128 |
+
**License**: CC BY 4.0 - Free for academic and commercial use with attribution
|
| 129 |
+
|
| 130 |
+
**Citation**: If you use PhytoAI Assistant in your research, please cite:
|
| 131 |
+
```bibtex
|
| 132 |
+
@software{phytoai_assistant_2025,
|
| 133 |
+
title={PhytoAI Assistant: Interactive AI Tool for Phytotherapy Research},
|
| 134 |
+
author={Tantcheu, Cedric},
|
| 135 |
+
year={2025},
|
| 136 |
+
url={https://huggingface.co/spaces/Gatescrispy/phytoai-assistant},
|
| 137 |
+
note={Interactive Streamlit application for natural compound bioactivity exploration}
|
| 138 |
+
}
|
| 139 |
+
```
|
| 140 |
+
|
| 141 |
+
## 👨💻 About the Developer
|
| 142 |
+
|
| 143 |
+
**Cedric Tantcheu** - AI & Phytotherapy Research Specialist
|
| 144 |
+
- 🎓 Expertise in cheminformatics, machine learning, and natural product research
|
| 145 |
+
- 🔬 Focus on AI-driven drug discovery from traditional medicine
|
| 146 |
+
- 🌍 Committed to open science and global health solutions
|
| 147 |
+
|
| 148 |
+
---
|
| 149 |
|
| 150 |
+
**🌿 Advancing phytotherapy research through intelligent technology**
|
| 151 |
|
| 152 |
+
*Built with ❤️ for the global research community*
|
|
|
app.py
ADDED
|
@@ -0,0 +1,220 @@
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 1 |
+
import streamlit as st
|
| 2 |
+
import json
|
| 3 |
+
import pandas as pd
|
| 4 |
+
from huggingface_hub import hf_hub_download
|
| 5 |
+
import plotly.express as px
|
| 6 |
+
|
| 7 |
+
st.set_page_config(
|
| 8 |
+
page_title="PhytoAI Assistant",
|
| 9 |
+
page_icon="🌿",
|
| 10 |
+
layout="wide"
|
| 11 |
+
)
|
| 12 |
+
|
| 13 |
+
@st.cache_data
|
| 14 |
+
def load_phytoai_data():
|
| 15 |
+
"""Load PhytoAI data from HF dataset"""
|
| 16 |
+
try:
|
| 17 |
+
dataset_path = hf_hub_download(
|
| 18 |
+
repo_id="Gatescrispy/phytoai-mega-dataset",
|
| 19 |
+
filename="mega_final_dataset.json",
|
| 20 |
+
repo_type="dataset"
|
| 21 |
+
)
|
| 22 |
+
with open(dataset_path, 'r') as f:
|
| 23 |
+
return json.load(f)
|
| 24 |
+
except Exception as e:
|
| 25 |
+
st.error(f"Data loading error: {e}")
|
| 26 |
+
return None
|
| 27 |
+
|
| 28 |
+
def main():
|
| 29 |
+
st.title("🌿 PhytoAI Assistant")
|
| 30 |
+
st.markdown("### AI Assistant for Phytotherapy Research")
|
| 31 |
+
st.markdown("---")
|
| 32 |
+
|
| 33 |
+
# Load data
|
| 34 |
+
with st.spinner("Loading PhytoAI data..."):
|
| 35 |
+
data = load_phytoai_data()
|
| 36 |
+
|
| 37 |
+
if data is None:
|
| 38 |
+
st.error("❌ Unable to load PhytoAI data")
|
| 39 |
+
st.info("The dataset will be available once uploaded to Hugging Face")
|
| 40 |
+
|
| 41 |
+
# Demo data
|
| 42 |
+
st.subheader("📊 PhytoAI Dataset Preview")
|
| 43 |
+
st.write("**Dataset content:**")
|
| 44 |
+
st.write("• 352 unique natural compounds")
|
| 45 |
+
st.write("• 1,314 documented bioactivities")
|
| 46 |
+
st.write("• Sources: PubChem, ChEMBL, scientific literature")
|
| 47 |
+
|
| 48 |
+
return
|
| 49 |
+
|
| 50 |
+
# Search interface
|
| 51 |
+
st.sidebar.header("🔍 Compound Search")
|
| 52 |
+
|
| 53 |
+
search_type = st.sidebar.selectbox(
|
| 54 |
+
"Search type:",
|
| 55 |
+
["Compound name", "Therapeutic activity"]
|
| 56 |
+
)
|
| 57 |
+
|
| 58 |
+
if search_type == "Compound name":
|
| 59 |
+
compound_search = st.sidebar.text_input(
|
| 60 |
+
"Compound name",
|
| 61 |
+
placeholder="curcumin, resveratrol, quercetin..."
|
| 62 |
+
)
|
| 63 |
+
|
| 64 |
+
if compound_search:
|
| 65 |
+
search_compounds_by_name(data, compound_search)
|
| 66 |
+
|
| 67 |
+
elif search_type == "Therapeutic activity":
|
| 68 |
+
activity_search = st.sidebar.selectbox(
|
| 69 |
+
"Select an activity:",
|
| 70 |
+
["", "anti-inflammatory", "antioxidant", "cardiovascular",
|
| 71 |
+
"neuroprotective", "anti-cancer", "antimicrobial"]
|
| 72 |
+
)
|
| 73 |
+
|
| 74 |
+
if activity_search:
|
| 75 |
+
search_by_therapeutic_activity(data, activity_search)
|
| 76 |
+
|
| 77 |
+
# Main statistics
|
| 78 |
+
display_main_statistics(data)
|
| 79 |
+
|
| 80 |
+
# Visualizations
|
| 81 |
+
create_visualizations(data)
|
| 82 |
+
|
| 83 |
+
# Footer
|
| 84 |
+
st.markdown("---")
|
| 85 |
+
st.markdown("**🌿 PhytoAI** - AI Assistant for Phytotherapy Research")
|
| 86 |
+
st.markdown("📊 [PhytoAI Dataset](https://huggingface.co/datasets/Gatescrispy/phytoai-mega-dataset) | 🔬 Research & Development")
|
| 87 |
+
|
| 88 |
+
def search_compounds_by_name(data, search_term):
|
| 89 |
+
"""Search by compound name"""
|
| 90 |
+
st.subheader(f"🔍 Results for '{search_term}'")
|
| 91 |
+
|
| 92 |
+
results = []
|
| 93 |
+
for compound_id, compound_data in data.items():
|
| 94 |
+
compound_name = compound_data.get('compound_name', '').lower()
|
| 95 |
+
if search_term.lower() in compound_name:
|
| 96 |
+
results.append((compound_id, compound_data))
|
| 97 |
+
|
| 98 |
+
if results:
|
| 99 |
+
for compound_id, compound_data in results[:5]:
|
| 100 |
+
with st.expander(f"🧬 {compound_data.get('compound_name', 'Unknown compound')}"):
|
| 101 |
+
col1, col2 = st.columns(2)
|
| 102 |
+
|
| 103 |
+
with col1:
|
| 104 |
+
st.write("**Molecular Properties:**")
|
| 105 |
+
st.write(f"• Formula: `{compound_data.get('molecular_formula', 'N/A')}`")
|
| 106 |
+
st.write(f"• SMILES: `{compound_data.get('smiles', 'N/A')}`")
|
| 107 |
+
st.write(f"• PubChem CID: `{compound_data.get('pubchem_cid', 'N/A')}`")
|
| 108 |
+
|
| 109 |
+
with col2:
|
| 110 |
+
st.write("**Bioactivities:**")
|
| 111 |
+
bioactivities = compound_data.get('bioactivities', [])
|
| 112 |
+
for i, activity in enumerate(bioactivities[:5]):
|
| 113 |
+
st.write(f"• {activity.get('activity_type', 'N/A')}")
|
| 114 |
+
if i >= 4 and len(bioactivities) > 5:
|
| 115 |
+
st.write(f"... and {len(bioactivities) - 5} others")
|
| 116 |
+
break
|
| 117 |
+
else:
|
| 118 |
+
st.info("No compounds found for this search")
|
| 119 |
+
|
| 120 |
+
def search_by_therapeutic_activity(data, activity_type):
|
| 121 |
+
"""Search by therapeutic activity"""
|
| 122 |
+
st.subheader(f"🎯 Compounds with activity: {activity_type}")
|
| 123 |
+
|
| 124 |
+
matching_compounds = []
|
| 125 |
+
for compound_id, compound_data in data.items():
|
| 126 |
+
bioactivities = compound_data.get('bioactivities', [])
|
| 127 |
+
for activity in bioactivities:
|
| 128 |
+
if activity_type.lower() in activity.get('activity_type', '').lower():
|
| 129 |
+
matching_compounds.append({
|
| 130 |
+
'Compound': compound_data.get('compound_name', 'N/A'),
|
| 131 |
+
'Formula': compound_data.get('molecular_formula', 'N/A'),
|
| 132 |
+
'Activity': activity.get('activity_type', 'N/A'),
|
| 133 |
+
'CID': compound_data.get('pubchem_cid', 'N/A')
|
| 134 |
+
})
|
| 135 |
+
break
|
| 136 |
+
|
| 137 |
+
if matching_compounds:
|
| 138 |
+
df = pd.DataFrame(matching_compounds)
|
| 139 |
+
st.dataframe(df, use_container_width=True)
|
| 140 |
+
st.info(f"📊 {len(matching_compounds)} compounds found with this activity")
|
| 141 |
+
else:
|
| 142 |
+
st.warning("No compounds found for this activity")
|
| 143 |
+
|
| 144 |
+
def display_main_statistics(data):
|
| 145 |
+
"""Display main statistics"""
|
| 146 |
+
st.header("📈 PhytoAI Dataset Statistics")
|
| 147 |
+
|
| 148 |
+
col1, col2, col3, col4 = st.columns(4)
|
| 149 |
+
|
| 150 |
+
with col1:
|
| 151 |
+
st.metric("🧬 Total compounds", len(data))
|
| 152 |
+
|
| 153 |
+
with col2:
|
| 154 |
+
total_bioactivities = sum(len(comp.get('bioactivities', [])) for comp in data.values())
|
| 155 |
+
st.metric("🔬 Total bioactivities", f"{total_bioactivities:,}")
|
| 156 |
+
|
| 157 |
+
with col3:
|
| 158 |
+
therapeutic_areas = set()
|
| 159 |
+
for compound_data in data.values():
|
| 160 |
+
for activity in compound_data.get('bioactivities', []):
|
| 161 |
+
activity_type = activity.get('activity_type', '').lower()
|
| 162 |
+
if any(term in activity_type for term in ['anti-inflammatory', 'antioxidant', 'cardiovascular', 'neuroprotective', 'anti-cancer', 'antimicrobial']):
|
| 163 |
+
therapeutic_areas.add(activity_type.split()[0] if activity_type else 'unknown')
|
| 164 |
+
st.metric("🎯 Therapeutic areas", len(therapeutic_areas))
|
| 165 |
+
|
| 166 |
+
with col4:
|
| 167 |
+
compounds_with_pubchem = sum(1 for comp in data.values() if comp.get('pubchem_cid'))
|
| 168 |
+
coverage = (compounds_with_pubchem / len(data)) * 100
|
| 169 |
+
st.metric("📊 PubChem coverage", f"{coverage:.1f}%")
|
| 170 |
+
|
| 171 |
+
def create_visualizations(data):
|
| 172 |
+
"""Create interactive visualizations"""
|
| 173 |
+
st.header("📊 Interactive Visualizations")
|
| 174 |
+
|
| 175 |
+
# Therapeutic activity analysis
|
| 176 |
+
activity_counts = {}
|
| 177 |
+
for compound_data in data.values():
|
| 178 |
+
for activity in compound_data.get('bioactivities', []):
|
| 179 |
+
activity_type = activity.get('activity_type', '').lower()
|
| 180 |
+
# Categorize activities
|
| 181 |
+
if 'anti-inflammatory' in activity_type:
|
| 182 |
+
activity_counts['Anti-inflammatory'] = activity_counts.get('Anti-inflammatory', 0) + 1
|
| 183 |
+
elif 'antioxidant' in activity_type:
|
| 184 |
+
activity_counts['Antioxidant'] = activity_counts.get('Antioxidant', 0) + 1
|
| 185 |
+
elif 'cardiovascular' in activity_type:
|
| 186 |
+
activity_counts['Cardiovascular'] = activity_counts.get('Cardiovascular', 0) + 1
|
| 187 |
+
elif 'neuroprotective' in activity_type:
|
| 188 |
+
activity_counts['Neuroprotective'] = activity_counts.get('Neuroprotective', 0) + 1
|
| 189 |
+
elif 'anti-cancer' in activity_type or 'anticancer' in activity_type:
|
| 190 |
+
activity_counts['Anti-cancer'] = activity_counts.get('Anti-cancer', 0) + 1
|
| 191 |
+
elif 'antimicrobial' in activity_type:
|
| 192 |
+
activity_counts['Antimicrobial'] = activity_counts.get('Antimicrobial', 0) + 1
|
| 193 |
+
|
| 194 |
+
if activity_counts:
|
| 195 |
+
col1, col2 = st.columns(2)
|
| 196 |
+
|
| 197 |
+
with col1:
|
| 198 |
+
# Bar chart
|
| 199 |
+
fig_bar = px.bar(
|
| 200 |
+
x=list(activity_counts.keys()),
|
| 201 |
+
y=list(activity_counts.values()),
|
| 202 |
+
title="Distribution of Therapeutic Activities",
|
| 203 |
+
labels={'x': 'Activity Type', 'y': 'Number of Compounds'},
|
| 204 |
+
color=list(activity_counts.values()),
|
| 205 |
+
color_continuous_scale="Viridis"
|
| 206 |
+
)
|
| 207 |
+
fig_bar.update_layout(showlegend=False)
|
| 208 |
+
st.plotly_chart(fig_bar, use_container_width=True)
|
| 209 |
+
|
| 210 |
+
with col2:
|
| 211 |
+
# Pie chart
|
| 212 |
+
fig_pie = px.pie(
|
| 213 |
+
values=list(activity_counts.values()),
|
| 214 |
+
names=list(activity_counts.keys()),
|
| 215 |
+
title="Therapeutic Areas Distribution"
|
| 216 |
+
)
|
| 217 |
+
st.plotly_chart(fig_pie, use_container_width=True)
|
| 218 |
+
|
| 219 |
+
if __name__ == "__main__":
|
| 220 |
+
main()
|
requirements.txt
CHANGED
|
@@ -1,3 +1,4 @@
|
|
| 1 |
-
|
| 2 |
-
|
| 3 |
-
|
|
|
|
|
|
| 1 |
+
streamlit>=1.28.0
|
| 2 |
+
huggingface_hub>=0.16.0
|
| 3 |
+
pandas>=1.5.0
|
| 4 |
+
plotly>=5.0.0
|