Qianhui19 committed on
Commit efbafeb · verified · 1 Parent(s): 91a6d73

Upload readme.md

Files changed (1)
  1. readme.md +141 -0
readme.md ADDED
@@ -0,0 +1,141 @@
+ ---
+
+ # Compound Batch Query Tool
+
+ ## Project Overview
+ This project is a compound batch query tool designed to annotate Contaminants of Emerging Concern (CECs) through database queries and API interactions. It includes a Dify-based annotation agent, a Flask-based SQL query service, and a Tkinter-based graphical user interface for batch annotation of CECs.
+
+ ### Key Features
+ 1. **CEC annotation agent**:
+    - Uses Dify's visual workflow orchestration engine to chain the logic for querying multiple databases (such as PubChem Lite and InVitroDB) into an automated pipeline.
+    - Annotates each CEC with the following fields: `Category`, `EndpointName`, `XLogP`, `BioPathway`, `ToxicityInfo`, `KnownUse`, `DisorderDisease`.
+
+ 2. **SQL Query Service**:
+    - Provides a RESTful API (via Flask) to execute `SELECT` queries on the PubChem Lite and InVitroDB databases.
+    - Supports switching between the two databases.
+    - Keeps SQL operations safe by restricting queries to `SELECT` only.
+
+ 3. **Batch Compound Classification Tool**:
+    - A desktop GUI tool (built with Tkinter) that processes compound names from CSV files.
+    - Uses Dify's API to classify compounds by main category, subcategories, biological pathways, toxicity information, and so on.
+    - Saves the results as CSV files, with detailed logs for reference.
+
+ ---
+
+ ## File Structure
+
+ ```
+ .
+ ├── step1_pubchemlite_invitro_to_dify_en.py
+ └── step2_CECs annotating_agent_v1.0.py
+ ```
+
+ ### File Details
+
+ #### 1. `pubchemlite_invitro_to_dify_en.py`
+
+ This is a Flask-based SQL query API service with the following key functionalities:
+ - Allows users to execute SQL queries via HTTP POST requests.
+ - Provides dual-database support for PubChem Lite and InVitroDB.
+ - Ensures safety by restricting operations to `SELECT` queries only (disallows `INSERT`, `DELETE`, `UPDATE`, `DROP`, etc.).
+ - Includes robust error handling with detailed feedback.
+
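The `SELECT`-only restriction described above could be enforced with a check along the following lines. This is a minimal sketch, not the service's actual validation code (which is not shown in this README): the helper name `is_safe_select`, the keyword list, and the table name `compounds` used in the usage lines are all hypothetical.

```python
import re

# Hypothetical keyword list; the real service may block a different set.
FORBIDDEN_KEYWORDS = ("INSERT", "UPDATE", "DELETE", "DROP", "ALTER", "TRUNCATE", "CREATE")


def is_safe_select(sql: str) -> bool:
    """Return True only for a single statement that starts with SELECT."""
    # Ignore surrounding whitespace and trailing semicolons.
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        return False  # empty input, or more than one statement
    if not re.match(r"(?i)select\b", statement):
        return False  # anything that is not a SELECT
    # Conservative: reject a write keyword anywhere, even inside a string literal.
    pattern = r"(?i)\b(" + "|".join(FORBIDDEN_KEYWORDS) + r")\b"
    return re.search(pattern, statement) is None


print(is_safe_select("SELECT * FROM compounds WHERE name = 'Ethanol';"))  # True
print(is_safe_select("SELECT 1; DROP TABLE compounds"))                   # False
```

A keyword filter like this is deliberately strict and can reject harmless queries that merely mention a blocked word. Connecting with a read-only database account is a common complement to this kind of check.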
+ **How to Run**:
+ ```bash
+ python pubchemlite_invitro_to_dify_en.py
+ ```
+
+ The service runs on `http://127.0.0.1:5000` by default.
+
+
+ #### 2. `CECs annotating_agent_v1.0.py`
+
+ This is a Tkinter-based batch compound classification tool with the following key functionalities:
+ - Allows users to select a CSV file and configure parameters through a graphical interface.
+ - Uses Dify's API to classify compounds into predefined categories.
+ - Supports batch processing and saves results as CSV files.
+ - Provides detailed logging and error messages for each step.
+
+ **How to Run** (the file name contains a space, so it must be quoted):
+ ```bash
+ python "CECs annotating_agent_v1.0.py"
+ ```
+
+ **Key Dependencies**:
+ - `tkinter`: For the graphical user interface.
+ - `pandas`: For loading and saving CSV files.
+ - `requests`: For making RESTful API calls.
+ - `json`: For parsing and generating JSON data.
+
+ ---
+
+ ## Usage Guide
+
+ ### 1. Environment Setup
+ Ensure you have the following Python packages installed:
+ ```bash
+ pip install flask pandas sqlalchemy requests pymysql
+ ```
+
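To confirm that the environment is ready before starting either program, a short check like the one below may help. It is only a sketch: the helper name `missing_modules` is hypothetical, and `tkinter` is included because the GUI needs it even though it is part of the standard library rather than a pip package (some Linux distributions package it separately).

```python
import importlib.util

# Import names for the packages in the pip command above, plus tkinter for the GUI.
REQUIRED_MODULES = ["flask", "pandas", "sqlalchemy", "requests", "pymysql", "tkinter"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```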
+ ### 2. SQL Query Service
+ - Modify the database connection details in `pubchemlite_invitro_to_dify_en.py`:
+ ```python
+ DB_CONFIGS = {
+     "pubchemlite": {
+         "uri": "mysql+pymysql://<username>:<password>@<host>:<port>/<database>"
+     },
+     "invitrodb_v4_3": {
+         "uri": "mysql+pymysql://<username>:<password>@<host>:<port>/<database>"
+     }
+ }
+ ```
+ - Start the service and send a test query to the API.
+
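Passwords that contain characters such as `@` or `/` break a hand-written connection URI unless they are percent-encoded. The sketch below shows one way to assemble the `DB_CONFIGS` entries from environment variables instead of editing credentials into the source file. The environment variable names, the default values, and the assumption that each database is named after its config key are all hypothetical and not taken from the project.

```python
import os
from urllib.parse import quote_plus


def build_mysql_uri(user, password, host, port, database):
    """Build a mysql+pymysql URI, percent-encoding the user name and password."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


def db_config_from_env(prefix, database):
    """Read <PREFIX>_USER, _PASSWORD, _HOST and _PORT (hypothetical variable names)."""
    return {
        "uri": build_mysql_uri(
            os.environ.get(f"{prefix}_USER", "root"),
            os.environ.get(f"{prefix}_PASSWORD", ""),
            os.environ.get(f"{prefix}_HOST", "127.0.0.1"),
            os.environ.get(f"{prefix}_PORT", "3306"),
            database,
        )
    }


# Assumes each database is named after its config key; adjust as needed.
DB_CONFIGS = {
    "pubchemlite": db_config_from_env("PUBCHEMLITE", "pubchemlite"),
    "invitrodb_v4_3": db_config_from_env("INVITRODB", "invitrodb_v4_3"),
}
```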
+ ### 3. Batch Compound Classification Tool
+ - Update the default configuration in `CECs annotating_agent_v1.0.py`:
+ ```python
+ self.default_api_key = "<DIFY_API_KEY>"
+ self.default_base_url = "http://<DIFY_HOST>:<PORT>/v1"
+ self.default_csv_path = "./path_to_your_data.csv"
+ ```
+ - Run the program and use the GUI to upload a CSV file and execute batch classification.
+
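For orientation, the sketch below shows how a single compound might be sent to Dify with the API key and base URL configured above. It rests on assumptions the README does not confirm: that the agent is published as a Dify workflow app reachable at `/workflows/run`, and that its input variable is called `compound_name`. Both function names are hypothetical, and the standard library is used here for portability although the tool itself relies on `requests`.

```python
import json
import urllib.request


def build_workflow_request(base_url, api_key, compound_name, user="batch-annotator"):
    """Prepare (but do not send) a blocking workflow-run request."""
    payload = {
        "inputs": {"compound_name": compound_name},  # hypothetical input variable name
        "response_mode": "blocking",
        "user": user,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/workflows/run",  # assumed endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def annotate(base_url, api_key, compound_name, timeout=120):
    """Send the request and return the decoded JSON response."""
    request = build_workflow_request(base_url, api_key, compound_name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

If the agent is instead published as a chat-style app, the endpoint and payload would differ, so check the API page of the Dify app before reusing this.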
+ ---
+
+ ## Example Data
+
+ ### Input File Format
+ The input CSV file should contain a column with compound names. For example:
+ ```csv
+ IUPAC_name
+ Methanol
+ Ethanol
+ Acetone
+ ```
+
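A quick way to check that an input file matches this format is to read the name column back. The sketch below uses the standard-library `csv` module so it runs without extra packages (the tool itself uses pandas). The function name `read_compound_names` is hypothetical, and the column name `IUPAC_name` is taken from the example above.

```python
import csv
import io


def read_compound_names(csv_file, column="IUPAC_name"):
    """Return the non-empty, whitespace-stripped values of one column."""
    reader = csv.DictReader(csv_file)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"column {column!r} not found in CSV header")
    return [row[column].strip() for row in reader if row[column] and row[column].strip()]


sample = io.StringIO("IUPAC_name\nMethanol\nEthanol\nAcetone\n")
print(read_compound_names(sample))  # ['Methanol', 'Ethanol', 'Acetone']
```

For a file on disk, open it with `open(path, newline="", encoding="utf-8")` and pass the file object in place of `sample`.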
+ ### Output File Format
+ The output file will be in CSV format and include the following fields:
+ - `CompoundName`: The compound name.
+ - `MainCategory`: The main classification category.
+ - `AdditionalCategory1`: Subcategory 1.
+ - `AdditionalCategory2`: Subcategory 2.
+ - `EndpointName`: Expanded endpoint classification.
+ - `XLogP`: XLogP value.
+ - `BioPathway`: Biological pathway information.
+ - `ToxicityInfo`: Toxicity information.
+ - `KnownUse`: Known uses of the compound.
+ - `DisorderDisease`: Associated disorders or diseases.
+
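The field list above maps directly onto a CSV header. As a sketch of how result rows could be written in that layout, the example below uses the standard-library `csv` module (the tool itself uses pandas). The function name `write_results` is hypothetical, and the row holds a placeholder rather than real annotation output.

```python
import csv
import io

# Column order follows the field list above.
OUTPUT_FIELDS = [
    "CompoundName", "MainCategory", "AdditionalCategory1", "AdditionalCategory2",
    "EndpointName", "XLogP", "BioPathway", "ToxicityInfo", "KnownUse", "DisorderDisease",
]


def write_results(rows, csv_file):
    """Write annotation rows, leaving missing fields empty and ignoring extra keys."""
    writer = csv.DictWriter(
        csv_file, fieldnames=OUTPUT_FIELDS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_results([{"CompoundName": "Ethanol", "XLogP": "<value>"}], buffer)
print(buffer.getvalue())
```

Writing empty strings for missing fields keeps every row the same width, which makes the output easier to load back into a spreadsheet or pandas.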
+ ---
+
+ ## Contributors
+
+ We welcome contributions! If you are interested in improving this project, feel free to submit pull requests or suggestions.
+
+ ---
+
+ ## License
+
+ This project is licensed under the CC BY-NC 4.0 license.
+
+ ---