# Compound Batch Query Tool

## Project Overview
This project is a compound batch query tool for annotating Contaminants of Emerging Concern (CECs) by querying databases and calling APIs. It includes a Dify-based annotating agent, a Flask-based SQL query service, and a Tkinter-based graphical user interface for annotating CECs in batches.

### Key Features
1. **CECs Annotating Agent**:
   - Uses Dify's visual workflow orchestration engine to chain together queries against multiple databases (such as PubChem Lite and InVitroDB) into an automated pipeline.
   - Annotates CECs with the following fields: `Category`, `EndpointName`, `XLogP`, `BioPathway`, `ToxicityInfo`, `KnownUse`, and `DisorderDisease`.

2. **SQL Query Service**:
   - Provides a RESTful API (via Flask) to execute `SELECT` queries on PubChem Lite and InVitroDB databases.
   - Supports switching between the two databases.
   - Restricts queries to `SELECT` statements only, so the API cannot modify either database.

3. **Batch Compound Classification Tool**:
   - A desktop GUI tool (built using Tkinter) that processes compound names from CSV files.
   - Uses Dify's API to annotate each compound with a main category, subcategories, biological pathways, toxicity information, and related fields.
   - Saves the results as CSV files with detailed logs for reference.

---


## File Structure

```
.
├── step1_pubchemlite_invitro_to_dify_en.py
└── step2_CECs annotating_agent_v1.0.py
```

### File Details

#### 1. `step1_pubchemlite_invitro_to_dify_en.py`

This is a Flask-based SQL query API service with the following key functionalities:
- Allows users to execute SQL queries via HTTP POST requests.
- Provides dual-database support for PubChem Lite and InVitroDB.
- Ensures safety by restricting operations to `SELECT` queries only (disallows `INSERT`, `DELETE`, `UPDATE`, `DROP`, etc.).
- Includes robust error handling with detailed feedback.
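
The `SELECT`-only restriction could be enforced with a check along the following lines. This is a minimal sketch, not the service's actual implementation: the function name `is_select_only` and the list of forbidden keywords are assumptions.

```python
import re

# Hypothetical keyword list; statements containing any of these words are rejected.
_FORBIDDEN = re.compile(
    r"\b(INSERT|UPDATE|DELETE|DROP|ALTER|CREATE|TRUNCATE|REPLACE|GRANT|REVOKE)\b",
    re.IGNORECASE,
)


def is_select_only(sql: str) -> bool:
    """Return True if `sql` is a single statement that starts with SELECT
    and contains no data-modifying keywords (hypothetical helper)."""
    statement = sql.strip().rstrip(";").strip()
    if not statement:
        return False
    # Reject stacked statements such as "SELECT 1; DROP TABLE t".
    if ";" in statement:
        return False
    if not re.match(r"SELECT\b", statement, re.IGNORECASE):
        return False
    return _FORBIDDEN.search(statement) is None
```

The check is deliberately conservative: it also rejects some harmless queries, such as a `WITH ... SELECT` statement or a string literal containing `;` or `delete`. A keyword filter is no substitute for connecting with a database account that only has read privileges.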

**How to Run**:
```bash
python step1_pubchemlite_invitro_to_dify_en.py
```

The service runs on `http://127.0.0.1:5000` by default.
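
A client could call the service with Python's standard library roughly as sketched below. The route `/query` and the JSON keys `database` and `sql` are hypothetical placeholders; check the Flask route in the service script for the real path and payload. The database name `pubchemlite` is taken from the `DB_CONFIGS` keys shown in the Usage Guide.

```python
import json
import urllib.request

# Hypothetical route; the actual path is defined by the Flask app.
SERVICE_URL = "http://127.0.0.1:5000/query"


def build_query_request(database: str, sql: str, url: str = SERVICE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one SELECT query."""
    # Hypothetical payload keys: "database" selects the DB_CONFIGS entry, "sql" holds the query.
    payload = json.dumps({"database": database, "sql": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_query(database: str, sql: str) -> dict:
    """Send the request and decode the JSON reply (the service must be running)."""
    request = build_query_request(database, sql)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `run_query("pubchemlite", "SELECT 1")` would send a trivial query once the service is up.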


#### 2. `step2_CECs annotating_agent_v1.0.py`

This is a Tkinter-based batch compound classification tool with the following key functionalities:
- Allows users to select a CSV file and configure parameters through a graphical interface.
- Uses Dify's API to classify compounds into predefined categories.
- Supports batch processing and saves results as CSV files.
- Provides detailed logging and error messages for each step.
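
Per-compound error handling in a batch run might look like the sketch below. It illustrates the idea rather than reproducing the tool's code: `annotate_batch` and the logger name are hypothetical, and `classify` stands in for whatever function sends one compound name to Dify and returns its annotation fields.

```python
import logging
from typing import Callable, Dict, Iterable, List

logger = logging.getLogger("cecs_batch")  # hypothetical logger name


def annotate_batch(
    names: Iterable[str],
    classify: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Annotate each compound in turn; a failure is logged and the batch continues."""
    results: List[Dict[str, str]] = []
    for index, name in enumerate(names, start=1):
        try:
            fields = classify(name)
        except Exception as exc:  # e.g. network errors or malformed API responses
            logger.error("[%d] %s failed: %s", index, name, exc)
            fields = {}  # keep a row with only CompoundName so the output stays aligned
        else:
            logger.info("[%d] %s annotated", index, name)
        results.append({"CompoundName": name, **fields})
    return results
```

Logging each failure and moving on means one malformed name or API timeout costs a single row rather than the whole batch. Call `logging.basicConfig(level=logging.INFO)` to see the per-compound progress messages.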

**How to Run**:
```bash
python "step2_CECs annotating_agent_v1.0.py"
```

**Key Dependencies**:
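- `tkinter`: For the graphical user interface (Python standard library; some Linux distributions package it separately, e.g. `python3-tk`).
- `pandas`: For loading and saving CSV files.
- `requests`: For making RESTful API calls.
- `json`: For parsing and generating JSON data (Python standard library).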

---

## Usage Guide

### 1. Environment Setup
Ensure you have the following Python packages installed:
```bash
pip install flask pandas sqlalchemy requests pymysql
```

### 2. SQL Query Service
- Modify the database connection details in `step1_pubchemlite_invitro_to_dify_en.py`:
  ```python
  DB_CONFIGS = {
      "pubchemlite": {
          "uri": "mysql+pymysql://<username>:<password>@<host>:<port>/<database>"
      },
      "invitrodb_v4_3": {
          "uri": "mysql+pymysql://<username>:<password>@<host>:<port>/<database>"
      }
  }
  ```
- Start the service and test it by sending a `SELECT` query in an HTTP POST request.
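
Rather than hard-coding credentials, the URIs could be assembled from environment variables. This is a sketch under assumptions: the variable names (`PUBCHEMLITE_USER`, `INVITRODB_PASSWORD`, and so on) and the helper names are hypothetical. User names and passwords are percent-encoded so that characters such as `@` or `:` do not break the URI.

```python
import os
from urllib.parse import quote


def mysql_uri(prefix: str) -> str:
    """Assemble a SQLAlchemy MySQL URI from <PREFIX>_USER, _PASSWORD, _HOST, _PORT and _DB."""
    # Percent-encode credentials; safe="" also encodes "/" and ":".
    user = quote(os.environ[f"{prefix}_USER"], safe="")
    password = quote(os.environ[f"{prefix}_PASSWORD"], safe="")
    host = os.environ.get(f"{prefix}_HOST", "127.0.0.1")
    port = os.environ.get(f"{prefix}_PORT", "3306")  # MySQL's default port
    database = os.environ[f"{prefix}_DB"]
    return f"mysql+pymysql://{user}:{password}@{host}:{port}/{database}"


def load_db_configs() -> dict:
    """Same shape as DB_CONFIGS above; the environment variable prefixes are hypothetical."""
    return {
        "pubchemlite": {"uri": mysql_uri("PUBCHEMLITE")},
        "invitrodb_v4_3": {"uri": mysql_uri("INVITRODB")},
    }
```

A missing required variable raises `KeyError` at startup, which surfaces configuration mistakes before any query is attempted.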

### 3. Batch Compound Classification Tool
- Update the default configuration in `step2_CECs annotating_agent_v1.0.py`:
  ```python
  self.default_api_key = "<DIFY_API_KEY>"
  self.default_base_url = "http://<DIFY_HOST>:<PORT>/v1"
  self.default_csv_path = "./path_to_your_data.csv"
  ```
- Run the program, select a CSV file in the GUI, and start the batch classification.
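
For reference, a request to a Dify app published as a workflow could be built as below. Assumptions: the app is a workflow (Dify chat apps use the `/chat-messages` endpoint with a `query` field instead), its input variable is named `compound_name` (hypothetical), and the API key is read from a hypothetical `DIFY_API_KEY` environment variable instead of being hard-coded.

```python
import json
import os
import urllib.request


def build_dify_workflow_request(
    compound_name: str,
    api_key: str,
    base_url: str,
    user: str = "cecs-batch-annotator",
) -> urllib.request.Request:
    """Build (but do not send) a blocking-mode Dify workflow run request."""
    url = base_url.rstrip("/") + "/workflows/run"
    body = {
        # Hypothetical input variable; it must match the workflow's Start node.
        "inputs": {"compound_name": compound_name},
        "response_mode": "blocking",  # wait for the full result in one response
        "user": user,  # end-user identifier sent with each Dify API call
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": "application/json"},
        method="POST",
    )


def run_workflow(compound_name: str, base_url: str) -> dict:
    """Send the request and return the workflow outputs (needs a reachable Dify server)."""
    api_key = os.environ["DIFY_API_KEY"]  # hypothetical environment variable
    request = build_dify_workflow_request(compound_name, api_key, base_url)
    with urllib.request.urlopen(request, timeout=120) as response:
        result = json.loads(response.read().decode("utf-8"))
    # Blocking-mode workflow responses carry the outputs under data.outputs.
    return result["data"]["outputs"]
```

Keeping the key in the environment avoids committing it alongside `self.default_api_key`.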

---

## Example Data

### Input File Format
The input CSV file should contain a column with compound names. For example:
```csv
IUPAC_name
Methanol
Ethanol
Acetone
```
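
Reading the compound names could look like the following. The tool itself loads CSV files with `pandas`; this sketch uses the standard `csv` module to stay dependency-free. The function name is hypothetical, and skipping blank cells is an assumed design choice.

```python
import csv
from typing import Iterable, List


def read_compound_names(lines: Iterable[str], column: str = "IUPAC_name") -> List[str]:
    """Return the non-empty, whitespace-trimmed names found in `column`."""
    reader = csv.DictReader(lines)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"input CSV has no {column!r} column")
    names: List[str] = []
    for row in reader:
        name = (row.get(column) or "").strip()
        if name:  # skip blank cells instead of sending empty queries
            names.append(name)
    return names
```

Pass an open file such as `open(path, newline="", encoding="utf-8-sig")`: `newline=""` is what the `csv` documentation recommends, and `utf-8-sig` strips the byte-order mark that spreadsheet programs sometimes prepend to the header.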

### Output File Format
The output file will be in CSV format and include the following fields:
- `CompoundName`: The compound name.
- `MainCategory`: The main classification category.
- `AdditionalCategory1`: Subcategory 1.
- `AdditionalCategory2`: Subcategory 2.
- `EndpointName`: Expanded endpoint classification.
- `XLogP`: Computed octanol-water partition coefficient (XLogP).
- `BioPathway`: Biological pathway information.
- `ToxicityInfo`: Toxicity information.
- `KnownUse`: Known uses of the compound.
- `DisorderDisease`: Associated disorders or diseases.
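
Rows can be written in this column order with the standard `csv` module, as in the sketch below. `OUTPUT_FIELDS` mirrors the list above; `results_to_csv` is a hypothetical helper, and writing missing fields as empty cells is an assumed convention.

```python
import csv
import io
from typing import Dict, Iterable, List

# Column order taken from the output field list above.
OUTPUT_FIELDS: List[str] = [
    "CompoundName",
    "MainCategory",
    "AdditionalCategory1",
    "AdditionalCategory2",
    "EndpointName",
    "XLogP",
    "BioPathway",
    "ToxicityInfo",
    "KnownUse",
    "DisorderDisease",
]


def results_to_csv(rows: Iterable[Dict[str, str]]) -> str:
    """Serialise result rows to CSV text with a fixed header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=OUTPUT_FIELDS,
        restval="",  # missing fields become empty cells
        extrasaction="ignore",  # unexpected keys are dropped instead of raising
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

`Path("results.csv").write_text(results_to_csv(rows), encoding="utf-8")` would then save the file; a fixed header keeps every output file in the same column order even when some compounds return fewer fields.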

---

## Contributors

We welcome contributions! If you are interested in improving this project, feel free to submit pull requests or suggestions.

---

## License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license.

---