File Structure
.
├── step1_pubchemlite_invitro_to_dify_en.py
└── step2_CECs annotating_agent_v1.0.py
File Details
1. pubchemlite_invitro_to_dify_en.py
This is a Flask-based SQL query API service with the following key functionalities:
- Allows users to execute SQL queries via HTTP POST requests.
- Provides dual-database support for PubChem Lite and InVitroDB.
- Ensures safety by restricting operations to SELECT queries only (disallows INSERT, DELETE, UPDATE, DROP, etc.).
- Includes robust error handling with detailed feedback.
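The SELECT-only restriction listed above could be enforced with a check along the following lines. This is a minimal sketch, not the script's actual implementation: the function name `is_select_only` and the extra keywords in `FORBIDDEN` (beyond the four named above) are assumptions made for illustration.

```python
import re

# Keywords that modify data or schema. INSERT, DELETE, UPDATE and DROP come from
# the README; the remaining entries are assumed additions for this sketch.
FORBIDDEN = ("INSERT", "DELETE", "UPDATE", "DROP", "ALTER", "TRUNCATE", "CREATE")


def is_select_only(sql: str) -> bool:
    """Return True if `sql` is a single SELECT statement with no forbidden keyword."""
    statement = sql.strip().rstrip(";").strip()
    if not statement or ";" in statement:
        # Reject empty input and stacked statements such as "SELECT 1; DROP ...".
        return False
    if not re.match(r"(?i)select\b", statement):
        return False
    # Whole-word match, so a column named "updated_at" is not flagged.
    pattern = r"(?i)\b(" + "|".join(FORBIDDEN) + r")\b"
    return re.search(pattern, statement) is None
```

The check is deliberately conservative: a SELECT whose string literal contains a forbidden word (for example `WHERE note = 'drop'`) is also rejected. A read-only database account remains the more reliable safeguard.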
How to Run:
python pubchemlite_invitro_to_dify_en.py
The service runs on http://127.0.0.1:5000 by default.
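A client call to the running service might look like the sketch below. The route `/query` and the JSON field names `database` and `sql` are hypothetical; check the route and request-parsing code in the script for the real ones. Only the default address comes from this README.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:5000"  # default address stated above
QUERY_PATH = "/query"               # hypothetical route, not confirmed by the README


def build_query_request(sql: str, database: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request carrying one SQL query."""
    # The field names "database" and "sql" are assumptions for this sketch.
    payload = json.dumps({"database": database, "sql": sql}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + QUERY_PATH,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, `send(build_query_request("SELECT 1", "pubchemlite"))` would return the decoded response.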
2. CECs annotating_agent_v1.0.py
This is a Tkinter-based batch compound classification tool with the following key functionalities:
- Allows users to select a CSV file and configure parameters through a graphical interface.
- Uses Dify's API to classify compounds into predefined categories.
- Supports batch processing and saves results as CSV files.
- Provides detailed logging and error messages for each step.
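The batch processing and per-step error reporting described above can be sketched as follows. This is an illustration under assumptions, not the tool's code: `classify` stands in for whatever function calls Dify, and the `Error` field is a hypothetical addition used here to record failures without aborting the run.

```python
from typing import Callable, Dict, Iterator, List


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def classify_in_batches(
    names: List[str],
    classify: Callable[[str], Dict[str, str]],
    batch_size: int = 10,
) -> List[Dict[str, str]]:
    """Classify every name, logging progress per batch and capturing failures."""
    results: List[Dict[str, str]] = []
    for batch_no, batch in enumerate(chunked(names, batch_size), start=1):
        for name in batch:
            try:
                row = {"CompoundName": name, **classify(name), "Error": ""}
            except Exception as exc:
                # One failed compound is recorded and the batch continues.
                row = {"CompoundName": name, "Error": str(exc)}
            results.append(row)
        print(f"batch {batch_no}: processed {len(batch)} compound(s)")
    return results
```

Recording the error in the row, rather than raising, keeps a long run from being lost to a single failed request.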
How to Run:
python "CECs annotating_agent_v1.0.py"
Key Dependencies:
- tkinter: For the graphical user interface.
- pandas: For loading and saving CSV files.
- requests: For making RESTful API calls.
- json: For parsing and generating JSON data.
Usage Guide
1. Environment Setup
Ensure you have the following Python packages installed:
pip install flask pandas sqlalchemy requests pymysql
2. SQL Query Service
- Modify the database connection details in pubchemlite_invitro_to_dify_en.py:
DB_CONFIGS = {
    "pubchemlite": {
        "uri": "mysql+pymysql://<username>:<password>@<host>:<port>/<database>"
    },
    "invitrodb_v4_3": {
        "uri": "mysql+pymysql://<username>:<password>@<host>:<port>/<database>"
    }
}
- Start the service and test the API by sending a SELECT query in an HTTP POST request.
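When filling in the connection URIs, note that a password containing characters such as `@` or `/` must be URL-encoded, otherwise the URI is parsed incorrectly. A small helper like the one below can do this. The function name `build_mysql_uri` and the sample credentials are hypothetical.

```python
from urllib.parse import quote_plus


def build_mysql_uri(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a mysql+pymysql URI, URL-encoding the user name and password."""
    return (
        f"mysql+pymysql://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Example with made-up credentials; the keys mirror DB_CONFIGS in the script.
DB_CONFIGS = {
    "pubchemlite": {
        "uri": build_mysql_uri("reader", "p@ss/word", "localhost", 3306, "pubchemlite")
    },
}
```

Reading the credentials from environment variables instead of hard-coding them keeps them out of version control.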
3. Batch Compound Classification Tool
- Update the default configuration in CECs annotating_agent_v1.0.py:
self.default_api_key = "<DIFY_API_KEY>"
self.default_base_url = "http://<DIFY_HOST>:<PORT>/v1"
self.default_csv_path = "./path_to_your_data.csv"
- Run the program and use the GUI to upload a CSV file and execute batch classification.
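To show how the API key and base URL are typically used, the sketch below builds a request against Dify's chat-messages endpoint. It assumes the Dify application is a chat app called in blocking mode. The script may use a different endpoint (for example a workflow), and the `user` value and function name are placeholders.

```python
import json
import urllib.request


def build_dify_request(base_url: str, api_key: str, compound: str) -> urllib.request.Request:
    """Build (but do not send) a blocking chat-messages request for one compound."""
    body = {
        "inputs": {},
        "query": compound,               # the compound name to classify
        "response_mode": "blocking",     # wait for the full answer in one response
        "user": "cecs-annotating-tool",  # placeholder end-user identifier
    }
    return urllib.request.Request(
        base_url.rstrip("/") + "/chat-messages",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it. The script itself lists `requests` as a dependency, so its real calls likely use `requests.post` instead.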
Example Data
Input File Format
The input CSV file should contain a column with compound names. For example:
IUPAC_name
Methanol
Ethanol
Acetone
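Reading such a file can be sketched with the standard library as shown below. The tool itself uses pandas, so this is only an equivalent illustration. The function name and the assumption that the column is called `IUPAC_name` follow the example above.

```python
import csv
import io
from typing import List, TextIO


def read_compound_names(handle: TextIO, column: str = "IUPAC_name") -> List[str]:
    """Return the non-empty values of `column` from an open CSV file."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise ValueError(f"CSV has no column named {column!r}")
    # Skip rows where the cell is missing or blank.
    return [row[column].strip() for row in reader if row[column] and row[column].strip()]


sample = io.StringIO("IUPAC_name\nMethanol\nEthanol\nAcetone\n")
names = read_compound_names(sample)
```

For the sample above, `names` holds the three compound names in file order.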
Output File Format
The output file will be in CSV format and include the following fields:
- CompoundName: The compound name.
- MainCategory: The main classification category.
- AdditionalCategory1: Subcategory 1.
- AdditionalCategory2: Subcategory 2.
- EndpointName: Expanded endpoint classification.
- XLogP: XLogP value.
- BioPathway: Biological pathway information.
- ToxicityInfo: Toxicity information.
- KnownUse: Known uses of the compound.
- DisorderDisease: Associated disorders or diseases.
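A result file with these columns could be written as in the following sketch. It uses the standard `csv` module rather than pandas, and the function name `write_results` is hypothetical. Missing fields are written as empty cells and unknown keys are ignored.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Column order taken from the field list above.
OUTPUT_FIELDS = [
    "CompoundName", "MainCategory", "AdditionalCategory1", "AdditionalCategory2",
    "EndpointName", "XLogP", "BioPathway", "ToxicityInfo", "KnownUse", "DisorderDisease",
]


def write_results(rows: Iterable[Dict[str, str]], handle: TextIO) -> None:
    """Write classification rows as CSV with a fixed header."""
    writer = csv.DictWriter(
        handle, fieldnames=OUTPUT_FIELDS, restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_results([{"CompoundName": "Methanol"}], buffer)
```

When writing to a real file, open it with `newline=""` so the `csv` module controls line endings.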
Contributors
We welcome contributions! If you are interested in improving this project, feel free to submit pull requests or suggestions.
License
This project is licensed under the CC BY-NC 4.0 license.