---
license: cc-by-nc-4.0
tags:
- agent
- chemistry
- environment
---
## Model Overview
This model is a **Contaminants of Emerging Concern Annotation Intelligent Agent** built on the Dify platform and integrated with the Norman knowledge base, the Pubchemlite_exposomics database, and the Invitrodb_v4.3 database. Given the IUPAC name of a target contaminant, it performs high-throughput, large-scale annotation of emerging contaminants, covering **usage classification** and **toxicity endpoints**.
## Model Purpose
The system builds a specialized knowledge database for the usage classification of emerging contaminants by combining multi-source chemical and toxicological databases with AI agents. The core goals are:
1. Annotate the usage categories of emerging contaminants quickly and at scale.
2. Provide an efficient query service for toxicity endpoints.
3. Support high-throughput data analysis scenarios for emerging contaminants in environmental chemistry and toxicology research.
## Key Definitions
| Term | Definition |
|------|------------|
| **System** | Refers to the *Emerging Contaminants Annotation Intelligent Agent Based on Dify Platform* |
| **User** | Anyone authorized to use the functions of this system |
| **IUPAC Name** | A systematic naming convention formulated by IUPAC for accurately describing the composition and structure of chemical substances |
| **AI Agent** | A system based on large language models (LLMs) that understands user intentions and invokes multiple tools to solve complex tasks; in this system, it accepts IUPAC names of emerging contaminants and outputs usage classification, toxicity endpoints, and AC50 values |
| **Norman Database** | A network for monitoring and evaluating environmental pollutants, facilitating European and international cooperation and data sharing in environmental pollution monitoring; classifies over 100,000 chemicals |
| **Pubchemlite_exposomics Database** | An open-source organic molecule information database derived from PubChem, applicable for mass spectrometry analysis and non-targeted identification of unknown pollutants |
| **Invitrodb_v4.3 Database** | The core database of US EPA ToxCast, storing a large amount of biological activity data, analysis workflows, and metadata of compounds generated by high-throughput screening (HTS) |
## System Architecture & Components
### Core Databases Deployment
The system integrates three core databases with differentiated deployment strategies:
1. **Norman Chemical Classification Database**
- Serves as a relational knowledge base, uploaded and parsed on the FastGPT platform, then embedded into the Dify platform.
- Optimized classification: redundant categories were merged or removed, leaving the chemicals grouped into **9 classes**.
2. **Pubchemlite_exposomics & Invitrodb_v4.3 Databases**
- Deployed in local SQL databases to support efficient local query and invocation.
- Query workflow: GPT-4o generates SQL statements → valid SQL queries are extracted from its output → the backend executes the queries and returns the results.
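The extract-and-execute part of this query workflow can be sketched as follows. This is a minimal illustration, not the system's actual backend: it uses an in-memory SQLite database as a stand-in for the local SQL deployment, and the table `toxcast_ac50`, its columns, and the inserted row are hypothetical placeholders rather than real Invitrodb_v4.3 content. The function names `extract_select` and `run_query` are likewise assumptions.

```python
import re
import sqlite3

FENCE = "`" * 3  # Markdown code fence that LLM replies often wrap SQL in
SQL_BLOCK = re.compile(FENCE + r"(?:sql)?\s*(.*?)" + FENCE, re.DOTALL | re.IGNORECASE)


def extract_select(llm_text: str) -> str:
    """Pull one read-only SELECT statement out of an LLM reply, or raise ValueError."""
    match = SQL_BLOCK.search(llm_text)
    # Fall back to the whole reply when the model did not use a fenced block.
    candidate = (match.group(1) if match else llm_text).strip().rstrip(";").strip()
    # Accept only a single SELECT; anything else (DROP, multiple statements) is rejected.
    if not candidate.lower().startswith("select") or ";" in candidate:
        raise ValueError("expected exactly one SELECT statement")
    return candidate


def run_query(conn: sqlite3.Connection, sql: str) -> list[dict]:
    """Execute the query and return rows as dictionaries keyed by column name."""
    cursor = conn.execute(sql)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# Placeholder schema and row, for illustration only (not real assay data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE toxcast_ac50 (iupac_name TEXT, assay TEXT, ac50_um REAL)")
conn.execute("INSERT INTO toxcast_ac50 VALUES ('example-compound', 'ASSAY_A', 1.0)")

reply = (
    "Here is the query:\n" + FENCE + "sql\n"
    "SELECT assay, ac50_um FROM toxcast_ac50 WHERE iupac_name = 'example-compound';\n" + FENCE
)
print(run_query(conn, extract_select(reply)))
```

Restricting execution to a single `SELECT` is one conservative way to keep model-generated SQL read-only; a production deployment would typically also use a database account without write privileges.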
### Agent Workflow Design (Dify Chatflow)
1. **Base Model**: GPT-4o is used to generate SQL query statements and organize output data in JSON format for subsequent data extraction.
2. **Custom Schema Tool**
- Created on the Dify platform to standardize SQL statement generation and API invocation logic.
- Implementation steps: Create custom tool → Configure tool name and Schema rules (see `schema_tool.txt` for details).
3. **Knowledge Base Integration (FastGPT + Dify)**
- **FastGPT Knowledge Base Construction**
1. Log in to FastGPT (https://fastgpt.aiown.top/) and enter the main interface.
2. Import dataset (50,000+ chemicals with IUPAC names and categories from the Norman database).
3. Connect the dataset to a FastGPT application and configure prompts (consistent with Few-shot prompts).
4. Publish the application and export the API key for subsequent calls.
- **Fast-Dify Adaptor (FDA) Plugin**
- Resolves API incompatibility between FastGPT and Dify.
- Deployment steps: Create `docker-compose.yml` for FDA → Run `docker-compose up -d` in the configuration file directory to deploy the plugin.
- **Dify External Knowledge Base Connection**: Link the trained FastGPT knowledge base to Dify by importing the FastGPT API key and knowledge base ID.
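To illustrate what the external knowledge base connection exchanges, the sketch below builds the kind of retrieval request Dify sends to an external knowledge endpoint. It assumes the request shape of Dify's External Knowledge API as we understand it (a `POST` to `<endpoint>/retrieval` with a Bearer token and a `knowledge_id`, `query`, and `retrieval_setting` body). The endpoint URL, key, knowledge base ID, and the helper name `build_retrieval_request` are placeholders, and the adaptor's actual behaviour should be checked against its own documentation.

```python
import json
import urllib.request


def build_retrieval_request(endpoint: str, api_key: str, knowledge_id: str,
                            query: str, top_k: int = 3,
                            score_threshold: float = 0.5) -> urllib.request.Request:
    """Build (but do not send) a retrieval request for an external knowledge base."""
    body = {
        "knowledge_id": knowledge_id,  # ID of the FastGPT knowledge base
        "query": query,                # e.g. the IUPAC name to classify
        "retrieval_setting": {"top_k": top_k, "score_threshold": score_threshold},
    }
    return urllib.request.Request(
        url=endpoint.rstrip("/") + "/retrieval",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # the exported FastGPT API key
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values: replace with the adaptor URL, API key, and knowledge base ID.
request = build_retrieval_request(
    "http://localhost:5000", "YOUR_FASTGPT_API_KEY", "YOUR_KNOWLEDGE_BASE_ID", "propan-2-one"
)
print(request.full_url)
print(request.data.decode("utf-8"))
```

Sending the request with `urllib.request.urlopen(request)` against the deployed adaptor is a quick way to confirm that the API key and knowledge base ID are accepted before wiring the knowledge base into the chatflow.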