---
title: PFAS AI Analyzer
emoji: 🧪
colorFrom: red
colorTo: indigo
sdk: docker
app_port: 7860
tags:
  - chemistry
  - pfas
  - bert
  - toxicology
  - streamlit
pinned: false
short_description: AI-powered PFAS detection and risk assessment pipeline.
---

# 🧪 PFAS AI Analyzer (BERT Enhanced)

This application is an end-to-end AI pipeline that identifies and classifies **Per- and Polyfluoroalkyl Substances (PFAS)** and assesses their environmental risks. It uses a fine-tuned **BERT (Bidirectional Encoder Representations from Transformers)** model to generate molecular embeddings, which are then passed to Random Forest models for classification and property prediction.

## 🚀 Key Features

1. **Advanced PFAS Detection:** Uses OECD-aligned "Chain Rule" logic to distinguish industrial PFAS from other fluorinated compounds, such as the pharmaceutical fluoxetine (Prozac) and the pesticide fipronil.
2. **Subclass Classification:** Automatically categorizes molecules as PFCA, PFSA, or General PFAS.
3. **Risk Assessment:** Predicts key environmental properties:
    * **Persistence:** Estimated half-life / biodegradation potential.
    * **Mobility:** Soil adsorption coefficient ($K_{oc}$).
    * **Bioaccumulation:** Bioconcentration factor (BCF) / LogP.
4. **BERT Embeddings:** Uses a transformer model trained on ChEMBL data to capture molecular features beyond simple fingerprints.

## 🧠 How It Works

1. **Input:** The user provides a SMILES (Simplified Molecular Input Line Entry System) string.
2. **Tokenization:** The SMILES string is tokenized with a specialized `Spe_Tokenizer`.
3. **Embedding:** The **SMILE-to-BERT** model converts the tokens into a 113-dimensional dense vector representation.
4. **Inference:**
    * A **Random Forest Classifier** determines the PFAS subclass.
    * **Random Forest Regressors** predict the environmental properties.
5. **Validation:** A rule-based sanity checker applies chemical structure rules to prevent false positives.

## 📂 File Structure

* `src/app.py`: Main Streamlit application.
* `src/pfas_assets.zip`: Contains the BERT model weights and tokenizer data.
* `src/*.pkl`: Trained scikit-learn models.
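
The layout above suggests that the app unpacks `pfas_assets.zip` and loads the pickled models at startup. A minimal sketch of that loading step, using only the Python standard library, might look like the following. The function names `extract_assets` and `load_pickled_models` and the idea of a separate target directory are hypothetical and not taken from `src/app.py`; unpickling the real models would also require scikit-learn to be installed.

```python
import pickle
import zipfile
from pathlib import Path


def extract_assets(zip_path, target_dir):
    """Unpack the asset archive (if not already unpacked) and return its member names.

    Hypothetical helper: the real app may unpack its assets differently.
    """
    zip_path, target_dir = Path(zip_path), Path(target_dir)
    with zipfile.ZipFile(zip_path) as archive:
        names = archive.namelist()
        # Skip extraction when the target directory already exists,
        # so repeated app restarts do not redo the work.
        if not target_dir.exists():
            archive.extractall(target_dir)
    return names


def load_pickled_models(src_dir):
    """Load every *.pkl file in src_dir into a dict keyed by file stem.

    Hypothetical helper: assumes each pickle holds one trained model.
    """
    models = {}
    for pkl_path in sorted(Path(src_dir).glob("*.pkl")):
        with pkl_path.open("rb") as handle:
            # Only unpickle files from a trusted source:
            # pickle can execute arbitrary code while loading.
            models[pkl_path.stem] = pickle.load(handle)
    return models
```

Under these assumptions, `extract_assets("src/pfas_assets.zip", "src/assets")` followed by `load_pickled_models("src")` would give the app a dict of models keyed by file name without the `.pkl` suffix. In a Streamlit app, wrapping such loaders in `st.cache_resource` is a common way to avoid reloading them on every rerun.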