---
title: Algae Yield Predictor
emoji: 🌱
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.46.1
app_file: app.py
pinned: false
license: cc-by-nc-nd-4.0
---
# 🌱 Algae Yield Predictor
This Space provides an interactive interface to **predict algal biomass, lipid, protein, and carbohydrate yields** under different culture conditions.
It uses augmented datasets (200k synthetic rows per target) combined with ensemble ML models (CatBoost, XGBoost, LightGBM, ExtraTrees) and a meta-stacking approach.
---
## ✨ Features
- **Targets:** biomass, lipid, protein, carbohydrate
- **Species–Media aware:** dropdowns restrict valid species–medium combinations
- **Curated suggestions:** shows recommended conditions for each species/target
- **Uncertainty estimates:** KNN-based local intervals (10–90%) from augmented data
- **Response plots:** sweep one variable (light, days, pH, etc.) and visualize prediction curve + uncertainty band
- **DOI references:** retrieves closest experimental setups from `doi.csv` (if provided)
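The KNN-based interval can be illustrated with a minimal sketch. It assumes the interval is taken as the 10th and 90th percentiles of the target values among the nearest augmented rows; the function name `knn_interval`, the choice of Euclidean distance, and the toy data are all hypothetical and not taken from `app.py`.

```python
import math

def knn_interval(query, rows, targets, k=5, lo=0.10, hi=0.90):
    """Return (low, high) quantiles of the targets of the k rows nearest to query."""
    # Euclidean distance from the query to every row.
    # Assumption: features are already on comparable scales.
    dists = [math.dist(query, row) for row in rows]
    nearest = sorted(range(len(rows)), key=dists.__getitem__)[:k]
    ys = sorted(targets[i] for i in nearest)

    def quantile(q):
        # Linear interpolation between neighbouring order statistics.
        pos = q * (len(ys) - 1)
        i = int(math.floor(pos))
        j = min(i + 1, len(ys) - 1)
        return ys[i] + (ys[j] - ys[i]) * (pos - i)

    return quantile(lo), quantile(hi)

# Toy data: (light intensity, temperature) -> yield. Values are made up.
rows = [(100.0, 25.0), (110.0, 25.0), (120.0, 26.0), (400.0, 35.0), (90.0, 24.0)]
targets = [1.0, 1.2, 1.4, 5.0, 0.8]
low, high = knn_interval((105.0, 25.0), rows, targets, k=4)
print(f"10-90% local interval: {low:.2f} to {high:.2f}")
```

The distant row (400, 35) is excluded from the neighbourhood, so the band reflects only conditions similar to the query.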
---
## 🚀 How to Use
1. Select a **target** (biomass, lipid, protein, or carbohydrate).
2. Choose **species** and valid **growth medium**.
3. Adjust culture conditions:
   - Light intensity
   - Day/Night exposure
   - Temperature
   - pH
   - Days of culture
4. Click **Predict + Plot** to:
   - Get yield prediction with uncertainty band
   - See response curve for chosen variable
5. Optionally click **Find Closest DOI Matches** to explore related literature.
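The response curve in step 4 amounts to re-running the prediction while one condition varies and the others stay fixed. A rough sketch follows, with `sweep`, `toy_predict`, and the dictionary keys as hypothetical names; the real app presumably calls its trained ensemble instead of the toy function.

```python
def sweep(predict, base, variable, values):
    """Vary one condition, hold the rest fixed, and return (value, prediction) pairs."""
    curve = []
    for v in values:
        conditions = dict(base, **{variable: v})  # copy with one field overridden
        curve.append((v, predict(conditions)))
    return curve

def toy_predict(c):
    # Hypothetical stand-in for the trained model, for illustration only.
    return 0.01 * c["light"] + 0.1 * c["days"]

base = {"light": 100.0, "temperature": 25.0, "ph": 7.0, "days": 10.0}
curve = sweep(toy_predict, base, "light", [50.0, 100.0, 150.0])
for value, prediction in curve:
    print(value, round(prediction, 3))
```

Copying `base` on each step keeps the user's selected conditions unchanged, so the same baseline can be reused to sweep a different variable.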
---
## 🧩 Models & Data
- **Training data:** real experimental CSV (`ai_al.csv`) + augmented synthetic sets (20k/200k).
- **Models:** CatBoost, XGBoost, LightGBM, ExtraTrees → stacked with RidgeCV.
- **Uncertainty:** derived from nearest neighbors in augmented dataset.
If `doi.csv` is provided with experimental metadata and DOI links, the app displays the closest literature matches.
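One plausible way to find the closest matches is a nearest-row lookup over the numeric culture conditions. The sketch below is an assumption rather than the app's actual logic: the column names, the range-based scaling, and the function `closest_dois` are hypothetical, and the DOIs are placeholders, not real references.

```python
import csv
import io
import math

# Inline stand-in for doi.csv; column names and DOIs are placeholders.
SAMPLE = """light,temperature,ph,days,doi
100,25,7.0,10,10.0000/placeholder-a
300,30,8.0,14,10.0000/placeholder-b
120,24,7.2,12,10.0000/placeholder-c
"""

FEATURES = ["light", "temperature", "ph", "days"]

def closest_dois(query, csv_text, n=2):
    """Return the DOIs of the n rows whose conditions are nearest to query."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Scale each feature by its range so that light intensity does not dominate.
    spans = {}
    for f in FEATURES:
        vals = [float(r[f]) for r in rows]
        spans[f] = (max(vals) - min(vals)) or 1.0

    def distance(r):
        return math.sqrt(sum(((float(r[f]) - query[f]) / spans[f]) ** 2 for f in FEATURES))

    return [r["doi"] for r in sorted(rows, key=distance)[:n]]

query = {"light": 110, "temperature": 25, "ph": 7.0, "days": 10}
matches = closest_dois(query, SAMPLE)
print(matches)
```

To use a real file, read its text with `open("doi.csv").read()` and pass that in place of `SAMPLE`, adjusting `FEATURES` to the columns actually present.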
---
## 📂 File Structure