---
title: Algae Yield Predictor
emoji: 🌱
colorFrom: green
colorTo: blue
sdk: gradio
sdk_version: 5.46.1
app_file: app.py
pinned: false
license: cc-by-nc-nd-4.0
---

# 🌱 Algae Yield Predictor

This Space provides an interactive interface to **predict algal biomass, lipid, protein, and carbohydrate yields** under different culture conditions.  
The models are trained on augmented datasets (200k synthetic rows per target) and combine four ensemble learners (CatBoost, XGBoost, LightGBM, ExtraTrees) through a meta-stacking approach.

---

## ✨ Features
- **Targets:** biomass, lipid, protein, carbohydrate  
- **Species–Media aware:** dropdowns restrict valid species–medium combinations  
- **Curated suggestions:** shows recommended conditions for each species/target  
- **Uncertainty estimates:** KNN-based local intervals (10th–90th percentile) from the augmented data  
- **Response plots:** sweep one variable (light, days, pH, etc.) and visualize the prediction curve with its uncertainty band  
- **DOI references:** retrieves closest experimental setups from `doi.csv` (if provided)

---

## 🚀 How to Use
1. Select a **target** (biomass, lipid, protein, carbohydrate).  
2. Choose a **species** and a valid **growth medium**.  
3. Adjust culture conditions:
   - Light intensity
   - Day/Night exposure
   - Temperature
   - pH
   - Days of culture  
4. Click **Predict + Plot** to:
   - Get yield prediction with uncertainty band  
   - See response curve for chosen variable  
5. Optionally click **Find Closest DOI Matches** to explore related literature.
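
The response curve in step 4 can be pictured as a one-variable sweep: every condition is held at its chosen value except one, which is stepped across a range while the model is queried at each point. The sketch below is illustrative only. The `sweep` helper, the `toy_predict` stand-in, and the condition keys (`light`, `temperature`, `ph`, `days`) are assumptions, not the names used in `app.py`.

```python
def sweep(predict, base, variable, start, stop, steps):
    """Evaluate `predict` at evenly spaced values of one variable.

    `base` holds the fixed culture conditions; only `variable` changes.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    curve = []
    for i in range(steps):
        value = start + (stop - start) * i / (steps - 1)
        # Copy the base conditions so the caller's dict is not mutated.
        conditions = dict(base, **{variable: value})
        curve.append((value, predict(conditions)))
    return curve


def toy_predict(conditions):
    # Hypothetical stand-in for the trained stacked model.
    return 0.01 * conditions["light"] + 0.1 * conditions["days"]


base = {"light": 100.0, "temperature": 25.0, "ph": 7.0, "days": 10.0}
curve = sweep(toy_predict, base, "light", 50.0, 250.0, 5)
for value, prediction in curve:
    print(f"light={value:.0f} -> predicted yield {prediction:.2f}")
```

Plotting the `(value, prediction)` pairs gives the curve; the uncertainty band would be obtained by computing an interval at each swept point in the same loop.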

---

## 🧩 Models & Data
- **Training data:** real experimental CSV (`ai_al.csv`) + augmented synthetic sets (20k/200k).  
- **Models:** CatBoost, XGBoost, LightGBM, ExtraTrees → stacked with RidgeCV.  
- **Uncertainty:** derived from nearest neighbors in augmented dataset.  
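
As a rough illustration of the nearest-neighbor uncertainty idea, the sketch below takes the `k` augmented rows closest to a query point and reports the 10th and 90th percentiles of their target values. It is a minimal sketch under stated assumptions: the function name `knn_interval`, the value of `k`, the Euclidean metric, and the assumption that features are already scaled are all illustrative and may differ from what the app actually does.

```python
import math


def knn_interval(query, rows, targets, k=5, lo=0.10, hi=0.90):
    """Return the (lo, hi) quantiles of the targets of the k rows nearest to `query`.

    Assumes `rows` and `query` are already on comparable (scaled) units.
    """
    # Rank row indices by Euclidean distance to the query point.
    order = sorted(range(len(rows)), key=lambda i: math.dist(query, rows[i]))
    neighbours = sorted(targets[i] for i in order[:k])

    def quantile(values, q):
        # Linear interpolation between the two closest ranks.
        pos = q * (len(values) - 1)
        below = int(math.floor(pos))
        above = min(below + 1, len(values) - 1)
        return values[below] + (pos - below) * (values[above] - values[below])

    return quantile(neighbours, lo), quantile(neighbours, hi)


# Toy data standing in for the augmented dataset (values are made up).
rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0], [6.0, 6.0]]
targets = [1.0, 2.0, 3.0, 10.0, 12.0]
low, high = knn_interval([0.0, 0.0], rows, targets, k=3)
print(f"local 10-90% interval: [{low:.2f}, {high:.2f}]")
```

Because the interval comes only from nearby rows, it widens where the augmented data disagree locally and narrows where neighbors have similar yields.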

If `doi.csv` is provided with experimental metadata + DOI links, the app will display closest literature matches.
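
One plausible way to find the closest literature matches is to rank the rows of `doi.csv` by distance to the selected culture conditions. The sketch below assumes a layout that the README does not specify: the column names (`doi`, `light`, `temperature`, `ph`, `days`), the range-based scaling, and the `closest_dois` helper are hypothetical, and the DOIs in the sample are placeholders rather than real references.

```python
import csv
import io
import math

# Hypothetical numeric columns; the real doi.csv may use different names.
FEATURES = ["light", "temperature", "ph", "days"]

# Placeholder rows standing in for doi.csv (not real publications).
SAMPLE = """doi,light,temperature,ph,days
10.0000/example-a,100,25,7.0,10
10.0000/example-b,200,30,8.0,14
10.0000/example-c,150,25,7.5,12
"""


def closest_dois(query, rows, top_n=3):
    """Rank rows by range-scaled Euclidean distance to `query`."""
    # Scale each feature by its range so no single unit dominates the distance.
    spans = {}
    for feature in FEATURES:
        values = [float(row[feature]) for row in rows]
        spans[feature] = (max(values) - min(values)) or 1.0

    def distance(row):
        return math.sqrt(
            sum(((float(row[f]) - query[f]) / spans[f]) ** 2 for f in FEATURES)
        )

    ranked = sorted(rows, key=distance)
    return [(row["doi"], distance(row)) for row in ranked[:top_n]]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
query = {"light": 110.0, "temperature": 25.0, "ph": 7.0, "days": 10.0}
matches = closest_dois(query, rows)
for doi, dist in matches:
    print(f"{doi}  distance={dist:.3f}")
```

Swapping `io.StringIO(SAMPLE)` for an opened `doi.csv` file would apply the same ranking to real metadata, provided the column names are adjusted to match.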

---

## 📂 File Structure