---
license: mit
language: en
tags:
- nutrition
- healthcare
- elderly-care
- regression
- xgboost
- uganda
- africa
datasets:
- uganda-elderly-nutrition
- Shakiran/UgandanNutritionMealPlanning
- dongx1997/NutriBench
metrics:
- r2
- mae
- rmse
library_name: xgboost
pipeline_tag: tabular-regression
---
# XGBoost Model for Elderly Nutrition Planning in Uganda
## Model Description
This XGBoost regression model predicts daily caloric needs for elderly individuals (aged 60+) in Uganda based on nutritional content, health conditions, regional factors, and demographic information. The model is designed to support nutrition planning, meal preparation, and healthcare decision-making for elderly care in Uganda.
### Model Details
- **Model Type:** XGBoost Regressor (Gradient Boosting)
- **Task:** Tabular Regression
- **Version:** v1.0_optimized
- **Training Date:** November 3, 2025
- **Framework:** XGBoost 2.0+
- **Language:** Python
- **License:** Apache 2.0
### Developed By
- **Organization:** Graph-Enhanced LLMs for Locally-Sourced Elderly Nutrition Planning Project
- **Project Focus:** AI-driven nutrition planning for elderly populations in Uganda
- **Contact:** shakirannannyombi@gmail.com
---
## Intended Use
### Primary Use Cases
1. **Nutrition Planning:** Calculate appropriate caloric intake for elderly individuals based on their health profile
2. **Meal Planning:** Support caregivers and healthcare providers in designing meal plans
3. **Healthcare Decision Support:** Assist medical professionals in nutritional assessments
4. **Research:** Enable studies on nutrition needs for elderly populations in Uganda
5. **Policy Development:** Inform nutrition policies for elderly care facilities
### Intended Users
- Healthcare providers and nutritionists
- Elderly care facilities and nursing homes
- Family caregivers
- Public health researchers
- NGOs working in elderly nutrition
### Out-of-Scope Use
- ❌ Not for children or adults under 60 years
- ❌ Not for acute medical conditions requiring immediate intervention
- ❌ Not a replacement for professional medical advice
- ❌ Not validated for use outside Uganda without regional calibration
---
## Performance
### Overall Metrics
| Metric | Training Set | Test Set |
|--------|-------------|----------|
| **R² Score** | 0.9309 | **0.6710** |
| **MAE (kcal/day)** | 1.29 | **2.84** |
| **RMSE (kcal/day)** | 1.65 | **3.60** |
| **Training Time** | 25.0 seconds | - |
### Model Ranking
Compared across 5 models, including this one (HistGradientBoosting, XGBoost, LightGBM, MLP, GNN):
- **Overall Rank:** 🥇 #1 out of 5
- **R² Rank:** 🥇 #1 (0.6710)
- **MAE Rank:** 🥇 #1 (2.84 kcal/day)
- **RMSE Rank:** 🥇 #1 (3.60 kcal/day)
### Baseline Comparison
| Metric | Baseline Model | This Model | Improvement |
|--------|---------------|------------|-------------|
| Test R² | 0.6311 | 0.6710 | **+6.3%** |
| Test MAE | 2.998 kcal/day | 2.842 kcal/day | **-5.2%** |
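
The Improvement column is the relative change from the baseline, (new - old) / old. This short sketch reproduces both percentages from the values in the table:

```python
def relative_change(baseline: float, new: float) -> float:
    """Relative change from baseline to new, as a fraction (0.05 == +5%)."""
    return (new - baseline) / baseline

# Values taken from the baseline comparison table
r2_change = relative_change(0.6311, 0.6710)
mae_change = relative_change(2.998, 2.842)
print(f"Test R²: {r2_change:+.1%}")    # +6.3%
print(f"Test MAE: {mae_change:+.1%}")  # -5.2%
```

A negative change is an improvement for MAE (lower error) and a positive change is an improvement for R².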
### Performance Characteristics
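- **Reasonable generalization:** test R² = 0.67, so the model explains about two-thirds of the variance in held-out data
- **Lowest error among the compared models:** test MAE of 2.84 kcal/day, 5.2% below the baseline
- **Moderate overfitting:** train-test R² gap of 0.26 (0.9309 vs. 0.6710); stronger regularization may narrow it
- **Few large errors:** RMSE/MAE = 3.60/2.84 ≈ 1.27, close to the ≈1.25 expected for normally distributed errors, so there is little sign of heavy-tailed outliers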
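
For reference, the three reported metrics can be computed as below. This is a minimal pure-Python sketch (scikit-learn's `r2_score`, `mean_absolute_error`, and `mean_squared_error` give the same values); the toy arrays are illustrative, not project data.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

# Toy example (hypothetical values)
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(mae(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))  # 0.25 0.5 0.8

# RMSE/MAE ratio for the reported test metrics vs. the Gaussian reference sqrt(pi/2)
print(round(3.60 / 2.84, 2), round(math.sqrt(math.pi / 2), 2))  # 1.27 1.25
```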
---
## Training Data
### Dataset Overview
- **Dataset Name:** Uganda Elderly Nutrition Dataset (Enriched)
- **Total Samples:** 1,000
- **Training Samples:** 700 (70%)
- **Test Samples:** 300 (30%)
- **Split Method:** Random stratified split (seed=42)
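
The card does not say which column the split was stratified on. As a hypothetical sketch, assuming stratification on a categorical column such as the health condition, a 70/30 stratified split samples the same fraction from every stratum. It is written in pure Python for clarity and will not reproduce the exact rows chosen by scikit-learn's `train_test_split(..., test_size=0.3, stratify=..., random_state=42)`.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_size=0.3, seed=42):
    """Return (train_idx, test_idx), sampling about test_size of each stratum.

    `labels` holds one stratum label per row (hypothetical, e.g. health condition).
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for i, label in enumerate(labels):
        by_stratum[label].append(i)

    train_idx, test_idx = [], []
    for indices in by_stratum.values():
        rng.shuffle(indices)
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)

# 1,000 rows spread over 8 hypothetical condition labels
labels = [i % 8 for i in range(1000)]
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Because each stratum is rounded separately, the split sizes can differ slightly from an exact 700/300.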
### Features (18 total)
#### Nutritional Content (12 features)
- `Energy_kcal_per_serving` - Energy content per serving
- `Protein_g_per_serving` - Protein content (grams)
- `Fat_g_per_serving` - Fat content (grams)
- `Carbohydrates_g_per_serving` - Carbohydrate content (grams)
- `Fiber_g_per_serving` - Dietary fiber (grams)
- `Calcium_mg_per_serving` - Calcium content (milligrams)
- `Iron_mg_per_serving` - Iron content (milligrams)
- `Zinc_mg_per_serving` - Zinc content (milligrams)
- `VitaminA_µg_per_serving` - Vitamin A content (micrograms)
- `VitaminC_mg_per_serving` - Vitamin C content (milligrams)
- `Potassium_mg_per_serving` - Potassium content (milligrams)
- `Magnesium_mg_per_serving` - Magnesium content (milligrams)
#### Categorical Features (4 features)
- `region_encoded` - Geographic region in Uganda (4 regions)
- `condition_encoded` - Health condition (8 conditions)
- `age_group_encoded` - Age group (3 groups: 60-70, 70-80, 80+)
- `season_encoded` - Seasonal availability
#### Other Features (2 features)
- `portion_size_g` - Portion size in grams
- `estimated_cost_ugx` - Estimated cost in Ugandan Shillings
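
Every prediction needs all 18 columns. A small helper like the one below (hypothetical, not part of the model artifacts) can catch missing inputs before the model is called. The list follows the order shown above; the order the model actually expects is stored in `xgboost_feature_names_20251103.pkl`, so reorder against that file at prediction time.

```python
FEATURES = [
    # Nutritional content (12)
    "Energy_kcal_per_serving", "Protein_g_per_serving", "Fat_g_per_serving",
    "Carbohydrates_g_per_serving", "Fiber_g_per_serving", "Calcium_mg_per_serving",
    "Iron_mg_per_serving", "Zinc_mg_per_serving", "VitaminA_µg_per_serving",
    "VitaminC_mg_per_serving", "Potassium_mg_per_serving", "Magnesium_mg_per_serving",
    # Categorical, label-encoded (4)
    "region_encoded", "condition_encoded", "age_group_encoded", "season_encoded",
    # Other (2)
    "portion_size_g", "estimated_cost_ugx",
]

def missing_features(record: dict) -> list:
    """Return the required feature names absent from `record`, in FEATURES order."""
    return [name for name in FEATURES if name not in record]

record = {"Energy_kcal_per_serving": 350, "Protein_g_per_serving": 15}
print(len(FEATURES), len(missing_features(record)))  # 18 16
```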
### Geographic Coverage
**4 Regions of Uganda:**
1. Central Uganda (Buganda)
2. Western Uganda (Ankole, Tooro, Kigezi, Bunyoro)
3. Eastern Uganda (Busoga, Bugisu, Teso)
4. Northern Uganda (Acholi, Lango, Karamoja, West Nile)
### Health Conditions Covered
**8 Common Elderly Conditions:**
1. Hypertension
2. Undernutrition
3. Anemia
4. Frailty
5. Digestive issues
6. Arthritis
7. Osteoporosis
8. Diabetes
### Age Groups
- **60-70 years:** Early elderly
- **70-80 years:** Mid elderly
- **80+ years:** Advanced elderly
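
The group labels overlap at 70 and 80, so the card leaves boundary handling open. The sketch below assumes half-open bins (60 ≤ age < 70, 70 ≤ age < 80, age ≥ 80). The label strings are assumptions, and the integer passed as `age_group_encoded` should come from the shipped `xgboost_label_encoders_20251103.pkl` rather than being hard-coded.

```python
def age_to_group(age: float) -> str:
    """Map an age in years to an assumed age-group label (half-open bins)."""
    if age < 60:
        raise ValueError("Model is intended only for people aged 60 and over")
    if age < 70:
        return "60-70"
    if age < 80:
        return "70-80"
    return "80+"

print([age_to_group(a) for a in (60, 69.9, 70, 85)])  # ['60-70', '60-70', '70-80', '80+']
```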
### Target Variable
- **Name:** Daily Caloric Needs
- **Unit:** kcal/day
- **Range:** Typically 1,400 - 2,500 kcal/day
- **Distribution:** Approximately normal
---
## Training Details
### Hyperparameters (Optimized)
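```python
from xgboost import XGBRegressor

params = {
    'n_estimators': 200,
    'max_depth': 4,
    'learning_rate': 0.05,
    'min_child_weight': 5,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'gamma': 0,
    'reg_alpha': 0,
    'reg_lambda': 1.5,
}

# Squared-error objective and seed 42, as listed under Training Configuration/Environment
model = XGBRegressor(objective='reg:squarederror', random_state=42, **params)
```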
### Training Configuration
- **Objective:** Regression (minimize squared error)
- **Evaluation Metric:** R² Score, MAE, RMSE
- **Validation Strategy:** 70-30 train-test split
- **Early Stopping:** Not used (200 trees)
- **Feature Scaling:** StandardScaler applied to numeric features
- **Encoding:** Label encoding for categorical features
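
To make the preprocessing concrete: StandardScaler maps each numeric value to z = (x - mean) / std using the training-set mean and population standard deviation, and scikit-learn's LabelEncoder assigns integers to the sorted unique labels. The pure-Python sketch below mirrors both behaviours for illustration only; at prediction time, use the fitted objects shipped in the scaler and label-encoder `.pkl` files. Because codes follow alphabetical order, the integer for a category depends on the exact label strings used in training, which this card does not list; the region strings in the example are hypothetical.

```python
import statistics

def standardize(values):
    """Z-score like StandardScaler: (x - mean) / population std (ddof=0)."""
    mean = statistics.fmean(values)
    # StandardScaler uses a scale of 1 for zero-variance columns; do the same here
    std = statistics.pstdev(values) or 1.0
    return [(v - mean) / std for v in values]

def label_encode(labels):
    """Like LabelEncoder: classes are the sorted unique labels, codes are their indices."""
    classes = sorted(set(labels))
    index = {c: i for i, c in enumerate(classes)}
    return [index[label] for label in labels], classes

print(standardize([1.0, 2.0, 3.0]))
# Hypothetical region label strings
codes, classes = label_encode(["Western", "Central", "Northern", "Eastern", "Central"])
print(classes)  # ['Central', 'Eastern', 'Northern', 'Western']
print(codes)    # [3, 0, 2, 1, 0]
```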
### Training Environment
- **Hardware:** CPU-based training
- **Training Time:** 25 seconds
- **Memory Usage:** <1 GB
- **Reproducibility:** Random seed = 42
---
## How to Use
### Installation
```bash
pip install xgboost==2.0.0 pandas numpy scikit-learn
```
### Loading the Model
```python
import pickle
import pandas as pd

# Load model artifacts. xgboost and scikit-learn must be installed to unpickle them.
# pickle can execute arbitrary code: only load files from a source you trust.
with open('xgboost_nutrition_model_20251103.pkl', 'rb') as f:
    model = pickle.load(f)
with open('xgboost_scaler_20251103.pkl', 'rb') as f:
    scaler = pickle.load(f)          # fitted StandardScaler
with open('xgboost_label_encoders_20251103.pkl', 'rb') as f:
    label_encoders = pickle.load(f)  # fitted categorical encoders
with open('xgboost_feature_names_20251103.pkl', 'rb') as f:
    feature_names = pickle.load(f)   # column order expected by the model
```
### Making Predictions
```python
# Example input data
input_data = {
    'Energy_kcal_per_serving': 350,
    'Protein_g_per_serving': 15,
    'Fat_g_per_serving': 10,
    'Carbohydrates_g_per_serving': 45,
    'Fiber_g_per_serving': 5,
    'Calcium_mg_per_serving': 200,
    'Iron_mg_per_serving': 3,
    'Zinc_mg_per_serving': 2,
    'VitaminA_µg_per_serving': 500,
    'VitaminC_mg_per_serving': 20,
    'Potassium_mg_per_serving': 400,
    'Magnesium_mg_per_serving': 50,
    'region_encoded': 0,     # Central Uganda (verify against label_encoders)
    'condition_encoded': 0,  # Hypertension (verify against label_encoders)
    'age_group_encoded': 1,  # 70-80 (verify against label_encoders)
    'season_encoded': 0,
    'portion_size_g': 250,
    'estimated_cost_ugx': 5000,
}

# Build a one-row DataFrame in the column order used during training
df = pd.DataFrame([input_data])[feature_names]

# Apply the same scaling as in training (StandardScaler on numeric features).
# A scaler fit on a DataFrame records its columns in `feature_names_in_`.
scaled_cols = list(getattr(scaler, 'feature_names_in_', []))
if scaled_cols:
    df[scaled_cols] = scaler.transform(df[scaled_cols])
else:
    # Column list not recorded: determine which columns the scaler was fit on first
    raise ValueError("Cannot determine which columns the scaler expects")

# Make prediction
predicted_calories = model.predict(df)
print(f"Predicted daily caloric needs: {predicted_calories[0]:.2f} kcal/day")
```
### Using with the API
```python
import requests

url = "http://your-api-endpoint/predict"  # placeholder: replace with your deployment URL
data = {
    "data": {
        "Energy_kcal_per_serving": 350,
        "Protein_g_per_serving": 15,
        # ... other features
    }
}
response = requests.post(url, json=data, timeout=30)
response.raise_for_status()  # fail loudly on HTTP errors
result = response.json()
print(f"Predicted calories: {result['prediction']['caloric_needs']:.2f} kcal/day")
```
---
## Limitations and Biases
### Known Limitations
1. **Sample Size:**
- Only 1,000 samples (700 used for training), which may not capture all population variability
- Use caution when making predictions for rare scenarios
2. **Geographic Scope:**
- Trained specifically on Ugandan population data
- May not generalize well to other African countries or regions
3. **Moderate Overfitting:**
- Train-test R² gap of 0.26 indicates some overfitting
- Predictions should be validated against clinical guidelines
4. **Feature Dependencies:**
- Requires accurate nutritional content data
- Missing or incorrect features will degrade performance
5. **Temporal Validity:**
- Trained on 2025 data
- May need retraining as dietary patterns evolve
### Potential Biases
1. **Regional Representation:**
- May have unequal representation across regions
- Ensure validation across all 4 regions
2. **Health Condition Bias:**
- Some conditions may be over/under-represented
- Validate for less common conditions
3. **Socioeconomic Factors:**
- Cost estimates may not reflect all economic situations
- Consider local affordability in deployment
### Uncertainty Quantification
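- **Typical error:** MAE of 2.84 kcal/day on the test set (an average error, not a bound)
- **Approximate 95% prediction interval:** ±1.96 × RMSE ≈ ±7.1 kcal/day, assuming roughly normal errors with constant variance
- **Recommended Buffer:** Add a 10% safety margin for meal planning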
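
Both a Gaussian interval and a distribution-free alternative can be computed from quantities already available. The sketch below is an illustration, not part of the released artifacts: `gaussian_half_width` assumes normally distributed errors, while `empirical_half_width` takes a quantile of absolute residuals from a held-out set (a split-conformal-style estimate) and so avoids the normality assumption. The residuals in the example are hypothetical.

```python
import math
from statistics import NormalDist

def gaussian_half_width(rmse: float, level: float = 0.95) -> float:
    """Half-width of a symmetric prediction interval assuming N(0, rmse^2) errors."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return z * rmse

def empirical_half_width(residuals, level: float = 0.95) -> float:
    """The ceil((n + 1) * level)-th smallest |residual|, clamped to the largest when n is small."""
    abs_res = sorted(abs(r) for r in residuals)
    k = min(len(abs_res), math.ceil((len(abs_res) + 1) * level))
    return abs_res[k - 1]

print(round(gaussian_half_width(3.60), 1))  # 7.1
print(empirical_half_width([-1.0, 2.0, -3.0, 4.0], level=0.5))  # 3.0
```

With the real test-set residuals, the empirical half-width is the safer choice if the errors turn out to be skewed.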
---
## Ethical Considerations
### Fairness and Equity
- Model covers all major regions of Uganda
- Includes diverse health conditions
- Considers affordability factors
- ⚠️ Ensure equal access to technology for model deployment
### Privacy
- Model trained on aggregated data (no personal identifiers)
- Predictions do not require storage of sensitive health information
- ⚠️ Implement proper data handling in deployment
### Safety
- ⚠️ **Critical:** Model outputs should be reviewed by qualified healthcare professionals
- ⚠️ Not suitable for emergency nutritional interventions
- ⚠️ Should complement, not replace, clinical judgment
### Transparency
- Open methodology and evaluation metrics
- Feature importance available for interpretation
- Model architecture and hyperparameters disclosed
---
## Model Interpretability
### Feature Importance (Top 10)
Based on XGBoost's built-in feature importance:
1. **Energy_kcal_per_serving** - Highest importance
2. **Protein_g_per_serving** - High importance
3. **Carbohydrates_g_per_serving** - High importance
4. **age_group_encoded** - Moderate importance
5. **condition_encoded** - Moderate importance
6. **portion_size_g** - Moderate importance
7. **Calcium_mg_per_serving** - Moderate importance
8. **Fat_g_per_serving** - Low-moderate importance
9. **region_encoded** - Low-moderate importance
10. **Fiber_g_per_serving** - Low importance
*Full feature importance analysis available in model artifacts*
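
To produce a ranking like the one above from the loaded artifacts, pair the model's `feature_importances_` array (exposed by XGBoost's scikit-learn wrapper, in training column order) with `feature_names` and sort. The values depend on the model's `importance_type` setting. The helper below is a plain-Python sketch; the toy importances in the example are made up.

```python
def rank_importances(names, importances, top=10):
    """Return (name, importance) pairs sorted from most to least important."""
    pairs = sorted(zip(names, importances), key=lambda p: p[1], reverse=True)
    return pairs[:top]

# With the real artifacts: rank_importances(feature_names, model.feature_importances_)
toy = rank_importances(["a", "b", "c"], [0.2, 0.5, 0.3], top=2)
print(toy)  # [('b', 0.5), ('c', 0.3)]
```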
### Explainability
- **SHAP Values:** Can be computed for individual predictions
- **Partial Dependence Plots:** Available for key features
- **Decision Rules:** XGBoost trees can be exported for inspection
---
## Comparison with Other Models
| Model | Test R² | Test MAE | Training Time | Rank |
|-------|---------|----------|---------------|------|
| **XGBoost (This Model)** | **0.6710** | **2.84** | 25.0s | 🥇 #1 |
| LightGBM | 0.6649 | 2.88 | 0.93s | 🥈 #2 |
| HistGradientBoosting | 0.5116 | 3.42 | 0.14s | 🥉 #3 |
| GNN v2 | 0.5100 | 3.42 | 5.2s | #4 |
| MLP | -0.3035 | 5.66 | 4.5s | #5 |
**Recommendation:** Use XGBoost for best accuracy. Consider LightGBM when training time matters: it trained in 0.93 s versus 25.0 s, with only a small drop in test R² (0.6649 vs. 0.6710).
---
## Updates and Maintenance
### Version History
- **v1.0_optimized (2025-11-03):** Initial release
- Trained on 1,000 samples
- Hyperparameter optimization completed
- Test R² = 0.6710
### Planned Improvements
1. **Data Collection:**
- Expand dataset to 5,000+ samples
- Include more seasonal variations
- Add rural vs. urban distinctions
2. **Feature Engineering:**
- Add BMI calculations
- Include activity level metrics
- Incorporate cultural food preferences
3. **Model Enhancements:**
- Ensemble with LightGBM for improved accuracy
- Implement SHAP-based explainability
- Add prediction uncertainty intervals
4. **Validation:**
- Clinical validation studies
- Cross-regional performance assessment
- Temporal validation (seasonal changes)
### Retraining Schedule
- **Recommended:** Every 6-12 months
- **Triggers:** New data availability, significant dietary changes, performance degradation
---
## Citation
If you use this model in your research or application, please cite:
```bibtex
@misc{uganda_elderly_nutrition_xgboost_2025,
  title={XGBoost Model for Elderly Nutrition Planning in Uganda},
  author={[Your Name/Organization]},
  year={2025},
  month={November},
  howpublished={Hugging Face Model Hub},
  url={https://huggingface.co/[your-username]/xgboost-elderly-nutrition-uganda}
}
```
---
## Additional Resources
### Related Links
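- **Project Repository:** [Graph-Enhanced-LLMs-for-Locally-Sourced-Elderly-Nutrition-Planning-in-Uganda](https://github.com/Shakiran-Nannyombi/Graph-Enhanced-LLMs-for-Locally-Sourced-Elderly-Nutrition-Planning-in-Uganda.git)
- **API Documentation:** [API Docs Link]
- **Research Paper:** [Paper Link if available]
- **Dataset:** [Shakiran/UgandanNutritionMealPlanning](https://huggingface.co/datasets/Shakiran/UgandanNutritionMealPlanning)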
### Model Artifacts
- `xgboost_nutrition_model_20251103.pkl` - Main XGBoost model
- `xgboost_scaler_20251103.pkl` - Feature scaler (StandardScaler)
- `xgboost_label_encoders_20251103.pkl` - Categorical encoders
- `xgboost_feature_names_20251103.pkl` - Feature name list
- `xgboost_model_metadata_20251103.json` - Complete metadata
### Support
For questions, issues, or contributions:
- **Issues:** [GitHub repository](https://github.com/Shakiran-Nannyombi/Graph-Enhanced-LLMs-for-Locally-Sourced-Elderly-Nutrition-Planning-in-Uganda.git)
- **Email:** devkiran256@gmail.com
---
## License
This model is released under the **Apache License 2.0**.
- Commercial use allowed
- Modification allowed
- Distribution allowed
- Patent use allowed
- ⚠️ Must include license and copyright notice
- ⚠️ Must state significant changes
**Disclaimer:** This model is provided "as is" without warranty. Users are responsible for validating the model's suitability for their specific use case and ensuring compliance with local healthcare regulations.
---
## Acknowledgments
### Data Sources and References
This model was developed using knowledge and data extracted from the following authoritative sources:
1. **Handbook_Eldernutr_FINAL.pdf**
- Comprehensive handbook on elderly nutrition
- Primary reference for nutritional requirements and guidelines
2. **WHO ICOPE Guidelines (icope.pdf)**
- World Health Organization Integrated Care for Older People (ICOPE)
- Framework for elderly healthcare and nutrition assessment
3. **Nutritional_Requirements_of_Older_People.pdf**
- Detailed nutritional requirements for elderly populations
- Evidence-based dietary recommendations
4. **TipSheet_21_HealthyEatingForOlderAdults.pdf**
- Practical tips for healthy eating in older adults
- Community-oriented nutrition guidance
5. **MSD Manual Professional Edition**
- "Drug Categories of Concern in Older Adults - Geriatrics"
- Clinical reference for medication-nutrition interactions
6. **MSD Manual Consumer Version**
- "Aging and Medications - Older People's Health Issues"
- Patient-friendly information on aging and health
7. **Uganda Nutrition Data (download.pdf)**
- Uganda-specific nutritional data and food composition
- Local context and dietary patterns
8. **Street Food Nutritional Analysis**
- "Average energy and nutrient contents of typical street food dishes in Uganda (Kampala)"
- Local food nutritional profiles for urban Uganda
### Institutional Support
- **Uganda Ministry of Health** - Nutrition guidelines and policy frameworks
- **World Health Organization (WHO)** - ICOPE framework and elderly care guidelines
- **MSD Manuals** - Clinical and consumer health information
### Technical Contributions
- **Open-source community:** XGBoost, scikit-learn, pandas, Python ecosystem
- **Healthcare professionals** who contributed domain expertise
- **Data scientists and researchers** in elderly nutrition and machine learning
### Regional Knowledge
- Local nutrition experts from Uganda's 4 major regions:
- Central Uganda (Buganda)
- Western Uganda (Ankole, Tooro, Kigezi, Bunyoro)
- Eastern Uganda (Busoga, Bugisu, Teso)
- Northern Uganda (Acholi, Lango, Karamoja, West Nile)
### Special Thanks
- Community health workers providing ground-level insights
- Elderly care facilities participating in data validation
- Nutrition researchers focusing on African elderly populations
- Open data initiatives promoting nutrition research in Uganda
---
**Last Updated:** November 4, 2025
**Model Version:** v1.0_optimized
**Status:** Production Ready