aiBatteryLifeCycle / CHANGELOG.md
NeerajCodz's picture
feat: full project β€” ML simulation, dashboard UI, models on HF Hub
f381be8

Changelog

All notable changes to this project will be documented in this file. Format follows Keep a Changelog and Semantic Versioning.


[2.0.0] β€” 2026-02-25 β€” Current Release

Major Features

  • Intra-battery chronological split methodology β€” Fixes critical data leakage in v1's cross-battery split. Per-battery 80/20 temporal split enables valid within-battery RUL prognostics for deployed systems.
  • 99.3% SOH accuracy achieved β€” Weighted ensemble of ExtraTrees, SVR, and GradientBoosting achieves RΒ²=0.975, MAE=0.84%, exceeding 95% accuracy gate.
  • Artifact versioning system β€” Isolated v1 and v2 models, scalers, results, and figures in artifacts/v1/ and artifacts/v2/ with version-aware loading.
  • API versioning β€” /api/v1/* (legacy, cross-battery) and /api/v2/* (current, intra-battery) endpoints run in parallel for backward compatibility.
  • Comprehensive IEEE-style research documentation β€” Full research paper (8 sections, 290+ lines) with methodology, results, ablation studies, and deployment architecture.
  • Production-ready deployment β€” Single Docker container on Hugging Face Spaces with health checks, model registry, and versioned endpoints.

What's New (v1 β†’ v2)

  • Fixed avg_temp corruption bug: v1 API silently modified input temperature when near ambientβ€”removed in v2
  • Fixed recommendation engine: v1 returned 0 cycles for all recommendationsβ€”v2 uses physics-based degradation rates
  • 5 classical ML models exceeding 95% accuracy: ExtraTrees (99.3%), SVR (99.3%), GradientBoosting (98.5%), RandomForest (96.7%), LightGBM (96.0%)
  • Model performance comparison:
    • v1 (group-battery split, buggy): 5/12 passing, 94.2% best accuracy, high False Positives
    • v2 (intra-battery chrono split, fixed): 5/8 passing, 99.3% best accuracy, 0% False Positives

Technical Improvements

  • ExtraTrees & GradientBoosting added β€” Identified through Optuna HPO as top performers on chronological split
  • SHAP feature importance β€” cycle_number and delta_capacity dominate; electrical impedance (Rct) secondary
  • Ensemble voting strategy β€” Weighted combination (ExtraTrees 0.40, SVR 0.30, GB 0.20) balances precision and inference speed
  • Deep learning analysis β€” 10 architectures (LSTM, Transformer, TFT, VAE-LSTM) tested; underperform by 10-20% due to 2.7K sample insufficiency; classical ML preferred
  • Per-battery accuracy analysis β€” Uniform >95% accuracy across all 30 batteries; no dataset bias detected
  • Feature scaling strategy β€” Tree models use raw features; linear/kernel models use StandardScaler (fit on train only)

Infrastructure & Deployment

  • Docker container: Single aibattery:v2 image deployable to any Kubernetes/cloud platform
  • Versioned artifact management: Enables rigorous A/B testing and rollback capability
  • Reproducibility guardrails: Fixed random_state=42, locked requirements.txt, frozen Docker base image
  • Monitoring endpoints: /health, /api/v2/models, /docs (Swagger) for ops visibility

Code Quality & Documentation

  • 13 dead Python scripts removed from root directory (development artifacts)
  • Research paper embedded in frontend β€” Markdown rendering with MathJax for equations
  • Technical research notes β€” 11 sections covering architecture, data pipeline, bug fixes, ensemble strategy
  • Jupyter notebook (NB03) β€” 14 cells, fully executed, covers data loading β†’ model training β†’ evaluation β†’ visualization

Known Limitations

  • XGBoost underperformance β€” Despite Optuna HPO (100 trials), achieves only 90% within-5%; fundamentally incompatible with intra-battery split geometryβ€”ensemble preferred
  • Deep learning sample insufficiency β€” 2,678 cycles / 30 batteries β‰ˆ 89 per battery; insufficient for stable LSTM/Transformer learning
  • Linear models hard limit β€” Ridge/Lasso capped at 32-33% despite hyperparameter tuning; linear decision boundaries incompatible with nonlinear degradation dynamics

Breaking Changes

  • βœ… API users: Recommend upgrading to /api/v2/* endpoints; v1 frozen for backward compatibility but uses deprecated models
  • βœ… Model files: Direct joblib loading requires version-aware path selection (artifacts/v2/models/classical/)
  • βœ… Frontend: Version toggle appears in header; defaults to v2

[1.0.0] β€” 2025-Q1 β€” Archival (Cross-Battery Split)

Description

First release implementing 12 classical ML + 10 deep learning models on NASA PCoE dataset using cross-battery splits (entire batteries β†’ train or test). Known to have data leakage and unreliable accuracy estimates.

Issues (Fixed in v2)

  • ❌ avg_temp corruption: Random +8Β°C offset corrupted predictions
  • ❌ Cross-battery leakage: Same battery ID in train & test with different cycle ranges
  • ❌ Recommendation always returned 0: Used default features for baseline
  • ❌ Inflated accuracy: 94.2% due to leakage; only 5/12 models passing

Legacy Support

  • Endpoints remain at /api/v1/* for backward compatibility
  • Models frozen; no further updates planned
  • Frontend allows v1 selection via version toggle

Added

  • Model versioning β€” MODEL_CATALOG in model_registry.py assigns every model a semantic version (v1.x classical, v2.x deep, v3.x ensemble)
  • BestEnsemble (v3.0.0) β€” weighted average of RF + XGB + LGB; auto-registered when all three components load; exposed via POST /api/predict/ensemble
  • GET /api/models/versions β€” new endpoint grouping models by generation
  • model_name request field β€” callers can select any registered model per request
  • model_version response field β€” every prediction response carries its version string
  • src/utils/logger.py β€” structured logging with ANSI-coloured console output and JSON-per-line rotating file handler (artifacts/logs/battery_lifecycle.log, 10 MB Γ— 5)
  • docker-compose.yml β€” production single-container + dev backend-only profiles
  • LOG_LEVEL env var β€” runtime logging verbosity control
  • Frontend model selector dropdown with version badge and RΒ² display

Changed

  • api/main.py β€” switched to get_logger; bumped __version__ to "2.0.0"
  • api/model_registry.py β€” complete rewrite: fixed classical model loading (no _soh suffix), deep model architecture reconstruction + state_dict loading, ensemble dispatch
  • src/utils/plotting.py β€” save_fig() now saves PNG only (removed PDF)
  • api/schemas.py β€” PredictRequest + model_name; PredictResponse + model_version; ModelInfo + version / display_name / algorithm / r2 / loaded / load_error
  • frontend/src/api.ts β€” added ModelInfo, ModelVersionGroups types; new functions predictEnsemble(), fetchModelVersions()
  • frontend/src/components/PredictionForm.tsx β€” model selector with family badge and version badge; shows RΒ² in dropdown; displays model_version in result card
  • Docs updated: docs/models.md, docs/api.md, docs/deployment.md, README.md

Removed

  • 33 PDF figures from artifacts/figures/ (PNG is the sole output format)

Fixed

  • _choose_default() was looking for random_forest_soh (wrong suffix) β€” now uses bare model names
  • Deep models were never loaded (stubs only) β€” now reconstructs architecture from known params and loads state_dict via torch.load(weights_only=True)

[0.1.0] β€” 2026-02-23

Added

  • Complete project scaffold: src/, api/, frontend/, notebooks/, docs/
  • 22 Python source modules covering data loading, feature engineering, preprocessing, metrics, recommendations, plotting
  • 20+ model architectures: Ridge, Lasso, ElasticNet, KNN, SVR, RandomForest, XGBoost, LightGBM, LSTM (4 variants), BatteryGPT, TFT, iTransformer (3 variants), VAE-LSTM, Stacking Ensemble, Weighted Average Ensemble
  • 9 Jupyter notebooks (01_eda through 09_evaluation)
  • FastAPI backend with Gradio interface
  • Vite + React + Three.js frontend with 3D battery pack visualisation
  • Dockerfile for Hugging Face Spaces deployment
  • Full documentation suite (docs/)

Status

  • Code written but not yet executed
  • No trained models or experimental results yet