GeoForce v2.0 — Transformation to Real Engineering Tool

Status: Planning
Start Date: TBD
Target: A geothermal reservoir prediction model that can answer real engineering questions for Indonesian fields (Kamojang, Darajat, Wayang Windu, Lahendong, Salak).

Why v1.1 Cannot Handle Real Cases

GeoForce v1.1 is a validated proof-of-concept. It proves the pipeline works (data -> train -> validate -> serve -> platform). But it solves only one simplified scenario class:

Aspect	v1.1 (Current)	Reality
Physics	Single-phase liquid water, constant properties	Two-phase steam-water, IAPWS-IF97 steam tables, phase transitions
Grid	2D (32x32 = 1,024 cells)	3D (50x50x20 = 50,000+ cells minimum)
Permeability	One scalar per scenario	Spatially varying field with fracture networks
Training data	Our own analytical heat+Darcy solver	TOUGH2/TOUGH3 multiphase simulation outputs
Outputs	Normalized T and P	T, P, steam saturation, enthalpy, steam fraction
Inputs	6 parameters	10+ parameters + porosity fields, fault maps, well trajectories
Temporal	5 timesteps (4-year intervals)	Flexible timesteps, production schedule-dependent
Validation	Against own synthetic simulations	Against published field data + USGS Brady benchmark

Three Fatal Gaps

1. No phase physics. Kamojang and Darajat are vapor-dominated: the reservoir fluid is steam, not liquid water. GeoForce v1.1 has no concept of boiling, condensation, or steam fraction, so it cannot represent these fields at all.
2. 2D only. Steam rises and liquid sinks, so gravity segregation shapes the behavior of two-phase reservoirs. A 2D horizontal slice has no vertical dimension and misses this entirely.
3. Training data is wrong. Our finite-difference solver approximates heat conduction + Darcy flow with constant fluid properties. Real reservoirs need multiphase flow with pressure- and temperature-dependent density, viscosity, and enthalpy taken from steam tables.
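
To make the first gap concrete, the sketch below evaluates the IAPWS-IF97 saturation-pressure equation (Region 4) and compares a reservoir state against it. The helper `phase_hint` and its 2 bar tolerance are illustrative assumptions, not part of GeoForce, and the check assumes pure water with no dissolved salts or non-condensable gas.

```python
import math

# IAPWS-IF97 Region 4 (saturation line) coefficients n1..n10.
_N = (
    0.11670521452767e4, -0.72421316703206e6, -0.17073846940092e2,
    0.12020824702470e5, -0.32325550322333e7, 0.14915108613530e2,
    -0.48232657361591e4, 0.40511340542057e6, -0.23855557567849,
    0.65017534844798e3,
)


def saturation_pressure_bar(temp_c: float) -> float:
    """Saturation pressure of pure water in bar (valid from 0 C up to the critical point)."""
    t_k = temp_c + 273.15
    theta = t_k + _N[8] / (t_k - _N[9])
    a = theta**2 + _N[0] * theta + _N[1]
    b = _N[2] * theta**2 + _N[3] * theta + _N[4]
    c = _N[5] * theta**2 + _N[6] * theta + _N[7]
    p_mpa = (2.0 * c / (-b + math.sqrt(b * b - 4.0 * a * c))) ** 4
    return p_mpa * 10.0  # 1 MPa = 10 bar


def phase_hint(temp_c: float, pressure_bar: float, tol_bar: float = 2.0) -> str:
    """Rough phase label; tol_bar is an arbitrary band around the saturation curve."""
    p_sat = saturation_pressure_bar(temp_c)
    if pressure_bar > p_sat + tol_bar:
        return "compressed liquid"
    if pressure_bar < p_sat - tol_bar:
        return "superheated vapor"
    return "near saturation (two-phase possible)"


if __name__ == "__main__":
    # Approximate published values for a vapor-dominated field (~245 C, ~35 bar).
    print(round(saturation_pressure_bar(245.0), 1), phase_hint(245.0, 35.0))
```

At roughly 245 C the saturation pressure of water is about 36 to 37 bar, so a reservoir reported at ~35 bar sits on or just below the boiling curve. A single-phase liquid model with constant properties has no way to express that state.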

The Transformation Plan

Phase 1: Real Data Foundation (Weeks 1-2)

Goal: Get real simulation data and prove we can learn from it.

1A. Download NREL Brady Hot Springs Dataset

USGS and NREL published the Open Source Reservoir (OSR): 101 CMG STARS simulation scenarios for a Brady-like geothermal field. It ships with published ML error figures, which makes it our primary benchmark.

Source: https://github.com/NREL/geothermal_osr
Also at: https://gdr.openei.org/submissions/1346
Contents: 101 scenarios, T/P/enthalpy at wells, 2020-2040 production forecasts
Format: Jupyter notebooks + CSV/HDF5
License: Open (DOE-funded)
Published benchmark: <3.68% temperature error, <4.75% pressure error

Tasks:

Clone NREL/geothermal_osr repository
Explore data format, understand scenario structure
Retrain current CNN architecture on Brady data (quick validation)
Compare our results against published USGS benchmark
Document gaps between Brady data structure and our training pipeline

1B. Download Utah FORGE Dataset

FORGE (Frontier Observatory for Research in Geothermal Energy) is the DOE's flagship geothermal research site near Milford, Utah. It offers one of the most comprehensive open geothermal datasets available.

Source: https://gdr.openei.org/forge and https://utahforge.com/project-data-dashboard/
Contents: Well logs, T/P profiles, microseismicity, stress data, 3D geological model, stimulation data
Use for: Understanding real input data formats, building realistic input channel design

Tasks:

Download FORGE well log data (temperature, pressure, lithology)
Download 3D geological model
Map FORGE data structure to GeoForce input requirements
Identify which data types are available for Indonesian fields

1C. Collect Published Indonesian Field Parameters

Indonesian field data is not publicly available in raw form, but published papers contain enough parameters to build realistic simulation campaigns.

Sources:

Stanford Geothermal Workshop papers (https://pangea.stanford.edu/ERE/db/GeoConf/)
World Geothermal Congress proceedings
ITB thesis repository
Pertamina and Star Energy technical publications

Key fields to collect parameters for:

Field	Operator	Type	Published Temp	Published Pressure	Capacity
Kamojang	PGE	Vapor-dominated	~245 C	~35 bar	235 MW
Darajat	Star Energy	Vapor-dominated	~250 C	~30 bar	260 MW
Wayang Windu	Star Energy	Liquid + vapor caps	~270 C	Variable	227 MW
Salak	Star Energy	Liquid-dominated	~240 C	Variable	377 MW
Lahendong	PGE	High-T liquid	~300 C	Variable	120 MW
Ulubelu	PGE	Liquid-dominated	~280 C	Variable	220 MW
Karaha-Talaga Bodas	PGE	Mixed	~350 C	Variable	30 MW

Tasks:

Compile parameter table from published Stanford/WGC papers
Document permeability ranges for Indonesian volcanic systems
Document fracture characteristics (orientation, density, aperture)
Build scenarios.yaml v2 with realistic Indonesian ranges
Identify which fields have the most published data (prioritize for validation)

Phase 2: Real Simulator Setup (Weeks 2-4)

Goal: Generate proper multiphase simulation data for training.

2A. Install Geothermal Simulator

Two options (pursue both, use whichever works first):

Option A: TOUGH3 (industry standard)

Request from LBNL: https://tough.lbl.gov/software/tough3/
Free for research use
Requires license agreement (may take 1-2 weeks)
Install with PyTOUGH wrapper for Python integration
Use EOS1 module (pure water, two-phase) initially
Later upgrade to EWASG module (water + NaCl + NCG)

Option B: Waiwera (fully open-source)

Source: https://waiwera.github.io/
Fortran 2003, parallel, no license needed
Validated against TOUGH2 benchmark problems
May be easier to automate for batch campaigns
Less industry adoption than TOUGH2/3

Tasks:

Request TOUGH3 license from LBNL
Install Waiwera from source (backup option)
Install PyTOUGH (Python interface to TOUGH2/3)
Run TOUGH2 standard benchmark problems to verify installation
Set up batch simulation runner (Python script + LHS parameter sampling)

2B. Design Simulation Campaign

Build 1,000-5,000 simulation scenarios covering Indonesian geothermal parameter space.

3D Grid Design:

Initial:  32 x 32 x 10 = 10,240 cells (manageable on VPS CPU)
Target:   64 x 64 x 20 = 81,920 cells (requires TOUGH3 parallel)
Domain:   2 km x 2 km x 2 km (typical Indonesian field extent)

Parameter Space (LHS sampling):

Parameter	Min	Max	Unit	Source
Base temperature	200	350	C	Indonesian field range
Base pressure	20	120	bar	Hydrostatic at 500-3000m
Log10 permeability	-16	-12	log10(m2)	Volcanic rock range
Porosity	0.01	0.15	fraction	Andesite to fractured tuff
Depth	500	3000	m	Shallow to deep
Heat source depth	3000	6000	m	Magmatic heat source
Injection rate	0	100	kg/s	Per well
Production rate	10	150	kg/s	Per well
N production wells	1	8	count	Field size dependent
N injection wells	0	4	count	Reinjection strategy
Rock thermal conductivity	1.5	4.0	W/(m K)	Volcanic rock range
Initial steam saturation	0.0	1.0	fraction	Liquid to vapor-dominated
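
A minimal sketch of Latin hypercube sampling over the table above, using only the standard library. The dictionary keys and the function name `latin_hypercube` are hypothetical; the ranges are copied from the table. Parameters are sampled independently, which is an assumption: a real campaign would likely need to reject or repair physically inconsistent combinations (for example, an initial steam saturation that contradicts the sampled temperature and pressure).

```python
import random

# Continuous ranges copied from the parameter table.
RANGES = {
    "base_temperature_c": (200.0, 350.0),
    "base_pressure_bar": (20.0, 120.0),
    "log10_permeability_m2": (-16.0, -12.0),
    "porosity": (0.01, 0.15),
    "depth_m": (500.0, 3000.0),
    "heat_source_depth_m": (3000.0, 6000.0),
    "injection_rate_kg_s": (0.0, 100.0),
    "production_rate_kg_s": (10.0, 150.0),
    "rock_conductivity_w_mk": (1.5, 4.0),
    "initial_steam_saturation": (0.0, 1.0),
}
# Integer-valued parameters (inclusive bounds).
INT_RANGES = {"n_production_wells": (1, 8), "n_injection_wells": (0, 4)}


def latin_hypercube(n_samples: int, seed: int = 0) -> list:
    """Return n_samples scenario dicts, one sample per stratum for every parameter."""
    rng = random.Random(seed)
    columns = {}
    for name in list(RANGES) + list(INT_RANGES):
        # One point in each of n equal-width strata of [0, 1), then shuffle the order.
        strata = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(strata)
        columns[name] = strata

    scenarios = []
    for i in range(n_samples):
        scenario = {}
        for name, (lo, hi) in RANGES.items():
            scenario[name] = lo + columns[name][i] * (hi - lo)
        for name, (lo, hi) in INT_RANGES.items():
            # Map [0, 1) onto the inclusive integer range lo..hi.
            scenario[name] = lo + int(columns[name][i] * (hi - lo + 1))
        scenarios.append(scenario)
    return scenarios


if __name__ == "__main__":
    for scenario in latin_hypercube(3, seed=42):
        print(scenario)
```

If SciPy is available, `scipy.stats.qmc.LatinHypercube` provides a tested implementation and could replace the hand-rolled sampler.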

Output Variables (per cell, per timestep):

Temperature (C)
Pressure (Pa)
Steam saturation (0-1)
Specific enthalpy (kJ/kg)
Liquid density (kg/m3)

Timesteps: 10 snapshots over 30 years (years 1, 3, 5, 7, 10, 13, 16, 20, 25, 30)

Tasks:

Write simulation campaign generator script
Define grid with vertical layering (caprock, reservoir, basement)
Implement well placement randomization
Set up initial conditions (hydrostatic gradient, geothermal gradient)
Run pilot batch (10 scenarios) to estimate per-simulation time
Estimate total compute time for 1,000 scenarios
Run full campaign (may need to use cloud compute for speed)
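
The initial-condition task can be sketched as a single vertical column of layer-centre depths with a linear temperature gradient and hydrostatic pressure. Everything numeric here is an assumption for illustration: the 100 C/km gradient, the constant 900 kg/m3 water density, and the function names. Vapor-dominated fields are typically not hydrostatic, so this only covers the liquid-dominated starting case.

```python
G = 9.81  # gravitational acceleration, m/s^2


def layer_centers(thickness_m: float, n_layers: int, top_m: float = 0.0) -> list:
    """Depth of each layer centre, measured downward from the surface."""
    dz = thickness_m / n_layers
    return [top_m + (k + 0.5) * dz for k in range(n_layers)]


def initial_column(thickness_m: float = 2000.0, n_layers: int = 10,
                   surface_temp_c: float = 25.0,
                   temp_gradient_c_per_km: float = 100.0,   # assumed value
                   surface_pressure_pa: float = 101325.0,
                   water_density: float = 900.0) -> list:   # assumed constant density
    """Return a list of (depth_m, temperature_c, pressure_pa), one tuple per layer."""
    column = []
    for depth in layer_centers(thickness_m, n_layers):
        temperature = surface_temp_c + temp_gradient_c_per_km * depth / 1000.0
        pressure = surface_pressure_pa + water_density * G * depth
        column.append((depth, temperature, pressure))
    return column


if __name__ == "__main__":
    for depth, temp, pres in initial_column():
        print(f"{depth:7.1f} m  {temp:6.1f} C  {pres / 1e5:6.1f} bar")
```

With the initial 32 x 32 x 10 grid on a 2 km cube, each layer is 200 m thick and each cell is 62.5 m wide, so the vertical resolution is the coarser of the two.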

Phase 3: Model Architecture Upgrade (Weeks 4-8)

Goal: Build a model that handles 3D, two-phase, spatially-varying inputs.

3A. Architecture Selection

v2.0 Target: 3D U-Net

Why U-Net over current flat CNN:

Skip connections preserve spatial detail at fine scales
Well-proven for image-to-image prediction tasks
Encoder-decoder structure naturally handles multi-scale features
Can handle 3D volumes (3D convolutions)
Moderate parameter count (500K-2M, still CPU-feasible)

Input: (batch, C_in, 32, 32, 10)   (3D volume with multiple channels)
       |
   [Encoder]
       Conv3D(C_in, 32) + BN + ReLU
       Conv3D(32, 32) + BN + ReLU
       MaxPool3D(2)                      -> 16 x 16 x 5
       Conv3D(32, 64) + BN + ReLU
       Conv3D(64, 64) + BN + ReLU
       MaxPool3D(2)                      -> 8 x 8 x 2 (depth 5 rounds down)
       |
   [Bottleneck]
       Conv3D(64, 128) + BN + ReLU
       Conv3D(128, 128) + BN + ReLU
       |
   [Decoder]
       Upsample3D(to skip size 16 x 16 x 5) + Cat(skip)
       Conv3D(128+64, 64) + BN + ReLU
       Conv3D(64, 64) + BN + ReLU
       Upsample3D(to skip size 32 x 32 x 10) + Cat(skip)
       Conv3D(64+32, 32) + BN + ReLU
       Conv3D(32, C_out, 1x1x1) + Sigmoid
       |
Output: (batch, C_out, 32, 32, 10)  (T, P, saturation, enthalpy at each timestep)
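
Tracing the spatial shapes through two pooling stages shows why a depth of 10 needs care: pooling by 2 with floor rounding (the PyTorch `MaxPool3d` default) takes 10 to 5 to 2, and a fixed 2x upsample then yields 4, which cannot be concatenated with the depth-5 skip tensor. The helper below is a hypothetical shape calculator, not model code.

```python
def pool(shape: tuple, k: int = 2) -> tuple:
    """Spatial shape after pooling by k with floor rounding."""
    return tuple(s // k for s in shape)


def upsample(shape: tuple, factor: int = 2) -> tuple:
    """Spatial shape after upsampling by a fixed integer factor."""
    return tuple(s * factor for s in shape)


def trace_unet(shape: tuple, levels: int = 2):
    """Return (bottleneck_shape, mismatches) for an encoder/decoder with `levels` poolings.

    Each mismatch is (fixed_factor_upsample_shape, skip_shape).
    """
    skips = []
    for _ in range(levels):
        skips.append(shape)
        shape = pool(shape)
    bottleneck = shape

    mismatches = []
    for skip in reversed(skips):
        up = upsample(shape)
        if up != skip:
            mismatches.append((up, skip))
        # Continue as if the decoder had resized to the skip shape before concatenating.
        shape = skip
    return bottleneck, mismatches


if __name__ == "__main__":
    print(trace_unet((32, 32, 10)))  # depth 10 is not divisible by 4
    print(trace_unet((32, 32, 8)))   # depth 8 pools and upsamples cleanly
```

Two ways out are to upsample to the skip tensor's exact size, or to choose a depth divisible by 4 such as 8 or 12.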

v3.0 Target: Graph Neural Network (GNN)

For fracture-dominated Indonesian systems, following Gudala & Yan (2025):

Handles unstructured/Voronoi grids natively
Can represent discrete fracture networks (DFN)
R-squared > 0.95 for T, P, displacement in published results
Sequential Graph Sage (SeqSage) architecture

3B. Input Channel Design

Channel	Description	Shape	Normalization
0	Initial temperature field	(32,32,10)	(T-25)/325
1	Log10 permeability field	(32,32,10)	(logk+16)/4
2	Porosity field	(32,32,10)	(phi-0.01)/0.14
3	Well mask (production)	(32,32,10)	Gaussian decay
4	Well mask (injection)	(32,32,10)	Gaussian decay
5	Initial pressure	(32,32,10)	(P-Pmin)/(Pmax-Pmin)
6	Initial steam saturation	(32,32,10)	[0,1]
7	Rock type indicator	(32,32,10)	Categorical (caprock=0, reservoir=1, basement=2)
8	Depth (z-coordinate)	(32,32,10)	(z-zmin)/(zmax-zmin)
9	Production rate schedule	(32,32,10)	Normalized rate
10	Injection rate schedule	(32,32,10)	Normalized rate

Total: 11 input channels (vs 6 in v1.1)
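
The normalizations in the table translate directly into small helper functions. The sketch below covers the scalar formulas and one possible reading of "Gaussian decay" for the well masks; the function names and the `sigma_cells` width are assumptions, since the table does not specify the decay length.

```python
import math


def norm_temperature(t_c: float) -> float:
    return (t_c - 25.0) / 325.0          # channel 0: 25..350 C -> 0..1


def norm_log_permeability(log10_k: float) -> float:
    return (log10_k + 16.0) / 4.0        # channel 1: -16..-12 -> 0..1


def norm_porosity(phi: float) -> float:
    return (phi - 0.01) / 0.14           # channel 2: 0.01..0.15 -> 0..1


def norm_minmax(x: float, lo: float, hi: float) -> float:
    return (x - lo) / (hi - lo)          # channels 5 and 8 (pressure, depth)


def well_mask(nx: int, ny: int, nz: int, well_ijk: tuple,
              sigma_cells: float = 2.0) -> list:
    """Nested list [i][j][k] with 1.0 at the well cell, decaying as a Gaussian.

    sigma_cells (decay width in grid cells) is an assumed value.
    """
    wi, wj, wk = well_ijk
    two_sigma_sq = 2.0 * sigma_cells ** 2
    return [[[math.exp(-((i - wi) ** 2 + (j - wj) ** 2 + (k - wk) ** 2) / two_sigma_sq)
              for k in range(nz)]
             for j in range(ny)]
            for i in range(nx)]


if __name__ == "__main__":
    mask = well_mask(32, 32, 10, (16, 16, 5))
    print(norm_temperature(245.0), mask[16][16][5], mask[18][16][5])
```

A field with several wells could take the cell-wise maximum over the individual masks, which keeps the channel within [0, 1].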

3C. Output Channel Design

Channel Group	Variables	Shape	Denormalization
Temperature	T at 10 timesteps	(10, 32, 32, 10)	T * 350 C
Pressure	P at 10 timesteps	(10, 32, 32, 10)	P * 120 bar
Steam saturation	Sg at 10 timesteps	(10, 32, 32, 10)	[0, 1]
Enthalpy	h at 10 timesteps	(10, 32, 32, 10)	h * 3000 kJ/kg

Total: 40 output channels (vs 10 in v1.1)
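
One way to lay out the 40 output channels is variable-major: channels 0-9 hold temperature, 10-19 pressure, and so on. That ordering is an assumption made for this sketch, as are the names used; the scale factors are the denormalization constants from the table.

```python
VARIABLES = ("temperature_c", "pressure_bar", "steam_saturation", "enthalpy_kj_kg")
SCALES = {
    "temperature_c": 350.0,
    "pressure_bar": 120.0,
    "steam_saturation": 1.0,
    "enthalpy_kj_kg": 3000.0,
}
N_TIMESTEPS = 10


def channel_index(variable: str, timestep: int) -> int:
    """Channel number for a variable at a timestep (assumed variable-major layout)."""
    if not 0 <= timestep < N_TIMESTEPS:
        raise ValueError("timestep out of range")
    return VARIABLES.index(variable) * N_TIMESTEPS + timestep


def decode_cell(channels: list) -> dict:
    """Turn the 40 sigmoid outputs of one grid cell into physical time series."""
    if len(channels) != len(VARIABLES) * N_TIMESTEPS:
        raise ValueError("expected 40 channel values")
    return {
        name: [channels[channel_index(name, t)] * SCALES[name] for t in range(N_TIMESTEPS)]
        for name in VARIABLES
    }


if __name__ == "__main__":
    decoded = decode_cell([0.5] * 40)
    print({name: series[0] for name, series in decoded.items()})
```

Whatever layout is chosen, fixing it in one helper like `channel_index` keeps the training code and the serving code from drifting apart.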

3D. Physics-Informed Loss Upgrade

Current physics loss: soft temporal smoothness + Laplacian + Darcy coupling

v2.0 physics loss:

Mass conservation: d(phi*rho)/dt + div(rho*v) = q_wells
Energy conservation: d(phi*rho*h)/dt + div(rho*h*v) = div(K*gradT) + q_wells
Saturation constraint: Sl + Sg = 1 (hard constraint via sigmoid)
Steam table consistency: h(T,P) must be physically consistent
Gravity term: P gradient should include rho*g*dz component
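
As an illustration of how the mass-conservation term could become a loss, the sketch below evaluates a finite-volume residual in one dimension and averages its square. It is a simplification under stated assumptions: porosity constant in time, a single fluid density per cell, face fluxes supplied by the caller, and hypothetical function names. The 3D, two-phase version would follow the same pattern per phase and per axis.

```python
def mass_residual_1d(phi, rho_old, rho_new, face_flux, q, dx, dt):
    """Per-cell residual of d(phi*rho)/dt + d(rho*v)/dx - q.

    phi, rho_old, rho_new, q: lists with one value per cell
    face_flux: mass flux rho*v at the n+1 cell faces, kg/(m^2 s)
    q: mass source per unit volume, kg/(m^3 s); negative for production
    """
    n = len(phi)
    if len(face_flux) != n + 1:
        raise ValueError("need one more face than cells")
    residuals = []
    for i in range(n):
        accumulation = phi[i] * (rho_new[i] - rho_old[i]) / dt
        divergence = (face_flux[i + 1] - face_flux[i]) / dx
        residuals.append(accumulation + divergence - q[i])
    return residuals


def physics_penalty(residuals) -> float:
    """Mean squared residual, usable as a soft loss term."""
    return sum(r * r for r in residuals) / len(residuals)


if __name__ == "__main__":
    # Steady uniform flow with no sources: every residual should vanish.
    res = mass_residual_1d([0.1] * 4, [800.0] * 4, [800.0] * 4,
                           [2.0] * 5, [0.0] * 4, dx=10.0, dt=3600.0)
    print(res, physics_penalty(res))
```

A prediction that lowers density in a cell without any matching outflow or production term gives a non-zero residual, which is what the penalty is meant to discourage.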

Tasks:

Implement 3D U-Net architecture in PyTorch
Design input preprocessing pipeline for TOUGH3 outputs
Design output denormalization pipeline
Implement upgraded physics loss functions
Benchmark training time on VPS CPU
Determine if GPU is needed (cloud GPU if so)

Phase 4: Validation and Benchmarking (Weeks 8-10)

Goal: Prove the model works on real geothermal physics.

4A. Validation Against Brady Hot Springs

The USGS published these benchmarks for Brady-like reservoir ML:

Metric	USGS Result	Our Target
Temperature error	<3.68%	<4.0%
Pressure error	<4.75%	<5.0%
Energy error	<4.04%	<5.0%
Inference time	0.9 s	<0.5 s

Tasks:

Train on Brady OSR data
Evaluate on held-out Brady scenarios
Compare results with published USGS numbers
Document where we match and where we don't
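
The comparison against the targets in the table above can be sketched as follows. The exact error definition behind the published USGS percentages is not restated in this plan, so mean absolute percentage error is used here as an assumption; the names `TARGETS` and `check_targets` are hypothetical.

```python
def mean_abs_percent_error(predicted, reference) -> float:
    """Mean absolute percentage error, skipping reference values equal to zero."""
    pairs = [(p, r) for p, r in zip(predicted, reference) if r != 0]
    if not pairs:
        raise ValueError("no non-zero reference values")
    return 100.0 * sum(abs(p - r) / abs(r) for p, r in pairs) / len(pairs)


# "Our Target" column from the table, in percent.
TARGETS = {"temperature": 4.0, "pressure": 5.0, "energy": 5.0}


def check_targets(errors_pct: dict) -> dict:
    """Map each metric name to True when the error is below its target."""
    return {name: errors_pct[name] < limit for name, limit in TARGETS.items()}


if __name__ == "__main__":
    temperature_error = mean_abs_percent_error([152.0, 147.0], [150.0, 150.0])
    print(temperature_error)
    print(check_targets({"temperature": temperature_error, "pressure": 5.2, "energy": 4.0}))
```

Before quoting any number next to the USGS results, the metric definition should be checked against the Brady publication so that both sides are computed the same way.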

4B. Validation Against Indonesian Field Data

Use published monitoring data from Stanford/WGC papers:

Kamojang production decline curves (Darma et al., 2010)
Wayang Windu expansion history (Star Energy publications)
Darajat production data (Hadi et al., various years)

This is qualitative validation — does the model produce physically reasonable results for Indonesian-like parameters?

Tasks:

Build Indonesian validation scenarios from published parameters
Run predictions and compare to published production histories
Document results in validation report
Identify remaining gaps

4C. Updated Technical Report

Publish GeoForce v2.0 technical report covering:

TOUGH3-trained data (vs analytical in v1.1)
Two-phase results (vs single-phase)
3D results (vs 2D)
Brady benchmark comparison
Indonesian field qualitative validation
Honest limitations list

Phase 5: Platform Integration (Weeks 10-12)

Goal: Replace v1.1 on the platform with v2.0.

Tasks:

Update serve.py for new model architecture and I/O format
Update platform GeoForcePage.tsx for new output variables (add saturation, enthalpy views)
Add 3D visualization (three.js or deck.gl for 3D grid display)
Update HuggingFace model card and dataset
Write LinkedIn post announcing v2.0
Update landing page copy

Phase 6: Publish and Position (Weeks 12+)

Tasks:

Submit to Stanford Geothermal Workshop (annual, January deadline typically)
Submit to the Geothermal Rising Conference (GRC; the organization was formerly the Geothermal Resources Council)
Publish v2.0 model + Indonesian dataset on HuggingFace
Write technical blog post comparing v1.1 to v2.0 (honest, show the journey)
Reach out to ITB Geothermal Engineering faculty for review
Prepare data-sharing proposal for Pertamina Geothermal Energy

Available Public Data Sources

Ready to Use Now

Dataset	Source	URL
NREL Brady OSR	NREL/USGS	https://github.com/NREL/geothermal_osr
Brady Hot Springs ML Results	GDR	https://gdr.openei.org/submissions/1346
Utah FORGE	DOE	https://gdr.openei.org/forge
GeoThermalCloud Data	LANL	https://github.com/SmartTensors/GeoThermalCloud.jl
GDR Repository	DOE	https://gdr.openei.org/
Nevada GEOTHERM	NBMG	https://nbmg.unr.edu/geothermal/Data.html

Simulators

Simulator	License	URL
TOUGH3	Free for research	https://tough.lbl.gov/software/tough3/
Waiwera	Open-source	https://waiwera.github.io/
PyTOUGH	Open-source	https://github.com/acroucher/PyTOUGH
FEHM	Open-source	https://fehm.lanl.gov/
GEOPHIRES-X	Open-source	https://github.com/NREL/GEOPHIRES-X

Key Papers

Paper	Authors	Year	Relevance
Modeling Subsurface Performance with ML	Beckers et al. (NREL/USGS)	2022	Brady benchmark, our primary comparison target
Prediction Modeling for Geothermal Reservoirs Using DL	Gudmundsdottir & Horne (Stanford)	2020	LSTM vs feedforward for well-level prediction
GNN for Multi-Physics Geothermal with DFN	Gudala & Yan	2025	GNN architecture for fractured reservoirs (v3.0 target)
LSTM-CNN Surrogate for Well Placement	Various	2025	99.8% accuracy on homogeneous, 94% on heterogeneous
Physics-Guided DL for Geothermal Energy	Various	2023	Physics loss design for geothermal
ML/DL in Geothermal Review	Al-Fakih et al.	2024	Comprehensive review of all approaches

Compute Requirements Estimate

Task	Compute	Time Estimate
TOUGH3 simulation (1 scenario)	VPS CPU	~30-60 min
1,000 scenarios campaign	VPS CPU (2 cores)	~20-40 days
1,000 scenarios campaign	Cloud (16 cores)	~3-5 days
U-Net training (1,000 scenarios)	VPS CPU	~4-8 hours
U-Net training (5,000 scenarios)	Cloud GPU (A10)	~2-4 hours
Validation + reporting	VPS CPU	~1 day

Budget note: The simulation campaign is the bottleneck. Running 1,000 TOUGH3 simulations on VPS will take weeks. Consider:

Cloud burst (AWS/GCP spot instances, ~$50-100 for the campaign)
Reducing to 500 scenarios initially
Using Waiwera with MPI parallelism if VPS has multiple cores
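
The campaign estimates can be sanity-checked with simple arithmetic. The sketch assumes perfect scaling, meaning independent runs with one run per core and no queueing, I/O, or failed-run overhead, so real wall-clock time will be longer.

```python
def campaign_days(n_scenarios: int, minutes_per_run: float, parallel_runs: int) -> float:
    """Ideal wall-clock days for n independent runs spread over parallel_runs cores."""
    total_minutes = n_scenarios * minutes_per_run / parallel_runs
    return total_minutes / 60.0 / 24.0


if __name__ == "__main__":
    for label, cores in (("one run at a time", 1), ("VPS, 2 cores", 2), ("cloud, 16 cores", 16)):
        fast = campaign_days(1000, 30, cores)
        slow = campaign_days(1000, 60, cores)
        print(f"{label}: {fast:.1f} to {slow:.1f} days")
```

Under these assumptions, 1,000 runs at 30-60 minutes each take about 21 to 42 days when executed one at a time, which matches the VPS row only if the two cores are not both kept busy. The ideal 16-core figure is about 1.3 to 2.6 days, so the ~3-5 day cloud estimate leaves headroom for overhead.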

Risk Register

Risk	Impact	Mitigation
TOUGH3 license takes too long	Delays Phase 2 by weeks	Start with Waiwera (no license needed)
VPS too slow for 1,000 simulations	Campaign takes >1 month	Use cloud spot instances for batch
3D U-Net too large for CPU training	Training takes days	Start with 32x32x8 grid, upgrade later
Indonesian field data not detailed enough	Can't validate against real fields	Use Brady benchmark as primary validation + qualitative Indonesian comparison
Two-phase physics makes the problem much harder	Model accuracy drops below acceptable	Start with liquid-dominated fields (Salak, Ulubelu) before attempting vapor-dominated (Kamojang)

Success Criteria for v2.0

Metric	Target	Why
Temperature RMSE	<5 C	Comparable in spirit to the USGS Brady benchmark (which reports <3.68% error, not RMSE)
Pressure RMSE	<0.5 MPa	Below typical wellbore measurement uncertainty
Steam saturation RMSE	<0.1	Useful for production forecasting
R-squared (all variables)	>0.95	Industry-acceptable prediction quality
Inference time	<1 second	Real-time exploration workflow
Physics violation rate	<1%	Mass and energy conservation
Can predict two-phase	Yes	Required for Indonesian fields
3D	Yes	Required for gravity effects
Validated against Brady	Yes	Published benchmark comparison
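
The numeric criteria above reduce to two standard statistics. A minimal sketch, with hypothetical names and thresholds copied from the table:

```python
import math


def rmse(predicted, reference) -> float:
    """Root-mean-square error between two equal-length sequences."""
    n = len(reference)
    return math.sqrt(sum((p - r) ** 2 for p, r in zip(predicted, reference)) / n)


def r_squared(predicted, reference) -> float:
    """Coefficient of determination; undefined when the reference is constant."""
    mean_ref = sum(reference) / len(reference)
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)
    if ss_tot == 0:
        raise ValueError("reference has zero variance")
    ss_res = sum((r - p) ** 2 for p, r in zip(predicted, reference))
    return 1.0 - ss_res / ss_tot


# Thresholds from the success-criteria table.
RMSE_LIMITS = {"temperature_c": 5.0, "pressure_mpa": 0.5, "steam_saturation": 0.1}
R2_MINIMUM = 0.95


def meets_criteria(variable: str, predicted, reference) -> bool:
    return (rmse(predicted, reference) < RMSE_LIMITS[variable]
            and r_squared(predicted, reference) > R2_MINIMUM)


if __name__ == "__main__":
    reference = [200.0, 220.0, 240.0, 260.0]
    predicted = [202.0, 219.0, 243.0, 258.0]
    print(rmse(predicted, reference), r_squared(predicted, reference))
    print(meets_criteria("temperature_c", predicted, reference))
```

Computing both statistics per variable, rather than over all outputs pooled together, avoids a well-predicted temperature field masking a poor saturation field.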