Buckets:

ktongue
/

DEM_MCM

Files

xet

ktongue/DEM_MCM / markov_gpu_article.html

ktongue

about 2 months ago

download

raw

16.8 kB

	<!DOCTYPE html>
	<html>
	<head>
	<meta charset="UTF-8">
	<title>Modélisation Markovienne GPU - Formule Article</title>
	<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
	<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
	<style>
	body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif; max-width: 950px; margin: 0 auto; padding: 20px; line-height: 1.7; }
	h1 { border-bottom: 3px solid #2980b9; padding-bottom: 10px; color: #2c3e50; }
	h2 { color: #34495e; border-bottom: 1px solid #bdc3c7; padding-bottom: 5px; margin-top: 35px; }
	h3 { color: #8e44ad; margin-top: 25px; }
	code { background: #ecf0f1; padding: 2px 6px; border-radius: 3px; font-family: 'Consolas', monospace; }
	pre { background: #2c3e50; color: #ecf0f1; padding: 20px; border-radius: 8px; overflow-x: auto; font-size: 0.9em; }
	pre code { background: none; padding: 0; color: inherit; }
	.formula { background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); color: white; padding: 18px; border-radius: 10px; margin: 20px 0; text-align: center; font-size: 1.1em; }
	.formula-inline { background: #f8f9fa; padding: 2px 8px; border-radius: 4px; }
	.note { background: #fff3cd; padding: 15px; border-radius: 8px; border-left: 5px solid #f39c12; margin: 20px 0; }
	.warning { background: #fee; padding: 15px; border-radius: 8px; border-left: 5px solid #e74c3c; margin: 20px 0; }
	.success { background: #d4edda; padding: 15px; border-radius: 8px; border-left: 5px solid #28a745; margin: 20px 0; }
	table { border-collapse: collapse; width: 100%; margin: 20px 0; box-shadow: 0 2px 15px rgba(0,0,0,0.1); }
	th, td { border: 1px solid #ddd; padding: 12px 15px; text-align: left; }
	th { background: #3498db; color: white; }
	tr:nth-child(even) { background: #f9f9f9; }
	.param { background: #e8f4f8; padding: 3px 10px; border-radius: 4px; font-weight: bold; color: #2980b9; }
	.gpu { background: #fdc; padding: 2px 8px; border-radius: 4px; font-weight: bold; color: #e74c3c; }
	.math-block { background: #fafafa; border: 1px solid #eee; padding: 15px; border-radius: 5px; margin: 15px 0; }
	.optimization { background: #e8f6f3; border: 2px solid #1abc9c; padding: 15px; border-radius: 8px; margin: 20px 0; }
	.comparison { display: flex; gap: 20px; }
	.comparison > div { flex: 1; padding: 15px; border-radius: 8px; }
	.comparison .old { background: #ffebee; border: 1px solid #ef9a9a; }
	.comparison .new { background: #e8f5e9; border: 1px solid #a5d6a7; }
	hr { border: none; border-top: 2px solid #eee; margin: 30px 0; }
	.toc { background: #f8f9fa; padding: 20px; border-radius: 8px; margin: 20px 0; }
	.toc ul { list-style-type: none; padding-left: 20px; }
	.toc li { margin: 8px 0; }
	.toc a { color: #2980b9; text-decoration: none; }
	.toc a:hover { text-decoration: underline; }
	</style>
	</head>
	<body>

	<h1>Modélisation Markovienne GPU - Formule de l'Article</h1>

	<p>Cette page documente l'implémentation <strong>GPU optimisée</strong> et <strong>vectorisée</strong> de la formule de matrice de transition de Markov présentée dans l'article.</p>

	<div class="toc">
	<strong>Table des matières</strong>
	<ul>
	<li><a href="#formule">1. Formule de l'Article</a></li>
	<li><a href="#differences">2. Différence avec l'Approche Classique</a></li>
	<li><a href="#optimisations">3. Optimisations GPU</a></li>
	<li><a href="#code">4. Code Complet</a></li>
	<li><a href="#explications">5. Explications Détaillées</a></li>
	<li><a href="#parametres">6. Guide des Paramètres</a></li>
	<li><a href="#formules">7. Résumé des Formules</a></li>
	</ul>
	</div>

	<hr>

	<h2 id="formule">1. Formule de l'Article</h2>

	<p>La matrice de transition de Markov est calculée selon:</p>

	<div class="formula">
	\[
	P_{ij} = \frac{1}{N_{LT}} \sum_{n=1}^{N_{LT}} \frac{T_{ij}(n)}{\phi(i, t_{n-1})}
	\]
	</div>

	<h3>Variables</h3>

	<table>
	<tr><th>Variable</th><th>Description</th></tr>
	<tr><td class="param">\(N_{LT}\)</td><td>Nombre de pas de temps pour l'apprentissage (Learning Time)</td></tr>
	<tr><td>\(T_{ij}(n)\)</td><td>Nombre de transitions observées de \(i \to j\) au pas de temps \(n\)</td></tr>
	<tr><td>\(\phi(i, t_{n-1})\)</td><td>Nombre de particules dans l'état \(i\) au temps \(t_{n-1}\)</td></tr>
	<tr><td>\(P_{ij}\)</td><td>Probabilité de transition de l'état \(i\) vers l'état \(j\)</td></tr>
	</table>

	<h3>Décomposition</h3>

	<p>À chaque pas de temps \(n\), on calcule d'abord une <strong>matrice instantanée</strong>:</p>

	<div class="formula">
	\[
	P^{(n)}_{ij} = \frac{T_{ij}(n)}{\phi(i, t_{n-1})}
	\]
	</div>

	<p>Cette matrice est <strong>stochastique par ligne</strong>:</p>

	<div class="formula">
	\[
	\sum_{j} P^{(n)}_{ij} = 1 \quad \text{pour tout } i
	\]
	</div>

	<p>La matrice finale est la <strong>moyenne arithmétique</strong> de ces matrices:</p>

	<div class="formula">
	\[
	P_{ij} = \frac{1}{N_{LT}} \sum_{n=1}^{N_{LT}} P^{(n)}_{ij}
	\]
	</div>

	<hr>

	<h2 id="differences">2. Différence avec l'Approche Classique</h2>

	<div class="comparison">
	<div class="old">
	<h3>Approche Classique (Somme globale)</h3>
	\[
	P_{ij} = \frac{\sum_n T_{ij}(n)}{\sum_n \phi(i, t_{n-1})}
	\]
	<p>1. Compter TOUTES les transitions</p>
	<p>2. Diviser par le total général</p>
	</div>
	<div class="new">
	<h3>Approche Article (Moyenne de matrices)</h3>
	\[
	P_{ij} = \frac{1}{N_{LT}} \sum_{n} \frac{T_{ij}(n)}{\phi(i, t_{n-1})}
	\]
	<p>1. Normaliser PAR timestep</p>
	<p>2. Moyenne des matrices</p>
	</div>
	</div>

	<div class="note">
	<strong>Pourquoi cette différence est importante?</strong><br>
	L'approche article donne le même poids à chaque timestep, même si le nombre de particules par état varie. L'approche classique donne plus de poids aux timesteps avec plus de particules.
	</div>

	<hr>

	<h2 id="optimisations">3. Optimisations GPU</h2>

	<div class="optimization">
	<h3>⚡ Optimisations Implémentées</h3>
	</div>

	<h3>3.1 torch.bincount - Comptage Vectorisé O(n)</h3>

	<p>Au lieu d'une boucle Python:</p>

	<pre><code># ❌ LENT: boucle Python
	for state in range(n_states):
	count = (states == state).sum()

	# ✅ RAPIDE: bincount GPU
	phi = torch.bincount(states, minlength=n_states)</code></pre>

	<div class="math-block">
	<b>Complexité:</b><br>
	- Boucle Python: \(O(n \times n_{states})\)<br>
	- torch.bincount: \(O(n)\) - optimisé CUDA
	</div>

	<h3>3.2 torch.scatter_add_ - Accumulation Vectorisée</h3>

	<p>Pour construire la matrice de transitions:</p>

	<pre><code># Pour chaque particule: T[s_prev[i], s_curr[i]] += 1

	# ❌ LENT: boucle Python
	for i in range(n):
	T[s_prev[i], s_curr[i]] += 1

	# ✅ RAPIDE: scatter_add GPU
	indices = s_prev * n_states + s_curr # Index linéaire
	T.scatter_add_(0, indices, torch.ones(n))</code></pre>

	<h3>3.3 Division Vectorisée</h3>

	<pre><code># ❌ LENT: boucle Python
	for i in range(n_states):
	if phi[i] > 0:
	P[i] = T[i] / phi[i]

	# ✅ RAPIDE: vectorisé avec where
	phi_expanded = phi.unsqueeze(1).expand(n_states, n_states)
	P = torch.where(phi_expanded > 0, T / phi_expanded, torch.zeros_like(T))</code></pre>

	<h3>3.4 Schéma Global</h3>

	<pre><code>+------------------+ +------------------+
	\| Lecture CSV \| --> \| NumPy arrays \|
	+------------------+ +------------------+
	\| CPU -> GPU
	v
	+------------------+ +------------------+
	\| torch.bincount \| --> \| phi(i, t-1) \|
	+------------------+ +------------------+
	\|
	+---------------------+ \|
	\| scatter_add (T) \|----+
	+---------------------+ \|
	v
	+---------------------+ +------------------+
	\| Division vectorisée\|--->\| P_n (timestep) \|
	+---------------------+ +------------------+
	\|
	v
	+---------------------+ +------------------+
	\| Accumulation \|<---\| P = Σ P_n / N \|
	+---------------------+ +------------------+</code></pre>

	<hr>

	<h2 id="code">4. Code Complet</h2>

	<pre><code>"""
	Modélisation Markovienne GPU - Formule de l'Article
	Implémentation vectorisée: P_ij = (1/N_LT) * Σ_n [ T_ij(n) / phi(i, t_{n-1}) ]
	"""

	import polars as pl
	from huggingface_hub import HfFileSystem
	import torch
	import numpy as np
	from tqdm import tqdm

	# ===============================================================================
	# PARAMÈTRES
	# ===============================================================================
	N_LT = 100 # Nombre de pas de temps pour l'apprentissage
	nx, ny, nz = 5, 5, 5 # Discrétisation spatiale
	DEVICE = "cuda" if torch.cuda.is_available() else "cpu"
	# ===============================================================================

	fs = HfFileSystem()
	folder_path = "hf://buckets/ktongue/DEM_MCM/Output Paraview"
	files = sorted(fs.glob(f"{folder_path}/*.csv"))

	print(f"📁 {len(files)} fichiers \| 🖥️ {DEVICE} \| N_LT={N_LT}")

	# 1. Calcul des limites spatiales
	sample = files[::50]
	x_vals, y_vals, z_vals = [], [], []
	for f in sample:
	with fs.open(f, 'rb') as file:
	df = pl.read_csv(file)
	x_vals.extend(df['coordinates:0'].to_list())
	y_vals.extend(df['coordinates:1'].to_list())
	z_vals.extend(df['coordinates:2'].to_list())

	xmin, xmax = min(x_vals) - 0.001, max(x_vals) + 0.001
	ymin, ymax = min(y_vals) - 0.001, max(y_vals) + 0.001
	zmin, zmax = min(z_vals) - 0.001, max(z_vals) + 0.001

	n_states = nx * ny * nz
	dx = (xmax - xmin) / nx
	dy = (ymax - ymin) / ny
	dz = (zmax - zmin) / nz

	# ===============================================================================
	# FONCTIONS GPU VECTORISÉES
	# ===============================================================================

	def states_to_indices(x, y, z):
	"""Convertit (x,y,z) -> indices d'état sur GPU."""
	ix = ((np.array(x) - xmin) / dx).astype(np.int64)
	iy = ((np.array(y) - ymin) / dy).astype(np.int64)
	iz = ((np.array(z) - zmin) / dz).astype(np.int64)
	ix = np.clip(ix, 0, nx - 1)
	iy = np.clip(iy, 0, ny - 1)
	iz = np.clip(iz, 0, nz - 1)
	return ix + iy * nx + iz * nx * ny

	def compute_P_matrix_gpu(states_prev, states_curr, n_states):
	"""
	Calcule P_n = T_ij(n) / phi(i, t_{n-1}) - TOUT GPU.

	Optimisé avec:
	- torch.bincount: comptage O(n)
	- torch.scatter_add_: accumulation vectorisée
	- Division vectorisée avec torch.where
	"""
	# Asegurar GPU
	s_prev = states_prev.to(DEVICE)
	s_curr = states_curr.to(DEVICE)

	# phi(i, t_{n-1}) = nombre de particules par état source
	phi = torch.bincount(s_prev, minlength=n_states).float()

	# Construire T_ij(n) avec scatter_add
	n = min(len(s_prev), len(s_curr))
	indices = s_prev[:n] * n_states + s_curr[:n]
	counts = torch.ones(n, device=DEVICE, dtype=torch.float64)

	T = torch.zeros(n_states * n_states, device=DEVICE, dtype=torch.float64)
	T.scatter_add_(0, indices, counts)
	T = T.view(n_states, n_states)

	# Normalisation vectorisée
	phi_expanded = phi.unsqueeze(1).expand(n_states, n_states)
	P_n = torch.where(
	phi_expanded > 0,
	T / phi_expanded,
	torch.zeros_like(T)
	)

	return P_n

	# ===============================================================================
	# BOUCLE PRINCIPALE
	# ===============================================================================
	P_accumulator = torch.zeros((n_states, n_states), dtype=torch.float64, device=DEVICE)
	files_to_process = files[:N_LT + 1]

	for i in tqdm(range(1, len(files_to_process)), desc="Learning"):
	# Lecture
	with fs.open(files_to_process[i-1], 'rb') as f:
	df_prev = pl.read_csv(f)
	with fs.open(files_to_process[i], 'rb') as f:
	df_curr = pl.read_csv(f)

	# Conversion vers indices (CPU -> GPU)
	states_prev_np = states_to_indices(
	df_prev['coordinates:0'], df_prev['coordinates:1'], df_prev['coordinates:2']
	)
	states_curr_np = states_to_indices(
	df_curr['coordinates:0'], df_curr['coordinates:1'], df_curr['coordinates:2']
	)

	states_prev = torch.from_numpy(states_prev_np).to(DEVICE)
	states_curr = torch.from_numpy(states_curr_np).to(DEVICE)

	# Calcul P_n et accumulation
	P_n = compute_P_matrix_gpu(states_prev, states_curr, n_states)
	P_accumulator += P_n

	# Moyenne finale
	P = P_accumulator / N_LT

	# Sauvegarde
	np.save(f'/kaggle/working/transition_matrix_NLT_{N_LT}.npy', P.cpu().numpy())
	print("✅ Terminé!")</code></pre>

	<hr>

	<h2 id="explications">5. Explications Détaillées</h2>

	<h3>5.1 Fonction `states_to_indices()`</h3>

	<p>Convertit les coordonnées spatiales continues en indices d'état discrets.</p>

	<div class="formula">
	\[
	state = i_x + i_y \cdot n_x + i_z \cdot n_x \cdot n_y
	\]
	</div>

	<p>où:</p>
	<ul>
	<li>\(i_x = \text{clamp}\left(\left\lfloor \frac{x - x_{min}}{\Delta x} \right\rfloor, 0, n_x-1\right)\)</li>
	<li>Même chose pour \(i_y\), \(i_z\)</li>
	</ul>

	<h3>5.2 Fonction `compute_P_matrix_gpu()`</h3>

	<h4>Étape A: Comptage avec bincount</h4>

	<pre><code>phi = torch.bincount(s_prev, minlength=n_states).float()</code></pre>

	<div class="formula">
	\[
	\phi[i] = \sum_{p=1}^{N} \mathbb{1}_{s_p(t_{n-1}) = i}
	\]
	</div>

	<h4>Étape B: Construction de T avec scatter_add</h4>

	<pre><code>indices = s_prev * n_states + s_curr # Index linéaire
	T.scatter_add_(0, indices, counts)</code></pre>

	<div class="formula">
	\[
	T[i, j] = \sum_{p=1}^{N} \mathbb{1}_{s_p(t_{n-1}) = i} \cdot \mathbb{1}_{s_p(t_n) = j}
	\]
	</div>

	<h4>Étape C: Normalisation</h4>

	<pre><code>P_n = T / phi_expanded</code></pre>

	<div class="formula">
	\[
	P^{(n)}_{ij} = \frac{T_{ij}}{\phi[i]}
	\]
	</div>

	<h3>5.3 Pourquoi cette implémentation est rapide</h3>

	<table>
	<tr><th>Opération</th><th>Complexité</th><th>Type</th></tr>
	<tr><td>bincount</td><td>\(O(N)\)</td><td class="gpu">GPU parallélisé</td></tr>
	<tr><td>scatter_add</td><td>\(O(N)\)</td><td class="gpu">GPU parallélisé</td></tr>
	<tr><td>Division</td><td>\(O(n_{states}^2)\)</td><td class="gpu">GPU vectorisé</td></tr>
	<tr><td>Boucle Python</td><td>\(O(N \times n_{states})\)</td><td>CPU séquentiel</td></tr>
	</table>

	<hr>

	<h2 id="parametres">6. Guide des Paramètres</h2>

	<h3>6.1 Paramètres Principaux</h3>

	<table>
	<tr><th>Paramètre</th><th>Défaut</th><th>Description</th><th>Influence</th></tr>
	<tr><td class="param">N_LT</td><td>100</td><td>Nombre de timesteps pour l'apprentissage</td>
	<td>
	- <strong>Petit</strong>: Capture les dynamiques rapides, plus bruité<br>
	- <strong>Grand</strong>: Estimation robuste, lisse les fluctuations
	</td>
	</tr>
	<tr><td class="param">nx, ny, nz</td><td>5, 5, 5</td><td>Discrétisation spatiale</td>
	<td>
	- <strong>Petit</strong>: Moins d'états, moins de données par état<br>
	- <strong>Grand</strong>: Plus fin, plus de données nécessaires
	</td></tr>
	</table>

	<h3>6.2 Trade-offs</h3>

	<div class="warning">
	<strong>N_LT trop petit:</strong> Grande variance, matrice bruitée, peut ne pas capturer la vraie dynamique.
	</div>

	<div class="warning">
	<strong>N_LT trop grand:</strong> Moyenne sur des régimes différents si le système n'est pas stationnaire.
	</div>

	<div class="success">
	<strong>Recommandation:</strong> Commencer après le régime transitoire. Utiliser N_LT = 100-500 selon la longueur de la simulation.
	</div>

	<hr>

	<h2 id="formules">7. Résumé des Formules</h2>

	<table>
	<tr><th>Concept</th><th>Formule</th><th>Implémentation</th></tr>
	<tr><td>Matrice instantanée</td><td>\(P^{(n)}_{ij} = \frac{T_{ij}(n)}{\phi(i, t_{n-1})}\)</td><td><code>P_n = T / phi</code></td></tr>
	<tr><td>Matrice finale</td><td>\(P_{ij} = \frac{1}{N_{LT}} \sum_n P^{(n)}_{ij}\)</td><td><code>P = P_accum / N_LT</code></td></tr>
	<tr><td>Index d'état</td><td>\(state = i_x + i_y n_x + i_z n_x n_y\)</td><td><code>ix + iynx + iznx*ny</code></td></tr>
	<tr><td>Coordonnées vers indice</td><td>\(i_x = \left\lfloor \frac{x - x_{min}}{\Delta x} \right\rfloor\)</td><td><code>(x - xmin) / dx</code></td></tr>
	<tr><td>Comptage état</td><td>\(\phi[i] = \sum_p \mathbb{1}_{s_p = i}\)</td><td><code>torch.bincount(states)</code></td></tr>
	<tr><td>Transitions</td><td>\(T_{ij} = \sum_p \mathbb{1}_{s_{p}^{t-1}=i} \cdot \mathbb{1}_{s_{p}^{t}=j}\)</td><td><code>scatter_add(indices)</code></td></tr>
	</table>

	<hr>

	<h2>8. Performance</h2>

	<p>Cette implémentation GPU est typiquement <strong>10-50x plus rapide</strong> que l'équivalent CPU avec boucles Python.</p>

	<div class="success">
	<strong>Vitesse estimée:</strong><br>
	- 100 timesteps × 1000 particules × 125 états<br>
	- Temps: ~30-60 secondes sur GPU<br>
	- Temps équivalent CPU: ~10-30 minutes
	</div>

	</body>
	</html>

Xet Storage Details

Size:: 16.8 kB
Xet hash:: cd8689eb8573d3ec9a1184ee3ee61f034c1cb4f383e9b95d7b2abf306fece273

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.