Buckets:

ktongue
/

DEM_MCM

Files

xet

ktongue/DEM_MCM / markov_modeling.html

ktongue

about 2 months ago

download

raw

16.3 kB

	<!DOCTYPE html>
	<html>
	<head>
	<meta charset="UTF-8">
	<title>Modélisation Markovienne d'un Mélangeur DEM</title>
	<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
	<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
	<style>
	body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif; max-width: 900px; margin: 0 auto; padding: 20px; line-height: 1.6; }
	h1 { border-bottom: 2px solid #333; padding-bottom: 10px; }
	h2 { color: #2c3e50; border-bottom: 1px solid #ddd; padding-bottom: 5px; }
	h3 { color: #34495e; }
	code { background: #f4f4f4; padding: 2px 6px; border-radius: 3px; font-family: 'Consolas', monospace; }
	pre { background: #f4f4f4; padding: 15px; border-radius: 5px; overflow-x: auto; }
	pre code { background: none; padding: 0; }
	table { border-collapse: collapse; width: 100%; }
	th, td { border: 1px solid #ddd; padding: 8px; text-align: left; }
	th { background: #f2f2f2; }
	blockquote { background: #f9f9f9; border-left: 4px solid #3498db; margin: 0; padding: 10px 15px; }
	hr { border: none; border-top: 1px solid #ddd; margin: 30px 0; }
	.formula { background: #f8f9fa; padding: 10px; border-radius: 5px; margin: 10px 0; }
	</style>
	</head>
	<body>

	<h1>Modélisation Markovienne d'un Mélangeur DEM</h1>

	<h2>Description du projet</h2>

	<p>Ce document explique la construction d'une <strong>matrice de transition de Markov</strong> à partir de données de simulation DEM (Discrete Element Method).</p>

	<h3>Objectif</h3>

	<p>Transformer les trajectoires de particules en une <strong>chaîne de Markov discrète</strong> où:
	<ul>
	<li>La matrice \(P_{ij}\) représente la probabilité qu'une particule passe de l'état \(i\) à l'état \(j\)</li>
	<li>Les états sont définis par la position spatiale discrétisée (grille 5×5×5 = 125 états)</li>
	</ul>

	<hr>

	<h2>1. Fondements Mathématiques</h2>

	<h3>1.1 Chaînes de Markov</h3>

	<p>Une <strong>chaîne de Markov</strong> satisfait la <strong>propriété de Markov</strong>:</p>

	<div class="formula">
	\[
	P(X_{t+1} = j \| X_t = i, X_{t-1} = i_{t-1}, ...) = P(X_{t+1} = j \| X_t = i) = P_{ij}
	\]
	</div>

	<blockquote>
	L'état futur ne dépend <strong>que</strong> de l'état présent, pas de l'historique.
	</blockquote>

	<h3>1.2 Matrice de Transition</h3>

	<div class="formula">
	\[
	P = \begin{pmatrix}
	P_{11} & P_{12} & \cdots & P_{1n} \\
	P_{21} & P_{22} & \cdots & P_{2n} \\
	\vdots & \vdots & \ddots & \vdots \\
	P_{n1} & P_{n2} & \cdots & P_{nn}
	\end{pmatrix}
	\]
	</div>

	<p><strong>Propriétés:</strong></p>
	<ul>
	<li>\(P_{ij} \geq 0\) (probabilités non-négatives)</li>
	<li>\(\sum_{j} P_{ij} = 1\) (somme par ligne = 1) → <strong>stochasticité</strong></li>
	</ul>

	<h3>1.3 Formule d'Estimation de \(P_{ij}\)</h3>

	<p>À partir des données, on estime \(P_{ij}\) par comptage empirique:</p>

	<div class="formula">
	\[
	P_{ij} = \frac{\text{nombre de transitions } i \to j}{\text{nombre total de particules parties de } i}
	\]
	</div>

	<p>Formulation développée:</p>

	<div class="formula">
	\[
	P_{ij} = \frac{1}{N_i} \sum_{t=1}^{T} \sum_{p=1}^{N_p} \mathbb{1}_{s_p(t-1)=i} \cdot \mathbb{1}_{s_p(t)=j}
	\]
	</div>

	<p>où:</p>
	<ul>
	<li>\(N_i = \sum_{t} \sum_p \mathbb{1}_{s_p(t-1)=i}\) (nombre de particules dans l'état \(i\))</li>
	<li>\(s_p(t)\) = état spatial de la particule \(p\) au temps \(t\)</li>
	<li>\(\mathbb{1}_{condition}\) = fonction indicatrice (1 si vrai, 0 sinon)</li>
	</ul>

	<hr>

	<h2>2. Importations</h2>

	<pre><code>import polars as pl # Lecture efficace des CSV
	from huggingface_hub import HfFileSystem # Accès fichiers distants
	import torch # Calculs GPU (CUDA)
	from tqdm import tqdm # Barre de progression
	import numpy as np # Tableaux numériques
	import matplotlib.pyplot as plt</code></pre>

	<table>
	<tr><th>Bibliothèque</th><th>Rôle</th></tr>
	<tr><td><strong>polars</strong></td><td>Alternative rapide à pandas pour lire les CSV</td></tr>
	<tr><td><strong>huggingface_hub</strong></td><td>Accès stockage distant via HfFileSystem</td></tr>
	<tr><td><strong>torch</strong></td><td>Calculs tensoriels sur GPU (CUDA)</td></tr>
	<tr><td><strong>tqdm</strong></td><td>Barres de progression</td></tr>
	<tr><td><strong>numpy</strong></td><td>Manipulation de tableaux</td></tr>
	</table>

	<hr>

	<h2>3. Connexion au Dépôt HuggingFace</h2>

	<pre><code>fs = HfFileSystem()
	folder_path = "hf://buckets/ktongue/DEM_MCM/Output Paraview"
	csv_files = sorted(fs.glob(f"{folder_path}/*.csv"))</code></pre>

	<ul>
	<li><code>HfFileSystem()</code> : Interface style système de fichiers pour HuggingFace</li>
	<li><code>fs.glob()</code> : Retourne les chemins matching le pattern <code>*.csv</code></li>
	<li><code>sorted()</code> : Ordonne lexicographiquement (cohérence temporelle)</li>
	</ul>

	<hr>

	<h2>4. Étape 1: Calcul des Limites Spatiales</h2>

	<p><strong>Objectif:</strong> Déterminer les frontières min/max en x, y, z pour discrétiser l'espace.</p>

	<pre><code># Échantillonnage: 1 fichier sur 100 (limites quasi-constantes)
	sample_files = csv_files[::100]

	x_vals, y_vals, z_vals = [], [], []

	for f in sample_files:
	with fs.open(f, "rb") as file:
	df = pl.read_csv(file)
	x_vals.extend(df["coordinates:0"].to_list())
	y_vals.extend(df["coordinates:1"].to_list())
	z_vals.extend(df["coordinates:2"].to_list())

	# Calcul des min/max
	xmin, xmax = min(x_vals), max(x_vals)
	ymin, ymax = min(y_vals), max(y_vals)
	zmin, zmax = min(z_vals), max(z_vals)

	# Marge de sécurité (1mm)
	margin = 0.001
	xmin, xmax = xmin - margin, xmax + margin
	ymin, ymax = ymin - margin, ymax + margin
	zmin, zmax = zmin - margin, zmax + margin</code></pre>

	<p><strong>Échantillonnage <code>csv_files[::100]</code>:</strong></p>
	<ul>
	<li>6000 fichiers → ~60 fichiers suffisent</li>
	<li>Les limites spatiales ne changent pas significativement</li>
	</ul>

	<p><strong>Collecte des coordonnées:</strong></p>
	<ul>
	<li><code>df["coordinates:0"]</code> : Sélectionne la colonne x</li>
	<li><code>.to_list()</code> : Convertit en liste Python</li>
	<li><code>.extend()</code> : Ajoute les éléments (plus efficace que append)</li>
	</ul>

	<hr>

	<h2>5. Étape 2: Discrétisation Spatiale</h2>

	<p><strong>Objectif:</strong> Diviser l'espace 3D en grille régulière de cellules.</p>

	<pre><code>nx, ny, nz = 5, 5, 5 # Nombre de divisions par axe
	n_states = nx * ny * nz # Total = 125 états

	# Taille de chaque cellule
	dx = (xmax - xmin) / nx
	dy = (ymax - ymin) / ny
	dz = (zmax - zmin) / nz</code></pre>

	<h3>5.1 Formule de calcul de l'indice de cellule</h3>

	<p>Pour une coordonnée \(x\):</p>

	<div class="formula">
	\[
	i_x = \left\lfloor \frac{x - x_{min}}{\Delta x} \right\rfloor
	\]
	</div>

	<p>Même formule pour \(i_y\), \(i_z\).</p>

	<h3>5.2 Fonction <code>compute_states_gpu()</code></h3>

	<pre><code>def compute_states_gpu(x, y, z):
	"""
	Convertit les coordonnées continues (x,y,z) en indices d'états discrets.

	Formule: state = ix + iy * nx + iz * nx * ny
	"""
	x_tensor = torch.tensor(x, device="cuda")
	y_tensor = torch.tensor(y, device="cuda")
	z_tensor = torch.tensor(z, device="cuda")

	# Calcul des indices avec floor (division puis long())
	ix = ((x_tensor - xmin) / dx).long()
	iy = ((y_tensor - ymin) / dy).long()
	iz = ((z_tensor - zmin) / dz).long()

	# Clamp: restreint à [0, max] pour éviter out of bounds
	ix = ix.clamp(0, nx - 1)
	iy = iy.clamp(0, ny - 1)
	iz = iz.clamp(0, nz - 1)

	# Index linéaire (row-major order)
	return ix + iy * nx + iz * nx * ny</code></pre>

	<p><strong>Détail des opérations:</strong></p>
	<ol>
	<li><code>x_tensor - xmin</code> : Translation pour centrer à 0</li>
	<li><code>/ dx</code> : Normalisation par la taille de cellule</li>
	<li><code>.long()</code> : Conversion float → int (équivalent floor)</li>
	<li><code>.clamp(0, nx-1)</code> : Restriction aux limites</li>
	</ol>

	<div class="formula">
	\[
	\text{clamp}(ix, 0, n_x-1) =
	\begin{cases}
	0 & \text{si } ix < 0 \\
	n_x-1 & \text{si } ix \geq n_x \\
	ix & \text{sinon}
	\end{cases}
	\]
	</div>

	<p><strong>Index linéaire:</strong></p>

	<div class="formula">
	\[
	state = i_x + i_y \cdot n_x + i_z \cdot n_x \cdot n_y
	\]
	</div>

	<p><strong>Exemple:</strong> Pour nx=5, ny=5, nz=5, cellule (2, 1, 0):</p>

	<div class="formula">
	\[
	state = 2 + 1 \cdot 5 + 0 \cdot 25 = 7
	\]
	</div>

	<hr>

	<h2>6. Étape 3: Initialisation des Tenseurs GPU</h2>

	<pre><code>device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

	# transition_counts[i,j] = nombre de transitions i → j
	transition_counts = torch.zeros((n_states, n_states), dtype=torch.int64, device=device)

	# state_counts[i] = nombre de particules dans l'état i
	state_counts = torch.zeros(n_states, dtype=torch.int64, device=device)</code></pre>

	<p><strong>Explications:</strong></p>
	<ul>
	<li><code>torch.zeros(shape)</code> : Crée un tensor de zéros</li>
	<li><code>dtype=torch.int64</code> : Entiers 64 bits (évite overflow)</li>
	<li><code>device=device</code> : Alloue sur GPU ou CPU</li>
	</ul>

	<p><strong>Motivation int64:</strong> 6000 timesteps × 1030 particules ≈ 6.2M transitions max.</p>

	<hr>

	<h2>7. Étape 4: Calcul des Transitions</h2>

	<pre><code>n_files = len(csv_files)
	df_prev = None
	states_prev = None

	for i in tqdm(range(1, n_files), desc="Processing"):
	# Lecture du fichier courant
	with fs.open(csv_files[i], "rb") as f:
	df_curr = pl.read_csv(f)

	ids_curr = df_curr["Particle_ID"].to_numpy()

	# Calcul des états sur GPU
	states_curr = compute_states_gpu(
	df_curr["coordinates:0"].to_numpy(),
	df_curr["coordinates:1"].to_numpy(),
	df_curr["coordinates:2"].to_numpy(),
	)

	if df_prev is not None:
	ids_prev = df_prev["Particle_ID"].to_numpy()

	# Identification des particules communes
	common_ids = np.intersect1d(ids_prev, ids_curr)

	if len(common_ids) > 0:
	# Masques pour particules communes
	prev_idx = np.isin(ids_prev, common_ids)
	curr_idx = np.isin(ids_curr, common_ids)

	# États avant/après
	s_prev = states_prev[prev_idx]
	s_curr = states_curr[curr_idx]

	# Comptage vectorisé des transitions
	transitions = torch.stack([s_prev, s_curr], dim=1)
	unique_trans, counts = torch.unique(transitions, dim=0, return_counts=True)

	# Accumulation des comptages
	for j in range(len(unique_trans)):
	transition_counts[unique_trans[j, 0], unique_trans[j, 1]] += counts[j]

	# Comptage par état d'origine
	unique_prev, _ = torch.unique(s_prev, return_counts=True)
	for u in unique_prev:
	mask = s_prev == u
	state_counts[u] += mask.sum().item()

	df_prev = df_curr
	states_prev = states_curr</code></pre>

	<h3>7.1 Identification des particules communes</h3>

	<pre><code>common_ids = np.intersect1d(ids_prev, ids_curr)</code></pre>

	<div class="formula">
	\[
	\text{common\_ids} = \{id \mid id \in \text{ids.prev} \land id \in \text{ids.curr}\}
	\]
	</div>

	<h3>7.2 Masques booléens</h3>

	<pre><code>prev_idx = np.isin(ids_prev, common_ids)</code></pre>

	<p>Retourne True pour chaque élément de <code>ids_prev</code> qui est dans <code>common_ids</code>.</p>

	<h3>7.3 Comptage vectorisé</h3>

	<pre><code>transitions = torch.stack([s_prev, s_curr], dim=1)
	unique_trans, counts = torch.unique(transitions, dim=0, return_counts=True)</code></pre>

	<p><strong>Mathématiquement:</strong></p>
	<ul>
	<li><code>torch.stack([a, b], dim=1)</code> : Crée matrice <code>[a_i, b_i]</code> par ligne</li>
	<li><code>torch.unique(..., dim=0)</code> : Trouve les lignes uniques</li>
	<li><code>return_counts=True</code> : Compte les occurrences</li>
	</ul>

	<p><strong>Exemple:</strong></p>
	<pre>
	s_prev = [0, 1, 0, 2, 1]
	s_curr = [1, 1, 2, 2, 0]

	transitions = [[0,1], [1,1], [0,2], [2,2], [1,0]]

	unique_trans = [[0,1], [0,2], [1,0], [1,1], [2,2]]
	counts = [1, 1, 1, 1, 1]
	</pre>

	<h3>7.4 Accumulation</h3>

	<pre><code>transition_counts[unique_trans[j, 0], unique_trans[j, 1]] += counts[j]</code></pre>

	<div class="formula">
	\[
	C[i, j] \mathrel{+}= \text{count}
	\]
	</div>

	<p>où \(C[i,j]\) = transition_counts[i,j]</p>

	<hr>

	<h2>8. Étape 5: Normalisation</h2>

	<pre><code>P = torch.zeros((n_states, n_states), dtype=torch.float64, device=device)

	for i in range(n_states):
	if state_counts[i] > 0:
	P[i] = transition_counts[i].float() / state_counts[i].float()</code></pre>

	<h3>Formule de normalisation</h3>

	<div class="formula">
	\[
	P_{ij} = \frac{C[i,j]}{S[i]}
	\]
	</div>

	<p>où \(C[i,j]\) = transition_counts[i,j] et \(S[i]\) = state_counts[i]</p>

	<p><strong>Preuve de stochasticité:</strong></p>

	<div class="formula">
	\[
	\sum_{j=1}^{n} P_{ij} = \frac{\sum_j C[i,j]}{S[i]} = \frac{S[i]}{S[i]} = 1
	\]
	</div>

	<h3>Vérification</h3>

	<pre><code>row_sums = P.sum(dim=1)
	visited = state_counts > 0
	print(f"Somme par ligne: {row_sums[visited].min():.6f} / {row_sums[visited].max():.6f}")</code></pre>

	<hr>

	<h2>9. Étape 6: Sauvegarde</h2>

	<pre><code>P_cpu = P.cpu().numpy()
	transition_cpu = transition_counts.cpu().numpy()
	state_counts_cpu = state_counts.cpu().numpy()

	np.save("transition_matrix.npy", P_cpu)
	np.save("transition_counts.npy", transition_cpu)
	np.save("state_counts.npy", state_counts_cpu)</code></pre>

	<ul>
	<li><code>.cpu()</code> : Copie le tensor du GPU vers la mémoire CPU</li>
	<li><code>.numpy()</code> : Convertit en array NumPy</li>
	<li>Format <code>.npy</code> : Binaire compressé, chargement rapide via <code>np.load()</code></li>
	</ul>

	<hr>

	<h2>10. Visualisation et Analyse</h2>

	<h3>10.1 Heatmap de la matrice</h3>

	<pre><code>plt.imshow(P_cpu, cmap='Blues')
	plt.colorbar(label='P_ij')
	plt.xlabel('État destination (j)')
	plt.ylabel('État origine (i)')</code></pre>

	<p><strong>Interprétation:</strong></p>
	<ul>
	<li><strong>Diagonale forte</strong> : Particules qui restent dans leur cellule (mélange lent)</li>
	<li><strong>Diagonale faible</strong> : Mélange rapide</li>
	<li><strong>Structure de blocs</strong> : Zones préférentielles</li>
	</ul>

	<h3>10.2 Distribution des probabilités</h3>

	<pre><code>P_nonzero = P_cpu[P_cpu > 0]
	plt.hist(P_nonzero.flatten(), bins=50)</code></pre>

	<hr>

	<h2>11. Propriétés Mathématiques</h2>

	<h3>11.1 Vérifications</h3>

	<table>
	<tr><th>Propriété</th><th>Formule</th><th>Vérification</th></tr>
	<tr><td>Non-négativité</td><td>\(P_{ij} \geq 0\)</td><td><code>np.all(P_cpu >= 0)</code></td></tr>
	<tr><td>Stochasticité</td><td>\(\sum_j P_{ij} = 1\)</td><td><code>np.allclose(row_sums, 1.0)</code></td></tr>
	<tr><td>Rayon spectral</td><td>\(\rho(P) = 1\)</td><td><code>np.linalg.eigvals(P_cpu)</code></td></tr>
	</table>

	<h3>11.2 Distribution Stationnaire</h3>

	<p>Un vecteur \(\pi\) tel que:</p>

	<div class="formula">
	\[
	\pi = \pi P
	\]
	</div>

	<p>où \(\pi_i\) = probabilité limite d'être dans l'état \(i\).</p>

	<pre><code>eigenvalues, eigenvectors = np.linalg.eig(P_cpu.T)
	stationary_idx = np.argmin(np.abs(eigenvalues - 1.0))
	pi_stationary = np.abs(eigenvectors[:, stationary_idx])
	pi_stationary = pi_stationary / pi_stationary.sum()</code></pre>

	<hr>

	<h2>12. Résumé des Formules Clés</h2>

	<table>
	<tr><th>Concept</th><th>Formule</th></tr>
	<tr><td>Indice cellule X</td><td>\(i_x = \left\lfloor \frac{x - x_{min}}{\Delta x} \right\rfloor\)</td></tr>
	<tr><td>Index linéaire</td><td>\(state = i_x + i_y \cdot n_x + i_z \cdot n_x \cdot n_y\)</td></tr>
	<tr><td>Transition</td><td>\(P_{ij} = \frac{C[i,j]}{\sum_k C[i,k]}\)</td></tr>
	<tr><td>Stochasticité</td><td>\(\sum_j P_{ij} = 1\)</td></tr>
	</table>

	<hr>

	<h2>13. Utilisations Futures</h2>

	<p>La matrice de transition peut servir à:</p>
	<ol>
	<li><strong>Simulation</strong>: Générer trajectoires avec \(X_{t+1} \sim \text{Categorical}(P[X_t,])\)</li>
	<li><strong>Analyse</strong>: Calculer temps de mélange, distributions limites</li>
	<li><strong>Optimisation</strong>: Améliorer la conception du mélangeur</li>
	</ol>

	</body>
	</html>

Xet Storage Details

Size:: 16.3 kB
Xet hash:: 172537d009f326f86654a0556958f4a9fb7109cf2243e4de95f20451aa64e78c

Xet efficiently stores files, intelligently splitting them into unique chunks and accelerating uploads and downloads. More info.