	<!DOCTYPE html>
	<html>
	<head>
	<meta charset="UTF-8">
	<title>Markov Modeling - Article Formula</title>
	<script src="https://polyfill.io/v3/polyfill.min.js?features=es6"></script>
	<script id="MathJax-script" async src="https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js"></script>
	<style>
	body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif; max-width: 900px; margin: 0 auto; padding: 20px; line-height: 1.6; }
	h1 { border-bottom: 3px solid #2c3e50; padding-bottom: 10px; color: #2c3e50; }
	h2 { color: #34495e; border-bottom: 1px solid #ddd; padding-bottom: 5px; margin-top: 30px; }
	h3 { color: #7f8c8d; }
	code { background: #f4f4f4; padding: 2px 6px; border-radius: 3px; font-family: 'Consolas', monospace; font-size: 0.95em; }
	pre { background: #2d2d2d; color: #f8f8f2; padding: 15px; border-radius: 5px; overflow-x: auto; }
	pre code { background: none; padding: 0; color: inherit; }
	.formula { background: #ecf0f1; padding: 15px; border-radius: 8px; margin: 15px 0; border-left: 4px solid #3498db; }
	.note { background: #fff3cd; padding: 15px; border-radius: 5px; border-left: 4px solid #ffc107; margin: 15px 0; }
	.warning { background: #f8d7da; padding: 15px; border-radius: 5px; border-left: 4px solid #dc3545; margin: 15px 0; }
	table { border-collapse: collapse; width: 100%; margin: 15px 0; }
	th, td { border: 1px solid #ddd; padding: 10px; text-align: left; }
	th { background: #34495e; color: white; }
	tr:nth-child(even) { background: #f9f9f9; }
	.param { background: #d5f4e6; padding: 3px 8px; border-radius: 3px; font-weight: bold; }
	</style>
	</head>
	<body>

	<h1>Markov Modeling - The Article's Formula</h1>

	<p>This version implements the article's formula for the Markov transition matrix <strong>exactly</strong>.</p>

	<hr>

	<h2>1. Mathematical Foundations</h2>

	<h3>1.1 Transition Matrix Formula (Equation 8)</h3>

	<p>According to the article, the transition matrix is computed as:</p>

	<div class="formula">
	\[
	P_{ij} = \frac{1}{N_{LT}} \sum_{n=1}^{N_{LT}} \frac{T_{ij}(n)}{\phi(i, t_{n-1})}
	\]
	</div>

	<p>where:</p>
	<ul>
	<li>\(N_{LT}\) = number of time steps used for learning (<strong>Learning Time</strong>)</li>
	<li>\(T_{ij}(n)\) = number of transitions observed from \(i \to j\) during time step \(n\)</li>
	<li>\(\phi(i, t_{n-1})\) = number of particles in state \(i\) at time \(t_{n-1}\)</li>
	</ul>

	<h3>1.2 Interpretation</h3>

	<p>This formula computes the <strong>arithmetic mean</strong> of instantaneous transition matrices, defined as:</p>

	<div class="formula">
	\[
	P^{(n)}_{ij} = \frac{T_{ij}(n)}{\phi(i, t_{n-1})}
	\]
	</div>

	<p>for each time step \(n\), which are then averaged:</p>

	<div class="formula">
	\[
	P_{ij} = \frac{1}{N_{LT}} \sum_{n=1}^{N_{LT}} P^{(n)}_{ij}
	\]
	</div>
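<p>As a small worked illustration of the averaging above, the sketch below uses made-up transition counts for two states and two time steps. The counts, the helper name <code>instantaneous_matrix</code>, and the assumption that every particle present at \(t_{n-1}\) is also observed at \(t_n\) (so that \(\phi\) equals the row total) are illustrative only and do not come from the article.</p>

```python
# Toy illustration of Equation 8 with 2 states and 2 time steps.
# T[n][i][j]: hypothetical number of transitions i -> j during step n.
T = [
    [[8, 2], [1, 9]],  # step 1
    [[6, 4], [5, 5]],  # step 2
]
n_steps = len(T)
n_states = 2


def instantaneous_matrix(counts):
    """Row-normalize one step: P^(n)_ij = T_ij(n) / phi(i, t_{n-1})."""
    matrix = []
    for row in counts:
        phi = sum(row)  # assumed: all particles in state i are tracked
        matrix.append([c / phi if phi else 0.0 for c in row])
    return matrix


per_step = [instantaneous_matrix(counts) for counts in T]

# Arithmetic mean of the instantaneous matrices.
P = [
    [sum(step[i][j] for step in per_step) / n_steps for j in range(n_states)]
    for i in range(n_states)
]
print(P)  # approximately [[0.7, 0.3], [0.3, 0.7]]
```

<p>Here both states are occupied at every step, so each row of the averaged matrix still sums to 1.</p>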

	<h3>1.3 Difference from the Previous Approach</h3>

	<div class="note">
	<strong>Previous approach:</strong> Count ALL transitions over the learning period, then divide by the total.<br><br>
	<strong>Article approach:</strong> Compute a normalized matrix at each time step, then average those matrices.
	</div>
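<p>The two approaches give different numbers whenever the occupancy of a state changes between time steps. The sketch below uses hypothetical counts for a single source state, and assumes that "divide by the total" in the previous approach means dividing by the pooled number of particles leaving that state.</p>

```python
# Hypothetical counts out of state 0 over two steps: (stay, leave).
steps = [(8, 2), (50, 50)]

# Previous approach (as assumed here): pool all counts, normalize once.
total_stay = sum(stay for stay, _ in steps)
total_all = sum(stay + leave for stay, leave in steps)
p_pooled = total_stay / total_all

# Article approach: normalize each step, then average the ratios.
p_averaged = sum(stay / (stay + leave) for stay, leave in steps) / len(steps)

print(f"pooled:   {p_pooled:.4f}")    # 58 / 110
print(f"averaged: {p_averaged:.4f}")  # (0.8 + 0.5) / 2
```

<p>Pooling weights each step by its particle count, so the crowded second step dominates. Averaging gives every time step the same weight regardless of how many particles were in the state.</p>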

	<hr>

	<h2>2. Complete Code</h2>

	<pre><code>import polars as pl
	from huggingface_hub import HfFileSystem
	import numpy as np
	from tqdm import tqdm

	fs = HfFileSystem()
	folder_path = "hf://buckets/ktongue/DEM_MCM/Output Telem"
	files = sorted(fs.glob(f"{folder_path}/*.csv"))

	# ============================================
	# USER PARAMETERS
	# ============================================
	<strong>N_LT = 100</strong> # Number of time steps used for learning
	nx, ny, nz = 5, 5, 5 # Spatial discretization
	# ============================================

	print(f"📁 {len(files)} files available")
	print(f"⚙️ Using N_LT = {N_LT} time steps for learning")

	# 1. Compute the spatial bounds (fast sampling: 1 file in 50)
	print("🔍 Computing spatial bounds...")
	sample_files = files[::50]
	x_vals, y_vals, z_vals = [], [], []
	for f in sample_files:
	    with fs.open(f, 'rb') as file:
	        df = pl.read_csv(file)
	    x_vals.extend(df['x'].to_list())
	    y_vals.extend(df['y'].to_list())
	    z_vals.extend(df['z'].to_list())

	# Small margin so that the extreme sampled points fall strictly inside the grid
	xmin, xmax = min(x_vals) - 0.001, max(x_vals) + 0.001
	ymin, ymax = min(y_vals) - 0.001, max(y_vals) + 0.001
	zmin, zmax = min(z_vals) - 0.001, max(z_vals) + 0.001

	n_states = nx * ny * nz
	dx = (xmax - xmin) / nx
	dy = (ymax - ymin) / ny
	dz = (zmax - zmin) / nz

	def get_state(x, y, z):
	    # Cell index along each axis, clamped to the grid
	    ix = min(max(int((x - xmin) / dx), 0), nx - 1)
	    iy = min(max(int((y - ymin) / dy), 0), ny - 1)
	    iz = min(max(int((z - zmin) / dz), 0), nz - 1)
	    return ix + iy * nx + iz * nx * ny

	# 2. Compute the matrix following the article's formula
	print(f"📊 Computing the matrix over {N_LT} time steps...")

	# Accumulator for the sum of the normalized matrices
	P_sum = np.zeros((n_states, n_states))

	# Take the first N_LT time steps (N_LT + 1 snapshots)
	files_to_process = files[:N_LT + 1]
	if len(files_to_process) != N_LT + 1:
	    raise ValueError(f"N_LT = {N_LT} requires {N_LT + 1} files, found {len(files_to_process)}")

	for i in tqdm(range(1, len(files_to_process)), desc="Learning"):
	    # Read two consecutive files
	    with fs.open(files_to_process[i-1], 'rb') as f:
	        df_prev = pl.read_csv(f)
	    with fs.open(files_to_process[i], 'rb') as f:
	        df_curr = pl.read_csv(f)

	    # Compute the state of each particle
	    df_prev = df_prev.with_columns([
	        pl.struct(['x', 'y', 'z']).map_elements(
	            lambda s: get_state(s['x'], s['y'], s['z']), return_dtype=pl.Int64
	        ).alias('state_prev')
	    ])
	    df_curr = df_curr.with_columns([
	        pl.struct(['x', 'y', 'z']).map_elements(
	            lambda s: get_state(s['x'], s['y'], s['z']), return_dtype=pl.Int64
	        ).alias('state_curr')
	    ])

	    # Inner join on the IDs to track particles between the two snapshots
	    merged = df_prev.select(['ID', 'state_prev']).join(
	        df_curr.select(['ID', 'state_curr']), on='ID'
	    )

	    # Counts for this time step n
	    phi_counts = merged.group_by('state_prev').len().rename({'len': 'total_source'})
	    transitions = merged.group_by(['state_prev', 'state_curr']).len()

	    # Compute P_n and add it to the sum
	    trans_with_total = transitions.join(phi_counts, on='state_prev')

	    for row in trans_with_total.iter_rows():
	        state_from, state_to, count, total = row
	        if total > 0:
	            P_sum[state_from, state_to] += count / total

	# 3. Final average
	P = P_sum / N_LT

	print(f"\n✅ Matrix computed with N_LT = {N_LT}")
	print(f" Shape: {P.shape}")
	print(f" Check (row sums): {P.sum(axis=1)[:5]}...")

	# Save
	np.save(f'/kaggle/working/transition_matrix_NLT_{N_LT}.npy', P)
	print(f"💾 Saved: transition_matrix_NLT_{N_LT}.npy")</code></pre>

	<hr>

	<h2>3. Detailed Explanation of the Code</h2>

	<h3>3.1 Parameters</h3>

	<table>
	<tr><th>Parameter</th><th>Value</th><th>Description</th></tr>
	<tr><td class="param">N_LT</td><td>100</td><td>Number of time steps used for learning</td></tr>
	<tr><td class="param">nx, ny, nz</td><td>5, 5, 5</td><td>Spatial discretization (125 states)</td></tr>
	</table>

	<h3>3.2 Step 1: Computing the Spatial Bounds</h3>

	<pre><code>sample_files = files[::50] # Sampling: 1 file in 50
	x_vals, y_vals, z_vals = [], [], []
	for f in sample_files:
	    with fs.open(f, 'rb') as file:
	        df = pl.read_csv(file)
	    x_vals.extend(df['x'].to_list())
	    y_vals.extend(df['y'].to_list())
	    z_vals.extend(df['z'].to_list())</code></pre>

	<p>Same principle as before, but sampling one file in 50 for better precision. Because only a subset of the files is scanned, a particle may later fall outside these bounds; <code>get_state()</code> clamps such positions into the edge cells.</p>

	<h3>3.3 Step 2: Discretization Function</h3>

	<pre><code>def get_state(x, y, z):
	    ix = min(max(int((x - xmin) / dx), 0), nx - 1)
	    iy = min(max(int((y - ymin) / dy), 0), ny - 1)
	    iz = min(max(int((z - zmin) / dz), 0), nz - 1)
	    return ix + iy * nx + iz * nx * ny</code></pre>

	<div class="formula">
	\[
	state = i_x + i_y \cdot n_x + i_z \cdot n_x \cdot n_y
	\]
	</div>

	<p>where:</p>
	<ul>
	<li>\(i_x = \text{clamp}\left(\left\lfloor \frac{x - x_{min}}{\Delta x} \right\rfloor, 0, n_x-1\right)\)</li>
	<li>The same formula applies to \(i_y\) and \(i_z\)</li>
	</ul>

	<h3>3.4 Step 3: Iterative Computation of the Matrix</h3>

	<p>This is the <strong>core</strong> of the implementation, which follows the article's formula.</p>

	<h4>3.4.1 Reading Two Consecutive Files</h4>

	<pre><code>with fs.open(files_to_process[i-1], 'rb') as f:
	    df_prev = pl.read_csv(f)
	with fs.open(files_to_process[i], 'rb') as f:
	    df_curr = pl.read_csv(f)</code></pre>

	<p>At each iteration \(n\), the files for \(t_{n-1}\) and \(t_n\) are read. Every intermediate file is therefore read twice over the whole loop: once as \(t_n\) and once as \(t_{n-1}\).</p>

	<h4>3.4.2 Computing the States</h4>

	<pre><code>df_prev = df_prev.with_columns([
	    pl.struct(['x', 'y', 'z']).map_elements(
	        lambda s: get_state(s['x'], s['y'], s['z']), return_dtype=pl.Int64
	    ).alias('state_prev')
	])</code></pre>

	<p>This Polars operation applies <code>get_state()</code> to each row of the DataFrame to create the <code>state_prev</code> column. The same is done on <code>df_curr</code> to create <code>state_curr</code>.</p>

	<h4>3.4.3 Join to Track Particles</h4>

	<pre><code>merged = df_prev.select(['ID', 'state_prev']).join(
	    df_curr.select(['ID', 'state_curr']), on='ID'
	)</code></pre>

	<div class="formula">
	\[
	\text{merged} = \{ (\text{ID}, \text{state}_{n-1}, \text{state}_n) \mid \text{ID} \in \text{shared IDs} \}
	\]
	</div>

	<h4>3.4.4 Counting Transitions</h4>

	<pre><code>phi_counts = merged.group_by('state_prev').len().rename({'len': 'total_source'})
	transitions = merged.group_by(['state_prev', 'state_curr']).len()</code></pre>

	<ul>
	<li><code>phi_counts</code>: for each source state \(i\), counts \(\phi(i, t_{n-1})\), restricted to the particles that survive the join</li>
	<li><code>transitions</code>: for each pair \((i, j)\), counts \(T_{ij}(n)\)</li>
	</ul>
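<p>The join and the two counts can be mimicked in plain Python to see what each quantity contains. The snapshots below are hypothetical (particle ID mapped to state index); the point of the sketch is that the occupancy counted after an inner join can be smaller than the full occupancy at \(t_{n-1}\) when particles disappear between snapshots.</p>

```python
from collections import Counter

# Hypothetical snapshots: particle ID -> state index.
prev_states = {1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
curr_states = {1: 0, 2: 1, 3: 1, 4: 0, 6: 1}  # particle 5 left, 6 is new

# Inner join on ID: keep only particles present at both times.
common_ids = [pid for pid in prev_states if pid in curr_states]
pairs = [(prev_states[pid], curr_states[pid]) for pid in common_ids]

transitions = Counter(pairs)                    # T_ij(n)
phi_counts = Counter(src for src, _ in pairs)   # occupancy after the join
phi_all = Counter(prev_states.values())         # full occupancy at t_{n-1}

print(dict(transitions))
print(dict(phi_counts), dict(phi_all))
```

<p>With these numbers, state 1 holds three particles at \(t_{n-1}\) but only two of them are tracked, so dividing by the joined count keeps each occupied row summing to 1, whereas dividing by the full occupancy would not.</p>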

	<h4>3.4.5 Accumulation</h4>

	<pre><code>trans_with_total = transitions.join(phi_counts, on='state_prev')

	for row in trans_with_total.iter_rows():
	    state_from, state_to, count, total = row
	    if total > 0:
	        P_sum[state_from, state_to] += count / total</code></pre>

	<div class="formula">
	\[
	P_{ij}^{(n)} = \frac{T_{ij}(n)}{\phi(i, t_{n-1})}
	\]
	</div>

	<p>For each time step \(n\), the instantaneous matrix is computed and added to <code>P_sum</code>.</p>

	<h3>3.5 Step 4: Final Average</h3>

	<pre><code>P = P_sum / N_LT</code></pre>

	<div class="formula">
	\[
	P_{ij} = \frac{1}{N_{LT}} \sum_{n=1}^{N_{LT}} P_{ij}^{(n)} = \frac{1}{N_{LT}} \sum_{n=1}^{N_{LT}} \frac{T_{ij}(n)}{\phi(i, t_{n-1})}
	\]
	</div>

	<hr>

	<h2>4. Choosing \(N_{LT}\)</h2>

	<h3>4.1 Influence of the Parameter</h3>

	<table>
	<tr><th>\(N_{LT}\)</th><th>Behavior</th><th>Advantage</th><th>Drawback</th></tr>
	<tr><td>Small (10-50)</td><td>Short observation window, noisy</td><td>Captures local dynamics</td><td>High variance</td></tr>
	<tr><td>Large (100-500)</td><td>Smoothed, stable</td><td>Robust estimate</td><td>Averages over different regimes</td></tr>
	</table>

	<h3>4.2 Steady State</h3>

	<div class="note">
	<strong>Article's recommendation:</strong> Start learning only once the system has reached its <strong>steady state</strong>.
	</div>

	<pre><code># To start after the transient regime:
	start_index = 500 # For example, after 500 time steps
	files_to_process = files[start_index : start_index + N_LT + 1]</code></pre>
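<p>Python slices silently truncate when the requested range runs past the end of the list, so a late <code>start_index</code> can yield fewer than \(N_{LT}\) steps without any error. A minimal sketch of a guard is shown below; the helper name <code>select_learning_window</code> and the file names are hypothetical.</p>

```python
def select_learning_window(files, start_index, n_lt):
    """Return the n_lt + 1 consecutive snapshots starting at start_index.

    Raises ValueError instead of silently returning a shorter window.
    """
    stop = start_index + n_lt + 1
    if start_index < 0 or stop > len(files):
        raise ValueError(
            f"window [{start_index}, {stop}) does not fit in {len(files)} files"
        )
    return files[start_index:stop]


# Hypothetical list of 700 snapshot files.
files = [f"step_{k:04d}.csv" for k in range(700)]
window = select_learning_window(files, start_index=500, n_lt=100)
print(len(window), window[0], window[-1])
```

<p>The window holds \(N_{LT} + 1\) snapshots because each of the \(N_{LT}\) transitions needs both its start and its end file.</p>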

	<hr>

	<h2>5. Summary of Formulas</h2>

	<table>
	<tr><th>Concept</th><th>Formula</th></tr>
	<tr><td>Instantaneous matrix</td><td>\(P^{(n)}_{ij} = \frac{T_{ij}(n)}{\phi(i, t_{n-1})}\)</td></tr>
	<tr><td>Final matrix</td><td>\(P_{ij} = \frac{1}{N_{LT}} \sum_{n=1}^{N_{LT}} P^{(n)}_{ij}\)</td></tr>
	<tr><td>State index</td><td>\(state = i_x + i_y \cdot n_x + i_z \cdot n_x \cdot n_y\)</td></tr>
	<tr><td>Coordinates</td><td>\(i_x = \text{clamp}\left(\left\lfloor \frac{x - x_{min}}{\Delta x} \right\rfloor, 0, n_x-1\right)\)</td></tr>
	</table>

	<hr>

	<h2>6. Implementation Notes</h2>

	<ul>
	<li><strong>Choice of Polars</strong>: used for efficient joins and group_by operations on DataFrames</li>
	<li><strong>No GPU</strong>: this CPU version is slower but easier to follow</li>
	<li><strong>Inner join</strong>: only particles present at both times are taken into account</li>
	<li><strong>Per-step normalization</strong>: in each matrix \(P^{(n)}\), the row of every occupied source state sums to 1; the row of a state that is empty at \(t_{n-1}\) is zero</li>
	</ul>
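<p>One consequence of the per-step normalization is worth checking on the saved matrix: if a state is empty during some of the learning steps, its row in the averaged matrix sums to less than 1. The sketch below shows this on hypothetical counts and then applies one possible remedy, dividing each row by the number of steps in which the state was occupied. That renormalization is an assumption made here for illustration, not something taken from the article.</p>

```python
# Two states, two steps; state 1 is empty during step 2 (hypothetical counts).
T = [
    [[3, 1], [2, 2]],
    [[4, 4], [0, 0]],
]
n_steps = len(T)
n_states = 2

P_sum = [[0.0] * n_states for _ in range(n_states)]
visits = [0] * n_states  # number of steps in which each state was occupied

for counts in T:
    for i, row in enumerate(counts):
        phi = sum(row)
        if phi == 0:
            continue  # empty state: contributes a zero row for this step
        visits[i] += 1
        for j, c in enumerate(row):
            P_sum[i][j] += c / phi

# Article-style average over all steps.
P = [[v / n_steps for v in row] for row in P_sum]
row_sums = [sum(row) for row in P]

# Assumed remedy: average only over the steps where the state was occupied.
P_renorm = [
    [v / visits[i] if visits[i] else 0.0 for v in row]
    for i, row in enumerate(P_sum)
]
print(row_sums)
print(P_renorm)
```

<p>In this toy case the second row of the article-style average sums to 0.5, while the renormalized matrix has rows summing to 1 for every state that was visited at least once.</p>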

	</body>
	</html>
