	<!DOCTYPE html>
	<html lang="en">
	<head>
	<meta charset="UTF-8">
	<meta name="viewport" content="width=device-width, initial-scale=1.0">
	<title>Welz Presentation Notes - TDA Seminar</title>
	<style>
	body { font-family: Georgia, 'Times New Roman', serif; max-width: 42rem; margin: 2rem auto; padding: 0 1.5rem; line-height: 1.6; color: #222; }
	h1, h2, h3 { font-family: 'Segoe UI', system-ui, sans-serif; }
	a { color: #0066cc; }
	.featured-note { border-radius: 4px; }
	h4 { margin-bottom: 0.5rem; }
	</style>
	</head>
	<body>

	<h1 id="welz-presentation-notes">Welz Presentation Notes</h1>
	<h2
	id="feedback-loops-as-loops-topological-data-analysis-of-genetic-regulatory-circuits">Feedback
	Loops as Loops: Topological Data Analysis of Genetic Regulatory
	Circuits</h2>
	<p><strong>Presentation:</strong> Gary Welz | CopernicusAI / CUNY
	Graduate Center (PoI) <strong>Date:</strong> February 27, 2026
	<strong>Live deck:</strong> <a
	href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/TDA_Seminar_Slides.html">TDA_Seminar_Slides.html</a>
	<strong>Preprint:</strong> <a
	href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/TDA_PREPRINT_DRAFT.html">HTML</a></p>
	<hr />
	<h2 id="presentation-script-26-slides">Presentation Script (29
	Slides)</h2>
	<h3 id="slide-1-title">Slide 1: Title</h3>
	<p>Feedback Loops as Loops — Topological Data Analysis of Genetic
	Regulatory Circuits. Gary Welz, CopernicusAI / CUNY Graduate Center
	(PoI). February 27, 2026.</p>
	<hr />
	<h3 id="slide-2-from-papers-to-flowcharts">Slide 2: From papers to
	flowcharts</h3>
	<p>The first attempt at a beta-galactosidase flow chart was made in 1995
	and appeared in an article in <em>The X Advisor</em>, an online magazine
	for Unix developers, entitled “Is the Genome Like a Computer Program?”
	The article contained excerpts from conversations with biologists on the
	bionet.genome.chromosome newsgroup. The article is archived at the
	Internet Archive; the newsgroup discussions are archived by Google. The
	1995 chart was created from text alone, the same kind of input that large
	language models (LLMs) work from today. The source was Berg & Singer
	(1992, pp. 71-73). This illustrates that diagrams are only as detailed
	and reliable as their source material; using different sources for the
	same process can yield different charts. In the original bionet thread,
	the genome was proposed as a flowchart with genes connected by logical
	“and” and “or.” Robert Robbins replied that flow charts require careful
	interpretation but that bringing computer-science insights to bear on
	the genome has potentially huge payoffs. G. Dellaire emphasized that
	genome structure, not just linear sequence, encodes how the code is
	read—context that is spatial or temporal. The original chart is shown in
	the slide image.</p>
	<hr />
	<h3 id="slide-3-same-chart-30-years-later">Slide 3: Same chart, 30 years
	later</h3>
	<p>The same Lac operon / beta-galactosidase idea is now generated with
	LLMs and Mermaid Markdown. The original chart was so time-consuming to
	produce that the approach lay dormant for decades. It is now possible to
	produce any of these flowcharts from a single prompt in seconds. The Lac
	Operon flowchart can be viewed in the GLMP viewer via the link on the
	slide.</p>
	<hr />
	<h3 id="slide-4-the-innovation-text-to-visual-data">Slide 4: The
	Innovation: Text to Visual Data</h3>
	<p>Traditional topological data analysis (TDA) starts from numerical
	data. In this work, the starting point is text—paper descriptions—which
	is converted into visual flowcharts first. That shift is what makes the
	rest possible. The pipeline is: text (papers) to visual flowcharts to
	features to topology. Mermaid Markdown converts textual process
	descriptions into structured flowcharts. Flowcharts become visual data,
	and TDA reveals structure. Topology is extracted from descriptions, not
	from direct measurements. Novel aspects include: a
	text-to-visual-to-topology pipeline; five features (nodes, conditionals,
	OR gates, AND gates, loops); feedback loops corresponding literally to
	H1 loops in homology; and LLM-assisted curation at scale. The approach
	is conceptually similar to the Politics case study in Carlsson &
	Vejdemo-Johansson (2021, pp. 199-201) but exhibits these distinct
	characteristics.</p>
	<hr />
	<h3 id="slide-5-the-question">Slide 5: The Question</h3>
	<p>The central question is whether the <em>shape</em> of these
	circuits—as captured by topology—aligns with what biologists already
	know: feedback loops, cascades, and regulatory motifs. Can regulatory
	structure (feedback, cascades) be detected from circuit topology?
	Feedback loops are literally loops; they should appear in H1. The work
	asks whether text-derived visual data can support that.</p>
	<hr />
	<h3 id="slide-6-the-glmp-database">Slide 6: The GLMP Database</h3>
	<p>The Genome Logic Modeling Project (GLMP) provides 108 processes—each
	one a Mermaid flowchart with nodes, conditionals, OR/AND gates, and
	loops (back-edges). We extract five features per process: nodes,
	conditionals (aka edges), AND gates, OR gates, loops. The set includes
	66 from <em>E. coli</em>, 38 from <em>S. cerevisiae</em>, and 4 from
	<em>Bacillus subtilis</em>. Examples include lac operon, SOS response,
	and two-component signaling. A link to the full database table allows
	any process to be opened for its flowchart. Code is available at
	github.com/garywelz/glmp.</p>
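<p>As a rough illustration of how structural counts could be read off a Mermaid flowchart, the sketch below parses simple arrow edges and counts back-edges with a depth-first search. The Mermaid snippet, the regular expressions, and the feature names are assumptions made for this example; the GLMP extraction code may define and count the features differently, and the back-edge count can depend on the traversal order.</p>

```python
import re

# Hypothetical Mermaid snippet; real GLMP files may use richer syntax.
MERMAID = """
flowchart TD
    A[Lactose present] --> B{Repressor bound?}
    B -->|no| C[Transcribe lacZYA]
    B -->|yes| D[No transcription]
    C --> E[Beta-galactosidase made]
    E --> A
"""

# Matches "SRC ... --> DST", with an optional |label| after the arrow.
EDGE = re.compile(r"^\s*(\w+).*?-->\s*(?:\|[^|]*\|\s*)?(\w+)")


def parse_edges(text):
    """Return (source, target) node ids for every arrow line."""
    edges = []
    for line in text.splitlines():
        m = EDGE.match(line)
        if m:
            edges.append((m.group(1), m.group(2)))
    return edges


def count_back_edges(edges):
    """Count edges that point back to a node still on the DFS stack."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
        graph.setdefault(dst, [])
    state = {}  # node -> "active" while on the stack, "done" afterwards
    back = 0

    def visit(node):
        nonlocal back
        state[node] = "active"
        for nxt in graph[node]:
            if state.get(nxt) == "active":
                back += 1
            elif nxt not in state:
                visit(nxt)
        state[node] = "done"

    for node in graph:
        if node not in state:
            visit(node)
    return back


edges = parse_edges(MERMAID)
features = {
    "nodes": len({n for e in edges for n in e}),
    "edges": len(edges),
    "decision_nodes": len(set(re.findall(r"(\w+)\{", MERMAID))),
    "loops": count_back_edges(edges),
}
print(features)
```

<p>In this toy chart the single back-edge is the arrow from E to A, which closes the cycle A, B, C, E.</p>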
	<hr />
	<h3 id="slide-7-glmp-references-in-json-and-feedback">Slide 7: GLMP:
	References in JSON and Feedback</h3>
	<p>Each process in GLMP is grounded in the literature: the JSON holds
	PubMed and DOI. The viewer accepts feedback so that flowcharts can be
	corrected or improved. Flowcharts are thus citable and correctable. In
	the viewer, Sources & Citations, Metadata, and the
	Improve-this-process form appear below each flowchart.</p>
	<hr />
	<h3 id="slide-8-from-flowcharts-to-features">Slide 8: From Flowcharts to
	Features</h3>
	<p>The full graph structure is not used for TDA. Instead, each flowchart
	is summarized into five features: nodes, conditionals (aka edges), AND
	gates, OR gates, and loops (back-edges). Features are standardized to
	zero mean and unit variance. The matrix is 108 processes × 5 features.
	These capture circuit complexity and logic structure.</p>
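<p>The standardization step can be sketched in a few lines of plain Python. The feature rows below are invented for illustration, and the use of the population standard deviation is an assumption, since the notes do not say which convention was used.</p>

```python
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column: subtract the column mean, divide by the column std."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    # Guard against constant columns, which would otherwise divide by zero.
    sigmas = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


# Hypothetical rows: nodes, conditionals, AND gates, OR gates, loops.
raw = [
    [12, 14, 1, 2, 1],
    [30, 41, 3, 5, 4],
    [21, 25, 0, 3, 2],
    [45, 60, 4, 6, 3],
]
Z = standardize(raw)
for row in Z:
    print([round(x, 3) for x in row])
```

<p>Without this step the node and conditional counts, which have the largest raw ranges, would dominate every pairwise distance.</p>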
	<hr />
	<h3 id="slide-9-tda-pipeline">Slide 9: TDA Pipeline</h3>
	<p>From the feature matrix, a distance is built between every pair of
	processes. A Vietoris-Rips filtration is run and Ripser is used to
	obtain persistence diagrams. Cocycles are extracted; they indicate which
	processes sit on which topological loop. Output includes persistence
	diagrams for H0, H1, and H2, plus the membership of each H1 loop.</p>
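<p>The talk uses Ripser for the full computation. To show what the filtration does at the simplest level, here is a pure-Python sketch that reproduces only the H0 part of a Vietoris-Rips persistence diagram using union-find; H1 and H2 require the boundary-matrix reduction that Ripser performs. The four sample points are invented for the example.</p>

```python
from itertools import combinations
from math import dist


def h0_persistence(points):
    """Death scales of H0 classes in a Vietoris-Rips filtration.

    Every point is born at scale 0. A component dies at the length of the
    edge that first merges it into another component (single linkage).
    One component never dies, so n points give n - 1 finite deaths.
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # All pairwise distances, shortest first.
    edges = sorted(
        (dist(points[i], points[j]), i, j)
        for i, j in combinations(range(len(points)), 2)
    )
    deaths = []
    for length, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            deaths.append(length)
    return deaths


# Hypothetical points: three close together, one far away.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
deaths = h0_persistence(points)
print(deaths)
```

<p>The large final death value is the signature of the isolated point: it stays a separate component until the scale is large enough to reach the cluster.</p>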
	<hr />
	<h3 id="slide-10-what-are-we-counting">Slide 10: What Are We Counting? H₀, H₁, H₂</h3>
	<p>H₀ counts connected components—are the pieces connected? In GLMP, H₀
	starts at 108 and collapses as the Vietoris–Rips radius grows. H₁ counts
	loops—closed cycles with no filled face; in gene regulation, feedback
	loops. The 33 H₁ features are these unfilled cycles; biologically richest
	for GLMP. H₂ counts enclosed voids (hollow cavities). In cancer GRN work
	(Masoomy et al., 2021), H₂ in healthy cells = redundant regulatory
	structures. GLMP yields H₂ = 1.</p>
	<hr />
	<h3 id="slide-11-mathematical-note-1-betti">Slide 11: Mathematical Note (1) — Betti Numbers: History & Geometry</h3>
	<p>Before looking at our results, a brief mathematical grounding; skip this if you'd prefer and come back to it. Betti numbers are named for Enrico Betti (1823–1892) and were formalized by Poincaré in the 1890s. They count topological "holes" of each dimension: β₀ = connected components (pieces), β₁ = independent loops that don't bound any filled region (1-dimensional holes), β₂ = enclosed voids (2-dimensional holes). Euler's formula for connected planar graphs is χ = V − E + F = 2, where F includes the outer, unbounded face: the triangle has V=3, E=3, F=2, giving χ=2; the tetrahedron has V=4, E=6, F=4, also χ=2; the cube has V=8, E=12, F=6, also χ=2. This generalizes via Betti numbers to χ = β₀ − β₁ + β₂ − …, the Euler characteristic. The key point for this talk: feedback loops in biology should show up as β₁ features, that is, loops in H₁. That is exactly what we find.</p>
	<hr />
	<h3 id="slide-12-mathematical-note-2-faces">Slide 12: Mathematical Note (2) — Faces, 2-Simplices, and H₁</h3>
	<p>This slide explains why some loops persist and others don't. In a planar graph, a face is a region bounded by edges, including the outer, unbounded region. In homology, faces correspond to 2-simplices (filled triangles): when three processes are mutually close enough in feature space, the Vietoris–Rips complex inserts a solid triangle among them. When a cycle of edges exists but no set of 2-simplices fills it in, that loop is not the boundary of any filled region, so it cannot be "explained away," and it persists as an H₁ feature until triangles appearing at a larger scale fill it. Our 33 H₁ loops are exactly such cycles, each unfilled over some range of scales. Biologically: a feedback circuit A→B→C→A persists in H₁ when there is no shortcut pathway that cuts across the loop and completes a filled triangle. The literal correspondence between feedback in biology and loops in homology is the conceptual core of this work.</p>
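<p>The effect of a filling triangle can be checked directly on the smallest possible example. The sketch below computes β₀ and β₁ of a simplicial complex from the ranks of its boundary maps over GF(2). The function names are made up for this illustration, edge direction is ignored (a simplicial complex is undirected), and this is not how Ripser is implemented internally.</p>

```python
def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    pivots = {}  # leading bit position -> reduced row with that leading bit
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]  # eliminate the leading bit and keep reducing
    return len(pivots)


def betti_numbers(vertices, edges, triangles):
    """Return (beta0, beta1) of a simplicial complex with GF(2) coefficients."""
    v_index = {v: i for i, v in enumerate(vertices)}
    e_index = {frozenset(e): i for i, e in enumerate(edges)}
    # Boundary of an edge = its two endpoints.
    d1 = [(1 << v_index[a]) | (1 << v_index[b]) for a, b in edges]
    # Boundary of a triangle = its three edges.
    d2 = [
        sum(1 << e_index[frozenset(pair)] for pair in ((a, b), (b, c), (a, c)))
        for a, b, c in triangles
    ]
    r1, r2 = rank_gf2(d1), rank_gf2(d2)
    beta0 = len(vertices) - r1
    beta1 = (len(edges) - r1) - r2  # cycles minus boundaries
    return beta0, beta1


vertices = ["A", "B", "C"]
edges = [("A", "B"), ("B", "C"), ("C", "A")]

hollow = betti_numbers(vertices, edges, triangles=[])
solid = betti_numbers(vertices, edges, triangles=[("A", "B", "C")])
print("hollow triangle:", hollow)
print("solid triangle: ", solid)
```

<p>Adding the single 2-simplex changes β₁ from 1 to 0 while leaving β₀ unchanged, which is the "capping" described above.</p>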
	<hr />
	<h3 id="slide-13-persistence-diagram">Slide 13: Persistence Diagram</h3>
	<p>The persistence diagram shows one component per process in H0 and 33
	loops in H1. The question is whether those H1 loops align with known
	biology—feedback circuits, stress responses, and so on. H2 yields 1
	void (expected—few points form persistent 2D cavities).</p>
	<hr />
	<h3 id="slide-14-what-do-the-loops-look-like-1-pca-cocycle-edges">Slide
	14: What Do the Loops Look Like? (1) PCA + Cocycle Edges</h3>
	<p>The persistence diagram tells us H1 has 33 loops but not where they
	sit in the data. To make homology visible, the 5D feature space is
	projected to 2D via PCA (principal component analysis, which finds the directions
	of maximum variance and only approximately preserves distances), then the
	cocycle edges, the pairs of processes that form each cycle, are drawn.
	Each colored loop is one H1 cycle: red (#1), blue (#2), green (#3),
	purple (#4), orange (#5). Lac operon, two-component, and SOS are
	labeled. <strong>Interactive version (hover for process names):</strong>
	<a
	href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/glmp_h1_loops_interactive.html">glmp_h1_loops_interactive.html</a></p>
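<p>For readers who want to see what the projection step involves, here is a small dependency-free PCA sketch using power iteration with deflation. The notes do not say which PCA implementation was used, so this is only an illustration; the feature rows are invented, and in practice a library routine would be the usual choice.</p>

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center(rows):
    """Subtract each column's mean."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    cols = list(zip(*centered))
    n = len(centered)
    return [[dot(ci, cj) / (n - 1) for cj in cols] for ci in cols]


def leading_eigenvector(matrix, iterations=500):
    """Power iteration; assumes the start vector is not orthogonal to the answer."""
    v = [1.0 / sqrt(len(matrix))] * len(matrix)
    for _ in range(iterations):
        w = [dot(row, v) for row in matrix]
        norm = sqrt(dot(w, w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def pca_2d(rows):
    """Project rows onto their first two principal components."""
    X = center(rows)
    C = covariance(X)
    d = len(C)
    pc1 = leading_eigenvector(C)
    lam1 = dot(pc1, [dot(row, pc1) for row in C])
    # Deflation: remove the first component so the next iteration finds the second.
    C2 = [[C[i][j] - lam1 * pc1[i] * pc1[j] for j in range(d)] for i in range(d)]
    pc2 = leading_eigenvector(C2)
    return [(dot(r, pc1), dot(r, pc2)) for r in X]


# Hypothetical standardized feature rows (5 features each).
features = [
    [-1.2, -1.1, -0.5, -0.9, -1.0],
    [0.3, 0.4, 1.2, 0.1, 0.8],
    [1.5, 1.3, -0.7, 1.6, 0.9],
    [-0.6, -0.6, 0.0, -0.8, -0.7],
    [0.0, 0.0, 0.0, 0.0, 0.0],
]
coords = pca_2d(features)
for xy in coords:
    print(round(xy[0], 3), round(xy[1], 3))
```

<p>Distances in the 2D picture match the original distances only when the data already lie in a plane; otherwise the variance along the discarded components is lost, which is why the cocycle polygons in the projection are a visual aid rather than an exact picture.</p>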
	<hr />
	<h3 id="slide-15-what-do-the-loops-look-like-2-mapper-graph">Slide 15:
	What Do the Loops Look Like? (2) Mapper Graph</h3>
	<p>The Mapper algorithm builds a simplicial complex: cover the range of a lens (filter) function with overlapping bins, cluster the processes within each bin, then connect clusters that share processes. Each node is a cluster of
	similar processes (node size = process count); edges connect overlapping
	clusters. Cycles in this graph correspond to topological loops—so the
	loops in the Mapper graph visualize H1 structure in a different way,
	complementing the persistence diagram and the cocycle-in-PCA view.
	Current parameters: n_cubes=12, perc_overlap=0.65 → 18 nodes, 45 edges.
	<strong>Interactive version (click nodes to see processes, search by
	name):</strong> <a
	href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/glmp_mapper_graph_interactive_v2.html">glmp_mapper_graph_interactive_v2.html</a></p>
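<p>A toy version of Mapper fits in a few dozen lines and shows why cycles in the graph reflect loops in the data. In this sketch the lens is a single coordinate, the cover is a set of overlapping intervals, and clustering is single linkage at a fixed radius. The arguments <code>n_intervals</code> and <code>overlap</code> play the role of <code>n_cubes</code> and <code>perc_overlap</code> on the slide, but the exact overlap convention, the lens, the clustering rule, and the circular test data are all assumptions for illustration.</p>

```python
from itertools import combinations
from math import cos, dist, pi, sin


def cover(values, n_intervals, overlap):
    """Overlapping intervals spanning the lens range (overlap = fraction of width)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = width * (1 - overlap)
    return [(lo + k * step, lo + k * step + width) for k in range(n_intervals)]


def clusters(indices, points, eps):
    """Single-linkage clusters: connected components of the eps-neighbour graph."""
    remaining, found = set(indices), []
    while remaining:
        stack = [remaining.pop()]
        component = set(stack)
        while stack:
            i = stack.pop()
            near = {j for j in remaining if dist(points[i], points[j]) <= eps}
            remaining -= near
            component |= near
            stack.extend(near)
        found.append(frozenset(component))
    return found


def mapper(points, lens, n_intervals, overlap, eps, tol=1e-9):
    """Return (nodes, edges): nodes are clusters, edges join clusters sharing a point."""
    nodes = []
    for a, b in cover(lens, n_intervals, overlap):
        inside = [i for i, v in enumerate(lens) if a - tol <= v <= b + tol]
        nodes.extend(clusters(inside, points, eps))
    edges = [
        (u, v)
        for u, v in combinations(range(len(nodes)), 2)
        if nodes[u] & nodes[v]
    ]
    return nodes, edges


# Hypothetical data: 16 points on a circle, lens = x-coordinate.
points = [(cos(2 * pi * k / 16), sin(2 * pi * k / 16)) for k in range(16)]
lens = [p[0] for p in points]
nodes, edges = mapper(points, lens, n_intervals=3, overlap=0.4, eps=0.5)

# For a connected graph, the number of independent cycles is E - V + 1.
cycle_rank = len(edges) - len(nodes) + 1
print(len(nodes), "nodes,", len(edges), "edges, cycle rank", cycle_rank)
```

<p>The middle interval splits into two clusters (the top and bottom arcs of the circle), and both connect to the left and right clusters, so the graph closes into a single cycle. Changing the number of intervals, the overlap, or the clustering radius can create or destroy such cycles, which is why Mapper output is usually reported together with its parameters.</p>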
	<hr />
	<h3 id="slide-16-top-h1-loop-1-persistence-0.563">Slide 16: Top H1 Loop
	#1 (Persistence = 0.563)</h3>
	<p>The most persistent loop aggregates stress response, protein quality
	control, and DNA repair: SOS response, quorum sensing, biofilm
	formation, BER, BAM, ribosome assembly, RNA pol recycling, Type III
	secretion, ubiquitin-proteasome, UPR. E. coli and yeast; shared “stress
	+ quality control + feedback” character.</p>
	<hr />
	<h3 id="slide-17-example-sos-response-loop-1">Slide 17: Example: SOS
	Response (Loop #1)</h3>
	<p>The SOS response is E. coli’s emergency DNA repair system: damage
	activates RecA, which inactivates LexA repressor, inducing repair genes.
	Classic negative feedback: once the damage is repaired, LexA accumulates again and turns the repair genes off. SOS sits in the top H1 loop
	alongside quorum sensing, biofilm, UPR, and protein quality control.</p>
	<hr />
	<h3 id="slide-18-top-h1-loop-2-persistence-0.443">Slide 18: Top H1 Loop
	#2 (Persistence = 0.443)</h3>
	<p>Six processes: antibiotic efflux pumps, arginine biosynthesis,
	osmotic stress response, tryptophan biosynthesis, peroxisome biogenesis,
	vacuolar protein sorting. Metabolic regulation and organelle
	biogenesis—E. coli and yeast.</p>
	<hr />
	<h3 id="slide-19-top-h1-loop-3-persistence-0.306">Slide 19: Top H1 Loop
	#3 (Persistence = 0.306)</h3>
	<p>Six processes: biofilm formation, DNA replication elongation,
	flagellar assembly, osmotic stress, sigma factor competition, peroxisome
	biogenesis. Gene regulation, replication, motility, stress—E. coli and
	yeast.</p>
	<hr />
	<h3 id="slide-20-top-h1-loop-4-persistence-0.279">Slide 20: Top H1 Loop
	#4 (Persistence = 0.279)</h3>
	<p>Six processes: phosphate regulation, translation elongation, translation
	termination, tryptophan biosynthesis, osmotic stress response, sporulation
	initiation. Gene regulation, translation, stress, developmental—E. coli,
	yeast, Bacillus.</p>
	<hr />
	<h3 id="slide-21-top-h1-loop-5-persistence-0.198">Slide 21: Top H1 Loop
	#5 (Persistence = 0.198)</h3>
	<p>Five processes: ara operon, maltose regulon, Pho regulon, nitrogen
	catabolite repression (NCR/TORC1), competence development. Nutrient and
	developmental regulation—ara and Pho are classic feedback circuits. E.
	coli, yeast, Bacillus.</p>
	<hr />
	<h3 id="slide-22-example-ara-operon-loop-5">Slide 22: Example: Ara
	Operon (Loop #5)</h3>
	<p>AraC acts as repressor or activator depending on arabinose; DNA
	looping and CRP-cAMP integration. Ara sits in Loop #5 with Pho regulon,
	maltose regulon, nitrogen catabolite repression, and competence—all
	nutrient-sensing or developmental decisions with shared regulatory
	logic.</p>
	<hr />
	<h3 id="slide-23-biological-coherence-check">Slide 23: Biological
	Coherence Check</h3>
	<p>With the new loop-based feature set, known feedback circuits cluster
	coherently: SOS, quorum sensing, biofilm in Loop #1; ara and Pho in Loop
	#5; trp biosynthesis in Loops #2 and #4. Topology recovers stress,
	nutrient-sensing, and feedback architecture.</p>
	<hr />
	<h3 id="slide-24-organism-patterns">Slide 24: Organism Patterns</h3>
	<p>All top five loops mix organisms. Loops #1, #2, and #3: E. coli and yeast.
	Loops #4 and #5: E. coli, yeast, and Bacillus. Regulatory logic
	transcends organism boundaries.</p>
	<hr />
	<h3 id="slide-25-why-these-features-work">Slide 25: Why These Features
	Work</h3>
	<p>Using loop (back-edge) features instead of NOT gates yields richer
	persistence values and clearer biological groupings. Stress circuits →
	Loop #1; nutrient-sensing (ara, Pho) → Loop #5; metabolic feedback (trp)
	→ Loops #2 and #4. The new feature set is a distinct experiment; results
	are richer and more interpretable.</p>
	<hr />
	<h3 id="slide-26-limitations-and-caveats">Slide 26: Limitations and
	Caveats</h3>
	<p>Sample size: 108 processes—enough to reveal structure; scaling to
	200-500+ is a priority. Five features (nodes, conditionals, OR gates,
	AND gates, loops); ablation shows node_count carries the most weight.
	Graph-theoretic features (cycle rank, longest path, gate ratios) are
	planned. LLM-generated flowcharts require expert fact-checking. Open
	question: Does topology predict function or correlate with known
	biology? The coherence check supports the latter.</p>
	<hr />
	<h3 id="slide-27-next-steps">Slide 27: Next Steps</h3>
	<p>Directions include Mapper, ablation and null-model validation, richer
	features. Longer-term goal: flowcharts and TDA as a Rosetta Stone
	linking topology to genetic “machine code”—sequence motifs for AND/OR.
	The hypothesis is falsifiable: it predicts that circuits in the same H1 loop share enriched motifs. Null
	model permutation test: p = 0.022. Graph-theoretic features, persistent
	cohomology, scaling to 200-500+ planned. Code:
	github.com/garywelz/glmp/tree/main/tda-analysis.</p>
	<hr />
	<h3 id="slide-28-references">Slide 28: References</h3>
	<p>Carlsson & Vejdemo-Johansson (2021). Bauer (2021). Berg &
	Singer (1992). Masoomy et al. (2021). Rivera-Cancel et al. (2014). Swingle et al. (2025).
	Tralie et al. (2018). Welz (1995).</p>
	<hr />
	<h3 id="slide-29-acknowledgments-and-questions">Slide 29:
	Acknowledgments and Questions</h3>
	<p>Jordan Matuszewski; CUNY Graduate Center TDA seminar group; Kevin
	Gardner and colleagues (ASRC, CCNY). GLMP and TDA analysis:
	github.com/garywelz/glmp. Contact: Gary Welz | <a href="/cdn-cgi/l/email-protection" class="__cf_email__" data-cfemail="3f585e4d46485a53457f58525e5653115c5052">[email protected]</a> |
	917-593-2537.</p>
	<hr />
	<h2 id="glossary-of-terms">Glossary of Terms</h2>
	<h3 id="tda-terms">TDA Terms</h3>
	<p><strong>Persistent Homology (H0, H1, H2):</strong> H₀ counts connected
	components—the number of disconnected pieces. In GLMP, H₀ begins at 108
	(one per process) and collapses as the Vietoris–Rips radius grows. H₁
	counts loops—closed cycles with no filled-in face. In a gene regulatory
	network, this corresponds to a feedback loop: gene A activates B, B
	activates C, C represses A; the loop persists because no 2-simplex
	(filled triangle) caps it off. The 33 H₁ features in GLMP are precisely
	these unfilled cycles. H₂ counts enclosed voids—hollow cavities, like the
	interior of a sphere. In cancer GRN work (Masoomy et al., 2021), H₂
	features in healthy cells were interpreted as redundant regulatory
	structures. GLMP yields H₂ = 1. The intuitive ladder: H₀ asks “are the
	pieces connected?”; H₁ asks “are there feedback loops?”; H₂ asks “are
	there enclosed cavities?” For GLMP, H₁ is biologically richest: feedback
	loops are literally loops.</p>
	<p><strong>Persistence (birth, death):</strong> Birth = distance scale
	where a feature appears. Death = scale where it disappears. Persistence
	= death minus birth, used here as a proxy for significance. Loop #1: persistence 0.563; Loop #2:
	0.443; Loop #3: 0.306.</p>
	<p><strong>Vietoris-Rips Complex:</strong> Points connect when within
	distance epsilon; epsilon increases gradually. At each scale, shapes
	(clusters, loops, voids) form. Loops persisting across scales are
	treated as real structure.</p>
	<p><strong>Betti numbers (β₀, β₁, β₂, …):</strong> The ranks of the
	homology groups; they count “holes” of each dimension. Named for Enrico
	Betti (1823–1892), formalized by Henri Poincaré in the 1890s.
	Geometrically: β₀ = connected components (pieces); β₁ = loops—closed
	paths that do <em>not</em> bound any filled region (1-D holes); β₂ =
	enclosed voids (2-D holes). A cycle is a closed path with no boundary; a
	cycle that is not the boundary of any filled region represents a hole.
	Betti numbers are topological invariants—stable under continuous
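	deformation. In our work the persistence diagrams contain 108 H₀ classes (one per process at scale zero), 33 H₁ classes (loops), and 1 H₂ class (void); these are totals across the filtration, not Betti numbers at a single scale.</p>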
	<p><strong>Euler characteristic (χ):</strong> For connected planar
	graphs, χ = V − E + F = 2, where F includes the <em>outer</em>
	(unbounded) face. Examples: triangle (V=3, E=3, F=2) → χ=2; square (V=4,
	E=4, F=2) → χ=2; tetrahedron (4,6,4) → χ=2; cube (8,12,6) → χ=2. This
	generalizes to χ = β₀ − β₁ + β₂ − … via the Betti numbers. The Euler
	characteristic sits at the foundation of persistent homology.</p>
	<p><strong>Faces and 2-simplices:</strong> In a planar graph, a
	<em>face</em> is a region bounded by edges—including the outer region.
	In homology, faces correspond to <em>2-simplices</em> (filled
	triangles): three vertices within the distance threshold form a triangle
	whose interior fills in the loop. When a loop is <em>not</em> bounded by
	any 2-simplex—no triangle fills it in—that loop persists as an H₁
	feature. Our 33 H₁ loops are exactly those cycles that fail to be
	filled; they are the β₁ contribution to χ.</p>
	<p><strong>Cocycles:</strong> Mathematical representation of which
	points form a loop. Used to identify which processes (e.g., lac operon)
	form each H1 loop. For visualization: project the 5D feature space to 2D
	(PCA), then draw edges between each (process A, process B) pair in the
	cocycle; the resulting polygon is the H1 loop.</p>
	<p><strong>PCA + Cocycle view:</strong> Makes homology visible by
	showing where loops sit in a 2D projection. Each colored polygon
	corresponds to one H1 cycle.</p>
	<p><strong>Mapper graph:</strong> Nodes = clusters of similar processes;
	edges = overlapping clusters. Cycles in the Mapper graph correspond to
	H1 loops in homology. Current implementation (n_cubes=12,
	perc_overlap=0.65): 18 nodes, 45 edges. Interactive version: <a
	href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/glmp_mapper_graph_interactive_v2.html">glmp_mapper_graph_interactive_v2.html</a>.</p>
	<h3 id="biological-terms">Biological Terms</h3>
	<p><strong>Lac Operon:</strong> Classic gene regulation in E. coli.
	Controls lactose digestion genes. Demonstrates negative feedback;
	textbook feedback circuit.</p>
	<p><strong>Two-Component Signaling (EnvZ-OmpR):</strong> Sensor (EnvZ)
	detects signal; response (OmpR) controls genes. Feedback: response
	affects sensor. Paradigm bacterial signaling with feedback.</p>
	<p><strong>SOS Response:</strong> E. coli emergency DNA repair.
	Feedback: damage turns genes on; repair turns them off. SOS appears in
	Loop #1 with other stress responses.</p>
	<p><strong>Operon:</strong> Group of genes controlled together. Often
	has feedback. Lac, trp, ara are examples.</p>
	<p><strong>Quorum Sensing:</strong> Bacteria coordinate by signaling. At
	quorum, behavior changes (biofilms, toxins). Positive feedback. Appears
	in Loop #1.</p>
	<hr />

	<!-- ============================================================ -->
	<!-- FEATURED NOTES -->
	<!-- ============================================================ -->

	<h2 id="featured-notes">Featured Notes</h2>

	<p style="font-style: italic; color: #555;">Extended discussions for readers who want to go deeper. Each note is self-contained and can be read independently of the others.</p>

	<!-- ---- NOTE FOR MATHEMATICIANS ---- -->

	<div style="border-left: 4px solid #2E4D7B; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5f8ff;">
	<h3 id="note-for-mathematicians" style="margin-top: 0.25rem; color: #2E4D7B;">📐 Note for Mathematicians: Euler's Formula, Betti Numbers, and the Shape of Regulatory Space</h3>

	<p>You've probably seen Euler's formula for polyhedra:</p>

	<p style="text-align: center; font-size: 1.1rem;"><strong>V − E + F = 2</strong></p>

	<p>Vertices minus edges plus faces equals 2. It holds for a cube (8 − 12 + 6 = 2), a tetrahedron (4 − 6 + 4 = 2), and a triangle treated as a flat polyhedron with an outer face (3 − 3 + 2 = 2). It looks like a curiosity until you understand what it is actually counting: a running tally of cells of each dimension, with alternating signs, which turns out to equal a tally of holes. V counts 0-dimensional things (points). E counts 1-dimensional things (edges, which can close into loops). F counts 2-dimensional things (faces, which can cap loops or bound voids). The alternating + and − is the algebraic signature of homology.</p>

	<p>Enrico Betti generalized this in the 1870s; Poincaré formalized it in the 1890s. Instead of just V, E, F, you get a sequence of numbers β₀, β₁, β₂, … — <strong>Betti numbers</strong>, one per dimension — and the Euler characteristic generalizes to:</p>

	<p style="text-align: center; font-size: 1.1rem;"><strong>χ = β₀ − β₁ + β₂ − β₃ + …</strong></p>

	<p>Each Betti number counts independent topological features of that dimension:</p>
	<ul>
	<li><strong>β₀</strong> — connected components (pieces). For a single connected shape, β₀ = 1.</li>
	<li><strong>β₁</strong> — independent loops not bounding any filled region; genuine 1-dimensional holes.</li>
	<li><strong>β₂</strong> — enclosed voids; genuine 2-dimensional holes, like the interior of a hollow sphere.</li>
	</ul>

	<p>A few examples to build intuition:</p>
	<ul>
	<li>Solid triangle: β₀ = 1, β₁ = 0 (loop is filled), β₂ = 0. χ = 1.</li>
	<li>Hollow triangle (three edges, no interior): β₀ = 1, β₁ = 1 (loop unfilled), β₂ = 0. χ = 0.</li>
	<li>Hollow sphere: β₀ = 1, β₁ = 0, β₂ = 1. χ = 2.</li>
	<li>Torus: β₀ = 1, β₁ = 2 (one loop around the tube, one through the hole), β₂ = 1. χ = 0.</li>
	</ul>

	<p>Euler's original V − E + F = 2 is simply χ = β₀ − β₁ + β₂ = 2 for convex polyhedra, where β₁ = 0 (no through-holes) and β₀ = β₂ = 1.</p>
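<p>The agreement between the cell count and the Betti-number count is easy to verify numerically. The short sketch below evaluates both alternating sums for a few standard shapes; the function names are chosen for this example only.</p>

```python
def chi_from_cells(vertices, edges, faces):
    """Euler characteristic as V - E + F."""
    return vertices - edges + faces


def chi_from_betti(betti):
    """Euler characteristic as the alternating sum b0 - b1 + b2 - ..."""
    return sum((-1) ** k * b for k, b in enumerate(betti))


# (V, E, F) for three convex polyhedra.
polyhedra = {
    "tetrahedron": (4, 6, 4),
    "cube": (8, 12, 6),
    "octahedron": (6, 12, 8),
}
sphere_betti = [1, 0, 1]  # the surface of any convex polyhedron is a sphere
torus_betti = [1, 2, 1]

for name, (v, e, f) in polyhedra.items():
    print(name, chi_from_cells(v, e, f), chi_from_betti(sphere_betti))
print("torus", chi_from_betti(torus_betti))
```

<p>Every convex polyhedron gives 2 by either route, while the torus gives 0 because its two independent loops enter with a minus sign.</p>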

	<p><strong>Connection to GLMP.</strong> The 108 processes are points in 5-dimensional feature space, and the Vietoris–Rips filtration builds a nested family of simplicial complexes on them, each with its own Betti numbers. Ripser tracks how those numbers change across the filtration:</p>
	<ul>
	<li>β₀ = 108 at birth (one component per process), collapsing toward 1 as ε grows and processes connect.</li>
	<li>β₁ = 33 persistent loops: 33 independent cycles, each of which stays uncapped by 2-simplices over some interval of ε (its persistence) before it is filled in.</li>
	<li>β₂ = 1 enclosed void — consistent with expectations for 108 points in 5D; few such hollow structures form and persist.</li>
	</ul>

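	<p>The alternating sum of these counts gives a rough fingerprint of the dataset: 108 − 33 + 1 = 76. Because the features are born and die at different scales, this is a summary of the persistence diagrams rather than the Euler characteristic of any single complex in the filtration. The number would differ under a different feature set, a different organism distribution, or a different pipeline, so it describes the data's shape only as we have encoded it.</p>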

	<p><strong>Why β₁ = 33 is the productive dimension.</strong> β₀ describes clustering — informative but unsurprising; processes group by complexity. β₂ = 1 is expected given dimensionality and sample size. β₁ = 33 is where the structure lives: 33 independent ways the circuit data "goes around a hole" without filling it in. Each represents a family of regulatory circuits sharing a structural niche, arranged in a ring with a gap at the center that no single process bridges. Persistence filters signal from noise — the five loops with the highest death-minus-birth values are the ones that resist filling across the widest range of ε, and those are the loops with biological interpretations that hold up.</p>

	<p><strong>The deepest point.</strong> Euler's formula says that for any convex polyhedron, however complex, V − E + F = 2. The topology is invariant — it doesn't depend on how you draw the shape. Betti numbers extend this invariance to arbitrary shapes in arbitrary dimensions. They are not sensitive to embedding, orientation, or continuous deformation — only to fundamental topological structure. This is why TDA is a principled choice for data analysis: you are finding invariants, not clusters whose boundaries depend on a threshold, and not model fits that depend on distributional assumptions. The 33 loops are a property of the shape of the data. And that shape, it turns out, reflects the shape of regulatory logic in living cells.</p>
	</div>

	<!-- ---- NOTES FOR BIOLOGISTS ---- -->

	<h3 id="notes-for-biologists" style="color: #2E7D32;">🔬 Notes for Biologists</h3>

	<p style="font-style: italic; color: #555;">Five short discussions on the biological meaning and assumptions behind the TDA results.</p>

	<div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;">
	<h4 style="margin-top: 0.25rem; color: #2E7D32;">1. Why Not Just Cluster the Circuits?</h4>
	<p>A natural first instinct is to run k-means or hierarchical clustering on the feature matrix and call similar circuits a group. Clustering finds dense regions — it answers "which processes are near each other?" Topology answers a different question: "what is the shape of the space they occupy?" A ring of processes with a gap in the middle looks like one cluster to k-means (or two clusters if you cut it differently), but it is an H₁ loop to TDA — and the gap in the middle is informative. It means there is no "average" regulatory circuit that sits at the center bridging all the others. The hole is the finding. Clustering erases holes; homology counts them.</p>
	</div>

	<div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;">
	<h4 style="margin-top: 0.25rem; color: #2E7D32;">2. What Do the Five Features Actually Capture?</h4>
	<p>Biologists sometimes worry that reducing a regulatory circuit to five numbers loses everything important. That worry is legitimate — and it is exactly why the five features were chosen carefully. Node count captures overall circuit complexity: how many molecular players are involved. Conditional count (edges) captures connectivity: how densely those players communicate. OR gates capture circuits that respond to any one of several signals — alternative pathway logic. AND gates capture circuits that require coincidence of multiple signals — conjunction logic, often associated with tighter control. Loops (back-edges in the Mermaid flowchart) directly count explicit feedback structure. Together these five numbers encode the logical architecture of a circuit — not its molecular identity, but its computational shape. Two circuits from different organisms with the same architecture will be neighbors in feature space. That is a hypothesis, and the coherence check tests it.</p>
	</div>

	<div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;">
	<h4 style="margin-top: 0.25rem; color: #2E7D32;">3. Why Cross-Organism Loops Are the Most Interesting Result</h4>
	<p>When E. coli's SOS response and yeast's unfolded protein response land in the same H₁ loop, organism-level confounding cannot explain it: these particular pathways have no recent common evolutionary ancestor, no common regulator, and no shared molecular machinery. What they share is circuit architecture: both are stress-induced, feedback-regulated, quality-control responses. The topology is grouping by regulatory logic that evolution has apparently arrived at independently in bacteria and eukaryotes. This is convergent evolution at the level of circuit shape rather than sequence. Loop #1 and Loop #5, which both mix organisms, are therefore the strongest evidence that the topology is capturing something real about the biology rather than reflecting database composition artifacts.</p>
	</div>

	<div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;">
	<h4 style="margin-top: 0.25rem; color: #2E7D32;">4. What an H₁ Loop Does and Does Not Claim</h4>
	<p>An H₁ loop says: these processes occupy a ring-shaped region in feature space, with a gap at the center. It does <em>not</em> say these processes interact biologically, share a common regulator, or form a pathway. SOS and quorum sensing are in the same loop not because they talk to each other but because they have similar circuit architectures — both involve an environmental stress signal, a cascade of regulatory steps, a feedback that shuts the response off once the stress is resolved. The topology is a statement about structural similarity, not biological interaction. This matters for interpretation: the coherence check asks whether known feedback circuits land in the same loops, not whether the loops predict protein-protein interactions. Keeping that distinction clear is essential when presenting to biologists who may read "loop" as a network connection.</p>
	</div>

	<div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;">
	<h4 style="margin-top: 0.25rem; color: #2E7D32;">5. What the Null Model Result Means in Plain Language</h4>
	<p>The null model permutation test asks: if we randomly shuffled which process gets which label (scrambling the biological identity of every circuit while keeping the feature values), how often would we get a coherence score as high as 0.750? In 1,000 random shuffles, a score at least that high occurred only about 22 times (p = 0.022). In plain language: the fact that known feedback circuits cluster coherently in our H₁ loops is unlikely to be a coincidence, with roughly a 2% chance under random labeling. A skeptic might object that the feature set was chosen to favor feedback detection (loops/back-edges are one of the five features), and that objection is fair, which is why the feature ablation study matters. Dropping the loops feature reduces coherence sharply, but so does dropping node count and conditional count. The signal is distributed across features, not manufactured by a single one. The null model result and the ablation study together make the case that the topology is reflecting something real, not something we built in by construction.</p>
	</div>
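<p>The logic of a label-shuffling test can be sketched as follows. The notes do not define how the 0.750 coherence score is computed, so the statistic here (the fraction of feedback-circuit pairs that share a loop) is a stand-in invented for illustration, as are the 20 toy processes and their loop assignments.</p>

```python
import random
from itertools import combinations


def coherence(is_feedback, loop_of):
    """Toy statistic: fraction of feedback-circuit pairs that share a loop."""
    marked = [i for i, flag in enumerate(is_feedback) if flag]
    pairs = list(combinations(marked, 2))
    if not pairs:
        return 0.0
    return sum(loop_of[i] == loop_of[j] for i, j in pairs) / len(pairs)


def permutation_p_value(is_feedback, loop_of, n_perm=1000, seed=0):
    """Shuffle the labels n_perm times; report how often the score is matched."""
    rng = random.Random(seed)
    observed = coherence(is_feedback, loop_of)
    labels = list(is_feedback)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if coherence(labels, loop_of) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical setup: 20 processes in 4 loops of 5; all 5 feedback circuits in loop 0.
loop_of = [i // 5 for i in range(20)]
is_feedback = [i < 5 for i in range(20)]

observed, p = permutation_p_value(is_feedback, loop_of)
print("observed coherence:", observed, "p-value:", p)
```

<p>The p-value is the share of shuffles that score at least as high as the real labeling, which matches the "22 out of 1,000" reading above. Some authors add one to both the numerator and the denominator so the estimate can never be exactly zero.</p>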

	<hr />

	<h2 id="key-discoveries-and-innovations">Key Discoveries and
	Innovations</h2>
	<p><strong>Methodological contribution:</strong> A pipeline was
	demonstrated: text to visual flowcharts (Mermaid) to features to
	topology. Topology is extracted from descriptions, not direct
	measurements.</p>
	<p><strong>Novel aspects:</strong> (1) Text-to-visual-to-topology
	pipeline; (2) Five features: nodes, conditionals, OR gates, AND gates,
	loops; (3) Feedback loops = H1 loops (literal correspondence); (4)
	LLM-assisted curation at scale.</p>
	<p><strong>Main finding:</strong> With the loop-based feature set, known
	feedback circuits cluster coherently: SOS, quorum sensing, biofilm in
	Loop #1; ara and Pho