<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Welz Presentation Notes - TDA Seminar</title>
<style>
body { font-family: Georgia, 'Times New Roman', serif; max-width: 42rem; margin: 2rem auto; padding: 0 1.5rem; line-height: 1.6; color: #222; }
h1, h2, h3 { font-family: 'Segoe UI', system-ui, sans-serif; }
a { color: #0066cc; }
.featured-note { border-radius: 4px; }
h4 { margin-bottom: 0.5rem; }
</style>
</head>
<body>
<h1 id="welz-presentation-notes">Welz Presentation Notes</h1>
<h2
id="feedback-loops-as-loops-topological-data-analysis-of-genetic-regulatory-circuits">Feedback
Loops as Loops: Topological Data Analysis of Genetic Regulatory
Circuits</h2>
<p><strong>Presentation:</strong> Gary Welz | CopernicusAI / CUNY
Graduate Center (PoI) <strong>Date:</strong> February 27, 2026
<strong>Live deck:</strong> <a
href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/TDA_Seminar_Slides.html">TDA_Seminar_Slides.html</a>
<strong>Preprint:</strong> <a
href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/TDA_PREPRINT_DRAFT.html">HTML</a></p>
<hr />
<h2 id="presentation-script-26-slides">Presentation Script (29
Slides)</h2>
<h3 id="slide-1-title">Slide 1: Title</h3>
<p>Feedback Loops as Loops: Topological Data Analysis of Genetic
Regulatory Circuits. Gary Welz, CopernicusAI / CUNY Graduate Center
(PoI). February 27, 2026.</p>
<hr />
<h3 id="slide-2-from-papers-to-flowcharts">Slide 2: From papers to
flowcharts</h3>
<p>The first attempt at a beta-galactosidase flow chart was made in 1995
and appeared in an article in <em>The X Advisor</em>, an online magazine
for Unix developers, entitled “Is the Genome Like a Computer Program?”
The article contained excerpts from conversations with biologists on the
bionet.genome.chromosome newsgroup. The article is archived at the
Internet Archive; the newsgroup discussions are archived by Google. The
1995 chart was created from text alone, the same process that large
language models (LLMs) use today. The source was Berg &amp; Singer
(1992, pp.&#160;71-73). This illustrates that diagrams are only as detailed
and reliable as their source material; using different sources for the
same process can yield different charts. In the original bionet thread,
the genome was proposed as a flowchart with genes connected by logical
“and” and “or.” Robert Robbins replied that flow charts require careful
interpretation but that bringing computer-science insights to bear on
the genome has potentially huge payoffs. G. Dellaire emphasized that
genome structure, not just linear sequence, encodes how the code is
read: context that is spatial or temporal. The original chart is shown in
the slide image.</p>
<hr />
<h3 id="slide-3-same-chart-30-years-later">Slide 3: Same chart, 30 years
later</h3>
<p>The same Lac operon / beta-galactosidase idea is now generated with
LLMs and Mermaid Markdown. The original chart was so time-consuming to
produce that the approach lay dormant for decades. It is now possible to
produce any of these flowcharts from a single prompt in seconds. The Lac
Operon flowchart can be viewed in the GLMP viewer via the link on the
slide.</p>
<hr />
<h3 id="slide-4-the-innovation-text-to-visual-data">Slide 4: The
Innovation: Text to Visual Data</h3>
<p>Traditional topological data analysis (TDA) starts from numerical
data. In this work, the starting point is text (paper descriptions), which
is converted into visual flowcharts first. That shift is what makes the
rest possible. The pipeline is: text (papers) to visual flowcharts to
features to topology. Mermaid Markdown converts textual process
descriptions into structured flowcharts. Flowcharts become visual data,
and TDA reveals structure. Topology is extracted from descriptions, not
from direct measurements. Novel aspects include: a
text-to-visual-to-topology pipeline; five features (nodes, conditionals,
OR gates, AND gates, loops); feedback loops corresponding literally to
H1 loops in homology; and LLM-assisted curation at scale. The approach
is conceptually similar to the Politics case study in Carlsson &amp;
Vejdemo-Johansson (2021, pp.&#160;199-201) but exhibits these distinct
characteristics.</p>
<hr />
<h3 id="slide-5-the-question">Slide 5: The Question</h3>
<p>The central question is whether the <em>shape</em> of these
circuits, as captured by topology, aligns with what biologists already
know: feedback loops, cascades, and regulatory motifs. Can regulatory
structure (feedback, cascades) be detected from circuit topology?
Feedback loops are literally loops; they should appear in H1. The work
asks whether text-derived visual data can support that.</p>
<hr />
<h3 id="slide-6-the-glmp-database">Slide 6: The GLMP Database</h3>
<p>The Genome Logic Modeling Project (GLMP) provides 108 processes, each
one a Mermaid flowchart with nodes, conditionals, OR/AND gates, and
loops (back-edges). We extract five features per process: nodes,
conditionals (aka edges), AND gates, OR gates, loops. The set includes
66 from <em>E. coli</em>, 38 from <em>S. cerevisiae</em>, and 4 from
<em>Bacillus subtilis</em>. Examples include lac operon, SOS response,
and two-component signaling. A link to the full database table allows
any process to be opened for its flowchart. Code is available at
github.com/garywelz/glmp.</p>
<hr />
<h3 id="slide-7-glmp-references-in-json-and-feedback">Slide 7: GLMP:
References in JSON and Feedback</h3>
<p>Each process in GLMP is grounded in the literature: its JSON record holds
PubMed and DOI identifiers. The viewer accepts feedback so that flowcharts can be
corrected or improved. Flowcharts are thus citable and correctable. In
the viewer, Sources &amp; Citations, Metadata, and the
Improve-this-process form appear below each flowchart.</p>
<hr />
<h3 id="slide-8-from-flowcharts-to-features">Slide 8: From Flowcharts to
Features</h3>
<p>The full graph structure is not used for TDA. Instead, each flowchart
is summarized into five features: nodes, conditionals (aka edges), AND
gates, OR gates, and loops (back-edges). Features are standardized to
zero mean and unit variance. The matrix is 108 processes × 5 features.
These capture circuit complexity and logic structure.</p>
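<p>The standardization step can be sketched as follows. This is a minimal illustration, not the project's code: the three rows are hypothetical counts rather than GLMP data, and the use of the population standard deviation is an assumption, since the notes do not say which convention was applied.</p>
<pre>
```python
from statistics import mean, pstdev

# Hypothetical counts for three processes (not GLMP data).
# Columns: nodes, conditionals, OR gates, AND gates, loops.
raw = [
    [12, 3, 1, 0, 1],
    [30, 8, 2, 2, 3],
    [21, 4, 0, 1, 2],
]

def standardize(rows):
    """Rescale each column to zero mean and unit (population) variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    # A constant column has zero spread; fall back to 1.0 to avoid dividing by 0.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]

scaled = standardize(raw)
for row in scaled:
    print([round(value, 3) for value in row])
```
</pre>
<p>Standardizing first keeps node count, which has the largest raw range, from dominating the pairwise distances used in the next step.</p>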
<hr />
<h3 id="slide-9-tda-pipeline">Slide 9: TDA Pipeline</h3>
<p>From the feature matrix, a distance is built between every pair of
processes. A Vietoris-Rips filtration is run and Ripser is used to
obtain persistence diagrams. Cocycles are extracted; they indicate which
processes sit on which topological loop. Output includes persistence
diagrams for H0, H1, and H2, plus the membership of each H1 loop.</p>
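<p>To make the filtration concrete, here is a small standard-library sketch of the H0 part only: it builds all pairwise Euclidean distances and records the scale at which components merge. It is not Ripser and does not compute H1, H2, or cocycles; the four points and their names are hypothetical, and Euclidean distance is an assumption because the notes do not name the metric.</p>
<pre>
```python
from itertools import combinations
from math import dist

# Hypothetical standardized feature vectors (2-D here for readability).
points = {
    "proc_a": (0.0, 0.0),
    "proc_b": (1.0, 0.0),
    "proc_c": (0.0, 2.0),
    "proc_d": (5.0, 0.0),
}

def h0_death_scales(named_points):
    """Return the scales at which connected components merge.

    In a Vietoris-Rips filtration every point is born at scale 0; a
    component dies when an edge first joins it to another component.
    """
    names = list(named_points)
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    # All pairwise Euclidean distances, shortest first.
    edges = sorted(
        (dist(named_points[a], named_points[b]), a, b)
        for a, b in combinations(names, 2)
    )
    deaths = []
    for length, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            deaths.append(length)
    return deaths

deaths = h0_death_scales(points)
print(deaths)  # [1.0, 2.0, 4.0]
```
</pre>
<p>With n points there are n - 1 finite H0 deaths and one component that never dies, which is the sense in which H0 starts at the number of processes and collapses as the radius grows.</p>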
<hr />
<h3 id="slide-10-what-are-we-counting">Slide 10: What Are We Counting? H₀, H₁, H₂</h3>
<p>H₀ counts connected components: are the pieces connected? In GLMP, H₀
starts at 108 and collapses as the Vietoris–Rips radius grows. H₁ counts
loops, closed cycles with no filled face; in gene regulation, feedback
loops. The 33 H₁ features are these unfilled cycles, the biologically richest
dimension for GLMP. H₂ counts enclosed voids (hollow cavities). In cancer GRN work
(Masoomy et al., 2021), H₂ features in healthy cells were interpreted as redundant
regulatory structures. GLMP yields H₂ = 1.</p>
<hr />
<h3 id="slide-11-mathematical-note-1-betti">Slide 11: Mathematical Note (1): Betti Numbers, History &amp; Geometry</h3>
<p>Before looking at our results, a brief mathematical grounding; skip this if you'd prefer and come back to it. Betti numbers are named for Enrico Betti (1823–1892) and were formalized by Poincaré in the 1890s. They count topological "holes" of each dimension: β₀ = connected components (pieces), β₁ = independent loops that don't bound any filled region (1-dimensional holes), β₂ = enclosed voids (2-dimensional holes). Euler's formula for connected planar graphs is χ = V − E + F = 2, where F includes the outer, unbounded face: the triangle has V=3, E=3, F=2, giving χ=2; the tetrahedron has V=4, E=6, F=4, also χ=2; the cube has V=8, E=12, F=6, also χ=2. This generalizes via Betti numbers to χ = β₀ − β₁ + β₂ − …, the Euler characteristic. The key point for this talk: feedback loops in biology should show up as β₁ features, that is, loops in H₁. That is exactly what we find.</p>
<hr />
<h3 id="slide-12-mathematical-note-2-faces">Slide 12: Mathematical Note (2): Faces, 2-Simplices, and H₁</h3>
<p>This slide explains why some loops persist and others don't. In a planar graph, a face is a region bounded by edges, including the outer, unbounded region. In homology, faces correspond to 2-simplices (filled triangles): when three processes are mutually close enough in feature space, the Vietoris–Rips complex inserts a solid triangle among them. When a cycle of edges exists but no 2-simplex fills it in (no triangle caps it off), that loop is not the boundary of any face, so it cannot be "explained away," and it persists as an H₁ feature. Our 33 H₁ loops are exactly those cycles with no filling triangle. Biologically: a feedback circuit A→B→C→A persists in H₁ when there is no shortcut pathway that cuts across the loop and completes a filled triangle. The literal correspondence between feedback in biology and loops in homology is the conceptual core of this work.</p>
<hr />
<h3 id="slide-13-persistence-diagram">Slide 13: Persistence Diagram</h3>
<p>The persistence diagram shows one component per process in H0 and 33
loops in H1. The question is whether those H1 loops align with known
biology: feedback circuits, stress responses, and so on. H2 yields 1
void (expected: few points form persistent 2D cavities).</p>
<hr />
<h3 id="slide-14-what-do-the-loops-look-like-1-pca-cocycle-edges">Slide
14: What Do the Loops Look Like? (1) PCA + Cocycle Edges</h3>
<p>The persistence diagram tells us H1 has 33 loops but not where they
sit in the data. To make homology visible, the 5D feature space is
projected to 2D via PCA (principal component analysis, which finds the directions
of maximum variance and so preserves distances only approximately), then the
cocycle edges (the pairs of processes that form each cycle) are drawn.
Each colored loop is one H1 cycle: red (#1), blue (#2), green (#3),
purple (#4), orange (#5). Lac operon, two-component, and SOS are
labeled. <strong>Interactive version (hover for process names):</strong>
<a
href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/glmp_h1_loops_interactive.html">glmp_h1_loops_interactive.html</a></p>
<hr />
<h3 id="slide-15-what-do-the-loops-look-like-2-mapper-graph">Slide 15:
What Do the Loops Look Like? (2) Mapper Graph</h3>
<p>The Mapper algorithm builds a simplicial complex: cluster nearby
processes, then connect clusters that overlap. Each node is a cluster of
similar processes (node size = process count); edges connect overlapping
clusters. Cycles in this graph correspond to topological loops, so the
loops in the Mapper graph visualize H1 structure in a different way,
complementing the persistence diagram and the cocycle-in-PCA view.
Current parameters: n_cubes=12, perc_overlap=0.65 → 18 nodes, 45 edges.
<strong>Interactive version (click nodes to see processes, search by
name):</strong> <a
href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/glmp_mapper_graph_interactive_v2.html">glmp_mapper_graph_interactive_v2.html</a></p>
<hr />
<h3 id="slide-16-top-h1-loop-1-persistence-0.563">Slide 16: Top H1 Loop
#1 (Persistence = 0.563)</h3>
<p>The most persistent loop aggregates stress response, protein quality
control, and DNA repair: SOS response, quorum sensing, biofilm
formation, BER, BAM, ribosome assembly, RNA pol recycling, Type III
secretion, ubiquitin-proteasome, UPR. E. coli and yeast; shared “stress
+ quality control + feedback” character.</p>
<hr />
<h3 id="slide-17-example-sos-response-loop-1">Slide 17: Example: SOS
Response (Loop #1)</h3>
<p>The SOS response is E. coli’s emergency DNA repair system: damage
activates RecA, which inactivates LexA repressor, inducing repair genes.
Classic feedback: repair turns the genes back off. SOS sits in the top H1 loop
alongside quorum sensing, biofilm, UPR, and protein quality control.</p>
<hr />
<h3 id="slide-18-top-h1-loop-2-persistence-0.443">Slide 18: Top H1 Loop
#2 (Persistence = 0.443)</h3>
<p>Six processes: antibiotic efflux pumps, arginine biosynthesis,
osmotic stress response, tryptophan biosynthesis, peroxisome biogenesis,
vacuolar protein sorting. Metabolic regulation and organelle
biogenesis, spanning E. coli and yeast.</p>
<hr />
<h3 id="slide-19-top-h1-loop-3-persistence-0.306">Slide 19: Top H1 Loop
#3 (Persistence = 0.306)</h3>
<p>Six processes: biofilm formation, DNA replication elongation,
flagellar assembly, osmotic stress, sigma factor competition, peroxisome
biogenesis. Gene regulation, replication, motility, and stress, spanning E. coli and
yeast.</p>
<hr />
<h3 id="slide-20-top-h1-loop-4-persistence-0.279">Slide 20: Top H1 Loop
#4 (Persistence = 0.279)</h3>
<p>Six processes: phosphate regulation, translation elongation, translation
termination, tryptophan biosynthesis, osmotic stress response, sporulation
initiation. Gene regulation, translation, stress, and development, spanning E. coli,
yeast, and Bacillus.</p>
<hr />
<h3 id="slide-21-top-h1-loop-5-persistence-0.198">Slide 21: Top H1 Loop
#5 (Persistence = 0.198)</h3>
<p>Five processes: ara operon, maltose regulon, Pho regulon, nitrogen
catabolite repression (NCR/TORC1), competence development. Nutrient and
developmental regulation; ara and Pho are classic feedback circuits. E.
coli, yeast, Bacillus.</p>
<hr />
<h3 id="slide-22-example-ara-operon-loop-5">Slide 22: Example: Ara
Operon (Loop #5)</h3>
<p>AraC acts as repressor or activator depending on arabinose; DNA
looping and CRP-cAMP integration. Ara sits in Loop #5 with Pho regulon,
maltose regulon, nitrogen catabolite repression, and competence, all
nutrient-sensing or developmental decisions with shared regulatory
logic.</p>
<hr />
<h3 id="slide-23-biological-coherence-check">Slide 23: Biological
Coherence Check</h3>
<p>With the new loop-based feature set, known feedback circuits cluster
coherently: SOS, quorum sensing, biofilm in Loop #1; ara and Pho in Loop
#5; trp biosynthesis in Loops #2 and #4. Topology recovers stress,
nutrient-sensing, and feedback architecture.</p>
<hr />
<h3 id="slide-24-organism-patterns">Slide 24: Organism Patterns</h3>
<p>All five top loops mix organisms. Loops #1, #2, and #3: E. coli and yeast.
Loops #4 and #5: E. coli, yeast, and Bacillus. Regulatory logic
transcends organism boundaries.</p>
<hr />
<h3 id="slide-25-why-these-features-work">Slide 25: Why These Features
Work</h3>
<p>Using loop (back-edge) features instead of NOT gates yields richer
persistence values and clearer biological groupings. Stress circuits →
Loop #1; nutrient-sensing (ara, Pho) → Loop #5; metabolic feedback (trp)
→ Loops #2 and #4. The new feature set is a distinct experiment; results
are richer and more interpretable.</p>
<hr />
<h3 id="slide-26-limitations-and-caveats">Slide 26: Limitations and
Caveats</h3>
<p>Sample size: 108 processes, enough to reveal structure; scaling to
200-500+ is a priority. Five features (nodes, conditionals, OR gates,
AND gates, loops); ablation shows node_count carries the most weight.
Graph-theoretic features (cycle rank, longest path, gate ratios) are
planned. LLM-generated flowcharts require expert fact-checking. Open
question: Does topology predict function or correlate with known
biology? The coherence check supports the latter.</p>
<hr />
<h3 id="slide-27-next-steps">Slide 27: Next Steps</h3>
<p>Directions include Mapper, ablation and null-model validation, richer
features. Longer-term goal: flowcharts and TDA as a Rosetta Stone
linking topology to genetic “machine code”: sequence motifs for AND/OR.
Testable: circuits in the same H1 loop should share enriched motifs. Null
model permutation test: p = 0.022. Graph-theoretic features, persistent
cohomology, scaling to 200-500+ planned. Code:
github.com/garywelz/glmp/tree/main/tda-analysis.</p>
<hr />
<h3 id="slide-28-references">Slide 28: References</h3>
<p>Carlsson &amp; Vejdemo-Johansson (2021). Bauer (2021). Berg &amp;
Singer (1992). Masoomy et al. (2021). Rivera-Cancel et al.&#160;(2014). Swingle et al.&#160;(2025).
Tralie et al.&#160;(2018). Welz (1995).</p>
<hr />
<h3 id="slide-29-acknowledgments-and-questions">Slide 29:
Acknowledgments and Questions</h3>
<p>Jordan Matuszewski; CUNY Graduate Center TDA seminar group; Kevin
Gardner and colleagues (ASRC, CCNY). GLMP and TDA analysis:
github.com/garywelz/glmp. Contact: Gary Welz | 917-593-2537.</p>
<hr />
<h2 id="glossary-of-terms">Glossary of Terms</h2>
<h3 id="tda-terms">TDA Terms</h3>
<p><strong>Persistent Homology (H0, H1, H2):</strong> H₀ counts connected
components, the number of disconnected pieces. In GLMP, H₀ begins at 108
(one per process) and collapses as the Vietoris–Rips radius grows. H₁
counts loops: closed cycles with no filled-in face. In a gene regulatory
network, this corresponds to a feedback loop: gene A activates B, B
activates C, C represses A; the loop persists because no 2-simplex
(filled triangle) caps it off. The 33 H₁ features in GLMP are precisely
these unfilled cycles. H₂ counts enclosed voids: hollow cavities, like the
interior of a sphere. In cancer GRN work (Masoomy et al., 2021), H₂
features in healthy cells were interpreted as redundant regulatory
structures. GLMP yields H₂ = 1. The intuitive ladder: H₀ asks “are the
pieces connected?”; H₁ asks “are there feedback loops?”; H₂ asks “are
there enclosed cavities?” For GLMP, H₁ is biologically richest: feedback
loops are literally loops.</p>
<p><strong>Persistence (birth, death):</strong> Birth = distance scale
where a feature appears. Death = scale where it disappears. Persistence
= death minus birth = significance. Loop #1: persistence 0.563; Loop #2:
0.443; Loop #3: 0.306.</p>
<p><strong>Vietoris-Rips Complex:</strong> Points connect when within
distance epsilon; epsilon increases gradually. At each scale, shapes
(clusters, loops, voids) form. Loops persisting across scales are
treated as real structure.</p>
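<p>A worked example may help fix the meaning of birth and death. The sketch below uses the four corners of a unit square, a toy point cloud that is not GLMP data, and reads the birth and death of its single H1 loop directly off the sorted edge lengths. The shortcut is specific to this example; it is not a general persistent-homology algorithm.</p>
<pre>
```python
from itertools import combinations
from math import dist

# Four corners of a unit square: the smallest point cloud with an H1 loop.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Six pairwise distances: four sides of length 1, two diagonals of length sqrt(2).
lengths = sorted(dist(p, q) for p, q in combinations(square, 2))

# The four sides close the cycle, so the loop is born at scale 1.
birth = lengths[3]
# The diagonals are the last edges to appear; once they do, every triple
# of corners spans a filled triangle and the loop dies.
death = lengths[-1]
persistence = death - birth

print(round(birth, 3), round(death, 3), round(persistence, 3))  # 1.0 1.414 0.414
```
</pre>
<p>On this scale-free toy the persistence is about 0.414; the loop persistences quoted in these notes are measured the same way, as death minus birth, in standardized feature space.</p>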
<p><strong>Betti numbers (β₀, β₁, β₂, …):</strong> The ranks of the
homology groups; they count “holes” of each dimension. Named for Enrico
Betti (1823–1892), formalized by Henri Poincaré in the 1890s.
Geometrically: β₀ = connected components (pieces); β₁ = loops, closed
paths that do <em>not</em> bound any filled region (1-D holes); β₂ =
enclosed voids (2-D holes). A cycle is a closed path with no boundary; a
cycle that is not the boundary of any filled region represents a hole.
Betti numbers are topological invariants, stable under continuous
deformation. In our work: β₀ = 108 (components), β₁ = 33 (loops), β₂ = 1
(void).</p>
<p><strong>Euler characteristic (χ):</strong> For connected planar
graphs, χ = V − E + F = 2, where F includes the <em>outer</em>
(unbounded) face. Examples: triangle (V=3, E=3, F=2) → χ=2; square (V=4,
E=4, F=2) → χ=2; tetrahedron (4,6,4) → χ=2; cube (8,12,6) → χ=2. This
generalizes to χ = β₀ − β₁ + β₂ − … via the Betti numbers. The Euler
characteristic sits at the foundation of persistent homology.</p>
<p><strong>Faces and 2-simplices:</strong> In a planar graph, a
<em>face</em> is a region bounded by edges, including the outer region.
In homology, faces correspond to <em>2-simplices</em> (filled
triangles): three vertices within the distance threshold form a triangle
whose interior fills in the loop. When a loop is <em>not</em> bounded by
any 2-simplex (no triangle fills it in), that loop persists as an H₁
feature. Our 33 H₁ loops are exactly those cycles that fail to be
filled; they are the β₁ contribution to χ.</p>
<p><strong>Cocycles:</strong> Mathematical representation of which
points form a loop. Used to identify which processes (e.g., lac operon)
form each H1 loop. For visualization: project the 5D feature space to 2D
(PCA), then draw edges between each (process A, process B) pair in the
cocycle; the resulting polygon is the H1 loop.</p>
<p><strong>PCA + Cocycle view:</strong> Makes homology visible by
showing where loops sit in a 2D projection. Each colored polygon
corresponds to one H1 cycle.</p>
<p><strong>Mapper graph:</strong> Nodes = clusters of similar processes;
edges = overlapping clusters. Cycles in the Mapper graph correspond to
H1 loops in homology. Current implementation (n_cubes=12,
perc_overlap=0.65): 18 nodes, 45 edges. Interactive version: <a
href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/glmp_mapper_graph_interactive_v2.html">glmp_mapper_graph_interactive_v2.html</a>.</p>
<h3 id="biological-terms">Biological Terms</h3>
<p><strong>Lac Operon:</strong> Classic gene regulation in E. coli.
Controls lactose digestion genes. Demonstrates negative feedback;
textbook feedback circuit.</p>
<p><strong>Two-Component Signaling (EnvZ-OmpR):</strong> Sensor (EnvZ)
detects signal; response (OmpR) controls genes. Feedback: response
affects sensor. Paradigm bacterial signaling with feedback.</p>
<p><strong>SOS Response:</strong> E. coli emergency DNA repair.
Feedback: damage turns genes on; repair turns them off. SOS appears in
Loop #1 with other stress responses.</p>
<p><strong>Operon:</strong> Group of genes controlled together. Often
has feedback. Lac, trp, ara are examples.</p>
<p><strong>Quorum Sensing:</strong> Bacteria coordinate by signaling. At
quorum, behavior changes (biofilms, toxins). Positive feedback. Appears
in Loop #1.</p>
<hr />
<!-- ============================================================ -->
<!-- FEATURED NOTES -->
<!-- ============================================================ -->
<h2 id="featured-notes">Featured Notes</h2>
<p style="font-style: italic; color: #555;">Extended discussions for readers who want to go deeper. Each note is self-contained and can be read independently of the others.</p>
<!-- ---- NOTE FOR MATHEMATICIANS ---- -->
<div style="border-left: 4px solid #2E4D7B; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5f8ff;">
<h3 id="note-for-mathematicians" style="margin-top: 0.25rem; color: #2E4D7B;">📐 Note for Mathematicians: Euler's Formula, Betti Numbers, and the Shape of Regulatory Space</h3>
<p>You've probably seen Euler's formula for polyhedra:</p>
<p style="text-align: center; font-size: 1.1rem;"><strong>V − E + F = 2</strong></p>
<p>Vertices minus edges plus faces equals 2. It holds for a cube (8 − 12 + 6 = 2), a tetrahedron (4 − 6 + 4 = 2), and a triangle treated as a flat polyhedron with an outer face (3 − 3 + 2 = 2). It looks like a curiosity until you understand what it is actually counting: a running tally of holes at different dimensions, with alternating signs. V counts 0-dimensional things (points). E counts 1-dimensional things (edges, which can close into loops). F counts 2-dimensional things (faces, which can cap loops or bound voids). The alternating + and − is the algebraic signature of homology.</p>
<p>Enrico Betti generalized this in the 1870s; Poincaré formalized it in the 1890s. Instead of just V, E, F, you get a sequence of numbers β₀, β₁, β₂, … (the <strong>Betti numbers</strong>, one per dimension), and the Euler characteristic generalizes to:</p>
<p style="text-align: center; font-size: 1.1rem;"><strong>χ = β₀ − β₁ + β₂ − β₃ + …</strong></p>
<p>Each Betti number counts independent topological features of that dimension:</p>
<ul>
<li><strong>β₀</strong>: connected components (pieces). For a single connected shape, β₀ = 1.</li>
<li><strong>β₁</strong>: independent loops not bounding any filled region; genuine 1-dimensional holes.</li>
<li><strong>β₂</strong>: enclosed voids; genuine 2-dimensional holes, like the interior of a hollow sphere.</li>
</ul>
<p>A few examples to build intuition:</p>
<ul>
<li>Solid triangle: β₀ = 1, β₁ = 0 (loop is filled), β₂ = 0. χ = 1.</li>
<li>Hollow triangle (three edges, no interior): β₀ = 1, β₁ = 1 (loop unfilled), β₂ = 0. χ = 0.</li>
<li>Hollow sphere: β₀ = 1, β₁ = 0, β₂ = 1. χ = 2.</li>
<li>Torus: β₀ = 1, β₁ = 2 (one loop around the tube, one through the hole), β₂ = 1. χ = 0.</li>
</ul>
<p>Euler's original V − E + F = 2 is simply χ = β₀ − β₁ + β₂ = 2 for convex polyhedra, where β₁ = 0 (no through-holes) and β₀ = β₂ = 1.</p>
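<p>The arithmetic in the examples above can be checked in a few lines. The function names are illustrative only; the vertex, edge, face, and Betti counts are the ones quoted in this note.</p>
<pre>
```python
def euler_from_counts(vertices, edges, faces):
    """Euler's alternating count V - E + F."""
    return vertices - edges + faces

def euler_from_betti(betti):
    """Alternating sum b0 - b1 + b2 - ... of a list of Betti numbers."""
    return sum((-1) ** k * b for k, b in enumerate(betti))

polyhedra = {"tetrahedron": (4, 6, 4), "cube": (8, 12, 6)}
shapes = {
    "solid triangle": [1, 0, 0],
    "hollow triangle": [1, 1, 0],
    "hollow sphere": [1, 0, 1],
    "torus": [1, 2, 1],
}

for name, (v, e, f) in polyhedra.items():
    print(name, euler_from_counts(v, e, f))   # 2 for both
for name, betti in shapes.items():
    print(name, euler_from_betti(betti))      # 1, 0, 2, 0 in order
```
</pre>
<p>Both routes agree for the convex polyhedra: counting cells gives 2, and so does the Betti vector [1, 0, 1] of the sphere they are topologically equivalent to.</p>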
<p><strong>Connection to GLMP.</strong> The 108 processes embedded in 5-dimensional feature space via the Vietoris–Rips filtration produce a simplicial complex with its own Betti numbers. Ripser computes them across the filtration:</p>
<ul>
<li>β₀ = 108 at birth (one component per process), collapsing toward 1 as ε grows and processes connect.</li>
<li>β₁ = 33 persistent loops: 33 independent cycles that are never capped by a 2-simplex across the filtration range we study.</li>
<li>β₂ = 1 enclosed void, consistent with expectations for 108 points in 5D; few such hollow structures form and persist.</li>
</ul>
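<p>The alternating sum of these counts gives a compact fingerprint of the dataset: χ = 108 − 33 + 1 = 76. Because the three counts are tallied across the whole filtration rather than at a single scale ε, this figure summarizes the persistence diagrams; it is not the Euler characteristic of any one complex. It would differ under a different feature set, a different organism distribution, or a different pipeline, so it describes the data's shape as we have encoded it.</p>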
<p><strong>Why β₁ = 33 is the productive dimension.</strong> β₀ describes clustering: informative but unsurprising; processes group by complexity. β₂ = 1 is expected given dimensionality and sample size. β₁ = 33 is where the structure lives: 33 independent ways the circuit data "goes around a hole" without filling it in. Each represents a family of regulatory circuits sharing a structural niche, arranged in a ring with a gap at the center that no single process bridges. Persistence filters signal from noise: the five loops with the highest death-minus-birth values are the ones that resist filling across the widest range of ε, and those are the loops with biological interpretations that hold up.</p>
<p><strong>The deepest point.</strong> Euler's formula says that for any convex polyhedron, however complex, V − E + F = 2. The topology is invariant; it doesn't depend on how you draw the shape. Betti numbers extend this invariance to arbitrary shapes in arbitrary dimensions. They are not sensitive to embedding, orientation, or continuous deformation, only to fundamental topological structure. This is why TDA is a principled choice for data analysis: you are finding invariants, not clusters whose boundaries depend on a threshold, and not model fits that depend on distributional assumptions. The 33 loops are a property of the shape of the data. And that shape, it turns out, reflects the shape of regulatory logic in living cells.</p>
</div>
<!-- ---- NOTES FOR BIOLOGISTS ---- -->
<h3 id="notes-for-biologists" style="color: #2E7D32;">🔬 Notes for Biologists</h3>
<p style="font-style: italic; color: #555;">Five short discussions on the biological meaning and assumptions behind the TDA results.</p>
<div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;">
<h4 style="margin-top: 0.25rem; color: #2E7D32;">1. Why Not Just Cluster the Circuits?</h4>
<p>A natural first instinct is to run k-means or hierarchical clustering on the feature matrix and call similar circuits a group. Clustering finds dense regions; it answers "which processes are near each other?" Topology answers a different question: "what is the shape of the space they occupy?" A ring of processes with a gap in the middle looks like one cluster to k-means (or two clusters if you cut it differently), but it is an H₁ loop to TDA, and the gap in the middle is informative. It means there is no "average" regulatory circuit that sits at the center bridging all the others. The hole is the finding. Clustering erases holes; homology counts them.</p>
</div>
<div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;">
<h4 style="margin-top: 0.25rem; color: #2E7D32;">2. What Do the Five Features Actually Capture?</h4>
<p>Biologists sometimes worry that reducing a regulatory circuit to five numbers loses everything important. That worry is legitimate, and it is exactly why the five features were chosen carefully. Node count captures overall circuit complexity: how many molecular players are involved. Conditional count (edges) captures connectivity: how densely those players communicate. OR gates capture circuits that respond to any one of several signals (alternative pathway logic). AND gates capture circuits that require coincidence of multiple signals (conjunction logic, often associated with tighter control). Loops (back-edges in the Mermaid flowchart) directly count explicit feedback structure. Together these five numbers encode the logical architecture of a circuit: not its molecular identity, but its computational shape. Two circuits from different organisms with the same architecture will be neighbors in feature space. That is a hypothesis, and the coherence check tests it.</p>
</div>
<div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;">
<h4 style="margin-top: 0.25rem; color: #2E7D32;">3. Why Cross-Organism Loops Are the Most Interesting Result</h4>
<p>When E. coli's SOS response and yeast's unfolded protein response land in the same H₁ loop, organism-level confounding cannot explain it: the two pathways share no recent common ancestry, no common regulator, and no shared molecular machinery. What they share is circuit architecture: both are stress-induced, feedback-regulated, quality-control responses. The topology is grouping by regulatory logic that evolution has apparently arrived at independently in bacteria and eukaryotes. This is convergent evolution at the level of circuit shape rather than sequence. Loop #1 and Loop #5, which both mix organisms, are therefore the strongest evidence that the topology is capturing something real about the biology rather than reflecting database composition artifacts.</p>
</div>
<div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;">
<h4 style="margin-top: 0.25rem; color: #2E7D32;">4. What an H₁ Loop Does and Does Not Claim</h4>
<p>An H₁ loop says: these processes occupy a ring-shaped region in feature space, with a gap at the center. It does <em>not</em> say these processes interact biologically, share a common regulator, or form a pathway. SOS and quorum sensing are in the same loop not because they talk to each other but because they have similar circuit architectures: both involve an environmental stress signal, a cascade of regulatory steps, and a feedback that shuts the response off once the stress is resolved. The topology is a statement about structural similarity, not biological interaction. This matters for interpretation: the coherence check asks whether known feedback circuits land in the same loops, not whether the loops predict protein-protein interactions. Keeping that distinction clear is essential when presenting to biologists who may read "loop" as a network connection.</p>
</div>
<div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;">
<h4 style="margin-top: 0.25rem; color: #2E7D32;">5. What the Null Model Result Means in Plain Language</h4>
<p>The null model permutation test asks: if we randomly shuffled which process gets which label, scrambling the biological identity of every circuit while keeping the feature values, how often would we get a coherence score as high as 0.750? In 1,000 random shuffles, a score that high occurred only about 22 times (p = 0.022). In plain language: the fact that known feedback circuits cluster coherently in our H₁ loops is very unlikely to be a coincidence. A skeptic might object that the feature set was chosen to favor feedback detection (loops/back-edges are one of the five features), and that objection is fair, which is why the feature ablation study matters. Dropping the loops feature reduces coherence sharply, but so does dropping node count and conditional count. The signal is distributed across features, not manufactured by a single one. The null model result and the ablation study together make the case that the topology is reflecting something real, not something we built in by construction.</p>
</div>
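<p>The logic of a label-shuffling test can be sketched as below. Everything specific here is an assumption made for illustration: the notes do not define the coherence score, so the sketch uses the fraction of known feedback circuits that land inside a top loop, and the twelve labels are invented rather than taken from GLMP. The toy data are chosen so that the observed score is 0.75, but the resulting p-value has no bearing on the reported p = 0.022.</p>
<pre>
```python
import random

def permutation_p_value(labels, in_loop, n_shuffles=1000, seed=0):
    """One-sided permutation test for a toy coherence score.

    labels  -- 1 if a process is a known feedback circuit, else 0
    in_loop -- 1 if the process belongs to a top H1 loop, else 0
    The score (an assumption of this sketch) is the fraction of known
    feedback circuits that land inside a top loop.
    """
    def score(current_labels):
        hits = sum(lab * member for lab, member in zip(current_labels, in_loop))
        return hits / sum(current_labels)

    observed = score(labels)
    rng = random.Random(seed)      # fixed seed so the sketch is repeatable
    shuffled = list(labels)
    at_least = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)      # scramble which process carries which label
        if score(shuffled) >= observed:
            at_least += 1
    return observed, at_least / n_shuffles

# Hypothetical data for 12 processes (not the GLMP labels).
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
in_loop = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
observed, p_value = permutation_p_value(labels, in_loop)
print(observed, p_value)
```
</pre>
<p>The p-value is simply the share of shuffles that match or beat the observed score, which is how 22 hits in 1,000 shuffles becomes p = 0.022.</p>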
<hr />
<h2 id="key-discoveries-and-innovations">Key Discoveries and
Innovations</h2>
<p><strong>Methodological contribution:</strong> A pipeline was
demonstrated: text to visual flowcharts (Mermaid) to features to
topology. Topology is extracted from descriptions, not direct
measurements.</p>
<p><strong>Novel aspects:</strong> (1) Text-to-visual-to-topology
pipeline; (2) Five features: nodes, conditionals, OR gates, AND gates,
loops; (3) Feedback loops = H1 loops (literal correspondence); (4)
LLM-assisted curation at scale.</p>
<p><strong>Main finding:</strong> With the loop-based feature set, known
feedback circuits cluster coherently: SOS, quorum sensing, biofilm in
Loop #1; ara and Pho