| <!DOCTYPE html> |
| <html lang="en"> |
| <head> |
| <meta charset="UTF-8"> |
| <meta name="viewport" content="width=device-width, initial-scale=1.0"> |
| <title>Welz Presentation Notes - TDA Seminar</title> |
| <style> |
| body { font-family: Georgia, 'Times New Roman', serif; max-width: 42rem; margin: 2rem auto; padding: 0 1.5rem; line-height: 1.6; color: #222; } |
| h1, h2, h3 { font-family: 'Segoe UI', system-ui, sans-serif; } |
| a { color: #0066cc; } |
| .featured-note { border-radius: 4px; } |
| h4 { margin-bottom: 0.5rem; } |
| </style> |
| </head> |
| <body> |
|
|
| <h1 id="welz-presentation-notes">Welz Presentation Notes</h1> |
| <h2 |
| id="feedback-loops-as-loops-topological-data-analysis-of-genetic-regulatory-circuits">Feedback |
| Loops as Loops: Topological Data Analysis of Genetic Regulatory |
| Circuits</h2> |
| <p><strong>Presentation:</strong> Gary Welz | CopernicusAI / CUNY |
| Graduate Center (PoI)<br /> |
| <strong>Date:</strong> February 27, 2026<br /> |
| <strong>Live deck:</strong> <a |
| href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/TDA_Seminar_Slides.html">TDA_Seminar_Slides.html</a><br /> |
| <strong>Preprint:</strong> <a |
| href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/TDA_PREPRINT_DRAFT.html">HTML</a></p> |
| <hr /> |
| <h2 id="presentation-script-26-slides">Presentation Script (29 |
| Slides)</h2> |
| <h3 id="slide-1-title">Slide 1: Title</h3> |
| <p>Feedback Loops as Loops: Topological Data Analysis of Genetic |
| Regulatory Circuits. Gary Welz, CopernicusAI / CUNY Graduate Center |
| (PoI). February 27, 2026.</p> |
| <hr /> |
| <h3 id="slide-2-from-papers-to-flowcharts">Slide 2: From papers to |
| flowcharts</h3> |
| <p>The first attempt at a beta-galactosidase flow chart was made in 1995 |
| and appeared in an article in <em>The X Advisor</em>, an online magazine |
| for Unix developers, entitled “Is the Genome Like a Computer Program?” |
| The article contained excerpts from conversations with biologists on the |
| bionet.genome.chromosome newsgroup. The article is archived at the |
| Internet Archive; the newsgroup discussions are archived by Google. The |
| 1995 chart was created from text alone, which is also how large |
| language models (LLMs) generate such charts today. The source was Berg & Singer |
| (1992, pp. 71–73). This illustrates that diagrams are only as detailed |
| and reliable as their source material; using different sources for the |
| same process can yield different charts. In the original bionet thread, |
| the genome was proposed as a flowchart with genes connected by logical |
| “and” and “or.” Robert Robbins replied that flowcharts require careful |
| interpretation but that bringing computer-science insights to bear on |
| the genome has potentially huge payoffs. G. Dellaire emphasized that |
| genome structure, not just linear sequence, encodes how the code is |
| read, supplying context that is spatial or temporal. The original chart is shown in |
| the slide image.</p> |
| <hr /> |
| <h3 id="slide-3-same-chart-30-years-later">Slide 3: Same chart, 30 years |
| later</h3> |
| <p>The same Lac operon / beta-galactosidase idea is now generated with |
| LLMs and Mermaid Markdown. The original chart was so time-consuming to |
| produce that the approach lay dormant for decades. It is now possible to |
| produce any of these flowcharts from a single prompt in seconds. The Lac |
| Operon flowchart can be viewed in the GLMP viewer via the link on the |
| slide.</p> |
| <hr /> |
| <h3 id="slide-4-the-innovation-text-to-visual-data">Slide 4: The |
| Innovation: Text to Visual Data</h3> |
| <p>Traditional topological data analysis (TDA) starts from numerical |
| data. In this work, the starting point is text (paper descriptions), which |
| is converted into visual flowcharts first. That shift is what makes the |
| rest possible. The pipeline is: text (papers) to visual flowcharts to |
| features to topology. Mermaid Markdown converts textual process |
| descriptions into structured flowcharts. Flowcharts become visual data, |
| and TDA reveals structure. Topology is extracted from descriptions, not |
| from direct measurements. Novel aspects include: a |
| text-to-visual-to-topology pipeline; five features (nodes, conditionals, |
| OR gates, AND gates, loops); feedback loops corresponding literally to |
| H1 loops in homology; and LLM-assisted curation at scale. The approach |
| is conceptually similar to the Politics case study in Carlsson & |
| Vejdemo-Johansson (2021, pp. 199–201) but exhibits these distinct |
| characteristics.</p> |
| <hr /> |
| <h3 id="slide-5-the-question">Slide 5: The Question</h3> |
| <p>The central question is whether the <em>shape</em> of these |
| circuits, as captured by topology, aligns with what biologists already |
| know: feedback loops, cascades, and regulatory motifs. Can regulatory |
| structure (feedback, cascades) be detected from circuit topology? |
| Feedback loops are literally loops; they should appear in H1. The work |
| asks whether text-derived visual data can support that.</p> |
| <hr /> |
| <h3 id="slide-6-the-glmp-database">Slide 6: The GLMP Database</h3> |
| <p>The Genome Logic Modeling Project (GLMP) provides 108 processes, each |
| one a Mermaid flowchart with nodes, conditionals, OR/AND gates, and |
| loops (back-edges). We extract five features per process: nodes, |
| conditionals (aka edges), AND gates, OR gates, loops. The set includes |
| 66 from <em>E. coli</em>, 38 from <em>S. cerevisiae</em>, and 4 from |
| <em>Bacillus subtilis</em>. Examples include lac operon, SOS response, |
| and two-component signaling. A link to the full database table allows |
| any process to be opened for its flowchart. Code is available at |
| github.com/garywelz/glmp.</p> |
| <hr /> |
| <h3 id="slide-7-glmp-references-in-json-and-feedback">Slide 7: GLMP: |
| References in JSON and Feedback</h3> |
| <p>Each process in GLMP is grounded in the literature: the JSON holds |
| PubMed and DOI. The viewer accepts feedback so that flowcharts can be |
| corrected or improved. Flowcharts are thus citable and correctable. In |
| the viewer, Sources & Citations, Metadata, and the |
| Improve-this-process form appear below each flowchart.</p> |
| <hr /> |
| <h3 id="slide-8-from-flowcharts-to-features">Slide 8: From Flowcharts to |
| Features</h3> |
| <p>The full graph structure is not used for TDA. Instead, each flowchart |
| is summarized into five features: nodes, conditionals (aka edges), AND |
| gates, OR gates, and loops (back-edges). Features are standardized to |
| zero mean and unit variance. The matrix is 108 processes × 5 features. |
| These capture circuit complexity and logic structure.</p> |
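| <p>The standardization step can be sketched as follows. This is a minimal illustration on an invented four-process matrix, not GLMP data, and it assumes the population standard deviation (the convention of scikit-learn's <code>StandardScaler</code>), which the notes do not specify.</p> |

```python
import numpy as np

# Hypothetical feature matrix: rows are processes, columns are the five
# features (nodes, conditionals, AND gates, OR gates, loops). Invented values.
features = np.array([
    [12, 15, 2, 1, 1],
    [30, 41, 5, 3, 4],
    [8, 9, 0, 1, 0],
    [20, 26, 3, 2, 2],
], dtype=float)


def standardize(X):
    """Z-score each column: subtract its mean, divide by its standard deviation."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)          # population std (ddof=0), an assumption
    std[std == 0] = 1.0          # leave constant columns at 0 instead of dividing by 0
    return (X - mean) / std


Z = standardize(features)
print(Z.mean(axis=0).round(6))   # every column mean is 0 up to rounding
print(Z.std(axis=0).round(6))    # every column std is 1
```

| <p>Standardizing first keeps the large-valued features (node and edge counts) from dominating the distances that the TDA step computes.</p> |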
| <hr /> |
| <h3 id="slide-9-tda-pipeline">Slide 9: TDA Pipeline</h3> |
| <p>From the feature matrix, a distance is built between every pair of |
| processes. A Vietoris-Rips filtration is run and Ripser is used to |
| obtain persistence diagrams. Representative cocycles are extracted; they identify which |
| processes are involved in each topological loop. Output includes persistence |
| diagrams for H0, H1, and H2, plus the membership of each H1 loop.</p> |
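| <p>A minimal sketch of the first step, assuming a Euclidean distance on the standardized features (the notes do not name the metric). In a Vietoris–Rips filtration each edge enters at exactly its length, so sorting the pairwise distances gives the order in which edges appear. The three feature vectors are hypothetical.</p> |

```python
import numpy as np


def distance_matrix(Z):
    """Pairwise Euclidean distances between the rows of Z."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def edge_filtration(D):
    """Return (distance, i, j) for every pair i < j, sorted by distance.
    In a Vietoris-Rips filtration each edge enters at exactly its length."""
    n = D.shape[0]
    edges = [(float(D[i, j]), i, j) for i in range(n) for j in range(i + 1, n)]
    return sorted(edges)


# Hypothetical standardized feature vectors for three processes.
Z = np.array([[0.0, 0.0, 0.0],
              [3.0, 4.0, 0.0],
              [0.0, 0.0, 2.0]])
D = distance_matrix(Z)
print(D)
print(edge_filtration(D))
```

| <p>If the <code>ripser</code> Python package is available, a precomputed matrix like <code>D</code> can be passed as <code>ripser(D, distance_matrix=True, maxdim=2, do_cocycles=True)</code>; the result is a dictionary whose <code>'dgms'</code> entry holds the persistence diagrams and whose <code>'cocycles'</code> entry holds representative cocycles. Whether the GLMP code passes a distance matrix or the raw features is not stated here.</p> |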
| <hr /> |
| <h3 id="slide-10-what-are-we-counting">Slide 10: What Are We Counting? H₀, H₁, H₂</h3> |
| <p>H₀ counts connected components: are the pieces connected? In GLMP, H₀ |
| starts at 108 and collapses as the Vietoris–Rips radius grows. H₁ counts |
| loops, closed cycles with no filled face; in gene regulation, feedback |
| loops. The 33 H₁ features are these unfilled cycles and are the biologically richest |
| part of the GLMP results. H₂ counts enclosed voids (hollow cavities). In cancer GRN work |
| (Masoomy et al., 2021), H₂ features in healthy cells were interpreted as redundant regulatory |
| structures. GLMP yields a single H₂ feature.</p> |
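| <p>The collapse of H₀ can be reproduced without any TDA library: in a Vietoris–Rips filtration, components merge exactly as in single-linkage clustering (Kruskal's algorithm), and each merge kills one H₀ feature at the length of the joining edge. A sketch on invented 1-D points, assuming Euclidean distance:</p> |

```python
import numpy as np


def h0_persistence(Z):
    """Finite death scales of the 0-dimensional features of a Vietoris-Rips
    filtration. Every point is born at 0; adding edges in order of length,
    each edge that joins two different components kills one of them."""
    n = len(Z)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))
    edges = sorted((D[i, j], i, j) for i in range(n) for j in range(i + 1, n))
    deaths = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                        # edge merges two components
            parent[ri] = rj
            deaths.append(float(d))
    return deaths                           # one component never dies


def components_at(deaths, n_points, eps):
    """Number of connected components (beta_0) at scale eps."""
    return n_points - sum(1 for d in deaths if d <= eps)


# Hypothetical 1-D data: two tight pairs far apart.
Z = np.array([[0.0], [0.5], [10.0], [10.5]])
deaths = h0_persistence(Z)
for eps in [0.0, 0.5, 5.0, 9.5]:
    print(eps, components_at(deaths, len(Z), eps))
```

| <p>Here the count starts at 4 and falls to 1; with the 108 GLMP processes it would start at 108 in the same way.</p> |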
| <hr /> |
| <h3 id="slide-11-mathematical-note-1-betti">Slide 11: Mathematical Note (1): Betti Numbers, History & Geometry</h3> |
| <p>Before looking at our results, a brief mathematical grounding; skip this if you'd prefer and come back to it. Betti numbers are named for Enrico Betti (1823–1892) and were formalized by Poincaré in the 1890s. They count topological “holes” of each dimension: β₀ = connected components (pieces), β₁ = independent loops that don't bound any filled region (1-dimensional holes), β₂ = enclosed voids (2-dimensional holes). Euler's formula for connected planar graphs is χ = V − E + F = 2, where F includes the outer, unbounded face. The triangle has V = 3, E = 3, F = 2, giving χ = 2; the tetrahedron has V = 4, E = 6, F = 4, also χ = 2; the cube has V = 8, E = 12, F = 6, also χ = 2. Via the Betti numbers this generalizes to the Euler characteristic χ = β₀ − β₁ + β₂ − …. The key point for this talk: feedback loops in biology should show up as β₁ features, that is, loops in H₁. That is exactly what we find.</p> |
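| <p>The arithmetic on this slide can be checked directly. This sketch recomputes χ from the (V, E, F) counts given above and from the Betti numbers of a sphere, β = (1, 0, 1), which is the topology of the surface of any convex polyhedron.</p> |

```python
def euler_from_counts(V, E, F):
    """Euler's alternating count V - E + F."""
    return V - E + F


def euler_from_betti(betti):
    """Euler characteristic from Betti numbers: beta_0 - beta_1 + beta_2 - ..."""
    return sum((-1) ** k * b for k, b in enumerate(betti))


# (V, E, F) counts from the slide; F includes the outer face.
shapes = {"triangle": (3, 3, 2), "tetrahedron": (4, 6, 4), "cube": (8, 12, 6)}
for name, (V, E, F) in shapes.items():
    print(name, euler_from_counts(V, E, F))   # 2 for each shape

sphere_betti = [1, 0, 1]   # one component, no loops, one enclosed void
print(euler_from_betti(sphere_betti))       # 2
```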
| <hr /> |
| <h3 id="slide-12-mathematical-note-2-faces">Slide 12: Mathematical Note (2): Faces, 2-Simplices, and H₁</h3> |
| <p>This slide explains why some loops persist and others don't. In a planar graph, a face is a region bounded by edges, including the outer, unbounded region. In homology, faces correspond to 2-simplices, or filled triangles: when three processes are mutually close enough in feature space, the Vietoris–Rips complex inserts a solid triangle among them. When a cycle of edges exists but no collection of 2-simplices fills it in, the loop is not the boundary of any filled region, so it cannot be “explained away,” and it persists as an H₁ feature. Our 33 H₁ loops are exactly those cycles that stay unfilled over a range of scales. Biologically: a feedback circuit A→B→C→A persists in H₁ when there is no shortcut pathway that cuts across the loop and completes a filled triangle. The literal correspondence between feedback in biology and loops in homology is the conceptual core of this work.</p> |
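| <p>To see a 2-simplex kill a loop, Betti numbers can be computed from boundary-matrix ranks: β<sub>k</sub> = (number of k-simplices) − rank ∂<sub>k</sub> − rank ∂<sub>k+1</sub>. This sketch works over the real numbers with NumPy's <code>matrix_rank</code>; Ripser uses coefficients mod 2 by default, which gives the same answer for these small examples. Adding the single triangle (0, 1, 2) drops β₁ from 1 to 0.</p> |

```python
import numpy as np


def boundary_matrix(k_simplices, faces):
    """Signed boundary matrix from k-simplices (columns) to (k-1)-simplices (rows).
    Simplices are sorted vertex tuples."""
    row = {s: r for r, s in enumerate(faces)}
    M = np.zeros((len(faces), len(k_simplices)))
    for c, s in enumerate(k_simplices):
        for i in range(len(s)):
            M[row[s[:i] + s[i + 1:]], c] = (-1) ** i   # drop vertex i, alternate sign
    return M


def betti_numbers(simplices, max_dim):
    """Betti numbers over the reals: beta_k = #k-simplices - rank d_k - rank d_(k+1)."""
    by_dim = [sorted(s for s in simplices if len(s) == k + 1) for k in range(max_dim + 2)]
    rank = [0] * (max_dim + 2)   # rank[k] = rank of the boundary map out of dimension k
    for k in range(1, max_dim + 2):
        if by_dim[k]:
            rank[k] = int(np.linalg.matrix_rank(boundary_matrix(by_dim[k], by_dim[k - 1])))
    return [len(by_dim[k]) - rank[k] - rank[k + 1] for k in range(max_dim + 1)]


hollow = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]   # three edges, no interior
filled = hollow + [(0, 1, 2)]                           # add the 2-simplex
print(betti_numbers(hollow, 2))   # [1, 1, 0]
print(betti_numbers(filled, 2))   # [1, 0, 0]
```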
| <hr /> |
| <h3 id="slide-13-persistence-diagram">Slide 13: Persistence Diagram</h3> |
| <p>The persistence diagram shows one component per process in H0 and 33 |
| loops in H1. The question is whether those H1 loops align with known |
| biology: feedback circuits, stress responses, and so on. H2 yields 1 |
| void, as expected, since few point clouds of this size form persistent 2D cavities.</p> |
| <hr /> |
| <h3 id="slide-14-what-do-the-loops-look-like-1-pca-cocycle-edges">Slide |
| 14: What Do the Loops Look Like? (1) PCA + Cocycle Edges</h3> |
| <p>The persistence diagram tells us H1 has 33 loops but not where they |
| sit in the data. To make homology visible, the 5D feature space is |
| projected to 2D via PCA (principal component analysis, a linear projection onto the |
| directions of maximum variance; distances are only approximately preserved), then the |
| edges of each representative cocycle, the pairs of processes associated with each loop, are drawn. |
| Each color marks one H1 feature: red (#1), blue (#2), green (#3), |
| purple (#4), orange (#5). Lac operon, two-component, and SOS are |
| labeled. <strong>Interactive version (hover for process names):</strong> |
| <a |
| href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/glmp_h1_loops_interactive.html">glmp_h1_loops_interactive.html</a></p> |
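| <p>A generic PCA projection via the singular value decomposition, offered as a sketch; the notes do not say which implementation produced the slide. It also checks a property that matters when reading the plot: an orthogonal projection never lengthens distances, so points that look close in 2D may still be far apart in 5D. The data are random and hypothetical.</p> |

```python
import numpy as np


def pca_2d(X):
    """Project the rows of X onto their top two principal components.
    Returns the 2-D coordinates and the variance fraction of each component."""
    Xc = X - X.mean(axis=0)                       # center the data
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt[:2].T                        # coordinates along the top two directions
    explained = S ** 2 / np.sum(S ** 2)
    return coords, explained[:2]


rng = np.random.default_rng(0)
# Hypothetical 5-D data: most of the variance lies along the first two axes.
X = rng.normal(size=(50, 5)) * np.array([5.0, 3.0, 0.5, 0.3, 0.1])
coords, explained = pca_2d(X)
print(coords.shape, explained.round(3))

# An orthogonal projection never lengthens a distance.
D_full = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
D_2d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
print(bool(np.all(D_2d <= D_full + 1e-9)))
```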
| <hr /> |
| <h3 id="slide-15-what-do-the-loops-look-like-2-mapper-graph">Slide 15: |
| What Do the Loops Look Like? (2) Mapper Graph</h3> |
| <p>The Mapper algorithm builds a simplicial complex: it covers the range of a lens (filter) |
| function with overlapping cubes, clusters the processes that fall in each cube, then connects clusters that share processes. Each node is a cluster of |
| similar processes (node size = process count); edges connect overlapping |
| clusters. Cycles in this graph can reflect topological loops, so the |
| loops in the Mapper graph visualize H1 structure in a different way, |
| complementing the persistence diagram and the cocycle-in-PCA view. |
| Current parameters: n_cubes=12, perc_overlap=0.65 → 18 nodes, 45 edges. |
| <strong>Interactive version (click nodes to see processes, search by |
| name):</strong> <a |
| href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/glmp_mapper_graph_interactive_v2.html">glmp_mapper_graph_interactive_v2.html</a></p> |
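| <p>The number of independent cycles in an undirected graph is its cycle rank, E − V + C, where C is the number of connected components. For the reported 18 nodes and 45 edges this is 45 − 18 + C, which would be 28 if the Mapper graph were a single component; the notes do not say whether it is, and Mapper cycles are not the same objects as the 33 H1 features. A sketch on a hypothetical toy graph, assuming no duplicate edges:</p> |

```python
def cycle_rank(n_nodes, edges):
    """Independent cycles in an undirected graph: E - V + C,
    where C is the number of connected components (found by union-find)."""
    parent = list(range(n_nodes))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    components = len({find(v) for v in range(n_nodes)})
    return len(edges) - n_nodes + components


# Hypothetical toy graph: a 4-cycle plus one pendant node.
toy_edges = [(0, 1), (1, 2), (2, 3), (3, 0), (3, 4)]
print(cycle_rank(5, toy_edges))   # 1
```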
| <hr /> |
| <h3 id="slide-16-top-h1-loop-1-persistence-0.563">Slide 16: Top H1 Loop |
| #1 (Persistence = 0.563)</h3> |
| <p>The most persistent loop aggregates stress response, protein quality |
| control, and DNA repair: SOS response, quorum sensing, biofilm |
| formation, BER, BAM, ribosome assembly, RNA pol recycling, Type III |
| secretion, ubiquitin-proteasome, UPR. The processes come from E. coli and yeast and share a “stress |
| + quality control + feedback” character.</p> |
| <hr /> |
| <h3 id="slide-17-example-sos-response-loop-1">Slide 17: Example: SOS |
| Response (Loop #1)</h3> |
| <p>The SOS response is E. coli’s emergency DNA repair system: damage |
| activates RecA, which inactivates the LexA repressor, inducing repair genes. |
| Classic feedback: once the damage is repaired, LexA re-accumulates and turns the genes off. SOS sits in the top H1 loop |
| alongside quorum sensing, biofilm, UPR, and protein quality control.</p> |
| <hr /> |
| <h3 id="slide-18-top-h1-loop-2-persistence-0.443">Slide 18: Top H1 Loop |
| #2 (Persistence = 0.443)</h3> |
| <p>Six processes: antibiotic efflux pumps, arginine biosynthesis, |
| osmotic stress response, tryptophan biosynthesis, peroxisome biogenesis, |
| vacuolar protein sorting. Metabolic regulation and organelle |
| biogenesis, drawn from E. coli and yeast.</p> |
| <hr /> |
| <h3 id="slide-19-top-h1-loop-3-persistence-0.306">Slide 19: Top H1 Loop |
| #3 (Persistence = 0.306)</h3> |
| <p>Six processes: biofilm formation, DNA replication elongation, |
| flagellar assembly, osmotic stress, sigma factor competition, peroxisome |
| biogenesis. Gene regulation, replication, motility, and stress, drawn from E. coli and |
| yeast.</p> |
| <hr /> |
| <h3 id="slide-20-top-h1-loop-4-persistence-0.279">Slide 20: Top H1 Loop |
| #4 (Persistence = 0.279)</h3> |
| <p>Six processes: phosphate regulation, translation elongation, translation |
| termination, tryptophan biosynthesis, osmotic stress response, sporulation |
| initiation. Gene regulation, translation, stress, and development, drawn from E. coli, |
| yeast, and Bacillus.</p> |
| <hr /> |
| <h3 id="slide-21-top-h1-loop-5-persistence-0.198">Slide 21: Top H1 Loop |
| #5 (Persistence = 0.198)</h3> |
| <p>Five processes: ara operon, maltose regulon, Pho regulon, nitrogen |
| catabolite repression (NCR/TORC1), competence development. Nutrient and |
| developmental regulation; ara and Pho are classic feedback circuits. Drawn from E. |
| coli, yeast, and Bacillus.</p> |
| <hr /> |
| <h3 id="slide-22-example-ara-operon-loop-5">Slide 22: Example: Ara |
| Operon (Loop #5)</h3> |
| <p>AraC acts as a repressor or an activator depending on arabinose, with DNA |
| looping and CRP-cAMP integration. Ara sits in Loop #5 with Pho regulon, |
| maltose regulon, nitrogen catabolite repression, and competence, all |
| nutrient-sensing or developmental decisions with shared regulatory |
| logic.</p> |
| <hr /> |
| <h3 id="slide-23-biological-coherence-check">Slide 23: Biological |
| Coherence Check</h3> |
| <p>With the new loop-based feature set, known feedback circuits cluster |
| coherently: SOS, quorum sensing, biofilm in Loop #1; ara and Pho in Loop |
| #5; trp biosynthesis in Loops #2 and #4. Topology recovers stress, |
| nutrient-sensing, and feedback architecture.</p> |
| <hr /> |
| <h3 id="slide-24-organism-patterns">Slide 24: Organism Patterns</h3> |
| <p>All top five loops mix organisms. Loops #1, #2, and #3: E. coli and yeast. |
| Loops #4 and #5: E. coli, yeast, and Bacillus. Regulatory logic |
| transcends organism boundaries.</p> |
| <hr /> |
| <h3 id="slide-25-why-these-features-work">Slide 25: Why These Features |
| Work</h3> |
| <p>Using loop (back-edge) features instead of NOT gates yields richer |
| persistence values and clearer biological groupings. Stress circuits → |
| Loop #1; nutrient-sensing (ara, Pho) → Loop #5; metabolic feedback (trp) |
| → Loops #2 and #4. The new feature set is a distinct experiment; results |
| are richer and more interpretable.</p> |
| <hr /> |
| <h3 id="slide-26-limitations-and-caveats">Slide 26: Limitations and |
| Caveats</h3> |
| <p>Sample size: 108 processes, enough to reveal structure; scaling to |
| 200-500+ is a priority. Five features (nodes, conditionals, OR gates, |
| AND gates, loops); ablation shows node_count carries the most weight. |
| Graph-theoretic features (cycle rank, longest path, gate ratios) are |
| planned. LLM-generated flowcharts require expert fact-checking. Open |
| question: Does topology predict function or correlate with known |
| biology? The coherence check supports the latter.</p> |
| <hr /> |
| <h3 id="slide-27-next-steps">Slide 27: Next Steps</h3> |
| <p>Directions include Mapper, ablation and null-model validation, richer |
| features. Longer-term goal: flowcharts and TDA as a Rosetta Stone |
| linking topology to genetic “machine code,” the sequence motifs for AND/OR. |
| The idea is falsifiable: circuits in the same H1 loop should share enriched motifs. Null |
| model permutation test: p = 0.022. Graph-theoretic features, persistent |
| cohomology, scaling to 200-500+ planned. Code: |
| github.com/garywelz/glmp/tree/main/tda-analysis.</p> |
| <hr /> |
| <h3 id="slide-28-references">Slide 28: References</h3> |
| <p>Carlsson & Vejdemo-Johansson (2021). Bauer (2021). Berg & |
| Singer (1992). Masoomy et al. (2021). Rivera-Cancel et al. (2014). Swingle et al. (2025). |
| Tralie et al. (2018). Welz (1995).</p> |
| <hr /> |
| <h3 id="slide-29-acknowledgments-and-questions">Slide 29: |
| Acknowledgments and Questions</h3> |
| <p>Jordan Matuszewski; CUNY Graduate Center TDA seminar group; Kevin |
| Gardner and colleagues (ASRC, CCNY). GLMP and TDA analysis: |
| github.com/garywelz/glmp. Contact: Gary Welz | <a href="mailto:garywelz@gmail.com">garywelz@gmail.com</a> | |
| 917-593-2537.</p> |
| <hr /> |
| <h2 id="glossary-of-terms">Glossary of Terms</h2> |
| <h3 id="tda-terms">TDA Terms</h3> |
| <p><strong>Persistent Homology (H0, H1, H2):</strong> H₀ counts connected |
| components, the number of disconnected pieces. In GLMP, H₀ begins at 108 |
| (one per process) and collapses as the Vietoris–Rips radius grows. H₁ |
| counts loops, closed cycles with no filled-in face. In a gene regulatory |
| network, this corresponds to a feedback loop: gene A activates B, B |
| activates C, C represses A; the loop persists because no 2-simplex |
| (filled triangle) caps it off. The 33 H₁ features in GLMP are precisely |
| these unfilled cycles. H₂ counts enclosed voids, hollow cavities like the |
| interior of a sphere. In cancer GRN work (Masoomy et al., 2021), H₂ |
| features in healthy cells were interpreted as redundant regulatory |
| structures. GLMP yields a single H₂ feature. The intuitive ladder: H₀ asks “are the |
| pieces connected?”; H₁ asks “are there feedback loops?”; H₂ asks “are |
| there enclosed cavities?” For GLMP, H₁ is biologically richest: feedback |
| loops are literally loops.</p> |
| <p><strong>Persistence (birth, death):</strong> Birth is the distance scale |
| at which a feature appears; death is the scale at which it disappears. Persistence |
| is death minus birth; long-lived features are treated as significant structure and short-lived ones as likely noise. Loop #1: persistence 0.563; Loop #2: |
| 0.443; Loop #3: 0.306.</p> |
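| <p>Ranking features by persistence is a simple sort. A sketch with invented (birth, death) pairs, plus a hypothetical noise cutoff <code>min_persistence</code> that is an illustration choice, not a GLMP parameter:</p> |

```python
def rank_by_persistence(pairs):
    """Sort (birth, death) pairs by persistence = death - birth, largest first."""
    return sorted(pairs, key=lambda bd: bd[1] - bd[0], reverse=True)


def significant(pairs, min_persistence):
    """Keep only features that survive at least min_persistence (hypothetical cutoff)."""
    return [bd for bd in pairs if bd[1] - bd[0] >= min_persistence]


# Invented H1 pairs (birth, death); not GLMP output.
h1 = [(0.9, 1.1), (0.4, 1.3), (1.0, 1.05), (0.6, 1.2)]
for birth, death in rank_by_persistence(h1):
    print(f"birth={birth:.2f} death={death:.2f} persistence={death - birth:.2f}")
print(significant(h1, 0.5))   # [(0.4, 1.3), (0.6, 1.2)]
```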
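| <p><strong>Vietoris-Rips Complex:</strong> Two points are joined by an edge |
| when their distance is at most ε, and any set of points that are pairwise within ε |
| spans a simplex (three such points give a filled triangle). As ε increases, shapes |
| (clusters, loops, voids) form and disappear. Loops persisting across a wide range of scales are |
| treated as real structure.</p> |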
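| <p>The construction can be sketched at a fixed ε: a set of vertices spans a simplex exactly when all of its pairwise distances are at most ε. Four hypothetical points on a unit square show a loop being born and then filled. At ε = 1 the four sides are present but no triangle is, so the complex is a hollow square (χ = 0, one loop); at ε = 1.5 the diagonals, all four triangles, and the tetrahedron appear and the loop is filled (χ = 1).</p> |

```python
from itertools import combinations

import numpy as np


def rips_complex(points, eps, max_dim=3):
    """All simplices (vertex tuples) of the Vietoris-Rips complex at scale eps:
    a set of vertices spans a simplex when every pair is within eps."""
    n = len(points)
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    simplices = [(i,) for i in range(n)]
    for k in range(2, max_dim + 2):              # k vertices form a (k-1)-simplex
        for verts in combinations(range(n), k):
            if all(D[i, j] <= eps for i, j in combinations(verts, 2)):
                simplices.append(verts)
    return simplices


def euler_characteristic(simplices):
    """Alternating count: vertices - edges + triangles - tetrahedra + ..."""
    return sum((-1) ** (len(s) - 1) for s in simplices)


# Hypothetical points: corners of a unit square (side 1, diagonal about 1.414).
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
for eps in [0.5, 1.0, 1.5]:
    cx = rips_complex(square, eps)
    print(eps, len(cx), euler_characteristic(cx))
```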
| <p><strong>Betti numbers (β₀, β₁, β₂, …):</strong> The ranks of the |
| homology groups; they count “holes” of each dimension. Named for Enrico |
| Betti (1823–1892), formalized by Henri Poincaré in the 1890s. |
| Geometrically: β₀ = connected components (pieces); β₁ = loops, closed |
| paths that do <em>not</em> bound any filled region (1-D holes); β₂ = |
| enclosed voids (2-D holes). A cycle is a closed path with no boundary; a |
| cycle that is not the boundary of any filled region represents a hole. |
| Betti numbers are topological invariants, unchanged under continuous |
| deformation. In our work the persistence diagrams contain 108 H₀ features (one component per process at scale zero), 33 H₁ features (loops), and 1 H₂ feature (void). Strictly, these are counts of features over the whole filtration rather than Betti numbers at a single scale: β₀ = 108 holds only at ε = 0, before any loop has formed.</p> |
| <p><strong>Euler characteristic (χ):</strong> For connected planar |
| graphs, χ = V − E + F = 2, where F includes the <em>outer</em> |
| (unbounded) face. Examples: triangle (V=3, E=3, F=2) → χ=2; square (V=4, |
| E=4, F=2) → χ=2; tetrahedron (4,6,4) → χ=2; cube (8,12,6) → χ=2. This |
| generalizes to χ = β₀ − β₁ + β₂ − … via the Betti numbers. The Euler |
| characteristic is a coarser invariant than the Betti numbers themselves: it records only their alternating sum.</p> |
| <p><strong>Faces and 2-simplices:</strong> In a planar graph, a |
| <em>face</em> is a region bounded by edges, including the outer region. |
| In homology, faces correspond to <em>2-simplices</em> (filled |
| triangles): three vertices within the distance threshold form a triangle |
| whose interior fills in the loop. When a loop is <em>not</em> the boundary of |
| any collection of 2-simplices, so that no triangles fill it in, it persists as an H₁ |
| feature. Our 33 H₁ loops are exactly those cycles that fail to be |
| filled over a range of scales; at any one scale, they are the β₁ contribution to χ.</p> |
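| <p><strong>Cocycles:</strong> The cohomological counterpart of a loop: a representative |
| cocycle assigns values to edges, and its nonzero edges mark the pairs of points associated with a loop. Used to identify which processes (e.g., lac operon) |
| belong to each H1 loop. For visualization: project the 5D feature space to 2D |
| (PCA), then draw edges between each (process A, process B) pair in the |
| cocycle. Strictly, cocycle edges tend to cut across a loop rather than trace it, so the resulting polygon is an approximate picture of the H1 loop.</p> |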
| <p><strong>PCA + Cocycle view:</strong> Makes homology visible by |
| showing where loops sit in a 2D projection. Each colored polygon |
| corresponds to one H1 cycle.</p> |
| <p><strong>Mapper graph:</strong> Nodes = clusters of similar processes; |
| edges = overlapping clusters. Cycles in the Mapper graph often reflect |
| H1 loops in homology, though not one-to-one. Current implementation (n_cubes=12, |
| perc_overlap=0.65): 18 nodes, 45 edges. Interactive version: <a |
| href="https://storage.googleapis.com/regal-scholar-453620-r7-podcast-storage/glmp_mapper_graph_interactive_v2.html">glmp_mapper_graph_interactive_v2.html</a>.</p> |
| <h3 id="biological-terms">Biological Terms</h3> |
| <p><strong>Lac Operon:</strong> Classic gene regulation in E. coli. |
| Controls lactose digestion genes. Demonstrates negative control by the LacI repressor, |
| with positive feedback through the LacY permease; a textbook feedback circuit.</p> |
| <p><strong>Two-Component Signaling (EnvZ-OmpR):</strong> Sensor (EnvZ) |
| detects signal; response (OmpR) controls genes. Feedback: response |
| affects sensor. Paradigm bacterial signaling with feedback.</p> |
| <p><strong>SOS Response:</strong> E. coli emergency DNA repair. |
| Feedback: damage turns genes on; repair turns them off. SOS appears in |
| Loop #1 with other stress responses.</p> |
| <p><strong>Operon:</strong> Group of genes controlled together. Often |
| has feedback. Lac, trp, ara are examples.</p> |
| <p><strong>Quorum Sensing:</strong> Bacteria coordinate by signaling. At |
| quorum, behavior changes (biofilms, toxins). Positive feedback. Appears |
| in Loop #1.</p> |
| <hr /> |
|
|
| |
| |
| |
|
|
| <h2 id="featured-notes">Featured Notes</h2> |
|
|
| <p style="font-style: italic; color: #555;">Extended discussions for readers who want to go deeper. Each note is self-contained and can be read independently of the others.</p> |
|
|
| |
|
|
| <div style="border-left: 4px solid #2E4D7B; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5f8ff;"> |
| <h3 id="note-for-mathematicians" style="margin-top: 0.25rem; color: #2E4D7B;">Note for Mathematicians: Euler's Formula, Betti Numbers, and the Shape of Regulatory Space</h3> |
|
|
| <p>You've probably seen Euler's formula for polyhedra:</p> |
|
|
| <p style="text-align: center; font-size: 1.1rem;"><strong>V − E + F = 2</strong></p> |
|
|
| <p>Vertices minus edges plus faces equals 2. It holds for a cube (8 − 12 + 6 = 2), a tetrahedron (4 − 6 + 4 = 2), and a triangle treated as a flat polyhedron with an outer face (3 − 3 + 2 = 2). It looks like a curiosity until you understand what it is actually counting: a running tally of holes at different dimensions, with alternating signs. V counts 0-dimensional things (points). E counts 1-dimensional things (edges, which can close into loops). F counts 2-dimensional things (faces, which can cap loops or bound voids). The alternating + and − is the algebraic signature of homology.</p> |
|
|
| <p>Enrico Betti generalized this in the 1870s; Poincaré formalized it in the 1890s. Instead of just V, E, F, you get a sequence of numbers β₀, β₁, β₂, …, the <strong>Betti numbers</strong>, one per dimension, and the Euler characteristic generalizes to:</p> |
|
|
| <p style="text-align: center; font-size: 1.1rem;"><strong>χ = β₀ − β₁ + β₂ − β₃ + …</strong></p> |
|
|
| <p>Each Betti number counts independent topological features of that dimension:</p> |
| <ul> |
| <li><strong>β₀</strong>: connected components (pieces). For a single connected shape, β₀ = 1.</li> |
| <li><strong>β₁</strong>: independent loops not bounding any filled region; genuine 1-dimensional holes.</li> |
| <li><strong>β₂</strong>: enclosed voids; genuine 2-dimensional holes, like the interior of a hollow sphere.</li> |
| </ul> |
|
|
| <p>A few examples to build intuition:</p> |
| <ul> |
| <li>Solid triangle: β₀ = 1, β₁ = 0 (loop is filled), β₂ = 0. χ = 1.</li> |
| <li>Hollow triangle (three edges, no interior): β₀ = 1, β₁ = 1 (loop unfilled), β₂ = 0. χ = 0.</li> |
| <li>Hollow sphere: β₀ = 1, β₁ = 0, β₂ = 1. χ = 2.</li> |
| <li>Torus: β₀ = 1, β₁ = 2 (one loop around the tube, one through the hole), β₂ = 1. χ = 0.</li> |
| </ul> |
|
|
| <p>Euler's original V − E + F = 2 is simply χ = β₀ − β₁ + β₂ = 2 for convex polyhedra, where β₁ = 0 (no through-holes) and β₀ = β₂ = 1.</p> |
|
|
| <p><strong>Connection to GLMP.</strong> The 108 processes, embedded in 5-dimensional feature space and passed through the Vietoris–Rips filtration, produce a nested family of simplicial complexes, each with its own Betti numbers. Ripser computes how these change across the filtration:</p> |
| <ul> |
| <li>β₀ = 108 at birth (one component per process), collapsing toward 1 as ε grows and processes connect.</li> |
| <li>33 H₁ features: independent cycles, each born at some ε and later filled in by 2-simplices; persistence measures how long each one survives.</li> |
| <li>1 H₂ feature, an enclosed void, consistent with expectations for 108 points in 5D: few such hollow structures form and persist.</li> |
| </ul> |
|
|
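| <p>It is tempting to form the alternating sum 108 − 33 + 1 = 76, but these counts come from different scales: β₀ = 108 holds only at ε = 0, before any loop has formed, and each loop is born and dies at its own ε. The Euler characteristic is defined one scale at a time, χ(ε) = β₀(ε) − β₁(ε) + β₂(ε), and the persistence diagrams record how that fingerprint evolves as ε grows. The diagrams would differ under a different feature set, a different organism distribution, or a different pipeline; they describe the data's shape as we have encoded it.</p> |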
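| <p>Betti numbers at a single scale can be read off the persistence bars: a feature of dimension k counts toward β<sub>k</sub>(ε) when birth ≤ ε &lt; death. The sketch uses the bars that four points at the corners of a unit square produce under Vietoris–Rips (a toy, not GLMP output) and compares χ(ε) at several scales with the alternating sum of the total bar counts, which need not equal χ at any scale.</p> |

```python
import math


def betti_at(bars, eps):
    """Betti number at scale eps from the (birth, death) bars of one dimension:
    a feature counts when it is alive, i.e. birth <= eps < death."""
    return sum(1 for birth, death in bars if birth <= eps < death)


# Bars that four points at the corners of a unit square produce under
# Vietoris-Rips (a toy example, not GLMP output).
bars = {
    0: [(0.0, 1.0), (0.0, 1.0), (0.0, 1.0), (0.0, math.inf)],  # components merge at eps = 1
    1: [(1.0, math.sqrt(2.0))],   # loop born when the sides appear, filled by the diagonals
    2: [],
}

for eps in [0.5, 1.2, 1.5]:
    betti = [betti_at(bars[k], eps) for k in range(3)]
    chi = betti[0] - betti[1] + betti[2]
    print(eps, betti, chi)

naive = len(bars[0]) - len(bars[1]) + len(bars[2])
print("alternating sum of total bar counts:", naive)
```

| <p>Here χ(ε) takes the values 4, 0, and 1 at the three scales, while the alternating sum of total bar counts is 3, which matches none of them.</p> |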
|
|
| <p><strong>Why β₁ = 33 is the productive dimension.</strong> β₀ describes clustering: informative but unsurprising, since processes group by complexity. β₂ = 1 is expected given dimensionality and sample size. β₁ = 33 is where the structure lives: 33 independent ways the circuit data "goes around a hole" without filling it in. Each represents a family of regulatory circuits sharing a structural niche, arranged in a ring with a gap at the center that no single process bridges. Persistence separates signal from noise: the five loops with the highest death-minus-birth values are the ones that resist filling across the widest range of ε, and those are the loops with biological interpretations that hold up.</p> |
|
|
| <p><strong>The deepest point.</strong> Euler's formula says that for any convex polyhedron, however complex, V − E + F = 2. The topology is invariant: it doesn't depend on how you draw the shape. Betti numbers extend this invariance to arbitrary shapes in arbitrary dimensions. They are not sensitive to embedding, orientation, or continuous deformation, only to fundamental topological structure. This is why TDA is a principled choice for data analysis: you are finding invariants summarized across all scales, not clusters whose boundaries depend on a single threshold, and not model fits that depend on distributional assumptions. The 33 loops are a property of the shape of the data as encoded by these five features and this distance. And that shape, it turns out, appears to reflect the shape of regulatory logic in living cells.</p> |
| </div> |
|
|
| |
|
|
| <h3 id="notes-for-biologists" style="color: #2E7D32;">🔬 Notes for Biologists</h3> |
|
|
| <p style="font-style: italic; color: #555;">Five short discussions on the biological meaning and assumptions behind the TDA results.</p> |
|
|
| <div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;"> |
| <h4 style="margin-top: 0.25rem; color: #2E7D32;">1. Why Not Just Cluster the Circuits?</h4> |
| <p>A natural first instinct is to run k-means or hierarchical clustering on the feature matrix and call similar circuits a group. Clustering finds dense regions: it answers "which processes are near each other?" Topology answers a different question: "what is the shape of the space they occupy?" A ring of processes with a gap in the middle looks like one cluster to k-means (or two clusters if you cut it differently), but it is an H₁ loop to TDA, and the gap in the middle is informative. It means there is no "average" regulatory circuit that sits at the center bridging all the others. The hole is the finding. Clustering erases holes; homology counts them.</p> |
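| <p>A small numerical illustration of "the hole is the finding," using a hypothetical ring of 40 points rather than GLMP data. The centroid, which is where k-means with a single cluster would place its center, sits in the empty middle: every point is several times farther from it than from its nearest neighbor.</p> |

```python
import numpy as np

# Hypothetical ring of 40 points in 2-D (a stand-in for processes on an H1 loop).
theta = np.linspace(0.0, 2.0 * np.pi, 40, endpoint=False)
ring = np.column_stack([np.cos(theta), np.sin(theta)])

centroid = ring.mean(axis=0)                       # the "average circuit"
to_centroid = np.linalg.norm(ring - centroid, axis=1)

D = np.linalg.norm(ring[:, None, :] - ring[None, :, :], axis=-1)
np.fill_diagonal(D, np.inf)                        # ignore self-distances
nearest = D.min(axis=1)                            # spacing to each point's nearest neighbor

print(centroid.round(6))
print(round(float(to_centroid.min()), 3), round(float(nearest.max()), 3))
```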
| </div> |
|
|
| <div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;"> |
| <h4 style="margin-top: 0.25rem; color: #2E7D32;">2. What Do the Five Features Actually Capture?</h4> |
| <p>Biologists sometimes worry that reducing a regulatory circuit to five numbers loses everything important. That worry is legitimate, and it is exactly why the five features were chosen carefully. Node count captures overall circuit complexity: how many molecular players are involved. Conditional count (edges) captures connectivity: how densely those players communicate. OR gates capture circuits that respond to any one of several signals, that is, alternative pathway logic. AND gates capture circuits that require coincidence of multiple signals, that is, conjunction logic, often associated with tighter control. Loops (back-edges in the Mermaid flowchart) directly count explicit feedback structure. Together these five numbers encode the logical architecture of a circuit: not its molecular identity, but its computational shape. Two circuits from different organisms with the same architecture will be neighbors in feature space. That is a hypothesis, and the coherence check tests it.</p> |
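| <p>One way to count back-edges is a depth-first search that flags edges pointing to a node still on the search stack; every directed cycle contains at least one such edge. This is a sketch on hypothetical graphs with single-letter nodes. Whether GLMP derives its loops count this way from the Mermaid source is an assumption, and for overlapping cycles the count can depend on traversal order.</p> |

```python
def count_back_edges(graph, start):
    """Count DFS back-edges (edges into a node still on the search stack)
    reachable from start. graph maps each node to a list of successors."""
    ON_STACK, DONE = 1, 2
    state = {}
    back_edges = 0

    def visit(u):
        nonlocal back_edges
        state[u] = ON_STACK
        for v in graph.get(u, []):
            if state.get(v) == ON_STACK:
                back_edges += 1          # edge closes a directed cycle
            elif v not in state:
                visit(v)
        state[u] = DONE

    visit(start)
    return back_edges


def basic_features(graph, start):
    """Hypothetical helper: node count, edge count, and back-edge (loop) count."""
    edges = sum(len(targets) for targets in graph.values())
    return {"nodes": len(graph), "edges": edges, "loops": count_back_edges(graph, start)}


# Hypothetical circuits: A -> B -> C -> A is a feedback loop; the second is acyclic.
feedback = {"A": ["B"], "B": ["C"], "C": ["A", "D"], "D": []}
cascade = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}
print(basic_features(feedback, "A"))   # {'nodes': 4, 'edges': 4, 'loops': 1}
print(basic_features(cascade, "A"))    # {'nodes': 4, 'edges': 3, 'loops': 0}
```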
| </div> |
|
|
| <div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;"> |
| <h4 style="margin-top: 0.25rem; color: #2E7D32;">3. Why Cross-Organism Loops Are the Most Interesting Result</h4> |
| <p>When E. coli's SOS response and yeast's unfolded protein response land in the same H₁ loop, organism-level confounding cannot explain it: these pathways do not descend from a recent common ancestral pathway, and they share no common regulator and no molecular machinery. What they share is circuit architecture: both are stress-induced, feedback-regulated, quality-control responses. The topology is grouping by regulatory logic that evolution has apparently arrived at independently in bacteria and eukaryotes, which suggests convergent evolution at the level of circuit shape rather than sequence. Loop #1 and Loop #5, which both mix organisms, are therefore the strongest evidence that the topology is capturing something real about the biology rather than reflecting database composition artifacts.</p> |
| </div> |
|
|
| <div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;"> |
| <h4 style="margin-top: 0.25rem; color: #2E7D32;">4. What an H₁ Loop Does and Does Not Claim</h4> |
| <p>An H₁ loop says: these processes occupy a ring-shaped region in feature space, with a gap at the center. It does <em>not</em> say these processes interact biologically, share a common regulator, or form a pathway. SOS and quorum sensing are in the same loop not because they talk to each other but because they have similar circuit architectures: both involve an environmental stress signal, a cascade of regulatory steps, and a feedback that shuts the response off once the stress is resolved. The topology is a statement about structural similarity, not biological interaction. This matters for interpretation: the coherence check asks whether known feedback circuits land in the same loops, not whether the loops predict protein-protein interactions. Keeping that distinction clear is essential when presenting to biologists who may read "loop" as a network connection.</p> |
| </div> |
|
|
| <div style="border-left: 4px solid #2E7D32; padding: 0.75rem 1.25rem; margin: 1.5rem 0; background: #f5fff5;"> |
| <h4 style="margin-top: 0.25rem; color: #2E7D32;">5. What the Null Model Result Means in Plain Language</h4> |
| <p>The null model permutation test asks: if we randomly shuffled which process gets which label, scrambling the biological identity of every circuit while keeping the feature values, how often would we get a coherence score as high as 0.750? In 1,000 random shuffles, the shuffled labels reached that score only about 22 times (p = 0.022). In plain language: the fact that known feedback circuits cluster coherently in our H₁ loops is unlikely to be a coincidence under this null model. A skeptic might object that the feature set was chosen to favor feedback detection (loops/back-edges are one of the five features), and that objection is fair, which is why the feature ablation study matters. Dropping the loops feature reduces coherence sharply, but so does dropping node count and conditional count. The signal is distributed across features, not manufactured by a single one. The null model result and the ablation study together make the case that the topology is reflecting something real, not something we built in by construction.</p> |
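| <p>The logic of the test can be sketched generically. The statistic below is a hypothetical mean difference on invented data, standing in for the GLMP coherence score, whose definition is not given here. With 1,000 shuffles and p = k / 1000, a reported p = 0.022 corresponds to k = 22 under this convention.</p> |

```python
import numpy as np


def permutation_p_value(statistic, labels, data, n_perm=1000, seed=0):
    """Monte Carlo permutation test: shuffle the labels n_perm times and return
    the observed statistic plus the fraction of shuffles scoring at least as high.
    (Some authors use (k + 1) / (n_perm + 1) instead, which is never zero.)"""
    rng = np.random.default_rng(seed)
    observed = statistic(labels, data)
    k = 0
    for _ in range(n_perm):
        if statistic(rng.permutation(labels), data) >= observed:
            k += 1
    return observed, k / n_perm


def mean_difference(labels, data):
    """Hypothetical statistic: mean of items labeled 1 minus mean of items labeled 0."""
    return data[labels == 1].mean() - data[labels == 0].mean()


# Invented data: the four items labeled 1 have clearly higher values.
labels = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
data = np.array([5.1, 4.8, 5.5, 4.9, 1.0, 1.2, 0.8, 1.1, 0.9, 1.3])
observed, p = permutation_p_value(mean_difference, labels, data)
print(round(observed, 3), p)
```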
| </div> |
|
|
| <hr /> |
|
|
| <h2 id="key-discoveries-and-innovations">Key Discoveries and |
| Innovations</h2> |
| <p><strong>Methodological contribution:</strong> A pipeline was |
| demonstrated: text to visual flowcharts (Mermaid) to features to |
| topology. Topology is extracted from descriptions, not direct |
| measurements.</p> |
| <p><strong>Novel aspects:</strong> (1) Text-to-visual-to-topology |
| pipeline; (2) Five features: nodes, conditionals, OR gates, AND gates, |
| loops; (3) Feedback loops = H1 loops (literal correspondence); (4) |
| LLM-assisted curation at scale.</p> |
| <p><strong>Main finding:</strong> With the loop-based feature set, known |
| feedback circuits cluster coherently: SOS, quorum sensing, biofilm in |
| Loop #1; ara and Pho |