
Title: Causal Discovery in Astrophysics: Unraveling Supermassive Black Hole and Galaxy Coevolution

URL Source: https://arxiv.org/html/2410.00965


License: CC BY 4.0

arXiv:2410.00965v2 [astro-ph.GA]

Zehao Jin (金泽灏): New York University Abu Dhabi, P.O. Box 129188, Abu Dhabi, United Arab Emirates; Center for Astrophysics and Space Science (CASS), New York University Abu Dhabi, P.O. Box 129188, Abu Dhabi, UAE; Montréal Institute for Astrophysical Data Analysis and Machine Learning (Ciela), Montréal, Canada

Mario Pasquato (These authors contributed equally to this work.): Montréal Institute for Astrophysical Data Analysis and Machine Learning (Ciela), Montréal, Canada; Montréal Institute for Learning Algorithms (Mila), Quebec Artificial Intelligence Institute, 6666 Rue Saint-Urbain, Montréal, Canada; Département de Physique, Université de Montréal, 1375 Avenue Thérèse-Lavoie-Roux, Montréal, Canada; Dipartimento di Fisica e Astronomia, Università di Padova, Vicolo dell’Osservatorio 5, Padova, Italy; Istituto di Astrofisica Spaziale e Fisica Cosmica (INAF IASF-MI), Via Alfonso Corti 12, I-20133, Milan, Italy

Benjamin L. Davis (These authors contributed equally to this work.): New York University Abu Dhabi, P.O. Box 129188, Abu Dhabi, United Arab Emirates; Center for Astrophysics and Space Science (CASS), New York University Abu Dhabi, P.O. Box 129188, Abu Dhabi, UAE

Tristan Deleu: Montréal Institute for Learning Algorithms (Mila), Quebec Artificial Intelligence Institute, 6666 Rue Saint-Urbain, Montréal, Canada; Département d’Informatique et de Recherche Opérationnelle, Université de Montréal, 2920 Chemin de la Tour, Montréal, Canada

Yu Luo (罗煜): Department of Physics, School of Physics and Electronics, Hunan Normal University, Changsha 410081, China; Purple Mountain Observatory, 10 Yuan Hua Road, Nanjing 210034, China

Changhyun Cho: New York University Abu Dhabi, P.O. Box 129188, Abu Dhabi, United Arab Emirates; Center for Astrophysics and Space Science (CASS), New York University Abu Dhabi, P.O. Box 129188, Abu Dhabi, UAE

Pablo Lemos: Montréal Institute for Astrophysical Data Analysis and Machine Learning (Ciela), Montréal, Canada; Montréal Institute for Learning Algorithms (Mila), Quebec Artificial Intelligence Institute, 6666 Rue Saint-Urbain, Montréal, Canada; Département de Physique, Université de Montréal, 1375 Avenue Thérèse-Lavoie-Roux, Montréal, Canada

Laurence Perreault-Levasseur: Montréal Institute for Astrophysical Data Analysis and Machine Learning (Ciela), Montréal, Canada; Montréal Institute for Learning Algorithms (Mila), Quebec Artificial Intelligence Institute, 6666 Rue Saint-Urbain, Montréal, Canada; Département de Physique, Université de Montréal, 1375 Avenue Thérèse-Lavoie-Roux, Montréal, Canada; Center for Computational Astrophysics, Flatiron Institute, New York, NY, United States of America

Yoshua Bengio: Montréal Institute for Learning Algorithms (Mila), Quebec Artificial Intelligence Institute, 6666 Rue Saint-Urbain, Montréal, Canada; Département d’Informatique et de Recherche Opérationnelle, Université de Montréal, 2920 Chemin de la Tour, Montréal, Canada; Canadian Institute for Advanced Research Artificial Intelligence Chair; Canadian Institute for Advanced Research Senior Fellow

Xi Kang (康熙): Institute for Astronomy, Zhejiang University, Hangzhou 310027, China; Purple Mountain Observatory, 10 Yuan Hua Road, Nanjing 210034, China

Andrea Valerio Macciò: New York University Abu Dhabi, P.O. Box 129188, Abu Dhabi, United Arab Emirates; Center for Astrophysics and Space Science (CASS), New York University Abu Dhabi, P.O. Box 129188, Abu Dhabi, UAE; Max-Planck-Institut für Astronomie, Königstuhl 17, 69117 Heidelberg, Germany

Yashar Hezaveh: Montréal Institute for Astrophysical Data Analysis and Machine Learning (Ciela), Montréal, Canada; Montréal Institute for Learning Algorithms (Mila), Quebec Artificial Intelligence Institute, 6666 Rue Saint-Urbain, Montréal, Canada; Département de Physique, Université de Montréal, 1375 Avenue Thérèse-Lavoie-Roux, Montréal, Canada; Center for Computational Astrophysics, Flatiron Institute, New York, NY, United States of America

(Received September 16, 2024; Revised December 10, 2024; Accepted December 10, 2024)

Abstract

Correlation does not imply causation, but patterns of statistical association between variables can be exploited to infer a causal structure (even with purely observational data) with the burgeoning field of causal discovery. As a purely observational science, astrophysics has much to gain by exploiting these new methods. The supermassive black hole (SMBH)–galaxy interaction has long been constrained by observed scaling relations, that is, low-scatter correlations between variables such as SMBH mass and the central velocity dispersion of stars in a host galaxy’s bulge. This study, using advanced causal discovery techniques and an up-to-date dataset, reveals a causal link between galaxy properties and dynamically-measured SMBH masses. We apply a score-based Bayesian framework to compute the exact conditional probabilities of every causal structure that could possibly describe our galaxy sample. With the exact posterior distribution, we determine the most likely causal structures and notice a probable causal reversal when separating galaxies by morphology. In elliptical galaxies, bulge properties (built from major mergers) tend to influence SMBH growth, while in spiral galaxies, SMBHs are seen to affect host galaxy properties, potentially through feedback in gas-rich environments. For spiral galaxies, SMBHs progressively quench star formation, whereas in elliptical galaxies, quenching is complete, and the causal connection has reversed. Our findings support theoretical models of hierarchical assembly of galaxies and active galactic nuclei feedback regulating galaxy evolution. Our study suggests the potential for further exploration of causal links in astrophysical and cosmological scaling relations, as well as in any other observational science.

Astrostatistics (1882); Black hole physics (159); Galaxies (573); Galaxy evolution (594); Galaxy formation (595); Galaxy physics (612); Galaxy properties (615); Scaling relations (2031); Supermassive black holes (1663)

Journal: ApJ

Software: causal-learn (Zheng et al., 2024), DAG-GFN (Deleu et al., 2022), Gym (Brockman et al., 2016), JAX (Bradbury et al., 2018), Matplotlib (Hunter, 2007), NetworkX (Hagberg et al., 2008), NumPy (Harris et al., 2020), Pandas (McKinney, 2010), pgmpy (Ankur Ankan & Abinash Panda, 2015), PyGraphviz, Python (Van Rossum & Drake, 2009), SciPy (Virtanen et al., 2020), seaborn (Waskom, 2021)

1 Introduction

Purely observational sciences have long relied on correlations between variables to assess the validity of theoretical models. However, observed correlations between two variables do not provide information about the direction of causality, making it impossible to discriminate between different causal mechanisms that predict the same correlational trends through traditional methods. While interventions (such as randomized controlled trials) are commonly used to identify causal factors in laboratory settings, this is rarely possible in observational fields such as astrophysics. Causal inference overcomes this limitation by exploiting the fact that different causal models produce distinct joint distributions of correlated variables with additional observables, allowing us to discriminate between these models and shed light on the direction of causality. With this, it becomes possible to investigate causal relationships in a purely data-driven manner.

The existence of correlations between the mass of central supermassive black holes (SMBHs) and the properties of their host galaxies has long been observationally established (Magorrian et al., 1998; Ferrarese & Merritt, 2000; Gebhardt et al., 2000) and reproduced by specific prescriptions in numerical simulations (Soliman et al., 2023). More recently, Natarajan et al. (2023a, b) presented QUOTAS, a novel research platform for the data-driven investigation of SMBH populations, combining simulations, observations, and machine learning to explore the co-evolution of SMBHs and their hosts. High-redshift, luminous quasar populations at 𝑧 ≥ 3 alongside simulated data of the same epochs are assembled and co-located to demonstrate the connection between SMBH host galaxies and their parent dark matter halos. However, unveiling the causal structure underpinning these correlations has remained an open problem: does galaxy evolution influence the growth of SMBHs by regulating accretion, or do SMBHs shape their host galaxies’ properties via feedback (Silk & Rees, 1998; Di Matteo et al., 2005, 2008; Hopkins et al., 2006, 2007, 2008a, 2008b; Schaye et al., 2010; Gaspari et al., 2013; Kormendy & Ho, 2013; Delvecchio et al., 2014; Heckman & Best, 2014; Aird et al., 2015; Peca et al., 2023; Pouliasis et al., 2024)? With the advent of the James Webb Space Telescope, this debate has been reinvigorated by the detection of massive high-redshift quasars (Larson et al., 2023).

The few attempts at identifying causal relations in the astrophysical literature focus on two variables at a time or on estimating causal coefficients given a causal structure (causal inference). Pasquato & Matsiuk (2019) used a regression discontinuity approach (Imbens & Lemieux, 2008) to measure the causal effect of galactic disk-shocking (Ostriker et al., 1972) on open star cluster properties. A similar approach was followed by Pang et al. (2021) to measure the causal effect of a supernova explosion in the Vela OB2 complex on star formation.

Ellison et al. (2019) used a matching strategy to measure the causal effect of galaxy mergers on active galactic nuclei (AGNs) activity. Matching is a popular way of controlling for confounds in quasi-experimental data, where assignment to treatment is not determined at random (Greenwood, 1945). A precursor to matching in the astrophysical literature is the study of “second-parameter pairs” in globular clusters (Catelan et al., 2001): globular clusters were matched based on metallicity and other properties, looking for the reason one member of the pair displayed a hot horizontal branch and its match would not.

Bluck et al. (2022) utilized a Random Forest classifier to extract causal insights from observations to find the most predictive parameters associated with the quenching of star formation. Gebhard et al. (2022) applied a denoising technique based on causal principles, half-sibling regression (Schölkopf et al., 2016), to exoplanet imaging. In physics, outside of the context of astronomy and cosmology, causal techniques have found direct application in geophysics (Runge et al., 2019) and climate science (Di Capua et al., 2020), and have functioned as a basis for theoretical development in quantum foundations (Spekkens, 2023; Allen et al., 2017; Leifer & Spekkens, 2013; Wood & Spekkens, 2015) and thermodynamics (Janzing, 2007; Allahverdyan & Janzing, 2008). Our work builds upon a preliminary pilot study to explore causal connections in galaxy–SMBH systems (Pasquato et al., 2023; Pasquato, 2024).

In this paper, we present a first-of-its-kind causal study of galaxies and their SMBHs, ultimately finding a compelling data-driven result. In the text, we provide a primer on causal inference/discovery (§2), a detailed accounting of our data (§3), our exact posterior methodology (§4), the causal structures identified (§5), a deeper look into the distribution of identified causal structures (§6), our findings as they pertain to galaxy evolution and AGN feedback (§7), an experiment with semi-analytical models (§8), and a cross-check with alternative causal discovery methods (§9). We then discuss limitations of the data and the method (§10), including unobserved confounders (§10.1), observational errors (§10.2), outliers (§10.3), cyclicity (§10.4), and different priors (§10.5). Finally, we show a possible extension of our methods to more complex scenarios (§11), and conclude by offering some insights and future directions (§12). All uncertainties are quoted at 1𝜎 ≡ 68.3% confidence intervals.

2 Causal Inference and Discovery

The seminal book Causality (Pearl, 2009) introduced operational definitions for the presence of several types of causal relations between different variables.1 While these build on empirically observable statistical dependencies between pairs of variables, they leverage the presence of additional variables to break the symmetry inherent in such associations. For instance, if variables 𝑋 and 𝑌 are dependent when conditioning on any set of other variables 𝑆 (that is, they are persistently associated) and there exists a third variable 𝑍 such that (conditional on some 𝑆 ) 𝑋 and 𝑍 are independent while 𝑍 and 𝑌 are dependent (there is something else, independent of 𝑋 , with which 𝑌 is associated), then 𝑋 is dubbed a potential cause of 𝑌 . In addition to potential cause, Causality (Pearl, 2009) also defines the notions of genuine cause and spurious association.

In the following subsections, we gradually introduce readers to causal structures and the methods to infer them. We start with the three fundamental causal structures (§2.1) and then discuss how larger, more complicated causal structures can be built from these three building blocks (§2.2). In §2.3, we present two common approaches for discovering causal structures from purely observational data: constraint-based methods (§2.3.1) and score-based methods (§2.3.2). Finally, we provide a list of terminology used in the field of causal inference in §2.4 and its Table 2.

2.1 Basic Causal Structures

Causal structures are often represented by Directed Acyclic Graphs (DAGs), which are made of nodes and edges. A directed edge between nodes suggests the direction of causality, i.e., 𝐴 → 𝐵 means 𝐴 causes 𝐵 . There are three basic causal structures: chains, forks, and colliders (Figure 1).

Figure 1: Three basic causal models, their (conditional) independencies, and a more complex graph on the right. In a complex graph, variables are potentially connected through multiple paths made of chains, forks, and colliders. Following the d-separation rules introduced in §2.2 and Table 1, one should find 𝑍₂ ⟂̸⟂ 𝑋, 𝑍₂ ⟂⟂ 𝑋 ∣ (𝑍₃, 𝑍₁), and 𝑍₂ ⟂̸⟂ 𝑋 ∣ (𝑍₃, 𝑍₁, 𝑌), where ⟂⟂ denotes independence and ⟂̸⟂ denotes dependence.

| | Without conditioning on the middle variable | Conditioning on the middle variable |
|---|---|---|
| Chain | unblocked | blocked |
| Fork | unblocked | blocked |
| Collider | blocked | unblocked |

Table 1: In a DAG with multiple nodes, variables can potentially be connected through multiple paths, with several chains, forks, or colliders. Two variables are defined to be d-separated, i.e., (conditionally) independent, if every path between them is blocked. A path is blocked or unblocked based on the rules outlined in the above table. A more detailed description and a case study can be found in §2.2.

• In the case of a chain, 𝑋 causes ( → ) 𝑍, and 𝑍 causes ( → ) 𝑌. In a chain model, 𝑋 and 𝑌 are not independent (𝑋 ⟂̸⟂ 𝑌) without conditioning on 𝑍. For example, consider three standing dominoes in order 𝑋, 𝑍, and 𝑌. The falling of 𝑋 will cause 𝑍 to fall, which in turn will cause 𝑌 to fall. However, when we condition on 𝑍, the other two variables, 𝑋 and 𝑌, will be independent (𝑋 ⟂⟂ 𝑌 ∣ 𝑍). In other words, if we let domino 𝑍 fall, the subsequent domino 𝑌 will fall regardless of whether the prior domino 𝑋 fell or not.

• In the case of a fork, a single variable 𝑍, called a confounder, causally influences two other variables 𝑋 and 𝑌. For instance, consider the influence of rainy weather (𝑍) on both umbrella sales (𝑋) and the number of people jogging outside (𝑌). On rainy days, more umbrellas are likely to be sold, and fewer people will go out for a jog. In a fork model, without conditioning on the confounder 𝑍, the other variables, 𝑋 and 𝑌, will be dependent on each other (𝑋 ⟂̸⟂ 𝑌). If one were to analyze umbrella sales and jogging activity without considering the weather, one would find them to be dependent. However, once we condition on the confounder 𝑍 and compare days with the same weather condition, umbrella sales and jogging activity should be independent of each other (𝑋 ⟂⟂ 𝑌 ∣ 𝑍).

• A collider refers to the case in which two variables, 𝑋 and 𝑌, independently cause a third variable 𝑍. Consider the tossing of two fair coins 𝑋 and 𝑌, and a bell 𝑍 that rings whenever both coins land on heads (this example is still valid when 𝑍 is a bell that rings whenever at least one of the coins lands on heads; see Pearl et al. (2016) for a detailed Bayesian proof). Without revealing whether the bell rings, the head/tail states of the two coins are independent of each other (𝑋 ⟂⟂ 𝑌), as one expects of separate coin tosses. However, if we condition on the bell 𝑍 not ringing, knowing that one of the coins landed on heads immediately informs us that the other coin landed on tails (𝑋 ⟂̸⟂ 𝑌 ∣ 𝑍); otherwise the bell would have rung.

These three causal models each encode (conditional) independencies as discussed above and summarized in Figure 1. Notice that chains and forks share the same (conditional) independencies, while colliders have a different set of (conditional) independencies. Chains and forks therefore belong to the same Markov Equivalence Class (MEC), while colliders belong to a different MEC. Note that these examples operate under the Markov assumption: 𝑋 ⟂⟂_Graph 𝑌 ∣ 𝑍 ⇒ 𝑋 ⟂⟂_Data 𝑌 ∣ 𝑍, meaning that the (conditional) independencies encoded in a causal graph should appear in its data.2
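The collider behavior in the coin-and-bell example can be checked numerically. The following is a minimal simulation sketch, not part of the original analysis; the sample size, the seed, and the helper name `p_y_heads` are illustrative assumptions.

```python
import random

# Illustrative assumptions: 100,000 tosses and a fixed seed for repeatability.
rng = random.Random(42)
n = 100_000

# Each row is (X, Y): True means the coin landed on heads.
tosses = [(rng.random() < 0.5, rng.random() < 0.5) for _ in range(n)]


def p_y_heads(rows):
    """Fraction of rows in which coin Y landed on heads."""
    rows = list(rows)
    return sum(y for _, y in rows) / len(rows)


# Without conditioning on the bell, knowing X tells us nothing about Y.
p_y = p_y_heads(tosses)
p_y_given_x = p_y_heads(t for t in tosses if t[0])

# Condition on the collider: keep only tosses where the bell Z stayed silent,
# i.e., the coins did not both land on heads.
silent = [(x, y) for x, y in tosses if not (x and y)]
p_y_silent = p_y_heads(silent)
p_y_silent_given_x = p_y_heads(t for t in silent if t[0])

print(f"P(Y=H)            = {p_y:.3f}")
print(f"P(Y=H | X=H)      = {p_y_given_x:.3f}")
print(f"P(Y=H | Z=0)      = {p_y_silent:.3f}")
print(f"P(Y=H | X=H, Z=0) = {p_y_silent_given_x:.3f}")
```

The first two probabilities should agree to within sampling noise, while the last two differ sharply: once the bell is known to be silent, a head on 𝑋 forces a tail on 𝑌.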

2.2 Composite Causal Structures

In cases with more than three variables, such as on the right side of Figure 1, variables can potentially be connected through multiple paths, with several chains, forks, or colliders. Following the (conditional) independencies encoded by chains, forks, and colliders, a path is blocked when conditioning on the middle variable of a chain or a fork and unblocked when not conditioning on the middle variable of a chain or a fork. Furthermore, a path is blocked when not conditioning on the middle variable of a collider and unblocked when conditioning on the middle variable of a collider. Two variables are defined to be d-separated if every path between them is blocked, and d-separated variables are (conditionally) independent.3 The above rules are summarized in Table 1.

In Figure 1, one will find that 𝑍₂ and 𝑋 are dependent (𝑍₂ ⟂̸⟂ 𝑋) without any conditioning, since there is an unblocked chain path 𝑍₂ → 𝑍₃ → 𝑋. One should also find that they become independent given 𝑍₃ and 𝑍₁, i.e., 𝑍₂ ⟂⟂ 𝑋 ∣ (𝑍₃, 𝑍₁). Conditioning on 𝑍₃ blocks the 𝑍₂ → 𝑍₃ → 𝑋 chain. Over the 𝑍₂ → 𝑍₃ ← 𝑍₁ → 𝑋 path, although conditioning on 𝑍₃ unblocks the 𝑍₂ → 𝑍₃ ← 𝑍₁ collider, the conditioning on 𝑍₁ blocks the 𝑍₃ ← 𝑍₁ → 𝑋 fork, making this path blocked. The remaining 𝑍₂ → 𝑌 ← 𝑊 ← 𝑋 path is blocked by the collider 𝑍₂ → 𝑌 ← 𝑊 without conditioning on 𝑌. Conversely, additionally conditioning on 𝑌 unblocks this collider, so 𝑍₂ and 𝑋 become dependent again: 𝑍₂ ⟂̸⟂ 𝑋 ∣ (𝑍₃, 𝑍₁, 𝑌).
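The three statements about 𝑍₂ and 𝑋 can be verified mechanically. The sketch below is not the authors' code. It tests d-separation with the standard moralized-ancestral-graph criterion, which is equivalent to the path-blocking rules of Table 1. The edge list is an assumption: it contains only the edges implied by the paths named in the text, and the graph in Figure 1 may contain others.

```python
from itertools import combinations

# Assumption: edges read off from the paths named in the text (parent, child).
EDGES = [("Z2", "Z3"), ("Z3", "X"), ("Z1", "Z3"), ("Z1", "X"),
         ("Z2", "Y"), ("W", "Y"), ("X", "W")]


def d_separated(edges, x, y, given=()):
    """Return True if x and y are d-separated by the set `given` in the DAG."""
    given = set(given)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
        parents.setdefault(a, set())

    # 1. Keep only x, y, the conditioning set, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents[node])

    # 2. Moralize: link each child to its parents, marry co-parents,
    #    and drop edge directions.
    nbrs = {node: set() for node in keep}
    for child in keep:
        for p in parents[child]:
            nbrs[p].add(child)
            nbrs[child].add(p)
        for p, q in combinations(parents[child], 2):
            nbrs[p].add(q)
            nbrs[q].add(p)

    # 3. Search for a path from x to y that avoids the conditioning set.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False  # a connecting path survives: d-connected
        if node in seen or node in given:
            continue
        seen.add(node)
        stack.extend(nbrs[node])
    return True


print(d_separated(EDGES, "Z2", "X"))                     # no conditioning
print(d_separated(EDGES, "Z2", "X", {"Z3", "Z1"}))       # given Z3, Z1
print(d_separated(EDGES, "Z2", "X", {"Z3", "Z1", "Y"}))  # given Z3, Z1, Y
```

Under the assumed edge list, the three calls return `False`, `True`, and `False`, matching the dependent, independent, dependent pattern described for Figure 1.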

2.3 Causal Discovery from Observational Data

Causal discovery is most easily achieved through interventions. However, in observational fields such as astrophysics, interventions are rarely possible (Cheng et al., 2018). Therefore, the field of causal discovery aims to reveal the causal relations between variables from purely observational data without interventions using alternative strategies. As the universe is one of the most challenging entities for humankind to intervene with, there is perhaps no field of science that stands to gain more from the study of causation than astronomy (Glymour, 2012).

2.3.1 Constraint-based Methods

One of the most straightforward approaches to discovering causal structures from observational data is conducting conditional independence tests among variables since different MECs encode distinct conditional independencies.4 These approaches are generally referred to as constraint-based methods.

A commonly used constraint-based method is the Peter-Clark (PC) algorithm (Spirtes et al., 2001). The PC algorithm consists of three steps:

1. Start with a fully-connected, undirected graph among all variables, and remove edges based on conditional independence tests to arrive at a graph skeleton.

2. Identify colliders with conditional independence tests and orient them.

3. Orient edges that are incident on colliders such that no new colliders will be constructed.

Note that in addition to the Markov assumption and faithfulness assumption, the PC algorithm further assumes causal sufficiency (i.e., no unobserved confounders) and acyclicity.
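The first two steps of the PC algorithm can be sketched on a toy three-variable problem. The code below is a simplified illustration, not the implementation used in this paper (which relies on causal-learn). It assumes linear-Gaussian data, uses a Fisher-z test of (partial) correlation, limits conditioning sets to at most one variable, and omits the third step because a three-node collider leaves no further edges to orient. All function names and the synthetic data are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)


def independent(data, i, j, cond, alpha=0.001):
    """Fisher-z test of zero (partial) correlation; cond holds 0 or 1 names."""
    r = corr(data[i], data[j])
    if cond:
        (k,) = cond
        rik, rjk = corr(data[i], data[k]), corr(data[j], data[k])
        r = (r - rik * rjk) / math.sqrt((1 - rik ** 2) * (1 - rjk ** 2))
    z = math.atanh(r) * math.sqrt(len(data[i]) - len(cond) - 3)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_value > alpha


def pc_skeleton_and_colliders(data, alpha=0.001):
    names = sorted(data)
    # Step 1: start fully connected, then prune edges by independence tests.
    adj = {v: set(names) - {v} for v in names}
    sepset = {}
    for size in (0, 1):
        for i, j in combinations(names, 2):
            if j not in adj[i]:
                continue
            candidates = sorted((adj[i] | adj[j]) - {i, j})
            for cond in combinations(candidates, size):
                if independent(data, i, j, cond, alpha):
                    adj[i].discard(j)
                    adj[j].discard(i)
                    sepset[frozenset((i, j))] = set(cond)
                    break
    # Step 2: orient a - c - b as a -> c <- b when a and b are non-adjacent
    # and c was not needed to separate them.
    arrows = set()
    for c in names:
        for a, b in combinations(sorted(adj[c]), 2):
            if b not in adj[a] and c not in sepset[frozenset((a, b))]:
                arrows |= {(a, c), (b, c)}
    return adj, arrows


# Synthetic collider X -> Z <- Y (illustrative data, fixed seed).
rng = random.Random(0)
n = 5000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
z = [a + b + rng.gauss(0, 1) for a, b in zip(x, y)]
adj, arrows = pc_skeleton_and_colliders({"X": x, "Y": y, "Z": z})
print(sorted(arrows))
```

With data generated from a collider, the sketch is expected to drop the 𝑋–𝑌 edge and orient both remaining edges into 𝑍. Because each decision rests on a hypothesis test, a different sample or significance level can change the outcome, which is the fragility of point estimates noted later in this section.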

Another constraint-based method, the Fast Causal Inference (FCI) algorithm (Spirtes, 2001), relaxes the assumption of causal sufficiency, allowing unobserved confounders. The FCI algorithm is based on the same independence testing procedure as the PC algorithm but differs at the stage of labeling and orienting edges.

It has also been proven that the PC and FCI algorithms are sound and complete under cyclic settings (M. Mooij & Claassen, 2020). However, these methods only provide a point estimate of the true MEC without quantifying its uncertainty; this becomes particularly problematic when the number of data points is small and the reliability of conditional independence tests degrades.

2.3.2 Score-based Methods

Instead of finding a single causal structure, we can adopt a Bayesian perspective and define a posterior over all possible DAGs, 𝑃 ⁢ ( 𝐺 ∣ 𝐷 ) . To do this, score-based methods assign a numerical score to every DAG given the data. There are several possible ways to define such a score, such as the Bayesian Information Criterion (BIC) score (Chickering, 2002), generalized score (Huang et al., 2018), and the Bayesian Gaussian equivalent (BGe) score (Geiger & Heckerman, 1994, 2002; Kuipers et al., 2014).5
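To make the idea of a posterior over DAGs concrete, the sketch below enumerates every DAG on three variables, scores each with a linear-Gaussian BIC, and normalizes exp(score) under a uniform prior. This is an illustrative stand-in only: it uses the BIC score rather than the BGe score adopted in this paper, and the synthetic data, function names, and sample size are assumptions.

```python
import math
import random
from itertools import product

NODES = ("X", "Y", "Z")
PAIRS = [("X", "Y"), ("X", "Z"), ("Y", "Z")]


def is_acyclic(edges):
    """Peel off parentless nodes until none remain; failure means a cycle."""
    nodes, edges = set(NODES), set(edges)
    while nodes:
        roots = {v for v in nodes if all(child != v for _, child in edges)}
        if not roots:
            return False
        nodes -= roots
        edges = {(p, c) for p, c in edges if p not in roots}
    return True


def all_dags():
    """Yield every DAG on NODES as a frozenset of (parent, child) edges."""
    for choice in product((None, "forward", "backward"), repeat=len(PAIRS)):
        edges = set()
        for (a, b), direction in zip(PAIRS, choice):
            if direction == "forward":
                edges.add((a, b))
            elif direction == "backward":
                edges.add((b, a))
        if is_acyclic(edges):
            yield frozenset(edges)


def cov(data, names):
    """Maximum-likelihood covariance matrix of the named columns."""
    n = len(data[NODES[0]])
    mean = {v: sum(data[v]) / n for v in names}
    return [[sum((p - mean[a]) * (q - mean[b])
                 for p, q in zip(data[a], data[b])) / n
             for b in names] for a in names]


def det(m):
    """Determinant for the 0x0 to 3x3 matrices needed here."""
    if not m:
        return 1.0
    if len(m) == 1:
        return m[0][0]
    if len(m) == 2:
        return m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def bic(data, dag):
    """Linear-Gaussian BIC: maximized log-likelihood minus a size penalty."""
    n = len(data[NODES[0]])
    score = 0.0
    for child in NODES:
        parents = sorted(p for p, c in dag if c == child)
        # Residual variance of child given parents via a determinant ratio.
        resid_var = det(cov(data, [child] + parents)) / det(cov(data, parents))
        n_params = len(parents) + 2  # slopes, intercept, noise variance
        score += -0.5 * n * (math.log(2 * math.pi * resid_var) + 1)
        score -= 0.5 * n_params * math.log(n)
    return score


def posterior(data):
    """Normalize exp(BIC) over all DAGs, assuming a uniform prior."""
    scores = {dag: bic(data, dag) for dag in all_dags()}
    top = max(scores.values())
    weights = {dag: math.exp(s - top) for dag, s in scores.items()}
    norm = sum(weights.values())
    return {dag: w / norm for dag, w in weights.items()}


# Synthetic collider X -> Z <- Y (illustrative data, fixed seed).
rng = random.Random(0)
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
z = [a + b + rng.gauss(0, 1) for a, b in zip(x, y)]
data = {"X": x, "Y": y, "Z": z}

post = posterior(data)
collider = frozenset({("X", "Z"), ("Y", "Z")})
chain = frozenset({("X", "Z"), ("Z", "Y")})
fork = frozenset({("Z", "X"), ("Z", "Y")})
print(len(post), post[collider], post[chain])
```

Two properties are worth noting. The chain 𝑋 → 𝑍 → 𝑌 and the fork 𝑋 ← 𝑍 → 𝑌 receive the same score because they belong to the same MEC, and the collider, which generated the data, receives far more posterior mass than either.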

In an exact posterior approach, one evaluates the chosen score for every possible DAG, i.e., 𝑃(𝐺 ∣ 𝐷). However, the cost of an exact search grows super-exponentially as the number of variables (nodes) increases. For example, the number of possible DAGs for three variables is only 25 but exceeds 4 × 10¹⁸ for ten variables6, making an exact posterior search quickly computationally intractable. For the current study, the total number of DAGs for seven variables is 1,138,779,265, which is near the limit of computational feasibility.
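The DAG counts quoted above can be reproduced with Robinson's recurrence for the number of labeled DAGs on 𝑛 nodes. The recurrence is a standard combinatorial result rather than something stated in this paper, and the function name `count_dags` is illustrative.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes, via Robinson's recurrence:
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k(n-k)) * a(n-k), a(0) = 1.
    """
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n_vars in (3, 7, 10):
    print(n_vars, count_dags(n_vars))
```

For three and seven variables this gives 25 and 1,138,779,265, the values quoted in the text, and the ten-variable count exceeds 4 × 10¹⁸.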

As a result, sampling algorithms have been developed to approximate the exact posterior distribution without going over all DAGs. Such approximation is often done with Markov Chain Monte Carlo (MCMC) methods, such as the MC3 algorithm (Madigan et al., 1995) and Gadget (Viinikka et al., 2020). More recently, Deleu et al. (2022) developed DAG-GFN as an alternative to MCMC, and showed that DAG-GFN compares favorably against other methods based on MCMC. Here, we also show that DAG-GFN does recover the exact posterior distribution fairly well under the SMBH–galaxy context in §11. Some benchmark studies on causal discovery algorithms can be found in several references (Emezue et al., 2023; Vowels et al., 2022; Menegozzo et al., 2022; Runge et al., 2019; Ahmed et al., 2020; Kalainathan et al., 2020; Scutari, 2017).

2.4 List of Terminology

See Table 2 for a list of causal inference terminology and their explanations.

Table 2: Causal Inference Terminology

| Term | Definition |
|---|---|
| DAG | A Directed Acyclic Graph (DAG) is a graphical representation of causal relationships between variables. It consists of nodes and edges, with each edge directed from one node to another (“directed”), such that following those directions will never form a closed loop (“acyclic”). Directed edges suggest the direction of causality, i.e., 𝐴 → 𝐵 means 𝐴 causes 𝐵. |
| MEC | A Markov Equivalence Class (MEC) is a set of DAGs that encode the same conditional independence relationships among variables, making them observationally indistinguishable based on data alone. See Figure 3 for examples of MECs and their corresponding DAGs. |
| PDAG | A Partially Directed Acyclic Graph (PDAG) is a graph that contains both directed and undirected edges to represent a MEC. An undirected edge 𝐴 — 𝐵 suggests both directions are possible (either 𝐴 → 𝐵 or 𝐴 ← 𝐵), except when an orientation would create a new collider and thereby introduce a different MEC (different conditional independencies). For example, 𝐴 — 𝐵 — 𝐶 corresponds to 𝐴 → 𝐵 → 𝐶, 𝐴 ← 𝐵 ← 𝐶, and 𝐴 ← 𝐵 → 𝐶, but not 𝐴 → 𝐵 ← 𝐶. Examples of PDAGs can be found in the left column of Figure 3. |
| PAG | A Partial Ancestral Graph (PAG) is a graphical model that represents equivalence classes of causal structures when unmeasured confounders or selection biases are present, capturing possible ancestral relationships among variables. There are two additional edge types. The edge 𝐴 ⟷ 𝐵 corresponds to a confounding relation (i.e., a third variable causes both 𝐴 and 𝐵). Empty circles ( ∘ ) represent uncertainty regarding the ending symbol of the edge (i.e., 𝐴 ∘→ 𝐵 may correspond to either 𝐴 → 𝐵 or 𝐴 ⟷ 𝐵, but rules out 𝐵 → 𝐴). Examples of PAGs can be found in the bottom row of Figure 11. |
| Child | A variable that is directly influenced by another variable (its causal parent) in a causal relationship. For example, if 𝐴 → 𝐵 → 𝐶, then 𝐵 is the child of 𝐴, and 𝐶 is the child of 𝐵. |
| Parent | A variable that directly influences another variable (its causal child) in a causal relationship. For example, if 𝐴 → 𝐵 → 𝐶, then 𝐴 is the parent of 𝐵, and 𝐵 is the parent of 𝐶. |
| Ancestor | A variable that influences another variable either directly or indirectly through one or more causal paths. For example, if 𝐴 → 𝐵 → 𝐶, then both 𝐴 and 𝐵 are ancestors of 𝐶. |
| Descendant | A variable that is influenced by another variable either directly or indirectly through one or more causal paths. For example, if 𝐴 → 𝐵 → 𝐶, then both 𝐵 and 𝐶 are descendants of 𝐴. |
| PC, FCI | The Peter-Clark (PC) algorithm (Spirtes et al., 2001) and the Fast Causal Inference (FCI) algorithm (Spirtes, 2001) are two time-tested causal discovery algorithms. They are described in detail in §2.3.1. |
| BGe score | The Bayesian Gaussian equivalent (BGe) score (Geiger & Heckerman, 1994, 2002; Kuipers et al., 2014) gives the exact posterior probabilities of every DAG given the data, 𝑃(𝐺 ∣ 𝐷). The BGe score is explained in detail in §2.3.2 and §4.2. |

3 Data

Our data is composed of an up-to-date sample of most known galaxies with dynamically-measured SMBH masses.7 These include 145 nearby galaxies with a median luminosity distance of 21.5 Mpc that host SMBHs with directly resolved spheres of influence. From this parent sample, 101 out of 145 galaxies have all of the seven desired variables of interest for our study: dynamically-measured black hole mass ( 𝑀 ∙ ), central stellar velocity dispersion ( 𝜎 0 ), effective (half-light) radius of the spheroid8 ( 𝑅 e ), the average projected density within 𝑅 e ( ⟨ Σ e ⟩ ), total stellar mass ( 𝑀 ∗ ), color ( 𝑊 ⁢ 2 − 𝑊 ⁢ 3 ), and specific star formation rate (sSFR).

We include the three parameters that define the fundamental plane of elliptical galaxies (Djorgovski & Davis, 1987), i.e., central stellar velocity dispersion ( 𝜎 0 ), effective (half-light) radius of the spheroid9 ( 𝑅 e ), and the average projected density within 𝑅 e ( ⟨ Σ e ⟩ ). 𝑀 ∙ values are compiled by a series of progressive studies on black hole mass scaling relations (Graham & Scott, 2013; Scott et al., 2013; Savorgnan et al., 2013, 2016; Sahu et al., 2019a, b, 2020; Graham & Sahu, 2023a; Davis et al., 2017, 2018, 2019a, 2019b; Davis & Jin, 2023, 2024). 𝜎 0 values are collected from several works (Davis et al., 2017, 2019b; Sahu et al., 2019b), which are obtained primarily from the HyperLeda database (Makarov et al., 2014) and homogenized to produce an estimate of the mean velocity dispersion within an aperture of 595 pc. 𝑅 e and ⟨ Σ e ⟩ measurements were produced via the multi-component decompositions of surface brightness light profiles (primarily of 3.6 ⁢ 𝜇 ⁢ m Spitzer Space Telescope imaging) from succeeding works (Savorgnan & Graham, 2016; Davis et al., 2019a; Sahu et al., 2019a; Graham & Sahu, 2023b).10

This choice of parameters also allows us to explore the well-known 𝑀∙–𝜎₀ relation (Ferrarese & Merritt, 2000; Gebhardt et al., 2000). Indeed, one impetus for this study of SMBH–galaxy causality was the significant difference in intrinsic scatter (𝜖) observed in elliptical galaxies (𝜖 = 0.31 dex) vs. spiral galaxies (𝜖 = 0.67 dex) as determined by Sahu et al. (2019b). This implies that the 𝑀∙–𝜎₀ relation is ≈ 2.3 times less accurate for predicting SMBH masses in spiral galaxies as opposed to elliptical galaxies. As shown in this work, this difference in the scatter of the relationship between morphological types foreshadows their inherent dichotomy in causal structures.
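The text does not spell out how the factor of ≈ 2.3 is obtained. One plausible reading, given here as an assumption, is that the difference between the two intrinsic scatters in dex is converted into a linear ratio:

```latex
\frac{10^{\epsilon_{\rm S}}}{10^{\epsilon_{\rm E}}}
  = 10^{\,0.67 - 0.31}
  = 10^{\,0.36}
  \approx 2.3
```

Here 𝜖_S and 𝜖_E are shorthand, introduced only for this illustration, for the intrinsic scatter in spiral and elliptical galaxies, respectively.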

The remaining parameters we explore concern properties related to the star-formation rate (SFR) in galaxies. For this, we consider data from the Wide-field Infrared Survey Explorer (WISE; Wright et al., 2010) to provide the color, total stellar masses (𝑀∗), and SFRs for our galaxies. These WISE values are all compiled from Graham et al. (2024): 𝑀∗ is derived from the prescriptions of Jarrett et al. (2023) for W1 (3.4 𝜇m) photometry and colors from WISE, and SFRs accounting for activity over the past 100 Myr are ascertained via the WISE total integrated fluxes as per the calibrations of Cluver et al. (2017, 2024). For WISE colors, we considered both W1 − W2 (3.4 𝜇m − 4.6 𝜇m) and W2 − W3 (4.6 𝜇m − 12.1 𝜇m) colors, but ultimately elected to conduct our analyses with only the latter color, which exhibits a greater range of diversity across morphological classes of galaxies. Rather than absolute SFR, we convert to specific star-formation rate (sSFR) by normalizing each SFR by the stellar mass of each galaxy (i.e., sSFR ≡ SFR/𝑀∗).

We split the sample of galaxies into three morphological classes:

• highly-evolved, massive, gas-poor elliptical (E) galaxies, which have been exposed to the full range of feedback and merging processes throughout their long histories spanning large fractions of the age of the Universe,

• spiral (S) galaxies, at the opposite end of galaxy morphological classification schemes (Jeans, 1928; Hubble, 1936; Graham, 2019), which are unlikely to have encountered any major mergers and still retain a large fraction of their gas, and

• lenticular (S0) galaxies, which represent a bridging population between E and S types.

Altogether, this gives us a sample of 35 elliptical, 38 lenticular, and 28 spiral galaxies for a total of 101 galaxies, each with six physical measurements of the host galaxy plus a dynamically-measured SMBH mass (see Table 3 and its pairplot in Figure 2). Although this division splits our already small sample into even smaller subsets, this exploration of morphologically-distinct causality is the overarching goal of our study. All morphologies have been determined by the multi-component decompositions of surface brightness light profiles (Savorgnan & Graham, 2016; Davis et al., 2019a; Sahu et al., 2019a; Graham & Sahu, 2023b). Our general classification scheme defines elliptical galaxies as spheroids (with or without embedded disk components), lenticular galaxies as spheroids with extended disk components (without spiral structure), and spiral galaxies as disk galaxies (with classical bulges, pseudobulges, or no bulges) with spiral structure. For our purposes in this study, we have not considered barred morphologies as a distinct classification element.

Table 3: Sample of 101 Galaxies with Dynamical SMBH Mass Measurements

| Galaxy | log(𝑀∙) [M☉] | log(𝜎₀) [km s⁻¹] | log(𝑅e) [kpc] | log(⟨Σe⟩) [M☉ pc⁻²] | W2 − W3 [mag] | log(𝑀∗) [M☉] | log(sSFR) [yr⁻¹] |
|---|---|---|---|---|---|---|---|
| (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) |
| 35 Elliptical Galaxies | | | | | | | |
| IC 1459 | 9.38 ± 0.18 | 2.47 ± 0.01 | 0.89 ± 0.09 | 2.89 ± 0.13 | 0.39 ± 0.06 | 11.24 ± 0.08 | −12.00 ± 0.10 |
| IC 4296 | 9.10 ± 0.09 | 2.52 ± 0.01 | 0.96 ± 0.31 | 2.87 ± 0.09 | 0.02 ± 0.08 | 11.47 ± 0.08 | −12.47 ± 0.11 |
| NGC 821 | 7.59 ± 0.19 | 2.30 ± 0.01 | 0.54 ± 0.01 | 2.91 ± 0.09 | 0.27 ± 0.13 | 10.64 ± 0.08 | −11.65 ± 0.11 |
| NGC 1275 | 8.90 ± 0.24 | 2.39 ± 0.02 | 1.24 ± 0.31 | 2.51 ± 0.13 | 3.00 ± 0.04 | 11.52 ± 0.09 | −9.81 ± 0.11 |
| NGC 1399 | 8.67 ± 0.06 | 2.52 ± 0.01 | 0.76 ± 0.09 | 3.01 ± 0.09 | 0.17 ± 0.07 | 11.23 ± 0.08 | −12.72 ± 0.13 |
| NGC 1407 | 9.65 ± 0.06 | 2.42 ± 0.01 | 0.80 ± 0.31 | 3.03 ± 0.12 | 0.07 ± 0.09 | 11.39 ± 0.08 | −15.69 ± 0.44 |
| NGC 1600 | 10.28 ± 0.04 | 2.52 ± 0.01 | 1.22 ± 0.09 | 2.66 ± 0.07 | −0.34 ± 0.10 | 11.71 ± 0.09 | −15.23 ± 0.44 |
| NGC 3091 | 9.61 ± 0.02 | 2.49 ± 0.01 | 1.15 ± 0.09 | 2.61 ± 0.17 | −0.21 ± 0.11 | 11.39 ± 0.09 | −15.09 ± 0.44 |
| NGC 3377 | 8.24 ± 0.23 | 2.13 ± 0.01 | 0.36 ± 0.01 | 2.77 ± 0.07 | −0.09 ± 0.08 | 10.13 ± 0.08 | −14.43 ± 0.44 |
| NGC 3379 | 8.63 ± 0.11 | 2.31 ± 0.00 | 0.43 ± 0.31 | 3.19 ± 0.14 | 0.12 ± 0.05 | 10.64 ± 0.08 | −14.94 ± 0.44 |
| NGC 3414 | 8.38 ± 0.05 | 2.38 ± 0.01 | 0.47 ± 0.09 | 3.07 ± 0.15 | 0.50 ± 0.07 | 10.63 ± 0.08 | −12.05 ± 0.10 |
| NGC 3585 | 8.49 ± 0.14 | 2.33 ± 0.01 | 0.90 ± 0.31 | 2.67 ± 0.10 | 0.06 ± 0.09 | 11.03 ± 0.09 | −13.16 ± 0.28 |
| NGC 3607 | 8.17 ± 0.17 | 2.35 ± 0.01 | 0.90 ± 0.31 | 2.74 ± 0.13 | 0.73 ± 0.06 | 11.13 ± 0.08 | −11.73 ± 0.09 |
| NGC 3608 | 8.63 ± 0.10 | 2.29 ± 0.01 | 0.66 ± 0.31 | 2.75 ± 0.09 | −0.13 ± 0.10 | 10.56 ± 0.09 | −14.86 ± 0.44 |
| NGC 3842 | 9.94 ± 0.12 | 2.49 ± 0.01 | 1.48 ± 0.09 | 2.03 ± 0.08 | −0.43 ± 0.07 | 11.45 ± 0.09 | −14.80 ± 0.44 |
| NGC 3923 | 9.47 ± 0.13 | 2.39 ± 0.01 | 0.92 ± 0.09 | 2.80 ± 0.13 | −0.03 ± 0.08 | 11.30 ± 0.08 | −15.60 ± 0.44 |
| NGC 4261 | 9.21 ± 0.08 | 2.47 ± 0.01 | 0.84 ± 0.31 | 2.89 ± 0.10 | 0.22 ± 0.08 | 11.17 ± 0.08 | −11.99 ± 0.10 |
| NGC 4291 | 8.97 ± 0.14 | 2.47 ± 0.01 | 0.27 ± 0.31 | 3.35 ± 0.14 | 0.00 ± 0.09 | 10.47 ± 0.08 | −14.77 ± 0.44 |
| NGC 4374 | 8.95 ± 0.04 | 2.44 ± 0.00 | 1.04 ± 0.31 | 2.59 ± 0.08 | −0.04 ± 0.07 | 11.14 ± 0.08 | −15.44 ± 0.44 |
| NGC 4472 | 9.36 ± 0.03 | 2.45 ± 0.00 | 1.01 ± 0.09 | 2.81 ± 0.08 | 0.28 ± 0.12 | 11.41 ± 0.08 | −12.08 ± 0.10 |
| NGC 4473 | 7.95 ± 0.22 | 2.25 ± 0.01 | 0.43 ± 0.31 | 2.96 ± 0.08 | 0.15 ± 0.08 | 10.53 ± 0.08 | −12.05 ± 0.11 |
| NGC 4486 | 9.85 ± 0.02 | 2.51 ± 0.01 | 0.85 ± 0.31 | 3.05 ± 0.08 | 0.33 ± 0.05 | 11.31 ± 0.08 | −12.02 ± 0.10 |
| NGC 4552 | 8.67 ± 0.04 | 2.40 ± 0.01 | 0.71 ± 0.31 | 2.68 ± 0.09 | 0.50 ± 0.10 | 10.77 ± 0.09 | −12.12 ± 0.11 |
| NGC 4621 | 8.59 ± 0.04 | 2.36 ± 0.01 | 0.88 ± 0.09 | 2.58 ± 0.10 | 0.42 ± 0.13 | 10.89 ± 0.08 | −11.82 ± 0.10 |
| NGC 4649 | 9.67 ± 0.10 | 2.52 ± 0.01 | 0.80 ± 0.09 | 3.04 ± 0.09 | 0.42 ± 0.10 | 11.24 ± 0.08 | −12.02 ± 0.09 |
| NGC 4697 | 8.10 ± 0.02 | 2.22 ± 0.00 | 1.09 ± 0.40 | 2.03 ± 0.08 | 0.09 ± 0.06 | 10.65 ± 0.08 | −12.11 ± 0.11 |
| NGC 4889 | 10.29 ± 0.33 | 2.59 ± 0.01 | 1.43 ± 0.09 | 2.38 ± 0.09 | −0.17 ± 0.09 | 11.72 ± 0.09 | −15.02 ± 0.44 |
| NGC 5077 | 8.85 ± 0.23 | 2.40 ± 0.01 | 0.64 ± 0.09 | 3.16 ± 0.17 | 0.22 ± 0.07 | 11.02 ± 0.08 | −11.94 ± 0.10 |
| NGC 5419 | 9.86 ± 0.14 | 2.54 ± 0.01 | 1.01 ± 0.01 | 2.87 ± 0.09 | 0.04 ± 0.12 | 11.64 ± 0.08 | −12.47 ± 0.12 |
| NGC 5576 | 8.20 ± 0.10 | 2.26 ± 0.01 | 0.76 ± 0.09 | 2.52 ± 0.09 | −0.23 ± 0.05 | 10.70 ± 0.08 | −15.00 ± 0.44 |
| NGC 5846 | 9.04 ± 0.04 | 2.38 ± 0.01 | 0.98 ± 0.31 | 2.64 ± 0.10 | −0.13 ± 0.08 | 11.18 ± 0.09 | −15.48 ± 0.44 |
| NGC 6251 | 8.77 ± 0.14 | 2.49 ± 0.03 | 1.16 ± 0.09 | 2.66 ± 0.09 | 1.05 ± 0.04 | 11.51 ± 0.08 | −11.49 ± 0.09 |
| NGC 7052 | 9.35 ± 0.02 | 2.45 ± 0.02 | 0.77 ± 0.09 | 3.04 ± 0.07 | 0.58 ± 0.05 | 11.22 ± 0.08 | −11.69 ± 0.10 |
| NGC 7619 | 9.35 ± 0.10 | 2.50 ± 0.01 | 1.11 ± 0.31 | 2.52 ± 0.07 | −0.01 ± 0.12 | 11.29 ± 0.08 | −15.11 ± 0.44 |
| NGC 7768 | 9.10 ± 0.15 | 2.46 ± 0.02 | 1.32 ± 0.31 | 2.36 ± 0.09 | −0.38 ± 0.05 | 11.44 ± 0.09 | −14.63 ± 0.44 |
| 38 Lenticular Galaxies | | | | | | | |
| NGC 404 | 5.74 ± 0.10 | 1.54 ± 0.04 | −1.24 ± 0.31 | 3.64 ± 0.12 | 1.28 ± 0.05 | 8.85 ± 0.09 | −10.36 ± 0.16 |
| NGC 524 | 8.68 ± 0.10 | 2.37 ± 0.01 | 0.04 ± 0.31 | 3.83 ± 0.07 | 0.52 ± 0.06 | 11.10 ± 0.08 | −12.16 ± 0.10 |
| NGC 1023 | 7.62 ± 0.04 | 2.29 ± 0.01 | −0.41 ± 0.09 | 4.21 ± 0.09 | 0.18 ± 0.07 | 10.61 ± 0.08 | −12.35 ± 0.11 |
| NGC 1194 | 7.82 ± 0.04 | 2.17 ± 0.07 | −0.04 ± 0.40 | 3.96 ± 0.09 | 2.83 ± 0.04 | 10.46 ± 0.08 | −9.87 ± 0.09 |
| NGC 1316 | 8.16 ± 0.22 | 2.35 ± 0.01 | 0.14 ± 0.31 | 3.94 ± 0.30 | 0.65 ± 0.05 | 11.43 ± 0.08 | −11.96 ± 0.09 |
| NGC 1332 | 9.16 ± 0.06 | 2.47 ± 0.02 | 0.28 ± 0.40 | 3.68 ± 0.10 | 0.42 ± 0.05 | 10.88 ± 0.08 | −11.70 ± 0.11 |
| NGC 1374 | 8.76 ± 0.04 | 2.25 ± 0.01 | 0.03 ± 0.31 | 3.35 ± 0.08 | 0.12 ± 0.07 | 10.33 ± 0.08 | −14.63 ± 0.44 |
| NGC 2549 | 7.14 ± 0.23 | 2.15 ± 0.01 | −0.73 ± 0.09 | 4.25 ± 0.13 | 0.33 ± 0.06 | 9.97 ± 0.08 | −11.97 ± 0.15 |
| NGC 2778 | 7.14 ± 0.43 | 2.19 ± 0.01 | −0.63 ± 0.31 | 3.85 ± 0.14 | 0.12 ± 0.05 | 9.89 ± 0.08 | −11.70 ± 0.17 |
| NGC 2787 | 7.60 ± 0.05 | 2.28 ± 0.01 | −0.86 ± 0.31 | 4.16 ± 0.16 | 0.59 ± 0.04 | 9.80 ± 0.08 | −11.89 ± 0.14 |
| NGC 3115 | 8.94 ± 0.31 | 2.42 ± 0.01 | 0.19 ± 0.09 | 3.58 ± 0.08 | 0.14 ± 0.12 | 10.63 ± 0.08 | −12.60 ± 0.13 |
| NGC 3245 | 8.30 ± 0.11 | 2.32 ± 0.02 | −0.63 ± 0.09 | 4.50 ± 0.10 | 1.09 ± 0.04 | 10.45 ± 0.08 | −11.22 ± 0.09 |
| NGC 3384 | 7.02 ± 0.20 | 2.16 ± 0.01 | −0.52 ± 0.09 | 4.29 ± 0.08 | 0.24 ± 0.05 | 10.37 ± 0.08 | −11.85 ± 0.11 |
| NGC 3489 | 6.76 ± 0.06 | 2.02 ± 0.01 | −1.02 ± 0.31 | 4.74 ± 0.09 | 1.16 ± 0.04 | 10.14 ± 0.08 | −11.24 ± 0.09 |
| NGC 3665 | 8.76 ± 0.10 | 2.33 ± 0.02 | 0.33 ± 0.31 | 3.57 ± 0.09 | 1.33 ± 0.04 | 11.13 ± 0.08 | −11.27 ± 0.09 |
| NGC 3998 | 8.42 ± 0.18 | 2.42 ± 0.02 | −0.51 ± 0.40 | 4.20 ± 0.10 | 1.39 ± 0.05 | 10.30 ± 0.08 | −11.18 ± 0.09 |
| NGC 4026 | 8.26 ± 0.11 | 2.24 ± 0.01 | −0.83 ± 0.40 | 4.97 ± 0.13 | 0.52 ± 0.05 | 10.18 ± 0.08 | −13.23 ± 0.39 |
| NGC 4339 | 7.63 ± 0.12 | 2.05 ± 0.01 | −0.31 ± 0.31 | 3.48 ± 0.10 | 0.67 ± 0.14 | 10.02 ± 0.08 | −11.16 ± 0.12 |
| NGC 4342 | 8.65 ± 0.18 | 2.38 ± 0.01 | −0.29 ± 0.31 | 3.68 ± 0.07 | 0.31 ± 0.04 | 10.10 ± 0.08 | −14.40 ± 0.44 |
| NGC 4350 | 8.87 ± 0.14 | 2.26 ± 0.01 | 0.20 ± 0.31 | 3.08 ± 0.07 | 0.51 ± 0.06 | 10.35 ± 0.08 | −12.05 ± 0.10 |
| NGC 4371 | 6.83 ± 0.07 | 2.11 ± 0.01 | −0.15 ± 0.31 | 3.37 ± 0.19 | 0.64 ± 0.09 | 10.38 ± 0.08 | −12.02 ± 0.11 |
| NGC 4429 | 8.18 ± 0.03 | 2.24 ± 0.01 | −0.05 ± 0.31 | 3.76 ± 0.08 | 0.86 ± 0.05 | 10.75 ± 0.08 | −11.59 ± 0.09 |
| NGC 4434 | 7.84 ± 0.05 | 2.07 ± 0.01 | −0.25 ± 0.31 | 3.59 ± 0.07 | 0.00 ± 0.06 | 10.03 ± 0.08 | −14.33 ± 0.44 |
| NGC 4459 | 7.83 ± 0.08 | 2.24 ± 0.01 | −0.01 ± 0.31 | 3.69 ± 0.11 | 1.17 ± 0.10 | 10.56 ± 0.08 | −11.12 ± 0.09 |
| NGC 4526 | 8.66 ± 0.01 | 2.35 ± 0.02 | 0.06 ± 0.31 | 3.73 ± 0.10 | 1.14 ± 0.05 | 10.84 ± 0.08 | −11.38 ± 0.09 |
| NGC 4564 | 7.90 ± 0.12 | 2.19 ± 0.01 | −0.38 ± 0.09 | 3.96 ± 0.09 | 0.31 ± 0.05 | 10.12 ± 0.08 | −12.54 ± 0.13 |
| NGC 4578 | 7.28 ± 0.08 | 2.05 ± 0.02 | −0.31 ± 0.31 | 3.58 ± 0.07 | −0.24 ± 0.10 | 10.05 ± 0.08 | −14.35 ± 0.44 |
| NGC 4596 | 7.90 ± 0.20 | 2.15 ± 0.01 | −0.13 ± 0.09 | 3.64 ± 0.07 | 0.34 ± 0.08 | 10.54 ± 0.08 | −11.84 ± 0.09 |
| NGC 4742 | 7.13 ± 0.15 | 2.01 ± 0.01 | −0.61 ± 0.31 | 4.28 ± 0.09 | 0.40 ± 0.05 | 9.99 ± 0.09 | −11.79 ± 0.15 |
| NGC 4762 | 7.24 ± 0.05 | 2.15 ± 0.01 | −0.74 ± 0.31 | 4.38 ± 0.07 | 0.18 ± 0.07 | 10.56 ± 0.08 | −14.86 ± 0.44 |
| NGC 5018 | 8.00 ± 0.08 | 2.33 ± 0.01 | 0.05 ± 0.31 | 4.00 ± 0.09 | 0.89 ± 0.07 | 11.10 ± 0.08 | −11.40 ± 0.09 |
| NGC 5128 | 7.65 ± 0.13 | 2.01 ± 0.03 | 0.04 ± 0.40 | 3.75 ± 0.07 | 2.53 ± 0.03 | 10.86 ± 0.08 | −10.61 ± 0.09 |
| NGC 5252 | 9.03 ± 0.35 | 2.27 ± 0.06 | −0.15 ± 0.31 | 4.37 ± 0.09 | 2.27 ± 0.04 | 11.05 ± 0.08 | −10.30 ± 0.09 |
| NGC 5813 | 8.83 ± 0.04 | 2.37 ± 0.01 | 0.32 ± 0.31 | 3.44 ± 0.10 | 0.03 ± 0.12 | 11.10 ± 0.08 | −12.49 ± 0.12 |
| NGC 5845 | 8.41 ± 0.16 | 2.36 ± 0.01 | −0.20 ± 0.31 | 3.70 ± 0.11 | 0.53 ± 0.04 | 10.14 ± 0.08 | −11.75 ± 0.11 |
| NGC 6861 | 9.30 ± 0.22 | 2.59 ± 0.02 | 0.41 ± 0.31 | 3.29 ± 0.14 | 0.76 ± 0.05 | 10.84 ± 0.08 | −11.81 ± 0.09 |
| NGC 7332 | 7.06 ± 0.20 | 2.11 ± 0.01 | −0.59 ± 0.40 | 4.49 ± 0.10 | 0.46 ± 0.05 | 10.48 ± 0.08 | −11.78 ± 0.11 |
| NGC 7457 | 6.96 ± 0.26 | 1.83 ± 0.02 | −0.40 ± 0.31 | 3.32 ± 0.10 | 0.27 ± 0.11 | 9.92 ± 0.08 | −11.74 ± 0.14 |
| 28 Spiral Galaxies | | | | | | | |
| Circinus | 6.22 ± 0.08 | 2.17 ± 0.05 | −0.34 ± 0.03 | 3.86 ± 0.17 | 4.02 ± 0.03 | 10.04 ± 0.09 | −9.37 ± 0.13 |
| IC 2560 | 6.51 ± 0.09 | 2.14 ± 0.01 | −0.21 ± 0.03 | 3.29 ± 0.15 | 3.38 ± 0.04 | 10.43 ± 0.08 | −10.01 ± 0.09 |
| NGC 224 | 8.15 ± 0.19 | 2.19 ± 0.01 | −0.16 ± 0.00 | 3.67 ± 0.08 | 2.08 ± 0.04 | 10.71 ± 0.08 | −10.94 ± 0.09 |
| NGC 253 | 7.00 ± 0.30 | 1.98 ± 0.08 | −0.33 ± 0.01 | 3.61 ± 0.07 | 3.81 ± 0.04 | 10.43 ± 0.08 | −9.85 ± 0.09 |
| NGC 1097 | 8.38 ± 0.09 | 2.29 ± 0.01 | 0.13 ± 0.07 | 3.75 ± 0.07 | 3.41 ± 0.04 | 11.22 ± 0.08 | −10.04 ± 0.10 |
| NGC 1300 | 7.86 ± 0.31 | 2.34 ± 0.06 | −0.13 ± 0.10 | 3.19 ± 0.09 | 2.91 ± 0.04 | 10.56 ± 0.08 | −10.41 ± 0.09 |
| NGC 1320 | 6.77 ± 0.16 | 2.04 ± 0.04 | −0.70 ± 0.07 | 4.24 ± 0.09 | 3.34 ± 0.04 | 10.13 ± 0.09 | −9.68 ± 0.10 |
| NGC 1398 | 8.03 ± 0.08 | 2.29 ± 0.04 | 0.09 ± 0.04 | 3.58 ± 0.17 | 2.14 ± 0.04 | 11.17 ± 0.08 | −10.88 ± 0.09 |
| NGC 2273 | 6.95 ± 0.06 | 2.15 ± 0.03 | −0.55 ± 0.03 | 3.77 ± 0.15 | 3.14 ± 0.04 | 10.43 ± 0.08 | −10.02 ± 0.09 |
| NGC 2960 | 7.07 ± 0.04 | 2.22 ± 0.04 | −0.13 ± 0.05 | 3.89 ± 0.09 | 2.98 ± 0.04 | 10.72 ± 0.08 | −10.42 ± 0.09 |
| NGC 2974 | 8.23 ± 0.05 | 2.37 ± 0.01 | −0.17 ± 0.01 | 3.76 ± 0.12 | 1.36 ± 0.08 | 10.61 ± 0.08 | −11.28 ± 0.09 |
| NGC 3031 | 7.83 ± 0.09 | 2.18 ± 0.01 | −0.14 ± 0.01 | 3.65 ± 0.08 | 1.80 ± 0.03 | 10.57 ± 0.08 | −11.00 ± 0.09 |
| NGC 3079 | 6.28 ± 0.30 | 2.24 ± 0.03 | −0.46 ± 0.05 | 4.04 ± 0.17 | 3.64 ± 0.04 | 10.41 ± 0.08 | −9.84 ± 0.09 |
| NGC 3227 | 7.25 ± 0.25 | 2.10 ± 0.02 | 0.01 ± 0.03 | 3.38 ± 0.13 | 3.09 ± 0.04 | 10.65 ± 0.08 | −10.11 ± 0.09 |
| NGC 3368 | 6.89 ± 0.09 | 2.07 ± 0.01 | −0.60 ± 0.02 | 4.22 ± 0.14 | 2.13 ± 0.04 | 10.55 ± 0.08 | −10.81 ± 0.09 |
| NGC 3627 | 6.94 ± 0.05 | 2.10 ± 0.02 | −0.71 ± 0.07 | 4.33 ± 0.17 | 3.44 ± 0.04 | 10.63 ± 0.08 | −10.06 ± 0.09 |
| NGC 4151 | 7.29 ± 0.37 | 1.96 ± 0.05 | −0.25 ± 0.02 | 3.97 ± 0.13 | 2.82 ± 0.04 | 10.53 ± 0.08 | −9.89 ± 0.09 |
| NGC 4258 | 7.61 ± 0.01 | 2.12 ± 0.02 | −0.01 ± 0.06 | 3.27 ± 0.07 | 2.44 ± 0.04 | 10.56 ± 0.08 | −10.59 ± 0.09 |
| NGC 4303 | 6.78 ± 0.04 | 1.98 ± 0.04 | −0.70 ± 0.02 | 4.65 ± 0.07 | 3.87 ± 0.04 | 10.71 ± 0.09 | −9.86 ± 0.10 |
| NGC 4388 | 6.95 ± 0.09 | 2.00 ± 0.04 | 0.09 ± 0.02 | 3.07 ± 0.19 | 3.15 ± 0.04 | 10.12 ± 0.08 | −9.85 ± 0.09 |
| NGC 4501 | 7.31 ± 0.08 | 2.22 ± 0.02 | 0.22 ± 0.04 | 3.22 ± 0.07 | 3.05 ± 0.04 | 10.89 ± 0.08 | −10.35 ± 0.09 |
| NGC 4594 | 8.81 ± 0.03 | 2.35 ± 0.01 | 0.28 ± 0.02 | 3.48 ± 0.08 | 0.90 ± 0.05 | 11.06 ± 0.08 | −11.55 ± 0.09 |
| NGC 4699 | 8.28 ± 0.05 | 2.28 ± 0.02 | −0.64 ± 0.06 | 3.25 ± 0.18 | 2.20 ± 0.04 | 11.06 ± 0.08 | −10.86 ± 0.09 |
| NGC 4736 | 6.83 ± 0.10 | 2.03 ± 0.01 | −0.64 ± 0.01 | 4.46 ± 0.10 | 2.71 ± 0.04 | 10.38 ± 0.08 | −10.46 ± 0.09 |
| NGC 4826 | 6.20 ± 0.11 | 1.99 ± 0.02 | −0.38 ± 0.01 | 3.73 ± 0.11 | 2.21 ± 0.04 | 10.54 ± 0.08 | −10.68 ± 0.09 |
| NGC 4945 | 6.12 ± 0.30 | 2.07 ± 0.06 | −0.80 ± 0.14 | 3.78 ± 0.07 | 3.56 ± 0.03 | 10.23 ± 0.08 | −9.91 ± 0.09 |
| NGC 7582 | 7.74 ± 0.18 | 2.07 ± 0.02 | −0.32 ± 0.11 | 4.08 ± 0.17 | 3.29 ± 0.04 | 10.59 ± 0.08 | −9.64 ± 0.09 |
| UGC 3789 | 7.07 ± 0.04 | 2.03 ± 0.05 | −0.24 ± 0.01 | 3.59 ± 0.13 | 3.22 ± 0.04 | 10.51 ± 0.08 | −10.19 ± 0.09 |

Note. — Column (1): galaxy name. Column (2): black hole mass. Column (3): central stellar velocity dispersion. Column (4): equivalent-axis, effective (half-light) radius of the spheroid component. Column (5): average projected density within 𝑅 e . Column (6): WISE W2 − W3 color. Column (7): galaxy stellar mass. Column (8): specific star-formation rate, i.e., log ⁡ ( sSFR ) ≡ log ⁡ ( SFR ) − log ⁡ ( 𝑀 ∗ ) .
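As a small worked example of the Column (8) definition, the logarithmic sSFR is a difference of base-10 logarithms, so the SFR implied by any row of Table 3 can be recovered by addition. The function names below are illustrative and do not come from the paper; the IC 1459 values are read from Table 3.

```python
def log_ssfr(log_sfr, log_mstar):
    """log(sSFR) = log(SFR) - log(M*), with base-10 logarithms throughout."""
    return log_sfr - log_mstar


def implied_log_sfr(log_ssfr_value, log_mstar):
    """Invert the definition: log(SFR) = log(sSFR) + log(M*)."""
    return log_ssfr_value + log_mstar


# IC 1459 (Table 3): log(M*) = 11.24 [Msun], log(sSFR) = -12.00 [yr^-1]
ic1459_log_sfr = implied_log_sfr(-12.00, 11.24)  # log(SFR) in Msun / yr
print(f"IC 1459 implied log(SFR) = {ic1459_log_sfr:.2f}")
```

The round trip `log_ssfr(ic1459_log_sfr, 11.24)` returns the tabulated value again, which is a quick consistency check when rebuilding the table from separate SFR and stellar-mass columns.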

Figure 2: A pairplot of the data listed in Table 3.

As can be seen from the shape of the data in Figure 2, the data are predominantly characterized by linear relations and appear normally distributed in logarithmic form, which underpins the general assumption for calculating BGe scores.¹¹

3.1 Notes on Sample Selection

Our sample is initially derived from a total sample of 145 galaxies that host directly-measured SMBHs, 103 of which (with WISE luminosities) are listed in Graham et al. (2024). We reduced the sample further down to 101 galaxies by excluding the galaxies NGC 4395 and NGC 6926 because they are bulge-less galaxies and hence lack a Sérsic (1963) bulge component (including 𝑅 e values). Readers are invited to read Graham et al. (2024, §2) for a detailed description of the galaxies, SMBHs, stellar masses, SFRs, etc. Extended accounts of the provenances behind the directly-measured SMBH masses and photometric multi-component decompositions are given in Sahu et al. (2020) and Graham & Sahu (2023a), respectively.

In the first instance, our sample is selected from the available pool of directly-measured SMBHs in the literature, which numbers only 145 galaxies at last check. Because SMBHs generally scale along with the stellar mass of their host galaxies (Davis et al., 2018), this leads to an unavoidable selection bias where only the closest and/or most massive galaxies have dynamically-measurable SMBHs (i.e., no galaxy in our sample is further than NGC 7768, at a luminosity distance of 108.2 Mpc, or a redshift of 𝑧 = 0.02424 assuming the Planck Collaboration et al. 2020 cosmological parameters). This is more noticeable when considering the spiral galaxies; our sample is primarily composed of massive spiral galaxies (i.e., earlier/redder types) and is devoid of irregular galaxies. The latest type galaxy in our sample is NGC 3079, which is an SBc or 𝑇 = 6.4 ± 1.1 Hubble type galaxy (Makarov et al., 2014). Recently, Winkel et al. (2024) have shown from their study of direct black hole mass measurements that active and quiescent galaxies follow the same black hole mass scaling relations, which strengthens the merits of applying local relations to high-𝑧 AGNs.

Our sample contains one dwarf galaxy, which is the dwarf lenticular galaxy NGC 404 with 𝑀∗ = (7.08 ± 1.47) × 10⁸ M☉. Not surprisingly, it also hosts the least massive black hole in our sample, 𝑀∙ = (5.50 ± 1.27) × 10⁵ M☉, which is near the intermediate-mass black hole (IMBH) range (10² ≤ 𝑀∙/M☉ < 10⁵). Notably, we lack any dwarf elliptical galaxies. If it were possible to include them, they would likely warrant being segregated from other ellipticals due to their distinctive diminutive masses and higher sSFRs. This is something we plan on testing in a similar causal study using simulated galaxies. For now, the lack of more dwarf galaxies or any IMBHs is not detrimental to our study; Limberg (2024) recently demonstrated that dwarf galaxies and IMBHs appear to follow the 𝑀∙–𝜎₀ and 𝑀∙–𝑀∗ relations extrapolated from local massive galaxies.
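The linear masses quoted for NGC 404 can be checked against the logarithmic entries in Table 3 (log 𝑀∗ = 8.85 ± 0.09 and log 𝑀∙ = 5.74 ± 0.10). The sketch below assumes symmetric, first-order error propagation, δx ≈ x ln(10) δ(log x); that choice is an assumption made here for illustration, and `log10_to_linear` is a hypothetical helper name.

```python
import math


def log10_to_linear(log_value, log_err):
    """Convert log10(x) +/- err into linear x +/- err (first-order propagation)."""
    value = 10.0 ** log_value
    err = value * math.log(10.0) * log_err  # d(10^u)/du = 10^u * ln(10)
    return value, err


# NGC 404 entries from Table 3
mstar, mstar_err = log10_to_linear(8.85, 0.09)  # stellar mass
mbh, mbh_err = log10_to_linear(5.74, 0.10)      # black hole mass

print(f"M*  = ({mstar / 1e8:.2f} +/- {mstar_err / 1e8:.2f}) x 10^8 Msun")
print(f"MBH = ({mbh / 1e5:.2f} +/- {mbh_err / 1e5:.2f}) x 10^5 Msun")
```

Under this assumption the conversion gives (7.08 ± 1.47) × 10⁸ M☉ and (5.50 ± 1.27) × 10⁵ M☉, matching the values quoted in the text.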

The second criterion for inclusion is the availability of mid-infrared luminosities from WISE. This provides us with a robust tracer of dust-obscured star formation activity. It is also helpful to have a consistent source of imaging when calculating stellar masses. Indeed, Sahu et al. (2023) revealed that inconsistencies between stellar mass derivations across mixed samples led to the mistaken claim (Shankar et al., 2016) that galaxies with directly-measured SMBHs represent a biased sample relative to the general population. Mid-infrared sample selection is, however, known for its propensity to miss low-mass dust-poor S0 galaxies with low sSFRs (Eales et al., 2018; Graham et al., 2024), although this effect will be somewhat mitigated in our sample because our galaxies are all nearby. These galaxies hold some significance as being possible primordial galaxies that have avoided mergers (Eggen et al., 1962; Harikane et al., 2023; Graham, 2023a; Graham et al., 2024). If so, that would negate the effect of any causal mechanism from hierarchical growth for these galaxies, which would be another prime opportunity for testing with galaxy simulations.

Despite the inherent limitations of restricting our data to a small sample, it is crucial for our study of causality to use only directly-measured black holes. One of the primary focuses of our study is to determine the causal direction in the 𝑀∙–𝜎₀ relation. Because of this, we are restricted to using only directly-measured black holes, i.e., black hole masses that are not derived from the 𝑀∙–𝜎₀ relation. For example, the 𝑀∙–𝜎₀ relation governs all black hole masses that are calculated using single-epoch spectra or reverberation mapping, because both of these temporal methods require calibrating their virial products with the 𝑀∙–𝜎₀ relation (Graham et al., 2011). Therefore, any causal study we might attempt using indirect black hole mass measurements, which are at their heart calibrated to the 𝑀∙–𝜎₀ relation, will inevitably be biased by the artificial 𝜎₀ → 𝑀∙ causal direction.

4 Exact Posterior Methodology

4.1 General Description

To represent the causal structure of the dataset, we use Directed Acyclic Graphs (DAGs). Each DAG encodes a set of conditional independencies, and DAGs that encode the same conditional independencies belong to the same Markov Equivalence Class (MEC).¹² This choice assumes that no cyclical dependencies between variables exist. This is a reasonable assumption, given the clear differences in gas fractions and merger histories between the different morphological classes (see §10.4 for more details). To achieve a purely data-driven study, we adopt a uniform prior, giving an equal prior probability, 𝑃(𝐺), to every one of the approximately 1.14 × 10⁹ possible DAGs (OEIS Foundation Inc., 2024a). We calculate the exact posterior probabilities of every DAG given the data, 𝑃(𝐺∣𝐷), using the Bayesian Gaussian equivalent (BGe) score (Geiger & Heckerman, 1994, 2002; Kuipers et al., 2014). The BGe score gives the marginal likelihood by examining conditional independencies and ensures that DAGs belonging to the same MEC are scored equally.
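The size of the DAG space for seven variables, and hence the uniform prior probability of any single DAG, can be reproduced with Robinson's recurrence for labeled acyclic digraphs (the sequence tabulated as OEIS A003024). This is a minimal sketch for illustration; the function name `num_dags` is hypothetical and the recurrence is not spelled out in the paper itself.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(n):
    """Number of labeled DAGs on n nodes via Robinson's recurrence.

    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k(n-k)) * a(n-k), a(0) = 1.
    """
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )


n_dags = num_dags(7)            # seven variables: M_bh plus six galaxy properties
uniform_prior = 1.0 / n_dags    # prior probability of each individual DAG
print(n_dags, uniform_prior)
```

For seven nodes this gives 1,138,779,265 DAGs, i.e. roughly 1.14 × 10⁹, and a per-DAG prior of about 8.78 × 10⁻¹⁰.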

4.2 Theory

The posterior probability of a graph given the data, $P(G \mid D)$, is proportional to the marginal likelihood of the data given the graph, $P(D \mid G)$, under a uniform prior, through Bayes’ rule $P(G \mid D) \propto P(D \mid G)\,P(G)$. Under the assumption of linear and Gaussian data, the BGe score gives the marginal likelihood that the distribution of the data sample $d = \{\mathbf{x}_1, \ldots, \mathbf{x}_N\}$ of $N$ observations of $n$ variables is faithful (i.e., the data satisfies only and all the conditional independencies encoded by the DAG) to a hypothetical DAG model $m^h$ as a product of local scores:

$$
p(d \mid m^h) = \prod_{i=1}^{n} \frac{p\left(d^{\mathbf{Pa}_i \cup \{X_i\}} \mid m_c^h\right)}{p\left(d^{\mathbf{Pa}_i} \mid m_c^h\right)}, \tag{1}
$$

where $\mathbf{Pa}_i$ is the set of parent variables of vertex $i$, and $d^{\mathbf{Y}}$ is the data restricted to the subset of variables $\mathbf{Y}$. The modularity of the score (i.e., the full score is a product of local scores over all vertices $i$) ensures that all DAGs in the same MEC are scored equally, and simplifies the posterior calculation over a large number of DAGs. The local scores are further characterized by

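$$
p\left(d^{\mathbf{Y}} \mid m_c^h\right) = \left(\frac{\alpha_\mu}{N + \alpha_\mu}\right)^{l/2} \frac{\Gamma_l\left(\frac{N + \alpha_w - n + l}{2}\right)}{\pi^{lN/2}\,\Gamma_l\left(\frac{\alpha_w - n + l}{2}\right)} \frac{\left|T_{\mathbf{YY}}\right|^{(\alpha_w - n + l)/2}}{\left|R_{\mathbf{YY}}\right|^{(N + \alpha_w - n + l)/2}}. \tag{2}
$$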

The detailed explanation and full derivation of Equations 1 and 2 can be found in Kuipers et al. (2014). Empirically, many causal discovery methods based on the BGe score have been shown to recover the ground-truth causal structures in benchmark tests (Deleu et al., 2022; Emezue et al., 2023).
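Equation 2 is normally evaluated in log space to avoid overflow in the gamma functions and determinants. The sketch below is a direct transcription of Equation 2 under stated assumptions: following Kuipers et al. (2014), $l$ is read as the number of variables in $\mathbf{Y}$ and $\Gamma_l$ as the multivariate gamma function, and the log-determinants of $T_{\mathbf{YY}}$ and $R_{\mathbf{YY}}$ are passed in rather than built from data. The function names and the hyperparameter values in the example are hypothetical, not the authors' implementation.

```python
import math


def log_multigamma(a, l):
    """Log of the multivariate gamma function Gamma_l(a)."""
    return (l * (l - 1) / 4.0) * math.log(math.pi) + sum(
        math.lgamma(a + (1 - j) / 2.0) for j in range(1, l + 1)
    )


def log_local_score(N, n, l, alpha_mu, alpha_w, logdet_T, logdet_R):
    """Log of Equation 2 for a variable subset Y of size l.

    N: number of observations, n: total number of variables,
    logdet_T, logdet_R: log|T_YY| and log|R_YY| (supplied by the caller).
    """
    a_prior = (alpha_w - n + l) / 2.0
    a_post = (N + alpha_w - n + l) / 2.0
    return (
        (l / 2.0) * math.log(alpha_mu / (N + alpha_mu))
        + log_multigamma(a_post, l)
        - (l * N / 2.0) * math.log(math.pi)
        - log_multigamma(a_prior, l)
        + a_prior * logdet_T
        - a_post * logdet_R
    )


# Sanity check with made-up inputs: with no data (N = 0) and R_YY = T_YY,
# every factor in Equation 2 reduces to one, so the log score is zero.
no_data_score = log_local_score(N=0, n=7, l=2, alpha_mu=1.0, alpha_w=9.0,
                                logdet_T=1.3, logdet_R=1.3)
print(no_data_score)
```

The log of Equation 1 then follows by summing, over vertices, the difference between the local log score of each node together with its parents and that of the parents alone.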

4.3 Calculating Exact Posteriors

Calculating the exact posteriors can be summarized in the following steps:

1. Generate all possible DAGs for $N$ variables, represented by $N \times N$ adjacency matrices $A$, with $A_{i,j} = 1$ if there is an arrow from node $i$ to node $j$.

2. For every DAG, generate its transitive closure, also represented by an adjacency matrix.

3. Calculate the posterior probability for every DAG given the data with the BGe score following Equation 1, normalized so that the probabilities over all DAGs sum to unity.

4. Perform a weighted average on all DAG adjacency matrices according to their posterior probabilities to get the matrix of edge marginals.

5. Perform a weighted average on all transitive closure adjacency matrices according to their posterior probabilities to get the matrix of path marginals.

For a given value of 𝑁 , steps 1 and 2 only need to be done once (i.e., the possible DAGs for 𝑁 variables are unique), and only steps 3–5 need to be repeated for different datasets.
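The five steps above can be sketched in plain Python for a toy case of three variables (25 DAGs), where brute-force enumeration is instant. This is an illustrative stand-in, not the authors' GPU implementation: all function names are hypothetical, and the log scores are set to a constant placeholder (equivalent to a uniform posterior) where real BGe scores from Equation 1 would go.

```python
import itertools
import math


def is_acyclic(adj):
    """Kahn-style check: repeatedly strip nodes that have no incoming edges."""
    remaining = set(range(len(adj)))
    while remaining:
        sources = [j for j in remaining if not any(adj[i][j] for i in remaining)]
        if not sources:
            return False
        remaining -= set(sources)
    return True


def all_dags(n):
    """Step 1: enumerate every DAG on n labeled nodes as an adjacency matrix."""
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    dags = []
    for bits in itertools.product((0, 1), repeat=len(pairs)):
        adj = [[0] * n for _ in range(n)]
        for (i, j), bit in zip(pairs, bits):
            adj[i][j] = bit
        if is_acyclic(adj):
            dags.append(adj)
    return dags


def transitive_closure(adj):
    """Step 2: Warshall's algorithm; reach[i][j] = 1 if a directed path i -> j exists."""
    n = len(adj)
    reach = [row[:] for row in adj]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if reach[i][k] and reach[k][j]:
                    reach[i][j] = 1
    return reach


def posterior_from_log_scores(log_scores):
    """Step 3: normalize log marginal likelihoods under a uniform prior."""
    top = max(log_scores)
    weights = [math.exp(s - top) for s in log_scores]
    total = sum(weights)
    return [w / total for w in weights]


def weighted_average(matrices, weights):
    """Steps 4 and 5: posterior-weighted mean of a list of matrices."""
    n = len(matrices[0])
    return [[sum(w * m[i][j] for m, w in zip(matrices, weights))
             for j in range(n)] for i in range(n)]


dags = all_dags(3)
closures = [transitive_closure(g) for g in dags]
log_scores = [0.0] * len(dags)  # placeholder: real BGe log scores would go here
posterior = posterior_from_log_scores(log_scores)
edge_marginals = weighted_average(dags, posterior)
path_marginals = weighted_average(closures, posterior)
print(len(dags), edge_marginals[0][1], path_marginals[0][1])
```

With the constant placeholder scores, the three-node edge marginal for any ordered pair is 8/25 = 0.32 and the path marginal is 9/25 = 0.36, which already shows that the uniform-prior baseline is not a simple one-third split.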

In this work, steps 2–5 are implemented in a highly optimized and parallelized way on graphics processing units (GPUs) with the Python package JAX (Bradbury et al., 2018). The transitive closure is computed with Warshall’s algorithm (Warshall, 1962). The MECs for analysis are generated with the Python package causal-learn (Zheng et al., 2024). The causal graphs are visualized with the Python packages NetworkX (Hagberg et al., 2008) and PyGraphviz.¹³

5 A Compendium of Causal Structures

Among all possible causal structures, the most probable MEC and its corresponding DAGs for E, S0, and S galaxies are shown in Figure 3. More detailed information about the DAGs, MECs, and their exact posterior distributions can be found in §4. We find that in the most probable MEC for elliptical galaxies, the SMBH mass is a causal child, i.e., an effect of galaxy properties, while in the most probable MEC for spirals, the SMBH mass is a parent of galaxy properties (with lenticulars being in the middle).

[Figure 3 panel headings: Most probable MEC | Corresponding DAGs]

Figure 3: The most probable Markov Equivalence Class (MEC) for each morphology and their corresponding Directed Acyclic Graphs (DAGs) demonstrate a reversal of the causal position of 𝑀∙. MECs are represented as Partially Directed Acyclic Graphs (PDAGs). Directed edges suggest the direction of causality. The undirected edge 𝐴 – 𝐵 suggests both directions are possible (either 𝐴 → 𝐵 or 𝐴 ← 𝐵), as long as no new MEC/conditional independencies are introduced by creating new colliders (i.e., two nodes both pointing towards a third node, 𝐴 → 𝐶 ← 𝐵). In the ellipticals, 𝑀∙ is strictly a child. In spiral galaxies, 𝑀∙ is always connected with four galaxy properties through four undirected edges, suggesting either that 𝑀∙ is the parent of all four galaxy properties, or that 𝑀∙ is the parent of three of the galaxy properties and the child of the remaining one (as shown in the corresponding DAGs). More than one galaxy property pointing towards 𝑀∙ is ruled out, since this would create a new collider and break the encoded conditional independencies. The percentage listed above each graph indicates the posterior probability of the graph, whereas the prior probability for each individual DAG is equal to the reciprocal of the total number of DAGs, approximately 8.78 × 10⁻¹⁰ (OEIS Foundation Inc., 2024a). The MEC probabilities are the sum of their corresponding DAGs.
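The collider argument in the caption can be illustrated with a toy star graph: a center node (standing in for 𝑀∙) joined to four leaves by undirected edges, with the leaves assumed not to be adjacent to each other. That star skeleton is a simplification chosen here for illustration and is not the full graph of Figure 3. The sketch relies on the standard criterion that two DAGs are Markov equivalent when they share the same skeleton and the same set of colliders with non-adjacent parents; all names are hypothetical.

```python
import itertools


def v_structures(n, edges):
    """Colliders a -> c <- b whose parents a and b are not adjacent."""
    adjacent = {frozenset(e) for e in edges}
    found = set()
    for c in range(n):
        parents = sorted(a for (a, b) in edges if b == c)
        for a, b in itertools.combinations(parents, 2):
            if frozenset((a, b)) not in adjacent:
                found.add((a, c, b))
    return found


center, leaves = 0, [1, 2, 3, 4]  # node 0 plays the role of M_bh
members = []
for into_center in itertools.product((False, True), repeat=len(leaves)):
    # Orient each undirected edge either leaf -> center or center -> leaf.
    edges = [(leaf, center) if inward else (center, leaf)
             for leaf, inward in zip(leaves, into_center)]
    if not v_structures(5, edges):  # keep orientations that add no collider
        members.append(edges)

print(len(members))
```

Of the 16 possible orientations, only those with at most one leaf pointing into the center survive: the center as parent of all four leaves, or as child of exactly one of them, for five DAGs in total.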

The morphologically-dependent trend holds not only in the most probable graphs but is common over the entire posterior distribution. This can be quantified using edge and path marginals. Edge marginals are the posterior probability of a direct causal relation between two variables, marginalized over the causal structures of the other nodes. Similarly, path marginals provide the probability of a causal connection between two variables through a potentially indirect path (e.g., through intermediate nodes). These marginal causal structures can be represented in matrix form as shown in Figure 4. The first row ( 𝑀 ∙ → galaxy) and column (galaxy → 𝑀 ∙ ) of each matrix contain information pertaining to the inferred causal relationship between SMBH masses and their host galaxy properties.

Figure 4: Exact posterior edge marginals (top matrices) and path marginals (bottom matrices) for elliptical (left matrices), lenticular (middle matrices), and spiral (right matrices) galaxies. Edge marginals give the probability of Parent → Child through directed edges summed over all DAGs and their probabilities, and path marginals give the probability of Ancestor → Descendant through both direct and indirect paths. Focusing on the first row and column of each matrix, it is generally evident that the first row becomes progressively lighter and the first column becomes progressively darker as you look across the matrices from left to right, going from ellipticals to spirals, i.e., 𝑀∙ becomes more of a causal parent/ancestor.

Among all possible DAGs, the percentage of graphs exhibiting a direct edge from 𝜎₀ to 𝑀∙ is 78% in ellipticals, 72% in lenticulars, and only 22% in spirals. The path marginals in the bottom row of matrices support a similar picture, as by considering all possible paths relating these two nodes, we find that 79% of DAGs in ellipticals and 72% in lenticulars have 𝜎₀ as an ancestor of 𝑀∙, whereas this is the case in only 25% of DAGs in spirals. For comparison, the null results (i.e., the posterior from a uniform prior without any data) for the edge marginals are 𝑃(Parent) = 29%, 𝑃(Child) = 29%, and 𝑃(Disconnected) = 42%; for the path marginals these probabilities are 𝑃(Ancestor) = 42%, 𝑃(Descendant) = 42%, and 𝑃(Disconnected) = 16% (see §6).
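The uniform-prior edge marginal can also be obtained without enumerating every DAG, by counting labeled DAGs by number of edges. The sketch below uses a polynomial version of Robinson's recurrence, $A_n(x) = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} (1+x)^{k(n-k)} A_{n-k}(x)$, which is a technique swapped in here for illustration rather than the method used in the paper. By symmetry, the probability of a specific directed edge is the mean number of edges divided by the number of ordered node pairs. All function names are hypothetical.

```python
from math import comb


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def poly_add(p, q):
    """Add two polynomials given as coefficient lists."""
    out = [0] * max(len(p), len(q))
    for i, a in enumerate(p):
        out[i] += a
    for i, b in enumerate(q):
        out[i] += b
    return out


def dags_by_edge_count(n):
    """counts[m] = number of labeled DAGs on n nodes with exactly m edges."""
    polys = [[1]]  # A_0(x) = 1
    for size in range(1, n + 1):
        total = [0]
        for k in range(1, size + 1):
            power = k * (size - k)
            binomial = [comb(power, m) for m in range(power + 1)]  # (1 + x)^power
            term = poly_mul(binomial, polys[size - k])
            sign = (-1) ** (k + 1) * comb(size, k)
            total = poly_add(total, [sign * t for t in term])
        polys.append(total)
    return polys[n]


def null_edge_marginal(n):
    """P(i -> j) for a fixed ordered pair when all DAGs are equally likely."""
    counts = dags_by_edge_count(n)
    n_dags = sum(counts)
    total_edges = sum(m * c for m, c in enumerate(counts))
    return total_edges / (n_dags * n * (n - 1))


p_parent = null_edge_marginal(7)
print(f"P(Parent) = P(Child) = {p_parent:.3f}, P(Disconnected) = {1 - 2 * p_parent:.3f}")
```

For three nodes this reproduces the brute-force value of 8/25; for seven nodes the printed values can be compared against the 29%, 29%, and 42% quoted above.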

6 Posterior Distribution Inspection

In addition to Figure 3 (which shows the MEC with the highest posterior probability along with its corresponding DAGs) and Figure 4 (which shows the edge and path marginals), here we take a deeper look at the posterior distribution. Figure 5 shows the top four (in terms of posterior probabilities) MECs and Figure 6 shows the top ten DAGs. The top graphs within each morphology class are similar to each other, and most of them convey the same picture: in ellipticals, galaxy properties → SMBH mass; in spirals, SMBH mass → galaxy properties; and lenticulars occupy the middle ground. The small posterior probability of any individual DAG or MEC is neither rare nor surprising; the space of possible causal structures is huge because the number of possible DAGs grows super-exponentially with the number of variables. The chance of drawing any given DAG from a uniform distribution over all possible DAGs is 8.781333053161975 × 10⁻¹⁰, which is ∼10⁶ times smaller than the typical posterior probability of ∼10⁻³ that we observe for the top DAGs (see Figure 6).

Figure 5: Exact posterior result for the top four Markov Equivalence Classes (MECs), represented as Partially Directed Acyclic Graphs (PDAGs) for elliptical (top panel), lenticular (middle panel), and spiral (bottom panel) galaxies. The posterior probability is labeled on top of each MEC, and is calculated by the sum of all DAG posterior probabilities within that MEC. The top MECs within each morphology class share noticeable similarities. In ellipticals, 𝑀 ∙ generally sits at the bottom of the graph as a descendant of galaxy properties, and rises up as an ancestor of galaxy properties in spirals.

Figure 6: Exact posterior result for the top 10 most probable Directed Acyclic Graphs (DAGs) for elliptical (top panel), lenticular (middle panel), and spiral (bottom panel) galaxies. The percentage listed above each DAG indicates the posterior probability of the DAG, whereas the prior probability for every DAG is equal to precisely 8.781333053161975 × 10⁻¹⁰ (OEIS Foundation Inc., 2024a). The top DAGs within each morphology are similar to each other, and the general trend of 𝑀∙ predominantly rising to the top of the DAGs (from a descendant to an ancestor of galaxy properties) when going from the top (ellipticals) to bottom (spirals) panel is again manifest.

To better understand the relative posterior probability distribution and quantify the difference between graphs, in Figure 7 we ordered DAGs and MECs by their posterior probabilities from highest to lowest. The posterior probability is shown as solid lines and labeled on the left 𝑦-axis, and the rapidly dropping curves show that a few leading DAGs and MECs are much more probable than the DAGs and MECs in the long tail (note that the 𝑥-axis is in a logarithmic scale). The dashed lines and the right 𝑦-axis show the structural Hamming distance (SHD), a standard metric to evaluate the distance between graphs,¹⁴ from each unique DAG or MEC to the most probable DAG or MEC. The SHDs show that the top few DAGs or MECs are similar to each other, differing by only a few edges, whereas less probable DAGs or MECs become progressively more distinct from the top ones.
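A minimal sketch of a structural Hamming distance between two graphs given as adjacency matrices is shown below. It assumes one common convention, in which every node pair whose edge status differs (missing, extra, or reversed) contributes one unit; other conventions count a reversal as two, and the variant used in the paper is not restated here. Under this encoding an undirected PDAG edge can be represented by setting both directions to 1. The function name is hypothetical.

```python
def structural_hamming_distance(a, b):
    """Number of node pairs whose edge status differs between graphs a and b."""
    n = len(a)
    distance = 0
    for i in range(n):
        for j in range(i + 1, n):
            if (a[i][j], a[j][i]) != (b[i][j], b[j][i]):
                distance += 1
    return distance


chain = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]     # 0 -> 1 -> 2
collider = [[0, 1, 0], [0, 0, 0], [0, 1, 0]]  # 0 -> 1 <- 2
empty = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]     # no edges

print(structural_hamming_distance(chain, collider))  # one reversed edge
print(structural_hamming_distance(chain, empty))     # two missing edges
```

In this toy case the chain and the collider differ by a single edge reversal, so they are one unit apart even though they belong to different MECs.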

The SHD increases the slowest in spirals, making the posterior distribution of spirals the most unimodal, while the SHD grows fastest in lenticulars, reinforcing the picture that lenticulars, as the middle ground between ellipticals and spirals, have more sub-modes of causal structures and no clear dominance of one particular causal direction between black hole mass and galaxy properties. The probability distribution and SHD distribution together show that, despite the existence of some sub-modes, a general mode of causal structure is detected in ellipticals and in spirals, and this general mode can be visualized statistically via the edge marginals and path marginals in Figure 4, as discussed in §5.

Finally, we note that, for all interpretations of edge/path marginals, it is useful to remember that the null solution is nontrivial. There is not a 50–50 chance of one direction of causality or the other; rather, the probabilities of one direction, its reverse, or no causal connection differ from one another, depend on whether an edge or a path marginal is considered, and are not intuitive (see Figure 8).

Figure 7: The exact posterior probability distribution and the structural Hamming distances (SHDs) to the most probable graph. There are in total 3.12510571 × 10⁸ MECs in the case of seven nodes (OEIS Foundation Inc., 2024b). Here, only the first 10⁶ MECs are plotted for simplicity. The DAGs (left plot) and MECs (right plot) are ordered by their posterior probabilities from highest to lowest. The solid lines and the left 𝑦-axes show the posterior probability of the DAGs/MECs. The dashed lines and the right 𝑦-axes show the SHD, a measure of distance between graphs, from each DAG or MEC to the most probable DAG or MEC. The red dashed line marks the 10th DAG and the 4th MEC, which are shown in Figures 6 and 5, respectively. Together, these panels demonstrate that a few DAGs or MECs that are similar to each other (in terms of SHDs) have much higher posterior probabilities than the rest of the distinctive DAGs and MECs that reside in the very long tail of nearly zero probability.

Figure 8: The edge marginals (left matrix) and path marginals (right matrix) for a uniform prior, i.e., all possible DAGs share the same probability. Note that under a uniform prior, the probability of having 𝐴 → 𝐵, 𝐴 ← 𝐵, and 𝐴 disconnected from 𝐵 is not 1/3 each. The edge marginals and path marginals under the uniform prior will vary slightly as the number of nodes changes, but the probability of having opposite directions of causality (i.e., 𝐴 → 𝐵 and 𝐴 ← 𝐵) will remain equal.

7 Findings

7.1 Causal Connections for Galaxy Evolution

We find that these results are consistent with theoretical models of galactic evolution. Ellipticals are highly-evolved galaxies, being the result of a large number of galactic mergers. Modern hydrodynamical cosmological simulations such as EAGLE (Crain et al., 2015) or IllustrisTNG (Marinacci et al., 2018; Naiman et al., 2018; Nelson et al., 2018; Pillepich et al., 2018; Springel et al., 2018) show that elliptical galaxies with log(𝑀∗/M☉) ≥ 11 are generally the end result of two or more major merger events, such that the typical present-day fraction of stars with ex situ origins is greater than 50% (Cannarozzo et al., 2023).¹⁵ In even more general terms, the process of successive mergers will act to erase the preexisting causal connection from the SMBH to its host galaxy and establish new correlations via the central limit theorem (Jahnke & Macciò, 2011).

During a merger, the SMBHs at the center of each merging galaxy play no role in the large-scale dynamics; it is the galaxy properties (chiefly size and mass) that shape the galaxy mergers and their outcomes. Central SMBHs are passively driven to the bottom of the post-galaxy-merger potential well by dynamical friction, eventually merging together (Chandrasekhar, 1943; Khan et al., 2020). So it stands to reason to expect that in ellipticals, the distribution of SMBH masses is determined by that of galaxy properties and not vice versa.

For spiral galaxies, this is not the case, since they experience at most a few relatively minor mergers. Unlike elliptical galaxies, spirals are predominantly composed of in situ stellar populations. Causal relations between SMBH mass and galaxy properties may thus be set primordially in a secular co-evolution phase, and they are not erased by mergers. As a result, spiral galaxies behave markedly differently from ellipticals. Interestingly, lenticulars appear to lie in between. This is expected, because lenticulars have undergone enough mergers to erase spiral structure while still maintaining an extended disk structure, but are not yet comparable to ellipticals in terms of mass and pressure support.¹⁶ Moreover, by extension of the results of Cannarozzo et al. (2023) to all early-type (i.e., lenticular and elliptical) galaxies, all but the most massive lenticular galaxies should still maintain in situ stellar fractions greater than 50%.

The six galaxy variables studied here can be split into the three parameters defining the fundamental plane (FP) of elliptical galaxies and three parameters related to star formation. The FP is a manifestation of dynamical equilibrium reached in the largely pressure-supported stellar dynamics of massive elliptical galaxies (Mould, 2020). Moreover, it is a consequence of the merger formation of these galaxies via dissipation and feedback that ultimately places them on the FP. Although only 35/101 of our galaxies are ellipticals, the classical bulges of lenticular and spiral galaxies are also governed by the FP. Indeed, it has been found that the bulges of type S0–Sbc galaxies tightly follow the same FP relation as ellipticals (Falcón-Barroso et al., 2002).

The matrices in Figure 4 also provide information about the causal nature of the observed FP relationship. By looking at the path marginals for elliptical galaxies (bottom left), we find that $\langle\Sigma_{\rm e}\rangle$ is an ancestor (86%) of $R_{\rm e}$ and that $\sigma_0$ is an ancestor (76%) of $R_{\rm e}$. This implies that $\langle\Sigma_{\rm e}\rangle$ and $\sigma_0$ are both upstream of $R_{\rm e}$, confirming that the density and dynamics of stellar populations in an elliptical galaxy govern its size. Furthermore, we find that there is nearly no chance that $M_*$ is disconnected from $R_{\rm e}$ (i.e., $54\% + 46\% = 100\%$; they are never $d$-separated, and thus always correlated), indicating the existence of a size–mass relation due to the virial theorem (i.e., $M \sim \sigma^2 R$).

7.2 Causal Active Galactic Nuclei Feedback

From Figure 4, we find that, in spirals, $M_\bullet$ is an ancestor (74%) of sSFR; in lenticulars, there is no dominant causal direction between the two parameters (38% and 14%); and in ellipticals, $M_\bullet$ becomes a descendant (80%) of a galaxy’s sSFR. This can be interpreted as a direct consequence of the presence or absence of gas through AGN feedback. If there is a substantial gas reservoir (as in spirals), the SMBH is the ancestor since its feedback is responsible for shutting down star formation and hence stopping the growth of stellar mass. With a dearth of gas, as in ellipticals, even large AGN bursts will not affect the stellar mass, and thus the SMBH cannot be an ancestor of galaxy properties. This is further supported by the fact that $M_\bullet$ is the parent (69%) of $M_*$ in spirals, but becomes the descendant (56%) or child (49%) of $M_*$ in elliptical galaxies. It is also true that, in the absence of gas, mergers are the main pathway for SMBH growth, and this will also cause the SMBH to become a descendant or child in hierarchical assembly (Hopkins et al., 2006, 2007, 2008a, 2008b; Jahnke & Macciò, 2011; Treister et al., 2012; Graham & Sahu, 2023a; Graham, 2023c; Natarajan et al., 2023b).

8 Experiment with Semi-analytic Models

As an additional benchmark test for the methodology in the SMBH–galaxy context, we applied the same causal discovery method to data generated by SAMs, where the ground-truth causal direction is clearly defined, is propagated through a series of coupled partial differential equations, and is easily customizable.

SAMs are powerful tools to model galaxy formation using dark matter halo merger trees from $N$-body simulations with some phenomenological descriptions of baryonic physical processes like cosmic reionization, hot gas cooling and cold gas infall, star formation and metal production, supernova feedback, gas stripping and tidal disruption of satellites, galaxy mergers, bulge formation, black hole growth, AGN feedback, etc. We adopt the model of Luo et al. (2016), which is the resolution-independent version of the Munich galaxy formation model: L-Galaxies (mainly based on models of Fu et al. 2013 and Guo et al. 2011, 2013). The dark-matter-only $N$-body simulation is the JiuTian-1G simulation with $6144^3$ dark matter particles in a $1\,{\rm Gpc}/h$ cubic simulation box, based on Planck 2018 (Planck Collaboration et al., 2020) cosmological parameters.

In the model, there are two processes related to black hole growth and its feedback. The first is “quasar mode”, where SMBHs can accrete cold gas directly during galaxy mergers. The other is “radio mode”, where SMBHs can accrete hot gas continually from their host galaxies and inject energy into the hot atmosphere. The quasar mode is the main black hole growth channel, while the radio mode is the main AGN feedback channel to suppress hot gas cooling. More details can be found in the supplementary material of Henriques et al. (2015). The models used here are in broad agreement with many of the standard studies on AGN feedback (e.g., Kauffmann & Haehnelt, 2000; Benson et al., 2003; Granato et al., 2004; Di Matteo et al., 2005; Bower et al., 2006; Croton et al., 2006).

In SAM galaxies, black hole feedback actively affects galaxies and is hard-coded to turn off as soon as a galaxy is quenched. Therefore, in SAM elliptical galaxies that become quenched, galaxy properties cause the black hole mass via the only remaining mechanism (i.e., mergers/accretion), and in SAM spiral galaxies with black hole feedback still on, black holes primarily cause galaxy properties. We also conducted an additional check in which black hole feedback is manually turned off throughout the entire life of the galaxies; we refer to these as “SAM no-feedback” galaxies.

This gives us three groups of SAM galaxies: SAM E galaxies, SAM S galaxies, and SAM no-feedback galaxies. SAM E galaxies are galaxies with $B/T > 0.78$ (where $B/T$ is the bulge-to-total ratio, $M_*^{\rm bulge}/M_*$; the threshold of 0.78 is from Graham & Worley 2008), and SAM S galaxies are galaxies with $B/T \leq 0.78$. Additional stellar mass $M_*$ cuts are applied such that the $M_*$ distributions of SAM E and SAM S galaxies are similar to that of the real observational data used in this work for a fair comparison, as shown in Figure 9. No cuts are applied to SAM no-feedback galaxies since they do not have any realistic counterparts and are generated solely for this test. This gives us 1189 SAM E galaxies, 1999 SAM S galaxies, and 2663 SAM no-feedback galaxies.

Figure 9: The $M_*$ distribution for semi-analytical model (SAM) galaxies compared to the $M_*$ distribution of real galaxies used in this work. The $M_*$ distributions of SAM E and SAM S galaxies are similar to that of the real observational data for a fair comparison.

The results we present in Figure 10 indeed confirm the designed causal structure in the SAMs. The edge marginals clearly show that in SAM E galaxies, galaxy properties cause the black hole mass; in SAM S galaxies, black hole mass causes galaxy properties; and in SAM no-feedback galaxies, galaxy properties cause the black hole mass, as the opposite direction is forbidden by construction.

Figure 10: Edge (top matrices) and path (bottom matrices) marginals for SAM galaxies. These matrices are similar to those found in Figure 4. Causal discovery is performed on ellipticals (SAM E), spirals (SAM S), and galaxies with black hole feedback intentionally turned off (SAM no feedback). Here, we are restricted to four parameters that are tracked in the SAMs. This test shows that the same pattern of causality seen in the real galaxies for elliptical vs. spiral (i.e., quenched vs. star-forming) galaxies is upheld in these simulated galaxies. Moreover, the causal structure identified in SAM no-feedback galaxies matches exactly with our design where the black hole → galaxy direction is strictly turned off.

9 Crosschecking with PC and FCI

The PC and FCI algorithms, two constraint-based methods (in contrast with the score-based method adopted in this work), are also applied to the same observational data to cross-check our results. The details of these two time-tested algorithms are already presented in §2.3.1. We adopt the implementation of PC and FCI in the Python package causal-learn (Zheng et al., 2024), and the results are reported in Figure 11. The exact posterior results, including edge/path marginals (Figure 4) and the top MECs/DAGs (Figures 3, 5, and 6), are generally consistent with the causal graphs found by PC and FCI.

Figure 11: Graphs learned by the PC algorithm (upper row) and by the FCI algorithm (bottom row). The graphs in the top row use PDAGs to represent MECs of DAGs by leaving some edges undirected. The graphs in the bottom row are Partial Ancestral Graphs (PAGs) and introduce additional edge types: $A \leftrightarrow B$, corresponding to a confounding relation (i.e., a third variable causes both $A$ and $B$), and empty circles representing uncertainty regarding the ending symbol of the edge (i.e., $A \circ\!\rightarrow B$ may correspond to either $A \rightarrow B$ or $A \leftrightarrow B$, but rules out $B \rightarrow A$). The significance cutoff for conditional independence tests is set to $\alpha = 0.15$ in all graphs. This demonstration shows that the same general causal trends are recovered using these constraint-based methods when comparing with the primary score-based method we utilized in our study.

In ellipticals, the PC algorithm finds that $\sigma_0$ and sSFR cause $M_\bullet$. In our Bayesian approach, $\sigma_0 \rightarrow M_\bullet$ and ${\rm sSFR} \rightarrow M_\bullet$ indeed have the highest and second highest edge/path marginals among the potential causes of $M_\bullet$ in ellipticals. In spirals, the PC algorithm finds $\sigma_0$, $R_{\rm e}$, and $M_*$ as effects of $M_\bullet$, and this is also consistent with the edge/path marginals reported in our Bayesian approach. The FCI algorithm produces results compatible with those of the PC algorithm, with the difference that, without the assumption of causal sufficiency, it leaves open the possibility that all the relations between SMBH mass and its causal parents are confounded by unobserved variables. In particular, the double arrow between $M_\bullet$ and sSFR in the lower left graph of Figure 11 may indicate an unobserved confounder, possibly the gas fraction, which in the future can be tested through hydrodynamical simulations where the gas fraction is more accessible.

Note that here we adopt a relatively high value of $\alpha = 0.15$, the significance cutoff for the $p$-value of conditional independence tests. Generally, a lower $\alpha$ value gives more false negative errors (i.e., fails to identify causal relations that exist), and a higher $\alpha$ results in more false positive errors (i.e., identifies causal relations that do not exist). Practically, the choice of $\alpha$ is often empirical and highly depends on the context. In our case, the conditional independence tests, which are the core of PC and FCI, suffer from the limited size of the dataset (35, 38, and 28 for E, S0, and S galaxies, respectively). We therefore selected a higher value of $\alpha$ to mitigate this. These limitations of PC and FCI are one of the main motivations for our adoption of a Bayesian approach, which compares the relative posterior probabilities of all possible DAGs.
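To make the role of $\alpha$ concrete, the following minimal sketch evaluates a Fisher-z test of zero (partial) correlation for a sample of 35 objects. This is an illustration only: the choice of the Fisher-z test and the partial correlation value of 0.3 are assumptions made for this example, not details taken from the analysis above.

```python
import math

def fisher_z_pvalue(r: float, n: int, k: int = 0) -> float:
    """Two-sided p-value for H0: the (partial) correlation is zero.

    r: sample (partial) correlation, n: sample size,
    k: number of variables being conditioned on.
    """
    z = math.atanh(r)                      # Fisher z-transform
    stat = math.sqrt(n - k - 3) * abs(z)   # approx. standard normal under H0
    return math.erfc(stat / math.sqrt(2))  # two-sided tail probability

# Hypothetical case: partial correlation 0.3, conditioning on one variable,
# with n = 35 (the size of the elliptical sample quoted in the text).
p = fisher_z_pvalue(0.3, n=35, k=1)
for alpha in (0.05, 0.15):
    verdict = "dependent (keep edge)" if p < alpha else "independent (remove edge)"
    print(f"alpha = {alpha:.2f}: p = {p:.3f} -> {verdict}")
```

With so few objects, a moderate partial correlation is not significant at $\alpha = 0.05$ but is at $\alpha = 0.15$, which is the trade-off between false negatives and false positives described above.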

10 Testing Limitations

10.1 Possible Unobserved Confounders

Our posterior calculation approach implicitly adopts the assumption of causal sufficiency, i.e., it assumes that there are no unobserved confounders.¹⁷ In the presence of an unobserved confounder, non-existent causal relations might be falsely identified. Some potential unobserved confounders, such as the gas reservoir or merger history, are practically difficult to observe but are already integrated into our interpretation. The distance from us to the galaxies, by contrast, plays no direct role in galaxy formation theory or in our interpretation, but it might influence multiple variables we examined, since our ability to measure all seven variables decreases as distance increases, which biases our sample towards nearby and more massive BHs/galaxies. Therefore, we examined the impact of distance by performing causal discovery with distance as one of the seven variables.

Since $W2-W3$ and sSFR are highly degenerate with each other, we replaced $W2-W3$ with $D_L$, the luminosity distance to our targets.¹⁸ The edge and path marginals with distance included are presented in Figure 12. Compared with the original marginals without distance (Figure 4), the presence of distance barely changes any previously identified causal relations: the edge and path marginals between galaxy properties and SMBH masses remain essentially unchanged with or without the inclusion of distance.¹⁹

Figure 12: Edge marginals (top matrices) and path marginals (bottom matrices) with luminosity distance ($D_L$) as one of the variables. Qualitatively, we find no change to the causal directions (as compared with Figure 4) when testing distance as a possible confounding variable.

10.2 Stability under Observational Errors

The variables used in this work are affected by observational errors and their marginal posterior probability distributions are given in Table 3 (assuming Gaussian posteriors). However, the causal structures explored so far have been calculated for the mean of these posteriors without considering their uncertainties. We now quantify the effect of this uncertainty on our inference of the causal structures. To do this, we draw samples from the posterior distribution of each variable to produce 100 mock datasets. The causal discovery method outlined in this paper is repeated on each of these 100 randomly-sampled datasets to arrive at 100 pairs of different edge marginal and path marginal matrices for each of the three morphological types considered. The edge marginal and path marginal matrices are summarized in Figure 13.
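The resampling step can be sketched as follows. This is a minimal illustration that assumes independent Gaussian posteriors for every measurement; the function name and the two-galaxy table are hypothetical placeholders, not values from Table 3.

```python
import random

def draw_mock_datasets(means, sigmas, n_mocks=100, seed=0):
    """Return n_mocks perturbed copies of a (galaxies x variables) table.

    means[i][j] and sigmas[i][j] are the Gaussian posterior mean and standard
    deviation of variable j for galaxy i (assumed independent).
    """
    rng = random.Random(seed)
    return [
        [[rng.gauss(m, s) for m, s in zip(row_m, row_s)]
         for row_m, row_s in zip(means, sigmas)]
        for _ in range(n_mocks)
    ]

# Hypothetical two-galaxy, two-variable table (placeholder values).
means = [[8.5, 2.3], [9.1, 2.4]]
sigmas = [[0.2, 0.02], [0.3, 0.03]]
mocks = draw_mock_datasets(means, sigmas)
print(len(mocks), len(mocks[0]), len(mocks[0][0]))
```

Each mock table would then be passed through the same causal discovery step, and the resulting marginal matrices summarized by their mean and standard deviation.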

Figure 13: The mean and standard deviation of edge marginals (top matrices) and path marginals (bottom matrices) over 100 random sampling realizations for each morphological class. Qualitatively, we find no change to the causal directions (as compared with Figure 4) when considering observational errors on all variables, and the uncertainties remain low ($\leq 22\%$).

We find that, overall, the key findings of this study are robust against these uncertainties. For example, in ellipticals, the edge marginals between $\sigma_0$ and $M_\bullet$ in the two directions across random sampling realizations are $0.70 \pm 0.09$ and $0.26 \pm 0.07$, giving a $3.84\sigma$ discrepancy (in other words, the probability that the inferred direction of causality is due to noise and the resulting uncertainties in the variables is about $10^{-4}$). Figure 14 shows the distribution of the edge marginals and path marginals for $\sigma_0 \rightarrow M_\bullet$. The difference between ellipticals and spirals is evident in all realizations.
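As a rough check of this significance, one can combine the two scatters in quadrature and convert the result to a Gaussian tail probability. The sketch below assumes independent Gaussian errors and uses the rounded values quoted in the text, so it is an approximation rather than the exact calculation behind the quoted number.

```python
import math

def direction_discrepancy(mean_a, std_a, mean_b, std_b):
    """Separation of two marginals in units of their combined scatter."""
    return (mean_a - mean_b) / math.hypot(std_a, std_b)

# Rounded edge marginals for the two directions in ellipticals (from the text).
z = direction_discrepancy(0.70, 0.09, 0.26, 0.07)
p_two_sided = math.erfc(z / math.sqrt(2))  # Gaussian two-sided tail probability
print(f"{z:.2f} sigma, p ~ {p_two_sided:.1e}")
```

With these rounded inputs the separation comes out at about $3.86\sigma$, close to the quoted $3.84\sigma$ (the small difference is presumably due to rounding of the inputs), and the corresponding two-sided probability is of order $10^{-4}$.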

Figure 14: Edge marginal (solid line) and path marginal (dashed line) distributions of $\sigma_0 \rightarrow M_\bullet$ over 100 random realizations for each morphological class. This plot illustrates the distribution of the element at the second row and first column (2, 1) in all matrices from Figure 13, showing a clear separation between the causal directions found in late-type vs. early-type galaxies.

10.3 Stability under Possible Outliers

We also explored the possibility of individual outlier galaxies biasing the inferred causal relations. To do this, we performed leave-one-out cross-validation. For each of the three morphological classes (ellipticals, lenticulars, and spirals), one galaxy is removed at a time and causal discovery is repeated on the remaining galaxies (e.g., for the 35 elliptical galaxies this procedure is repeated 35 times). The mean and standard deviation of the resulting marginals are shown in Figure 15, and the marginals for $\sigma_0 \rightarrow M_\bullet$ are highlighted in Figure 16. As can be seen, the fluctuations due to leave-one-out are much smaller than the uncertainties resulting from observational errors (§10.2), suggesting that the results are not driven by any potential individual outlier galaxy.
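The leave-one-out loop itself is simple and can be sketched generically. In the snippet below, `first_column_mean` is a hypothetical stand-in for the full causal discovery step (which would return, e.g., one edge marginal), and the data are toy values.

```python
import statistics

def leave_one_out(rows, estimator):
    """Apply `estimator` to every dataset obtained by dropping one row."""
    return [estimator(rows[:i] + rows[i + 1:]) for i in range(len(rows))]

# Stand-in estimator (hypothetical): the mean of the first column. In the
# analysis described above this would be the causal discovery step.
def first_column_mean(rows):
    return statistics.fmean(r[0] for r in rows)

rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [10.0, 4.0]]  # toy data
estimates = leave_one_out(rows, first_column_mean)
print(statistics.fmean(estimates), statistics.stdev(estimates))
```

A small spread among the leave-one-out estimates, relative to the spread from observational errors, indicates that no single object dominates the result.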

Figure 15: The mean and standard deviation of edge marginals (top matrices) and path marginals (bottom matrices) over all leave-one-out realizations. Qualitatively, we find no change to the causal directions (as compared with Figure 4) when testing for possible outliers, and the uncertainties remain low ($\leq 9\%$).

Figure 16: Edge marginal (solid line) and path marginal (dashed line) distributions of $\sigma_0 \rightarrow M_\bullet$ over all leave-one-out realizations. This plot illustrates the distribution of the element at the second row and first column (2, 1) in all matrices from Figure 15, showing a clear separation between the causal directions found in late-type vs. early-type galaxies.

10.4 Cyclicity

By calculating the posterior probabilities of all possible DAGs, we implicitly assumed acyclicity, i.e., no loops in a graph. In fact, according to galaxy formation theory, feedback loops between black hole mass and galaxy properties (i.e., black hole mass causing the galaxy properties while galaxy properties also cause black hole mass at the same time) are expected to be negligible in ellipticals and spirals. Black holes affect their host galaxies through black hole feedback, a process that heats the gas and pushes gas out to starve star formation, while galaxies also affect the central black hole through mergers and accretion. In an ideal spiral galaxy, there have been (at most) only minor mergers, thus killing off the merger path of galaxy → black hole.

The accretion onto the black hole is mainly regulated by the black hole mass itself and the gas density in the central region (Bondi, 1952). This latter quantity is found to be relatively constant in gas-rich galaxies, as confirmed by modern numerical simulations, like the NIHAO suite (Wang et al., 2015; Blank et al., 2019) as shown in Figure 17. This implies that accretion is fairly constant in all gas-rich galaxies, diminishing the causal relation galaxy → black hole.
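For reference, the standard Bondi rate for spherical accretion makes this dependence explicit. The expression below is the textbook form, quoted up to an order-unity factor; the sound speed $c_s$ of the ambient gas is a symbol introduced here for illustration and is not discussed in the text above.

$$\dot{M}_{\rm Bondi} \simeq \frac{4\pi G^{2} M_\bullet^{2}\, \rho}{c_s^{3}}$$

Here $\rho$ is the gas density in the central region. If $\rho$ (and $c_s$) vary little from one gas-rich galaxy to another, the accretion rate is set almost entirely by $M_\bullet$ itself rather than by the properties of the host.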

Figure 17: Gas mass within 5 kpc versus total stellar mass in NIHAO simulated galaxies (Wang et al., 2015). The central gas mass is fairly constant in gas-rich galaxies, implying that gas accretion onto the black hole, which is mainly regulated by the black hole mass and the local gas density (Bondi, 1952), is also quite uniform across galaxies, weakening the galaxy → SMBH causal relation in spirals. This weakens the case for a cyclic situation in spirals, in which both “galaxy → SMBH” and “galaxy ← SMBH” would be present at the same time.

Therefore, in spiral galaxies, the causal relation of galaxy → black hole is expected to be very weak compared to the black hole → galaxy direction. On the other hand, ellipticals are in short supply of gas; the central SMBH therefore lacks the medium into which to project its energy to regulate star formation. As a result, in ellipticals, the black hole → galaxy direction is negligible compared to the galaxy → black hole path enabled by major mergers.

In all (in spirals and ellipticals), one of the causal directions between SMBHs and galaxies is expected to considerably overwhelm the other, making the causal structure acyclic. The lenticulars, however, might have both major mergers and black hole feedback simultaneously, thus being more cyclic in their causal structure. This may be one of the reasons why we see many sub-modes in the posterior distributions of lenticulars, as shown in Figure 7. To fully identify cyclic causal structures, time-series data are usually required. In our case of SMBH–galaxy co-evolution, which happens on a timescale of billions of years, obtaining time-series data is impossible within the lifetime of humanity;²⁰ however, studies of samples of galaxies with different ages may provide observational clues about the presence or absence of cyclicity in the future.

10.5 Alternative Priors

As detailed in §4.2, the posterior probability of a graph given the data is given by

$$P(G \mid D) \propto P(D \mid G)\,P(G). \tag{3}$$

$P(D \mid G)$ can be calculated through the BGe score, and $P(G)$ is the prior probability of graphs, i.e., DAGs. In this work we adopt a uniform prior,

$$P_{\rm uniform}(G) = \frac{1}{N_{\rm DAGs}}, \tag{4}$$

such that every one of the nearly $1.14 \times 10^{9}$ possible DAGs has the same prior probability. By construction, this uniform prior does not encode any assumptions or biases about the structure of the causal graph.
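As a toy illustration of what a uniform prior over DAGs implies for individual edges and paths, the sketch below enumerates all DAGs on three labelled nodes (rather than seven) and computes prior edge and path marginals. The brute-force enumeration over node orderings is chosen for clarity only; it is an assumption of this example, is practical only for very small graphs, and is not the procedure used in the main analysis.

```python
from itertools import combinations, permutations

def all_dags(n):
    """Enumerate every DAG on n labelled nodes as a frozenset of (i, j) edges."""
    pairs = list(combinations(range(n), 2))
    dags = set()
    # Every DAG is consistent with at least one node ordering, so take each
    # ordering and every subset of the edges that point "forward" in it.
    for order in permutations(range(n)):
        rank = {node: pos for pos, node in enumerate(order)}
        forward = [(a, b) if rank[a] < rank[b] else (b, a) for a, b in pairs]
        for mask in range(1 << len(forward)):
            dags.add(frozenset(e for k, e in enumerate(forward) if (mask >> k) & 1))
    return dags

def has_path(dag, src, dst):
    """True if a directed path src -> ... -> dst exists in the DAG."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        for a, b in dag:
            if a == node and b not in seen:
                if b == dst:
                    return True
                seen.add(b)
                stack.append(b)
    return False

dags = all_dags(3)
n_dags = len(dags)
edge_01 = sum((0, 1) in g for g in dags) / n_dags   # P(0 -> 1)
edge_10 = sum((1, 0) in g for g in dags) / n_dags   # P(1 -> 0)
no_edge = 1 - edge_01 - edge_10                     # P(no edge between 0 and 1)
path_01 = sum(has_path(g, 0, 1) for g in dags) / n_dags
print(n_dags, edge_01, edge_10, no_edge, path_01)
```

For three nodes there are 25 DAGs; the two edge directions are equally likely (8/25 each), while the absence of an edge has probability 9/25. The three options are therefore not 1/3 each, even though every DAG is weighted equally.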

Although a uniform prior best fits the data-driven purpose of this study, here we explore possible alternative priors to test the robustness of the identified causal structure. Another possible choice of prior could be a sparsity-based prior, which embraces the idea of Occam’s razor and favors simpler graphs with fewer edges. Here we adopt a sparsity-based prior that penalizes the number of edges exponentially,

$$P_{\rm sparse}(G) = \frac{e^{-\lambda \cdot E(G)}}{\sum_{\rm DAGs} e^{-\lambda \cdot E(G)}}, \tag{5}$$

where $E(G)$ is the number of edges in $G$, $\lambda > 0$ is a sparsity penalty parameter, and a higher $\lambda$ enforces stronger sparsity.
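The effect of the penalty on the posterior can be sketched with a few hypothetical graphs. The log-likelihood values and edge counts below are invented for illustration; note that the prior normalization (the sum over DAGs in the denominator) cancels when the posterior is normalized over the same set of graphs.

```python
import math

def posterior(log_likelihoods, n_edges, lam=0.0):
    """Normalized P(G | D) over a list of candidate graphs.

    log_likelihoods[i] = log P(D | G_i); n_edges[i] = E(G_i).
    lam = 0 recovers the uniform prior; lam > 0 penalizes each edge by exp(-lam).
    """
    log_post = [ll - lam * e for ll, e in zip(log_likelihoods, n_edges)]
    shift = max(log_post)                        # subtract the max for stability
    weights = [math.exp(lp - shift) for lp in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical scores for three candidate graphs with 1, 2, and 3 edges.
log_liks = [-10.0, -9.5, -9.4]
edges = [1, 2, 3]
uniform = posterior(log_liks, edges, lam=0.0)
sparse = posterior(log_liks, edges, lam=0.5)
print(uniform)
print(sparse)
```

Relative to the uniform case, each additional edge lowers a graph's posterior odds by a factor of $e^{-\lambda}$, so probability mass shifts towards the sparser graphs.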

We performed causal discovery under this sparsity-based prior with $\lambda = 0.5$, and the resulting edge marginals and path marginals are presented in Figure 18. Compared with the results under a uniform prior (Figure 4), the results under a sparsity-based prior do show a tendency towards simpler graphs with fewer edges, since already unlikely edges become even less probable, i.e., a darker cell becomes even darker from a uniform prior (Figure 4) to a sparsity-based prior (Figure 18). For example, in E galaxies, $P({\rm sSFR} \rightarrow \sigma_0) = 0.11$ under a uniform prior, but this number drops to $P({\rm sSFR} \rightarrow \sigma_0) = 0.08$ under a sparsity-based prior. On the other hand, the probabilities of causal connections between galaxy properties and SMBH masses, which are the main findings of this work, remain almost the same, and some even increase, i.e., a brighter cell becomes even brighter from a uniform prior (Figure 4) to a sparsity-based prior (Figure 18). For example, in S galaxies, $P(M_\bullet \rightarrow \sigma_0) = 0.62$ under a uniform prior, but rises to $P(M_\bullet \rightarrow \sigma_0) = 0.63$ under a sparsity-based prior. Therefore, as darker cells become darker and brighter cells get brighter, the causal connections between galaxy properties and SMBH masses stand out even more clearly, further supporting the main claims of this work.

Figure 18: Exact posterior edge marginals (top matrices) and path marginals (bottom matrices) for elliptical (left matrices), lenticular (middle matrices), and spiral (right matrices) galaxies under the sparsity-based prior detailed in Equation 5 and §10.5. The marginals found are very similar to the ones found under a uniform prior (Figure 4). Despite the fact that a sparsity-based prior favors simpler DAGs with a minimal number of edges, it barely changes any edge or path marginal between galaxy properties and SMBH masses.

Besides a sparsity-based prior and a uniform prior, there are many other priors, such as an Erdős–Rényi prior (Erdos & Renyi, 1960), which favors a certain average number of edges per node; a combinatorial fair prior, which incorporates combinatorial fairness and treats graphs with different parent configurations equally; and a biased prior, which favors certain graph configurations based on prior domain knowledge, for example, a higher prior probability for $M_\bullet \rightarrow \sigma_0$ over $\sigma_0 \rightarrow M_\bullet$. Given that the aim of this paper is to keep the result free of presumed models and possible biases, and then to use this data-driven result to compare against theoretical galaxy formation models, we stick to the uniform prior.

11 Possible Extension to More Variables with DAG-GFN

The typical time needed to perform the exact posterior search for seven variables outlined in this work, including generating all possible DAGs and their transitive closures, calculating posterior probabilities, and obtaining edge/path marginals, is a few hours. However, as the number of possible DAGs grows by a factor of $\sim 10^{2}$ when the number of variables increases from seven to eight, and by a factor of $\sim 10^{6}$ when the number of variables increases from seven to nine, an exact search becomes impractical. Here, we explore DAG-GFN as a feasible way to approximate the posteriors as the number of variables increases.
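The growth in the number of DAGs can be checked directly with Robinson's recurrence for the number of labelled DAGs. The snippet below is a short sketch of that standard counting formula and is independent of the search code used in this work.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def count_dags(n: int) -> int:
    """Number of DAGs on n labelled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )

for n in (7, 8, 9):
    ratio = count_dags(n) / count_dags(7)
    print(n, count_dags(n), f"({ratio:.3g} times the n = 7 count)")
```

For seven nodes this gives 1,138,779,265 DAGs, consistent with the $\approx 1.14 \times 10^{9}$ quoted for the uniform prior; moving to eight and nine nodes multiplies the count by roughly 690 and $1.07 \times 10^{6}$, respectively.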

The DAG-GFN method (Deleu et al., 2022) uses the framework of Generative Flow Networks (GFlowNets; Bengio et al., 2021, 2023) to (approximately) sample from the posterior distribution. GFlowNets treat the problem of sampling from an unnormalized distribution over discrete and compositional objects as a sequential decision-making problem, where actions are taken by sampling from a learned policy at each step of generation. In the context of (Bayesian) causal discovery, DAGs are constructed one edge at a time, starting from the empty graph. The objective is to find a policy $\pi(G' \mid G)$ giving the probability of adding an edge to the DAG $G$ to transform it into a new graph $G'$, such that sampling sequentially from it would yield samples from a distribution proportional to $R(G)$ (i.e., an unnormalized distribution). Deleu et al. (2022) showed that such a policy satisfies

$$\frac{1}{|G|+1}\,R(G')\,\pi({\rm stop} \mid G) = R(G)\,\pi(G' \mid G)\,\pi({\rm stop} \mid G'), \tag{6}$$

where $|G|$ is the number of edges in $G$, and $\pi({\rm stop} \mid G)$ is the probability of stopping the sequential process, effectively returning $G$ as a sample of the posterior. To sample the posterior $P(G \mid D) \propto P(D \mid G)\,P(G)$ (by Bayes’ rule), we can then use $R(G) = P(D \mid G)\,P(G)$.

DAG-GFN was trained on our data, and $10^{5}$ DAGs were sampled from the trained network. The frequency of each sampled unique DAG gives the approximated posterior probability of that DAG. The marginals, as well as the top MECs and DAGs, are presented in Figures 19, 20, and 21, respectively. The approximated posteriors by DAG-GFN are highly consistent with the exact posteriors from our primary analysis. Visual inspection reveals that Figure 4 and Figure 19 present noticeable similarities.
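Turning sampled DAGs into an approximate posterior and edge marginals amounts to counting. The following minimal sketch uses a handful of hypothetical samples over two variables; the helper names and the sample set are illustrative and are not part of the DAG-GFN code.

```python
from collections import Counter

def empirical_posterior(samples):
    """Estimate P(G | D) as the frequency of each unique sampled DAG."""
    counts = Counter(samples)
    total = sum(counts.values())
    return {dag: c / total for dag, c in counts.items()}

def edge_marginal(post, edge):
    """P(edge in G | D): total posterior mass of the DAGs containing `edge`."""
    return sum(p for dag, p in post.items() if edge in dag)

# Hypothetical samples; each DAG is a frozenset of (cause, effect) edges.
samples = (
    [frozenset({("sigma0", "M_bh")})] * 6
    + [frozenset({("M_bh", "sigma0")})] * 3
    + [frozenset()] * 1
)
post = empirical_posterior(samples)
print(edge_marginal(post, ("sigma0", "M_bh")))
```

With $10^{5}$ samples, DAGs whose posterior probability is far below $10^{-5}$ will typically not be sampled at all, so this estimate is most reliable for the high-probability part of the posterior and for marginals that aggregate over many graphs.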

Figure 19: Edge marginals (top matrices) and path marginals (bottom matrices) approximated by DAG-GFN. Qualitatively, we find no change to the causal directions (as compared with Figure 4) when using DAG-GFN to draw representative samples from the full Bayesian posterior.

Figure 20: Top four MECs sampled by DAG-GFN for elliptical (top panel), lenticular (middle panel), and spiral (bottom panel) galaxies. Qualitatively, we find no change to the causal structures (as compared with Figure 5) when using DAG-GFN to draw representative samples from the full Bayesian posterior.

Figure 21: Top 10 DAGs sampled by DAG-GFN for elliptical (top panel), lenticular (middle panel), and spiral (bottom panel) galaxies. Qualitatively, we find no change to the causal structures (as compared with Figure 6) when using DAG-GFN to draw representative samples from the full Bayesian posterior.

12 Conclusions

We present the first data-driven evidence on the direction of the causal relationship between supermassive black holes and their host galaxies. Our findings suggest that in elliptical galaxies, bulge properties influence SMBH growth, whereas in spiral galaxies, SMBHs shape galaxy characteristics. The process of quenching can be causally explained as follows:

quenching starts in gas-rich (i.e., spiral) galaxies, and hence there is a causal connection; and

the quenching is over in elliptical galaxies, where we only see the end product of such quenching, and the causal connection is now reversed.

These findings support theoretical models of galactic evolution driven by feedback processes and mergers. Nevertheless, our causal mechanisms are defined from a relatively modest number of galaxies in the local Universe. Further insights can be gained in the future with wider and deeper surveys for more SMBH mass measurements, or by using time-series data and control variables in galaxy simulations (Waterval et al., 2024) across a wide range of redshifts to test the causal findings and explanations presented here. In addition, continued advances in the nascent field of causal discovery will help alleviate potential biases imposed by unobserved confounders (Schölkopf et al., 2021; Zhang et al., 2024; Jin et al., 2024) or cyclicity (Ghassami et al., 2020; Dai et al., 2024).

With the knowledge we gain from learning the underlying causal structures and mechanisms behind galaxy–SMBH co-evolution, it should ultimately be possible to create physically motivated black hole mass scaling relations that faithfully model the reality of action/interaction. The successful application of causal discovery to this astrophysical dataset paves the way for a deeper understanding of the fundamental physical processes driving galaxy evolution and establishes causal discovery as a promising tool for data-driven insights across various scientific disciplines.

This research was carried out on the high-performance computing resources at New York University Abu Dhabi. We acknowledge the usage of the HyperLeda database (http://leda.univ-lyon1.fr). Z.J. and M.P. wish to extend their heartfelt thanks to Jithendaraa Subramanian for providing in-depth support and clarifications regarding DAG-GFN, and to Michelle Liu for comments and discussion. Y.H. thanks Andrew Benson and Dhanya Sridhar for helpful discussions. Z.J. thanks Michael Blanton and Joseph Gelfand for useful suggestions. Z.J. genuinely thanks Mohamad Ali-Dib for his very timely help with HPC technical issues. This material is based upon work supported by Tamkeen under the NYU Abu Dhabi Research Institute grant CASS. This work is partially supported by Schmidt Futures, a philanthropic initiative founded by Eric and Wendy Schmidt as part of the Virtual Institute for Astrophysics (VIA). M.P. acknowledges financial support from the European Union’s Horizon 2020 research and innovation program under the Marie Skłodowska-Curie grant agreement No. 896248. The data and code used for this work are available for download from the following GitHub repository: https://github.com/ZehaoJin/causalbh.

ORCID iDs

Zehao Jin (金泽灏) https://orcid.org/0009-0000-2506-6645

Mario Pasquato https://orcid.org/0000-0003-3784-5245

Benjamin L. Davis https://orcid.org/0000-0002-4306-5950

Tristan Deleu https://orcid.org/0009-0005-1943-3484

Yu Luo (罗煜) https://orcid.org/0000-0003-2341-9755

Changhyun Cho https://orcid.org/0000-0002-9879-1749

Pablo Lemos https://orcid.org/0000-0002-4728-8473

Laurence Perreault-Levasseur https://orcid.org/0000-0003-3544-3939

Yoshua Bengio https://orcid.org/0000-0002-9322-3515

Xi Kang (康熙) https://orcid.org/0000-0002-5458-4254

Andrea Valerio Macciò https://orcid.org/0000-0002-8171-6507

Yashar Hezaveh https://orcid.org/0000-0002-8669-5733

References

Ahmed, O., Träuble, F., Goyal, A., et al. 2020, CausalWorld: A Robotic Manipulation Benchmark for Causal Structure and Transfer Learning
Aird, J., Coil, A. L., Georgakakis, A., et al. 2015, MNRAS, 451, 1892
Allahverdyan, A. E., & Janzing, D. 2008, Journal of Statistical Mechanics: Theory and Experiment, 2008, 04001
Allen, J.-M. A., Barrett, J., Horsman, D. C., Lee, C. M., & Spekkens, R. W. 2017, Phys. Rev. X, 7, 031021
Ankur Ankan, & Abinash Panda. 2015, in Proceedings of the 14th Python in Science Conference, ed. Kathryn Huff & James Bergstra, 6–11
Bengio, E., Jain, M., Korablyov, M., Precup, D., & Bengio, Y. 2021, in Advances in Neural Information Processing Systems, ed. M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, & J. W. Vaughan, Vol. 34 (Curran Associates, Inc.), 27381–27394. https://proceedings.neurips.cc/paper_files/paper/2021/file/e614f646836aaed9f89ce58e837e2310-Paper.pdf
Bengio, Y., Lahlou, S., Deleu, T., et al. 2023, Journal of Machine Learning Research, 24, 1–55
Benson, A. J., Bower, R. G., Frenk, C. S., et al. 2003, ApJ, 599, 38
Blank, M., Macciò, A. V., Dutton, A. A., & Obreja, A. 2019, MNRAS, 487, 5476
Bluck, A. F. L., Maiolino, R., Brownson, S., et al. 2022, A&A, 659, A160
Bondi, H. 1952, MNRAS, 112, 195
Bower, R. G., Benson, A. J., Malbon, R., et al. 2006, MNRAS, 370, 645
Bradbury, J., Frostig, R., Hawkins, P., et al. 2018, JAX: composable transformations of Python+NumPy programs, 0.3.13. http://github.com/google/jax
Brockman, G., Cheung, V., Pettersson, L., et al. 2016, OpenAI Gym. https://arxiv.org/abs/1606.01540
Cannarozzo, C., Leauthaud, A., Oyarzún, G. A., et al. 2023, MNRAS, 520, 5651
Catelan, M., Ferraro, F. R., & Rood, R. T. 2001, ApJ, 560, 970
Chandrasekhar, S. 1943, ApJ, 97, 255
Cheng, A. F., Rivkin, A. S., Michel, P., et al. 2018, Planet. Space Sci., 157, 104
Chickering, D. 2002, Journal of Machine Learning Research, 3, 507–554
Cluver, M. E., Jarrett, T. H., Dale, D. A., et al. 2017, ApJ, 850, 68
—. 2024, arXiv e-prints, arXiv:2410.13483
Crain, R. A., Schaye, J., Bower, R. G., et al. 2015, MNRAS, 450, 1937
Croton, D. J., Springel, V., White, S. D. M., et al. 2006, MNRAS, 365, 11
Dai, H., Ng, I., Zheng, Y., Gao, Z., & Zhang, K. 2024, in Proceedings of Machine Learning Research, Vol. 238, Proceedings of The 27th International Conference on Artificial Intelligence and Statistics, ed. S. Dasgupta, S. Mandt, & Y. Li (PMLR), 154–162. https://proceedings.mlr.press/v238/dai24a.html
Davis, B., & Jin, Z. 2024, in American Astronomical Society Meeting Abstracts, Vol. 56, American Astronomical Society Meeting Abstracts, 152.06
Davis, B. L., Graham, A. W., & Cameron, E. 2018, ApJ, 869, 113
—. 2019a, ApJ, 873, 85
Davis, B. L., Graham, A. W., & Combes, F. 2019b, ApJ, 877, 64
Davis, B. L., Graham, A. W., & Seigar, M. S. 2017, MNRAS, 471, 2187
Davis, B. L., & Jin, Z. 2023, ApJ, 956, L22
Deleu, T., Góis, A., Emezue, C. C., et al. 2022, in The 38th Conference on Uncertainty in Artificial Intelligence. https://openreview.net/forum?id=HElfed8j9g9
Delvecchio, I., Gruppioni, C., Pozzi, F., et al. 2014, MNRAS, 439, 2736
Di Capua, G., Runge, J., Donner, R. V., et al. 2020, Weather and Climate Dynamics, 1, 519
Di Matteo, T., Colberg, J., Springel, V., Hernquist, L., & Sijacki, D. 2008, ApJ, 676, 33
Di Matteo, T., Springel, V., & Hernquist, L. 2005, Nature, 433, 604
Djorgovski, S., & Davis, M. 1987, ApJ, 313, 59
Eales, S., Smith, D., Bourne, N., et al. 2018, MNRAS, 473, 3507
Eggen, O. J., Lynden-Bell, D., & Sandage, A. R. 1962, ApJ, 136, 748
Ellison, S. L., Viswanathan, A., Patton, D. R., et al. 2019, MNRAS, 487, 2491
Emezue, C. C., Drouin, A., Deleu, T., Bauer, S., & Bengio, Y. 2023, in ICML 2023 Workshop on Structured Probabilistic Inference & Generative Modeling. https://openreview.net/forum?id=9aDnWNPyeC
Erdos, P., & Renyi, A. 1960, Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 5, 17–61
Falcón-Barroso, J., Peletier, R. F., & Balcells, M. 2002, MNRAS, 335, 741
Ferrarese, L., & Merritt, D. 2000, ApJ, 539, L9
Fu, J., Kauffmann, G., Huang, M.-l., et al. 2013, MNRAS, 434, 1531
Gaspari, M., Ruszkowski, M., & Oh, S. P. 2013, MNRAS, 432, 3401
Gebhard, T. D., Bonse, M. J., Quanz, S. P., & Schölkopf, B. 2022, A&A, 666, A9
Gebhardt, K., Bender, R., Bower, G., et al. 2000, ApJ, 539, L13
Geiger, D., & Heckerman, D. 1994, in Proceedings of the Tenth International Conference on Uncertainty in Artificial Intelligence, UAI’94 (San Francisco, CA, USA: Morgan Kaufmann Publishers Inc.), 235–243
Geiger, D., & Heckerman, D. 2002, Ann. Statist., 30, 1412–1440
Ghassami, A., Yang, A., Kiyavash, N., & Zhang, K. 2020, in Proceedings of Machine Learning Research, Vol. 119, Proceedings of the 37th International Conference on Machine Learning, ed. H. D. III & A. Singh (PMLR), 3494–3504. https://proceedings.mlr.press/v119/ghassami20a.html
Glymour, C. 2012, in Advances in Machine Learning and Data Mining for Astronomy, ed. M. J. Way, J. D. Scargle, K. M. Ali, & A. N. Srivastava, 11–26
Graham, A. W. 2019, MNRAS, 487, 4995
—. 2023a, MNRAS, 522, 3588
—. 2023b, MNRAS, 521, 1023
—. 2023c, MNRAS, 518, 6293
—. 2024a, MNRAS, 531, 230
—. 2024b, MNRAS, 535, 299
Graham, A. W., Jarrett, T. H., & Cluver, M. E. 2024, MNRAS, 527, 10059
Graham, A. W., Onken, C. A., Athanassoula, E., & Combes, F. 2011, MNRAS, 412, 2211
Graham, A. W., & Sahu, N. 2023a, MNRAS, 518, 2177
—. 2023b, MNRAS, 520, 1975
Graham, A. W., & Scott, N. 2013, ApJ, 764, 151
Graham, A. W., & Worley, C. C. 2008, MNRAS, 388, 1708
Granato, G. L., De Zotti, G., Silva, L., Bressan, A., & Danese, L. 2004, ApJ, 600, 580
Greenwood, E. 1945, Experimental Sociology (New York Chichester, West Sussex: Columbia University Press). https://doi.org/10.7312/gree91078
Guo, Q., White, S., Angulo, R. E., et al. 2013, MNRAS, 428, 1351
Guo, Q., White, S., Boylan-Kolchin, M., et al. 2011, MNRAS, 413, 101
Hagberg, A. A., Schult, D. A., & Swart, P. J. 2008, in Proceedings of the 7th Python in Science Conference, ed. G. Varoquaux, T. Vaught, & J. Millman, Pasadena, CA USA, 11–15
Harikane, Y., Ouchi, M., Oguri, M., et al. 2023, ApJS, 265, 5
Harris, C. R., Millman, K. J., van der Walt, S. J., et al. 2020, Nature, 585, 357–362
Heckerman, D., Geiger, D., & Chickering, D. 1995, Machine Learning, 20, 197–243
Heckman, T. M., & Best, P. N. 2014, ARA&A, 52, 589
Henriques, B. M. B., White, S. D. M., Thomas, P. A., et al. 2015, MNRAS, 451, 2663
Hopkins, P. F., Bundy, K., Hernquist, L., & Ellis, R. S. 2007, ApJ, 659, 976
Hopkins, P. F., Cox, T. J., Kereš, D., & Hernquist, L. 2008a, ApJS, 175, 390
Hopkins, P. F., Hernquist, L., Cox, T. J., et al. 2006, ApJS, 163, 1
Hopkins, P. F., Hernquist, L., Cox, T. J., & Kereš, D. 2008b, ApJS, 175, 356
Huang, B., Zhang, K., Lin, Y., Schölkopf, B., & Glymour, C. 2018, in Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’18 (New York, NY, USA: Association for Computing Machinery), 1551–1560. https://doi.org/10.1145/3219819.3220104
Hubble, E. P. 1936, Realm of the Nebulae
Hunter, J. D. 2007, Computing in Science & Engineering, 9, 90–95
Imbens, G. W., & Lemieux, T. 2008, Journal of econometrics, 142, 615–635
Jahnke, K., & Macciò, A. V. 2011, ApJ, 734, 92
Janzing, D. 2007, arXiv e-prints, arXiv:0708.3411
Jarrett, T. H., Cluver, M. E., Taylor, E. N., et al. 2023, ApJ, 946, 95
Jeans, J. H. 1928, Astronomy and cosmogony
Jin, Z., Pasquato, M., Davis, B. L., Macciò, A. V., & Hezaveh, Y. 2024, arXiv e-prints, arXiv:2410.14775
Kalainathan, D., Goudet, O., & Dutta, R. 2020, Journal of Machine Learning Research, 21, 1–5
Kauffmann, G., & Haehnelt, M. 2000, MNRAS, 311, 576
Khan, F. M., Mirza, M. A., & Holley-Bockelmann, K. 2020, MNRAS, 492, 256
Kormendy, J., & Ho, L. C. 2013, ARA&A, 51, 511
Kuipers, J., Moffa, G., & Heckerman, D. 2014, arXiv e-prints, arXiv:1402.6863
Larson, R. L., Finkelstein, S. L., Kocevski, D. D., et al. 2023, ApJ, 953, L29
Leifer, M. S., & Spekkens, R. W. 2013, Phys. Rev. A, 88, 052130
Limberg, G. 2024, arXiv e-prints, arXiv:2411.11251
Luo, Y., Kang, X., Kauffmann, G., & Fu, J. 2016, MNRAS, 458, 366
M. Mooij, J., & Claassen, T. 2020, in Proceedings of Machine Learning Research, Vol. 124, Proceedings of the 36th Conference on Uncertainty in Artificial Intelligence (UAI), ed. J. Peters & D. Sontag (PMLR), 1159–1168. https://proceedings.mlr.press/v124/m-mooij20a.html
Madigan, D., York, J., & Allard, D. 1995, International Statistical Review / Revue Internationale de Statistique, 63, 215–232
Magorrian, J., Tremaine, S., Richstone, D., et al. 1998, AJ, 115, 2285
Makarov, D., Prugniel, P., Terekhova, N., Courtois, H., & Vauglin, I. 2014, A&A, 570, A13
Marinacci, F., Vogelsberger, M., Pakmor, R., et al. 2018, MNRAS, 480, 5113
McKinney, W. 2010, in Proceedings of the 9th Python in Science Conference, Vol. 445, Austin, TX, 51–56
Menegozzo, G., Dall’Alba, D., & Fiorini, P. 2022, in 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), 2124–2131
Mould, J. 2020, Frontiers in Astronomy and Space Sciences, 7, 21
Naiman, J. P., Pillepich, A., Springel, V., et al. 2018, MNRAS, 477, 1206
Natarajan, P., Tang, K. S., Khochfar, S., et al. 2023a, Nature Astronomy, 7, 879
Natarajan, P., Tang, K. S., McGibbon, R., et al. 2023b, ApJ, 952, 146
Nelson, D., Pillepich, A., Springel, V., et al. 2018, MNRAS, 475, 624
OEIS Foundation Inc. 2024a, Number of acyclic digraphs (or DAGs) with n labeled nodes, Entry A003024 in The On-Line Encyclopedia of Integer Sequences. https://oeis.org/A003024
—. 2024b, Number of essential graphs with n nodes (in 1-1 correspondence with Markov equivalence classes of acyclic digraphs). https://oeis.org/A007984
Ostriker, J. P., Spitzer, Lyman, J., & Chevalier, R. A. 1972, ApJ, 176, L51
Pang, X., Yu, Z., Tang, S.-Y., et al. 2021, ApJ, 923, 20
Pasquato, M. 2024, in EAS2024, European Astronomical Society Annual Meeting, 362
Pasquato, M., Jin, Z., Lemos, P., Davis, B. L., & Macciò, A. V. 2023, arXiv e-prints, arXiv:2311.15160
Pasquato, M., & Matsiuk, N. 2019, Research Notes of the American Astronomical Society, 3, 179
Pearl, J. 2009, Causality (Cambridge university press)
Pearl, J., Glymour, M., & Jewell, N. P. 2016, Causal inference in statistics: a primer (Chichester, West Sussex: Wiley)
Peca, A., Cappelluti, N., Urry, C. M., et al. 2023, ApJ, 943, 162
Pillepich, A., Nelson, D., Hernquist, L., et al. 2018, MNRAS, 475, 648
Planck Collaboration, Aghanim, N., Akrami, Y., et al. 2020, A&A, 641, A6
Pouliasis, E., Ruiz, A., Georgantopoulos, I., et al. 2024, A&A, 685, A97
Rodriguez-Gomez, V., Genel, S., Vogelsberger, M., et al. 2015, MNRAS, 449, 49
Rodriguez-Gomez, V., Pillepich, A., Sales, L. V., et al. 2016, MNRAS, 458, 2371
Runge, J., Bathiany, S., Bollt, E., et al. 2019, Nature communications, 10, 2553
Sachs, K., Perez, O., Pe’er, D., Lauffenburger, D. A., & Nolan, G. P. 2005, Science, 308, 523–529
Sahu, N., Graham, A. W., & Davis, B. L. 2019a, ApJ, 876, 155
—. 2019b, ApJ, 887, 10
—. 2020, ApJ, 903, 97
Sahu, N., Graham, A. W., & Hon, D. S. H. 2023, MNRAS, 518, 1352
Savorgnan, G., Graham, A. W., Marconi, A., et al. 2013, MNRAS, 434, 387
Savorgnan, G. A. D., & Graham, A. W. 2016, ApJS, 222, 10
Savorgnan, G. A. D., Graham, A. W., Marconi, A., & Sani, E. 2016, ApJ, 817, 21
Schaye, J., Dalla Vecchia, C., Booth, C. M., et al. 2010, MNRAS, 402, 1536
Schölkopf, B., Hogg, D. W., Wang, D., et al. 2016, Proceedings of the National Academy of Science, 113, 7391
Schölkopf, B., Locatello, F., Bauer, S., et al. 2021, Proceedings of the IEEE, 109, 612–634
Scott, N., Graham, A. W., & Schombert, J. 2013, ApJ, 768, 76
Scutari, M. 2017, Journal of Statistical Software, 77, 1–20
Sérsic, J. L. 1963, Boletin de la Asociacion Argentina de Astronomia La Plata Argentina, 6, 41
Shankar, F., Bernardi, M., Sheth, R. K., et al. 2016, MNRAS, 460, 3119
Silk, J., & Rees, M. J. 1998, A&A, 331, L1
Soliman, N. H., Macciò, A. V., & Blank, M. 2023, MNRAS, 525, 12
Spekkens, R. 2023, Causal Inference Lecture - 230306, Perimeter Institute. https://pirsa.org/23030069
Spirtes, P. 2001, in Proceedings of Machine Learning Research, Vol. R3, Proceedings of the Eighth International Workshop on Artificial Intelligence and Statistics, ed. T. S. Richardson & T. S. Jaakkola (PMLR), 278–285. https://proceedings.mlr.press/r3/spirtes01a.html
Spirtes, P., Glymour, C., & Scheines, R. 2001, Causation, Prediction, and Search (The MIT Press). https://doi.org/10.7551/mitpress/1754.001.0001
Springel, V., Pakmor, R., Pillepich, A., et al. 2018, MNRAS, 475, 676
Treister, E., Schawinski, K., Urry, C. M., & Simmons, B. D. 2012, ApJ, 758, L39
Van Rossum, G., & Drake, F. L. 2009, Python 3 Reference Manual (Scotts Valley, CA: CreateSpace)
Viinikka, J., Hyttinen, A., Pensar, J., & Koivisto, M. 2020, in Advances in Neural Information Processing Systems, ed. H. Larochelle, M. Ranzato, R. Hadsell, M. Balcan, & H. Lin, Vol. 33 (Curran Associates, Inc.), 6584–6594. https://proceedings.neurips.cc/paper_files/paper/2020/file/48f7d3043bc03e6c48a6f0ebc0f258a8-Paper.pdf
Virtanen, P., Gommers, R., Oliphant, T. E., et al. 2020, Nature Methods, 17, 261
Vowels, M. J., Camgoz, N. C., & Bowden, R. 2022, ACM Comput. Surv., 55
Wang, L., Dutton, A. A., Stinson, G. S., et al. 2015, MNRAS, 454, 83
Warshall, S. 1962, J. ACM, 9, 11–12
Waskom, M. L. 2021, Journal of Open Source Software, 6, 3021
Waterval, S., Macciò, A. V., Buck, T., et al. 2024, MNRAS, 533, 1463
Winkel, N., Bennert, V. N., Remigio, R. P., et al. 2024, ApJ, 978, 115
Wood, C. J., & Spekkens, R. W. 2015, New Journal of Physics, 17, 033002
Wright, E. L., Eisenhardt, P. R. M., Mainzer, A. K., et al. 2010, AJ, 140, 1868
Zhang, K., Xie, S., Ng, I., & Zheng, Y. 2024, arXiv e-prints, arXiv:2402.05052
Zheng, Y., Huang, B., Chen, W., et al. 2024, Journal of Machine Learning Research, 25, 1–8
