
Title: First Light and Reionisation Epoch Simulations (Flares) X: Environmental Galaxy Bias and Survey Variance at High Redshift

URL Source: https://arxiv.org/html/2301.09510

Markdown Content: Peter A. Thomas$^{1}$, Christopher C. Lovell$^{2}$, Maxwell G. A. Maltz$^{1}$, Aswin P. Vijayan$^{3}$, Stephen M. Wilkins$^{1}$, Dimitrios Irodotou$^{4}$, William J. Roper$^{1}$, Louise Seeyave$^{1}$

$^{1}$ Astronomy Centre, University of Sussex, Falmer, Brighton BN1 9QH, UK

$^{2}$ Institute of Cosmology and Gravitation, University of Portsmouth, Burnaby Road, Portsmouth, PO1 3FX, UK

$^{3}$ Cosmic Dawn Center (DAWN), DTU-Space, Technical University of Denmark, Elektrovej 327, DK-2800 Kgs. Lyngby, Denmark

$^{4}$ Department of Physics, University of Helsinki, Gustaf Hällströmin katu 2, FI-00014, Helsinki, Finland

(Accepted XXX. Received YYY; in original form ZZZ)

Abstract

Upcoming deep galaxy surveys with JWST will probe galaxy evolution during the epoch of reionisation (EoR, $5\leq z\leq 10$) over relatively compact areas (e.g. $\sim 300$ arcmin$^{2}$ for the JADES GTO survey). It is therefore imperative that we understand the degree of survey variance, to evaluate how representative the galaxy populations in these studies will be. We use the First Light And Reionisation Epoch Simulations (Flares) to measure the galaxy bias of various tracers over an unprecedentedly large range in overdensity for a hydrodynamic simulation, and use these relations to assess the impact of bias and clustering on survey variance in the EoR. Star formation is highly biased relative to the underlying dark matter distribution, with the mean ratio of the stellar to dark matter density varying by a factor of 100 between regions of low and high matter overdensity (smoothed on a scale of $14\,h^{-1}$ cMpc). This is reflected in the galaxy distribution: the most massive galaxies are found solely in regions of high overdensity. As a consequence of the above, galaxies in the EoR are highly clustered, which can lead to large variance in survey number counts. For mean number counts $N\lesssim 100$ (1000), in a unit redshift slice of angular area 300 arcmin$^{2}$ (1.4 deg$^{2}$), the 2-sigma range in $N$ is roughly a factor of four (two). We present relations between the expected variance and survey area for different survey geometries; these relations will be of use to observers wishing to understand the impact of survey variance on their results.

keywords:

galaxies: high-redshift – galaxies: luminosity function, mass function

pubyear: 2023

1 Introduction

This paper investigates the clustering and bias of galaxies in the Epoch of Reionisation (EoR), $5\lesssim z\lesssim 10$, using the First Light and Reionisation Epoch Simulations (Lovell et al., 2021, hereafter Flares-I). This clustering can lead to variations in the number counts of upcoming galaxy surveys in the EoR (95 percentile range) of factors of around 2 to 4.

Galaxies form within dark matter haloes, which themselves form at the peaks of the density field (smoothed on the halo mass scale) and which are overdense with respect to the background (Zeldovich et al., 1982; Kaiser, 1984). In the early Universe especially, those peaks rely on contributions from a wide range of scales (e.g. Bardeen et al., 1986) and can therefore only be properly represented in a region of very large extent. The non-linear relationship between galaxies and the underlying matter distribution is known as galaxy bias, a term which is also used more generally to describe the relation between a range of different galaxy tracers and the underlying matter distribution (see a review by Desjacques et al., 2018).

Survey variance$^{1}$ describes the uncertainty in observed estimates of galaxy number densities that arises from spatial variation within different survey volumes: both clustering of dark matter and galaxy bias contribute to this effect, in addition to stochasticity in the galaxy formation process itself. The choice of survey area and geometry is closely linked to the amplitude of these fluctuations and can give rise to significant variation in the measured number counts. Any actual survey will also be subject to sample variance arising from Poisson counting statistics, and this is likely to dominate at small number counts. We do not include that in our analysis as the survey and sample variances, as described above, can simply be added in quadrature to find the total variance in the observations.

$^{1}$ We avoid the oft-used term cosmic variance, which more accurately describes the uncertainty from having a single observable universe.
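The quadrature sum mentioned above can be sketched as follows. This is a minimal illustration rather than the procedure used in the paper: it assumes that the survey variance is supplied as a fractional standard deviation of the mean count, and that the Poisson (sample) variance equals the mean count. The function name and arguments are hypothetical.

```python
import math

def total_count_std(mean_count, sigma_survey_frac):
    """Standard deviation of number counts from survey plus Poisson terms.

    Assumptions (illustrative, not from the paper): the survey variance is
    given as a fractional standard deviation of the mean count, and the
    Poisson (sample) variance equals the mean count.
    """
    if mean_count < 0 or sigma_survey_frac < 0:
        raise ValueError("inputs must be non-negative")
    survey_var = (sigma_survey_frac * mean_count) ** 2
    poisson_var = mean_count
    # Independent contributions add in quadrature, i.e. their variances sum.
    return math.sqrt(survey_var + poisson_var)

# Hypothetical numbers: a mean count of 100 with 30 per cent survey scatter.
std = total_count_std(100.0, 0.3)
```

With these made-up inputs the survey term (variance 900) dominates the Poisson term (variance 100); for much smaller mean counts the balance reverses, in line with the remark that Poisson noise is likely to dominate at small number counts.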

A combination of dark matter only (DMO) simulations and analytic models is a computationally efficient means of assessing the variance over large volumes. These tend to connect haloes to galaxies given some mass-luminosity relation, or using some abundance matching prescription (e.g. Newman & Davis, 2002; Somerville et al., 2004; Stark et al., 2007; Trenti & Stiavelli, 2008; Moster et al., 2011). Ideally, however, one would use a more astrophysical, semi-analytic model (SAM, for a comparative review see Knebe et al., 2018) for which simulations with sufficient resolution are limited in size. The most well-known and well-used of these is the Millennium Simulation (Springel et al., 2005) which has a volume of just $(500\,h^{-1}\,\mathrm{cMpc})^{3}$, for which estimates of survey variance out to $z=5$ were undertaken by Kitzbichler & White (2007). Larger volumes are available at lower resolution (see, e.g., Kim et al., 2009; Angulo et al., 2012; Maksimova et al., 2021), more suitable for use with (sub)halo abundance matching.

To more accurately model galaxies, hydrodynamic simulations are required and these have even more limited extent. The first to be widely used were Illustris and Eagle (Genel et al., 2014; Schaye et al., 2015, respectively), both of order $(70\,h^{-1}\,\mathrm{cMpc})^{3}$, followed by Simba (Davé et al., 2019) at $(100\,h^{-1}\,\mathrm{cMpc})^{3}$ and Illustris-TNG (Nelson et al., 2017; Pillepich et al., 2017) at $(200\,h^{-1}\,\mathrm{cMpc})^{3}$. Perhaps the most ambitious in this respect is the large-scale simulation Bluetides (Feng et al., 2015) which simulated $(400\,h^{-1}\,\mathrm{cMpc})^{3}$ (still less than the Millennium Simulation) but only down to $z\approx 8$.

Overcoming this limitation requires new approaches. One such approach is zoom simulations, which run hydrodynamics at high resolution in selected regions of very large, low resolution, DMO simulations (e.g. Katz & White, 1993; Bahé et al., 2017; Barnes et al., 2017, which all concentrated on massive clusters). Flares built on this approach to simulate galaxy formation in a wide range of environments (following the approach adopted in Crain et al., 2009) within a $(2.2\,h^{-1}\,\mathrm{cGpc})^{3}$ box. It resimulates 40 regions with a wide range of overdensities, allowing us both to capture the very high overdensity environments within which the first galaxies will form and to investigate in detail the dependence of galaxy formation upon environment.

In recent years a number of multiwavelength surveys have measured the abundances and properties of galaxies at high redshift (e.g. González et al., 2011; Duncan et al., 2014; Song et al., 2016; Stefanon et al., 2017; Bhatawdekar et al., 2019). Deep galaxy surveys using JWST over the coming years will measure many of these functions to much greater depth, increasing the redshift and dynamic range probed, e.g.: CEERS (Bagley et al., 2022), COSMOS-Web (Casey et al., 2022), GLASS-ERS (Treu et al., 2022), JADES (Rieke, 2020) and PRIMER (Dunlop et al., 2021). These surveys will cover areas in the range 100 to 2000 arcmin$^{2}$ and one of the purposes of this paper is to estimate the effect of survey variance on the expected number counts. This is particularly pertinent given the recent discovery of massive galaxy candidates at very high redshifts in relatively small early fields (e.g. Donnan et al., 2022; Labbe et al., 2022; Adams22; Harikane et al., 2022; Rodighiero et al., 2022; Naidu et al., 2022).

A number of studies have used analytic methods to estimate survey variance at these high redshifts (Somerville et al., 2004; Trenti & Stiavelli, 2008; Trapp & Furlanetto, 2020; Trapp et al., 2022; Einasto et al., 2023), and have shown that the normalisation and slope of measured luminosity functions can be significantly affected. However, these studies use simplified models to map galaxies onto dark matter halos, which have not been tested in this regime. Also, they are presented in a way that is hard to relate to the population of galaxies likely to be observed in deep surveys. Trenti & Stiavelli (2008) and Moster et al. (2011) provide cosmic variance calculators (CVCs) that can be used to estimate the survey variance for given field sizes and depths. They each reach similar conclusions to this paper, in that they show that the variance can be significant for small areas and number counts, but we find a larger magnitude for the effect than they do. The former provide an online CVC and we contrast our results with theirs in Section 4 below.

One purpose of this paper is to investigate the relationship between galaxies and the underlying dark matter distribution in the EoR in a much more direct way than previous studies, using a hydrodynamic method (Eagle; Schaye et al., 2015) that has been shown to reproduce the galaxy population extremely well in the current day Universe and which provides a good match to the observed luminosity functions of galaxies in the EoR (Vijayan et al., 2021). We measure the galaxy bias of various components over an unprecedented range in overdensity for a hydrodynamic simulation, and provide new estimates of the effect of survey variance on high redshift galaxy number counts.

Section 2 briefly describes Flares, the method that we use to define large-scale overdensity, and to map stars and galaxies onto the dark matter distribution. Section 3 presents results for the biasing of the smooth stellar distribution and of galaxies relative to that of the dark matter. Section 4 then explores the clustering of those galaxies in areas typical of those of deep surveys. Finally, Section 5 summarises our conclusions.

2 Method

2.1 Flares

The First Light And Reionisation Epoch Simulations (Flares; Lovell et al., 2021; Vijayan et al., 2021) are a series of 40 large zoom simulations selected at $z=4.69$. Flares uses the same hydrodynamics code, Anarchy, as the Eagle simulation, described in detail in Schaye et al. (2015) and Schaller et al. (2015). It employs the AGNdT9 parameter configuration, which leads to a closer match with observational constraints on the hot gas properties in groups and clusters (Barnes et al., 2017) than does the standard configuration, although in Flares these changes should have little effect, since the number of such massive halos is very low at $z=5$.

Flares uses an identical resolution to the fiducial Eagle simulation, with gas particle mass $m_{\mathrm{g}}=1.8\times 10^{6}\,\mathrm{M_{\odot}}$, and a softening length of $2.66\,\mathrm{ckpc}$. Resimulation regions are selected from the same $(3.2\,\mathrm{cGpc})^{3}$ dark matter-only parent simulation as that used in the C-Eagle simulations (Barnes et al., 2017). The highest redshift snapshot available for this simulation is at $z=4.69$, which was used to select spherical volumes that sample a range of overdensities. The size of the resimulation regions (radius $14\,h^{-1}$ cMpc) was chosen such that density fluctuations averaged on that scale are linear: then the distortion in the shape of the Lagrangian volume during the simulation is relatively small, and the ordering of the density fluctuations is preserved. Full details on the 40 selected regions and their overdensities are provided in Flares-I.

As shown in Flares-I, the galactic stellar mass functions from the Flares simulations agree with those from Eagle at $z=5$ to $10$ in the mass range within which they overlap, but with those from Flares extending to higher masses that are not accessible within the limited Eagle volume. As Eagle has been shown to agree well with observations of galaxies in the low-redshift universe (Schaye et al., 2015), that gives us confidence that our galaxies will also provide a reasonable match to the real galaxy population in the EoR. This is reflected in the success of Flares in matching existing observations in this regime. For example, the Flares galactic stellar mass function (Lovell et al., 2021, Figure 8) is in good agreement with observational constraints at all redshifts up to $z=8$, beyond which it is slightly lower than the observations; on the other hand, the UV luminosity function is in excellent agreement with observations at all redshifts up to $z=9$ (Vijayan et al., 2021, Figure 7), beyond which it slightly over-predicts the number counts. The complex astrophysics of star formation and feedback means that the physical nature of high redshift galaxies is quite different from those at low redshift, for example they are much more compact in size (Roper et al., 2022), which renders suspect any extrapolation from (semi-)analytic models constrained by observations at lower redshift.

2.2 Determination of overdensities

It is useful to be able to relate the density of stars, galaxies, or other observable quantities, to the overdensity of matter smoothed on a scale for which fluctuations are still linear and hence deducible from the initial density field. In Flares, we do this at a redshift of 4.69, shortly after the end of reionisation.$^{2}$

$^{2}$ That particular redshift was chosen because it was the highest-redshift snapshot available for the underlying dark matter simulation.

The parent simulation has a volume of $(3.2\,\mathrm{cGpc})^{3}$. We divide that up into $1200^{3}$ grid cells, each of side 2.67 cMpc. We use nearest grid point assignment to associate simulation particles with grid cells. We then determine the mean overdensity of those regions, smoothed using top hat filters of three radii, 10, 14 and 20 $h^{-1}$ cMpc: we will call these $\delta_{10}$, $\delta_{14}$ and $\delta_{20}$, where $1+\delta$ is the ratio of the density of matter within a smoothing sphere to the mean density of matter within the simulation.
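The gridding and smoothing steps above can be sketched on a toy periodic grid. This is an illustration under stated assumptions, not the Flares pipeline: particles are taken to have equal mass (so counts stand in for density), the smoothing radius is expressed in units of the cell size, and a cell is included when its centre lies inside the sphere. The function names are hypothetical.

```python
import math

def ngp_counts(positions, box_size, n_cells):
    """Nearest-grid-point assignment: count particles in each cubic cell."""
    cell = box_size / n_cells
    counts = {}
    for pos in positions:
        # Floor each coordinate to a cell index, wrapping periodically.
        idx = tuple(int(c // cell) % n_cells for c in pos)
        counts[idx] = counts.get(idx, 0) + 1
    return counts

def tophat_one_plus_delta(counts, n_cells, centre, radius_cells, mean_per_cell):
    """1 + delta from cells whose centres lie within a sphere around `centre`.

    Assumes 2 * radius_cells + 1 <= n_cells, so that periodic wrapping
    never counts the same cell twice.
    """
    reach = int(math.floor(radius_cells))
    total, n_in = 0, 0
    for di in range(-reach, reach + 1):
        for dj in range(-reach, reach + 1):
            for dk in range(-reach, reach + 1):
                if di * di + dj * dj + dk * dk <= radius_cells ** 2:
                    idx = ((centre[0] + di) % n_cells,
                           (centre[1] + dj) % n_cells,
                           (centre[2] + dk) % n_cells)
                    total += counts.get(idx, 0)
                    n_in += 1
    return total / (n_in * mean_per_cell)

# Toy example: one particle per cell of a 4^3 grid, plus 7 extra in one cell.
n = 4
positions = [(i + 0.5, j + 0.5, k + 0.5)
             for i in range(n) for j in range(n) for k in range(n)]
positions += [(1.5, 1.5, 1.5)] * 7
counts = ngp_counts(positions, box_size=4.0, n_cells=n)
mean_per_cell = len(positions) / n ** 3
one_plus_delta = tophat_one_plus_delta(counts, n, (1, 1, 1), 1.0, mean_per_cell)
```

A production version would use array operations or a Fourier-space filter on the full $1200^{3}$ grid; the explicit loops here are only meant to make the definition of $1+\delta$ concrete.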

Figure 1: The PDF of overdensities smoothed within top-hat windows of different radii: 10, 14 and 20 $h^{-1}$ cMpc for the left, centre and right panels, respectively. Blue is the entire simulation box; orange is the regions that we resimulate.

Figure 1 shows the probability density functions (PDFs) for these three different definitions of overdensity. Blue shows the PDF for the parent simulation; orange shows the overdensities within the regions that we resimulate. We have deliberately chosen to over-sample regions of high density in order to obtain a significant population of massive galaxies.

To determine the mean (i.e. universal average) of a given quantity, we need to know how to weight the contributions from individual grid cells. We do this using the procedure described in Section 2.4 of Lovell et al. (2021). Essentially, we count the number of grid cells in bins of overdensity, both in the resimulated regions and within the parent simulation as a whole. The ratio of the latter to the former then gives us the relative weighting that needs to be applied to each resimulated grid cell.
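The weighting described above can be sketched as a ratio of two histograms. This is a simplified illustration, assuming half-open overdensity bins and a single weight per bin; the details of the procedure in Section 2.4 of Lovell et al. (2021) may differ, and the function names are hypothetical.

```python
from bisect import bisect_right

def overdensity_weights(parent_deltas, resim_deltas, bin_edges):
    """Per-bin weight: parent cells in the bin divided by resimulated cells."""
    n_bins = len(bin_edges) - 1

    def histogram(values):
        hist = [0] * n_bins
        for v in values:
            b = bisect_right(bin_edges, v) - 1  # bins are [lo, hi)
            if 0 <= b < n_bins:
                hist[b] += 1
        return hist

    parent = histogram(parent_deltas)
    resim = histogram(resim_deltas)
    # Bins that were never resimulated cannot contribute, so weight them zero.
    return [p / r if r > 0 else 0.0 for p, r in zip(parent, resim)]

def weighted_mean(values, deltas, bin_edges, weights):
    """Estimate of the universal mean of `values` over resimulated cells."""
    num = den = 0.0
    for v, d in zip(values, deltas):
        b = bisect_right(bin_edges, d) - 1
        if 0 <= b < len(weights):
            num += weights[b] * v
            den += weights[b]
    if den == 0.0:
        raise ValueError("no resimulated cells fall inside the bins")
    return num / den
```

For example, if underdense cells are eight times more common in the parent box than in the resimulated sample, each resimulated underdense cell counts eight times in the average, which offsets the deliberate over-sampling of dense regions.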

Figure 2: For a subsample of grid cell locations, the relationship between overdensity smoothed with top-hat filters of radii $10\,h^{-1}$ cMpc and $20\,h^{-1}$ cMpc.

Figure 2 shows the relationship between $\delta_{10}$ and $\delta_{20}$. While clearly there is a strong correlation between the two, there is also significant scatter. We have found that all three smoothing radii give very similar results for the quantities that we investigate in this paper and so we stick with the original choice of $\delta_{14}$ used in Flares-I below.

2.3 Mapping stars and galaxies to dark matter

Although we resimulate only a small fraction of the parent volume, we sample a wide range of environments that span the whole range of overdensities. We use this to populate the parent simulation with galaxies in order to create large mock surveys. To do this, we tabulate galaxy properties$^{3}$ within overdensity bins, and then use this as a lookup table to populate grid cells that we have not resimulated.

$^{3}$ The galaxy stellar mass function (GSMF), or the star formation rate function (SFRF).
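The lookup-table idea above can be sketched as follows. This is an illustration under simplifying assumptions, not the Flares implementation: the tabulated property is the mean number of galaxies per cell in bins of log stellar mass, parent cells are assigned the expected counts for their overdensity bin, and no scatter is drawn. All function names and the input layout are hypothetical.

```python
from bisect import bisect_right

def find_bin(edges, value):
    """Index of the half-open bin [edges[i], edges[i+1]) holding value, or None."""
    i = bisect_right(edges, value) - 1
    return i if 0 <= i < len(edges) - 1 else None

def tabulate_gsmf(cell_deltas, galaxies, delta_edges, logm_edges):
    """Mean number of galaxies per cell, per (overdensity bin, mass bin).

    cell_deltas: overdensity of each resimulated cell.
    galaxies: (cell_index, log10 stellar mass) pairs.
    """
    n_d, n_m = len(delta_edges) - 1, len(logm_edges) - 1
    n_cells = [0] * n_d
    table = [[0.0] * n_m for _ in range(n_d)]
    for delta in cell_deltas:
        d = find_bin(delta_edges, delta)
        if d is not None:
            n_cells[d] += 1
    for cell_index, logm in galaxies:
        d = find_bin(delta_edges, cell_deltas[cell_index])
        m = find_bin(logm_edges, logm)
        if d is not None and m is not None:
            table[d][m] += 1.0
    # Convert galaxy totals into a mean per cell within each overdensity bin.
    for d in range(n_d):
        if n_cells[d] > 0:
            table[d] = [count / n_cells[d] for count in table[d]]
    return table

def populate(parent_deltas, delta_edges, table):
    """Expected galaxies per mass bin for each parent cell, via table lookup."""
    n_m = len(table[0])
    out = []
    for delta in parent_deltas:
        d = find_bin(delta_edges, delta)
        out.append(list(table[d]) if d is not None else [0.0] * n_m)
    return out
```

Cells whose overdensity falls outside the tabulated range receive zero galaxies in this sketch; a mock survey would also need to sample the scatter about the mean, which is omitted here.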

We map individual particles within the simulation (dark matter, gas, stars or black holes) to the grid cell that they occupy at $z=4.69$. This mapping can then be recovered at higher redshifts using the particle IDs that are preserved during the simulation and when particles transform from gas into stars.$^{4}$

$^{4}$ The exception is merging of black hole particles, for which only the ID of the most massive progenitor is stored; hence we trace the main branch.
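The ID-based mapping above can be sketched as a dictionary lookup. This is a minimal illustration, assuming a precomputed map from particle ID to the grid cell occupied at the reference redshift and a simple tuple layout for snapshot particles; both are hypothetical and do not reflect the actual Flares data format.

```python
from collections import defaultdict

def stellar_mass_by_cell(id_to_cell, snapshot_particles):
    """Sum stellar mass per reference-redshift grid cell at another snapshot.

    id_to_cell: particle ID -> grid cell index at the reference redshift.
    snapshot_particles: (particle_id, kind, mass) tuples from another snapshot.
    """
    mass = defaultdict(float)
    for pid, kind, m in snapshot_particles:
        # A particle keeps its ID when gas turns into a star, so the same
        # lookup works whichever state it is in at this snapshot.
        if kind == "star" and pid in id_to_cell:
            mass[id_to_cell[pid]] += m
    return dict(mass)

# Toy data: particle 1 is still gas at this snapshot and is skipped;
# particle 4 has no entry in the map and is ignored.
id_to_cell = {1: (0, 0, 0), 2: (0, 0, 0), 3: (1, 0, 0)}
snapshot = [(1, "gas", 2.0), (2, "star", 1.5), (3, "star", 0.5), (4, "star", 9.0)]
by_cell = stellar_mass_by_cell(id_to_cell, snapshot)
```

Because the cell assignment is fixed at the reference redshift, the same cells contribute to each overdensity bin at every snapshot.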

We have tried using each of $\delta_{10}$, $\delta_{14}$, $\delta_{20}$, the grid cell overdensity without smoothing ($\delta_{\mathrm{grid}}$), and the velocity divergence within a grid cell, both alone and in combination. Although there is a slight reduction in residual scatter when combining two or more diagnostics, the gain is very slight, and we choose in this paper to stick to the single input of $\delta_{14}$ that was used in Flares-I.

3 Bias

Bias is a measure of how much a quantity is clustered relative to the overall mass density. Section 3.1 looks at bias in the distribution of stars and other smoothed quantities within grid cells, and Section 3.2 in that of the galactic population. Unless otherwise stated, all results shown here are for a redshift $z=4.69$.

3.1 Bias in the matter distribution

We first look at the bias in the distribution of different types of matter within grid cells, compared to that of the dark matter. We plot our results as a function of $\delta_{14}$ in order to investigate how the bias changes with overdensity.

3.1.1 Dark matter

Figure 3: The dark matter density within grid cells plotted as a function of the mean matter density at that location, smoothed with a top-hat window of radius $14\,h^{-1}$ cMpc. Only grid cells within the resimulated regions are plotted and used to calculate the mean and median within bins of $\delta_{14}$.

Figure 3 shows the measured density of dark matter within each grid cell in resimulated regions, compared to the smoothed matter density, $1+\delta_{14}$, at that location. The solid black line shows the 1-to-1 relation, i.e. $y=1+\delta_{14}$; the blue dashed and magenta dotted lines show the mean and median, respectively, averaged in bins of $\delta_{14}$. The mean passes through the point (1,1), as is to be expected, but has a slope greater than that of the 1-to-1 relation: this is because the density averaged within a sphere of radius $14\,h^{-1}$ cMpc will tend to be closer to 1 than when averaged within grid cells.

Note that the horizontal variation in the density of points simply reflects the overdensity of the regions that we have chosen to resimulate: the excess near $\delta_{14}=0$ comes from the fact that this is the peak of the overall density field; that at high values of $\delta_{14}$ arises because we have chosen to simulate a large number of regions of high overdensity. The vertical variation in the density of points does, however, show the true variation of $\rho_{\mathrm{DM}}/\bar{\rho}_{\mathrm{DM}}$ at a given overdensity.

The scatter in $\rho_{\mathrm{DM}}/\bar{\rho}_{\mathrm{DM}}$, that is the dark matter density within grid cells measured in units of the universal mean, is very large and roughly symmetrical in the log, i.e. skewed to high values in real space. This skewness is caused by the non-linear growth of density fluctuations within grid cells whose overdensity approaches or exceeds unity. This leads to a huge bias in star formation, as we will see in the next section.

3.1.2 Stellar to dark matter mass ratio

Figure 4: The ratio of stellar to dark matter density within grid cells plotted as a function of the mean matter density at that location, smoothed with a top-hat window of radius $14\,h^{-1}$ cMpc. Only grid cells within the resimulated regions are plotted and used to calculate the mean and median within bins of $\delta_{14}$. The red hexes correspond to an absence of stars, but have been given a nominal value so that they appear on the plot.

Figure 4 shows, at $z=4.69$, the ratio of the stellar mass to the dark matter mass in individual grid cells, as a function of the smoothed matter density $1+\delta_{14}$. The points coloured in red correspond to cells that have zero stars, but which have been given a nominal value so that they appear on the plot. The blue dashed and dotted lines correspond to the mean and median value, respectively, within overdensity bins. In magenta, we show the equivalent ratios for the standard Eagle 100 cMpc box, which correspond quite closely to the relation seen in Flares. However, Eagle does not extend to the higher or lower values of overdensity sampled by Flares.

One thing that is immediately apparent is that the mean stellar to dark matter mass ratio varies by a factor of 100 between the highest and lowest values of $\delta_{14}$: star formation is thus highly biased towards regions of high overdensity. Moreover, even at a fixed value of $\delta_{14}$, the scatter is enormous and the distribution is highly skewed such that the mean is 10 times the median.

Figure 5: The mean stellar to dark matter density ratio within grid cells, as a function of redshift.

The variation of the mean stellar to dark matter density ratio as a function of redshift is shown in Figure 5. Note that the value of $\delta_{14}$ used here is that measured at $z=4.69$ (using the ability to track particles over time), so that the same grid cells contribute to the $x$-axis bins at all redshifts. The relative bias as a function of $\delta_{14}$ steepens slightly over time, with the overall normalisation rising steadily with decreasing redshift.

Figure 6: The distribution of stellar mass densities within grid cells split by overdensity. The large bin on the left captures grid cells that have no stars within them. The legend shows the range of $1+\delta_{14}$ within each colour bin: the peak stellar density shifts to the right as $\delta_{14}$ increases.

We show the dispersion of values about the mean in Figure 6. As can be seen, there is a huge variation in stellar density, even within grid cells with the same value of the smoothed matter density $\delta_{14}$. For the lowest values, $\delta_{14}\lesssim -0.25$, more than half the grid cells have no stars whatsoever within them. By contrast, the highest stellar density, within a grid cell with $\delta_{14}=0.76$, is $2.3\times 10^{9}\,\mathrm{M_{\odot}}\,\mathrm{cMpc}^{-3}$, almost 4 times the universal baryon density: within that grid cell approximately 10 per cent of the baryons have been turned into stars.

3.1.3 Other properties

Figure 7: The mean density of various particle types within grid cells, as a function of the smoothed matter density. The dotted lines show the expected relations if the particles traced the smooth matter distribution.

Figure 7 contrasts the density variation of different particle types within grid cells at $z=4.69$. The dotted lines show the expected relations if the particles traced the smooth matter distribution. The bias for both dark matter and non star-forming gas is minimal. However, that of star-forming gas, stars themselves, and the mass of metals produced is significant, varying by more than an order of magnitude above and below the mean in the highest and lowest density regions, respectively. A similar effect is seen in the distribution of black hole mass.

Figure 8: The mean star formation and black hole accretion rate densities, as a function of the smoothed matter density. The dotted lines show the expected relations if the rates traced the smooth matter distribution.

Finally, Figure 8 shows the star formation and black hole accretion rate densities, which roughly track those of the stellar and black hole mass density, respectively.

3.2 Bias in galaxy properties

Figure 9: The galactic stellar mass function split by overdensity $1+\delta_{14}$ at three different redshifts. The legend shows bins of $1+\delta_{14}$. The universal mean is shown by the solid, black line.

We now look at the bias in the distribution of integrated galaxy properties, as a function of the matter overdensity.

Figure 9 shows the galactic stellar mass function (GSMF) as a function of $1+\delta_{14}$. This can be directly compared to Fig. 9 in Flares-I, which showed a similar plot but with each galaxy being associated with the whole range of overdensities within its resimulation volume, rather than that specific to its individual grid cell. The two plots are very similar except that the new one better captures the true overdensity local to each galaxy, and so has a slightly larger difference between overdensity bins.

The mean GSMF follows a similar form to that for the grid cells of mean matter density ($\delta_{14}\approx 0$) in the mass range where they overlap, but has a slightly higher normalisation due to the strong bias towards extra star formation in overdense regions. An important thing to note, however, is that, at the high mass end, only the highest overdensity regions contribute to the mass function, increasingly so at higher redshift. These regions are very rare, which gives rise to the exponential decline in the GSMF at the high mass end. High mass galaxies are strongly clustered in these high density regions, leading to a large sample variance in observational surveys: this is discussed further in Section 4.3.

Figure 10: The galactic star formation rate split by overdensity $1+\delta_{14}$ at three different redshifts. The legend shows bins of $1+\delta_{14}$. The universal mean is shown by the solid, black line.

The star formation rate function (SFRF) for galaxies is shown in Figure 10, again split by matter overdensity. It shows a similar behaviour to the GSMF, with the largest star formation rates being dominated by galaxies in the highest overdensity bins, especially at high redshift. This reflects the strong positive correlation between stellar mass and star formation at high redshift.

4 Survey variance

In this section, we investigate the clustering of galaxies on the sky and discuss the implications for survey design. This is very much a first look and we make a number of simplifying assumptions. We show that compact surveys such as those that are expected for deep fields are subject to large variance and we will return to a more detailed study of this in future work.

4.1 Populating grid cells with galaxies

We use the information that we have gathered from our high-resolution hydrodynamic simulations to populate the underlying dark-matter-only (DMO) simulation with galaxies. The mass of the DMO particles is $8.01\times10^{10}\,\mathrm{M_\odot}$, meaning that Milky Way sized halos would be barely resolvable; for the purposes of this paper we therefore use as input the average properties of the dark matter within grid cells. We have investigated a number of different ingredients: as well as the average densities on different smoothing scales, $\delta_{10}$, $\delta_{14}$ and $\delta_{20}$, described above, we have also tried the unsmoothed density within an individual grid cell, $\rho_{\mathrm{grid}}$, and the divergence of the local velocity field within each cell. We find that all are highly correlated and have similar predictive power for the determination of the bias, $\rho_{\mathrm{star}}/\rho_{\mathrm{DM}}$, within each grid cell. We have also checked that combining two or more of these inputs provides only a marginal improvement in predictive accuracy. For that reason, we stick here with the quantity that we have used both in the design of Flares and throughout most of this paper, $\delta_{14}$.

We tabulate the galactic stellar mass function (GSMF) as a function of overdensity and redshift; i.e. we determine the expected number of galaxies in each grid cell, which will be a fractional number. Appendix A investigates the variance in this GSMF and shows that it is close to Poisson, even for low galaxy number counts. As we are interested in the variation in number counts caused by large scale structure, we do not include that variance here: hence within each grid cell, we simply assign galaxies according to the corresponding mean GSMF.
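As a minimal sketch of the assignment step described above, the following looks up the mean number of galaxies above a mass cut for each grid cell from a table binned in overdensity. The function name, the bin edges and the tabulated values are hypothetical placeholders chosen for illustration; they are not taken from the simulations, and the actual tabulation may differ in detail.

```python
import numpy as np

def expected_counts(log_overdensity, bin_edges, mean_count_per_bin):
    """Look up the mean number of galaxies above a mass cut for each grid cell.

    log_overdensity    : log10(1 + delta_14), one value per grid cell
    bin_edges          : increasing edges of the overdensity bins (length n + 1)
    mean_count_per_bin : mean count per cell in each bin (length n)
    """
    log_overdensity = np.asarray(log_overdensity, dtype=float)
    mean_count_per_bin = np.asarray(mean_count_per_bin, dtype=float)
    # np.digitize returns 1..n for values inside the edges; shift to 0..n-1
    # and clip so that cells outside the tabulated range use the end bins.
    idx = np.clip(np.digitize(log_overdensity, bin_edges) - 1,
                  0, len(mean_count_per_bin) - 1)
    return mean_count_per_bin[idx]

# Hypothetical table: placeholder numbers, not values from the simulations.
edges = np.array([-0.3, -0.1, 0.1, 0.3])
table = np.array([0.001, 0.01, 0.1])

cells = np.array([-0.25, 0.0, 0.2, 0.5])
counts = expected_counts(cells, edges, table)
```

The returned values are fractional expectations, not integer draws, which matches the choice made here to omit the Poisson scatter within each cell.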

4.2 Generating maps

To generate maps, we project grid cells along one axis of the simulation. Strictly speaking, we should project the simulation box onto a cone centred on an observer at $z=0$; however, provided that the angular diameter of a grid cell varies by only a small amount within the depth of each map, the parallel projection is a good approximation and avoids having to smooth over grid cells. In this paper we are interested in only a rough estimate of the clustering of sources, so this is sufficient for our purposes. We summarise the slice properties for unit redshift intervals in Table 1.

Table 1: Properties of the redshift slices that we use: first two columns, the redshift and angular diameter of grid cells at the slice edges; third/fourth columns, the thickness of the slice in cMpc and grid cells, respectively.
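The parallel projection described in this section amounts to summing the expected counts over the slice thickness along one grid axis. The sketch below illustrates this under stated assumptions: the function name, the toy cube and its values are hypothetical, and in practice the starting cell and the thickness in grid cells would be those listed for each redshift slice in Table 1.

```python
import numpy as np

def project_slice(expected, start, thickness, axis=2):
    """Sum expected counts over `thickness` cells along `axis` (parallel projection)."""
    expected = np.asarray(expected, dtype=float)
    # Build an index that selects the slab [start, start + thickness) on `axis`.
    index = [slice(None)] * expected.ndim
    index[axis] = slice(start, start + thickness)
    return expected[tuple(index)].sum(axis=axis)

# Toy cube with placeholder values (hypothetical, for illustration only).
rng = np.random.default_rng(0)
cube = rng.random((8, 8, 20))
sky_map = project_slice(cube, start=5, thickness=10)
```

Each element of `sky_map` is then the expected number of galaxies in one projected grid cell of the slice.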

4.3 Results

In this section, we will present results for the number of galaxies that exceed a certain mass limit, in different survey areas and redshift slices. Very similar results are found for galaxies that exceed particular star formation rates, and these are shown in Appendix C.

Figure 11: A map of the expected number of galaxies in a projected redshift slice per projected grid cell. Top: galaxies with mass $M_*>10^{10}\,\mathrm{M_\odot}$ between $4.5<z\leq5.5$. Bottom: galaxies with mass $M_*>10^{9}\,\mathrm{M_\odot}$ between $9.5<z\leq10.5$.

Figure 11 shows the expected number of galaxies per projected grid cell in two redshift slices: $M_*>10^{10}\,\mathrm{M_\odot}$, $4.5<z\leq5.5$ in the top panel; and $M_*>10^{9}\,\mathrm{M_\odot}$, $9.5<z\leq10.5$ in the lower panel. These have been chosen, somewhat arbitrarily, to represent relatively abundant and sparse sources, respectively. In the upper panel, it can be seen quite clearly that there is significant clustering of the galaxies at $z\sim5$; this is also true, but less obvious, in the lower panel at $z\sim10$.

Figure 12: Histograms of the number of galaxies within a 256 grid cell ($\sim300$ arcmin$^2$) survey region, above a particular mass and in a given redshift slice, according to the geometry of the survey. Left column: $M_*>10^{10}\,\mathrm{M_\odot}$, $4.5<z\leq5.5$; right column: $M_*>10^{9}\,\mathrm{M_\odot}$, $9.5<z\leq10.5$. Upper row: 16 x 16; middle row: 64 x 4; lower row: 256 widely spaced grid cells. The dot-dashed, dashed and dotted lines show the median, one-sigma and two-sigma ranges, respectively; the box-plot shows the full extent of the data, plus the one and two-sigma ranges. In the top, right-hand panel a single point with $N=12.3$ has been omitted, for clarity.

To show what effect this might have on the variance of galaxy numbers detected in surveys, we plot in Figure 12 the galaxy counts in an area of approximately 300 arcmin$^2$, corresponding to 256 projected grid cells, for three different survey designs: the upper row is a square survey region of 16 x 16 grid cells, approximately (17 arcmin)$^2$; the middle row is a long strip of 64 x 4 grid cells, approximately 1 deg x 4 arcmin; and the lower row is 256 separate, widely-spaced and hence uncorrelated grid cells. There are 5625 separate samples in the upper and lower rows; slightly fewer in the middle row because of the shape of the region and a desire not to sample the same grid cell twice.
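One way in which such samples could be drawn from a projected map is sketched below; it is an illustration under stated assumptions and not necessarily the procedure used for Figure 12. The map is split into non-overlapping tiles for the square and strip geometries, and into regular lattices of widely spaced cells for the third. The 1200 x 1200 map size and the random placeholder values are assumptions, as are the function names; this map size is chosen because it yields 5625 square tiles, the number of samples quoted above.

```python
import numpy as np

def tile_counts(sky_map, height, width):
    """Total expected count in each non-overlapping height x width tile."""
    sky_map = np.asarray(sky_map, dtype=float)
    ny, nx = sky_map.shape[0] // height, sky_map.shape[1] // width
    # Trim cells that do not fill a whole tile, then sum within each tile.
    trimmed = sky_map[:ny * height, :nx * width]
    return trimmed.reshape(ny, height, nx, width).sum(axis=(1, 3)).ravel()

def lattice_counts(sky_map, n_side):
    """Total in each set of n_side x n_side cells on a regular, widely spaced lattice."""
    sky_map = np.asarray(sky_map, dtype=float)
    step_y, step_x = sky_map.shape[0] // n_side, sky_map.shape[1] // n_side
    trimmed = sky_map[:step_y * n_side, :step_x * n_side]
    # Cell (a*step_y + i, b*step_x + j) belongs to the sample with offset (i, j).
    return trimmed.reshape(n_side, step_y, n_side, step_x).sum(axis=(0, 2)).ravel()

rng = np.random.default_rng(1)
sky_map = rng.random((1200, 1200))    # assumed map size, placeholder values
square = tile_counts(sky_map, 16, 16)  # 16 x 16 square regions
strip = tile_counts(sky_map, 64, 4)    # 64 x 4 strips
spaced = lattice_counts(sky_map, 16)   # 256 cells spaced 75 cells apart
```

With these assumptions the strip geometry yields fewer samples than the square one, because 64 does not divide the map side exactly and no grid cell is used twice.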

From the bottom row, we can see that we have sampled a sufficient number of independent regions that the expected number of galaxies in each mock survey lies close to the mean. For the square survey regions, however, there is a large variation in the expected number of detected galaxies, from a factor of 8 (top-left) for the more abundant sources at $z\sim5$ to a factor of 60 (top-right) for the rarer sources at $z\sim10$; the 2-sigma ranges for these are factors of 3.3 and 5.2, respectively. The long, thin surveys shown in the middle row reduce this variance a little but still show considerable spread about the mean value.
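The spread statistics quoted above can be computed from a set of mock survey counts as in the following sketch. It assumes that the one- and two-sigma ranges are the 15.9 to 84.1 and 2.3 to 97.7 percentile ranges, and that the quoted "factor" is the ratio of the upper to the lower bound; the latter is our reading of the text rather than a stated definition. The log-normal sample is a placeholder, not simulation output.

```python
import numpy as np

def sigma_ranges(counts):
    """Median plus the 1- and 2-sigma percentile ranges of a set of mock counts."""
    lo2, lo1, median, hi1, hi2 = np.percentile(counts, [2.3, 15.9, 50.0, 84.1, 97.7])
    return {"median": median,
            "one_sigma": (lo1, hi1),
            "two_sigma": (lo2, hi2),
            # Ratio of upper to lower 2-sigma bound (assumed meaning of "factor").
            "two_sigma_factor": hi2 / lo2 if lo2 > 0 else float("inf")}

# Placeholder sample of mock survey counts (hypothetical, for illustration only).
rng = np.random.default_rng(2)
mock = rng.lognormal(mean=np.log(100.0), sigma=0.3, size=5625)
summary = sigma_ranges(mock)
```

For this placeholder sample the 2-sigma factor is roughly three; for sparse sources the lower bound can approach zero, in which case the ratio becomes very large or undefined.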

We compare our results with those of the Cosmic Variance Calculator (CVC; Trenti & Stiavelli, 2008; https://www.ph.unimelb.edu.au/mtrenti/cvc/CosmicVariance.html) in Appendix B. The latter show a similar trend but with reduced variance. The main reasons for this are likely to be the limited volume of the dark matter simulations which underlie their results, and the use of halo occupation models extrapolated to high redshift that may well not capture the extreme biases that we see for rare objects. The ratio of the standard deviation of the Flares predictions to that of the CVC is slowly varying with both the survey area, $A_{\mathrm{survey}}$, and the expected number of galaxies in the survey, $N_{\mathrm{gal}}$. For a square survey area it is reasonably well fit by the relation

$$\frac{\sigma}{\sigma_{\mathrm{CVC}}} = \left(\frac{A_{\mathrm{survey}}}{\mathrm{arcmin}^{2}}\right)^{0.092} N_{\mathrm{gal}}^{-0.038}. \qquad (1)$$
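For convenience, Equation (1) can be evaluated with a few lines of code. The sketch below is a direct transcription of the fit; the function name is hypothetical, and the relation should not be trusted outside the range of survey areas and number counts explored in this paper.

```python
def sigma_ratio(area_arcmin2, n_gal):
    """Ratio sigma / sigma_CVC from Equation (1) for a square survey.

    area_arcmin2 : survey area in square arcminutes
    n_gal        : expected number of galaxies in the survey
    """
    if area_arcmin2 <= 0 or n_gal <= 0:
        raise ValueError("area and expected count must be positive")
    return area_arcmin2 ** 0.092 * n_gal ** -0.038

ratio = sigma_ratio(300.0, 100.0)
```

For a 300 arcmin$^2$ square survey with 100 expected galaxies this gives a ratio of about 1.4, within the range of 1.3–1.8 quoted in the conclusions.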

Although we have presented here results for galactic stellar mass, those for star formation rate, shown in Appendix C, are similar. Moreover, we would expect the same to hold for flux-limited surveys, as we expect a strong correlation between mass/SFR and observable fluxes. That is not to say that there will not be some environmental dependence in that correlation. We will explore this in future work, where we generate mock surveys in different bands.

Figure 13: The mean and 2-sigma (2.3–97.7 percentile) range for expected galaxy counts as a function of survey area and shape. Measured values are taken for survey areas corresponding to 64, 256, 1024 and 4096 pixels and results interpolated between these points. We report results for stellar mass: the upper/lower panels are for a high/low number count; similar results are found for star formation rates.

Figure 13 shows the mean and 2-sigma spread for the number counts as a function of the survey area, for both high and low number counts. The variances are reduced as the survey area is increased. We show only our two mass selections here; results for the star formation selection are very similar. Measured values are taken for survey areas corresponding to 64, 256, 1024 and 4096 pixels and results interpolated between these points; in Appendix D we show histograms of number counts for the largest survey area of 4096 pixels, or approximately 1.4 deg$^2$.

4.4 Application to existing and proposed surveys

The implications for the interpretation of galaxy surveys at these redshifts are clear: in any survey of limited spatial extent, the variance in the number of detected galaxies is likely to be large, and one should take that into account when making any measurement of the number density of sources.

Figure 8 of Flares-I showed galactic stellar mass functions at $z=5$ from a range of observations (González et al., 2011; Duncan et al., 2014; Song et al., 2016; Stefanon et al., 2017), varying in survey area from 50 arcmin$^2$ to 1 deg$^2$. These show a variation in normalisation of a factor of 3 at low masses to 10 at high masses. Now, while some of this difference will be due to the different observational bands and analysis, a significant fraction may be due to sampling variation across different survey areas.

Cycle 1 of JWST has a number of large area survey programs. One of the many aims of these surveys is to investigate galaxies in the EoR. JADES (Rieke, 2020) imaging will cover 2 fields, each roughly square and 100 arcmin$^2$ in area. The predicted galaxy numbers per unit redshift interval (Williams et al., 2018) vary from many thousand at $z=5$ to a few hundred at $z=10$: the survey variance will therefore roughly correspond to that in Figure 12, i.e. a 2-sigma range of just over a factor of 3. Robertson et al. (2022) report first results for galaxies at $z>10$ in JADES (spectroscopically confirmed by Curtis-Lake et al., 2022), finding 4 galaxies in a survey area of 65 arcmin$^2$, for which the 2-sigma survey variance is about a factor of 6.

CEERS (Bagley et al., 2022) is undertaking imaging and spectroscopy of the EGS HST legacy field, in an area of approximately 100 (20 x 5) arcmin$^2$, and PRIMER (Dunlop et al., 2021) is providing imaging of the CANDELS/COSMOS and CANDELS/UDS fields, each of order 100 arcmin$^2$. They too will suffer survey variance similar to that shown in Figure 12. Recently CEERS published preliminary results in a survey area of 35.5 arcmin$^2$ (Finkelstein et al., 2023). They split their sample of high-redshift galaxies into 15, 9 and 2 objects at redshifts of 8.5–10, 10–12 and $z>12$, respectively. Our results suggest that the 2-sigma range in number counts caused by survey variance in the two lower redshift bins is likely to extend over a factor of around 10.

On a larger scale, the COSMOS-Web imaging survey (Casey et al., 2022) will have an area of 0.54 deg$^2$ with an estimated galaxy count of several thousand per unit redshift interval at $z=6$ and 30–70 at $z=10$. The survey area is approximately square in shape and so the 2-sigma range for galaxy counts will be around a factor of 2.5, as seen in Figure 13.

Castellano et al. (2022) report a detection of 7 galaxies at $z\approx10$ in a survey area of 37 arcmin$^2$ from JWST/NIRCam imaging data, 3–10 times higher than previously reported results. They suggest that survey variance may be contributing and call for more pencil-beam surveys to confirm their results. For the same mean number count in a similar field of view, we find a 2-sigma percentile range of between 2 and 15, thus reinforcing that need.

Harikane et al. (2023) summarise the high-redshift, $z\gtrsim9$, photometric observations of galaxy candidates to date. At $z\sim9$ the majority, 8/12, of their sample comes from CEERS with an effective survey area of just over 200 arcmin$^2$, which corresponds in our results to a 2-sigma spread of a factor of approximately 4. Other reported results come from surveys that currently have much smaller areas and which will therefore have even greater survey variances.

Adams et al. (2023a) determine the UV luminosity function to date from PEARLS and other public surveys, predominantly CEERS. The total survey area is about 110 arcmin$^2$ split over several fields, with the majority, 64 arcmin$^2$, coming from CEERS. They find slightly lower luminosity functions at high redshift than Finkelstein et al. (2023), which they attribute to a greater area and survey depth.

McLeod et al. (2023) combine results from 12 different JWST surveys with a total effective area of 260 arcmin$^2$. This large area plus large number of pointings is likely to result in the lowest survey variance of any of the high-redshift JWST studies to date. They find similar results at $z>9$ to previous studies, but with reduced error bars, and mean star formation rates that are consistent with Flares (but higher than other simulations that have restricted volumes that do not sample regions of the highest overdensity).

Finally we note that, as stressed by Adams et al. (2023b) and many others, the photometric redshifts used in many preliminary studies are highly uncertain and currently provide the biggest uncertainty in number count estimates at the upper end of the EoR, $z\gtrsim9$. However, as spectroscopic redshifts become available, survey variance is likely to dominate.

4.5 Future work: more realistic mocks

There are a number of enhancements that we intend to make to this study in order to make more accurate predictions of survey variance:

∙ The use of the mean density field in dark matter grid cells to predict star and galaxy formation rates is fairly crude and leaves a lot of residual scatter, as seen in Figure 4, that we have struggled to reduce. Future work will use a higher resolution background dark matter simulation that will resolve halos and allow a better mapping from dark matter to galaxies.
∙ By resolving halos, we will be able to project onto light-cones centred on an observer without the need for smoothing, rather than projecting parallel to the simulation grid.
∙ We will use galaxies from the high-resolution, hydrodynamic simulations to make mock images of the sky in various bands, utilising the known star formation and metal enrichment histories, and applying realistic dust absorption.
∙ We will then make mock observations of those images, reproducing the selection criteria of the different surveys.

This is a substantial undertaking that will take some time to come to fruition. In this paper we have therefore given crude estimates of the magnitude of the survey variance that we expect to see; these should still be of significant value as a qualitative estimate of the effect of survey variance for a given survey geometry.

5 Conclusions

In this paper we investigate the variation of star formation and galaxy properties with environment in the Flares simulations of galaxy formation in the early Universe. Those simulations are designed to sample the full range of overdensities, averaged on a scale of 14 $h^{-1}$ cMpc, within a (3.2 Gpc)$^3$ box. For the most part we look at properties averaged within cubical grid cells of edge 2.67 cMpc as a function of the overdensity averaged within a sphere of radius 14 $h^{-1}$ cMpc, $\delta_{14}$. We reach the following conclusions:

∙ The ratio of stellar density, $\rho_{\mathrm{star}}$, to dark matter density, $\rho_{\mathrm{DM}}$, within each grid cell is highly biased, varying by a factor of 100 between the lowest and highest values of $\delta_{14}$ (Figure 4).
∙ Moreover, even at a fixed value of $\delta_{14}$, the scatter in $\rho_{\mathrm{star}}/\rho_{\mathrm{DM}}$ is enormous and the distribution is highly skewed such that the mean is 10 times the median.
∙ This bias remains constant across all redshifts between $z=5$ and $z=10$ (Figure 5).
∙ For $\delta_{14}<0.9$ more than half the grid cells have no stars whatsoever within them; whereas in the highest overdensity cell roughly 10 per cent of the baryons have been turned into stars (Figure 6).
∙ The bias seen in the stellar distribution is replicated in star-forming gas, metals and black holes, and in the star formation and black hole accretion rates; that in non-star-forming gas is, however, much lower and similar to that of the dark matter (Figures 7 and 8).
∙ The mean galactic stellar mass function (GSMF) follows a similar form to that for the grid cells of mean matter density ($\delta_{14}\approx0$) in the mass range where they overlap, but has a slightly higher normalisation due to the strong bias towards extra star formation in overdense regions (Figure 9). Only the highest overdensity regions contribute to the high-mass end of the GSMF and these are very rare, which gives rise to the exponential decline in the GSMF.
∙ Because the highest mass galaxies are only found in the most overdense regions, we note that resimulation of such regions within large volumes, such as undertaken in Flares, is the only way to capture them in simulations.
∙ The star formation rate function (SFRF) shows a similar behaviour to the GSMF, with the largest star formation rates being dominated by galaxies in the highest overdensity bins (Figure 10).
∙ Maps of unit redshift slices show significant clustering of galaxies at all redshifts and at both high and low number densities (Figure 11).
∙ Figure 12 illustrates the effect of clustering by looking at the variation in number counts in a region consisting of 256 grid cells (approximately 300 arcmin$^2$) in different configurations. If the cells are widely-spaced then the variance is small, as would be expected. However, for a square survey area of 16 x 16 grid cells, the 2-sigma variation in number counts is more like a factor of 4 (slightly less for high number counts and higher for low number counts).
∙ Using rectangular survey volumes improves the survey variance slightly, but to reduce it significantly requires multiple survey areas over wide separations.
∙ We compare our results with those from the Cosmic Variance Calculator of Trenti & Stiavelli (2008) and find larger standard deviations by a factor of 1.3–1.8 (Figure 16). We provide a scaling formula to convert between the two (Equation 1).
∙ Very similar results hold for maps of galaxies exceeding a particular star formation rate (Figures 17 and 18).
∙ For larger survey areas, the variance is reduced, dropping to a factor of about 2 for an area of 1.4 deg$^2$ (Figures 19 and 20).

Although we have presented results for physical rather than observable properties of galaxies, we would expect similar results to hold for flux-limited surveys also, as we expect a strong correlation between stellar mass / star formation rate and observable fluxes. We will explore this in future work, where we generate mock surveys in different bands.

The implications for the interpretation of galaxy surveys at these redshifts are clear: in any flux-limited survey of limited spatial extent, the variance in the number of detected galaxies is likely to be large. It should not be surprising to find number densities from different survey areas that differ by a factor of 2–4. Multiple widely-spaced regions will need to be combined to beat down sample variance. Number densities obtained from (a large number of) background regions in targeted observations of unrelated, compact, low-redshift sources would be one way to do that. Our conclusions thus reinforce those of previous studies such as Trenti & Stiavelli (2008) and Moster et al. (2011).

Acknowledgements

We thank the Eagle team for their efforts in developing the Eagle simulation code. We also wish to acknowledge the following open source software packages used in the analysis: Scipy(Virtanen et al., 2020), Astropy(Robitaille et al., 2013), and Matplotlib(Hunter, 2007).

This work used the DiRAC@Durham facility managed by the Institute for Computational Cosmology on behalf of the STFC DiRAC HPC Facility (www.dirac.ac.uk). The equipment was funded by BEIS capital funding via STFC capital grants ST/K00042X/1, ST/P002293/1, ST/R002371/1 and ST/S002502/1, Durham University and STFC operations grant ST/R000832/1. DiRAC is part of the National e-Infrastructure. The Eagle simulations were performed using the DiRAC-2 facility at Durham, managed by the ICC, and the PRACE facility Curie based in France at TGCC, CEA, Bruyeres-le-Chatel.

CCL acknowledges support from a Dennis Sciama fellowship funded by the University of Portsmouth for the Institute of Cosmology and Gravitation. DI acknowledges support by the European Research Council via ERC Consolidator Grant KETJU (no. 818930) and the CSC – IT Center for Science, Finland. APV acknowledges support from the Carlsberg Foundation (grant no CF20-0534). The Cosmic Dawn Center (DAWN) is funded by the Danish National Research Foundation under grant No. 140.

We list here the roles and contributions of the authors according to the Contributor Roles Taxonomy (CRediT; https://credit.niso.org/). Peter Thomas: Conceptualization, Data curation, Methodology, Investigation, Formal Analysis, Visualization, Writing - original draft. Christopher C. Lovell, Aswin P. Vijayan: Data curation, Writing - review & editing. Maxwell Maltz: Methodology, Writing - review & editing. Stephen M. Wilkins: Conceptualization, Writing - review & editing. Dimitrios Irodotou, Louise Seeyave, Will Roper: Writing - review & editing.

We would like to thank the anonymous referee whose comments significantly improved the paper.

Data Availability

A portion of the data used to produce this work can be found online: flaresimulations.github.io/#data. Much of the analysis used the raw data produced by the simulation which can be made available upon request.

References

Adams et al. (2023a) Adams N.J., et al., 2023a, arXiv:2304.13721
Adams et al. (2023b) Adams N.J., et al., 2023b, MNRAS, 518, 4755
Angulo et al. (2012) Angulo R.E., Springel V., White S.D.M., Jenkins A., Baugh C.M., Frenk C.S., 2012, MNRAS, 426, 2046
Bagley et al. (2022) Bagley M.B., et al., 2022, arXiv:2211.02495
Bahé et al. (2017) Bahé Y.M., et al., 2017, MNRAS, 470, 4186
Bardeen et al. (1986) Bardeen J.M., Bond J.R., Kaiser N., Szalay A.S., 1986, ApJ, 304, 15
Barnes et al. (2017) Barnes D.J., et al., 2017, MNRAS, 471, 1088
Bhatawdekar et al. (2019) Bhatawdekar R., Conselice C.J., Margalef-Bentabol B., Duncan K., 2019, MNRAS, 486, 3805
Casey et al. (2022) Casey C.M., et al., 2022, arXiv:2211.07865
Castellano et al. (2022) Castellano M., et al., 2022, arXiv:2212.06666
Crain et al. (2009) Crain R.A., et al., 2009, MNRAS, 399, 1773
Curtis-Lake et al. (2022) Curtis-Lake E., et al., 2022, arXiv:2212.04568
Davé et al. (2019) Davé R., Anglés-Alcázar D., Narayanan D., Li Q., Rafieferantsoa M.H., Appleby S., 2019, MNRAS, 486, 2827
Desjacques et al. (2018) Desjacques V., Jeong D., Schmidt F., 2018, Phys.Rep., 733, 1
Donnan et al. (2022) Donnan C.T., et al., 2022, arXiv:2207.12356
Duncan et al. (2014) Duncan K., et al., 2014, MNRAS, 444, 2960
Dunlop et al. (2021) Dunlop J.S., et al., 2021, PRIMER: Public Release IMaging for Extragalactic Research, JWST Proposal. Cycle 1, ID. #1837
Einasto et al. (2023) Einasto J., Liivamägi L.J., Einasto M., 2023, MNRAS, 518, 2164
Feng et al. (2015) Feng Y., Di-Matteo T., Croft R.A., Bird S., Battaglia N., Wilkins S., 2015, MNRAS, 455, 2778
Finkelstein et al. (2023) Finkelstein S.L., et al., 2023, ApJ, 946, L13
Genel et al. (2014) Genel S., et al., 2014, MNRAS, 445, 175
González et al. (2011) González V., Labbé I., Bouwens R.J., Illingworth G., Franx M., Kriek M., 2011, ApJ, 735, L34
Harikane et al. (2022) Harikane Y., et al., 2022, arXiv:2208.01612
Harikane et al. (2023) Harikane Y., Nakajima K., Ouchi M., Umeda H., Isobe Y., Ono Y., Xu Y., Zhang Y., 2023, arXiv:2304.06658
Hunter (2007) Hunter J.D., 2007, Computing in Science & Engineering, 9, 90
Kaiser (1984) Kaiser N., 1984, ApJ, 284, L9
Katz & White (1993) Katz N., White S. D.M., 1993, ApJ, 412, 455
Kim et al. (2009) Kim J., Park C., Gott J.R. III, Dubinski J., 2009, ApJ, 701, 1547
Kitzbichler & White (2007) Kitzbichler M.G., White S. D.M., 2007, MNRAS, 376, 2
Knebe et al. (2018) Knebe A., et al., 2018, MNRAS, 475, 2936
Labbe et al. (2022) Labbe I., et al., 2022, arXiv:2207.12446
Lovell et al. (2021) Lovell C.C., Vijayan A.P., Thomas P.A., Wilkins S.M., Barnes D.J., Irodotou D., Roper W., 2021, MNRAS, 500, 2127
Maksimova et al. (2021) Maksimova N.A., Garrison L.H., Eisenstein D.J., Hadzhiyska B., Bose S., Satterthwaite T.P., 2021, MNRAS, 508, 4017
McLeod et al. (2023) McLeod D.J., et al., 2023, arXiv:2304.14469
Moster et al. (2011) Moster B.P., Somerville R.S., Newman J.A., Rix H.-W., 2011, ApJ, 731, 113
Naidu et al. (2022) Naidu R.P., et al., 2022, arXiv:2207.09434
Nelson et al. (2017) Nelson D., et al., 2017, MNRAS, 475, 624
Newman & Davis (2002) Newman J.A., Davis M., 2002, ApJ, 564, 567
Pillepich et al. (2017) Pillepich A., et al., 2017, MNRAS, 475, 648
Rieke (2020) Rieke M., 2020, in da Cunha E., Hodge J., Afonso J., Pentericci L., Sobral D., eds, Proceedings of the International Astronomical Union Vol. 352, Uncovering Early Galaxy Evolution in the ALMA and JWST Era. pp 337–341, doi:10.1017/S1743921319008950
Robertson et al. (2022) Robertson B.E., et al., 2022, arXiv:2212.04480
Robitaille et al. (2013) Robitaille T.P., et al., 2013, A&A, 558, A33
Rodighiero et al. (2022) Rodighiero G., Bisigello L., Iani E., Marasco A., Grazian A., Sinigaglia F., Cassata P., Gruppioni C., 2022, MNRAS, 518, L19
Roper et al. (2022) Roper W.J., Lovell C.C., Vijayan A.P., Marshall M.A., Irodotou D., Kuusisto J.K., Thomas P.A., Wilkins S.M., 2022, MNRAS, 514, 1921
Schaller et al. (2015) Schaller M., Dalla Vecchia C., Schaye J., Bower R.G., Theuns T., Crain R.A., Furlong M., McCarthy I.G., 2015, MNRAS, 454, 2277
Schaye et al. (2015) Schaye J., et al., 2015, MNRAS, 446, 521
Somerville et al. (2004) Somerville R.S., Lee K., Ferguson H.C., Gardner J.P., Moustakas L.A., Giavalisco M., 2004, ApJ, 600, L171
Song et al. (2016) Song M., et al., 2016, ApJ, 825, 5
Springel et al. (2005) Springel V., et al., 2005, Nature, 435, 629
Stark et al. (2007) Stark D.P., Loeb A., Ellis R.S., 2007, ApJ, 668, 627
Stefanon et al. (2017) Stefanon M., Bouwens R.J., Labbé I., Muzzin A., Marchesini D., Oesch P., Gonzalez V., 2017, ApJ, 843, 36
Trapp & Furlanetto (2020) Trapp A.C., Furlanetto S.R., 2020, MNRAS, 499, 2401
Trapp et al. (2022) Trapp A.C., Furlanetto S.R., Yang J., 2022, MNRAS, 510, 4844
Trenti & Stiavelli (2008) Trenti M., Stiavelli M., 2008, ApJ, 676, 767
Treu et al. (2022) Treu T., et al., 2022, ApJ, 935, 110
Vijayan et al. (2021) Vijayan A.P., Lovell C.C., Wilkins S.M., Thomas P.A., Barnes D.J., Irodotou D., Kuusisto J., Roper W.J., 2021, MNRAS, 501, 3289
Virtanen et al. (2020) Virtanen P., et al., 2020, Nature Methods, 17, 261
Williams et al. (2018) Williams C.C., et al., 2018, ApJS, 236, 33
Zeldovich et al. (1982) Zeldovich Y.B., Einasto J., Shandarin S.F., 1982, Nature, 300, 407

Appendix A Variance in mean number count predictions

In this section we investigate whether, for rare galaxies, the limited number of grid cells which are populated can lead to excess variance in number count predictions, over and above that expected from Poisson variation.

Figure 4 shows that we have a fair sample of star formation density within grid cells at all overdensities. The question here, however, is whether the same is true of the number counts of galaxies. To test this we use bootstrap resampling to estimate the variance in our results that might arise from the rarity of massive galaxies. We perform the test on a sample with $M>10^{9}\,\mathrm{M_\odot}$ at $z=10$, which has a mean number count of just 1 in a 300 arcmin$^{2}$ area.

Figure 14: The number of galaxies in each grid cell versus the overdensity of that grid cell, for $M>10^{9}\,\mathrm{M_\odot}$ at $z=10$. The number of galaxies is integral but has been given a small dispersion so that individual points are clearly visible on the plot.

Figure 14 shows the number of galaxies within a single grid cell versus the overdensity of the cell in the DM-only run. The number of galaxies is integral but has been given a small dispersion so that the points are clearly visible on the plot; likewise, the symbols for grid cells with 0 galaxies have been reduced to a single pixel. We have deliberately chosen an example where almost all grid cells have zero galaxies, so that the number counts are dominated by a few grid cells: there are 51, 5 & 2 cells out of 56,960 that have 1, 2 & 3 galaxies, respectively.

Figure 15: The histograms show the probability distribution functions of the expected number of counts per grid cell within different density bins. The solid lines and points show the Poisson distributions with the same mean number of galaxies.

We estimate the sample variance that this would introduce by performing a bootstrap analysis within each density bin: we draw 10,000 samples, each of equal size to the original and drawn with replacement; the resulting distributions of galaxy counts are shown in Figure 15. The mean within each density bin is unchanged and matches the value used in the paper: it is zero for low density bins and 4, 41 and 21, respectively, for the three highest density bins.
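The bootstrap described above can be sketched as follows. This is a minimal illustration rather than the code used for the paper: the function name `bootstrap_bin_totals` and the toy per-cell counts are assumptions made up for the example, and resampling cells with replacement is implemented here as a multinomial draw over the distinct count values, which is statistically equivalent.

```python
import numpy as np

def bootstrap_bin_totals(cell_counts, n_boot=10_000, seed=0):
    """Bootstrap the total galaxy count summed over the grid cells of one density bin.

    Each bootstrap sample redraws all cells with replacement; this is done
    via a multinomial draw over the distinct per-cell count values.
    """
    rng = np.random.default_rng(seed)
    cell_counts = np.asarray(cell_counts)
    values, freq = np.unique(cell_counts, return_counts=True)
    n_cells = cell_counts.size
    # draws[i, k] = number of resampled cells holding values[k] galaxies
    draws = rng.multinomial(n_cells, freq / n_cells, size=n_boot)
    return draws @ values

# Hypothetical density bin (NOT taken from the paper): 1000 cells,
# of which 30, 8 and 2 contain 1, 2 and 3 galaxies respectively.
toy_counts = np.repeat([0, 1, 2, 3], [960, 30, 8, 2])

totals = bootstrap_bin_totals(toy_counts)
measured_std = totals.std()               # bootstrap dispersion of the summed count
poisson_std = np.sqrt(toy_counts.sum())   # Poisson expectation, sqrt(mean total)
print(f"bootstrap std = {measured_std:.2f}, Poisson std = {poisson_std:.2f}")
```

In this toy bin the multiply-occupied cells make the bootstrap dispersion somewhat wider than the Poisson one, which is the kind of excess the test is designed to detect.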

The solid lines and data points show the equivalent Poisson distributions. As can be seen, the data for the density bins $1.3<1+\delta_{14}<1.5$ and $1+\delta_{14}>1.7$ almost exactly match the Poisson distribution. That for the bin $1.5<1+\delta_{14}<1.7$ is a little wider: this presumably reflects the influence of the two grid cells with 3 galaxies within them. However, even there the difference is not overly large, with a measured standard deviation of 7.8 galaxies (summed over all grid cells in the density bin) compared to the Poisson value of 6.4.
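As a quick check of the Poisson value quoted above: a Poisson distribution has variance equal to its mean. Assuming the three highest density bins are listed in order of increasing overdensity, so that the mean summed count of 41 belongs to the bin $1.5<1+\delta_{14}<1.7$, we have

$$\sigma_{\rm Poisson}=\sqrt{\langle N\rangle}=\sqrt{41}\approx 6.4, \qquad \frac{\sigma_{\rm measured}}{\sigma_{\rm Poisson}}\approx\frac{7.8}{6.4}\approx 1.2,$$

i.e. the bootstrap dispersion in this bin exceeds the Poisson expectation by roughly 20 per cent.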

When we move to much larger galaxy numbers, the measured and Poisson predictions become indistinguishable.

We conclude that the finite sampling is not significantly affecting our results.

Appendix B Comparison to cosmic variance calculator

This section compares our results to those of the Cosmic Variance Calculator (CVC) of Trenti & Stiavelli (2008). The CVC uses analytic estimates via the two-point correlation function in extended Press-Schechter theory, as well as synthetic catalogues extracted from N-body cosmological simulations of structure formation. However, the largest box is just $160\,h^{-1}$ cMpc on a side and thus lacks the volume required to sample extreme density fluctuations, especially at high redshift. Moreover, they use a simple Halo Occupation Distribution model to populate the halos with galaxies, rather than galaxy formation simulations as are used in Flares. We might therefore expect a larger variance in this current work than in the CVC.

Figure 16: The distribution of predicted galaxy number counts in different square survey areas for this study (Flares) and the cosmic variance calculator (CVC). The left/right columns show the 2/1-sigma dispersions; the rows give examples of galaxy selections with differing mean number counts and redshifts.

Figure 16 contrasts the variance of the predictions from Flares with that from the CVC for square survey areas. At all redshifts (greater than 5) and number counts, Flares has a higher variance. The equivalent plot for rectangular (16x1 aspect ratio) survey areas, not shown here, is similar but with slightly reduced spread. The ratio of the standard deviation of the Flares predictions to that of the CVC varies from about 1.3 (large number counts) to 1.8 (low number counts).

The CVC returns only a value for the standard deviation of the distribution, although we note that Figure 1 of Trenti & Stiavelli (2008) shows that the distribution is skewed. In Flares this skewness is apparent both in the number count histograms of Figure 12 and of Figures 18, 19 & 20 below, and also in Figure 16, especially in the bottom row. We note that this skewness is less apparent in the middle row of the latter figure, which suggests that it is the low number counts rather than high redshift which is the main driver.

Appendix C Survey variance in SFR

Here we repeat the results of Section 4.3 but for galaxies above a particular star formation rate rather than stellar mass. The results presented in Figure 17 and Figure 18 are qualitatively very similar to those seen in Figure 11 and Figure 12, respectively.

Figure 17: The number of galaxies with star formation rate SFR $>100\,\mathrm{M_\odot\,yr^{-1}}$ in the redshift slice $4.5<z\leq 5.5$ (upper panel), or SFR $>10\,\mathrm{M_\odot\,yr^{-1}}$ in the redshift slice $9.5<z\leq 10.5$ (lower panel), per projected grid cell.

Figure 18: Histograms of the number of galaxies within a 256 grid cell ($\sim 300$ arcmin$^{2}$) survey region, above a particular star formation rate and in a given redshift slice, according to the geometry of the survey: left column – SFR $>100\,\mathrm{M_\odot\,yr^{-1}}$, $4.5<z\leq 5.5$; right column – SFR $>10\,\mathrm{M_\odot\,yr^{-1}}$, $9.5<z\leq 10.5$; upper row – 16 x 16; middle row – 256 x 1; lower row – 256 widely spaced grid cells. The dot-dashed, dashed and dotted lines show the median, one-sigma and two-sigma ranges, respectively; the box-plot shows the full extent of the data, plus the one and two-sigma ranges. In the top, right-hand panel a single point with $N=19.2$ has been omitted, for clarity.

Appendix D Variance for larger area surveys

Figure 19: Histograms of the number of galaxies within a 4096 grid cell ($\sim 1.4$ deg$^{2}$) survey region, above a particular mass and in a given redshift slice, according to the geometry of the survey: left column – $M_{*}>10^{10}\,\mathrm{M_\odot}$, $4.5<z\leq 5.5$; right column – $M_{*}>10^{9}\,\mathrm{M_\odot}$, $9.5<z\leq 10.5$; upper row – 64 x 64; middle row – 1024 x 4; lower row – 4096 widely spaced grid cells. The dot-dashed, dashed and dotted lines show the median, one-sigma and two-sigma ranges, respectively; the box-plot shows the full extent of the data, plus the one and two-sigma ranges.

Figure 20: Histograms of the number of galaxies within a 4096 grid cell ($\sim 1.4$ deg$^{2}$) survey region, above a particular star formation rate and in a given redshift slice, according to the geometry of the survey: left column – SFR $>100\,\mathrm{M_\odot\,yr^{-1}}$, $4.5<z\leq 5.5$; right column – SFR $>10\,\mathrm{M_\odot\,yr^{-1}}$, $9.5<z\leq 10.5$; upper row – 64 x 64; middle row – 1024 x 4; lower row – 4096 widely spaced grid cells. The dot-dashed, dashed and dotted lines show the median, one-sigma and two-sigma ranges, respectively; the box-plot shows the full extent of the data, plus the one and two-sigma ranges.

Figures 19 and 20 show histograms of the expected number of galaxies exceeding a particular mass or star formation rate threshold, respectively, for a survey region consisting of 4096 pixels, approximately 1.4 deg$^{2}$.
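To illustrate how the survey geometries in these figures can be compared, the sketch below sums a projected grid of counts over square windows, thin strips and scattered cells. It is a toy under stated assumptions, not the Flares pipeline: the helper names `window_sums` and `sparse_sums`, the 256 x 256 mock grid with block-wise clustering, and the use of randomly chosen cells to stand in for "widely spaced" cells are all invented for the example.

```python
import numpy as np

def window_sums(grid, shape):
    """Total counts in non-overlapping windows of `shape` = (rows, cols) tiling the grid."""
    h, w = shape
    ny, nx = (grid.shape[0] // h) * h, (grid.shape[1] // w) * w
    tiles = grid[:ny, :nx].reshape(ny // h, h, nx // w, w)
    return tiles.sum(axis=(1, 3)).ravel()

def sparse_sums(grid, n_cells, n_surveys, rng):
    """Total counts in surveys built from n_cells randomly scattered cells (assumed proxy)."""
    flat = grid.ravel()
    return np.array([
        flat[rng.choice(flat.size, size=n_cells, replace=False)].sum()
        for _ in range(n_surveys)
    ])

rng = np.random.default_rng(1)

# Mock clustered field: the mean count per cell varies between 32 x 32 blocks
# (hypothetical numbers), with Poisson sampling within each cell.
coarse = rng.gamma(shape=2.0, scale=0.05, size=(8, 8))
grid = rng.poisson(np.kron(coarse, np.ones((32, 32))))

square = window_sums(grid, (16, 16))     # 256 surveys of 16 x 16 cells
strip = window_sums(grid, (1, 256))      # 256 surveys of 256 x 1 cells
sparse = sparse_sums(grid, 256, 256, rng)

for name, n in [("16 x 16", square), ("256 x 1", strip), ("scattered", sparse)]:
    print(f"{name:>9}: mean = {n.mean():.1f}, std = {n.std():.1f}")
```

All three geometries cover the same number of cells and so share the same expected count; only the dispersion differs, because compact windows sample fewer independent regions of the clustered field.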
