Research

This page is not regularly updated; see here for an up-to-date list of first-author papers.

Cosmology from the Galaxy Power Spectrum and Bispectrum: Estimators and Results

To date, the majority of survey analyses are based around the galaxy power spectrum in redshift space, thus its precise measurement is of great importance. Most current power spectrum estimators are based on the “FKP” form, which is an approximation to the true maximum-likelihood estimator, only truly valid on small scales. In addition, it is biased, measuring only the power spectrum convolved with the survey window function. In Philcox 2020b, we consider instead estimating the power spectrum using the quadratic estimators first popularized in the late 1990s. Whilst these require somewhat increased computational resources compared to the FKP scheme, we show they can be robustly implemented, and give accurate measurements of the galaxy power spectrum, which are unbiased by the survey window function and free from the leading-order effects of shot-noise and discretization. We give both the maximum-likelihood (ML) solution (assuming a Gaussian Universe) as well as a simpler FKP-based alternative, and show that, whilst the ML approach offers little increase in constraining power for a BOSS-like survey, this may change for denser surveys with anisotropic windows, or analyses focussed on parameters affecting the power spectrum only on large scales.

Using a similar prescription, we can construct maximum-likelihood estimators for the galaxy bispectrum, as described in Philcox 20201. These are cubic in the data, and again do not include the survey window function. Again, they can be implemented using ML weights or a simple-to-use FKP approximation, and computed with reasonable resources. In particular, the computation-time per simulation is similar to the standard (windowed) estimators, with around an additional 100 CPU-hours required to construct a window-deconvolution matrix. The output bispectra can be straightforwardly compared to data; given the computational difficulties of windowing a bispectrum theory model, this is of considerable use for data analyses.

The above estimators have been used to measure the power spectrum and bispectrum of true survey data, in particular that of BOSS DR12 in Philcox & Ivanov 2022. This analysis has two novel features: (1) spectra are measured from scratch using the window-free estimators, allowing simple comparison to theory, (2) the bispectrum is consistently included for the first time, without scale cuts. Using a precise theoretical model developed in Ivanov et al. 2021, the data are used to constrain parameters of the LCDM model; in particular, the Hubble constant, the matter density, the spectral slope, and the fluctuation amplitude. Adding in the bispectra improves the latter parameter by 13% (equivalent to an extra year of DESI observations), as well as giving strong constraints on the galaxy bias parameters. These data (available on GitHub) also allow tests of non-LCDM physics, such as primordial non-Gaussianity, as discussed in Cabass et al. 2022.

Measuring the Four-Point Correlation Function of BOSS Galaxies

Gravitational evolution induces non-Gaussianity in the observed distribution of galaxies. One outcome of this is that higher-order correlation functions, including the three- and four-point correlation functions, become large and measurable. In Philcox et al. 2021c, we perform the first measurement of the four-point correlation function (4PCF) of the BOSS CMASS galaxy sample, containing almost a million galaxies. This is made possible via the efficient encore algorithm (Philcox et al. 2021a), with a new modification to subtract off the Gaussian contribution at the estimator level. In total, the 4PCF can be computed in a few tens of CPU hours. To quantify the significance of a 4PCF detection, we use a compressed Gaussian analysis. This involves a linear projection (greatly reducing the data dimensionality, without introducing bias), and a mock-based chi-squared test. An important ingredient is the theoretical covariance matrix developed in Hou et al. 2021; whilst this cannot be used directly to form the likelihood (since it does not fully account for the survey geometry), it facilitates efficient data projection. In full, we find a detection significance of around 8 sigma for the non-Gaussian 4PCF induced by gravity, or 5 sigma, if a more modest scale cut is applied.

Efficient Estimation of N-Point Correlation Functions

N-point correlation functions play a major role in the analysis of random fields. A particularly interesting example is the cosmological density fluctuation field; if the Universe obeys Gaussian statistics, only the 2PCF contains cosmological information; at late-times, this is far from true. To obtain the strongest constraints on cosmological parameters, we must combine 2PCF measurements with the higher-point statistics. This comes with a multitude of challenges, particularly with regards to NPCF computation, since a naïve estimator scales as the number of galaxies to the power N, and is thus infeasible to use for N>3. In Philcox et al. 2021a, we present a new algorithm capable of computing the isotropic NPCFs with quadratic complexity. This works by first decomposing the NPCF into a separable basis of isotropic functions built from spherical harmonics, and is an extension of the 3PCF approach considered in Slepian & Eisenstein 2015. Our algorithm is implemented in the encore code, and allows computation of the 4PCFs (5PCFs) of realistic surveys in a few tens (hundreds) of hours, including corrections for the non-uniform survey geometry. Practically, the runtime is usually found to scale linearly with the number of galaxies, if N>3. Our approach can be straightforwardly applied to current and future datasets to unlock the potential of constraining cosmology from the higher-point functions.

The above approach can be extended far beyond cosmology. In Philcox & Slepian 2021b, we show that NPCFs can be computed with quadratic complexity, regardless of the number of spatial dimensions or the (constant) spatial curvature. Furthermore, this extends to anisotropic NPCFs, as well as those possessing any partial rotational symmetries. To define the NPCF estimator in arbitrary numbers of dimensions, we first form an angular basis for the space using hyperspherical harmonics, which are combined together using techniques borrowed from quantum chemistry. The resulting estimator is separable (just as for the 3D Euclidean estimator needed for the galaxy NPCFs), and may be efficiently computed. This will facilitate measurement of the higher-order NPCFs in a range of fields, for example biophysics and fluid dynamics. The above algorithms are implemented in the NPCFs.jl Julia code.

Solving Transcendental Equations with Contour Integration

Transcendental equations are ubiquitous in the physical sciences. Two classic examples are the spherical collapse of a homogeneous sphere of gas under gravity and the evolution of elliptical orbits, via Kepler’s law. In both cases, the underlying equations cannot be directly solved via conventional methods, thus they are often implemented numerically, via root-finding. In Slepian & Philcox 2021 and Philcox et al. 2021b, we show that exact solutions for both problems can be obtained as ratios of contour integrals, using mathematics developed to solve the ‘geometric goat problem’ in Ullisch 2020. Each can be practically evaluated using FFTs or numerical integration, and the solution is significantly faster to evaluate than if conventional methods are used. The approach can be applied to a variety of other transcendental equations, provided certain conditions are met.

Improving the Line-of-Sight for Galaxy Two-Point Correlators

Under the assumptions of homogeneity and isotropy, the galaxy two-point correlator (either the two-point correlation function (2PCF) or power spectrum) depends on only one parameter: the galaxy pair separation. Redshift space distortions introduce anisotropy, requiring a second degree of freedom: the angle between the galaxy pair and the line-of-sight (LoS). Various options exist to define the LoS, the most popular being the approximation of Yamamoto et al. 2006, whereupon on fixes the LoS to the direction vector of one of the galaxies in the pair. Whilst this results in a simple-to-implement estimator, it incurs an error at second order in the survey opening angle, which may become important for future surveys such as DESI. In Philcox & Slepian 2021a, we consider two alternative LoS choices: the galaxy midpoint and the angle bisector. Both incur errors only at fourth order in the opening angle, but their naive implementation results in a prohibitively slow algorithm. Using newly defined series expansions for spherical harmonics and Legendre polynomials, we recast the 2PCF and power spectrum estimators in a form allowing for fast computation using FFTs, and demonstrate their applicability using BOSS-like mock catalogs.

Reducing the Dimensionality of Cosmological Observables

Most cosmological likelihoods rely on covariance matrices computed from a suite of simulations. Many observables are high-dimensional, thus, to obtain accurate covariances that will give minimal bias to the cosmological parameter inference, we require a large number of mock data-sets, which can be computationally challenging to create. In Philcox et al. 2020e, we present a technique to compress arbitrary observables into a small number of bins, given a theory model, priors and a rough estimate of the covariance. This works by projecting the statistic into a subspace that minimizes the error in the log-likelihood; rather than conventional approaches, it singles out the dominant basis vectors for a specific analysis, allowing for optimal compression, and significantly reducing the number of mocks that need to be computed. Applying this to data from BOSS DR12, we can reduce the 96-bin power spectrum to 12 subspace coefficients without bias or loss of precision; indeed the analysis can be performed accurately with only ~ 100 mocks, as the effects of covariance matrix noise are drastically reduced. The method is easily extended to higher order statistics; we show that a ∼ 2000 bin bispectrum can be accurately compressed into only ∼ 10 coefficients, making the analysis of higher-order statistics much more feasible.

Measuring the Hubble Parameter from Galaxy Surveys: With and Without the Sound Horizon

Two scales are encoded in the galaxy power spectrum; the sound horizon at recombination and the physical horizon at matter-radiation equality. The first sources the Baryon Acoustic Oscillations (BAO), whose positions are a well-known standard ruler, allowing accurate measurements of the Hubble parameter, H0. Over the past decade, galaxy surveys have focussed on measuring H0 through this feature, a process assisted by reconstruction of the galaxy density field, first proposed in Eisenstein et al. 2006. This effectively ‘undoes’ the effects of long wavelength displacements, shifting galaxies closer to their Lagrangian-space positions, and sharpening the BAO wiggles. Recently, a new tool has been developed: full-shape (FS) analyses of the power spectrum, which use all the power spectrum information, not just that in the wiggles. Whilst this can produce tighter constraints on cosmology, it cannot be easily applied to reconstructed spectra. In Philcox et al. 2020a, we perform a joint analysis of pre- and post-reconstruction galaxy power spectra, extracting geometric Alcock-Paczynski parameters from the former (via a robust new method making use of a correlated theoretical error), which is used to inform an FS analysis of the latter spectrum via a joint covariance matrix. Applying this to data from BOSS, we place strong constraints on cosmology without any information from the CMB. In particular, we obtain a 1.6% constraint on H0 when using baryon calibration from Big Bang Nucleosynthesis (BBN), which tightens to 1% with a Planck prior on the spectral slope, though does not significantly tighten constraints when the Planck likelihood is added.

In Philcox et al. 2020d we instead consider the second key scale; the horizon at matter-radiation equality. As shown in Baxter & Sherwin 2020 for CMB lensing, this can also be used as a standard ruler, allowing for an H0 measurement independent of sound-horizon physics. Even though the equality peak cannot be reliably resolved in current survey data, it can still be measured from the overall shape, and, if no prior on the baryon density is assumed, this sources the bulk of the H0 information. Combining BOSS power spectra with data from the Pantheon supernova sample (which provides necessary information on the matter density), we obtain H0 constraints of 65.1+3.0-5.4 km/s/Mpc, with little variation when uncalibrated BAO are used instead of supernovae. Notably, 95% of the resulting posterior is below the best-fit value of SH0ES (Riess et al. 2019). This can be straightforwardly combined with the constraints from CMB lensing to yield 70.6+3.7-5.0 km/s/Mpc. Through various tests involving Fisher forecasts, mock data, and scale cuts we demonstrate that the constraints are not receiving significant information from the sound horizon, and, further, that future surveys should soon be able to reach constraints at the 2.5% level. This has significant implications for the Hubble tension: equality-based measurements are sensitive to higher redshifts than BAO constraints, thus may respond differently if new physics is present in the early Universe. For now, we have consistent results from the two scales; if such consistency continues to higher precision, it will place strong bounds on many post-LCDM solutions to the Hubble tension.

Understanding the Marked Power Spectrum with Perturbation Theory

Recently, it has been shown that the marked power spectrum can place far tighter constraints on cosmological parameters (particularly the neutrino mass) than the conventional power spectrum. What new information does it contain beyond conventional statistics? In Philcox et al. 2020c, we find that the mark induces a significant coupling between small-scale non-Gaussianities and large scales, leading to the additional information content. To make sense of this, we derive a model using one-loop perturbation theory, which is compared with N-body simulations. Unlike the usual power spectrum, there is no well-defined convergence radius of the theory, due to the mixing of large and small scales. The theory is further developed in Philcox et al. 2020f, discussing redshift-space distortions, biased tracers and modifications for massive neutrino cosmologies. In particular, we introduce a large-scale ansatz that significantly improves the fit to simulation data, allowing for percent-accurate models in the linear and mildly non-linear regimes, at the expense of a few additional free parameters. Before the full power of this tool can be realized, more work is needed to study whether its efficacy remains valid for realistic galaxy samples, and whether it can be captured by our models.

The Effective Halo Model: Accurate Models for the Matter Power Spectrum and Cluster Counts

The standard Halo Model is one of the staples of analytic cosmology; a phenomenological model that predicts galaxy clustering across a huge range of scales. Though widely applicable, the halo model suffers from a lack of precision, and it is known to be highly inaccurate for scales around the non-linear wavenumber. In Philcox et al. 2020b, we introduce the Effective Halo Model which attempts to tackle this problem. One of the key assumptions in the standard halo model is that halos are distributed according to the underlying linear density field; in this work, we assume them to be distributed according to the underlying non-linear density field smoothed on some scale R. The statistics of this non-linear field are evaluated using Effective Field Theory. This leads to a two-parameter model for the matter power spectrum that is shown to be percent-level accurate for wavenumbers between 0.02 and 1 h/Mpc. Furthermore, since the model is physically motivated, it can be extended to statistics beyond the power spectrum, in particular covariances of halo number counts, both alone and in combination with the matter power spectrum. We introduce a new model for both intrinsic and super-sample covariance components, both of which are validated with extensive use of the Quijote simulations. In addition, we show that there is an important contribution from halo exclusion (the fact that two halos cannot overlap) which has not previously been discussed. These are key underlying statistics for thermal Sunyaev-Zel’dovich and Weak Lensing analyses, and via this method, joint likelihoods can be easily obtained.

Measuring the Small-Scale Galaxy Power Spectrum and Bispectrum

Understanding galaxy clustering statistics, such as the power spectrum and bispectrum, can give us an array of information, both on cosmology and galaxy formation. Conventional methods to measure these use Fast Fourier Transforms, which are optimal on large-scales, but require large computation times on small scales, and suffer from effects such as aliasing and shot-noise. In Philcox & Eisenstein 2020, we introduce HIPSTER, a suite of new estimators for computing small-scale power spectra and isotropic bispectra in configuration space via weighted pair and triple counts, with no explicit use of Fourier transforms. These fully account for the survey window function, and do not include aliasing, shot-noise or discretization artifacts. Unlike conventional Fourier-transform-based approaches, our algorithm becomes more efficient on small scales (since a smaller pair-count truncation may be used), thus we may efficiently estimate spectra across k-space by coupling this method with standard techniques. The utility of the method is shown by application to BOSS DR12 simulations to compute the high-k power spectrum and its covariance, which is in good agreement with that from our theoretical model. Computing configuration- and Fourier-space statistics in the same manner allows us to consider joint analyses, which can place stronger bounds on cosmological parameters.

In Philcox 2020a, we further develop these techniques, focusing on their usage for the analysis of periodic N-body simulations. In particular, it is shown that the configuration space isotropic bispectrum estimator can be written as a pair count, making use of spherical harmonic theorems, meaning its computational efficiency is the same as for the power spectrum. Furthermore, uniform survey geometry leads to a number of simplifications; in particular, we obviate the need for large random catalogs for the power spectrum algorithm and most parts of the bispectrum algorithm, which significantly decreases the run-time. The estimators are rigorously tested with a suite of N-body simulations, and their covariances discussed. By combining conventional FFT-based methods with HIPSTER, we can compute the power spectrum and bispectrum efficiently on all scales, paving the way for broad and fast analyses.

Computing Galaxy Correlation Function Covariances

In order to extract information from large cosmological surveys, we require robust statistics, as well as accurate estimates of their covariance matrices. In practice, this is difficult, since covariance matrices have non-trivial dependencies on the survey window function, which can be highly anisotropic, thus covariances are normally computed from a large suite of simulations, which is highly computationally expensive. In Philcox et al. 2019, we introduce RascalC, a C++ code for estimating the covariance matrices of large scale two-point correlation function (2PCF) covariances in arbitrary survey geometries. This runs around 10,000 times faster than previous codes, computing finely binned covariances in a fraction of the time required with mocks. As in previous works (O’Connell & Eisenstein 2018, O’Connell et al. 2015), we include the effects of non-Gaussianity via a small rescaling of shot-noise in the theoretical model, which has been shown to work well in practice. In addition, the resulting free parameter can be computed from the data itself, via jackknifing.

For most modern analyses of galaxy correlation functions, information is included from statistics beyond the isotropic 2PCF monopole. In particular, angular information informs us about redshift-space-distortions, and gravitational non-linearities (and inflation) source a non-zero three-point correlation function. In Philcox & Eisenstein 2019, we extend our covariance matrix estimators to include Legendre multipoles of the 2PCF and the 3PCF, requiring numerical integration over an 18-dimensional space, made possible through the fast RascalC algorithm and importance sampling. This code has been publicly released, and is shown to be accurate, reducing covariance computation time from CPU-decades to a matter of hours. RascalC was used to compute fast and accurate covariance matrices for the recent eBOSS analysis (Wang et al. 2020).

Inferring Galactic Parameters from Stellar Chemical Abundances

To understand galactic physics and create realistic hydrodynamical simulations, we require accurate knowledge of underlying physical parameters in the Galaxy, such as the slope of the initial mass function (IMF) and the frequency of Type Ia supernovae. An intriguing way in which to constrain these parameters is by looking at the chemical abundances of stars; these trace the composition of the interstellar medium, and are intrinsically related to the abundance of stars of different types. In the Chempy project (Rybizki et al. 2017), a simple galactic chemical evolution (GCE) model was constructed which allowed for parameter estimates to be performed in a Bayesian framework, primarily using the set of proto-stellar abundances. In Philcox & Rybizki 2019, this was extended to include multiple stars, incorporating modern statistical methods such as Hamiltonian Monte Carlo and neural networks. Using simulated data from IllustrisTNG, we demonstrate the utility of this method for obtaining parameter constraints, which grow tighter as more stars are included, and are competitive with other methods, including star counts.

A key hyperparameter in these analyses is the set of nucleosynthetic yields which determine the elemental output of stellar processes such as supernovae and asymptotic giant branch stars, and, in Philcox & Rybizki 2019, this was shown to have a non-negligible bias on the galactic parameter estimates. To distinguish between the variety of yield tables available, we present a scoring system in Philcox et al. 2018a (using Bayesian evidence ratios and cross-validation), based on the ability of a yield table to reproduce proto-solar abundances when used inside our GCE model. An important application of this is to allow simulators to pick yield tables that accurately represent galactic chemistry chemistry. For this purpose, we also show that Chempy can be used to dramatically improve elemental abundance predictions of hydrodynamical simulations by plugging tailored best-fit parameters into a Milky Way simulation.

Detecting and Removing Dust from CMB B-modes

One of the key science goals for any upcoming Cosmic Microwave Background (CMB) survey is the detection of gravitational wave (GW) signals from the very early universe, a key signature of cosmological inflation. Whilst the most promising place to look for these is in the polarized CMB B-mode sky, a major difficulty is that Galactic foregrounds such as dust can easily mimic the inflationary signals. Though foregrounds are typically separated from the CMB via their frequency dependence, this can lead to imperfect subtraction, and has previously led to false ‘discoveries’ of inflationary GWs. In Philcox et al. 2018b, we investigate the extent to which the Galactic dust B-modes’ statistical anisotropy can be used to distinguish them from inflationary B-modes, using accurate dust and CMB simulations. We find that such methods are able to distinguish between dust and and gravitational waves close to the resolution level of upcoming experiments. Furthermore, we demonstrate how anisotropy statistics can be used to construct an estimate of the dust B-mode map, which could potentially be used to clean the B-mode sky.

Oliver Philcox