Wide-Field Science – Regular
Jeffrey Newman / University of Pittsburgh, PI
The vast majority of galaxies in Roman imaging will lack spectroscopic redshifts, so photometric redshifts (photo-zs) will be crucial for both cosmology and galaxy evolution studies. We propose to build on machine learning methods that can both interpolate across the limited sampling of the color-redshift relation provided by deep but small-area spectroscopic surveys and remove incorrect redshifts from spectroscopic training sets for Roman photo-zs, enabling the construction of nearly ideal spectroscopic training data from sparse and imperfect samples. The methods we will explore should improve both the performance of photo-z algorithms at predicting redshifts for individual objects and the calibration of the outputs of those algorithms, which otherwise may be a dominant systematic uncertainty in Roman cosmology studies.
First, we will apply a powerful non-linear dimension reduction technique, UMAP (Uniform Manifold Approximation and Projection), to compress galaxy SEDs into a low-dimensional continuous space, using data from fields that already have multiwavelength and spectroscopic coverage. In contrast to the Self-Organizing Maps (SOMs) often used to map observed galaxy SEDs onto a discrete 2-D rectangular grid for photo-z applications, UMAP provides a continuous, topologically flexible, and robust low-D representation of optical-IR color space, which can be trained using large photometric galaxy samples. We expect that observed SEDs should intrinsically occupy a roughly 3-D manifold, since apparent colors are determined primarily by redshift, specific star formation rate, and a degenerate combination of dust/metallicity. Supervised variants of this algorithm trained using high-quality redshifts may help to make the structure of the low-dimensional color-redshift manifold more informative.
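The dimensionality argument above can be written schematically as follows. This is only an illustrative sketch: the symbols f, g, δ, ε, and n_bands are notation assumed here for the example and do not come from the proposal itself.

```latex
\begin{align}
  \mathbf{c} &\approx \mathbf{f}\left(z,\ \mathrm{sSFR},\ \delta\right) + \boldsymbol{\epsilon},
    & \mathbf{c} &\in \mathbb{R}^{\,n_{\mathrm{bands}}-1}, \\
  \mathbf{u} &= g(\mathbf{c}),
    & \mathbf{u} &\in \mathbb{R}^{3}.
\end{align}
```

Here c is the vector of observed colors, z the redshift, δ the degenerate dust/metallicity combination, ε the photometric noise, and g the learned embedding (UMAP in this proposal). Because f depends on only three parameters, its noise-free image is at most a three-dimensional surface in color space, which is why a 3-D embedding u could plausibly retain most of the redshift information.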
We will then train a robust Gaussian process regression algorithm, which can interpolate optimally while identifying and ignoring outliers, to map from location in the low-dimensional UMAP space to redshift. Current spectroscopic and many-band photo-z samples have incorrect-redshift rates that are large enough to compromise the calibration of redshift distributions for cosmology; however, such incorrect redshifts should be easily identifiable in the lower-dimensional UMAP space, as they will be out of line with other redshifts in the same region of the color manifold. If the robustness to outliers is great enough, the numerous but less-certain low-resolution spectroscopic redshifts and many-band photo-zs could be incorporated into Roman photo-z training and characterization.
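The idea of regressing redshift on manifold coordinates while rejecting discrepant redshifts can be illustrated with a minimal sketch. The proposal does not specify its robust Gaussian process algorithm, so the code below substitutes a simple stand-in: a standard GP posterior mean with fixed, hand-picked hyperparameters, alternated with sigma clipping based on the median absolute deviation. The function names, the hyperparameter values, and the synthetic data (random 3-D coordinates standing in for UMAP output) are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential covariance between the rows of a and b."""
    sq_dist = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_mean(x_train, z_train, x_query, noise=0.05, length_scale=1.0):
    """GP posterior mean at x_query, using a constant prior mean."""
    offset = z_train.mean()
    k_tt = rbf_kernel(x_train, x_train, length_scale)
    k_tt += noise**2 * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train, length_scale)
    return offset + k_qt @ np.linalg.solve(k_tt, z_train - offset)


def robust_gp_fit(x, z, n_sigma=4.0, max_iter=10, **gp_kwargs):
    """Alternate between fitting the GP and clipping discrepant redshifts.

    Returns a boolean mask that is True for points kept as inliers.
    """
    keep = np.ones(len(z), dtype=bool)
    for _ in range(max_iter):
        resid = z - gp_mean(x[keep], z[keep], x, **gp_kwargs)
        # Robust scatter from the median absolute deviation of kept points.
        scale = 1.4826 * np.median(np.abs(resid[keep]))
        new_keep = np.abs(resid) < n_sigma * scale
        if np.array_equal(new_keep, keep):
            break
        keep = new_keep
    return keep


# Synthetic stand-in data (hypothetical): 3-D manifold coordinates and a
# smooth redshift relation, with 5% catastrophically wrong redshifts.
rng = np.random.default_rng(0)
n_obj = 400
coords = rng.uniform(0.0, 1.0, size=(n_obj, 3))
z_true = 2.5 + 0.5 * np.sin(2.0 * coords[:, 0]) + 0.3 * coords[:, 1] * coords[:, 2]
z_obs = z_true + rng.normal(0.0, 0.02, size=n_obj)

bad = rng.choice(n_obj, size=20, replace=False)
z_obs[bad] += rng.choice([-1.0, 1.0], size=20) * rng.uniform(1.0, 2.0, size=20)

keep = robust_gp_fit(coords, z_obs)
z_fit = gp_mean(coords[keep], z_obs[keep], coords)
print("flagged:", int((~keep).sum()), "of which injected:", int((~keep[bad]).sum()))
```

In this toy setup the wrong redshifts stand out because they disagree with their neighbors in the low-dimensional space, which is the property the paragraph above relies on. A production version would presumably fit the kernel hyperparameters and use a principled heavy-tailed likelihood rather than hard clipping.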
Our procedures will also address the problem that objects with spectroscopic redshifts provide only a sparse and inconsistent sampling of the relationship between the colors of galaxies and their redshifts, due to shot noise/limited sample size, selection effects in spectroscopic data sets, and sample/cosmic variance. For instance, galaxy populations that only inhabit the densest regions of the Universe may be missing entirely from training sets built from small fields at some redshifts but will be present at others; no amount of re-weighting can make up for their absence. With a mapping from UMAP coordinates to redshift in hand, however, we can construct augmented spectroscopic samples of arbitrary size that perfectly match the d …