Catalog Queries#

There are two ways to access astronomical data catalogs that are provided as table data with a VO API.

First, there is a Simple Cone Search (SCS) protocol used to search a given table with a given position and radius, getting back a table of results. The interface requires only a position and search radius.

For more complicated searches, the Table Access Protocol (TAP) protocol is a powerful tool to search any VO table. Here, we expand on its usage and that of the Astronomical Data Query Language (ADQL) that it uses.

Catalog Queries

# suppress some specific warnings that are not important
import warnings
warnings.filterwarnings("ignore", module="astropy.io.votable.*")
warnings.filterwarnings("ignore", module="pyvo.utils.xml.*")

import io
import numpy as np

# Astropy imports
import astropy.units as u
import astropy.constants as const
from astropy.coordinates import SkyCoord
from astropy.cosmology import Planck15
from astropy.io import votable as apvot
import scipy.integrate

## Generic VO access routines
import pyvo as vo

1. Simple cone search#

Starting with a single simple source first:

coord = SkyCoord.from_name("m51")
print(coord)

<SkyCoord (ICRS): (ra, dec) in deg
    (202.469575, 47.1952583)>

Below, we go through the exercise of how we can figure out the most relevant table. But for now, let’s assume that we know that we want the CFA redshift catalog refered to as ‘zcat’. VO services are listed in a central Registry that can be searched through a web interface or using PyVO’s regsearch. We use the registry to find the corresponding cone service and then submit our cone search.

Registry services are of the following type:

simple cone search: “scs”
table access protocol: “tap” or “table”
simple image search: “sia” or “sia2” for SIAv2 services
simple spectral access: “ssa”

There are a number of things in the registry related to ‘zcat’ so we find the specific one that we want, which is the CFA version:

services = vo.regsearch(servicetype='scs', keywords=['zcat'])
services.to_table()['ivoid', 'short_name', 'res_title']

Table length=6

ivoid	short_name	res_title
object	object	object
ivo://cds.vizier/j/mnras/339/652	J/MNRAS/339/652	The FLASH Redshift Survey
ivo://cds.vizier/vii/259	VII/259	6dF galaxy survey final redshift release
ivo://nasa.heasarc/sixdfgs	SIXDFGS	6dFGS Galaxy Survey Final Redshift Release Catalog
ivo://nasa.heasarc/uzc	UZC	Updated Zwicky Catalog
ivo://nasa.heasarc/zcat	CFAZ	CfA Redshift Catalog (June 1995 Version)
ivo://wfau.roe.ac.uk/twompz-dsa		2MASS Photometric Redshift catalogue (2MPZ)

Supposing that we want the table with the short_name CFAZ, and we want to retrieve the data for all sources within an arcminute of our specified location:

## Use the one that's CFAZ.
##  Use list comprehension to check each service's short_name attribute and use the first.
cfaz_cone_service = [s for s in services if 'CFAZ' in s.short_name][0]

## We are searching for sources within 10 arcminutes of M51.
results = cfaz_cone_service.search(pos=coord, radius=10*u.arcmin)
results.to_table()

Table length=2

__row	name	ra	dec	bmag	radial_velocity	radial_velocity_error	redshift	class	Search_Offset
		deg	deg		km / s	km / s
object	object	float64	float64	float32	int32	int16	float64	int16	float64
52427	N5194	202.46823	47.19815	9.03	474	23	--	6200	0.1819
52452	N5195	202.49491	47.26792	10.94	558	23	--	6200	4.4802

The SCS is quite straightforward and returns all of the columns of the given table (which can be anything) for the sources in the region queried.

2. Table Access Protocol queries#

A TAP query is the most powerful way to search a catalog. A Simple Cone Search only allows you to ask for a position and radius, but TAP allows you to do much more, since the available tables contain much more information.

2.1 TAP services#

Many services list a single TAP service in the Registry that can access many catalogs, boosting your efficiency, and letting you add constraints based on any column. This is the power of the TAP!

Suppose for our example, we want to select bright galaxy candidates but don’t know the coordinates. Therefore, we start from figuring out the best table to query.

As before, we use the vo.regsearch() for a servicetype ‘tap’. There are a lot of TAP services in the registry, but they are listed slightly differently than cone services. The metadata on each catalog is usually published in the registry with its cone service, and then the full TAP service is listed as an “auxiliary” service. So to find a TAP service for a given catalog, we need to add the option includeaux=True. Alternatively, you can start with a single TAP service and then ask it specifically which tables it serves, but for this use case, that is less efficient.

We’ll first do a registry search for all auxiliary TAP services related to the “CfA” and “redshift”.

tap_services = vo.regsearch(servicetype='tap', keywords=['cfa redshift'], includeaux=True)
tap_services.to_table()['ivoid', 'short_name', 'res_title']

Table length=20

ivoid	short_name	res_title
object	object	object
ivo://cds.vizier/j/a+a/408/905	J/A+A/408/905	Very Luminous Galaxies
ivo://cds.vizier/j/a+a/506/1477	J/A+A/506/1477	Radio source distribution
ivo://cds.vizier/j/a+a/679/a95	J/A+A/679/A95	Type Ia infrared data
ivo://cds.vizier/j/a+as/139/525	J/A+AS/139/525	Abell 496 heliocentric velocities
ivo://cds.vizier/j/aj/109/1458	J/AJ/109/1458	Neighborhoods of 36 loose groups of galaxies
ivo://cds.vizier/j/aj/109/2368	J/AJ/109/2368	Redshifts in First CfA Slice. III.
ivo://cds.vizier/j/aj/112/1803	J/AJ/112/1803	Large Scale Structure
ivo://cds.vizier/j/aj/147/99	J/AJ/147/99	Spectroscopy of 73 stripped core-collapse supernovae
ivo://cds.vizier/j/apj/655/790	J/ApJ/655/790	Groups of galaxies in 2MASS survey
ivo://cds.vizier/j/apj/885/85	J/ApJ/885/85	Automated DASH classification for supernovae
ivo://cds.vizier/j/apjs/119/277	J/ApJS/119/277	-3.5<=DE<=8.5, 11.5<RA<13.5 redshift survey
ivo://cds.vizier/j/apjs/121/287	J/ApJS/121/287	The CfA2S catalog
ivo://cds.vizier/j/apjs/213/19	J/ApJS/213/19	Optical and near-IR light curves of 64 SNe
ivo://cds.vizier/j/pasp/111/438	J/PASP/111/438	Updated Zwicky catalog (UZC)
ivo://cds.vizier/vii/193	VII/193	The CfA Redshift Catalogue, Version June 1995
ivo://cds.vizier/vii/84a	VII/84A	Groups of Galaxies. III. The CfA Survey
ivo://nasa.heasarc/cfa2s	CfARed.S.	CfA Redshift Survey: South Galactic Cap Data
ivo://nasa.heasarc/rasscals	RASS-CALS	ROSAT All-Sky Survey CALS Galaxy Groups Catalog
ivo://nasa.heasarc/uzc	UZC	Updated Zwicky Catalog
ivo://nasa.heasarc/zcat	CFAZ	CfA Redshift Catalog (June 1995 Version)

There are many tables that mention these keywords. Pick some likely looking ones and look at the descriptions:

for t in tap_services:
    if "cfa" in t.res_title.lower() and "magnitude" in t.res_description.lower():
        print(f"{t.ivoid}:  {t.res_description}\n")

ivo://cds.vizier/j/aj/109/2368:  We have measured redshifts for 241 galaxies selected from the Zwicky catalog with magnitudes 15.6 or 15.7 in the region (26.5deg<=delta<=32.5deg, 8h<=alpha=<17h). This region is the original CfA "Slice of the Universe", which was complete to 15.5; now nearly all Zwicky-catalog galaxies in the region have measured redshifts. These somewhat fainter galaxies outline the same structures as the brighter ones. By examinating a sample of 27 galaxies within the most prominent void we find (1) the fraction of absorption-line galaxies is typical of regions outside the cores of rich clusters; and (2) the local morphology-density relation holds within the globally underdense void.

ivo://cds.vizier/j/apjs/121/287:  We present redshifts for a sample of 4391 galaxies with magnitude m_B(0)_<=15.5 covering 20h<=RA<=4h in right ascension and -2.5{deg}<=DE<=90{deg} in declination. This sample is complete for all galaxies in the merge of the Zwicky et al. and Nilson catalogs in the south Galactic cap. Redshifts for 2964 of these were measured as part of the second CfA Redshift Survey. The data reveal large voids in the foreground and background of the Perseus-Pisces Supercluster. The largest of these voids lies at a mean velocity ~8000km/s, has diameter ~5000km/s, and is enclosed by a complex of dense structures. The large structure known as the Perseus-Pisces Supercluster forms the near side of this complex. On the far side of this large void, at a mean velocity ~12000km/s, there is another coherent dense wall. The structures in this survey support the view that galaxies generally lie on surfaces surrounding or nearly surrounding low-density regions or voids.

ivo://cds.vizier/vii/193:  The catalog (files zcat.dat and zbig.dat) incorporates much of the latest velocity data from the Whipple Observatory and other sources as well as velocities from earlier compilations such as the Second Reference Catalogue of de Vaucouleurs, de Vaucouleurs, and Corwin; the Index of Galaxy Spectra of Gisler and Friel, and the Catalogue of Radial Velocities of Galaxies of Palumbo, Tanzella-Nitti, and Vettolani. The data included here are BT magnitudes and a few UGC numbers with increased "accuracy" in the velocity source information. The following separate tables contain selected objects from the CfA Redshift Survey: - the first CfA Redsdhift Survey (Huchra et al., 1983ApJS...52...89H); this catalogue contains 2395 galaxies, a number slightly smaller than the published number of 2399, since there have been small corrections to the magnitudes for multiple galaxies. This version has had new morphological types added by me as well as POSS diameters measured for those galaxies that were not in the UGC or did not have UGC diameters. - the first slice of the second CfA redshift survey (Huchra et al., 1990ApJS...72..433H) - the second slice of the CfA redshift survey (Huchra et al., 1995ApJS...99..391H) See the complete description in 'zcom.tex' (in plain TeX), or zcom.txt (plain ascii)

ivo://cds.vizier/vii/84a:  The catalog contains statistically homogeneous groups of galaxies based on the Harvard-Smithsonian Center for Astrophysics (CfA) redshift survey. Groups in the catalog are all density enhancements in redshift space of a factor greater than 20. All groups contain at least three members. There are 176 groups in the catalog and 102 groups have been identified in one or more previous studies. The catalog includes group numbers, numbers of members, equatorial coordinates, mean flow-corrected galactocentric velocities, line-of-sight-velocity dispersion, integrated group magnitudes, crossing times for the groups, and mean harmonic radii of the groups.

ivo://nasa.heasarc/cfa2s:  The Center for Astrophysics (CfA) Redshift Survey South Galactic Cap (CFA2S) Catalog contains redshifts for a sample of about 4300 galaxies with blue (Zwicky B(0) type) magnitude &lt;= 15.5 covering the range from 20 h to 4h in right ascension and from -2.5 deg to 90 deg in declination. This sample is complete for all galaxies in the merge of the Zwicky et al. and Nilson catalogs in the south Galactic cap. Redshifts for 2964 of these were measured as part of the second CfA Redshift Survey. The data reveal large voids in the foreground and background of the Perseus-Pisces Supercluster. The largest of these voids lies at a mean velocity ~ 8000km/s, has diameter of ~ 5000km/s, and is enclosed by a complex of dense structures. The large structure known as the Perseus-Pisces Supercluster forms the near side of this complex. On the far side of this large void, at a mean velocity of ~ 12000km/s, there is another coherent dense wall. The structures in this survey support the view that galaxies generally lie on surfaces surrounding or nearly surrounding low-density regions or voids. This table was created by the HEASARC in March 2005 based on CDS table J/ApJS/121/287/cfa2s.dat.gz This is a service provided by NASA HEASARC .

ivo://nasa.heasarc/zcat:  The ZCAT database contains the CfA Redshift Catalog, which incorporates much of the latest velocity data from the Whipple Observatory and other sources, as well as velocities from earlier compilations such as the &quot;Second Reference Catalog&quot; of de Vaucouleurs, de Vaucouleurs, and Corwin; the &quot;Index of Galaxy Spectra&quot; of Gisler and Friel; and the &quot;Catalog of Radial Velocities of Galaxies&quot; of Palumbo, Tanzella-Nitti, and Vettolani. It includes BT magnitudes, some UGC numbers, and increased &quot;accuracy&quot; in the velocity source information. The data presented here have primarily been assembled for the purpose of studying the large scale structure of the universe, and, as such, are nearly complete in redshift information, but are not necessarily complete in such categories as diameter, magnitude, and cross-references to other catalogues. The original HEASARC version was constructed based on an earlier version of the catalog and was released on November 15, 1996. The HEASARC created the current version of ZCAT in February 2001 based on CDS/ADC Catalog VII/193, &quot;The CfA Redshift Catalogue&quot;, Version June 1995, tables &lt;a href="https://cdsarc.cds.unistra.fr/ftp/cats/VII/193/zcat.dat"&gt;https://cdsarc.cds.unistra.fr/ftp/cats/VII/193/zcat.dat&lt;/a&gt; and zbig.dat. The former table contains the main body of the CFA Redshift Catalog (57536 objects) and entries from it are distinguishable in the current database by having either listed radial velocity values but not redshifts or neither, while the latter table contains 1202 high-redshift galaxies (distinguishable in the current database by their having listed redshift values but not radial velocities). This is a service provided by NASA HEASARC .

From the above information, you can choose the table you want and then use the specified TAP service to query it as described below.

But first we’ll look at the other way of finding tables to query with TAP: by starting with the TAP services listed individually in the Registry. We see above that the HEASARC has the ZCAT, so what else does it have?

You can find out which tables a TAP serves and then look at the tables descriptions. The last line here sends a query directly to the service to ask it for a list of tables. (This can take a minute, since there may be a lot of tables.)

#  Here, we're looking for a specific service, and we don't need the includeaux option:
tap_services = vo.regsearch(servicetype='tap',keywords=['heasarc'])
heasarc = tap_services[0]
heasarc_tables=heasarc.service.tables

Then let’s look for tables matching the terms we’re interested in as above.

for tablename in heasarc_tables.keys():
    description = heasarc_tables[tablename].description
    if description and "redshift" in description.lower():
        heasarc_tables[tablename].describe()
        print("Columns={}".format(sorted([k.name for k in heasarc_tables[tablename].columns ])))
        print("----")

abellzcat
    Abell Clusters Measured Redshifts Catalog

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'abell_distance_class', 'abell_radius', 'abell_richness_class', 'bautz_morgan_class', 'bautz_morgan_flag', 'bii', 'class', 'dec', 'lii', 'log_redshift_ratio', 'mag_10', 'name', 'ra', 'redshift', 'redshift_flag', 'ref_redshift']
----
atlasspecz
    AT Large Area Survey (ATLAS) Spectroscopic Classes & Redshifts C

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'abs_rmag', 'bii', 'bmag', 'broad_type', 'dec', 'flux_1p4_ghz', 'lii', 'log_lum_1p4_ghz', 'name', 'ra', 'redshift', 'rmag', 'source_id']
----
ccosmphotz
    Chandra COSMOS (C-COSMOS) Survey Photometric Redshift Catalog

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'best_fit_template', 'bii', 'dec', 'ilbert_source_number', 'lii', 'morph_class', 'name', 'phot_redshift', 'phot_redshift_max', 'phot_redshift_min', 'ra', 'redshift_pdf', 'source_number', 'variability']
----
cfa2s
    CfA Redshift Survey: South Galactic Cap Data

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'bar_type', 'bii', 'bmag', 'class', 'dec', 'lii', 'luminosity_class', 'major_axis', 'minor_axis', 'morph_type', 'name', 'notes', 'peculiar_type', 'ra', 'radial_velocity', 'radial_velocity_error', 'ref_bmag', 'ref_radial_velocity', 'ugc_multiplicity', 'ugc_number']
----
iraspscz
    IRAS Point Source Catalog Redshift (PSCz) Catalog

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'alt_name', 'bii', 'class', 'dec', 'flux_60um', 'hex_hsd_flag', 'lii', 'lune_bin_100_um_bck', 'lune_bin_bmag_extinction', 'lune_bin_number', 'name', 'number_hcon_coverages', 'optical_class_code', 'ra', 'radial_velocity', 'radial_velocity_error', 'ratio_100_to_60_um_flux', 'redshift_status', 'ref_radial_velocity', 'source_sample', 'source_type']
----
iraszsurv
    IRAS 1.2-Jy Redshift Survey

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'bii', 'cirrus_flag', 'class', 'comments', 'dec', 'extend_flag', 'helio_velocity', 'helio_velocity_error', 'lii', 'micron_flux_100', 'micron_flux_100_flag', 'micron_flux_12', 'micron_flux_12_flag', 'micron_flux_25', 'micron_flux_25_flag', 'micron_flux_60', 'micron_flux_60_flag', 'name', 'obj_id_flag', 'opt_mag', 'psc_id', 'ra', 'source', 'spectrum_flag', 't_type', 'ugc_or_eso_number', 'var_flag']
----
lcrscat
    Las Campanas Redshift Survey Catalog

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'bii', 'broad_type', 'central_mag', 'class', 'dec', 'drift_scan', 'field_name', 'gsf_product', 'isophotal_mag', 'lii', 'name', 'nearby_radial_velocity', 'phot_cat_obj_num', 'ra', 'radial_velocity', 'radial_velocity_error', 'sf_product', 'spect_fiber']
----
lowzvlqvla
    VLA 6-GHz Observations of Low-Redshift SDSS QSOs

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'abs_imag', 'bii', 'dec', 'flux_6_ghz', 'flux_6_ghz_error', 'imag', 'int_flux_6_ghz', 'int_flux_6_ghz_error', 'lii', 'log_luminosity_6_ghz', 'name', 'off_set', 'peak_over_int_ratio', 'ra', 'radio_over_iband_ratio', 'redshift']
----
planckhzsc
    Planck High-Redshift Source Candidates Catalog

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'bii', 'cc_sel_prob', 'dec', 'e_bv_aperture', 'e_bv_aperture_error', 'e_bv_mean', 'fir_lum_25k', 'fir_lum_25k_neg_err', 'fir_lum_25k_pos_err', 'fir_lum_30k', 'fir_lum_30k_neg_err', 'fir_lum_30k_pos_err', 'fir_lum_35k', 'fir_lum_35k_neg_err', 'fir_lum_35k_pos_err', 'fir_lum_40k', 'fir_lum_40k_neg_err', 'fir_lum_40k_pos_err', 'fir_lum_45k', 'fir_lum_45k_neg_err', 'fir_lum_45k_pos_err', 'fir_lum_50k', 'fir_lum_50k_neg_err', 'fir_lum_50k_pos_err', 'flux_217_ghz', 'flux_217_ghz_fit_err', 'flux_217_ghz_meas_err', 'flux_217_ghz_sky_err', 'flux_353_ghz', 'flux_353_ghz_fit_err', 'flux_353_ghz_meas_err', 'flux_353_ghz_sky_err', 'flux_545_ghz', 'flux_545_ghz_fit_err', 'flux_545_ghz_meas_err', 'flux_545_ghz_sky_err', 'flux_857_ghz', 'flux_857_ghz_fit_err', 'flux_857_ghz_meas_err', 'flux_857_ghz_sky_err', 'herschel_flag', 'lii', 'major_axis', 'major_axis_error', 'matching_planck_catalogs', 'minor_axis', 'minor_axis_error', 'name', 'phot_redshift_25', 'phot_redshift_25_chi2', 'phot_redshift_25_neg_err', 'phot_redshift_25_pos_err', 'phot_redshift_30', 'phot_redshift_30_chi2', 'phot_redshift_30_neg_err', 'phot_redshift_30_pos_err', 'phot_redshift_35', 'phot_redshift_35_chi2', 'phot_redshift_35_neg_err', 'phot_redshift_35_pos_err', 'phot_redshift_40', 'phot_redshift_40_chi2', 'phot_redshift_40_neg_err', 'phot_redshift_40_pos_err', 'phot_redshift_45', 'phot_redshift_45_chi2', 'phot_redshift_45_neg_err', 'phot_redshift_45_pos_err', 'phot_redshift_50', 'phot_redshift_50_chi2', 'phot_redshift_50_neg_err', 'phot_redshift_50_pos_err', 'position_angle', 'position_angle_error', 'ra', 'sfr_25k', 'sfr_25k_neg_err', 'sfr_25k_pos_err', 'sfr_30k', 'sfr_30k_neg_err', 'sfr_30k_pos_err', 'sfr_35k', 'sfr_35k_neg_err', 'sfr_35k_pos_err', 'sfr_40k', 'sfr_40k_neg_err', 'sfr_40k_pos_err', 'sfr_45k', 'sfr_45k_neg_err', 'sfr_45k_pos_err', 'sfr_50k', 'sfr_50k_neg_err', 'sfr_50k_pos_err', 'snr_353_ghz', 'snr_545_ghz', 'snr_857_ghz', 'snr_excess_545_ghz']
----
sixdfgs
    6dFGS Galaxy Survey Final Redshift Release Catalog

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'av', 'bii', 'bj_mag', 'dec', 'lii', 'name', 'name_redshift_field', 'num_highq_measure', 'num_total_measures', 'original_weight', 'program_id', 'ra', 'radial_velocity', 'radial_velocity_compflag', 'radial_velocity_error', 'radial_velocity_qualflag', 'ref_radial_velocity', 'rf_mag', 'specid_values', 'supercosmos_class', 'target_id', 'template_code']
----
twodfqsoz
    2dF QSO Redshift (2QZ) Survey

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'apm_x_pos', 'apm_y_pos', 'bii', 'bj_mag', 'bj_r_color', 'broad_type_1', 'broad_type_2', 'class', 'comment_spectrum_1', 'comment_spectrum_2', 'dec', 'fiber_number_1', 'fiber_number_2', 'field_spect_number_1', 'field_spect_number_2', 'flux_20_cm', 'internal_name', 'lii', 'name', 'num_obs', 'previous_redshift', 'quality_flag_1', 'quality_flag_2', 'ra', 'rass_flux', 'reddening', 'redshift_1', 'redshift_2', 'sector_name', 'snr_spectrum_1', 'snr_spectrum_2', 'source_number', 'spectrum_1_date', 'spectrum_2_date', 'u_bj_color', 'ukst_field_number']
----
twomassrsc
    2MASS Redshift Survey (2MRS) Catalog

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'alt_name', 'axis_ratio', 'bii', 'dec', 'hmag_0', 'hmag_0_error', 'hmag_0_tot', 'hmag_0_tot_error', 'jmag_0', 'jmag_0_error', 'jmag_0_tot', 'jmag_0_tot_error', 'ks_mag_0', 'ks_mag_0_error', 'ks_mag_0_tot', 'ks_mag_0_tot_error', 'lii', 'log_k20_semimajor_axis', 'log_tot_semimajor_axis', 'morph_type', 'name', 'ra', 'radial_velocity', 'radial_velocity_error', 'radial_velocity_source', 'reddening', 'ref_morph_type', 'ref_radial_velocity', 'xsc_ppc_flags']
----

xdeep2
    DEEP2 Galaxy Redshift Survey Fields Chandra Point Source Catalog

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'bayesian_galaxy_prob', 'bii', 'bmag', 'csc_name', 'dec', 'error_radius', 'fb_counts_50pc_eef', 'fb_counts_50pc_eef_error', 'fb_counts_50pc_eef_limit', 'fb_counts_90pc_eef', 'fb_counts_90pc_eef_error', 'fb_counts_90pc_eef_limit', 'fb_flux', 'fb_flux_error', 'fb_flux_limit', 'field_number', 'field_subfield_id', 'flux_ratio', 'flux_ratio_lower', 'flux_ratio_upper', 'hardness_ratio', 'hardness_ratio_lower', 'hardness_ratio_upper', 'hb_counts_50pc_eef', 'hb_counts_50pc_eef_error', 'hb_counts_50pc_eef_limit', 'hb_counts_90pc_eef', 'hb_counts_90pc_eef_error', 'hb_counts_90pc_eef_limit', 'hb_flux', 'hb_flux_error', 'hb_flux_limit', 'imag', 'lii', 'name', 'off_axis', 'off_set', 'opt_dec', 'opt_ra', 'opt_source_number', 'ra', 'radius_50pc_eef', 'radius_90pc_eef', 'redshift', 'rmag', 'sb_counts_50pc_eef', 'sb_counts_50pc_eef_error', 'sb_counts_50pc_eef_limit', 'sb_counts_90pc_eef', 'sb_counts_90pc_eef_error', 'sb_counts_90pc_eef_limit', 'sb_flux', 'sb_flux_error', 'sb_flux_limit']
----
xmmcfrscat
    XMM-Newton/Canada-France Redshift Survey Fields X-Ray Sources

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'bii', 'counts', 'dec', 'flux', 'flux_error', 'hb_flux', 'hb_flux_error', 'lii', 'mos1_hr', 'mos1_hr_error', 'mos2_hr', 'mos2_hr_error', 'name', 'notes', 'pn_hr', 'pn_hr_error', 'ra', 'sb_flux', 'sb_flux_error']
----
xmmcfrsoid
    XMM-Newton/Canada-France Redshift Survey Fields Optical Identifi

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'bii', 'bmag', 'bmag_error', 'cfdf_number', 'dec', 'imag', 'imag_error', 'itot_mag', 'kmag', 'kmag_error', 'lii', 'name', 'notes', 'phot_redshift_1', 'phot_redshift_1_flag', 'phot_redshift_2', 'phot_redshift_2_flag', 'phot_redshift_3', 'phot_redshift_3_flag', 'ra', 'redshift', 'true_probability', 'umag', 'umag_error', 'vmag', 'vmag_error', 'xo_offset']
----
xmmcphotz
    XMM-Newton COSMOS (XMM-COSMOS) Survey Photometric Redshift Catal

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', 'best_fit_template', 'bii', 'dec', 'ilbert_source_number', 'lii', 'morph_class', 'name', 'phot_redshift', 'phot_redshift_max', 'phot_redshift_min', 'ra', 'redshift_pdf', 'source_number', 'variability']
----
zcat
    CfA Redshift Catalog (June 1995 Version)

Columns=['"__row"', '"__x_ra_dec"', '"__y_ra_dec"', '"__z_ra_dec"', '"distance"', 'bar_type', 'bii', 'bmag', 'bt_mag', 'class', 'comments', 'dec', 'diameter_1', 'diameter_2', 'lii', 'luminosity_class', 'morph_type', 'name', 'notes', 'ra', 'radial_velocity', 'radial_velocity_error', 'redshift', 'ref_bmag', 'ref_radial_velocity', 'ref_redshift', 'rfn_number', 'structure', 'ugc_or_eso']
----

There are a number of tables that appear to be useful table for our goal, including the ZCAT, which contains columns with the information that we need to select a sample of the brightest nearby spiral galaxy candidates.

Now that we know all the possible column information in the zcat catalog, we can do more than query on position (as in a cone search) but also on any other column (e.g., redshift, bmag, morph_type). The query has to be expressed in a language called ADQL.

2.2 Expressing queries in ADQL#

The basics of ADQL:

SELECT * FROM my.interesting.catalog as cat…

says you want all (“*”) columns from the catalog called “my.interesting.catalog”, which you will refer to in the rest of the query by the more compact name of “cat”.

Instead of returning all columns, you can

SELECT cat.RA, cat.DEC, cat.bmag from catalog as cat…

to only return the columns you’re interested in. To use multiple catalogs, your query could start, e.g.,

SELECT c1.RA,c1.DEC,c2.BMAG FROM catalog1 as c1 natural join catalog2 as c2…

says that you want to query two catalogs zipped together the “natural” way, i.e., by looking for a common column.

To select only some rows of the catalog based on the value in a column, you can add:

WHERE cat.bmag < 14

says that you want to retrieve only those entries in the catalog whose bmag column has a value less than 14.

You can also append

ORDER by cat.bmag

to return the result sorted ascending by one of the columns, adding DESC to the end for descending.

A few special functions in the ADQL allow you to query regions:

WHERE contains( point(‘ICRS’, cat.ra, cat.dec), circle(‘ICRS’, 210.5, -6.5, 0.5))=1

is how you would ask for any catalog entries whose RA,DEC lie within a circular region defined by RA,DEC 210.5,-6.5 and a radius of 0.5 (all in degrees). The ‘ICRS’ specifies the coordinate system.

See the ADQL documentation for more.

2.3 A use case#

Here is a simple ADQL query where we print out the relevant columns for the bright (Bmag <14) sources found within 1 degree of M51 (we will discuss how to define the table and column names below):

##  Inside the format call, the {} are replaced by the given variables in order.
##  So this asks for
##  rows of public.zcat where that row's ra and dec (cat.ra and cat.dec from the catalog)
##  are within radius 1deg of the given RA and DEC we got above for M51
##  (coord.ra.deg and coord.dec.deg from our variables defined above), and where
##  the bmag column is less than 14.
query = """SELECT ra, dec, Radial_Velocity, radial_velocity_error, bmag, morph_type FROM public.zcat as cat where
    contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',{},{},1.0))=1 and
    cat.bmag < 14
    order by cat.radial_velocity_error
    """.format(coord.ra.deg, coord.dec.deg)

results=heasarc.service.run_async(query)
#results = heasarc.search(query)
results.to_table()

Table length=3

ra	dec	radial_velocity	radial_velocity_error	bmag	morph_type
deg	deg	km / s	km / s
float64	float64	int32	int16	float32	int16
202.46823	47.19815	474	23	9.03	4
202.49491	47.26792	558	23	10.94	0
202.54745	46.66996	2569	25	13.30	-5

See the information on the zcat for column information. (We will use the ‘radial_velocity’ column rather than the ‘redshift’ column.) We note that spiral galaxies have morph_type between 1 - 9.

Therefore, we can generalize the query above to complete our exercise and select the brightest (bmag < 14), nearby (radial velocity < 3000), spiral ( morph_type = 1 - 9) galaxies as follows:

query = """SELECT ra, dec, Radial_Velocity, radial_velocity_error, bmag, morph_type FROM public.zcat as cat where
    cat.bmag < 14 and cat.morph_type between 1 and 9 and cat.Radial_Velocity < 3000
    order by cat.Radial_velocity
    """.format(coord.ra.deg, coord.dec.deg)

results = heasarc.service.run_async(query)
#results = heasarc.search(query)
results.to_table()

Table length=1120

ra	dec	radial_velocity	radial_velocity_error	bmag	morph_type
deg	deg	km / s	km / s
float64	float64	int32	int16	float32	int16
10.68474	41.26883	-297	1	4.30	3
189.20757	13.16275	-223	18	10.58	2
186.73679	15.04782	-182	12	12.23	1
23.46218	30.66019	-180	1	6.50	5
186.49551	15.67102	-155	26	13.70	7
183.45115	14.90010	-98	15	11.00	2
183.91296	13.90134	-84	27	12.08	4
148.89929	69.06297	-40	5	7.75	2
68.21188	71.88872	-28	10	12.10	7
...	...	...	...	...	...
160.76076	-36.36216	2965	36	13.80	5
186.92024	-8.27657	2978	20	13.13	2
327.32062	-60.60958	2981	2	13.23	5
85.81306	-30.07849	2982	32	13.67	1
199.74246	-47.91271	2982	10	13.10	3
268.99556	18.34047	2986	20	13.40	2
234.99208	58.08230	2987	22	13.30	3
165.35109	57.67702	2994	20	13.70	5
149.05805	59.30739	2996	20	13.30	3

2.4 TAP examples for a given service#

Each service may also provide some example queries. If they do, then you can see them with, e.g.:

heasarc.service.examples[0]

{'REQUEST': 'doQuery',
 'LANG': 'ADQL',
 'QUERY': "SELECT * FROM rosmaster WHERE exposure > 10000 AND 1=CONTAINS(POINT('ICRS', ra, dec),CIRCLE('ICRS', 50, -85, 1))          "}

Above, these examples look like a list of dictionaries. But they are actually a list of objects that can be executed:

for example in heasarc.service.examples:
    print(example['QUERY'])
    result=example.execute()
    #  Stop at one
    break
result.to_table()

SELECT * FROM rosmaster WHERE exposure > 10000 AND 1=CONTAINS(POINT('ICRS', ra, dec),CIRCLE('ICRS', 50, -85, 1))          

Table length=2

__row	seq_id	ra	dec	lii	bii	instrument	filter	site	exposure	requested_exposure	fits_type	start_time	end_time	name	pi_lname	pi_fname	ror	index_id	subj_cat	proc_rev	title	qa_number	ao	proposal_number	roll	rday_begin	rday_end	class	__x_ra_dec	__y_ra_dec	__z_ra_dec
		deg	deg	deg	deg				s	s		d	d												deg	d	d
object	object	float64	float64	float64	float64	object	object	object	int32	int32	object	float64	float64	object	object	object	int32	object	int16	int16	object	int32	int16	int32	int16	int32	int32	int16	float64	float64	float64
1	RH202299A01	49.3200	-85.5400	299.8517	-30.6815	HRI	N	MPE	43683	70000	RDF 3_6	50324.7425347222	50377.4775925926	RE J0317-853	BURLEIGH	MATTHEW	202299	a01	2	2	RE J0317-853	24301	6	7128	162	2316	2369	2900	0.0589725712332591	0.0506886092544769	-0.996971865567923
2	RH202299N00	49.3200	-85.5400	299.8517	-30.6815	HRI	N	MPE	36146	70000	RDF 4_2	50174.4961921296	50187.9867476852	RE J0317-853	BURLEIGH	MATTHEW	202299	n00	2	2	RE J0317-853	24302	6	7128	307	2163	2177	2900	0.0589725712332591	0.0506886092544769	-0.996971865567923

3. Using the TAP to cross-correlate and combine#

3.1 Cross-correlating to combine catalogs#

TAP can also be a powerful way to collect a lot of useful information from existing catalogs in one quick step. For this exercise, we will start with a list of sources, uploaded from our own table, and do a ‘cross-correlation’ with the zcat table.

For more on creating and working with VO tables, see that notebook. Here, we just read one in that’s already prepared:

First, check that this service can handle uploaded tables. Not all do.

heasarc.service.upload_methods

[<UploadMethod ivo-id="ivo://ivoa.net/std/TAPRegExt#upload-http"/>,
 <UploadMethod ivo-id="ivo://ivoa.net/std/TAPRegExt#upload-inline"/>]

The inline method is what PyVO will use. These take a while, i.e. half a minute.

query="""
    SELECT cat.ra as ra, cat.dec as dec, Radial_Velocity, bmag, morph_type
    FROM zcat cat, tap_upload.mysources mt
    WHERE
    contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',mt.ra,mt.dec,0.01))=1
    and Radial_Velocity > 0
    ORDER by cat.ra"""
zcattable = heasarc.service.run_async(query, uploads={'mysources': 'data/my_sources.xml'})
#zcattable = heasarc.search(query, uploads={'mysources': 'data/my_sources.xml'})
mytable = zcattable.to_table()
mytable

Table length=14

ra	dec	radial_velocity	bmag	morph_type
deg	deg
float64	float64	int32	float32	int16
136.00074	21.96792	3093	13.80	20
146.70334	22.01827	7446	14.60	-1
146.70334	22.01827	7597	15.00	-2
148.77806	14.29613	7194	15.10	2
175.03940	15.32725	3325	14.40	3
191.54740	30.72338	6517	14.70	-2
191.54199	30.73227	6651	15.00	20
194.91294	28.89537	6093	15.00	0
206.57996	43.84386	25864	18.50	--
206.57164	43.85051	2229	15.00	-1
209.96260	38.18183	2759	14.90	3
213.55601	15.62222	4644	14.00	3
219.96650	42.74234	2517	14.70	20
333.82170	37.29922	5207	15.30	1

Therefore we now have the Bmag, morphological type and radial velocities for all the sources in our list with a single TAP query.

3.2 Cross-correlating with user-defined columns#

Our input list of sources contains galaxy pair candidates that may be interacting with each other. Therefore it would be interesting to know what the morphological type and the Bmagnitude are for the potential companions.

In this advanced example, we want our search to be physically motivated since the criterion for galaxy interaction depends on the physical separation of the galaxies. Unlike the previous case, the search radius is not a constant, but varies for each candidate by the distance to the source. Specifically, we want to search for companions that are within 50 kpc of the candidate and therefore first need to find the angular diameter distance that corresponds to galaxy’s distance (in our case the radial velocity).

Therefore, we begin by taking our table of objects and adding an angDdeg column:

## The column 'radial_velocity' is c*z but doesn't include the unit; it is km/s
## Get the speed of light from astropy.constants and express in km/s
c = const.c.to(u.km/u.s).value
redshifts = mytable['radial_velocity']/c
mytable['redshift'] = redshifts
physdist = 0.05*u.Mpc # 50 kpc physical distance

angDdist = Planck15.angular_diameter_distance(mytable['redshift'].data)
angDrad = np.arctan(physdist/angDdist)
mytable['angDdeg'] = angDrad.to(u.deg)
mytable

Table length=14

ra	dec	radial_velocity	bmag	morph_type	redshift	angDdeg
deg	deg					deg
float64	float64	int32	float32	int16	float64	float64
136.00074	21.96792	3093	13.80	20	0.010317137464478843	0.06354147623734327
146.70334	22.01827	7446	14.60	-1	0.024837182528454405	0.026865139047937883
146.70334	22.01827	7597	15.00	-2	0.02534086431220361	0.026347225109192074
148.77806	14.29613	7194	15.10	2	0.02399660100855506	0.027777901534139626
175.03940	15.32725	3325	14.40	3	0.011091006165338556	0.0591638655187596
191.54740	30.72338	6517	14.70	-2	0.02173837208406357	0.030579669183669427
191.54199	30.73227	6651	15.00	20	0.022185347971629094	0.02997982632048326
194.91294	28.89537	6093	15.00	0	0.020324060320423405	0.032651528395800614
206.57996	43.84386	25864	18.50	--	0.08627301758205005	0.008319382638265264
206.57164	43.85051	2229	15.00	-1	0.007435143681966809	0.08786069920287261
209.96260	38.18183	2759	14.90	3	0.009203033386517016	0.07113666025554453
213.55601	15.62222	4644	14.00	3	0.015490716581002182	0.042588105321019934
219.96650	42.74234	2517	14.70	20	0.008395808276137487	0.0778991456401862
333.82170	37.29922	5207	15.30	1	0.017368682436967778	0.03807030457673168

Now we construct and run a query that uses the new angDdeg column in every row search. Note, we also don’t want to list the original candidates since we know these are in the catalog and we want rather to find any companions. Therefore, we exclude the match if the radial velocities match exactly.

query="""SELECT mt.ra, mt.dec, cat.ra, cat.dec, cat.Radial_Velocity, cat.morph_type, cat.bmag
    FROM zcat cat, tap_upload.mytable mt
    WHERE
    contains(point('ICRS',cat.ra,cat.dec),circle('ICRS',mt.ra,mt.dec,mt.angDdeg))=1
    and cat.Radial_Velocity > 0 and cat.radial_velocity != mt.radial_velocity
    ORDER by cat.ra"""
mytable2 = heasarc.search(query, uploads={'mytable': mytable})
mytable2.to_table()

Table length=9

tap_upload_1764635367875_mytable_ra	tap_upload_1764635367875_mytable_dec	zcat_ra	zcat_dec	radial_velocity	zcat_morph_type	zcat_bmag
		deg	deg
float64	float64	float64	float64	int32	int16	float32
146.70334	22.01827	146.70334	22.01827	7597	-2	15.00
146.70334	22.01827	146.70334	22.01827	7446	-1	14.60
175.0394	15.32725	175.05522	15.34280	3299	3	13.10
191.5474	30.72338	191.54199	30.73227	6651	20	15.00
191.54199	30.73227	191.54740	30.72338	6517	-2	14.70
206.57164	43.85051	206.57996	43.84386	25864	--	18.50
206.57164	43.85051	206.59810	43.87223	2420	4	12.72
219.9665	42.74234	220.03854	42.77895	2545	7	14.00
333.8217	37.29922	333.84684	37.28261	5984	-3	15.06

Therefore, by adding new information to our original data table, we could cross-correlate with the TAP. We find that, in our candidate list, there is one true pair of galaxies.

4. Synchronous versus asynchronous queries#

There is one technical detail about TAP queries that you will need to know. In the code cells above, there are two commands for sending the query, one of which is commented out. This is because, with the TAP, there are two ways to send such queries. The default when you use the search() method is to us a synchronous query, which means that the query is sent and the client waits for the response. For large and complicated queries, this may time out, or you may want to run several in parallel. So there are other options.

The method service.run_async() uses an asynchronous query, which means that the query is sent, and then (under the hood without you needing to do anything), the method checks for a response. From your point of view, these methods look the same; PyVO is doing different things under the hood, but the method will not return until it has your result.

You need to know about these two methods for a couple of reasons. First, some services will limit synchronous queries, i.e. they will not necessarily return all the results if there are too many of them. An asynchronous query should have no such restrictions. In the case of the HEASARC service that we use above, it does not matter, but you should be aware of this and be in the habit of using the asynchronous queries for complete results after an initial interactive exploration.

The second reason to be aware of this is that asynchronous queries may be queued by the service, and they can take a lot longer if the service is very busy or the job is very large. (The synchronous option in this case may either time out, or it may return quickly but with incomplete results.)

For very large queries, or for submitting queries in parallel, you may wish to use the submit_job(), wait(), and fetch_results() methods to avoid locking up your Python session. This is described in the pyvo documentation.

Catalog Queries

Contents