Suppose you want to do something using a column that you expect to find in a bunch of different tables, like coordinates and time. It's a good bet that many if not most of the tables have coordinate columns, but there's no rule about what they have to be named.
When doing detailed catalog queries with the TAP, you can obviously examine the columns of every table you're interested in to find the columns you want. Then you can hard-code the correct ones into each query for each table and service.
Or, you can also search for keywords like "ra" or "ascension" in the columns and their descriptions to get the columns you want automatically that way.
But is there are more generic way? Unified Content Descriptors (UCDs) are a VO standard that allows table publishers to name their columns whatever they (or their contributors) want but to identify those that contain standard sorts of data. For example, the RA column could be called "RA", "ra", "Right_Ascension", etc. But in all cases, a VO service can label the column with its UCD, which is "pos.eq.ra". This information is not part of the table but part of the meta-data that the service may provide with that data. Though not required of all VO services, UCDs are commonly provided precisely to make such tasks as identifying the columns of interest easier to automate.
This is easiest to show by example.
# Generic VO access routines
import pyvo as vo
# Ignore unimportant warnings
import warnings
warnings.filterwarnings('ignore', '.*Unknown element .*', vo.utils.xml.elements.UnknownElementWarning)
Let's look at some tables in a little more detail. Let's find the Hubble Source Catalog version 3 (HSCv3), assuming there's only one at MAST.
services = vo.regsearch(servicetype='tap', keywords=['mast'])
hsc=[s for s in services if 'HSCv3' in s.res_title][0]
print(f'Title: {hsc.res_title}')
print(f'{hsc.res_description}')
Title: MAST STScI Hubble Source Catalog Version 3 (HSCv3) TAP service The MAST Archive at STScI TAP end point for the Hubble Source Catalog, version 3.1. The Hubble Source Catalog (HSC) is designed to optimize science from the Hubble Space Telescope by combining the tens of thousands of visit-based source lists in the Hubble Legacy Archive (HLA) into a single master catalog. The Hubble Source Catalog version 3.1 also provides proper motions for over 400,00 stars in the augmented Sagittarius Window Eclipsing Extrasolar Planet Search (SWEEPS) HST field, and the Hubble Catalog of Variables (HCV). The Table Access Protocol (TAP) lets you execute queries against our database tables, and inspect various metadata. Upload is not currently supported.
Now let's see what tables are provided by this service for HSCv3. Note that this is another query to the service:
tables = hsc.service.tables # Queries for details of the service's tables
print(f'{len(tables)} tables:')
for t in tables:
print(f'{t.name:30s} - {t.description}\n----') # A more succinct option than t.describe()
31 tables: dbo.Catalog_ACS_SourceExtractor - This table contains the Source Extractor information for each single filter, detected, ACS source. Source Extractor is described <a href="http://adsabs.harvard.edu/abs/1996A%26AS..117..393B"><style="font-size:medium">here </a>. ---- dbo.Catalog_Image_MetaData - This table describes the properties of images and their source lists. The source lists are contained in tables Catalog_WFPC2_SourceExtractor, Catalog_WFC3_SourceExtractor, Catalog_WFPC2_SourceExtractor. ---- dbo.Catalog_WFC3_SourceExtractor - This table contains the Source Extractor information for each single filter, detected, WFC3 source. Source Extractor is described <a href="http://adsabs.harvard.edu/abs/1996A%26AS..117..393B"><style="font-size:medium">here </a>. ---- dbo.Catalog_WFPC2_SourceExtractor - This table contains the Source Extractor information for each single filter, detected, WFPC2 source. Source Extractor is described <a href="http://adsabs.harvard.edu/abs/1996A%26AS..117..393B"><style="font-size:medium">here </a>. ---- dbo.CloseMatch - For each match, this table contains a list of other matches that lie within 1 arcsecond of its position. ---- dbo.ClosestMatch - This table contains the nearest neighbor for each match in table SumPropMagAper2Cat, provided there is a neighbor that lies with 1 arcsecond of the match. This table is a subset of table CloseMatch. ---- dbo.DetailedCatalog - This table contains the properties of each source in the Hubble Source Catalog. ---- dbo.GroupMembers - This table lists the white light images that make up each group. ---- dbo.Groups - This table summarizes properties of the image group, a collection of white light images that each intersect with another image in the group. ---- dbo.HLAScience - This table describes the Hubble Legacy Archive images that make up the white light images. ---- dbo.ImageMembers - This table describes the Hubble Legacy Archive images that make up the white light images. ---- dbo.Images - This table summarizes properties of each white light image. ---- dbo.ProductMembers - This table provides the names and IDs of the single exposures that are used in each combined the single filter product image. The single filter product images are used by the Hubble Source Catalog to produce white light images for each visit. ---- dbo.SourceLinks - This table links the Hubble Source Catalog tables to the Hubble Legacy Archive Tables (HLAScience, Catalog_Image_Metadata, ACS_Catalog_SourceExtractor, WFPC2_Catalog_SourceExtractor, and WFC3_Catalog_SourceExtractor). For given white light source (SourceID) and single filter image (ImageName, SciID, or MemberID), there is a unique single filter source (ObjID). These Hubble Legacy archive tables provide additional information that is not in the Hubble Source Catalog tables. ---- dbo.Sources - This table provides properties of each white light source. ---- dbo.SourcePositionsView - This view contains positional information about the individual white light source detections of objects. The information is used to determine the astrometric parameters given in table AstromProperMotions. ---- dbo.SumMagAper2cat - This table provides Source Extractor magaper2 information for each match based on sources with valid Source Extractor aper2 magnitudes. The companion table SumPropMagAper2Cat contains other summary information for the corresponding match ---- dbo.ProperMotionsView - This table contains proper motion information for the SWEEPS field. The proper motion is determined by a median fit to the source positions given in table AstromSourcePositions as a function of time for right ascension and declination independently and Galactic latitude and longitude independently. The fit is performed using the medfit algorithm in Numerical Recipes (Press et al. 1993). For right ascension (RA) in degrees, the fit for an object has the form RA(t) = raMean + 3.6x10^6*pmRA/cos(Dec)*(t-epochMean)/365.25, where t is the time expressed as an MJD, epochMean is the mean MJD of the object detections, raMean is the RA at the mean epoch, and pmRA is the RA proper motion in mas/yr. The first step in the medfit algorithm involves a least squares fit. The RA standard errors (1 sigma) of the least squares fit for the position at the mean epoch and the proper motion are designated by raMeanErr and pmRAErr, respectively. They are obtained using equations 34 and 35 of MathWorld. For a given object, medfit minimizes the mean absolute deviation of the RA data values from the fit. These RA directed distance value are given by dRA in table AstromSourcePositions. The mean absolute deviation of the RA data from the fit is designated by pmRADev. Similar fits are made for declination and Galactic longitude and latitude. ---- dbo.SumMagAutoCat - This table provides Source Extractor magauto information for each match based on sources with valid Source Extractor auto magnitudes. The companion table SumPropMagAutoCat contains other summary information for the corresponding match ---- dbo.HCVdetailedView - This view summarizes detailed properties of each match in the Hubble Catalog of Variables (HCV). This table provides source properties for each match, filter, and epoch of observation. ---- dbo.SumPropMagAper2Cat - This table summarizes properties of each match based on sources with valid Source Extractor aper2 magnitudes. The companion table SumMagAper2Cat contains the aper2 magnitude information for the match. ---- dbo.SumPropMagAutoCat - This table summarizes properties of each match based on sources with valid Source Extractor auto magnitudes. The companion table SumMagAutoCat contains the auto magnitude information for the match. ---- dbo.WFPC2PCFrac - This table lists for each WFPC2 white light source the fraction of exposure time that was covered by the PC chip. ---- dbo.XMatchV2 - For each match, this table contains a list of possible corresponding matches from Version 1 of the HSC. ---- dbo.SumMagAper2CatView - This table provides Source Extractor magaper2 information for each match based on sources with valid Source Extractor aper2 magnitudes. The companion table SumMagAper2CatViewView contains other summary information for the corresponding match ---- dbo.HCVsummaryView - This table summarizes source properties for a match and filter in the Hubble Catalog of Variables (HCV). ---- tap_schema.schemas - None ---- tap_schema.tables - None ---- tap_schema.columns - None ---- tap_schema.keys - None ---- tap_schema.key_columns - None ----
Let's look at the first 10 columns of the DetailedCatalog table. Again, note that calling the columns attribute sends another query to the service to ask for the columns.
columns=tables['dbo.DetailedCatalog'].columns
for c in columns:
print(f'{f"{c.name} [{c.ucd}]":30s} - {c.description}')
MatchRA [pos.eq.ra] - right ascension coordinate of the match position MatchDec [pos.eq.dec] - declination coordinate of the match position SourceRA [None] - right ascension coordinate of the source position SourceDec [None] - declination coordinate of the match position D [None] - offset distance of source from match position DSigma [None] - standard deviation of source positions from match position AbsCorr [None] - Y/N indicator of whether the astrometric correction included alignment with a standard catalog XImage [None] - x position of source in image coordinates YImage [None] - y position of source in image coordinates ImageName [None] - Hubble Legacy Archive image name Instrument [meta.id;instr] - instrument name Mode [None] - observation mode Detector [meta.id;instr.det] - detector Aperture [None] - aperture ExposureTime [None] - exposure time StartTime [time.start] - earliest start time of exposures in image StopTime [time.end] - latest stop time of exposures in image StartMJD [time.start] - modified Julian date (MJD) for earliest start time of exposures in image StopMJD [time.end] - modified Julian date (MJD) for latest start time of exposures in image WaveLength [None] - central wavelength of filter Filter [meta.id;instr.filter] - filter TargetName [meta.id;src] - target name FluxAper2 [phot.count] - aper2 flux MagAper2 [phot.mag] - aper2 magnitude MagAuto [phot.mag] - auto magnitude PropID [None] - HST proposal ID CI [None] - normalized concentration index KronRadius [None] - Kron radius Flags [None] - bit encoded representation of source properties: 0 for point source, 1 for extended source, 4 for saturated source HTMID [None] - hierarchical triangular index (HTM) identifier X [None] - x coordinate of source on unit celestial sphere Y [None] - y coordinate of source on unit celestial sphere Z [None] - z coordinate of source in unit celestial sphere CatID [meta.id] - unique row identifier MatchID [None] - match identifier MemID [None] - source number within match SourceID [None] - white light source identifier ImageID [None] - white light image identified Det [None] - Y/N indicator of whether source was detected in the specified filter
The PyVO method to get the columns will automatically fetch all the meta-data about those columns. It's up to the service provider to set them correctly, of course, but in this case, we see that the column named "MatchRA" is identified with the UCD "pos.eq.ra".
So if we did not know the exact name used in HSCv3 for the RA, we could do something like this looking for the string "RA":
ra_name=[c.name for c in columns if 'RA' in c.name or "ascension" in c.name.lower()]
print(ra_name)
['MatchRA', 'SourceRA']
But a more general approach is to check for the correct UCD. It also has the further advantage that it can be used to label columns that should be used for certain purposes when there are multiple possibilities. For instance, this table has MatchRA and SourceRA. Let's check the UCD:
(Note that the UCD is not required. If it isn't there, you get a None type, so code the check carefully)
ra_name=[c.name for c in columns if c.ucd and 'pos.eq.ra' in c.ucd][0]
dec_name=[c.name for c in columns if c.ucd and 'pos.eq.dec' in c.ucd][0]
ra_name,dec_name
('MatchRA', 'MatchDec')
What that shows you is that though there are two columns in this table that give RA information, only one has the 'pos.eq.ra' UCD. The documentation for this ought to explain the usage of these columns, and the UCD should not be used as a substitute for understanding the table. But it can be a useful tool.
In particular, you can use the UCDs to look for catalogs that might have the information you're interested in. Then you can code the same query to work for different tables (with different column names) in a loop. This sends a bunch of queries but doesn't take too long, a minute maybe. (One is particularly slow.)
# Look for all TAP services with x-ray and optical data
collection={}
for s in vo.regsearch(servicetype='tap',keywords=['x-ray','optical']):
if "wfau" in s.ivoid: continue # These sometimes have issues
print(f"Looking at service from {s.ivoid}")
try:
tables=s.service.tables
except:
print("Problem with this service's tables endpoint. Continuing to next.")
continue
# Find all the tables that have an RA,DEC and a start and end time
for t in tables:
names={}
for ucd in ['pos.eq.ra','pos.eq.dec','time.start','time.end']:
cols=[c.name for c in t.columns if c.ucd and ucd in c.ucd]
if len(cols) > 0:
names[ucd]=cols[0] # use the first that matches
if len(names.keys()) == 4:
print(f" Table {t.name} has the right columns. Counting rows matching my time.")
# For a first look, a very simple query counting rows in a
# time range of interest:
query=f"select count({names['time.start']}) from {t.name}" \
f" where {names['time.start']} > 52000 "
try:
results=s.search(query)
except:
print("Problem executing query. Continuing to next.")
continue
# For this simple query, the result is a single number, the count.
# But different services might name the result differently, so
# don't assume you know the column name.
print(" Found {} results from {}\n".format(results.to_table()[0][0],t.name))
# If the query above asked for the matching data rather than the
# count, you might want to collect the results.
# Careful: here we're assuming the table names are unique
collection[t.name]=results
Looking at service from ivo://cefca/j-plus/j-plus-dr1 Table ivoa.ObsCore has the right columns. Counting rows matching my time. Found 6132 results from ivoa.ObsCore Looking at service from ivo://cefca/j-plus/j-plus-dr2 Table ivoa.ObsCore has the right columns. Counting rows matching my time. Found 55363 results from ivoa.ObsCore Looking at service from ivo://cefca/minijpas/minij-pas-pdr201912 Table ivoa.ObsCore has the right columns. Counting rows matching my time. Found 240 results from ivoa.ObsCore Looking at service from ivo://cxc.harvard.edu/cda Looking at service from ivo://cxc.harvard.edu/csc Looking at service from ivo://cxc.harvard.edu/cscr1 Table csc1.obi_source has the right columns. Counting rows matching my time. Problem executing query. Continuing to next. Looking at service from ivo://cxc.harvard.edu/cscr2 Looking at service from ivo://esavo/xmm/tap Looking at service from ivo://eso.org/tap_cat Table XQ_100_summary_fits_V1 has the right columns. Counting rows matching my time. Found 100 results from XQ_100_summary_fits_V1 Looking at service from ivo://eso.org/tap_obs Table dbo.raw has the right columns. Counting rows matching my time. Problem executing query. Continuing to next. Table dbo.ssa has the right columns. Counting rows matching my time. Found 1865883 results from dbo.ssa Table ivoa.ObsCore has the right columns. Counting rows matching my time. Found 3282377 results from ivoa.ObsCore
You can also use UCDs to look at the results. Above, we collected just the first 10 rows of the four columns we're interested in from every catalog that had them. But these tables still have their original column names. So the UCDs will still be useful, and PyVO provides a simple routine to convert from UCD to column (field) name.
Note, however, that returning the UCDs as part of the result is not mandatory, and some services do not do it. So you'll have to check.
Now we have a collection of rows from different tables with different columns. In the results object, we have access to a fieldname_with_ucd() function to get the column you want. Supposing we hadn't already looked for this in the above loop, let's now find out which of these tables has a magnitude column:
#ucd='pos.eq.ra'
ucd='phot.mag'
for tname,results in collection.items():
#print(f"On table {tname}")
# Sometimes this doesn't work well, so use a try:
try:
name=results.fieldname_with_ucd(ucd)
except:
pass
if name:
print(f" Table {tname} has the {ucd} column named {name}")
else:
print(f" (Table {tname} didn't find the UCD.)")
(Table ivoa.ObsCore didn't find the UCD.) (Table XQ_100_summary_fits_V1 didn't find the UCD.) (Table dbo.ssa didn't find the UCD.)
Lastly, if you have a table of results from a TAP query (and if that service includes the UCDs), then you can get data based on UCDs with the getbyucd() method, which simply gets the corresponding element using fieldname_with_ucd():
results=hsc.service.search("select top 10 * from dbo.DetailedCatalog")
[r.getbyucd('phot.mag') for r in results]
[21.94580078125, 22.5403995513916, 22.0459003448486, 19.3535995483398, 22.0848999023438, 22.1319999694824, 18.7014007568359, 21.6016998291016, 21.2751007080078, 22.0634002685547]
Note that we can see earlier in this notebook, when we looked at this table's contents, that there are two phot.mag fields in this table, MagAper2 and MagAuto. The getbyucd() and fieldname_with_ucd() routines do not currently allow you to handle multiple columns with the same UCD. The code can help you find what you want, but it depends on the meta data the service defines, and you still must look at the detailed information for each catalog you use to understand what it contains.