Target selection
Target selection is the process of choosing objects from the pool of unique targets in catalogdb.catalog generated by cross-matching. The targets are selected according to one or more target classes, or “cartons”, that have a common astronomical motivation. The selected targets are loaded into targetdb along with their astrometric information and some additional parameters such as magnitudes, observing cadences, etc. This determines the list of all possible objects that could be observed by SDSS-V, and is used by robostrategy to define the fields to be observed.
Definitions
For consistency, we define the following concepts:
Carton: a set of targets selected algorithmically from parent catalogues in catalogdb. The carton is identified by a name in the form <mapper>_<program>_<carton>, for example mwm_halo_bb. Sometimes the program is not included: mwm_100pc.
Program: a higher level grouping of cartons with a common goal. For example, the Halo program includes the mwm_halo_bb and mwm_halo_sm cartons.
Cadence: a string that determines the conditions under which a certain target will be observed, including lunation rules, instruments to be used, number of observations, and separation between different observations. For example, mwm_ob_3x1. Cadences can be assigned to individual targets or globally to all the targets in a carton.
Priority: a numerical value indicating the global priority of a given target. Lower numbers indicate higher priority.
Mapper: the mapper leading a carton, either MWM or BHM.
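The naming convention and the priority rule above can be made concrete with a small sketch. The helpers parse_carton_name and sort_by_priority are hypothetical (they are not part of target_selection) and assume the <mapper>_<program>_<carton> pattern: a two-part name such as mwm_100pc is treated as having no program, and anything after the second underscore is kept in the carton part. Note that a program-less name whose carton part itself contains an underscore cannot be told apart from a three-part name by this rule alone.

```python
VALID_MAPPERS = {'MWM', 'BHM'}


def parse_carton_name(name):
    """Split a carton name into (mapper, program, carton).

    Hypothetical helper: assumes <mapper>_<program>_<carton>, or
    <mapper>_<carton> when the program is omitted (program is then None).
    """
    parts = name.split('_', 2)
    if len(parts) == 3:
        mapper, program, carton = parts
    elif len(parts) == 2:
        mapper, carton = parts
        program = None
    else:
        raise ValueError(f'invalid carton name: {name!r}')

    mapper = mapper.upper()
    if mapper not in VALID_MAPPERS:
        raise ValueError(f'unknown mapper {mapper!r} in {name!r}')

    return mapper, program, carton


def sort_by_priority(targets):
    """Order targets so that the most important ones come first.

    Lower priority numbers mean higher priority, so an ascending sort
    puts the highest-priority targets at the front.
    """
    return sorted(targets, key=lambda target: target['priority'])
```

For example, parse_carton_name('mwm_halo_bb') gives ('MWM', 'halo', 'bb'), while parse_carton_name('mwm_100pc') gives ('MWM', None, '100pc').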
targetdb schema
targetdb stores the results of target selection and is synced with the observatories on a regular basis (at least once after each run of target selection). The schema contains the tables and relationships described below.
The main table, target, contains all the targets selected from catalogdb along with their astrometric information, and provides the link to catalogdb.catalog via catalogid. Photometric information for each target is stored in the magnitude table.
The program table contains the target selection “cartons” associated with their leading survey (MWM or BHM) and a category (science, standards, etc.). Programs are version controlled and, as in the case of cross-matching, we refer to a run of target selection as a “plan” with an associated code “tag”. The version table stores this information; its two boolean columns, target_selection and robostrategy, indicate whether the plan refers to a target selection or a robostrategy run. Note that a target selection plan version doesn’t have to be equal to its cross-matching one, and neither do their code tags. However, since cross-matching always happens before target selection, the target selection tag can never be lower than the tag of the associated cross-matching plan. Also note that objects in the target table are not associated with a plan, because catalogids are unique and fully define the associated target and its cross-match associations.
The allocation of targets to programs is done in the program_to_target table, which also assigns the cadence (frequency and number of observations) for a given target and program. Cadence labels and their parameters are defined outside target_selection.
When robostrategy runs, it selects targets based on their cadence and program associations and generates designs in which each object is assigned an instrument and a robotic fibre positioner. Each design is characterised by a field centre with an associated cadence, chosen to optimise the cadence requirements of the individual targets in the design.
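To make the relationships above easier to follow, here is a heavily simplified, hypothetical version of the tables in an in-memory SQLite database. It is not the real targetdb DDL: the column names (program_pk, target_pk, label, and so on) and the toy rows are assumptions chosen for illustration. It shows how target links to magnitude, and how program_to_target ties targets to programs and carries the per-target cadence.

```python
import sqlite3

# Hypothetical, heavily simplified targetdb layout (not the real DDL).
SCHEMA = """
CREATE TABLE version (
    pk INTEGER PRIMARY KEY,
    "plan" TEXT NOT NULL,
    tag TEXT NOT NULL,
    target_selection BOOLEAN NOT NULL,  -- true for a target selection run
    robostrategy BOOLEAN NOT NULL       -- true for a robostrategy run
);
CREATE TABLE program (
    pk INTEGER PRIMARY KEY,
    label TEXT NOT NULL,                -- carton name
    mapper TEXT NOT NULL,               -- 'MWM' or 'BHM'
    category TEXT NOT NULL,             -- 'science', 'standards', ...
    version_pk INTEGER REFERENCES version (pk)
);
CREATE TABLE target (
    pk INTEGER PRIMARY KEY,
    catalogid INTEGER UNIQUE NOT NULL,  -- link back to catalogdb.catalog
    ra REAL,
    dec REAL
);
CREATE TABLE magnitude (
    pk INTEGER PRIMARY KEY,
    target_pk INTEGER REFERENCES target (pk),
    h REAL
);
CREATE TABLE program_to_target (
    pk INTEGER PRIMARY KEY,
    program_pk INTEGER REFERENCES program (pk),
    target_pk INTEGER REFERENCES target (pk),
    cadence TEXT                        -- per-target cadence label
);
"""


def build_demo_db():
    """Create the toy schema and insert one made-up target."""
    conn = sqlite3.connect(':memory:')
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO version VALUES (1, '0.1.0', '0.1.0', 1, 0)")
    conn.execute("INSERT INTO program VALUES "
                 "(1, 'galactic_genesis', 'MWM', 'science', 1)")
    conn.execute("INSERT INTO target VALUES (1, 1001, 10.0, 20.0)")
    conn.execute("INSERT INTO magnitude VALUES (1, 1, 9.5)")
    conn.execute("INSERT INTO program_to_target VALUES (1, 1, 1, 'mwm_ob_3x1')")
    return conn


def targets_for_program(conn, label):
    """Return (catalogid, h, cadence) for every target in a program."""
    sql = """
        SELECT t.catalogid, m.h, p2t.cadence
        FROM program_to_target p2t
        JOIN program p ON p.pk = p2t.program_pk
        JOIN target t ON t.pk = p2t.target_pk
        LEFT JOIN magnitude m ON m.target_pk = t.pk
        WHERE p.label = ?
        ORDER BY t.catalogid
    """
    return conn.execute(sql, (label,)).fetchall()
```

Calling targets_for_program(build_demo_db(), 'galactic_genesis') returns the single toy row with its H magnitude and cadence label.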
Defining a carton
Cartons are groupings of astronomical objects that are selected together to achieve a common science goal. They are chosen from the pool of unique targets in catalogdb.catalog by applying a series of filter conditions on columns in the associated parent tables. An example of a carton is Galactic Genesis, defined as:
Selection of all the IR-bright, red stars. Select sources brighter than H<11 AND ((G-H) > 3.5 OR Gaia non-detection), where H are 2MASS magnitudes and G are Gaia DR2 magnitudes.
In this case we need to select all the targets (catalogids) that have an associated 2MASS match with \(H<11\) and that either have \((G-H)>3.5\) or do not have a Gaia match. This can be achieved with the following SQL query, taking advantage of the fact that the TIC is complete in both 2MASS and Gaia DR2.
SELECT c.catalogid FROM twomass_psc tm
LEFT OUTER JOIN tic_v8 tic ON tic.twomass_psc = tm.designation
INNER JOIN catalog_to_tic_v8 ctt ON ctt.target_id = tic.id
INNER JOIN catalog c USING (catalogid)
WHERE tm.h_m < 11 AND
((tic.gmag - tm.h_m) > 3.5 OR tic.gaia IS NULL) AND
c.version_id = 13;
Here we assume we are using the cross-match corresponding to the plan with catalog.version_id = 13. Using Peewee and sdssdb this can be written as
import peewee
from sdssdb.peewee.sdss5db.catalogdb import (Catalog, CatalogToTIC_v8,
                                             TIC_v8, TwoMassPSC)

gg = (TwoMassPSC
      .select(Catalog.catalogid)
      .join(TIC_v8, peewee.JOIN.LEFT_OUTER)
      .join(CatalogToTIC_v8)
      .join(Catalog)
      .where(TwoMassPSC.h_m < 11,
             ((TIC_v8.gmag - TwoMassPSC.h_m) > 3.5) | (TIC_v8.gaia >> None),
             Catalog.version_id == 13))
target_selection provides all the additional boilerplate to evaluate this query using catalogdb, retrieve the results, and load them into targetdb along with their associated metadata (magnitudes, cadences, etc).
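To see how the selection logic behaves without access to catalogdb, the following sketch runs the same SQL against a few made-up rows in an in-memory SQLite database (not PostgreSQL). The tables are cut down to the columns the query touches, the rows are invented, and galactic_genesis is a hypothetical helper. The colour cut follows the carton definition, G - H > 3.5, and the version_id filter shows why a catalogid from another cross-match plan is ignored.

```python
import sqlite3


def build_toy_catalog():
    """Create tiny in-memory versions of the tables used by the query."""
    conn = sqlite3.connect(':memory:')
    conn.executescript("""
        CREATE TABLE twomass_psc (designation TEXT PRIMARY KEY, h_m REAL);
        CREATE TABLE tic_v8 (id INTEGER PRIMARY KEY, twomass_psc TEXT,
                             gaia INTEGER, gmag REAL);
        CREATE TABLE catalog_to_tic_v8 (catalogid INTEGER, target_id INTEGER);
        CREATE TABLE catalog (catalogid INTEGER PRIMARY KEY, version_id INTEGER);
    """)
    conn.executemany('INSERT INTO twomass_psc VALUES (?, ?)',
                     [('A', 9.0), ('B', 10.0), ('C', 10.5), ('D', 12.0)])
    conn.executemany('INSERT INTO tic_v8 VALUES (?, ?, ?, ?)',
                     [(1, 'A', 100, 13.0),    # G - H = 4.0, red enough
                      (2, 'B', 200, 12.0),    # G - H = 2.0, too blue
                      (3, 'C', None, None),   # no Gaia match, kept
                      (4, 'D', None, None)])  # H = 12, too faint
    conn.executemany('INSERT INTO catalog_to_tic_v8 VALUES (?, ?)',
                     [(101, 1), (102, 2), (103, 3), (104, 4), (105, 1)])
    # catalogid 105 belongs to a different cross-match plan (version 12).
    conn.executemany('INSERT INTO catalog VALUES (?, ?)',
                     [(101, 13), (102, 13), (103, 13), (104, 13), (105, 12)])
    return conn


GG_SQL = """
    SELECT c.catalogid FROM twomass_psc tm
    LEFT OUTER JOIN tic_v8 tic ON tic.twomass_psc = tm.designation
    INNER JOIN catalog_to_tic_v8 ctt ON ctt.target_id = tic.id
    INNER JOIN catalog c USING (catalogid)
    WHERE tm.h_m < ? AND
          ((tic.gmag - tm.h_m) > ? OR tic.gaia IS NULL) AND
          c.version_id = ?
    ORDER BY c.catalogid
"""


def galactic_genesis(conn, version_id, h_max=11, g_minus_h_min=3.5):
    """Return the selected catalogids for one cross-match version."""
    params = (h_max, g_minus_h_min, version_id)
    return [row[0] for row in conn.execute(GG_SQL, params)]
```

With these rows, version 13 keeps the red star (101) and the Gaia non-detection (103), while the blue star and the faint star are rejected.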
Writing the query
Cartons are implemented as subclasses of BaseCarton. BaseCarton is an abstract class, which means that it is not intended to be used directly and must be subclassed with some of its methods overridden.
The main method that needs overriding is build_query, which receives the version of cross-matching to use and must return a Peewee Select or ModelSelect object. We must also define the name, program, category, and mapper for the carton. A full implementation for the Galactic Genesis carton would look like
import peewee
from sdssdb.peewee.sdss5db.catalogdb import (Catalog, CatalogToTIC_v8,
                                             TIC_v8, TwoMassPSC)

from . import BaseCarton


class GalacticGenesisCarton(BaseCarton):

    name = 'galactic_genesis'
    category = 'science'
    program = 'Galactic Genesis'
    mapper = 'MWM'

    def build_query(self, version_id):

        # IR-bright, red stars: H < 11 and either G - H > 3.5 or no Gaia match.
        gg = (TwoMassPSC
              .select(Catalog.catalogid)
              .join(TIC_v8, peewee.JOIN.LEFT_OUTER)
              .join(CatalogToTIC_v8)
              .join(Catalog)
              .where(TwoMassPSC.h_m < 11,
                     ((TIC_v8.gmag - TwoMassPSC.h_m) > 3.5) | (TIC_v8.gaia >> None),
                     Catalog.version_id == version_id))

        return gg
That’s about it. The file containing this code must be placed in the cartons directory of target_selection, from where it will be automatically imported. The query must return the catalogid of the selected objects, along with any other columns that we want to use for post-processing.
We haven’t defined the cadence associated with the carton. We can do that by overriding the cadence attribute (which defaults to None) in the carton class, or later in post-processing.
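The pattern described above, an abstract base class whose subclasses set a few class attributes and implement build_query, can be sketched with Python's abc module. ToyCarton and ToyGalacticGenesis are hypothetical stand-ins written for illustration; the real BaseCarton does much more (configuration lookup, database access, loading) and its checks may differ. The sketch shows why the base class cannot be instantiated directly and how a subclass supplies the required pieces.

```python
from abc import ABC, abstractmethod


class ToyCarton(ABC):
    """Hypothetical stand-in for BaseCarton, for illustration only."""

    name = None
    category = None
    program = None
    mapper = None
    cadence = None  # carton-wide cadence; None means "not set"

    def __init__(self, targeting_plan):
        # Refuse to build a carton that is missing its metadata.
        for attr in ('name', 'category', 'program', 'mapper'):
            if getattr(self, attr) is None:
                raise ValueError(f'{type(self).__name__} must define {attr!r}')
        self.plan = targeting_plan

    @abstractmethod
    def build_query(self, version_id, query_region=None):
        """Return the selection for a given cross-match version."""


class ToyGalacticGenesis(ToyCarton):

    name = 'galactic_genesis'
    category = 'science'
    program = 'Galactic Genesis'
    mapper = 'MWM'

    def build_query(self, version_id, query_region=None):
        # A real carton returns a Peewee query; a string keeps this runnable.
        return f'query for {self.name} on cross-match version {version_id}'
```

Instantiating ToyCarton itself raises TypeError because build_query is abstract, while ToyGalacticGenesis('0.1.0') works and leaves cadence as None until it is overridden.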
The configuration file
If we try to instantiate the class GalacticGenesisCarton it will raise an error because the carton cannot be found in the configuration file. The file at python/target_selection/config/target_selection.yml stores the general parameters for target selection and the values for specific cartons:
'0.1.0':
    xmatch_plan: 0.1.0
    cartons:
        - galactic_genesis
    schema: sandbox
    parameters:
        galactic_genesis:
            h_max: 11
            h_g: 3.5
    magnitudes:
        g: [catalog_to_sdss_dr13_photoobj_primary, sdss_dr13_photoobj.psfmag_g]
        r: [catalog_to_sdss_dr13_photoobj_primary, sdss_dr13_photoobj.psfmag_r]
        i: [catalog_to_sdss_dr13_photoobj_primary, sdss_dr13_photoobj.psfmag_i]
        h: [catalog_to_tic_v8, tic_v8, twomass_psc.h_m]
        bp: [catalog_to_tic_v8, tic_v8, gaia_dr2_source.phot_bp_mean_mag]
        rp: [catalog_to_tic_v8, tic_v8, gaia_dr2_source.phot_rp_mean_mag]
Here target selection plan 0.1.0 is associated with cross-matching plan 0.1.0 and we define a single carton for it, galactic_genesis. We also specify the parameters for that carton. The parameters section for a given carton is accessible in BaseCarton as self.parameters. With this we can avoid hardcoding values in the query and rewrite it as
gg = (TwoMassPSC
      .select(Catalog.catalogid)
      .join(TIC_v8, peewee.JOIN.LEFT_OUTER)
      .join(CatalogToTIC_v8)
      .join(Catalog)
      .where(TwoMassPSC.h_m < self.parameters['h_max'],
             ((TIC_v8.gmag - TwoMassPSC.h_m) > self.parameters['h_g']) |
             (TIC_v8.gaia >> None),
             Catalog.version_id == version_id))
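The lookup behind self.parameters can be pictured as a walk through the parsed configuration. In the sketch below the YAML is represented by the equivalent Python dictionary (so no YAML parser is needed), and carton_parameters is a hypothetical helper, not the real BaseCarton code. It also mimics the error raised when a carton is not listed for a plan.

```python
# Parsed form of the plan shown in the configuration file (subset).
CONFIG = {
    '0.1.0': {
        'xmatch_plan': '0.1.0',
        'cartons': ['galactic_genesis'],
        'schema': 'sandbox',
        'parameters': {
            'galactic_genesis': {'h_max': 11, 'h_g': 3.5},
        },
    },
}


def carton_parameters(config, plan, carton):
    """Return the parameters block for a carton in a given plan.

    Hypothetical helper: raises KeyError if the plan does not exist or
    the carton is not listed for it; returns {} if it has no parameters.
    """
    plan_config = config[plan]
    if carton not in plan_config.get('cartons', []):
        raise KeyError(f'carton {carton!r} is not part of plan {plan!r}')
    return plan_config.get('parameters', {}).get(carton, {})
```

Inside build_query, the values from this dictionary replace the hardcoded 11 and 3.5 of the earlier version of the query.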
The magnitudes section indicates the joins needed to load the targetdb.magnitude table. For each column in the table, the mapping lists the tables that need to be joined, starting at catalog; the last entry also includes the column to grab. For example, for the h magnitude the configuration file indicates that we need to join catalog with twomass_psc via catalog_to_tic_v8 and tic_v8, and then insert the value from the column h_m.
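The way a magnitudes entry encodes a join path can be shown with a few lines of Python. magnitude_join_path is a hypothetical helper that only interprets the list format described above: every entry is a table to join in order, starting from catalog, and the last entry is written as table.column.

```python
def magnitude_join_path(spec):
    """Turn a magnitudes entry into (tables to join, column to read).

    ``spec`` is a list such as
    ['catalog_to_tic_v8', 'tic_v8', 'twomass_psc.h_m']. The join path
    always starts at catalog; the last entry also names the column.
    """
    *intermediate, last = spec
    table, column = last.split('.')  # ValueError if not 'table.column'
    return ['catalog', *intermediate, table], column
```

For the h entry, ' -> '.join of the returned path reads catalog -> catalog_to_tic_v8 -> tic_v8 -> twomass_psc, and the column is h_m.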
As with cross-matching, it’s possible to locally override the default database configuration to increase the work memory or optimise queries. The database parameters must be defined for a plan inside the configuration file, for example
'0.1.0':
    xmatch_plan: 0.1.0
    cartons:
        - galactic_genesis
    database_options:
        work_mem: '2GB'
        temp_buffers: '2GB'
The custom parameters are applied within the transactions used to execute run, post_process, and load.
Another possibility is to override the setup_transaction method completely in the carton implementation. This method prepares the transactions used to run and load the carton. To set random_page_cost=0.1 for a given carton we can do
def setup_transaction(self):
    self.database.execute_sql('SET LOCAL random_page_cost = 0.1;')
Note that if setup_transaction is overridden, the database_options configuration is ignored for that carton.
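One way to think about database_options is as a list of PostgreSQL SET LOCAL statements issued at the start of each transaction; SET LOCAL settings last only until the current transaction ends. The helper below is a hypothetical sketch of that translation, not the actual target_selection implementation. It only builds the SQL strings, so it runs without a database, and it restricts setting names to a conservative pattern before interpolating them.

```python
import re

# Conservative pattern for built-in PostgreSQL setting names.
_SETTING_NAME = re.compile(r'^[a-z_][a-z0-9_]*$')


def set_local_statements(options):
    """Build one SET LOCAL statement per configuration option."""
    statements = []
    for name, value in options.items():
        if not _SETTING_NAME.match(name):
            raise ValueError(f'invalid setting name: {name!r}')
        # Quote the value as a SQL string literal, doubling single quotes.
        quoted = str(value).replace("'", "''")
        statements.append(f"SET LOCAL {name} = '{quoted}';")
    return statements


# Hypothetical use inside a carton's setup_transaction override:
#
#     def setup_transaction(self):
#         for sql in set_local_statements({'random_page_cost': 0.1}):
#             self.database.execute_sql(sql)
```

For the plan above, set_local_statements({'work_mem': '2GB', 'temp_buffers': '2GB'}) produces one statement for each of the two options, in the order they appear in the configuration.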
Custom magnitudes
We have just seen how the magnitudes for a target are obtained from parent tables in catalogdb. Sometimes this is not possible, for example because the object does not have an associated target in 2MASS and we cannot retrieve the H magnitude. To work around this, we can have the query return a proxy for a magnitude
gg = (TwoMassPSC
      .select(Catalog.catalogid,
              CatWISE.w1mag.alias('h'))
      .join(TIC_v8, peewee.JOIN.LEFT_OUTER)
      .join(CatalogToTIC_v8)
      .join(Catalog)
      .join(CatalogToCatWISE)
      .join(CatWISE)
      .where(TwoMassPSC.h_m < self.parameters['h_max'],
             ((TIC_v8.gmag - TwoMassPSC.h_m) > self.parameters['h_g']) |
             (TIC_v8.gaia >> None),
             Catalog.version_id == version_id))
In this query we are returning the CatWISE W1 magnitude aliased as column h (a very bad idea, but useful for the purposes of this example). If the column is present, target_selection will use it directly instead of trying to grab the h magnitude from catalogdb.
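The precedence rule in the previous paragraph can be summarised as: a magnitude whose name matches a column returned by the query comes from the query output, and every other magnitude comes from the joins configured in the magnitudes section. The function below is a hypothetical illustration of that rule only; it does not reproduce how target_selection implements it internally.

```python
def magnitude_sources(query_columns, configured_magnitudes):
    """Map each configured magnitude to where its value comes from.

    Returns 'query' for magnitudes returned by the carton query (for
    example an aliased proxy column) and 'catalogdb' for the rest.
    """
    returned = set(query_columns)
    return {mag: ('query' if mag in returned else 'catalogdb')
            for mag in configured_magnitudes}
```

For the query above, which returns catalogid and h, only h would be taken from the query; g, r, i, bp, and rp would still be read through the configured joins.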
Post-processing
Calling run will execute the query and create a temporary table in the sandbox schema called temp_<carton_name> with its output (the catalogid column and any other columns we decided to return). Two extra columns are added if they have not been returned by the query: selected, which is set to true, and cadence, set to null. The first one indicates whether the target must be selected and loaded into targetdb; the second allows setting a cadence specific to that object. Note that setting both the carton cadence attribute and the cadence column is not allowed.
After the query is done the carton class calls post_process. By default that method doesn’t do anything, but it can be overridden to perform additional, non-SQL operations on the output table. A typical case is a selection criterion that is too complicated to express in SQL, or one that requires an external file. We can define build_query to return a superset of the targets and use post_process to mask out the objects that do not meet the criteria by changing their selected value to false. We can also set the cadence column the same way, or add new magnitude columns based on other existing columns. post_process receives a Peewee model of the temporary table, generated using reflection, and doesn’t return anything: all operations must be done in place on the table.
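The in-place workflow can be illustrated with a toy temporary table in SQLite. In target_selection post_process works on a Peewee model of the table; here a plain sqlite3 connection stands in for it. The exclusion list (standing in for an external file), the bright-star cadence rule, and the h_bright threshold are all hypothetical, chosen only to show the two typical operations: flipping selected to false and filling the cadence column.

```python
import sqlite3


def make_temp_table(conn, rows):
    """Create a toy temp_<carton_name> table with the two extra columns."""
    conn.execute(
        'CREATE TABLE temp_galactic_genesis ('
        'catalogid INTEGER PRIMARY KEY, h REAL, '
        'selected BOOLEAN DEFAULT 1, cadence TEXT DEFAULT NULL)'
    )
    conn.executemany(
        'INSERT INTO temp_galactic_genesis (catalogid, h) VALUES (?, ?)', rows)


def post_process(conn, exclude_ids, bright_cadence, h_bright=8.0):
    """Mask out and tag targets in place; returns nothing."""
    # Deselect targets listed externally (e.g. read from a file).
    conn.executemany(
        'UPDATE temp_galactic_genesis SET selected = 0 WHERE catalogid = ?',
        [(catalogid,) for catalogid in exclude_ids])
    # Give the remaining bright targets a per-object cadence.
    conn.execute(
        'UPDATE temp_galactic_genesis SET cadence = ? '
        'WHERE h < ? AND selected = 1',
        (bright_cadence, h_bright))
```

After post-processing, deselected rows stay in the table with selected set to false, which is what prevents them from being loaded into targetdb.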
Restricting the query
For test purposes it’s useful to be able to run the query on a small region of the sky. This can be accomplished by defining the carton class and overriding the query_region attribute, or by calling run and passing a query_region argument. In either case query_region must be a tuple in the form (ra_centre, dec_centre, radius), in degrees. Only targets within that radial region will be included in the output.
The most efficient way to implement the radial query is to do it explicitly when writing the query. If we define build_query with the keyword argument query_region in its signature, run will pass the parameter, at which point the query can implement it in the most efficient way possible.
Let’s rewrite our Galactic Genesis example with a radial query option
import peewee
from sdssdb.peewee.sdss5db.catalogdb import (Catalog, CatalogToTIC_v8,
                                             TIC_v8, TwoMassPSC)

from . import BaseCarton


class GalacticGenesisCarton(BaseCarton):

    name = 'galactic_genesis'
    category = 'science'
    program = 'Galactic Genesis'
    mapper = 'MWM'

    def build_query(self, version_id, query_region=None):

        gg = (TwoMassPSC
              .select(Catalog.catalogid)
              .join(TIC_v8, peewee.JOIN.LEFT_OUTER)
              .join(CatalogToTIC_v8)
              .join(Catalog)
              .where(TwoMassPSC.h_m < 11,
                     ((TIC_v8.gmag - TwoMassPSC.h_m) > 3.5) | (TIC_v8.gaia >> None),
                     Catalog.version_id == version_id))

        if query_region:
            # query_region is (ra_centre, dec_centre, radius), in degrees.
            gg = gg.where(peewee.fn.q3c_radial_query(Catalog.ra, Catalog.dec,
                                                     query_region[0],
                                                     query_region[1],
                                                     query_region[2]))

        return gg
If we don’t implement the region condition explicitly, run will add it by converting the main query into a subquery and joining it with the catalog table. Depending on the query this may result in very poor performance (the results could be restricted to the radial region only after the query has run on the whole sky). It’s recommended to implement query_region in build_query.
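When testing with a small query_region it can help to check on the client side that every returned target really lies inside the cone. The function below is a hypothetical pure-Python equivalent of the cone condition using the haversine formula for angular separation; it is not Q3C, which evaluates the condition in the database using a spatial index, and it is only meant for spot checks on small result sets.

```python
import math


def in_query_region(ra, dec, query_region):
    """True if (ra, dec) lies within query_region.

    query_region is (ra_centre, dec_centre, radius); all values in
    degrees. The angular separation uses the haversine formula, which
    stays accurate for small radii and handles RA wrap-around.
    """
    ra_c, dec_c, radius = query_region
    ra1, dec1, ra2, dec2 = map(math.radians, (ra, dec, ra_c, dec_c))
    hav = (math.sin((dec2 - dec1) / 2) ** 2
           + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    separation = math.degrees(2 * math.asin(math.sqrt(min(1.0, hav))))
    return separation <= radius
```

Because the separation is computed on the sphere, points on either side of RA = 0 or across a pole are compared correctly, which a naive cut on RA and Dec would get wrong.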
Writing results to a file
For QA purposes it’s useful to be able to write the result of running the carton query to a file. The method write_table does this. It must be called after run has been invoked, and writes the temporary table to a gzip’d FITS file with all the columns returned by the query and modified in post-processing (including selected and cadence).
>>> carton.write_table()
<Table masked=True length=5459267>
catalogid selected cadence ra ... cc_flg rd_flg gal_contam
int64 bool float64 float64 ... str3 str3 int64
--------- -------- ------- ---------------- ... ------ ------ ----------
565437025 True nan 314.977316146069 ... 000 222 0
757228297 True nan 315.101352793936 ... 000 111 0
757228365 True nan 315.115427475258 ... 000 111 0
757228396 True nan 315.124831714726 ... 000 222 0
757654361 True nan 315.3227486863 ... 000 222 0
565379603 True nan 314.673172098268 ... 000 111 0
... ... ... ... ... ... ... ...
476312142 True nan 41.850288 ... 000 222 0
649921691 True nan 44.536546 ... 000 222 0
476311240 True nan 41.5986734774755 ... 000 222 0
After load has run and the carton has been ingested into targetdb, it’s possible to call write_table with mode='targetdb'. This will write a selection of the targetdb columns for the carton (catalogid, astrometric coordinates, cadence, magnitudes):
carton.write_table('carton_loaded-0.2.3.fits.gz', mode='targetdb')
Running target selection
Once the carton is fully implemented we can execute the query, post-process, and load the data into targetdb by doing
from target_selection.cartons import GalacticGenesisCarton
gg = GalacticGenesisCarton('0.1.0')
gg.run()
gg.load()
While we could do this for each carton in the target selection run, it’s easier to use the command line interface by doing
target_selection --user sdss run "0.1.0"
This will select all the cartons for the target selection plan 0.1.0 and run and load them in order.
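Conceptually, the command line run amounts to the loop below: read the cartons listed for the plan, then run and load each one in the configured order. run_plan and the Fake* classes are hypothetical stand-ins that record calls instead of touching a database; the real command also deals with database connections and other options not shown here.

```python
def run_plan(config, plan, carton_classes, load=True):
    """Run (and optionally load) every carton listed for a plan, in order.

    ``carton_classes`` maps carton names to carton classes. This is a
    hypothetical driver, not the real command line implementation.
    """
    finished = []
    for name in config[plan]['cartons']:
        carton = carton_classes[name](plan)
        carton.run()
        if load:
            carton.load()
        finished.append(name)
    return finished


CALLS = []


class FakeCarton:
    """Stand-in that records calls instead of touching a database."""

    name = None

    def __init__(self, plan):
        self.plan = plan

    def run(self):
        CALLS.append(('run', self.name))

    def load(self):
        CALLS.append(('load', self.name))


class FakeGalacticGenesis(FakeCarton):
    name = 'galactic_genesis'


class FakeHaloBB(FakeCarton):
    name = 'mwm_halo_bb'
```

Running each carton fully (run, then load) before moving on to the next mirrors the description above; passing load=False gives a dry run that only executes the queries.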