Target selection

Target selection is the process of choosing objects from the pool of unique targets in catalogdb.catalog generated by cross-matching. The targets are selected according to one or more target classes or “cartons” that have a common astronomical motivation. The selected targets are loaded into targetdb along with their astrometric information and some additional parameters such as magnitudes, observing cadences, etc. This determines the list of all possible objects that could be observed by SDSS-V, and is used by robostrategy to define the fields to be observed.

Definitions

For consistency, we define the following concepts:

  • Carton: a set of targets selected algorithmically from parent catalogues in catalogdb. The carton is identified by a name in the form <mapper>_<program>_<carton>, for example mwm_halo_bb. Sometimes the program is not included: mwm_100pc.

  • Program: a higher level grouping of cartons with a common goal. For example the Halo program includes the mwm_halo_bb and mwm_halo_sm cartons.

  • Cadence: a string that determines the conditions in which a certain target will be observed, including lunation rules, instruments to be used, number of observations, and separation between different observations. For example, mwm_ob_3x1. Cadences can be assigned to individual targets or globally to all the targets in a carton.

  • Priority: a numerical value indicating the global priority of a given target. Lower numbers indicate higher priority.

  • Mapper: the mapper leading a carton, either MWM or BHM.
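As a rough illustration of how these concepts relate, the sketch below models a target's membership in a carton with a hypothetical Assignment record (not a target_selection or targetdb class). It reads the mapper from the first token of the carton name, which works both for mwm_halo_bb and for mwm_100pc, and orders assignments so that the lowest priority number, i.e. the highest priority, comes first. The priority values are made up.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Assignment:
    """Hypothetical record tying a target to a carton (illustration only)."""
    catalogid: int
    carton: str                     # e.g. 'mwm_halo_bb' or 'mwm_100pc'
    priority: int                   # lower number = higher priority
    cadence: Optional[str] = None   # per-target cadence, if any


def mapper_of(carton: str) -> str:
    """Return the mapper (MWM or BHM) from the first token of a carton name."""
    mapper = carton.split('_', 1)[0].upper()
    if mapper not in ('MWM', 'BHM'):
        raise ValueError(f'unknown mapper in carton name {carton!r}')
    return mapper


def by_priority(assignments):
    """Sort so that the highest-priority (lowest number) assignment is first."""
    return sorted(assignments, key=lambda a: a.priority)


# Made-up priorities for the same target in two cartons.
assignments = [
    Assignment(1001, 'mwm_halo_bb', priority=2705, cadence='mwm_ob_3x1'),
    Assignment(1001, 'mwm_100pc', priority=1800),
]
best = by_priority(assignments)[0]
print(best.carton, mapper_of(best.carton))  # mwm_100pc MWM
```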

targetdb schema

targetdb stores the results of target selection and is synced with the observatories on a regular basis (at least once after each run of target selection). The schema has the following tables and relationships

https://github.com/sdss/sdssdb/raw/main/schema/sdss5db/targetdb/sdss5db.targetdb.png

The main table, target, contains all the targets selected from catalogdb along with their astrometric information and provides the link to catalogdb.catalog via catalogid. Photometric information is stored in the magnitude table for each target.

The program table contains the target selection “cartons” associated with their leading survey (MWM or BHM) and a category (science, standards, etc.). Programs are version controlled and, as in the case of cross-matching, we refer to a run of target selection as a “plan” with an associated code “tag”. The version table stores this information; its two boolean columns, target_selection and robostrategy, indicate whether the plan refers to a target selection or a robostrategy run. Note that a target selection plan version doesn’t have to be equal to its cross-matching one, and neither do their code tags. However, since cross-matching always happens before target selection, the tag for a target selection plan cannot be lower than the tag for its associated cross-matching plan. Also note that objects in the target table are not associated with a plan, because catalogids are unique and fully define the associated target and its cross-match associations.

The allocation of targets to programs is done in the program_to_target table, which also assigns the cadence (frequency and number of observations) for a given target and program. Cadence labels and their parameters are defined externally to target_selection.

When robostrategy runs, it selects targets based on their cadence and program associations and generates designs in which each object is assigned an instrument and a robotic fibre positioner. Each design is characterised by a field centre with an associated cadence, chosen to optimise the cadence requirements of the individual targets in the design.
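To make these relationships concrete, here is a toy version of the schema in an in-memory SQLite database. The table names follow the description above, but the columns are a hypothetical, heavily simplified subset and do not reproduce the real targetdb definitions. As in the text, the cadence lives on program_to_target, which links each target to a program; the query lists each target with its program, survey, plan, cadence, and H magnitude.

```python
import sqlite3

# Toy schema: table names follow targetdb, columns are hypothetical.
con = sqlite3.connect(':memory:')
con.executescript("""
    CREATE TABLE version (pk INTEGER PRIMARY KEY, "plan" TEXT, tag TEXT,
                          target_selection BOOLEAN, robostrategy BOOLEAN);
    CREATE TABLE program (pk INTEGER PRIMARY KEY, label TEXT, survey TEXT,
                          category TEXT,
                          version_pk INTEGER REFERENCES version(pk));
    CREATE TABLE target (pk INTEGER PRIMARY KEY, catalogid INTEGER UNIQUE,
                         ra REAL, dec REAL);
    CREATE TABLE magnitude (target_pk INTEGER REFERENCES target(pk),
                            h REAL, bp REAL, rp REAL);
    CREATE TABLE program_to_target (
        program_pk INTEGER REFERENCES program(pk),
        target_pk INTEGER REFERENCES target(pk),
        cadence TEXT);
""")

# Made-up rows: one target selection plan, one carton, two targets.
con.execute("INSERT INTO version VALUES (1, '0.1.0', '0.1.0', 1, 0)")
con.execute("INSERT INTO program VALUES "
            "(1, 'galactic_genesis', 'MWM', 'science', 1)")
con.executemany("INSERT INTO target VALUES (?, ?, ?, ?)",
                [(1, 101, 314.9, 35.2), (2, 102, 315.1, 34.8)])
con.execute("INSERT INTO magnitude VALUES (1, 9.5, NULL, NULL)")
con.executemany("INSERT INTO program_to_target VALUES (?, ?, ?)",
                [(1, 1, 'mwm_ob_3x1'), (1, 2, None)])

# For each target: program, survey, plan, cadence, and H magnitude (if any).
rows = con.execute("""
    SELECT t.catalogid, p.label, p.survey, v."plan", ptt.cadence, m.h
    FROM target t
    JOIN program_to_target ptt ON ptt.target_pk = t.pk
    JOIN program p ON p.pk = ptt.program_pk
    JOIN version v ON v.pk = p.version_pk
    LEFT JOIN magnitude m ON m.target_pk = t.pk
    ORDER BY t.catalogid
""").fetchall()
for row in rows:
    print(row)
```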

Defining a carton

Cartons are groupings of astronomical objects that are selected together to achieve a common science goal. They are chosen from the pool of unique targets in catalogdb.catalog by applying a series of filter conditions on columns in the associated parent tables. An example of a carton is Galactic Genesis, defined as

Selection of all the IR-bright, red stars. Select sources brighter than H<11 AND ((G-H) > 3.5 OR Gaia non-detection), where H are 2MASS magnitudes and G are Gaia DR2 magnitudes.
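Before translating this definition into SQL, it can help to write it as a plain Python predicate on a single source. This is only an illustration of the logic, with a hypothetical function name and arguments: h is the 2MASS H magnitude and g the Gaia DR2 G magnitude, or None when there is no Gaia detection. Note that the colour is G - H, so redder sources have larger values.

```python
from typing import Optional


def is_galactic_genesis(h: float, g: Optional[float],
                        h_max: float = 11.0, g_h_min: float = 3.5) -> bool:
    """Return True if a source passes the Galactic Genesis cuts.

    h: 2MASS H magnitude; g: Gaia DR2 G magnitude, or None if there is
    no Gaia detection. Defaults are the thresholds in the definition.
    """
    if not h < h_max:
        return False
    # A Gaia non-detection passes the colour cut automatically.
    return g is None or (g - h) > g_h_min


print(is_galactic_genesis(9.0, 13.0))   # True:  G - H = 4.0
print(is_galactic_genesis(9.0, 11.0))   # False: G - H = 2.0
print(is_galactic_genesis(9.0, None))   # True:  no Gaia match
print(is_galactic_genesis(12.0, None))  # False: too faint in H
```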

In this case we’d need to select all the targets (catalogids) that have an associated 2MASS match with \(H<11\) if \((G-H)>3.5\) or if the target doesn’t have a Gaia match. This can be achieved with the following SQL query taking advantage of the fact that the TIC is complete in both 2MASS and Gaia DR2.

SELECT c.catalogid FROM twomass_psc tm
    LEFT OUTER JOIN tic_v8 tic ON tic.twomass_psc = tm.designation
    INNER JOIN catalog_to_tic_v8 ctt ON ctt.target_id = tic.id
    INNER JOIN catalog c USING (catalogid)
WHERE tm.h_m < 11 AND
    ((tic.gmag - tm.h_m) > 3.5 OR tic.gaia IS NULL) AND
    c.version_id = 13;

Here we assume we are using the cross-match corresponding to the plan with catalog.version_id = 13. Using Peewee and sdssdb this can be written as

from peewee import JOIN
from sdssdb.peewee.sdss5db.catalogdb import (Catalog, CatalogToTIC_v8,
                                             TIC_v8, TwoMassPSC)

gg = (TwoMassPSC
      .select(Catalog.catalogid)
      .join(TIC_v8, JOIN.LEFT_OUTER)
      .join(CatalogToTIC_v8)
      .join(Catalog)
      .where(TwoMassPSC.h_m < 11,
             # G - H > 3.5, or no Gaia DR2 match
             (((TIC_v8.gmag - TwoMassPSC.h_m) > 3.5) | (TIC_v8.gaia >> None)),
             Catalog.version_id == 13))

target_selection provides all the additional boilerplate to evaluate this query using catalogdb, retrieve the results, and load them into targetdb along with their associated metadata (magnitudes, cadences, etc).
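To check the SQL logic without access to the sdss5db database, the SQL query above can be run against tiny, made-up versions of the four tables in an in-memory SQLite database (with an ORDER BY added for a deterministic result). Only the column names used by the query are taken from the text; the table definitions and rows are assumptions for illustration.

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.executescript("""
    CREATE TABLE twomass_psc (designation TEXT PRIMARY KEY, h_m REAL);
    CREATE TABLE tic_v8 (id INTEGER PRIMARY KEY, twomass_psc TEXT,
                         gmag REAL, gaia INTEGER);
    CREATE TABLE catalog_to_tic_v8 (catalogid INTEGER, target_id INTEGER);
    CREATE TABLE catalog (catalogid INTEGER PRIMARY KEY, version_id INTEGER);
""")

# Made-up sources: (designation, H, G, Gaia source_id or None).
sources = [
    ('A', 9.0, 13.0, 1),      # G - H = 4.0     -> selected
    ('B', 9.0, 11.0, 2),      # G - H = 2.0     -> rejected
    ('C', 10.0, None, None),  # no Gaia match   -> selected
    ('D', 12.0, None, None),  # H too faint     -> rejected
]
for i, (desig, h, g, gaia) in enumerate(sources, start=1):
    con.execute('INSERT INTO twomass_psc VALUES (?, ?)', (desig, h))
    con.execute('INSERT INTO tic_v8 VALUES (?, ?, ?, ?)', (i, desig, g, gaia))
    con.execute('INSERT INTO catalog_to_tic_v8 VALUES (?, ?)', (100 + i, i))
    con.execute('INSERT INTO catalog VALUES (?, 13)', (100 + i,))

rows = con.execute("""
    SELECT c.catalogid FROM twomass_psc tm
        LEFT OUTER JOIN tic_v8 tic ON tic.twomass_psc = tm.designation
        INNER JOIN catalog_to_tic_v8 ctt ON ctt.target_id = tic.id
        INNER JOIN catalog c USING (catalogid)
    WHERE tm.h_m < 11 AND
        ((tic.gmag - tm.h_m) > 3.5 OR tic.gaia IS NULL) AND
        c.version_id = 13
    ORDER BY c.catalogid;
""").fetchall()
print(rows)  # [(101,), (103,)]
```

For source C the colour term is NULL, but NULL OR TRUE is TRUE in SQL, so the Gaia non-detection branch still selects it.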

Writing the query

Cartons are implemented as subclasses of BaseCarton. BaseCarton is an abstract class, which means that it is not intended to be used directly and must be subclassed with some of its methods overridden.

The main method that needs overriding is build_query, which receives the version of cross-matching to use and must return a Peewee Select or ModelSelect object. We must also define the name, category, program, and mapper for the carton. A full implementation for the Galactic Genesis carton would look like

from peewee import JOIN
from sdssdb.peewee.sdss5db.catalogdb import (Catalog, CatalogToTIC_v8,
                                             TIC_v8, TwoMassPSC)

from . import BaseCarton

class GalacticGenesisCarton(BaseCarton):

    name = 'galactic_genesis'
    category = 'science'
    program = 'Galactic Genesis'
    mapper = 'MWM'

    def build_query(self, version_id):

        gg = (TwoMassPSC
              .select(Catalog.catalogid)
              .join(TIC_v8, JOIN.LEFT_OUTER)
              .join(CatalogToTIC_v8)
              .join(Catalog)
              .where(TwoMassPSC.h_m < 11,
                     # G - H > 3.5, or no Gaia DR2 match
                     (((TIC_v8.gmag - TwoMassPSC.h_m) > 3.5) | (TIC_v8.gaia >> None)),
                     Catalog.version_id == version_id))

        return gg

That’s about it. The file containing this code must be placed in the cartons directory of target_selection from where it will be automatically imported. The query must return the catalogid for the selected objects, along with any other column that we want to use for post-processing.

We haven’t defined the cadence associated with the carton. We can do that by overriding the cadence attribute (which defaults to None) in the carton class, or later in post-processing.
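The subclassing pattern can be sketched with Python's abc module. MiniBaseCarton below is a hypothetical, stripped-down stand-in, not the real BaseCarton (which also handles the configuration, database transactions, and loading); it only shows why the base class cannot be instantiated directly and how a subclass supplies build_query, the class attributes, and an optional carton-wide cadence. The cadence chosen for the toy subclass is an arbitrary example.

```python
import abc
from typing import Optional


class MiniBaseCarton(abc.ABC):
    """Hypothetical stand-in for BaseCarton (illustration only)."""

    name: Optional[str] = None
    category: Optional[str] = None
    program: Optional[str] = None
    mapper: Optional[str] = None
    cadence: Optional[str] = None  # carton-wide cadence; None by default

    def __init__(self, target_selection_plan: str):
        if self.name is None:
            raise ValueError('subclasses must define a carton name')
        self.plan = target_selection_plan

    @abc.abstractmethod
    def build_query(self, version_id):
        """Return the query selecting the carton targets."""


class ToyGalacticGenesis(MiniBaseCarton):
    name = 'galactic_genesis'
    category = 'science'
    program = 'Galactic Genesis'
    mapper = 'MWM'
    cadence = 'mwm_ob_3x1'  # hypothetical carton-wide cadence

    def build_query(self, version_id):
        # A real carton returns a Peewee query; a string stands in here.
        return f'SELECT catalogid ... WHERE version_id = {version_id}'


try:
    MiniBaseCarton('0.1.0')  # abstract: raises TypeError
except TypeError as err:
    print('cannot instantiate:', err)

carton = ToyGalacticGenesis('0.1.0')
print(carton.cadence)  # mwm_ob_3x1
```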

The configuration file

If we try to instantiate the class GalacticGenesisCarton it will raise an error because the carton cannot be found in the configuration file. The file at python/target_selection/config/target_selection.yml stores the general parameters for target selection and the values for specific cartons.

'0.1.0':
    xmatch_plan: 0.1.0
    cartons:
        - galactic_genesis
    schema: sandbox
    parameters:
        galactic_genesis:
            h_max: 11
            h_g: 3.5
    magnitudes:
        g: [catalog_to_sdss_dr13_photoobj_primary, sdss_dr13_photoobj.psfmag_g]
        r: [catalog_to_sdss_dr13_photoobj_primary, sdss_dr13_photoobj.psfmag_r]
        i: [catalog_to_sdss_dr13_photoobj_primary, sdss_dr13_photoobj.psfmag_i]
        h: [catalog_to_tic_v8, tic_v8, twomass_psc.h_m]
        bp: [catalog_to_tic_v8, tic_v8, gaia_dr2_source.phot_bp_mean_mag]
        rp: [catalog_to_tic_v8, tic_v8, gaia_dr2_source.phot_rp_mean_mag]

Here target selection plan 0.1.0 is associated with cross-matching plan 0.1.0 and we define a single carton for it, galactic_genesis. We also specify the parameters for that carton. The parameters section for a given carton is accessible in BaseCarton as self.parameters. With this we can avoid hardcoding values in the query and rewrite it as

gg = (TwoMassPSC
      .select(Catalog.catalogid)
      .join(TIC_v8, JOIN.LEFT_OUTER)
      .join(CatalogToTIC_v8)
      .join(Catalog)
      .where(TwoMassPSC.h_m < self.parameters['h_max'],
             (((TIC_v8.gmag - TwoMassPSC.h_m) > self.parameters['h_g']) | (TIC_v8.gaia >> None)),
             Catalog.version_id == version_id))
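Conceptually, self.parameters is the carton's entry in the parameters section of the plan. The sketch below assumes the YAML file has already been parsed into a dictionary (for example with a YAML library) and uses a hypothetical get_carton_parameters helper; it also mimics the error raised when a carton is missing from the plan, although the exception type used here is an assumption.

```python
# The relevant part of the configuration, as a parsed dictionary.
CONFIG = {
    '0.1.0': {
        'xmatch_plan': '0.1.0',
        'cartons': ['galactic_genesis'],
        'schema': 'sandbox',
        'parameters': {'galactic_genesis': {'h_max': 11, 'h_g': 3.5}},
    },
}


def get_carton_parameters(config, plan, carton_name):
    """Hypothetical helper: return the parameters for a carton in a plan."""
    plan_config = config[plan]
    if carton_name not in plan_config['cartons']:
        raise ValueError(f'carton {carton_name!r} not found in plan {plan!r}')
    # Cartons without a parameters entry get an empty dictionary.
    return plan_config.get('parameters', {}).get(carton_name, {})


params = get_carton_parameters(CONFIG, '0.1.0', 'galactic_genesis')
print(params['h_max'], params['h_g'])  # 11 3.5
```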

The magnitudes section indicates the joins needed to load the targetdb.magnitude table. For each column in the table the mapping indicates the tables that need to be joined, starting at catalog; the last entry also includes the column to grab. For example, for the h magnitude the configuration file indicates that we need to join catalog with twomass_psc via catalog_to_tic_v8 and tic_v8 and then insert the value from the column h_m.
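The join paths can be read mechanically: the path starts implicitly at catalog, every entry but the last is a table to join, and the last entry has the form table.column. The hypothetical helper below only parses a path into the sequence of joined tables and the qualified column; it knows nothing about the actual join conditions, which come from the database relationships.

```python
def parse_magnitude_path(path):
    """Hypothetical helper: split a magnitudes entry into joins and a column.

    The path starts implicitly at catalog; every entry but the last is a
    table to join, and the last one has the form 'table.column'.
    """
    *intermediate, last = path
    table, column = last.split('.')
    joins = ['catalog', *intermediate, table]
    return joins, f'{table}.{column}'


h_joins, h_column = parse_magnitude_path(
    ['catalog_to_tic_v8', 'tic_v8', 'twomass_psc.h_m'])
print(' -> '.join(h_joins))  # catalog -> catalog_to_tic_v8 -> tic_v8 -> twomass_psc
print(h_column)              # twomass_psc.h_m
```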

As with cross-matching, it’s possible to locally override the default database configuration to increase the work memory or optimise queries. The database parameters must be defined for a plan inside the configuration file, for example

'0.1.0':
    xmatch_plan: 0.1.0
    cartons:
        - galactic_genesis
    database_options:
        work_mem: '2GB'
        temp_buffers: '2GB'

The custom parameters are applied within the transactions used to execute run, post_process, and load.
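One way to picture how database_options could be applied is as a series of PostgreSQL SET LOCAL statements issued at the start of each transaction; SET LOCAL settings only last until the transaction ends. The helper below is a hypothetical sketch that builds the SQL strings and is not the target_selection implementation.

```python
def set_local_statements(options):
    """Hypothetical helper: build SET LOCAL statements from database_options."""
    statements = []
    for name, value in options.items():
        # Guard against anything that is not a plain parameter name.
        if not name.replace('_', '').isalnum():
            raise ValueError(f'invalid parameter name {name!r}')
        # Quote string values, escaping embedded single quotes.
        if isinstance(value, str):
            value = "'" + value.replace("'", "''") + "'"
        statements.append(f'SET LOCAL {name} = {value};')
    return statements


for sql in set_local_statements({'work_mem': '2GB', 'temp_buffers': '2GB'}):
    print(sql)
# SET LOCAL work_mem = '2GB';
# SET LOCAL temp_buffers = '2GB';
```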

Another possibility is to override the setup_transaction method completely for the carton implementation. This method prepares the transactions used to run and load the carton. To set random_page_cost=0.1 for a given carton we can do

def setup_transaction(self):

    self.database.execute_sql('SET LOCAL random_page_cost = 0.1;')

Note that if setup_transaction is overridden, the database_options configuration is ignored for that carton.

Custom magnitudes

We have just seen how the magnitudes for a target are obtained from parent tables in catalogdb. Sometimes this is not possible, for example because the object does not have an associated match in 2MASS and we cannot retrieve the H magnitude. To work around this we can have the query return a proxy for a magnitude

gg = (TwoMassPSC
      .select(Catalog.catalogid,
              CatWISE.w1mag.alias('h'))
      .join(TIC_v8, JOIN.LEFT_OUTER)
      .join(CatalogToTIC_v8)
      .join(Catalog)
      .join(CatalogToCatWISE)
      .join(CatWISE)
      .where(TwoMassPSC.h_m < self.parameters['h_max'],
             (((TIC_v8.gmag - TwoMassPSC.h_m) > self.parameters['h_g']) | (TIC_v8.gaia >> None)),
             Catalog.version_id == version_id))

In this query we are returning the CatWISE W1 magnitude aliased as column h (a very bad idea, but useful for the purposes of this example). If the column is present, target_selection will use it directly instead of trying to grab the h magnitude from catalogdb.
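The precedence rule can be summarised as: a magnitude column returned by the query wins over the value looked up through the magnitudes configuration, even if that value is missing. A minimal sketch, with a hypothetical resolve_magnitudes function and plain dictionaries standing in for a result row and for the values found via the configured join paths:

```python
MAGNITUDE_COLUMNS = ('g', 'r', 'i', 'h', 'bp', 'rp')


def resolve_magnitudes(query_row, catalogdb_values):
    """Hypothetical helper: merge magnitudes, preferring query columns.

    query_row: columns returned by build_query for one target.
    catalogdb_values: magnitudes found via the configured join paths.
    """
    resolved = {}
    for band in MAGNITUDE_COLUMNS:
        if band in query_row:
            resolved[band] = query_row[band]  # column present: use it directly
        else:
            resolved[band] = catalogdb_values.get(band)
    return resolved


row = {'catalogid': 101, 'h': 8.7}            # query aliased a proxy as 'h'
from_config = {'h': 9.1, 'bp': 14.2, 'rp': 12.3}
print(resolve_magnitudes(row, from_config))
# {'g': None, 'r': None, 'i': None, 'h': 8.7, 'bp': 14.2, 'rp': 12.3}
```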

Post-processing

Calling run will execute the query and create a temporary table in the sandbox schema called temp_<carton_name> with its output (the catalogid column and any other columns we decided to return). Two extra columns are added if they have not been returned by the query: selected, which is set to true, and cadence, set to null. The first one indicates whether the target must be selected and loaded into targetdb; the second allows setting a cadence specific to that object. Note that setting both the carton cadence attribute and the cadence column is not allowed.

After the query is done the carton class calls post_process. By default that method doesn’t do anything, but it can be overridden to perform additional, non-SQL operations on the output table. A typical case is a selection criterion that is too complicated to express in SQL, or one that requires an external file. We can define build_query to return a superset of the targets and use post_process to mask out the objects that do not meet the criterion by changing their selected value to false. We can also set the cadence column the same way, or add new magnitude columns based on other existing columns. post_process receives a Peewee model of the temporary table generated using reflection and doesn’t return anything: all operations must be done in place on the table.
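The superset-then-mask pattern can be sketched with an in-memory SQLite table standing in for temp_<carton_name>. The real post_process receives a Peewee model of the temporary table rather than a raw connection; the toy table, the exclusion list that would come from an external file, and the cadence rule below are all assumptions for illustration. The criteria are evaluated in Python and written back in place.

```python
import sqlite3

# Toy stand-in for temp_galactic_genesis: query output (catalogid, h) plus
# the default columns added by run (selected = true, cadence = null).
con = sqlite3.connect(':memory:')
con.execute("""CREATE TABLE temp_galactic_genesis (
                   catalogid INTEGER PRIMARY KEY, h REAL,
                   selected BOOLEAN DEFAULT 1, cadence TEXT DEFAULT NULL)""")
con.executemany('INSERT INTO temp_galactic_genesis (catalogid, h) VALUES (?, ?)',
                [(101, 7.5), (102, 9.0), (103, 10.5)])


def post_process(con, excluded_ids):
    """Hypothetical post-processing: Python criteria, applied in place."""
    rows = con.execute('SELECT catalogid, h FROM temp_galactic_genesis').fetchall()
    # Mask out targets listed in an external file (here just a Python set).
    deselect = [(cid,) for cid, _ in rows if cid in excluded_ids]
    # Give bright, still-selected targets their own cadence (made-up rule).
    bright = [(cid,) for cid, h in rows if h < 8 and cid not in excluded_ids]
    con.executemany('UPDATE temp_galactic_genesis SET selected = 0 '
                    'WHERE catalogid = ?', deselect)
    con.executemany("UPDATE temp_galactic_genesis SET cadence = 'mwm_ob_3x1' "
                    'WHERE catalogid = ?', bright)
    # Nothing is returned: the table has been modified in place.


post_process(con, excluded_ids={102})
result = con.execute('SELECT catalogid, selected, cadence '
                     'FROM temp_galactic_genesis ORDER BY catalogid').fetchall()
print(result)  # [(101, 1, 'mwm_ob_3x1'), (102, 0, None), (103, 1, None)]
```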

Restricting the query

For test purposes it’s useful to be able to run the query on a small region on the sky. This can be accomplished by defining the carton class and overriding the query_region attribute or by calling run and passing a query_region argument. In either case query_region must be a tuple in the form (ra_centre, dec_centre, radius), in degrees. Only targets within that radial region will be included in the output.
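A radial region (ra_centre, dec_centre, radius) selects sources whose angular separation from the centre is at most the radius. In the database this is done with the Q3C function q3c_radial_query; the pure-Python sketch below uses the haversine formula only to illustrate the geometry, and the function name in_query_region is hypothetical.

```python
import math


def in_query_region(ra, dec, query_region):
    """Hypothetical helper: True if (ra, dec) lies within query_region.

    query_region is (ra_centre, dec_centre, radius); everything in degrees.
    """
    ra0, dec0, radius = query_region
    ra, dec, ra0, dec0 = map(math.radians, (ra, dec, ra0, dec0))
    # Haversine formula for the great-circle distance on the unit sphere.
    hav = (math.sin((dec - dec0) / 2) ** 2
           + math.cos(dec) * math.cos(dec0) * math.sin((ra - ra0) / 2) ** 2)
    separation = math.degrees(2 * math.asin(min(1.0, math.sqrt(hav))))
    return separation <= radius


region = (315.0, 35.0, 1.5)
print(in_query_region(315.5, 35.5, region))  # True
print(in_query_region(318.0, 35.0, region))  # False
```

The formula handles the RA wrap at 0/360 degrees without special cases, e.g. a source at RA 0.5 is 1 degree away from a centre at RA 359.5 on the equator.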

The most efficient way to implement the radial query is to do it explicitly when writing the query. If we define build_query with the keyword argument query_region in its signature, run will pass the parameter, and the query can then implement the restriction in the most efficient way for that particular carton.

Let’s rewrite our Galactic Genesis example with a radial query option

import peewee

from sdssdb.peewee.sdss5db.catalogdb import (Catalog, CatalogToTIC_v8,
                                             TIC_v8, TwoMassPSC)

from . import BaseCarton

class GalacticGenesisCarton(BaseCarton):

    name = 'galactic_genesis'
    category = 'science'
    program = 'Galactic Genesis'
    mapper = 'MWM'

    def build_query(self, version_id, query_region=None):

        gg = (TwoMassPSC
              .select(Catalog.catalogid)
              .join(TIC_v8, peewee.JOIN.LEFT_OUTER)
              .join(CatalogToTIC_v8)
              .join(Catalog)
              .where(TwoMassPSC.h_m < 11,
                     (((TIC_v8.gmag - TwoMassPSC.h_m) > 3.5) | (TIC_v8.gaia >> None)),
                     Catalog.version_id == version_id))

        if query_region:
            gg = gg.where(peewee.fn.q3c_radial_query(Catalog.ra, Catalog.dec,
                                                     query_region[0],
                                                     query_region[1],
                                                     query_region[2]))

        return gg

If we don’t implement the region condition explicitly, run will add it by converting the main query into a subquery and joining with the catalog table. Depending on the query this may result in very poor performance (the results could be restricted to the radial region only after the query has run on the whole sky). It’s recommended to implement query_region in build_query.
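The decision described above (pass query_region to build_query when its signature accepts it, otherwise wrap the query) can be sketched with inspect.signature. The function and the two toy build_query variants below are hypothetical and work on SQL strings for simplicity; the fallback shows why performance can suffer, since the radial condition is only applied to the output of the full-sky subquery.

```python
import inspect


def query_with_region(build_query, version_id, query_region):
    """Hypothetical sketch of how a region restriction could be applied."""
    if 'query_region' in inspect.signature(build_query).parameters:
        # build_query restricts the region itself, e.g. with q3c_radial_query.
        return build_query(version_id, query_region=query_region)
    # Fallback: run the full query as a subquery, then restrict by position.
    ra, dec, radius = query_region
    return (f'SELECT sub.* FROM ({build_query(version_id)}) AS sub '
            f'JOIN catalog c ON c.catalogid = sub.catalogid '
            f'WHERE q3c_radial_query(c.ra, c.dec, {ra}, {dec}, {radius})')


def build_query_plain(version_id):
    return f'SELECT catalogid FROM catalog WHERE version_id = {version_id}'


def build_query_region(version_id, query_region=None):
    sql = f'SELECT catalogid FROM catalog WHERE version_id = {version_id}'
    if query_region:
        sql += ' AND q3c_radial_query(ra, dec, {}, {}, {})'.format(*query_region)
    return sql


with_region = query_with_region(build_query_region, 13, (315.0, 35.0, 1.5))
fallback = query_with_region(build_query_plain, 13, (315.0, 35.0, 1.5))
print(with_region)
print(fallback)
```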

Writing results to a file

For QA purposes it’s useful to be able to write the result of running the carton query to a file. The method write_table does that. It must be called after run has been invoked, and it writes the temporary table to a gzip’d FITS file with all the columns returned by the query and modified in post-processing (including selected and cadence).

>>> carton.write_table()
<Table masked=True length=5459267>
catalogid selected cadence        ra        ... cc_flg rd_flg gal_contam
  int64     bool   float64     float64      ...  str3   str3    int64
--------- -------- ------- ---------------- ... ------ ------ ----------
565437025     True     nan 314.977316146069 ...    000    222          0
757228297     True     nan 315.101352793936 ...    000    111          0
757228365     True     nan 315.115427475258 ...    000    111          0
757228396     True     nan 315.124831714726 ...    000    222          0
757654361     True     nan   315.3227486863 ...    000    222          0
565379603     True     nan 314.673172098268 ...    000    111          0
      ...      ...     ...              ... ...    ...    ...        ...
476312142     True     nan        41.850288 ...    000    222          0
649921691     True     nan        44.536546 ...    000    222          0
476311240     True     nan 41.5986734774755 ...    000    222          0

After load has run and the carton has been ingested into targetdb, it’s possible to call write_table with mode='targetdb'. This will write a selection of the targetdb columns for the carton (catalogid, astrometric coordinates, cadence, magnitudes).

carton.write_table('carton_loaded-0.2.3.fits.gz', mode='targetdb')

Running target selection

Once the carton is fully implemented we can execute the query, post-process, and load the data into targetdb by doing

from target_selection.cartons import GalacticGenesisCarton
gg = GalacticGenesisCarton('0.1.0')
gg.run()
gg.load()

While we could do this for each carton in the target selection run, it’s easier to use the command line interface by doing

target_selection --user sdss run "0.1.0"

This will select all the cartons for the target selection plan 0.1.0 and run and load them in order.
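Roughly speaking, running a plan from the command line amounts to iterating over the cartons listed for that plan in the configuration file and calling run and load on each, in order. The sketch below uses a hypothetical registry and stub carton classes that only record calls, so the loop can run without a database; it is not the actual code behind target_selection run.

```python
class StubCarton:
    """Hypothetical stand-in that records calls instead of using a database."""

    calls = []

    def __init__(self, plan):
        self.plan = plan

    def run(self):
        StubCarton.calls.append((self.name, 'run'))

    def load(self):
        StubCarton.calls.append((self.name, 'load'))


class GalacticGenesisStub(StubCarton):
    name = 'galactic_genesis'


# Hypothetical registry mapping carton names to classes, and a minimal plan.
REGISTRY = {'galactic_genesis': GalacticGenesisStub}
CONFIG = {'0.1.0': {'cartons': ['galactic_genesis']}}


def run_plan(plan, config=CONFIG, registry=REGISTRY):
    """Run and load every carton in the plan, in configuration order."""
    for carton_name in config[plan]['cartons']:
        carton = registry[carton_name](plan)
        carton.run()
        carton.load()


run_plan('0.1.0')
print(StubCarton.calls)  # [('galactic_genesis', 'run'), ('galactic_genesis', 'load')]
```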