.. _target-selection:

Target selection
================

Target selection is the process of choosing objects from the pool of unique targets in ``catalogdb.catalog`` generated by cross-matching. The targets are selected according to one or more target classes or "cartons" that have a common astronomical motivation. The selected targets are loaded into ``targetdb`` along with their astrometric information and some additional parameters such as magnitudes, observing cadences, etc. This determines the list of all possible objects that *could* be observed by SDSS-V, and is used by ``robostrategy`` to define the fields to be observed.

Definitions
-----------

For consistency, we define the following concepts:

- **Carton:** a set of targets selected algorithmically from parent catalogues in ``catalogdb``. The carton is identified by a name in the form ``<mapper>_<program>_<name>``, for example ``mwm_halo_bb``. Sometimes the program is not included: ``mwm_100pc``.
- **Program:** a higher level grouping of cartons with a common goal. For example the ``Halo`` program includes the ``mwm_halo_bb`` and ``mwm_halo_sm`` cartons.
- **Cadence:** a string that determines the conditions in which certain targets will be observed, including lunation rules, instruments to be used, number of observations, and separation between different observations. For example, ``mwm_ob_3x1``. Cadences can be assigned to individual targets or globally to all the targets in a carton.
- **Priority:** a numerical value indicating the global priority of a given target. Lower numbers indicate higher priority.
- **Mapper:** the mapper leading a carton, either ``MWM`` or ``BHM``.

``targetdb`` schema
-------------------

``targetdb`` stores the results of target selection and is synced with the observatories on a regular basis (at least once after each run of target selection). The schema has the following tables and relationships

.. image:: https://github.com/sdss/sdssdb/raw/main/schema/sdss5db/targetdb/sdss5db.targetdb.png
    :target: https://github.com/sdss/sdssdb/raw/main/schema/sdss5db/targetdb/sdss5db.targetdb.png
    :align: center

The main table, ``target``, contains all the targets selected from ``catalogdb`` along with their astrometric information and provides the link to ``catalogdb.catalog`` via ``catalogid``. Photometric information is stored in the ``magnitude`` table for each target. The ``program`` table contains the target selection "cartons" associated with their leading survey (MWM or BHM) and a ``category`` (science, standards, etc).

Programs are version controlled and, as in the case of cross-matching, we refer to a run of target selection as a "plan" with an associated code "tag". The table ``version`` stores this information; the two boolean columns ``target_selection`` and ``robostrategy`` indicate whether the plan refers to a target selection or robostrategy run. Note that a target selection plan version doesn't have to be equal to its cross-matching one, and neither do their code tags (although, since cross-matching always happens before target selection, the tag version for target selection cannot be lower than the tag version for the associated cross-matching plan). Also note that objects in the ``target`` table are not associated with a plan because catalogids are unique and fully define the associated target and its cross-match associations.

The allocation of targets to programs is done in the ``program_to_target`` table, which also assigns the cadence (frequency and number of observations) for a given target and program. Cadence labels and their parameters are defined externally to ``target_selection``.

When ``robostrategy`` runs, it selects targets based on their cadence and program associations and generates ``designs`` in which each object is assigned an instrument and robotic fibre positioner.
Each design is characterised by a ``field`` centre with an associated cadence, chosen to optimise the cadence requirements of the individual targets in the design.

Defining a carton
-----------------

Cartons are groupings of astronomical objects that are selected together to achieve a common science goal. They are chosen from the pool of unique targets in ``catalogdb.catalog`` by applying a series of filter conditions on columns in the associated parent tables. An example of a carton is Galactic Genesis, defined as

    Selection of all the IR-bright, red stars. Select sources brighter than H<11 AND ((G-H) > 3.5 OR Gaia non-detection), where H are 2MASS magnitudes and G are Gaia DR2 magnitudes.

In this case we need to select all the targets (catalogids) that have an associated 2MASS match with :math:`H<11` and either :math:`(G-H)>3.5` or no Gaia match. This can be achieved with the following SQL query, taking advantage of the fact that the TIC is complete in both 2MASS and Gaia DR2.

.. code-block:: postgresql

    SELECT c.catalogid
    FROM twomass_psc tm
    LEFT OUTER JOIN tic_v8 tic ON tic.twomass_psc = tm.designation
    INNER JOIN catalog_to_tic_v8 ctt ON ctt.target_id = tic.id
    INNER JOIN catalog c USING (catalogid)
    WHERE tm.h_m < 11 AND
        ((tic.gmag - tm.h_m) > 3.5 OR tic.gaia IS NULL) AND
        c.version_id = 13;

Here we assume we are using the cross-match corresponding to the plan with ``catalog.version.id = 13``.
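The same cut can be written as a plain Python predicate, which is handy for checking edge cases such as a missing Gaia magnitude. This is only a minimal sketch and not part of ``target_selection``: the function name and its arguments are hypothetical, and ``None`` is assumed to stand for a missing measurement.

.. code-block:: python

    def passes_galactic_genesis(h_mag, g_mag, h_max=11.0, g_minus_h_min=3.5):
        """Return True if a source passes the Galactic Genesis cut quoted above.

        ``h_mag`` is the 2MASS H magnitude and ``g_mag`` the Gaia G magnitude.
        ``None`` is assumed to mean "no measurement" (hypothetical convention).
        """

        # A 2MASS match with H < h_max is always required.
        if h_mag is None or h_mag >= h_max:
            return False

        # A Gaia non-detection passes without any colour cut.
        if g_mag is None:
            return True

        # Otherwise the source must be red: (G - H) > g_minus_h_min.
        return (g_mag - h_mag) > g_minus_h_min


    red_bright = passes_galactic_genesis(10.0, 14.0)      # G - H = 4.0
    blue_bright = passes_galactic_genesis(10.0, 12.0)     # G - H = 2.0
    no_gaia = passes_galactic_genesis(9.0, None)          # Gaia non-detection
    too_faint = passes_galactic_genesis(11.5, None)       # fails H < 11

Only the first and third sources are selected; the second is too blue and the fourth is too faint in H.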
Using Peewee and sdssdb this can be written as ::

    import peewee

    from sdssdb.peewee.sdss5db.catalogdb import (Catalog, CatalogToTIC_v8,
                                                  TIC_v8, TwoMassPSC)

    gg = (TwoMassPSC
          .select(Catalog.catalogid)
          .join(TIC_v8, peewee.JOIN.LEFT_OUTER)
          .join(CatalogToTIC_v8)
          .join(Catalog)
          .where(TwoMassPSC.h_m < 11,
                 (((TIC_v8.gmag - TwoMassPSC.h_m) > 3.5) |
                  (TIC_v8.gaia >> None)),
                 Catalog.version_id == 13))

``target_selection`` provides all the additional boilerplate to evaluate this query using ``catalogdb``, retrieve the results, and load them into ``targetdb`` along with their associated metadata (magnitudes, cadences, etc).

Writing the query
^^^^^^^^^^^^^^^^^

Cartons are implemented as subclasses of `.BaseCarton`. `.BaseCarton` is an abstract class, which means that it is not intended to be used directly and must be subclassed with some of its methods overridden. The main method that needs overriding is `.build_query`, which receives the version of cross-matching to use and must return a Peewee :class:`peewee:Select` or :class:`peewee:ModelSelect` object. We must also define the ``name``, ``program``, and ``category`` for the carton. A full implementation for the Galactic Genesis carton would look like ::

    import peewee

    from sdssdb.peewee.sdss5db.catalogdb import (Catalog, CatalogToTIC_v8,
                                                  TIC_v8, TwoMassPSC)

    from . import BaseCarton


    class GalacticGenesisCarton(BaseCarton):

        name = 'galactic_genesis'
        category = 'science'
        program = 'Galactic Genesis'
        mapper = 'MWM'

        def build_query(self, version_id):

            # H < 11 and either (G - H) > 3.5 or no Gaia counterpart.
            gg = (TwoMassPSC
                  .select(Catalog.catalogid)
                  .join(TIC_v8, peewee.JOIN.LEFT_OUTER)
                  .join(CatalogToTIC_v8)
                  .join(Catalog)
                  .where(TwoMassPSC.h_m < 11,
                         (((TIC_v8.gmag - TwoMassPSC.h_m) > 3.5) |
                          (TIC_v8.gaia >> None)),
                         Catalog.version_id == version_id))

            return gg

That's about it. The file containing this code must be placed in the ``cartons`` directory of ``target_selection``, from where it will be automatically imported.
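For readers unfamiliar with the abstract-class pattern described above, the following is a rough, database-free sketch of it. It does not reproduce the real `.BaseCarton`: the classes ``SketchCarton`` and ``ToyCarton`` are hypothetical stand-ins, the attribute check is an assumption made for illustration, and the query is returned as a plain string instead of a Peewee object.

.. code-block:: python

    import abc


    class SketchCarton(abc.ABC):
        """Hypothetical stand-in for an abstract carton base class."""

        name = None
        category = None
        program = None
        mapper = None
        cadence = None  # optional; may also be set per target later on

        def __init__(self, plan):
            # Refuse to build a carton that lacks its identifying attributes.
            required = ('name', 'category', 'program')
            missing = [attr for attr in required if getattr(self, attr) is None]
            if missing:
                raise TypeError('missing carton attributes: ' + ', '.join(missing))
            self.plan = plan

        @abc.abstractmethod
        def build_query(self, version_id):
            """Return the selection query; subclasses must override this."""


    class ToyCarton(SketchCarton):
        """Concrete carton that only overrides what the base class requires."""

        name = 'toy_carton'
        category = 'science'
        program = 'Toy'
        mapper = 'MWM'

        def build_query(self, version_id):
            # A string is used here only to keep the sketch self-contained.
            return f'SELECT catalogid FROM catalog WHERE version_id = {int(version_id)}'


    toy = ToyCarton('0.1.0')
    query = toy.build_query(13)

Trying to instantiate ``SketchCarton`` directly raises ``TypeError`` because ``build_query`` is abstract, which mirrors the requirement that a carton must be subclassed before it can be used.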
The query *must* return the ``catalogid`` for the selected objects, along with any other column that we want to use for post-processing. We haven't defined the cadence associated with the carton. We can do that by overriding the ``cadence`` attribute (which defaults to `None`) in the carton class or later in :ref:`post-processing <target-selection-post-processing>`.

The configuration file
^^^^^^^^^^^^^^^^^^^^^^

If we try to instantiate the class ``GalacticGenesisCarton`` it will raise an error because the carton cannot be found in the configuration file. The file at ``python/target_selection/config/target_selection.yml`` stores the general parameters for target selection and the values for specific cartons.

.. code-block:: yaml

    '0.1.0':
        xmatch_plan: 0.1.0
        cartons:
            - galactic_genesis
        schema: sandbox
        parameters:
            galactic_genesis:
                h_max: 11
                h_g: 3.5
        magnitudes:
            g: [catalog_to_sdss_dr13_photoobj_primary, sdss_dr13_photoobj.psfmag_g]
            r: [catalog_to_sdss_dr13_photoobj_primary, sdss_dr13_photoobj.psfmag_r]
            i: [catalog_to_sdss_dr13_photoobj_primary, sdss_dr13_photoobj.psfmag_i]
            h: [catalog_to_tic_v8, tic_v8, twomass_psc.h_m]
            bp: [catalog_to_tic_v8, tic_v8, gaia_dr2_source.phot_bp_mean_mag]
            rp: [catalog_to_tic_v8, tic_v8, gaia_dr2_source.phot_rp_mean_mag]

Here target selection plan ``0.1.0`` is associated with cross-matching plan ``0.1.0`` and we define a single carton for it, ``galactic_genesis``. We also specify the parameters for that carton. The ``parameters`` section for a given carton is accessible in `.BaseCarton` as ``self.parameters``. With this we can avoid hardcoding values in the query and rewrite it as ::

    gg = (TwoMassPSC
          .select(Catalog.catalogid)
          .join(TIC_v8, peewee.JOIN.LEFT_OUTER)
          .join(CatalogToTIC_v8)
          .join(Catalog)
          .where(TwoMassPSC.h_m < self.parameters['h_max'],
                 (((TIC_v8.gmag - TwoMassPSC.h_m) > self.parameters['h_g']) |
                  (TIC_v8.gaia >> None)),
                 Catalog.version_id == version_id))

The ``magnitudes`` section indicates the joins needed to load the ``targetdb.magnitude`` table.
For each column in the table the mapping indicates the tables that need to be joined, starting at ``catalog``; the last entry also includes the column to grab. For example, for the ``h`` magnitude the configuration file indicates that we need to join ``catalog`` with ``twomass_psc`` via ``catalog_to_tic_v8`` and ``tic_v8`` and then insert the value from the column ``h_m``.

As with cross-matching, it's possible to locally override the default database configuration to increase the work memory or optimise queries. The database parameters must be defined for a plan inside the configuration file, for example

.. code-block:: yaml

    '0.1.0':
        xmatch_plan: 0.1.0
        cartons:
            - galactic_genesis
        database_options:
            work_mem: '2GB'
            temp_buffers: '2GB'

The custom parameters are applied within the transactions used to execute `~.BaseCarton.run`, `.post_process`, and `.load`.

Another possibility is to override the `~.BaseCarton.setup_transaction` method completely for the carton implementation. This method prepares the transactions used to run and load the carton. To set ``random_page_cost=0.1`` for a given carton we can do ::

    def setup_transaction(self):
        self.database.execute_sql('SET LOCAL random_page_cost = 0.1;')

Note that if `~.BaseCarton.setup_transaction` is overridden, the ``database_options`` configuration is ignored for that carton.

Custom magnitudes
^^^^^^^^^^^^^^^^^

We have just seen how the magnitudes for a target are obtained from parent tables in ``catalogdb``. Sometimes this is not possible, for example because the object does not have an associated target in 2MASS and we cannot retrieve the H magnitude.
To work around this we can have the query return a proxy for a magnitude ::

    gg = (TwoMassPSC
          .select(Catalog.catalogid, CatWISE.w1mag.alias('h'))
          .join(TIC_v8, peewee.JOIN.LEFT_OUTER)
          .join(CatalogToTIC_v8)
          .join(Catalog)
          .join(CatalogToCatWISE)
          .join(CatWISE)
          .where(TwoMassPSC.h_m < self.parameters['h_max'],
                 (((TIC_v8.gmag - TwoMassPSC.h_m) > self.parameters['h_g']) |
                  (TIC_v8.gaia >> None)),
                 Catalog.version_id == version_id))

In this query we are returning the CatWISE W1 magnitude aliased as column ``h`` (a very bad idea, but useful for the purposes of this example). If the column is present, ``target_selection`` will use it directly instead of trying to grab the ``h`` magnitude from ``catalogdb``.

.. _target-selection-post-processing:

Post-processing
^^^^^^^^^^^^^^^

Calling `~.BaseCarton.run` will execute the query and create a temporary table in the ``sandbox`` schema called ``temp_<name>`` with its output (the ``catalogid`` column and any other columns we decided to return). Two extra columns are added if they have not been returned by the query: ``selected``, which is set to ``true``, and ``cadence``, set to ``null``. The first one indicates whether the target must be selected and loaded into ``targetdb``; the second allows setting a cadence specific to that object. Note that setting both the carton `.cadence` attribute and the ``cadence`` column is not allowed.

After the query is done the carton class calls `~.BaseCarton.post_process`. By default that method doesn't do anything but it can be overridden to perform additional, non-SQL operations on the output table. A typical case is that a selection criterion is too complicated to encapsulate as SQL, or maybe it requires using an external file. We can define `.build_query` to return a superset of the targets and use `.post_process` to mask out the objects that do not meet the criteria by changing their ``selected`` value to ``false``.
We can also set the ``cadence`` column the same way, or add new magnitude columns based on other existing columns. `.post_process` receives a Peewee model of the temporary table generated using reflection and doesn't return anything: all operations must be done in place on the table.

Restricting the query
^^^^^^^^^^^^^^^^^^^^^

For test purposes it's useful to be able to run the query on a small region of the sky. This can be accomplished by defining the carton class and overriding the ``query_region`` attribute or by calling `~.BaseCarton.run` and passing a ``query_region`` argument. In either case ``query_region`` must be a tuple in the form ``(ra_centre, dec_centre, radius)``, in degrees. Only targets within that radial region will be included in the output.

The most efficient way to implement the radial query is to do it explicitly when writing the query. If we define `.build_query` with the keyword argument ``query_region`` in its signature, `~.BaseCarton.run` will pass the parameter, at which point the query can apply it in whatever way is most efficient for that query. Let's rewrite our Galactic Genesis example with a radial query option ::

    import peewee

    from sdssdb.peewee.sdss5db.catalogdb import (Catalog, CatalogToTIC_v8,
                                                  TIC_v8, TwoMassPSC)

    from . import BaseCarton


    class GalacticGenesisCarton(BaseCarton):

        name = 'galactic_genesis'
        category = 'science'
        program = 'Galactic Genesis'
        mapper = 'MWM'

        def build_query(self, version_id, query_region=None):

            gg = (TwoMassPSC
                  .select(Catalog.catalogid)
                  .join(TIC_v8, peewee.JOIN.LEFT_OUTER)
                  .join(CatalogToTIC_v8)
                  .join(Catalog)
                  .where(TwoMassPSC.h_m < 11,
                         (((TIC_v8.gmag - TwoMassPSC.h_m) > 3.5) |
                          (TIC_v8.gaia >> None)),
                         Catalog.version_id == version_id))

            # Restrict to (ra_centre, dec_centre, radius), all in degrees.
            if query_region:
                gg = gg.where(peewee.fn.q3c_radial_query(Catalog.ra,
                                                         Catalog.dec,
                                                         query_region[0],
                                                         query_region[1],
                                                         query_region[2]))

            return gg

If we don't implement the region condition explicitly, `~.BaseCarton.run` will add it by converting the main query into a subquery and joining with the ``catalog`` table. Depending on the query this may result in very poor performance (the results could be restricted to the radial region only after the query has run on the whole sky). It's recommended to implement ``query_region`` in `.build_query`.

Writing results to a file
^^^^^^^^^^^^^^^^^^^^^^^^^

For QA purposes it's useful to be able to write the result of running the carton query to a file. The method `.write_table` allows us to do that. It must be called after `~.BaseCarton.run` has been invoked and writes the temporary table to a gzip'd FITS file with all the columns returned by the query and modified in post-processing (including ``selected`` and ``cadence``). ::

    >>> carton.write_table()
    catalogid selected cadence        ra        ... cc_flg rd_flg gal_contam
      int64     bool   float64     float64      ...  str3   str3    int64
    --------- -------- ------- ---------------- ... ------ ------ ----------
    565437025     True     nan 314.977316146069 ...    000    222          0
    757228297     True     nan 315.101352793936 ...    000    111          0
    757228365     True     nan 315.115427475258 ...    000    111          0
    757228396     True     nan 315.124831714726 ...    000    222          0
    757654361     True     nan   315.3227486863 ...    000    222          0
    565379603     True     nan 314.673172098268 ...    000    111          0
          ...      ...     ...              ... ...    ...    ...        ...
    476312142     True     nan        41.850288 ...    000    222          0
    649921691     True     nan        44.536546 ...    000    222          0
    476311240     True     nan 41.5986734774755 ...    000    222          0

After `.load` has run and the carton has been ingested into ``targetdb``, it's possible to call `.write_table` with ``mode='targetdb'``. This will write a selection of the ``targetdb`` columns for the carton (catalogid, astrometric coordinates, cadence, magnitudes) ::

    carton.write_table('carton_loaded-0.2.3.fits.gz', mode='targetdb')

Running target selection
------------------------

Once the carton is fully implemented we can execute the query, post-process, and load the data into ``targetdb`` by doing ::

    from target_selection.cartons import GalacticGenesisCarton

    gg = GalacticGenesisCarton('0.1.0')
    gg.run()
    gg.load()

While we could do this for each carton in the target selection run, it's easier to use the command line interface by doing

.. code-block:: sh

    target_selection --user sdss run "0.1.0"

This will select all the cartons for the target selection plan ``0.1.0`` and run and load them in order.
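Conceptually, running a plan amounts to looping over the cartons listed in the configuration and calling ``run`` and ``load`` on each. The sketch below illustrates that loop under stated assumptions; it is not the actual command line implementation. The names ``run_plan``, ``DemoCarton``, and ``registry`` are hypothetical, and the stand-in carton does no database work.

.. code-block:: python

    def run_plan(plan, config, registry):
        """Run and load every carton listed for ``plan``, in configuration order."""

        completed = []
        for carton_name in config[plan]['cartons']:
            carton = registry[carton_name](plan)  # instantiate for this plan
            carton.run()
            carton.load()
            completed.append(carton_name)

        return completed


    class DemoCarton:
        """Stand-in with the same run/load call sequence; does no real work."""

        def __init__(self, plan):
            self.plan = plan
            self.calls = []

        def run(self):
            self.calls.append('run')

        def load(self):
            # Loading only makes sense once the query has been run.
            if 'run' not in self.calls:
                raise RuntimeError('run() must be called before load()')
            self.calls.append('load')


    # Assumed, simplified view of the plan section of the configuration file.
    config = {'0.1.0': {'cartons': ['galactic_genesis']}}
    registry = {'galactic_genesis': DemoCarton}

    completed = run_plan('0.1.0', config, registry)

Keeping the list of cartons in the configuration, as the document does, means that adding a carton to a plan does not require touching the loop itself.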