Usage: Data Pipeline ==================== The primary usage of ImagingLSS is to produce large scale structure catalogues. In this section, we document the data pipeline. There are the steps: - Building the cache - Generating the object catalogue - Generating the random sampling of the mask, or Querying depth for given RA DEC positions - Applying veto based on proximity to stars - Estimating completeness for randoms and objects - Assemble and export final text files The first 3 steps are computational/data extensive and we provide mpi4py based scripts; which shall go through the job queue of a computing facility. The rest of the steps are very light weight and the head nodes can easily handle them. We will cover them in the following sections. The examples are based off our working configuration at NERSC. The source file `runtests.sh` in the root directory contains an example of the full pipeline for producing randoms and objects of the QSO target type. Building the Cache ------------------ Before we start using ImagingLSS, it is crucial to build the catalogue cache. .. attention:: On Edison for DR2, this step can be omitted if the following configuration file is used: .. code-block:: bash /project/projectdirs/m779/imaginglss/dr2.conf.py The application to build the cache is :code:`scripts/imglss-mpi-build-cache.py`. The application scans the files in the data release described by a configuration file (provided in :code:`--conf` argument), and converts the columns used by ImagingLSS to a binary format that is much easier to use into the 'cache' directory specified in the configuration file. For the format of the configuration file, refer to `Configuration File`. The inline help of the script describes the usage: .. code-block:: bash usage: imglss-mpi-build-cache.py [-h] [--conf CONF] optional arguments: -h, --help show this help message and exit --conf CONF Path to the imaginglss config file, default is from DECALS_PY_CONFIG Here is an example job script that works on Edison (Note that python-mpi-bcast is used). Submit the job script with :code:`sbatch`. .. code-block:: bash #!/bin/bash #SBATCH -J imglss-mpi-build-cache #SBATCH -n 512 #SBATCH -o imglss-mpi-build-cache.%j #SBATCH -p debug #SBATCH -t 00:30:00 export OMP_NUM_THREADS=1 module load python/2.7-anaconda source /project/projectdirs/m779/python-mpi/nersc/activate.sh # change the following line to where your ImagingLSS is installed mirror ~/source/imaginglss imaginglss scripts # change conf to your imaginglss configuration file srun -n 256 python-mpi /dev/shm/local/scripts/imglss-mpi-build-cache.py --conf /project/projectdirs/m779/imaginglss/dr2.conf.py Generating Object Catalogue --------------------------- :code:`scripts/imglss-mpi-select-objects.py` selects objects of a type, and writes out the objects that we will use in the later stages of the pipeline. The target types are defined at http://desi.lbl.gov/trac/wiki/TargetSelection The output is the RA DEC and magnitudes of objects. The inline help of the script describes the usage: .. code-block:: bash usage: imglss-mpi-select-objects.py [-h] [--use-tractor-depth] [--conf CONF] {MYBGS,ELG,QSOC,LRG,QSO,QSOd,BGS} output positional arguments: {MYBGS,ELG,QSOC,LRG,QSO,QSOd,BGS} output Output file name. A new object catalogue file will be created. optional arguments: -h, --help show this help message and exit --use-tractor-depth Use Tractor's Depth in the catalogue, very fast! --conf CONF Path to the imaginglss config file, default is from DECALS_PY_CONFIG Here is an example job script we use on Edison to generate the LRG catalogue. Submit the job script with :code:`sbatch`. We also encourage typing in the commands one by one from an interactive job session, obtained via :code:`salloc`. Refer to http://www.nersc.gov/users/computational-systems/cori/running-jobs/interactive-jobs/. .. code-block:: bash #!/bin/bash #SBATCH -J imglss-mpi-select-objects #SBATCH -n 512 #SBATCH -o imglss-mpi-select-objects.%j #SBATCH -p debug #SBATCH -t 00:30:00 export OMP_NUM_THREADS=1 module load python/2.7-anaconda source /project/projectdirs/m779/python-mpi/nersc/activate.sh # change the following line to where your imaginglss is installed mirror ~/source/imaginglss imaginglss scripts # use without installing export PYTHONPATH=/dev/shm/local:$PYTHONPATH # change conf to your imaginglss configuration file srun -n 256 python-mpi /dev/shm/local/scripts/imglss-mpi-select-objects.py LRG LRG.hdf5 --conf /project/projectdirs/m779/imaginglss/dr2.conf.py Generating Complete Random Sky Mask ----------------------------------- imglss-mpi-make-random.py generates the randoms for the sky mask. The points will be uniform within the survey footprint. The inline help of the script describes the usage: .. code-block:: bash usage: imglss-mpi-make-random.py [-h] [--conf CONF] Nran output positional arguments: Nran Minimum number of randoms output optional arguments: -h, --help show this help message and exit --conf CONF Path to the imaginglss config file, default is from DECALS_PY_CONFIG Here is an example job script we use on Edison to generate a QSO random catalogue. Submit the job script with :code:`sbatch`. We also encourage typing in the commands one by one from an interactive job session, obtained via :code:`salloc`. Refer to http://www.nersc.gov/users/computational-systems/cori/running-jobs/interactive-jobs/. .. code:: #!/bin/bash #SBATCH -J imglss-mpi-make-random #SBATCH -n 512 #SBATCH -o imglss-mpi-make-random.%j #SBATCH -p debug #SBATCH -t 00:30:00 export OMP_NUM_THREADS=1 module load python/2.7-anaconda source /project/projectdirs/m779/python-mpi/nersc/activate.sh # change the following line to where your imaginglss is installed mirror ~/source/imaginglss imaginglss scripts # use without installing export PYTHONPATH=/dev/shm/local:$PYTHONPATH # change conf to your imaginglss configuration file srun -n 256 python-mpi /dev/shm/local/scripts/imglss-mpi-make-random.py 6000000 QSO-random.hdf5 --conf /project/projectdirs/m779/imaginglss/dr2.conf.py Sometimes the position of a random catalogue is already specified. In this case we provide another script, `imglss-mpi-query-depth.py`, to query the depth / noise level of the deccals survey of these points. The RA and DEC of these points must be stored as two datasets name 'RA' and 'DEC' in a HDF5 file. Here is the help of the script: .. code:: usage: imglss-mpi-query-depth.py [-h] [--conf CONF] query Query Depth from DECALS data for input RA DEC of points. The input must be saved in a HDF5 with two datasets 'RA' and 'DEC'. The output will be written in the same file as INTRINSIC_NOISELEVEL data set. To lookup the columns, use the dictionary in `imaginglss.model.dataproduct.bands`. The output of this script can be directly fed into imglss-query-completeness.py as the query input. positional arguments: query An HDF5 file with RA and DEC dataset, the position of to query the depth. optional arguments: -h, --help show this help message and exit --conf CONF Path to the imaginglss config file, default is from DECALS_PY_CONFIG Apply Star veto mask -------------------- imglss-query-tycho-veto.py applies the bright star veto masks to a target or random catalogue. The veto types are defined in imaginglss/analysis/tycho_veto.py . As you can tell, we currently only support vetoing via a Tycho2 catalogue. The star veto mask is important for correctly building the completeness estimator. The inline help of the script describes the usage: .. code:: usage: imglss-query-tycho-veto.py [-h] [--conf CONF] catalogue Query the TYCHOVETO flags of input data. The position is taken from the NOISES extension of input. The result is written to the TYCHOVETO extension of output. Currently, only veto by proximity to tycho stars are implemented. Each veto in imaginglss.analysis.tycho_veto is calculated as a column in the TYCHOVETO extension. Unfortunately, this script is not sufficiently smart to decide the correct TYCHOVETO for the target type. Therefore, no combined veto flag is generated. positional arguments: catalogue HDF5 catalogue file, can be either random or objects. TYCHO_VETO dataset will be added optional arguments: -h, --help show this help message and exit --conf CONF Path to the imaginglss config file, default is from DECALS_PY_CONFIG Query Completeness ------------------ imglss-query-completeness.py esitmates the fractional completeness for objects / randoms based on their depth. A threshold confidence level is used to generate a 100% complete sample based on an object catalogue. Then this sample is taken to model the fractional completeness. The result is appended as COMPLETENESS column to the catalogue. The inline help of the script describes the usage: .. code:: Usage: imglss-query-completeness.py [-h] [--use-tycho-veto {BOSS_DR9,DECAM_BGS,DECAM_ELG,DECAM_LRG,DECAM_QSO}] [--sigma-z SIGMA_Z] [--sigma-g SIGMA_G] [--sigma-r SIGMA_R] [--conf CONF] {MYBGS,ELG,QSOC,LRG,QSO,QSOd,BGS} objects query positional arguments: {MYBGS,ELG,QSOC,LRG,QSO,QSOd,BGS} objects object catalogue for building the completeness model. query catalogue to query completeness optional arguments: -h, --help show this help message and exit --use-tycho-veto {BOSS_DR9,DECAM_BGS,DECAM_ELG,DECAM_LRG,DECAM_QSO} --sigma-z SIGMA_Z --sigma-g SIGMA_G --sigma-r SIGMA_R --conf CONF Path to the imaginglss config file, default is from DECALS_PY_CONFIG Assemble Final Product ---------------------- imglss-export-text.py assembles a final catalogue for objects or randoms. The final product is a plain text file. fluxes (only for objects) and depths of selected bands can be included in the final product. We need a threshold confidence level (usually identical to the one used in imglss-query-completenesss) to filter out poorly detected objects. Vetoing by proximity to stars is also applied at this final stage. The inline help of the script describes the usage: .. code:: usage: imglss-export-text.py [-h] [--conf CONF] [--use-tycho-veto {BOSS_DR9,DECAM_BGS,DECAM_ELG,DECAM_LRG,DECAM_QSO}] [--bands {Y,W4,r,u,W1,g,i,W3,z,W2} [{Y,W4,r,u,W1,g,i,W3,z,W2} ...]] [--sigma-z SIGMA_Z] [--sigma-g SIGMA_G] [--sigma-r SIGMA_R] catalogue output positional arguments: catalogue internal catalogue of HDF5 type. output text file to store the catalogue. optional arguments: -h, --help show this help message and exit --conf CONF Path to the imaginglss config file, default is from DECALS_PY_CONFIG --use-tycho-veto {BOSS_DR9,DECAM_BGS,DECAM_ELG,DECAM_LRG,DECAM_QSO} --bands {Y,W4,r,u,W1,g,i,W3,z,W2} [{Y,W4,r,u,W1,g,i,W3,z,W2} ...] --sigma-z SIGMA_Z --sigma-g SIGMA_G --sigma-r SIGMA_R