How do I use a job array?
A job array is a collection of similar, independent jobs that are submitted together to one of the Linux cluster job schedulers using a single job template script. Each job in the array runs independently as soon as it can obtain resources on the compute cluster. Job arrays can improve calculation throughput, especially for small independent calculations that may be able to run "in between" larger calculations.
Job arrays are most easily used if the input and output files for the independent calculations can be numbered sequentially.
The following YouTube video on MSI's channel summarizes much of the information on this page: https://youtu.be/FrTGAMC7w1I
To use a job array, first create a job template script; each element of the array will run a copy of this script. Here, we will work from the example shown below:
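When file names cannot be numbered, one common workaround is to list them in a text file and let each array element pick one line by its index. The following is only a sketch under assumptions: pick_input and filelist.txt are hypothetical names, and it relies on the scheduler setting the environment variable SLURM_ARRAY_TASK_ID to the element's index inside each job.

```shell
#!/bin/bash
# pick_input (hypothetical helper): print line number TASK_ID of a list file.
pick_input() {
    local task_id="$1"
    local list_file="$2"
    sed -n "${task_id}p" "$list_file"
}

# Inside a job script, the call might look like the lines below.
# filelist.txt is an assumed file with one input file name per line; the
# fallback value 1 only exists so the line can be tried outside of a job.
# input_file=$(pick_input "${SLURM_ARRAY_TASK_ID:-1}" filelist.txt)
# ./program.exe < "$input_file" > "${input_file}.out"
```

With this approach the array range should match the number of lines in the list file.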
#!/bin/bash -l
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=2
#SBATCH --mem=2g
#SBATCH -t 1:00:00
#SBATCH --mail-type=END
#
cd ~/program_directory
./program.exe < input$SLURM_ARRAY_TASK_ID > output$SLURM_ARRAY_TASK_ID
In this example, the job script requests one hour of walltime, two compute cores on a single node, and 2 GB of memory. It runs an executable named program.exe, reading standard input from a file and writing standard output to a file, both named using the array index. If you save this script to a file named jobtemplate.sh, you can submit an array of ten jobs using the command:
sbatch --array=1-10 jobtemplate.sh
Within each job of a job array, the environment variable SLURM_ARRAY_TASK_ID is set to that job's array index. In this example, ten independent jobs will be submitted, with SLURM_ARRAY_TASK_ID values running from 1 to 10. This means that in each job, program.exe will read a numbered input file named input1, input2, and so on. Output will similarly be written to a file named output1, output2, and so on. Each of the ten jobs in this example will have 1 hour of walltime, 2 cores, and 2 GB of memory.
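To check the file names before submitting anything, the substitution can be previewed on a login node. This is a minimal sketch, not part of the scheduler: array_command is a hypothetical helper that only prints the command line from the template above for a given index, without running program.exe.

```shell
#!/bin/bash
# array_command (hypothetical helper): print the command that the array
# element with the given index would run, based on the template script.
array_command() {
    local SLURM_ARRAY_TASK_ID="$1"
    echo "./program.exe < input${SLURM_ARRAY_TASK_ID} > output${SLURM_ARRAY_TASK_ID}"
}

# Preview all ten elements of --array=1-10; nothing is executed.
for id in $(seq 1 10); do
    array_command "$id"
done
```

The first line printed is "./program.exe < input1 > output1" and the last is "./program.exe < input10 > output10".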
The jobs will run independently as they find available resources on the compute cluster. Note that users are limited to 2,000 jobs within the scheduler at any point (including those listed as "Completed"); each element of an array job counts as an individual job.
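If there are more inputs than the job limit allows, one option is to have each array element process several inputs in sequence. The sketch below assumes inputs numbered from 1 to a known total; chunk_range is a hypothetical helper, and the numbers 10 and 5000 are illustrative only.

```shell
#!/bin/bash
# chunk_range (hypothetical helper): print the first and last input number
# handled by one array element when each element processes CHUNK inputs.
chunk_range() {
    local task_id="$1"
    local chunk="$2"
    local total="$3"
    local first=$(( (task_id - 1) * chunk + 1 ))
    local last=$(( task_id * chunk ))
    # The final element may get a shorter range.
    if [ "$last" -gt "$total" ]; then
        last="$total"
    fi
    echo "$first $last"
}

# In a job script (not run here), 5000 inputs in chunks of 10 might be
# handled like this and submitted with: sbatch --array=1-500 jobtemplate.sh
# set -- $(chunk_range "$SLURM_ARRAY_TASK_ID" 10 5000)
# for i in $(seq "$1" "$2"); do
#     ./program.exe < "input$i" > "output$i"
# done
```

With this approach the requested walltime must cover all of the inputs in a chunk, not just one.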