This guide contains the fundamental knowledge you need to migrate successfully from LSF to Slurm. If anything is missing from this guide, please send an email to support@robusthpc.com.
LSF | Slurm | Description |
---|---|---|
bsub < script_file | sbatch script_file | Submit a job from script_file |
bkill 123 | scancel 123 | Cancel job 123 |
bjobs | squeue | List user's pending and running jobs |
bqueues | sinfo or sinfo -s | Cluster status with partition (queue) list. With '-s', a summarised partition list is shown, which is shorter and simpler to interpret. |
LSF | Slurm |
---|---|
bjobs [JOBID] | squeue [-j JOBID] |
bjobs -p | squeue -u USERNAME -t PENDING |
bjobs -r | squeue -u USERNAME -t RUNNING |
LSF | Slurm |
---|---|
Job #1: bsub -J job1 command1 Job #2: bsub -J job2 -w "done(job1)" command2 | Job #1: myjobid=$(sbatch --parsable -J job1 --wrap="command1") Job #2: sbatch -J job2 -d afterany:$myjobid --wrap="command2" |
In Slurm, sbatch --parsable returns the JOBID of the job.
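As a sketch, a dependency chain can be scripted as follows; the script names preproc.sh and analysis.sh are placeholders, not part of this guide:

```bash
#!/bin/bash
# Submit the first job; --parsable makes sbatch print only the job ID.
jobid1=$(sbatch --parsable -J preproc preproc.sh)

# Submit the second job so that it starts only after the first one has
# finished (afterany = regardless of exit status; use afterok to require success).
sbatch -J analysis -d afterany:${jobid1} analysis.sh
```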
LSF | Slurm | Description |
---|---|---|
#BSUB | #SBATCH | Scheduler directive |
-q queue_name | -p queue_name | Queue to 'queue_name' |
-n 64 | -n 64 | Processor count of 64 |
-W [hh:mm] or -W [minutes] | -t [minutes] or -t [days-hh:mm:ss] | Max wall run time |
-o file_name | -o file_name | STDOUT output file |
-e file_name | -e file_name | STDERR output file |
-J job_name | --job-name=job_name | Job name |
-x | --exclusive | Exclusive node usage for this job - i.e. no other jobs on same nodes |
-M 128 | --mem-per-cpu=128M or --mem-per-cpu=1G | Memory requirement |
-R "span[ptile=16]" | --tasks-per-node=16 | Processes per node |
-P proj_code | --account=proj_code | Project account to charge job to |
-J "job_name[array_spec]" | --array=array_spec | Job array declaration |
LSF | Slurm | Description |
---|---|---|
$LSB_JOBID | $SLURM_JOBID | Job ID |
$LSB_SUBCWD | $SLURM_SUBMIT_DIR | Submit directory |
$LSB_JOBID | $SLURM_ARRAY_JOB_ID | Job Array Parent |
$LSB_JOBINDEX | $SLURM_ARRAY_TASK_ID | Job Array Index |
$LSB_SUB_HOST | $SLURM_SUBMIT_HOST | Submission Host |
$LSB_HOSTS or $LSB_MCPU_HOSTS | $SLURM_JOB_NODELIST | Allocated compute nodes |
$LSB_DJOB_NUMPROC | $SLURM_NTASKS (mpirun can automatically pick this up from Slurm, so it does not need to be specified) | Number of processors allocated |
$LSB_QUEUE | $SLURM_JOB_PARTITION | Queue |
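As a sketch, the Slurm variables can be used inside a job script much like their LSF counterparts; the resource values below are only examples:

```bash
#!/bin/bash
#SBATCH -n 4
#SBATCH -t 00:10:00

# Print the most commonly used Slurm environment variables.
echo "Job ID:            $SLURM_JOBID"
echo "Submit directory:  $SLURM_SUBMIT_DIR"
echo "Submit host:       $SLURM_SUBMIT_HOST"
echo "Node list:         $SLURM_JOB_NODELIST"
echo "Number of tasks:   $SLURM_NTASKS"
echo "Partition (queue): $SLURM_JOB_PARTITION"
```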
LSF | Slurm |
---|---|
bsub -Is [LSF options] bash | srun --pty bash |
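If you need more than a single core or extra memory in the interactive session, the usual resource options can be added to srun; the values below are only examples:

```bash
# Interactive shell with one task, 4 cores and 2 GB of memory per core,
# for at most one hour (adjust the values to your needs).
srun --pty -n 1 --cpus-per-task=4 --mem-per-cpu=2G -t 01:00:00 bash
```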
LSF | Slurm |
---|---|
bsub -n 128 -R "span[ptile=128]" | sbatch -n 1 --cpus-per-task=128 |
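A corresponding batch script for a threaded (e.g. OpenMP) program could look like this sketch; my_openmp_program is a placeholder executable:

```bash
#!/bin/bash
#SBATCH -n 1                      # one task...
#SBATCH --cpus-per-task=128       # ...with 128 cores on a single node
#SBATCH -t 04:00:00

# Let the OpenMP runtime use exactly the cores allocated by Slurm.
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./my_openmp_program               # placeholder executable
```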
LSF | Slurm |
---|---|
bsub -n 256 -R "span[ptile=128]" | sbatch -n 256 --ntasks-per-node=128 or sbatch -n 256 --nodes=2 |
Both Slurm options, --ntasks-per-node and --nodes, are supported.
Please note that for larger parallel MPI jobs that use more than a single node (more than 128 cores), you should add the sbatch option -C ib to make sure that they are dispatched to nodes with the InfiniBand high-speed interconnect, as this results in much better performance.
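Putting this together, a 256-core MPI batch script could look like the following sketch; my_mpi_program is a placeholder executable:

```bash
#!/bin/bash
#SBATCH -n 256                    # 256 MPI tasks in total
#SBATCH --ntasks-per-node=128     # 128 tasks per node, i.e. 2 nodes
#SBATCH -C ib                     # request nodes with InfiniBand
#SBATCH -t 12:00:00

# srun picks up the task count and node list from Slurm automatically.
srun ./my_mpi_program             # placeholder executable
```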
LSF | Slurm |
---|---|
bsub -J jobname[1-N]" | sbatch --array=1-N |
bsub -J jobname[1-N%step]" | sbatch --array=1-N:step |
Environment variables defined in each job: $LSB_JOBINDEX, $LSB_JOBINDEX_END | Environment variables defined in each job: $SLURM_ARRAY_TASK_ID, $SLURM_ARRAY_TASK_COUNT |
LSF example:
bsub -J "myarray[1-4]" 'echo "Hello, I am task $LSB_JOBINDEX of $LSB_JOBINDEX_END"'
Slurm example:
sbatch --array=1-4 --wrap='echo "Hello, I am task $SLURM_ARRAY_TASK_ID of $SLURM_ARRAY_TASK_COUNT"'
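A job array is often used to process a list of input files; the following batch script is a sketch, and the input_N.dat naming scheme as well as my_program are assumptions for illustration:

```bash
#!/bin/bash
#SBATCH --array=1-4
#SBATCH -n 1
#SBATCH -t 00:30:00

# Each array task processes one input file, selected by its array index.
echo "Task $SLURM_ARRAY_TASK_ID of $SLURM_ARRAY_TASK_COUNT (parent job $SLURM_ARRAY_JOB_ID)"
./my_program input_${SLURM_ARRAY_TASK_ID}.dat
```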
LSF | Slurm |
---|---|
bsub -R "rusage[ngpus_excl_p=1]" | sbatch --gpus=1 |
For multi-node jobs you need to use the --gpus-per-node option instead.
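For example, a minimal single-GPU batch script could look like this sketch; my_gpu_program is a placeholder executable:

```bash
#!/bin/bash
#SBATCH -n 1
#SBATCH --gpus=1                  # one GPU of any available model
#SBATCH -t 02:00:00

# Slurm typically restricts CUDA_VISIBLE_DEVICES to the allocated GPU(s).
./my_gpu_program                  # placeholder executable
```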
LSF | Slurm |
---|---|
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_model0==NVIDIAGeForceGTX1080]" | sbatch --gpus=2g.20gb:1 |
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_model0==NVIDIAGeForceRTX3090]" | sbatch --gpus=3g.40gb:1 |
LSF | Slurm |
---|---|
bsub -R "rusage[ngpus_excl_p=1]" -R "select[gpu_mtotal0>=20480]" | sbatch --gpus=1 --gres=gpumem:20g |
The default unit for gpumem is bytes. You are therefore advised to specify units, for example 20g or 11000m.
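Combining the two options, a request for one GPU with at least 20 GB of GPU memory could be scripted like this sketch; my_gpu_program is a placeholder executable:

```bash
#!/bin/bash
#SBATCH -n 1
#SBATCH --gpus=1                  # one GPU
#SBATCH --gres=gpumem:20g         # with at least 20 GB of GPU memory
#SBATCH -t 02:00:00

./my_gpu_program                  # placeholder executable
```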