Warning

This command is not yet implemented on MonARCH. Please check the message of the day (MOTD) displayed at login to see which system commands are currently implemented on MonARCH.

Checking job status

There are two methods to check your job status.

Method 1: `show_job`

We provide a show_job script. It groups, filters, and sorts a job's information and adds statistics, producing a clean, user-friendly summary.

$ show_job 3000558
-----------------------------------------------------------------------------------
JobID                       3000558
USERID                      smichnow
USER Name                   Simon Michnowicz (Monash University)
Email
 -----------------------------------------------------------------------------------
Job Name                    testV2feature
Project                     general
Partition                   comp
QoS                         normal
Job State                   PENDING
Why cant Run                Resources
Running Time                00:00:00
Total Time                  00:05:00
Submit Host                 monarch-dtn
Submit Time                 2018-06-19T14:29:36
-----------------------------------------------------------------------------------
Job Resource                Node=1
                          NumCPUs=16
                          CPUsPerTask=1
                          CPUsPerNode=1
                          MemoryPerNode=1000M
                          Constraint=Xeon-E5-2680-v3
 ----------------------------------------------------------------------------------
Job Working Dir:
/home/smichnow/slurm
Job Command File/Script:
/home/smichnow/slurm/testMonV2-testFeature.hc.sh
Job Output File:
/home/smichnow/slurm/hc-3000558
Job Error File:
/home/smichnow/slurm/hc-3000558
-----------------------------------------------------------------------------------

Hint

To check the status of a single job, use show_job [JOBID].

Method 2: Slurm commands

To display all of your running and pending jobs, use squeue -u `whoami`.

Hint

whoami returns your MonARCH username, and is a handy shortcut.

$ squeue -u `whoami`
         JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)

To view the status of a single job, use:

$ scontrol show job [JOBID]
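
The full `scontrol show job` output is long, and often only the state and the reason are of interest. The following is a minimal sketch of a filter for those two fields. It assumes the usual `Key=Value` layout of `scontrol` output, in which the state appears as `JobState=... Reason=...`; the function name `job_state_reason` is hypothetical and not part of Slurm or MonARCH.

```shell
# Hypothetical helper: reads `scontrol show job` output on stdin and
# prints only the JobState and Reason fields on one line.
job_state_reason() {
    # -o prints each match on its own line; tr joins them with spaces,
    # and sed removes the trailing space left by the final newline.
    grep -oE '(JobState|Reason)=[^ ]+' | tr '\n' ' ' | sed 's/ $//'
}

# Usage on a login node (assumes scontrol is on your PATH):
#   scontrol show job 3000558 | job_state_reason
```

For a pending job this should print something like `JobState=PENDING Reason=Resources`, which can then be looked up in the tables that follow.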

squeue Status Codes and Reasons

The squeue command reports the status of each active job using state codes and reason codes. A job state code describes the job's current state in the queue (e.g. pending, completed). A job reason code describes why the job is in that state.

The following tables outline a variety of job state and reason codes you may encounter when using squeue to check on your jobs.

squeue status codes
Status	Code	Explanation
COMPLETED	CD	The job has completed successfully.
COMPLETING	CG	The job is finishing but some processes are still active.
FAILED	F	The job terminated with a non-zero exit code or another failure condition.
PENDING	PD	The job is waiting for resource allocation. It will eventually run.
PREEMPTED	PR	The job was terminated because of preemption by another job.
RUNNING	R	The job is currently allocated to a node and is running.
SUSPENDED	S	A running job has been stopped with its cores released to other jobs.
STOPPED	ST	A running job has been stopped with its cores retained.
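
To get a quick overview of how many of your jobs are in each state, the `ST` column of the squeue output can be tallied. This is a sketch only: it assumes the default squeue column layout shown earlier (state code in the fifth column) and job names without spaces. The function name `count_states` is hypothetical.

```shell
# Hypothetical helper: reads default-format squeue output on stdin and
# prints one "<state code> <count>" line per state, sorted by code.
count_states() {
    # NR > 1 skips the header line; NF >= 5 skips blank or short lines.
    awk 'NR > 1 && NF >= 5 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Usage on a login node (assumes squeue is on your PATH):
#   squeue -u "$(whoami)" | count_states
```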

Job Reason Codes
Reason Code	Explanation
Priority	One or more higher-priority jobs are in the queue ahead of yours. Your job will eventually run.
Dependency	This job is waiting for a dependent job to complete and will run afterwards.
Resources	The job is waiting for resources to become available and will eventually run.
InvalidAccount	The job’s account is invalid. Cancel the job and resubmit it with the correct account.
InvalidQOS	The job’s QoS is invalid. Cancel the job and resubmit it with the correct QoS.
QOSGrpCpuLimit	All CPUs assigned to your job’s specified QoS are in use; job will run eventually.
QOSGrpMaxJobsLimit	The maximum number of jobs for your job’s QoS has been reached; job will run eventually.
QOSGrpNodeLimit	All nodes assigned to your job’s specified QoS are in use; job will run eventually.
PartitionCpuLimit	All CPUs assigned to your job’s specified partition are in use; job will run eventually.
PartitionMaxJobsLimit	The maximum number of jobs for your job’s partition has been reached; job will run eventually.
PartitionNodeLimit	All nodes assigned to your job’s specified partition are in use; job will run eventually.
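
The reason codes above fall into two groups: those where the job only needs to wait, and those where it must be cancelled and resubmitted. The sketch below encodes that split as a shell `case` statement. The function name `reason_action` and its output strings are hypothetical, and the code uses Slurm's spelling `InvalidQOS`. Reasons not listed in the table are reported as unknown rather than guessed.

```shell
# Hypothetical helper: maps a squeue reason code (first argument) to the
# action suggested by the table above.
reason_action() {
    case "$1" in
        InvalidAccount|InvalidQOS)
            # The job can never start as submitted.
            echo "cancel and resubmit" ;;
        Priority|Dependency|Resources|QOSGrp*Limit|Partition*Limit)
            # The job will run once other jobs finish or limits free up.
            echo "wait" ;;
        *)
            # Reason not covered by the table; check the Slurm documentation.
            echo "unknown" ;;
    esac
}

# Usage:
#   reason_action Resources
```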
