Warning
This command is not yet implemented on MonARCH. Please check the message of the day (MOTD) displayed at login to see which system commands are currently implemented on MonARCH.
Checking job status
There are two methods to check your job status.
Method 1: show_job
We provide a show_job script that groups, filters, and sorts job information and adds statistics, producing clean, readable output.
$ show_job 3000558
-----------------------------------------------------------------------------------
JobID 3000558
USERID smichnow
USER Name Simon Michnowicz (Monash University)
Email
-----------------------------------------------------------------------------------
Job Name testV2feature
Project general
Partition comp
QoS normal
Job State PENDING
Why cant Run Resources
Running Time 00:00:00
Total Time 00:05:00
Submit Host monarch-dtn
Submit Time 2018-06-19T14:29:36
-----------------------------------------------------------------------------------
Job Resource Node=1
NumCPUs=16
CPUsPerTask=1
CPUsPerNode=1
MemoryPerNode=1000M
Constraint=Xeon-E5-2680-v3
----------------------------------------------------------------------------------
Job Working Dir:
/home/smichnow/slurm
Job Command File/Script:
/home/smichnow/slurm/testMonV2-testFeature.hc.sh
Job Output File:
/home/smichnow/slurm/hc-3000558
Job Error File:
/home/smichnow/slurm/hc-3000558
-----------------------------------------------------------------------------------
Hint
To check the status of a single job, use show_job [JOBID].
Method 2: Slurm commands
To display all of your running and pending jobs, use squeue -u `whoami`.
Hint
whoami returns your MonARCH username and is a handy shortcut.
$ squeue -u `whoami`
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
To view the status of a single job, use:
$ scontrol show job [JOBID]
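The scontrol output is long, so when only the state and the reason matter it can be filtered. The following is a minimal sketch, assuming the output contains JobState= and Reason= fields in the usual key=value form; the function name job_state_reason is hypothetical and not part of MonARCH or Slurm.

```shell
# Hypothetical helper: reads `scontrol show job` output on stdin and prints
# "<state> <reason>". Assumes space-separated key=value fields such as
# "JobState=PENDING Reason=Resources".
job_state_reason() {
    # Put each key=value pair on its own line, then pick the two fields.
    tr ' ' '\n' | awk -F= '
        $1 == "JobState" { state = $2 }
        $1 == "Reason"   { reason = $2 }
        END { print state, reason }
    '
}

# Usage (replace [JOBID] with a real job ID):
#   scontrol show job [JOBID] | job_state_reason
```

Reading from stdin keeps the helper independent of how the job record is obtained, so it can also be tried on saved output.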
squeue Status Codes and Reasons
The squeue command reports an active job’s status using state and reason codes. State codes describe a job’s current state in the queue (e.g. pending, completed). Reason codes explain why the job is in that state.
The following tables outline a variety of job state and reason codes you may encounter when using squeue to check on your jobs.
Status | Code | Explanation |
---|---|---|
COMPLETED | CD | The job has completed successfully. |
COMPLETING | CG | The job is finishing but some processes are still active. |
FAILED | F | The job terminated with a non-zero exit code and failed to execute. |
PENDING | PD | The job is waiting for resource allocation. It will eventually run. |
PREEMPTED | PR | The job was terminated because of preemption by another job. |
RUNNING | R | The job is currently allocated to a node and is running. |
SUSPENDED | S | A running job has been stopped with its cores released to other jobs. |
STOPPED | ST | A running job has been stopped with its cores retained. |
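The state table above can be turned into a small lookup for scripts that read the compact codes in the ST column. This is a sketch only: state_name is a hypothetical function name, and it covers just the codes listed in the table.

```shell
# Hypothetical helper: expand a compact squeue state code (the ST column)
# into its full state name. Only the codes from the table are covered.
state_name() {
    case "$1" in
        CD) echo "COMPLETED" ;;
        CG) echo "COMPLETING" ;;
        F)  echo "FAILED" ;;
        PD) echo "PENDING" ;;
        PR) echo "PREEMPTED" ;;
        R)  echo "RUNNING" ;;
        S)  echo "SUSPENDED" ;;
        ST) echo "STOPPED" ;;
        *)  echo "UNKNOWN" ;;   # any code not listed in the table
    esac
}

# Usage: print each of your jobs with its full state name.
# -h suppresses the header; %i is the job ID and %t the compact state.
#   squeue -u "$(whoami)" -h -o "%i %t" | while read -r id code; do
#       echo "$id $(state_name "$code")"
#   done
```

squeue can also print the full state name directly with the %T format field; the lookup is mainly useful when working from output that already uses compact codes.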
Reason Code | Explanation |
---|---|
Priority | One or more higher-priority jobs are in the queue ahead of yours. Your job will eventually run. |
Dependency | This job is waiting for a job it depends on to complete and will run afterwards. |
Resources | The job is waiting for resources to become available and will eventually run. |
InvalidAccount | The job’s account is invalid. Cancel the job and resubmit it with the correct account. |
InvalidQOS | The job’s QoS is invalid. Cancel the job and resubmit it with the correct QoS. |
QOSGrpCpuLimit | All CPUs assigned to your job’s specified QoS are in use; the job will run eventually. |
QOSGrpMaxJobsLimit | The maximum number of jobs for your job’s QoS has been reached; the job will run eventually. |
QOSGrpNodeLimit | All nodes assigned to your job’s specified QoS are in use; the job will run eventually. |
PartitionCpuLimit | All CPUs assigned to your job’s specified partition are in use; the job will run eventually. |
PartitionMaxJobsLimit | The maximum number of jobs for your job’s partition has been reached; the job will run eventually. |
PartitionNodeLimit | All nodes assigned to your job’s specified partition are in use; the job will run eventually. |
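The reason codes above fall into two groups: those where the job will run eventually, and those where it must be cancelled and resubmitted. A minimal sketch of that split follows; reason_advice is a hypothetical name, the classification is taken only from the table, and the QoS reason is spelled InvalidQOS as Slurm reports it.

```shell
# Hypothetical helper: classify a pending reason code from the table.
# Prints "fix-and-resubmit", "wait", or "unknown" (code not in the table).
reason_advice() {
    case "$1" in
        InvalidAccount|InvalidQOS)
            echo "fix-and-resubmit" ;;   # cancel, correct the setting, resubmit
        Priority|Dependency|Resources)
            echo "wait" ;;
        QOSGrpCpuLimit|QOSGrpMaxJobsLimit|QOSGrpNodeLimit)
            echo "wait" ;;
        PartitionCpuLimit|PartitionMaxJobsLimit|PartitionNodeLimit)
            echo "wait" ;;
        *)
            echo "unknown" ;;            # not covered here; check the Slurm docs
    esac
}

# Usage: show the reason and advice for each of your pending jobs.
# -t PD selects pending jobs; %r is the reason field.
#   squeue -u "$(whoami)" -h -t PD -o "%i %r" | while read -r id reason; do
#       echo "$id $reason $(reason_advice "$reason")"
#   done
```

Reasons are listed explicitly rather than matched by pattern, so a code that is not in the table is reported as unknown instead of being assumed harmless.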