Default Values For Selecting Hardware

A number of SLURM mechanisms are available to select different hardware:

  • partitions

  • QOS

  • gres

  • constraint

Not all of these mechanisms need to be specified when submitting; they are listed here only for completeness. Users have a default partition and QOS, so they do not need to specify them when submitting jobs. However, doing so does no harm, and may remind them of their values.

Name       Default Value
partition  comp
QOS        normal
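As a sketch, the defaults listed above could be stated explicitly in a submission script; the directives below simply restate the default partition and QOS, so they change nothing but make the job's settings visible (the time limit is illustrative only).

```shell
#!/bin/bash
# Explicitly request the default partition and QOS.
# This is harmless, since these are already the defaults.
#SBATCH --partition=comp
#SBATCH --qos=normal
# An illustrative one-hour walltime request.
#SBATCH --time=01:00:00
```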

The default values are likely to change over time as we add new hardware and optimize the system. Current values can be found with the commands below. For more information on their output, please use the manual pages.

  • man scontrol

  • man sacctmgr

Command                       Values
scontrol show partitions      Lists all our partitions; currently short, comp and gpu
scontrol show partition comp  Detailed information on the comp partition, including
                              Maximum Wall Time (7 days) and Default Memory per CPU (4096M)
sacctmgr show qos normal \
  format="Name,MaxWall,MaxCPUSPerUser,MaxTresPerUser%20"
                              Name    MaxWall     MaxCPUsPU  MaxTRESPU
                              normal  7-00:00:00  65         cpu=65,gres/gpu=3

Please note that MonARCH uses the QOS to control how much of the cluster a single user can use. For the QOS normal, a user has:

  • a maximum of 65 CPUs (cores)

  • a maximum of 3 GPU cards

  • a maximum wall time of 7 days
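As a sketch, a single job at these per-user limits might be requested as follows; the task layout is illustrative only, and in practice a request just below the CPU cap often packs onto nodes more cleanly than the full 65.

```shell
#!/bin/bash
# Request (close to) the full per-user allowance under QOS normal:
# up to 65 CPUs and a 7-day walltime. 64 tasks is an illustrative
# choice that stays within the 65-CPU cap.
#SBATCH --ntasks=64
#SBATCH --time=7-00:00:00
```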

Partitions Available

MonARCH hardware is split into several partitions.

The default partition for all submitted jobs is:

  • comp for compute nodes

Other partitions include:

  • short for jobs with a walltime < 1 day. These jobs run only on older (MonV1) hardware.

  • gpu for the GPU nodes

Example: to run a job of less than one hour in the short partition, put this in your SLURM submission script.

#SBATCH --partition=short

Selecting a particular CPU Type

The hardware available consists of several sorts of nodes; all nodes have hyper-threading turned off.

  • mi* nodes are 36 core Xeon-Gold-6150 @ 2.70GHz servers with 158893MB usable memory

  • gp* nodes are 28 core Xeon-E5-2680-v4 @ 2.40GHz servers with 241660MB usable memory. Each gp server has two P100 GPU cards.

  • mk* nodes are 48 core Xeon-Platinum-8260 @ 2.4GHz servers with 342000M usable memory.

  • md* nodes are 48 core Xeon-Gold-5220R @ 2.20GHz servers with 735000M usable memory. Each server has two processors with 24 cores each.

  • hm00. This single node is a 36 core Xeon-Gold-6150 @ 2.7GHz server with 1.4TB usable memory.

Sometimes users may want to constrain themselves to a particular CPU type, e.g. for timing reasons. In that case, they need to specify the CPU type with a constraint flag in the SLURM submission script. The CPU type of a particular node can be viewed by running:

scontrol show node <nodename>

and then looking for the Features field.

Examples:

# this directive restricts the job to nodes with Xeon-Gold-6150 processors (the mi* nodes)
#SBATCH --constraint=Xeon-Gold-6150

This feature should only be used if you must have a particular processor. Jobs will schedule faster if you do not use it.

Selecting a particular server

Users can restrict their jobs to a particular server if they wish.

Example: Only run jobs on server ge00

#SBATCH --nodelist=ge00

Selecting a GPU Node

To request one or more GPU cards, you need to specify:

  • the gpu partition

  • the number and type of GPU in a gres statement. Your running program will only be allowed access to the number of cards that you specify.

You should not use the constraint feature described above.

# these directives request one P100 card on a node (a gp* machine)
#SBATCH --partition=gpu
#SBATCH --gres=gpu:P100:1
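Putting the pieces together, a minimal GPU job script might look like the sketch below; the job name and script name are placeholders, and nvidia-smi stands in for your actual GPU program.

```shell
#!/bin/bash
# Minimal GPU job sketch: one task, one P100 card, one hour.
#SBATCH --job-name=gpu-test
#SBATCH --partition=gpu
#SBATCH --gres=gpu:P100:1
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# SLURM exposes only the cards granted by the gres request above;
# nvidia-smi is a placeholder for your real GPU workload.
nvidia-smi
```

Submit the script with sbatch, e.g. `sbatch gpu-test.sh` (the filename is a placeholder).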