About MonARCH

MonARCH is a pioneering high performance computing cluster built on Monash’s specialist Research Cloud fabric. It was supplied by Dell, with a Mellanox low-latency network and NVIDIA GPUs.

System configuration

The MonARCH cluster serves the university’s HPC users as its primary community. It is distinct and independent from MASSIVE M3, but closely aligned with it. Specifically, MonARCH features:

  • two dedicated login nodes and a dedicated data transfer node (like on MASSIVE M3);

  • over 60 servers, totalling over 1600 CPU cores;

  • 15 GPU nodes, with a mix of NVIDIA Tesla P100 (http://www.nvidia.com/object/tesla-p100.html) cards and K80 (https://www.nvidia.com/en-gb/data-center/tesla-k80/) cards;

  • a SLURM scheduler with service redundancy, better stability, and new features to improve fair share;

  • a website for MonARCH HPC user documentation; and

  • a convergence to a single HPC software module environment, shared with MASSIVE M3.

Hardware

Name    CPU                            Cores/Server    Usable Memory/Server    Notes
mi*     Xeon Gold 6150 @ 2.70GHz       36              158893 MB
hi*     Xeon Gold 6150 @ 2.70GHz       27              131000 MB               Same hardware as the mi* nodes, but with fewer cores and less memory in the VM
ga*     Xeon Gold 6330 @ 2.00GHz       56              1011964 MB              Each server has two NVIDIA A100 GPU devices
hm00    Xeon Gold 6150 @ 2.70GHz       26              1419500 MB              Specialist high-memory (~1.4 TB) machine; please contact support to get access
md*     Xeon Gold 5220R @ 2.20GHz      48              735000 MB               The most recent MonARCH nodes, which are bare metal
mk*     Xeon Platinum 8260 @ 2.50GHz   48              342000 MB
ms*     Xeon Gold 6338 @ 2.00GHz       64              505700 MB               The most recent MonARCH nodes

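To see how these node types appear as live SLURM nodes, you can query sinfo from a login node; the format string below is just one possibility, and the output will reflect the cluster’s configuration at the time:

  # List each node with its partition, CPU count and memory (in MB)
  sinfo -N -o "%N %P %c %m"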

Login Information

MonARCH has two interactive login nodes and one node dedicated to data transfers. Their hostnames are:

MonARCH Login Node Information

Hostname                         Purpose
monarch.erc.monash.edu           This alias will take you to one of the two login nodes below.
monarch-login1.erc.monash.edu    The first login node of MonARCH.
monarch-login2.erc.monash.edu    The second login node of MonARCH.
monarch-dtn.erc.monash.edu       A dedicated data transfer node, ideal for large file transfers and rsync operations.
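
For example, to connect from a terminal (replace <username> with your own username; the directory names are placeholders):

  # Interactive login; the alias resolves to one of the two login nodes
  ssh <username>@monarch.erc.monash.edu

  # Copy data to the cluster through the dedicated data transfer node
  rsync -avz ./mydata/ <username>@monarch-dtn.erc.monash.edu:~/mydata/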

MonARCH vs M3

MonARCH and M3 share the same user identity system. However, users on one cluster cannot log in to the other unless they belong to an active project on that cluster.

Hyperthreading

All nodes on MonARCH V2 have hyperthreading turned off for performance reasons.
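
One way to confirm this from a login node is to inspect a node’s SLURM record (the node name below is a placeholder):

  # ThreadsPerCore=1 indicates that hyperthreading is disabled
  scontrol show node <nodename> | grep -o "ThreadsPerCore=[0-9]*"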

Software Stack

MonARCH V2 uses the M3 software stack (on /usr/local). Software packages are enabled using environment modules (i.e. the module command). This is explained at https://docs.monarch.erc.monash.edu.au/MonARCH/software/software.html.
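
A minimal workflow might look like the following; the module name and version are illustrative only and will differ on the real system:

  # Show the modules available on the cluster
  module avail

  # Load a package (the name/version here is a placeholder)
  module load gcc/10.2.0

  # Check what is currently loaded
  module list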

SLURM Partitions

MonARCH V2’s SLURM scheduler currently uses a simple partition (queue) structure (an example batch script follows this list):

  • comp for CPU-only jobs of up to seven days long;

  • gpu for GPU jobs of up to seven days long;

  • short for 24-hour jobs;

  • himem for the high memory node only. Please contact support to get access to this partition.
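
As a sketch of how a partition is chosen in practice, the batch script below targets the comp partition; the resource requests, module and program names are placeholders rather than recommended values:

  #!/bin/bash
  #SBATCH --job-name=example-job
  #SBATCH --partition=comp        # CPU-only partition, up to seven days
  #SBATCH --ntasks=1
  #SBATCH --cpus-per-task=4       # placeholder core count
  #SBATCH --mem=16G               # placeholder memory request
  #SBATCH --time=1-00:00:00       # one day of walltime (placeholder)

  module load gcc/10.2.0          # placeholder module
  ./my_program                    # placeholder executable

Submit the script with sbatch job.sh. A GPU job would instead use --partition=gpu and request a device with --gres=gpu:1.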

MonARCH uses SLURM’s QOS (Quality of Service) mechanism to control access to different features of the cluster. All users belong to a default QOS called normal. Users may at times be directed to use a different QOS (e.g. to use a Partner Share).

How to examine the QOS:

sacctmgr  show qos normal format="Name,MaxWall,MaxCPUSPerUser,MaxTresPerUser%20"
     Name     MaxWall    MaxCPUsPU    MaxTRESPU
     normal   7-00:00:00        64    cpu=64,gres/gpu=3
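
If you are directed to use a different QOS, it can be requested at submission time; the QOS name below is purely illustrative:

  # Submit an existing job script under a non-default QOS (name is illustrative)
  sbatch --qos=partner_share --partition=comp job.sh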