Welcome to the MonARCH documentation!

Important

Rollout of a new operating system by 30 April 2024

A major security uplift of MonARCH is currently underway.

The nine-year-old CentOS operating system on MonARCH is approaching end-of-life (EOL) and will be out of support by the end of Q2 2024. Over the next few weeks, MonARCH will be progressively upgraded to run Rocky Linux, a newer and more secure operating system.

Our focus is on building new software packages for Rocky Linux. Actively used applications in /usr/local are being tested on the new OS and will be reinstalled if they are incompatible. These applications, along with future software requests, will be built for Rocky Linux and installed under /apps.

This upgrade will be conducted in several phases so that you are able to continue running your analyses on MonARCH throughout. We will progressively upgrade existing CentOS compute nodes to Rocky Linux with the aim of meeting our target date of 30 April 2024.

MonARCH will retain the use of:

  • SLURM for job scheduling; and

  • Environment modules for activating applications

Please visit this page for updates.

Upcoming information - stay tuned:

  • MonARCH Rocky Linux login and data transfer nodes;

  • How to submit jobs to Rocky compute nodes; and

  • How to request new software for Rocky.

Important

New A100 GPU Nodes – September 2023

We are pleased to announce the availability of two A100 GPU nodes. To request an A100 GPU, add the following setting to your job script:

#SBATCH --gres=gpu:A100:1
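As a minimal sketch of how the setting above might sit in a complete job script: only the --gres line comes from this page. The job name, task count, CPU count, and time limit are placeholder assumptions that you should adjust for your own work, and the script assumes nvidia-smi is available on the GPU nodes.

```shell
#!/bin/bash
#SBATCH --job-name=a100-test     # hypothetical job name
#SBATCH --gres=gpu:A100:1        # one A100 GPU, as given above
#SBATCH --ntasks=1               # assumed: a single task
#SBATCH --cpus-per-task=4        # assumed: adjust to your workload
#SBATCH --time=00:10:00          # assumed: ten-minute limit for a quick check

# Ask nvidia-smi for the names of the visible GPUs. If the tool is missing
# or fails, fall back to a short message so the script still completes.
gpu_report=$(nvidia-smi --query-gpu=name --format=csv,noheader 2>/dev/null) \
    || gpu_report="nvidia-smi unavailable or no GPU visible"

echo "Visible GPUs: ${gpu_report}"
```

Submitting a script like this with sbatch and reading its output file is one quick way to confirm that the job landed on an A100 node before launching longer runs.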

Important

Hardware Refresh Plan 2021-22 – Update: September 2023

Please be advised of the following update on the MonARCH hardware refresh. Specifically, four nodes will be decommissioned on the 14th of July 2022. We had originally advised a later date for the shutdown. Please see below for the updated schedule:

Compute Nodes           Capability                                  Decommissioned
----------------------  ------------------------------------------  ---------------
hc[00-12]               Intel Xeon-E5-2680-v3                       in 2021
hs[00-17]               Intel Xeon-E5-2667-v3                       in 2021
gf[00-01], ge[00-01]    Intel Xeon-E5-2680-v3 & NVIDIA K80 GPUs     in 2022
gp[00,03-08]            Intel Xeon-E5-2680-v4 & NVIDIA P100 GPUs    September 2023
gp[01,02]               Intel Xeon-E5-2680-v4 & NVIDIA P100 GPUs

Important

Hardware Refresh Plan 2021 – Update: 13 May 2021

Please be advised of the following hardware refresh schedule for 2021. These servers are reaching end-of-life and will be retired this year.

Compute Nodes           Capability                                  To be retired by the
----------------------  ------------------------------------------  -----------------------
hc[00-12]               Intel Xeon-E5-2680-v3                       end of May 2021
hs[00-17]               Intel Xeon-E5-2667-v3                       end of May 2021
gf[00-01], ge[00-01]    Intel Xeon-E5-2680-v3 & NVIDIA K80 GPUs     ** To be confirmed **
gp[00-09]               Intel Xeon-E5-2680-v4 & NVIDIA P100 GPUs    middle of November 2021

While this will result in a reduction of total CPU capacity for 2021, retiring these servers is necessary to make room for new and faster compute nodes, planned for Q3/Q4 2021 and 2022.

We will be enabling the appropriate mechanisms (e.g., SLURM reservations) to ensure that these nodes are free of running jobs before they are retired. Please check your job scripts to ensure they do not request these nodes by name using --nodelist.
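One way to check many job scripts at once is a recursive search for the option. The sketch below is illustrative only: find_nodelist_scripts is a hypothetical helper name, and the directory you pass to it is whatever location holds your own scripts.

```shell
#!/bin/bash
# Hypothetical helper: list every file under a directory that mentions
# the --nodelist option, so each one can be reviewed by hand.
find_nodelist_scripts() {
    local dir="${1:-.}"   # default to the current directory
    # -r: search recursively; -l: print only the names of matching files;
    # -e: mark the next argument as the pattern, since it starts with dashes
    grep -rl -e '--nodelist' "$dir"
}

# Example call (the path is an assumption; use your own script directory):
#   find_nodelist_scripts "$HOME/jobs"
```

The search only matches the long form of the option. sbatch also accepts -w as its short form, so scripts that use -w need a separate check.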

Important

Scheduled Outages

Planned dates for 2022 Maintenance will be:

  • To be announced.

See details at: https://docs.monarch.erc.monash.edu/scheduled-maintenance.html

MonARCH (Monash Advanced Research Computing Hybrid) is the next-generation HPC/HTC Cluster, designed from the ground up to address the emergent and future needs of the Monash HPC community.

A key feature of MonARCH is that it is provisioned through R@CMon, the Research Cloud @ Monash facility. Through the use of advanced cloud technology, MonARCH is able to configure and grow dynamically. As with any HPC cluster, MonARCH presents a single point-of-access to computational researchers to run calculations on its constituent servers.

MonARCH aims to continually develop over time. Currently, it consists of the following servers:

  • mi* nodes are 36-core Xeon-Gold-6150 @ 2.70GHz servers with 158893MB usable memory.

  • hc* nodes are 24-core Xeon-E5-2680-v3 @ 2.50GHz servers with 100550MB usable memory.

  • hs* nodes are 16-core Xeon-E5-2667-v3 @ 3.20GHz servers with 100550MB usable memory.

  • gp* nodes are 28-core Xeon-E5-2680-v4 @ 2.40GHz servers with 241660MB usable memory. Each server has two P100 GPU cards.

  • mk* nodes are 48-core Xeon-Platinum-8260 @ 2.4GHz servers with 342000MB usable memory.

  • ge* baremetal nodes are 24-core Xeon-E5-2680-v3 @ 3.3GHz servers with 257669MB usable memory. Each server has eight K80 GPU processors (four boards with two K80 chips each).

  • gf* nodes are 24-core Xeon-E5-2680-v3 @ 2.5GHz servers with 235980MB usable memory. Each server has four K80 GPU processors (two boards with two K80 chips each).

  • hm00 is a single 36-core Xeon-Gold-6150 @ 2.7GHz server with 1.4TB usable memory.

For data storage, we have deployed a parallel file system service using Intel Enterprise Lustre, providing over 300 TB of usable storage with room for future expansion.

The MonARCH service is operated by the Monash HPC team, with continuing technical and operational support from the Monash Cloud team and the eSolutions Servers-and-Storage and Networks teams.

If you have found MonARCH useful for your research, we would be very grateful if you acknowledged us with text along the lines of:

This research was supported in part by the Monash eResearch Centre and eSolutions-Research Support Services through the use of the MonARCH HPC Cluster.

MonARCH Documentation