Configure ParallelCluster for Graviton2

In this section we’ll modify the configuration file you created in the previous section so that it launches a Graviton2 (Arm) cluster. Whilst not all CFD codes support Arm compilation at the time of writing, OpenFOAM does, and it can offer up to 40% better price/performance.

Modify the AWS ParallelCluster Config File

Your config file should look something like the one below (keep your own VPC and subnet IDs rather than the placeholder values shown here).

[aws]
aws_region_name = eu-west-1

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

[global]
cluster_template = default
update_check = true
sanity_check = true

[cluster default]
key_name = cfd_ireland
scheduler = slurm
master_instance_type = c5n.large
base_os = alinux2
vpc_settings = default
queue_settings = compute,mesh
s3_read_write_resource = *
dcv_settings = default
fsx_settings = fsxshared

[fsx fsxshared]
shared_dir = /fsx
storage_capacity = 1200
deployment_type = PERSISTENT_1
storage_type = SSD
per_unit_storage_throughput = 100
daily_automatic_backup_start_time = 00:00
automatic_backup_retention_days = 30

[dcv default]
enable = master

[vpc default]
vpc_id = vpc-xxxxxxxxxxxxx
master_subnet_id = subnet-xxxxxxxxxxx
compute_subnet_id = subnet-yyyyyyyyyy
use_public_ips = false

[queue compute]
enable_efa = true
placement_group = DYNAMIC
disable_hyperthreading = true
compute_type = ondemand
compute_resource_settings = default

[compute_resource default]
instance_type = c5n.18xlarge
min_count = 0
max_count = 10

[queue mesh]
placement_group = DYNAMIC
disable_hyperthreading = true
compute_type = ondemand
enable_efa = false
compute_resource_settings = defaultmesh

[compute_resource defaultmesh]
instance_type = m5.24xlarge
min_count = 0
max_count = 10

It should be modified to the version shown below. First make a copy of your existing config so that you keep the x86 version:
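cp ~/.parallelcluster/config ~/.parallelcluster/config-arm

Then edit the copy (~/.parallelcluster/config-arm) so that it matches the following: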

[aws]
aws_region_name = eu-west-1

[aliases]
ssh = ssh {CFN_USER}@{MASTER_IP} {ARGS}

[global]
cluster_template = default
update_check = true
sanity_check = false

[cluster default]
key_name = cfd_ireland
scheduler = slurm
master_instance_type = m6g.large
base_os = alinux2
vpc_settings = default
queue_settings = compute,mesh
s3_read_write_resource = *
dcv_settings = default
fsx_settings = fsxshared

[fsx fsxshared]
shared_dir = /fsx
storage_capacity = 1200
deployment_type = PERSISTENT_1
storage_type = SSD
per_unit_storage_throughput = 100
daily_automatic_backup_start_time = 00:00
automatic_backup_retention_days = 30

[dcv default]
enable = master

[vpc default]
vpc_id = vpc-xxxxxxxxxxxxx
master_subnet_id = subnet-xxxxxxxxxxx
compute_subnet_id = subnet-yyyyyyyyyy
use_public_ips = false

[queue compute]
enable_efa = true
placement_group = DYNAMIC
compute_type = ondemand
compute_resource_settings = default

[compute_resource default]
instance_type = c6gn.16xlarge
min_count = 0
max_count = 10

[queue mesh]
placement_group = DYNAMIC
compute_type = ondemand
enable_efa = false
compute_resource_settings = defaultmesh

[compute_resource defaultmesh]
instance_type = r6g.16xlarge
min_count = 0
max_count = 10

As you can see, making the config compatible with Graviton2 instances mostly comes down to changing the instance types (and removing one setting that no longer applies).

{% notice info %} Please note that if you are using ParallelCluster 2.10.1, a current bug requires setting ‘sanity_check = false’ to bypass an issue; with 2.10.2 (the version recommended in this workshop) this is not necessary. {% /notice %}
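If you are unsure which version you have installed, you can check it with the ParallelCluster CLI:

pcluster version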

So let’s go through the changes (a quick way to verify the instance specs follows the list):

  • Removing disable_hyperthreading = true
    • On Graviton2 every vCPU is a physical core, so there is no hyperthreading to disable.
  • master_instance_type = m6g.large for the headnode
    • The m6g.large has 8 GB of RAM and 2 vCPUs, which is comparable to the c5n.large and should be enough to allow SSH access and compilation.
  • instance_type = c6gn.16xlarge for solving
    • The c6gn.16xlarge has 128 GB of RAM, 64 vCPUs and EFA support, so it’s very similar to the previous c5n.18xlarge configuration but on an Arm-based platform.
  • instance_type = r6g.16xlarge for meshing
    • The r6g.16xlarge has 512 GB of RAM and 64 vCPUs, so it’s ideal when you need more memory, for example for a serial domain decomposition of a large mesh.
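If you would like to verify these vCPU and memory figures for yourself, one option is to query them with the AWS CLI. This is just a sketch; it assumes the AWS CLI is installed and configured for your account and region:

aws ec2 describe-instance-types \
  --instance-types m6g.large c6gn.16xlarge r6g.16xlarge \
  --query "InstanceTypes[].[InstanceType, VCpuInfo.DefaultVCpus, MemoryInfo.SizeInMiB]" \
  --output table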

Final checks

Before proceeding, compare your final config file to the modified version shown above and then save it. Your file should have similar values, except where you have your own customizations (e.g. key_name) or account-specific settings (e.g. the vpc section).
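One quick way to check that only the intended settings differ from your original x86 config is to diff the two files (assuming you kept the copy made earlier):

diff ~/.parallelcluster/config ~/.parallelcluster/config-arm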