First, follow the instructions in ‘3. Create an HPC cluster’ to SSH to the cluster:
pcluster ssh cfd -i cfd_ireland.pem
To run our jobs we need input data, which can either be your own model or one of the cases from the FDS verification/validation suite here.
So either upload the files to S3 and download them onto the cluster (as shown in the previous section when installing FDS), or download them directly:
mkdir /fsx/fds-test/
cd /fsx/fds-test/
wget https://github.com/firemodels/fds/archive/FDS6.7.4.tar.gz
You can see there are lots of cases to test, but we’ll go for the MPI_Scaling_Tests folder.
tar -xf FDS6.7.4.tar.gz
cd /fsx/fds-test/fds-FDS6.7.4/Validation/MPI_Scaling_Tests/FDS_Input_Files/
We have created an HPC cluster using AWS ParallelCluster that is based upon the Slurm scheduler. You can read more about Slurm here, but in essence it allows us to submit the case to an arbitrary number of cores, and Slurm/AWS ParallelCluster will do the scheduling for you.
The first step is to create a submission script. An example is shown below:
#!/bin/bash
#SBATCH --job-name=fds-96
#SBATCH --ntasks=96
#SBATCH --output=%x_%j.out
#SBATCH --partition=compute
#SBATCH --constraint=c5n.18xlarge

source /fsx/fds-smv/bin/FDS6VARS.sh
source /fsx/fds-smv/bin/SMV6VARS.sh
module load intelmpi

export OMP_NUM_THREADS=1
export I_MPI_PIN_DOMAIN=omp

mpirun -np $SLURM_NTASKS fds strong_scaling_test_096.fds
In this script, we specify the number of tasks (cores) to be 96, i.e. three c5n.18xlarge instances. The partition line refers to the queues we created previously, i.e. compute and mesh. The constraint line is only needed if you have multiple compute options for each queue; since we have only one per queue, it could be removed here.
The rest of the script loads the MPI module and sources the FDS installation. Finally we have the typical FDS run command, where the number of cores is taken from the SBATCH lines at the top of the script.
Use the sbatch command to submit the script.
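Assuming the script above is saved as fds_submit.sh (the filename is an assumption; use whatever name you saved it under), submission looks like this, and Slurm replies with the job ID it assigned:

```shell
# Submit the batch script to Slurm
# (fds_submit.sh is a placeholder filename)
sbatch fds_submit.sh
# Slurm responds with a line such as: Submitted batch job 9
```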
Other useful Slurm commands:
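A few standard Slurm commands that are worth knowing here (these are part of any stock Slurm installation, not specific to this cluster; replace <JOBID> with a real job ID):

```shell
squeue                       # list queued and running jobs
sinfo                        # show partition (queue) and node status
scancel <JOBID>              # cancel a job
scontrol show job <JOBID>    # full details for one job
sacct -j <JOBID>             # accounting info once the job has finished
```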
If you type squeue you should initially see the following, where the ‘CF’ state means the nodes are still being configured and the job is waiting in the queue.
[ec2-user@ip-10-0-0-22 FDS_Input_Files]$ squeue
  JOBID PARTITION   NAME     USER ST  TIME NODES NODELIST(REASON)
      9   compute fds-96 ec2-user CF  0:01     3 compute-dy-c5n18xlarge-[1-3]
In the back-end this is now triggering the creation of EC2 instances (which you can watch in your EC2 console). It should take around 4-5 minutes for these to launch. If you do not see the job move to ‘R’ (running) within 5 minutes, check your EC2 console and make sure you have requested an increase to your Service Quotas as described in previous sections.
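Rather than re-typing squeue while you wait, you can poll it automatically (watch is a standard Linux utility, not part of Slurm):

```shell
# Refresh the queue listing every 10 seconds; press Ctrl-C to stop
watch -n 10 squeue
```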
You should now see:
[ec2-user@ip-10-0-0-22 FDS_Input_Files]$ squeue
  JOBID PARTITION   NAME     USER ST  TIME NODES NODELIST(REASON)
      9   compute fds-96 ec2-user  R  0:01     3 compute-dy-c5n18xlarge-[1-3]
The case is set up to write out the various output files, and the solution should take approximately 2 minutes to run through.
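While the job runs (and afterwards) you can follow the solver output in the file named by the --output line of the script. %x_%j.out expands to <job-name>_<jobid>.out, so for the job name fds-96 and job ID 9 shown in the squeue output above:

```shell
# Stream the Slurm output file as the solver writes to it; Ctrl-C to stop
tail -f fds-96_9.out
```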
You can tar up the results you need (e.g. the Smokeview files) and bring them back to your machine. This is a typical workflow where end-users only bring back the post-processing results rather than the whole set of simulation files.
tar -czvf results.tgz strong_scaling_test_096.smv
We can then copy it to S3:
aws s3 cp results.tgz s3://bucketname/
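On your local machine you can then pull the archive down from S3 and extract it (bucketname is a placeholder for your own bucket, as above):

```shell
# Download the results archive from S3 and unpack it locally
aws s3 cp s3://bucketname/results.tgz .
tar -xzvf results.tgz
```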
When you are finished, make sure you cancel any remaining jobs with scancel, where the JOBID can be seen by first typing ‘squeue’. If the squeue output is empty then nothing is running.
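A minimal clean-up sketch (the job ID 9 matches the squeue output shown earlier; substitute your own):

```shell
squeue        # check for running or queued jobs
scancel 9     # cancel the job if it is still active
```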
This was just a basic FDS case, but hopefully it gives you a feel for how FDS can be run on AWS.