Running FDS

First, follow the instructions in ‘3. Create a HPC cluster’ to SSH to the cluster:

 pcluster ssh cfd -i cfd_ireland.pem

Move the simulation files from the S3 bucket to the working directory

To run our jobs we need input data, which can either be your own model or one of the cases from the FDS verification/validation suite on GitHub.

So either upload the files to S3 and download them onto the cluster (as shown in the previous section when installing FDS), or download them directly:

mkdir /fsx/fds-test/
cd /fsx/fds-test/
wget https://github.com/firemodels/fds/archive/FDS6.7.4.tar.gz
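
Alternatively, if you previously uploaded the archive to your S3 bucket, you can pull it straight down with the AWS CLI. A minimal sketch, using the same bucketname placeholder as later in this guide:

aws s3 cp s3://bucketname/FDS6.7.4.tar.gz /fsx/fds-test/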

You can see there are lots of cases to test, but we’ll go for the MPI_Scaling_Tests folder:

tar -xf FDS6.7.4.tar.gz
cd /fsx/fds-test/fds-FDS6.7.4/Validation/MPI_Scaling_Tests/FDS_Input_Files/
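
You can list the strong-scaling input files to see which process counts are available; the naming matches the 96-process case used below:

ls strong_scaling_test_*.fds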

Create a Slurm submit script

We have created an HPC cluster using AWS ParallelCluster that is based on the Slurm scheduler. You can read more about Slurm in its documentation, but in essence it allows us to submit the case to an arbitrary number of cores, and Slurm/AWS ParallelCluster will do the scheduling for you.

The first step is to create a submission script. An example is shown below:

 vi submit.sh
#!/bin/bash
#SBATCH --job-name=fds-96
#SBATCH --ntasks=96
#SBATCH --output=%x_%j.out
#SBATCH --partition=compute
#SBATCH --constraint=c5n.18xlarge

# Source the FDS and Smokeview environment scripts
source /fsx/fds-smv/bin/FDS6VARS.sh
source /fsx/fds-smv/bin/SMV6VARS.sh

# Load the Intel MPI module provided by ParallelCluster
module load intelmpi

# Use one OpenMP thread per MPI rank and pin each rank accordingly
export OMP_NUM_THREADS=1
export I_MPI_PIN_DOMAIN=omp

# Launch FDS across the number of tasks requested above
mpirun -np $SLURM_NTASKS fds strong_scaling_test_096.fds

In this script, we specify the number of tasks (MPI processes) to be 96, spread across three c5n.18xlarge instances. The partition line refers to the queues we created previously, i.e. compute and mesh. The constraint line is only needed if you have multiple compute options for each queue; we could remove it here given we have only one instance type per queue.

The rest of the script loads the MPI module and sources the FDS installation. Finally, the mpirun line launches FDS, with the number of tasks taken from the SBATCH lines at the top of the script.
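
If you want a quick sanity check before submitting, you can source the same environment script in your interactive session and confirm that the fds binary is on your PATH (optional, and assumes the install location used above):

 source /fsx/fds-smv/bin/FDS6VARS.sh
 which fds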

Submitting jobs to the scheduler

Use the sbatch command to submit the script:

sbatch submit.sh
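
Slurm should acknowledge the submission and echo back the job ID, along the lines of (the number will differ on your cluster):

 Submitted batch job 9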

Other useful Slurm commands:

  • squeue – shows the status of all running jobs in the queue.
  • sinfo – shows partition and node information for a system.
  • srun – run an interactive job (see the sketch after this list).
  • scancel JOBID – kill a Slurm job.
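
For example, a minimal sketch of requesting an interactive shell on the compute partition (the options may need adjusting for your cluster setup):

 srun --partition=compute --ntasks=1 --pty bash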

If you type squeue you should initially see the following, where the CF (configuring) state shows the job is queued while its compute nodes are provisioned:

[ec2-user@ip-10-0-0-22 FDS_Input_Files]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
                 9   compute  fds-96 ec2-user CF       0:01      3 compute-dy-c5n18xlarge-[1-3]

In the back-end, ParallelCluster is now requesting the EC2 instances to be created (which you can see via your EC2 console). It should take around 4-5 minutes for these to launch. If you do not see the job move to ‘R’ (running) within 5 minutes, check your EC2 console and make sure you have requested an increase to your Service Quotas as described in previous sections.
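
Rather than re-running squeue by hand, you can poll it every few seconds until the state changes (purely a convenience):

 watch -n 10 squeue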

You should now see:

[ec2-user@ip-10-0-0-22 FDS_Input_Files]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON) 
                 9   compute  fds-96 ec2-user R       0:01      3 compute-dy-c5n18xlarge-[1-3]

The case is set up to write out the various output files, and it should take approximately 2 minutes to run through the solution.
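
You can follow the solver’s progress in the Slurm log file, whose name comes from the --output=%x_%j.out line in the submit script, i.e. the job name followed by the job ID (here job 9, as in the squeue listing above):

 tail -f fds-96_9.out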

You can tar up the results you need (e.g. the Smokeview files) and bring them back to your machine. This is a typical workflow where end users only bring back the post-processing results rather than the full set of simulation files.

 tar -czvf results.tgz strong_scaling_test_096.smv

We can then copy it to S3:

 aws s3 cp results.tgz s3://bucketname/
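
On your local machine you can then pull the archive back down from the same bucket and extract it:

 aws s3 cp s3://bucketname/results.tgz .
 tar -xzvf results.tgz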

When you are finished, make sure you cancel any remaining jobs with the following command, where the JOBID can be found by first typing ‘squeue’. If the queue is empty then nothing is running:

 scancel JOBID

This was just a basic FDS case, but hopefully it gives you an example of how FDS can be run on AWS.