HIGH PERFORMANCE COMPUTING (HPC) & RESEARCH COMPUTING
ITS’ High Performance Computing group maintains various HPC (supercomputer) platforms and supports Mines faculty and students using HPC systems in their research. The goal of this service is to provide high-quality HPC resources to help scientists do their science.
What we do:
In pursuit of its goal, the HPC group:
- Maintains several HPC platforms:
- Mio – Machine managed under the condo paradigm (150+ Tflops)
- AuN (Golden) – Standard 144 node x86 based machine (50 Tflops)
- Wendian – Mines’ newest machine with 87 nodes, 5 with GPUs (350+ Tflops)
- Monitors the platforms for potential issues
- Installs user software on the HPC platforms
- Installs common libraries and community codes
- Maintains documentation
- Offers consulting services to enable more effective research
- Offers workshops covering HPC topics
- Provides help porting and optimizing applications
- Provides recommendations for more effective use of HPC platforms
- Represents the Mines community at HPC related conferences
Where to find things?
This page describes how you can use Mines’ HPC resources, in particular:
- The module system for setting your environment
- Modules for compiling programs
- Compiling Programs
- Using the scheduler (Slurm)
The module system is commonly used on HPC systems to help users set up their environment to run particular programs. Behavior on Linux systems is controlled by setting environment variables, and you can see the settings of all variables by running the command printenv. Arguably, the most important variables are PATH and LD_LIBRARY_PATH. PATH is a list of directories searched to find applications. Likewise, LD_LIBRARY_PATH is a list of directories searched to find the libraries used by applications. If you enter a command and see “command not found”, it is possible the directory containing the application is not in PATH. If an application cannot find a library, the system will display a similar message. The module system is designed to easily set collections of variables: you set a number of variables by loading a module. Mines uses the Lmod module system. The following description is taken from: https://lmod.readthedocs.io/en/latest/015_writing_modules.html
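As a quick illustration of how PATH controls command lookup, here is a hedged sketch: it creates a tiny script in a throwaway directory, shows that the shell cannot find it, then prepends the directory to PATH (the directory and the name `mydemo` are made up for the example).

```shell
#!/bin/sh
# Create a throwaway directory holding a tiny "application".
demo_dir=$(mktemp -d)
printf '#!/bin/sh\necho hello from demo\n' > "$demo_dir/mydemo"
chmod +x "$demo_dir/mydemo"

# The directory is not on PATH yet, so the lookup fails.
if command -v mydemo >/dev/null 2>&1; then
  before=found
else
  before="command not found"
fi
echo "before: $before"     # prints "before: command not found"

# Prepend the directory to PATH -- this is what a module load does for you.
PATH="$demo_dir:$PATH"
export PATH

after=$(mydemo)
echo "after: $after"       # prints "after: hello from demo"
rm -rf "$demo_dir"
```

The same search logic applies to LD_LIBRARY_PATH, except it is consulted by the runtime loader rather than the shell.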
A Reminder of what Lmod is doing
All Lmod is doing is changing the environment. Suppose you want to use the ddt debugger installed on your system which is made available to you via the module. If you try to execute ddt without the module loaded you get:
[joeuser@mio001 ~]$ ddt
bash: command not found: ddt
[joeuser@mio001 ~]$ module load ddt
[joeuser@mio001 ~]$ ddt
After the ddt module is loaded, executing ddt now works. Let’s remind ourselves why this works. If you try checking the environment before loading the ddt modulefile:
[joeuser@mio001 ~]$ env | grep -i ddt
[joeuser@mio001 ~]$ module load ddt
[joeuser@mio001 ~]$ env | grep -i ddt
DDTPATH=/opt/apps/ddt/5.0.1/bin
LD_LIBRARY_PATH=/opt/apps/ddt/5.0.1/lib:...
PATH=/opt/apps/ddt/5.0.1/bin:...
[joeuser@mio001 ~]$ module unload ddt
[joeuser@mio001 ~]$ env | grep -i ddt
The first time we check the environment we find no ddt settings. The second time, PATH and LD_LIBRARY_PATH have been modified, and several ddt-specific environment variables have been set. Note that we have shortened the path-like variables to show the important changes.
Full documentation on the module system can be found at: https://lmod.readthedocs.io/en/latest/
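Under the hood, a module load is just a handful of exports. The hedged sketch below reproduces by hand what the ddt modulefile in the transcript above does; the install prefix is assumed from that transcript, and a real modulefile simply automates these exports (and undoes them on unload).

```shell
#!/bin/sh
# Hand-rolled equivalent of "module load ddt" for the install prefix
# shown in the transcript above (assumed; your site's prefix may differ).
DDT_PREFIX=/opt/apps/ddt/5.0.1

export DDTPATH="$DDT_PREFIX/bin"
export PATH="$DDT_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$DDT_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# The environment now looks like it does after the module load.
env | grep -i ddt
```

Doing this by hand is error-prone, which is exactly why Lmod exists: it records what it changed so an unload can cleanly reverse it.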
Here are some examples:
- module spider
- List all available modules
- module -r spider mpi
- List modules that are related to MPI
- module keyword gromacs
- List modules that are related to the gromacs program
- module load Apps/gromacs/5.1.2
- Load the module for gromacs – this will enable you to run gromacs
- module purge
- Unload all modules
- module list
- List currently loaded modules
1] Not all applications are accessible by all HPC users. Some codes are commercial and require licensing, and hence PI approval. Some require PI approval for other reasons. If you are unable to load a module, or see permission errors when executing a job, and would like to know how you might obtain access, please submit a help request.
2] Some modules have dependencies that need to be loaded manually. For example, the gromacs module requires that modules for the compiler and MPI be loaded first. If there is an unsatisfied dependency you will be notified. Lists of available modules are on the following web pages:
- AuN Modules
- Mio Modules
- Wendian Modules
Modules for Compilers
The primary compilers on AuN, Mio, and Wendian are from the Intel and gnu (gcc) suites. Most parallel applications also need an MPI library. The MPI compiler wrappers themselves require a backend compiler, again normally either Intel or gcc based. The gnu compilers are:
- gcc – C
- g++ – C++
- gfortran – fortran
The Intel compilers are:
- icc – C
- icpc – C++
- ifort – fortran
The default version of the gnu compilers is rather old (4.x). Newer versions of the compilers are available via a module load. For example:
[joeuser@mio001 ~]$ gcc -v 2>&1 | tail -1
gcc version 4.4.7 20120313 (Red Hat 4.4.7-18) (GCC)
[joeuser@mio001 ~]$ module load PrgEnv/devtoolset-6
[joeuser@mio001 ~]$ gcc -v 2>&1 | tail -1
gcc version 6.2.1 20160916 (Red Hat 6.2.1-3) (GCC)
[joeuser@mio001 ~]$ gfortran -v 2>&1 | tail -1
gcc version 6.2.1 20160916 (Red Hat 6.2.1-3) (GCC)
[joeuser@mio001 ~]$ g++ -v 2>&1 | tail -1
gcc version 6.2.1 20160916 (Red Hat 6.2.1-3) (GCC)
The 2018 version of the Intel compilers can be loaded:
[joeuser@mio001 ~]$ module load Compiler/intel/18.0
[joeuser@mio001 ~]$ ifort -V
Intel(R) Fortran Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 18.0.1.163 Build 20171018
Copyright (C) 1985-2017 Intel Corporation.  All rights reserved.
[joeuser@mio001 ~]$ icpc -V
Intel(R) C++ Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 18.0.1.163 Build 20171018
Copyright (C) 1985-2017 Intel Corporation.  All rights reserved.
[joeuser@mio001 ~]$ icc -V
Intel(R) C Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 18.0.1.163 Build 20171018
Copyright (C) 1985-2017 Intel Corporation.  All rights reserved.
When using the Intel compilers, a program that uses some of the newer features of C++ will need to reference libraries associated with a newer version of g++, so you may want to load the newer version of the gnu compilers as well. To build with gnu compilers and OpenMPI:
[joeuser@mio001 ~]$ module purge
[joeuser@mio001 ~]$ module load PrgEnv/devtoolset-6
[joeuser@mio001 ~]$ module load MPI/openmpi/3.0.0/gcc
[joeuser@mio001 ~]$ which mpicc
/sw/compilers/mpi/openmpi/3.0.0/gcc/bin/mpicc
[joeuser@mio001 ~]$ which mpic++
/sw/compilers/mpi/openmpi/3.0.0/gcc/bin/mpic++
[joeuser@mio001 ~]$ which mpif90
/sw/compilers/mpi/openmpi/3.0.0/gcc/bin/mpif90
[joeuser@mio001 ~]$ which mpif77
/sw/compilers/mpi/openmpi/3.0.0/gcc/bin/mpif77
To build with Intel compilers and OpenMPI:
[joeuser@mio001 ~]$ module purge
[joeuser@mio001 ~]$ module load PrgEnv/devtoolset-6
[joeuser@mio001 ~]$ module load Compiler/intel/18.0
[joeuser@mio001 ~]$ module load MPI/openmpi/3.0.0/intel
[joeuser@mio001 ~]$ which mpicc
/sw/compilers/mpi/openmpi/3.0.0/intel/bin/mpicc
[joeuser@mio001 ~]$ which mpic++
/sw/compilers/mpi/openmpi/3.0.0/intel/bin/mpic++
[joeuser@mio001 ~]$ which mpif90
/sw/compilers/mpi/openmpi/3.0.0/intel/bin/mpif90
[joeuser@mio001 ~]$ which mpif77
/sw/compilers/mpi/openmpi/3.0.0/intel/bin/mpif77
To build with gnu compilers and Intel MPI:
[joeuser@mio001 ~]$ module purge
[joeuser@mio001 ~]$ module load PrgEnv/devtoolset-6
[joeuser@mio001 ~]$ module load MPI/impi/2018.1/gcc
[joeuser@mio001 ~]$ which mpicc
/sw/compilers/intel/2018/compilers_and_libraries_2018.1.163/linux/mpi/bin64/mpicc
[joeuser@mio001 ~]$ which mpicxx
/sw/compilers/intel/2018/compilers_and_libraries_2018.1.163/linux/mpi/bin64/mpicxx
[joeuser@mio001 ~]$ which mpif77
/sw/compilers/intel/2018/compilers_and_libraries_2018.1.163/linux/mpi/bin64/mpif77
[joeuser@mio001 ~]$ which mpif90
/sw/compilers/intel/2018/compilers_and_libraries_2018.1.163/linux/mpi/bin64/mpif90
You can verify that the gnu compilers are being used as the backend:
[joeuser@mio001 ~]$ mpicc -v 2>&1 | tail -1
gcc version 6.2.1 20160916 (Red Hat 6.2.1-3) (GCC)
To build with Intel compilers and Intel MPI:
[joeuser@mio001 ~]$ module purge
[joeuser@mio001 ~]$ module load PrgEnv/devtoolset-6
[joeuser@mio001 ~]$ module load Compiler/intel/18.0
[joeuser@mio001 ~]$ module load MPI/impi/2018.1/intel
[joeuser@mio001 ~]$ which mpicc
/sw/compilers/intel/2018/compilers_and_libraries_2018.1.163/linux/mpi/bin64/mpicc
[joeuser@mio001 ~]$ which mpif90
/sw/compilers/intel/2018/compilers_and_libraries_2018.1.163/linux/mpi/bin64/mpif90
[joeuser@mio001 ~]$ which mpif77
/sw/compilers/intel/2018/compilers_and_libraries_2018.1.163/linux/mpi/bin64/mpif77
[joeuser@mio001 ~]$ which mpicxx
/sw/compilers/intel/2018/compilers_and_libraries_2018.1.163/linux/mpi/bin64/mpicxx
[joeuser@mio001 ~]$ which mpiicc
/sw/compilers/intel/2018/compilers_and_libraries_2018.1.163/linux/mpi/bin64/mpiicc
You can verify that the Intel compilers are being used as the backend:
[joeuser@mio001 ~]$ mpicxx -v
mpiicpc for the Intel(R) MPI Library 2018 Update 1 for Linux*
Copyright(C) 2003-2017, Intel Corporation.  All rights reserved.
icpc version 18.0.1 (gcc version 6.2.1 compatibility)
Compiling for various processors with the Intel compilers
Mines’ HPC resources AuN, Mio, and Wendian contain various generations of Intel processors. AuN contains 16-core nodes with Sandybridge processors. Mio contains Nehalem, Westmere, Sandybridge, Ivybridge, Haswell, Broadwell, and Skylake (oldest to newest) processors. Wendian uses Skylake processors. This matters because newer-generation processors have some instructions that will not work on older processors. These newer instructions offer optimizations, in some cases allowing significantly faster execution, particularly when working on arrays or vectors. When a program is built, the Intel compilers detect what generation of processor is being used to compile the code and include the latest instructions for that processor. If the code is then run on an older processor it might fail with an illegal-instruction error; if it is run on a newer processor it will not take advantage of the added functionality. Applications built on Wendian may not run on Mio or AuN because Wendian’s head nodes contain Skylake processors. It is possible to:
- Build an application with the lowest common instruction set so it will run on all processors
- Build an application so that it contains multiple sets of instructions so it will use advanced features on processors when available
- Build an application so that it will only run on specific (or newer) processors
Two options control the instruction set used for a compile: -ax and -march. These options take the same sub-options, specifying which processor to target, but they work differently. The -ax sub-options are additive: you can specify multiple processors and the binary will contain instructions for each. The resulting programs can be larger, but they run well on all of the specified processors. The -march option takes only a single sub-option. This creates programs that run well on the specified generation of processor or newer, but will most likely not run on older processors. The table below shows the various generations of processors on Mines’ platforms and the important extra instructions added in each generation. With the exception of fma, these are advanced versions of vector instructions.
| Sub-option | Adds instructions for processor |
| --- | --- |
| -ax=SSE4.2 | Nehalem or newer |
| -ax=AVX | Sandybridge or newer |
| -ax=SANDYBRIDGE | Sandybridge or newer |
| -ax=IVYBRIDGE | Ivybridge or newer |
| -ax=HASWELL | Haswell or newer |
| -ax=BROADWELL | Broadwell or newer |
| -ax=CORE-AVX-I | Sandybridge or newer |
| -ax=CORE-AVX-2 | Ivybridge or newer |
As noted above, the -ax options are additive. For example, with -ax=SSE2,CORE-AVX512 the code should run on any processor, but it will also use the Skylake instruction set when run on a node that supports it. Even if you specify only -ax=CORE-AVX512, the code will still contain the default instruction set (which runs anywhere) in addition to the Skylake-specific instructions used when available. As you add more options the code grows in size. For example, a common chemistry code was 44 Mbytes with -ax=SKYLAKE-AVX512 and 24 Mbytes without the option. The table below shows the options for -march.
| Sub-option | Runs on processor |
| --- | --- |
| -march=corei7 | nehalem or newer |
| -march=core-avx-i | sandybridge or newer |
| -march=sandybridge | sandybridge or newer |
| -march=ivybridge | ivybridge or newer |
| -march=haswell | haswell or newer |
| -march=core-avx2 | haswell or newer |
| -march=broadwell | broadwell or newer |
| -march=core-avx-2 | ivybridge or newer |
The -march options are not additive: you can specify only one, and you cannot use both -ax and -march. Note: if you build an application on Wendian and don’t specify either -ax or -march, it will default to effectively -march=skylake-avx512 and the application will most likely not run on Mio or AuN. Here are some example compiles with notes on where the resulting applications will run and their sizes.
#Build on Mio to run anywhere
[tkaiser@mio001 hybrid]$ icc phostone.c
[tkaiser@mio001 hybrid]$ ls -lt a.out
-rwxr-x--x 1 tkaiser tkaiser 120095 Sep 7 13:18 a.out

#Build on Mio to run anywhere but include skylake instructions
[tkaiser@mio001 hybrid]$ icc -ax=SKYLAKE-AVX512 phostone.c
[tkaiser@mio001 hybrid]$ ls -lt a.out
-rwxr-x--x 1 tkaiser tkaiser 140969 Sep 7 13:18 a.out

#Build on Mio to run anywhere but include skylake instructions, same as above
[tkaiser@mio001 hybrid]$ icc -ax=SKYLAKE-AVX512,SSE2 phostone.c
[tkaiser@mio001 hybrid]$ ls -lt a.out
-rwxr-x--x 1 tkaiser tkaiser 140969 Sep 7 13:19 a.out

#Build on Mio but can only run on skylake nodes
[tkaiser@mio001 hybrid]$ icc -march=skylake-avx512 phostone.c
[tkaiser@mio001 hybrid]$ ls -lt a.out
-rwxr-x--x 1 tkaiser tkaiser 125392 Sep 7 13:19 a.out

#Trying to run the skylake-only code on Mio returns an error
[tkaiser@mio001 hybrid]$ srun -N 1 --tasks-per-node=8 ./a.out
srun: job 4393558 queued and waiting for resources
srun: job 4393558 has been allocated resources
srun: error: compute030: tasks 0-7: Illegal instruction (core dumped)
Compiling for various processors with the gcc compilers
The gcc compilers support the -march option as described above for the Intel compilers, with sub-options targeting the following processors:
- Core 2 CPU
- Nehalem CPU
- Westmere CPU
- Sandy Bridge CPU
- Ivy Bridge CPU
- Haswell CPU
- Broadwell CPU
- Skylake CPU
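Before choosing a -march or -ax target, it can help to check which instruction sets the node you are on actually supports. The hedged sketch below reads the CPU flags the Linux kernel reports; it is Linux-specific (on other systems, or nodes without these extensions, everything simply reports "no"), and the flag names checked are the common x86 SIMD ones.

```shell
#!/bin/sh
# Report whether common SIMD instruction-set flags are present on this
# node. /proc/cpuinfo is Linux-specific; a missing file just yields "no".
flags=$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | tr ' ' '\n')

summary=""
for isa in sse4_2 avx avx2 avx512f; do
  if printf '%s\n' "$flags" | grep -qx "$isa"; then
    summary="$summary $isa=yes"
  else
    summary="$summary $isa=no"
  fi
done
echo "CPU SIMD support:$summary"
```

If a node reports avx512f=no, a binary built with -march=skylake-avx512 would die there with the illegal-instruction error shown earlier.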
Using the scheduler (Slurm)
Access to compute resources on Mines’ HPC platforms is managed via a scheduler. That is, to run on a compute node a user normally creates a script and submits it to the scheduler using the sbatch command. The scheduler launches the script on compute resources when they become available. The script consists of two parts: instructions for the scheduler and the commands the user wants to run.
Here is a simple example:
#!/bin/bash
#SBATCH --job-name="sample"
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=4
#SBATCH --ntasks=8
#SBATCH --exclusive
#SBATCH --export=ALL
#SBATCH --time=01:00:00

cd $HOME
ls > myfiles
srun hostname
The lines that begin with #SBATCH are instructions to the scheduler. In order:
1. #SBATCH --job-name="sample": You are naming the job.
2. #SBATCH --nodes=2: You are telling the scheduler that you want to run on two nodes.
3. #SBATCH --ntasks-per-node=4: You want to run four tasks per node, for a total of 8 tasks. (The --ntasks-per-node line is redundant in this case.)
4. #SBATCH --exclusive: When the job runs you will have exclusive access to the nodes, i.e. no other jobs can run on them.
5. #SBATCH --export=ALL: All your environment variable settings will be passed to the compute nodes.
6. #SBATCH --time=01:00:00: You will run no longer than 1 hour.
The last three lines are normal commands. In order: You will be put in your home directory. A directory listing will be put in the file myfiles. Finally, the srun command will launch the program hostname in parallel, in this case 8 copies will be started simultaneously. Note that the “ls” command is not run in parallel; only a single instance will be launched.
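The task count requested by the directives above can be sanity-checked with a little shell arithmetic (a sketch only; Slurm derives the same number itself from --nodes and --ntasks-per-node):

```shell
#!/bin/sh
# Total tasks implied by the sample script's directives:
# --nodes=2 and --ntasks-per-node=4.
nodes=2
ntasks_per_node=4
total=$((nodes * ntasks_per_node))
echo "total tasks: $total"    # prints "total tasks: 8", matching --ntasks=8
```

This is why srun in the sample script starts exactly 8 copies of hostname.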
The script is launched using the sbatch command. By default, standard output and standard error are written to a file named slurm-######.out, where ###### is the job number. For example, running this script on AuN produces:
[joeuser@aun001 ~]$ sbatch dohost
Submitted batch job 363541

After some time...

[joeuser@aun001 ~]$ ls -lt | head
total 88120
-rw-rw-r-- 1 joeuser joeuser   64 Sep 24 16:28 slurm-363541.out
-rw-rw-r-- 1 joeuser joeuser 2321 Sep 24 16:28 myfiles
...
[joeuser@aun001 ~]$ cat slurm-363541.out
node001
node002
node001
node001
node001
node002
node002
node002
There are several user-level commands available for managing jobs on the HPC resources. These are discussed briefly on the How do I manage jobs? page under the FAQ menu.
Basic scripting and job submission are discussed under the FAQ menu in How do I do a simple build and run?
If you want to select specific nodes, such as nodes belonging to a particular group or with a particular type of processor see: FAQ How do I select MY nodes?
There is a guide for creating complex scripts, including how to run parameter sweep jobs (array jobs), chained jobs, and scripts that self-document, under the FAQ I want to run complex scripts; any advice?
Full documentation for the slurm scheduler can be found at the Slurm Home Page.
- Nodes on Wendian are available for purchase.
- Node purchases are Mines-subsidized at a per-node cost of $8,500 ($10,427 for high-memory nodes).
- When the number of PI-purchased nodes reaches 75% of the total, new nodes will be added.
- The percent of nodes kept from private (PI) purchase will remain at or above 25% of the total.
- New nodes acquired at policy-dictated intervals will be of the current generation.
- Queue management of this hybrid environment is outlined below, with periodic re-evaluation as HPC community experience evolves:
- There are two types of queue:
- Group: Comprises QoSes of nodes purchased by PIs;
- Full: Comprises all nodes on machine (including purchased nodes).
- Allocations on Wendian are by proposal.
- Allocations are awarded in fixed core-hours; upon expiration or depletion, priority for running jobs will decrease.
- Allocations will not be debited if users run jobs on their set of purchased nodes.
- Allocations will be charged for 36 cores per node for jobs run as ‘exclusive’, regardless of the number of cores used.
- A default amount of memory is set per job. Users are encouraged to request only the amount of memory needed by a job.
- For nonexclusive jobs, allocations will be charged based on the higher of two metrics: the number of cores or the amount of memory used.
- Wendian will have approximately 1 Pbyte (1000 Tbytes) of storage with the majority in scratch. Additional storage is available for research groups to purchase. Files stored in owned storage will not be purged, and are NOT BACKED UP.
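The exclusive-job charging rule above can be sketched as a small calculation: 36 cores per node times nodes times hours, regardless of how many cores the job actually uses. The function name and the 4-node/2-hour example are made up for illustration; only the 36-core figure comes from the policy.

```shell
#!/bin/sh
# Core-hours charged for an exclusive job under the policy above:
# 36 cores per node, regardless of how many cores the job uses.
charge_exclusive() {
  nodes=$1
  hours=$2
  echo $((36 * nodes * hours))
}

# e.g. a hypothetical 4-node exclusive job running for 2 hours:
charge=$(charge_exclusive 4 2)
echo "charged core-hours: $charge"    # prints "charged core-hours: 288"
```

So an exclusive job using only one core per node is charged the same as one using all 36, which is why nonexclusive jobs with accurate core and memory requests can be much cheaper against an allocation.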
- AuN is fully supported by Mines.
- Compute time will not be charged against allocations.
- Neither new allocations nor new users will be added to AuN, unless a specific request is accompanied by a proposal for Wendian.
- Tech Fee allows students who are not supported by a researcher to run on Mio. Use of Mio in this capacity precludes faculty from authorship of papers based on associated research.
- Research group members (students and faculty) may run on Mio if their research group has purchased nodes.
- Running on nodes outside of a research group’s partition (running in compute) exposes said job(s) to risk of pre-emption.
- Nodes on Mio that fail outside of warranty will be retired.
- Neither new nodes nor new research groups will be added to Mio.
- Directories of users who leave Mines will be deleted after 3 months. It is the PI’s responsibility to archive any desired data before that time.
- Directories inactive and not accessed for 1 year are subject to deletion. (Mines retains the right to remove directories should circumstances mandate).