Application development (Hexagon)
Modules
Environment Modules allows you to dynamically modify your user environment by using information provided by "modulefiles". This makes it easy to switch between environments or settings, e.g. between the Intel compiler environment and the PGI compiler environment. If you have problems while compiling, running "module list" can help you check whether missing or wrong environment modules are loaded.
When writing a PBS job script (see Job execution for more information), the required environment has to be set inside the script using the module command, because the user environment is not inherited by the PBS script. The same applies to interactive jobs (i.e. qsub -I).
The "module" command has several subcommands, e.g. "module avail". The examples below show some of the subcommands used with "module".
To load the netcdf module into your environment, type:
    module load netcdf
If you want a specific version of the module, specify it instead:
    module load netcdf/3.6.2
Please avoid using version numbers unless strictly necessary, since older versions of packages may be removed at a later time.
To change from the PGI compiler (default) to the Intel compiler, type:
    module swap PrgEnv-pgi PrgEnv-intel
You should also use swap if you want to load a different version of the same module. For example, this replaces your current pgi version with 8.0.6:
    module swap pgi pgi/8.0.6
A complete list of subcommands can be found in the module man page (man module).
Please note: if the module command does not work inside your job scripts, add the line "export -f module" to your ~/.bashrc file. This should be set automatically for new users and only applies if your shell is bash. For other shells, source the corresponding file in /opt/modules/default/init/ inside your qsub script before you use any "module" command.

Compilers and programming languages
Four different compilers are available on Hexagon: PGI (the default), PathScale, GNU and Intel.
All compilation for the compute nodes must be done using the compiler wrappers. To switch between compilers, use the module command:
    module switch PrgEnv-pgi PrgEnv-gnu
By default the latest available version is loaded. You can switch to another compiler version with e.g.:
    module switch pgi pgi/8.0.6

How to invoke the compiler
Applications for the compute nodes should be compiled with the wrappers: cc for C, CC for C++ and ftn for Fortran. Running "module list" shows one entry like "PrgEnv-###", where ### is pgi, pathscale, gnu or intel; this tells you which compiler the wrappers call.

Compiling programs for compute nodes
When you use the compiler wrappers, they take care of MPI and of all additional settings from the loaded modules automatically.
NOTE: The wrappers also take care of the MPI and OpenMP libraries, so you should not compile with mpicc, mpif90 or similar, nor should you need to add any reference to MPI libraries in CFLAGS or similar variables.
The C program test.c can be compiled with:
    cc -o test.out test.c
where test.out is the chosen name of the executable.

Compiling programs for login nodes
Executables compiled for the login node will not be able to run on the compute nodes, and neither OpenMP nor MPI is supported. The general rule in this case is to call the compiler directly (e.g. pgcc for PGI).
NOTE: You can compile code for the login nodes using the compute node wrappers; just keep in mind that the result will then include MPI and the other libraries that are loaded as modules.

Compiler version(s)
The currently installed Programming Environments for the compilers can be listed with:
    module avail PrgEnv
Frequently used compiler options
Compiling OpenMP programs
To activate OpenMP directives, compile and link with:
Fortran:
C and C++:
Recommended compiler options
Normally, if you use the compiler wrappers, all recommended options are included. In some cases you may need to use "--enable-static" during configure for running on the compute nodes.
Useful optimization flags for quad-core
When using PGI, the "-tp barcelona-64" flag will improve the performance of your code. For PathScale, the flag for optimizing for quad-core is "-march=barcelona". These options are provided automatically by the module xtpe-barcelona.

Recommended environment variable settings
We recommend that you have the module xtpe-barcelona loaded; it automatically adds the recommended optimization flags. Additionally, the "xt-libsci" module contains optimized versions of common scientific/math libraries (e.g. LAPACK, BLAS).

Debugging tools
List of tools and usage summary
Several tools are available on Hexagon for debugging.
ATP
Abnormal Termination Processing (ATP) is a system that monitors Cray XT System user applications; should an application take a system trap, ATP performs analysis on the dying application. With release 1.0, the stack backtraces of all application processes are gathered into a merged stack backtrace tree and written to disk as the file "atpMergedBT.dot". The stack backtrace of the first process to die is sent to stderr, together with the number of the signal that caused the death.
You can load the ATP environment with:
    module load atp
Further information on ATP can be found in the intro_atp man page:
    man intro_atp
Lgdb
This gdb-based debugger and launcher allows users to attach to and debug codes that execute multiple processes or threads.
You can load the lgdb environment with:
    module load xt-lgdb
Usage documentation can be found in the man page:
    man lgdb
The following example shows how to connect to an already running program:
    qstat -f JOBID | grep exec_host
    ssh loginX                    # take the host name from exec_host in the previous command
    ps x | grep aprun             # find the PID of your aprun
    module load xt-lgdb
    # to connect to the first rank
    lgdb --pes=0 --pid=APRUNPID   # APRUNPID is the PID from the ps x command above
    # to connect to a list of ranks (from the first to the 8th)
    lgdb --pes=0-7 --pid=APRUNPID

TotalView
TotalView is a graphical, source-level, multiprocess debugger. The license limits the number of cores that can be debugged; the maximum is 66.
When using this debugger you need to turn on X forwarding when you log in via ssh. This is done by adding the -Y option on newer ssh versions and -X on older ones. The following example uses a newer version of ssh:
    ssh -Y username@hexagon.bccs.uib.no
If you don't know whether you have an old or a new version of ssh, run "man ssh" and look for an explanation of "-X" and/or "-Y".
The program you want to debug has to be compiled with the debug option. Normally this is the "-g" option, but that depends on the compiler. The executable from this compilation is called "filename" in the following examples.
First, load the totalview module to get the correct environment variables set:
    module load xt-totalview
To start debugging, run:
    totalview filename
This starts a graphical user interface. Once inside the debugger, if you cannot see any source code and you keep the source files in a separate directory, add the search path to this directory via the main menu item File->Search path.
Source lines where it is possible to insert a breakpoint are marked with a box in the left column. Click on a box to toggle a breakpoint. Double-clicking a function/subroutine name in a source file should open that source file.
You can go back to the previous view by clicking the left arrow at the top of the window.
The button "Go" runs the program from the beginning until the first breakpoint. "Next" and "Step" take you one line / statement forward. "Out" continues until the end of the current subroutine/function. "Run to" continues until the next breakpoint.
The value of a variable can be inspected by right-clicking on its name and choosing "add to expression list". The variable is then shown in a pop-up window. Scalar variables are shown with their value, arrays with their dimensions and type. To see all values in an array, right-click on the variable in the pop-up window and choose "dive". You can now scroll through the list of values. Another useful option is to visualize the array: after choosing "dive", open the menu item "Tools->Visualize" of the pop-up window. If you did this with a 2D array, use the middle button and drag the mouse to rotate the surface that popped up, Shift+middle button to pan, and Ctrl+middle button to zoom in/out.
Running totalview inside the batch system (compute nodes)
    qsub -I -l mppwidth=[#procs],walltime=[time] -A [account] -j oe -X
    mkdir -p /work/$USER/test_dir
    cp $HOME/test_dir/a.out /work/$USER/test_dir
    cd /work/$USER/test_dir
    module load xt-totalview
    totalview aprun -a -n [#procs] ./a.out
Replace [#procs] with the core count for the job. Note that TotalView is licensed for a limited number of cores.
Note: When TotalView starts, it brings up 'aprun' first. Click GO and then YES.
The TotalView User's Guide contains further details.

Application optimization
Performance optimization: general recommendations
Compilation flags and environment settings
Correct optimization flags are selected automatically if you use the compiler wrappers and the module xtpe-barcelona.
Recommended optimized libraries
The following modules are optimized by Cray and are therefore recommended:
Correct use of file systems
There is no local disk on the compute nodes. The only file system available to them is the shared /work file system, which is a Lustre file system. Note that this file system is not optimized for use as local scratch space. Please avoid access patterns with many small reads/writes; instead, read and write in bigger chunks to create well-formed I/O.

Performance analysis
List of tools and usage summary
CrayPat
CrayPat is the Cray performance analysis tool for evaluating program execution on Cray systems. CrayPat consists of three major components:
Example:
Load the modules:
    module load xt-craypat apprentice2
Rebuild your application with the modules loaded:
    make clean
    make
Instrument the executable for automatic profiling analysis:
    pat_build -O apa a.out
This creates the executable "a.out+pat".
Run "a.out+pat" with aprun in the same way as you would run a.out. This creates the file "a.out+pat+<*>.xf".
Generate the sampling report:
    pat_report a.out+pat+<*>.xf > my_report.txt
This command also creates the file "a.out+pat+<*>.ap2", which can be viewed with Apprentice2, and the file "a.out+pat+<*>.apa" used in the next step.
Re-instrument the executable using the .apa file:
    pat_build -O a.out+pat+<*>.apa
This creates the executable "a.out+apa".
Set the runtime environment variables and run the new executable:
    export PAT_RT_MPI_SYNC=0
    export PAT_RT_HWPC=[2|3|...]
Running this instrumented application creates the file "a.out+apa+<*>.xf".
Generate the hardware counter report:
    pat_report a.out+apa+<*>.xf > my_hwcp_report.txt
This command also creates the file "a.out+apa+<*>.ap2", which can be viewed with Apprentice2, in addition to the ASCII text report "my_hwcp_report.txt".
Visualize the results with Apprentice2:
    app2 a.out+pat+<*>.ap2 &    # sampling results
    app2 a.out+apa+<*>.ap2 &    # hardware counter results
Apprentice2 generates a variety of interactive graphical reports. For more information, see the man page:
    man app2
This summary is based on the slides of Luiz DeRose at the Cray XT4 workshop. More information can be found in the corresponding man pages or at http://docs.cray.com.

IPM
A short version for Hexagon is given below. Loading the module adds all required libraries for linking to the cc wrapper:
    module load ipm
    cc -o a.out main.c
The next time you execute your binary, it will generate an IPM report. To parse the results into HTML:
    module load ipm
    ipm_parse -html IPM_result_file.0
More advanced IPM usage is covered on the NOTUR pages.

Parallel applications
MPI
Hexagon has wrappers that should be used when compiling programs for the compute nodes (see "How to invoke the compiler" above). These wrappers handle MPI automatically by using a module called xt-mpt. MPT is based on MPICH2. If you want to change from the default PGI compiler to GNU, PathScale or Intel, you can do so by swapping the PrgEnv module with the module command. Not all MPI-2 features are supported; for details, see:
    man intro_mpi
OpenMP
On Hexagon you can run OpenMP jobs within a node, i.e. on at most 4 cores. Since Hexagon is intended for jobs with high core counts, the use of pure OpenMP is discouraged; see below for an explanation of hybrid MPI/OpenMP.
To activate OpenMP directives, compile with:
Fortran:
C and C++:
In the batch script, set (replace "threads_per_node" with a value from 1 to 4):
    #PBS -l mppnppn=1,mppwidth=1,mppdepth=threads_per_node
    export OMP_NUM_THREADS=threads_per_node
This number should match the -d option of aprun:
    aprun ... -d threads_per_node ...

Hybrid MPI/OpenMP
You can run a hybrid MPI + OpenMP job where MPI is used between the nodes and OpenMP within the node. No special compiler flags are needed to activate MPI, but to activate the OpenMP directives, compile and link with the following.
Fortran:
C and C++:
In the batch script, set:
    #PBS -l mppnppn=mpi_processes_per_node
    #PBS -l mppdepth=threads_per_node
    #PBS -l mppwidth=total_mpi_processes
    export OMP_NUM_THREADS=threads_per_node
These numbers should correspond to:
    aprun ... -n total_mpi_processes -N mpi_processes_per_node -d threads_per_node ...
Note: the mppnppn and mppdepth values must be chosen such that mppnppn x mppdepth <= 4.

Checkpoint and restart of applications
To use the checkpointing feature, the application must be compiled with BLCR support and Cray MPT version 3.0.1 or later:
    module load blcr
With the module loaded, all necessary options are added automatically by the compiler wrapper. Please recompile your application to include BLCR support. Note that only the MPI and SHMEM programming models are supported. The Cray checkpoint/restart solution uses the BLCR software from Berkeley Lab and inherits its limitations. For more information, refer to the BLCR documentation: http://upc-bugs.lbl.gov/blcr/doc/html/index.html.
The job must be submitted with the "-c enabled" parameter. Please see the section "List of useful job script parameters" in Job execution (Hexagon).

Recommended reading
Cray XT Programming Environment User's Guide - contains everything needed to start working on the Cray XT machine, with examples.