ARSC HPC Users' Newsletter 313, April 8, 2005
ARSC Faculty Camp 2005
Faculty Camp teaches skills ranging from programming and data visualization to collaborative environments. It is a series of seminars and hands-on experiences presented by ARSC staff, UAF/ARSC Joint Faculty, and current ARSC users. "Camp" commences on August 1st, and runs for three weeks.
All ARSC users are invited to apply, but Faculty Camp is geared toward UA affiliated faculty and researchers. Applicants must submit a description of the skills they want to develop and a project (using ARSC resources) they intend to pursue. The application deadline is May 18th. If you're interested, you'll find more information at:
Also, we encourage you to attend the Faculty Camp Informational Meeting (refreshments will be provided!):
Tues. April 26, 1-2pm, West Ridge Research Building (WRRB) room 103
New Training Opportunities
ARSC welcomes Simone Sbaraglia from IBM's Advanced Computing Technology Center (ACTC) to teach several classes on IBM performance analysis tools. All users are encouraged to attend and/or schedule one-on-one time.
All three classes are scheduled for:

    ====================================
    Time:     9:00 am - Noon
    Location: West Ridge Research Building (WRRB), Room 009
    Topics:
    ====================================
    Monday Apr. 18:    "HPC Toolkit"
    Tuesday Apr. 19:   "Sigma: An Infrastructure for Performance Analysis
                        using Symbolic Specification"
    Wednesday Apr. 20: "Totalview Tutorial"
For complete course descriptions, see:
Simone will also be available to discuss performance analysis and optimization techniques with individual users. For more information on the classes or to schedule a consultation, email Tom Logan or call him at: 907-450-8624.
IBM XLF: Floating Point Exceptions and Traps
The IBM XL Fortran compiler allows floating point exception traps to be enabled using the flag "-qflttrap". When a specified floating point exception is detected, a trap signal (SIGTRAP) is generated. By default this causes the program to dump core; however, the compiler flag "-qsigtrap" lets you change this behavior by selecting a different trap handler.
Several different exception handlers are available including:
* xl__ieee displays the floating point exception error message, the contents of the floating point registers, and a stack trace to stderr then continues execution of the program. This handler supplies the default IEEE result for the exception, so the results will be identical to results without "-qflttrap" enabled.

* xl__trce displays the floating point exception error message, the contents of the floating point registers, and a stack trace to stderr then terminates the program after the first exception without dumping core.

* xl__trcedump acts the same as xl__trce but also produces a core dump.
There are other exception handlers available that require changes to source code to use. The aforementioned trap handlers only require that the code be recompiled with the proper compiler flags.
I'll use the following sample Fortran code to demonstrate these features:
    iceberg2 1% cat float_except.f
    ! ==================================================================
    program test
      implicit none
      real :: val1, val2, val3
      val1 = 0.0
      val2 = 0.0
      val3 = 0.0
      call divide_by_zero(val1)
      print *,"val1=", val1

      call underflow_overflow(val2, 45.00)
      print *,"val2=", val2

      call underflow_overflow(val3, -45.00)
      print *,"val3=", val3

      print *,"program complete"
    end program


    subroutine divide_by_zero(value)
      real value
      integer ii

      print *, "------ divide_by_zero subroutine:---------"

      value = (value * value) / value
    end subroutine


    subroutine underflow_overflow(value, power)
      real value
      real power

      print *, "------ underflow_overflow subroutine: ----"

      value = 10 ** power + 1.125
    end subroutine
    ! ==================================================================
The following compile statement enables the detection of overflow, underflow, divide by zero, and invalid floating point exceptions. It also specifies that the trap handler be set to the xl__ieee exception handler. Thus, any floating point exceptions will be echoed to stderr without ending the program execution. The -g and -qfullpath flags allow the line numbers of each call and the full path of the source code to be shown in the traceback.
    iceberg2 2% xlf90_r float_except.f -g -qfullpath \
                  -qflttrap=en:ov:und:zero:inv       \
                  -qsigtrap=xl__ieee -o fe_ieee
If you have more than one source file, you will need to compile each file with the "-qflttrap" flag to enable the traps in that file. The Fortran libraries do not support flttrap.
Running the sample code yields the following:
    iceberg2 3% ./fe_ieee
     ------ divide_by_zero subroutine:---------

      Signal received: SIGTRAP - Trace trap
        Signal generated for floating-point exception:
          FP invalid operation

      Instruction that generated the exception:
        fdivs fr01,fr01,fr02
        Source Operand values:
          fr01 =   0.00000000000000e+00
          fr02 =   0.00000000000000e+00

      Traceback:
        Offset 0x000000dc in procedure divide_by_zero, near line 27
          in file /gpfsu/u1/uaf/username/float_except.f
        Offset 0x00000064 in procedure test, near line 8
          in file /gpfsu/u1/uaf/username/float_except.f
        --- End of call chain ---
     val1= NaNQ
     ------ underflow_overflow subroutine: ----

      Signal received: SIGTRAP - Trace trap
        Signal generated for floating-point exception:
          FP overflow

      Instruction that generated the exception:
        frsp fr01,fr01

      Traceback:
        Offset 0x000000e0 in procedure underflow_overflow, near line 37
          in file /gpfsu/u1/uaf/username/float_except.f
        Offset 0x00000130 in procedure test, near line 11
          in file /gpfsu/u1/uaf/username/float_except.f
        --- End of call chain ---
     val2= INF
     ------ underflow_overflow subroutine: ----

      Signal received: SIGTRAP - Trace trap
        Signal generated for floating-point exception:
          FP underflow

      Instruction that generated the exception:
        frsp fr01,fr01

      Traceback:
        Offset 0x000000e0 in procedure underflow_overflow, near line 37
          in file /gpfsu/u1/uaf/username/float_except.f
        Offset 0x000001fc in procedure test, near line 14
          in file /gpfsu/u1/uaf/username/float_except.f
        --- End of call chain ---
     val3= 1.125000000
     program complete
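To compare the three handlers on the same test code, the compile line can be repeated once per handler. The script below is only a sketch: the FC and DRY_RUN variables and the fe_<handler> executable names are assumptions made for this example, while the compiler flags are the ones used in this article. By default it just prints the commands, so it can be checked on a machine without xlf; set DRY_RUN=0 on a system with the XL Fortran compiler to actually build.

```shell
#!/bin/sh
# Sketch: one xlf compile line per trap handler.
# FC, DRY_RUN, and the fe_<handler> names are assumptions for this
# example; the flags themselves are the ones shown in the article.
FC=${FC:-xlf90_r}
DRY_RUN=${DRY_RUN:-1}      # 1 = only print the commands, 0 = run them
TRAPS="-qflttrap=en:ov:und:zero:inv"

build_all() {
    for handler in xl__ieee xl__trce xl__trcedump; do
        # Strip the "xl__" prefix to name the executable, e.g. fe_trce.
        exe="fe_${handler#xl__}"
        cmd="$FC float_except.f -g -qfullpath $TRAPS -qsigtrap=$handler -o $exe"
        if [ "$DRY_RUN" = "1" ]; then
            echo "$cmd"
        else
            $cmd || return 1
        fi
    done
}

build_all
```

Running fe_trce or fe_trcedump built this way should stop at the first exception rather than continuing as fe_ieee does, per the handler descriptions above.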
According to the XL Fortran User's Guide, the performance impact of using floating point traps is relatively low. However, it's probably best to avoid using floating point traps in debugged, production codes.
While writing this article I discovered that two of the "-qflttrap" suboptions caused programs to stop even when using the xl__ieee exception handler. These were the "imprecise" and "inexact" suboptions.
The User's Guide notes that it may not be possible for the handler to substitute results when an "imprecise" result is detected, which explains the program termination. The "inexact" suboption results in a trap for signals generated at the beginning and end of subroutines only, which would make it impossible for the handler to provide results in all but the most trivial subroutines.
For more information on exception handlers and flttrap options see "man xlf" or the IBM manual:
XL Fortran for AIX: Language Reference (SC09-4947-01) Chapter 7 XL Fortran Floating-Point Processing
This manual is available on iceberg and iceflyer in pdf format, here:
X1: Useful pat_hwpc Counters to Sample
Default use of "pat_hwpc" on the X1 gives you more data than most of us know how to interpret. If you hunger for even more, here are four additional, useful counters to sample:

P:6:3 -- Stall_VU_No_Inst
- CrayDoc Description:
- "CPs VU has no valid instruction"
The value reported is the time spent running in scalar mode without any overlapping computation in vector mode. Other fields reported by pat_hwpc, such as total "Vector ops" and "Scalar ops," hide the fact that scalar and vector processing often occur simultaneously on the X1.
The Stall_VU_No_Inst counter reports the time the Vector Units are stalled because no vector instruction is available to process. If this is a large percentage of the code's total run time, the code may benefit from profiling and optimization to improve vectorization.
ARSC users are encouraged to email ARSC Consulting or call 907-450-8602 for assistance.
Stall_VLSU_LB
- CrayDoc Description:
- "CPs VLSU stalled waiting for load buffers (LB)"
Stall_VLSU_LB is the time the Vector Load/Store Unit is stalled because all the vector load buffers are already busy "talking" to vector cache and main memory.
A large value here may indicate that the performance of this code is limited by memory bandwidth. Again, profiling the code is the next step, but likely optimizations include restructuring bottleneck loops to increase the number of operations performed per load and/or fixing inefficiencies in the cache and/or memory access pattern.
Stall_VLSU_VM
- CrayDoc Description:
- "CPs VLSU stalled waiting for VU vector mask (VM)"
The vector mask registers are commonly used to vectorize loops containing conditional statements based on vector data. E.g.,
    do i=1,N
      if ( A(i) == 0.0 ) then
        B(i) = 0.0
      else
        B(i) = func(A(i))
      endif
    enddo
A large value for Stall_VLSU_VM may indicate that the complexity or nesting depth of vectorizable conditional loops is slowing the code down.
Stall_VLSU_Index
- CrayDoc Description:
- "CPs VLSU stalled waiting for VU index vector for gather or scatter"
A disproportionately large value suggests that vectorizable loops with indirect array access, such as this, should be evaluated:
    do i=1,N
      A( indx(i) ) = B(i)
    enddo
As always, profiling with cray_pat is the next step.
To report data on these four counters (in addition to the usual report), invoke pat_hwpc as:
pat_hwpc -e 'P:14:3,P:12:3,P:6:3,P:9:3'
Here's what the output looks like for a 1-MSP user code tested on klondike:
    % pat_hwpc -e 'P:14:3,P:12:3,P:6:3,P:9:3' aprun -n 1 ./a.out
    [ ... cut ...]
    Stall VU No Inst        33.351 secs       13340419864 clks
    Stall VLSU LB           19.388 secs        7755336688 clks
    Stall VLSU VM           16.299 secs        6519742910 clks
    Stall VLSU Index        15.969 secs        6387416611 clks
    [ ... cut ...]
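Since the advice above is to look at each stall time as a share of the code's total run time, a few lines of awk can do the arithmetic. This is a rough sketch, not an ARSC tool: the function name stall_summary is made up for this example, and because the total run time is cut from the report above, the 100-second total passed in at the end is a placeholder that you would replace with your own job's run time. The script also divides clks by secs for each counter, which for these four lines works out to about 400 MHz.

```shell
#!/bin/sh
# The four counter lines copied from the pat_hwpc report above.
report='Stall VU No Inst        33.351 secs       13340419864 clks
Stall VLSU LB           19.388 secs        7755336688 clks
Stall VLSU VM           16.299 secs        6519742910 clks
Stall VLSU Index        15.969 secs        6387416611 clks'

# stall_summary TOTAL_SECS  (hypothetical helper for this example)
# Reads "Stall ..." lines on stdin and prints, for each counter: its
# name, the stall time, the clock rate implied by clks/secs, and the
# stall time as a percentage of TOTAL_SECS.
stall_summary() {
    awk -v total="$1" '
        /^Stall/ {
            clks = $(NF-1)          # field just before "clks"
            secs = $(NF-3)          # field just before "secs"
            name = $1               # join the leading words with "_"
            for (i = 2; i <= NF-4; i++) name = name "_" $i
            printf "%-18s %8.3f s  %4.0f MHz  %5.1f%%\n",
                   name, secs, clks / secs / 1e6, 100 * secs / total
        }'
}

# 100 is a placeholder total run time in seconds, NOT a measured value.
echo "$report" | stall_summary 100
```

Keep in mind that these stalls are counted in different hardware units, so the individual percentages need not add up to a meaningful total.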
The complete list of counters is given in:
Optimizing Applications on the Cray X1 Series System - S-2315-53
Appendix B. Hardware Performance Counters
(ARSC users, read "news documents" on klondike for instructions on obtaining the above manual.)
Quick-Tip Q & A
A: [[ I often find myself comparing versions of source files trying
   [[ to figure out what changed between the versions. Good
   [[ old-fashioned diff works just fine, but there's got to be a more
   [[ modern solution.
   [[
   [[ Do you know of any text editors or other tools that have file
   [[ comparison functionality built in?

#
# From Martin Luthi
#

If you are using Emacs or XEmacs, try ediff. It is almost too useful to not try out, even if you are not hooked on Emacs (yet). Since Emacs also fully supports CVS (or other version control systems), this feature is even more useful. (And of course Emacs does syntax highlighting, file access on remote machines, ....).

To get an idea, look at these screenshots.

    http://www.thomas-guettler.de/vortraege/emacs/einfuehrung.html
    http://www.student.northpark.edu/pemente/img/ediff_24.gif

You can invoke ediff in a running Emacs with

    M-x ediff-buffers    if the files are already visited by Emacs
    M-x ediff-files      if the files are not loaded in Emacs

For comparison of three files, the analogous commands are M-x ediff-buffers3 and M-x ediff-files3.

[M-x means pressing the Meta key and then hitting x while Meta is still pressed. The Meta key is often identical to Alt (PC keyboards) or Diamond (Sun keyboards), or you can press Esc x to obtain the same effect.]

#
# Thanks to James Keenan and former co-newsletter editor Guy Robinson
#

tkdiff uses Tcl/Tk and diff to give a graphical representation of the differences between files. You can store your favorite diff options, such as -ib, for each time you launch tkdiff. There is also a decent merge capability that allows the user to move lines from one file to the other. See http://sourceforge.net/projects/tkdiff/ for more information.

# Editors Note: tkdiff even runs on the Cray X1 (slowly), and
# doesn't require any compilation.

#
# Kate Hedstrom
#

The emacs flavor is built into xemacs. The vi flavor comes with vim as gvimdiff. There is also xxdiff and I've heard that mgdiff has similar functionality. They are all trying to emulate the old SGI command xdiff.

#
# Jed Brown
#

Try vimdiff or the gui version gvimdiff. It displays two or three files side-by-side with differences highlighted and automatic folding of the boring sections. You can edit the files with realtime updating of the diff. If you want to use the mouse more, there is the Qt-based kdiff3. An interesting tool if you want to generate patches for binary files is bsdiff.

#
# Thanks to Greg Newby for a different take on the question
#

For single-author projects, try RCS. The "ci" and "co" commands are used to check in and check out code (or other text files) and annotate them. This is lightweight and easy to use.

For multiple-author projects, options get more powerful and complex. SCCS, CVS and subversion are popular, and quite functional. The trick, for these, is to gracefully handle situations where different people edit the same file.

#
# And, from Wendy Palm
#

Emacs has a number of commands that provide version control information. You can directly compare two files (M-x diff), or use the version control commands (http://www-2.cs.cmu.edu/cgi-bin/info2www?(emacs)Version%20Control).

There are a number of tools (free) listed here, perhaps one will appeal to you: http://www.thefreecountry.com/programming/filecomparison.shtml.

Generally, I've been able to use the different options of diff and get the information I need. Specifically, remember the use of "-u", "-c" and "-y".

For example, given the following simple files:

    [hubble-e0:~] palm% cat file1
    aaaaaaaaaaa
    bbbbbbbbbbb
    ccccccccccc
    ddddddddddd
    [hubble-e0:~] palm% cat file2
    aaaaaaaaaaa
    bbbbbbbbbbb
    33333333333
    ddddddddddd

diff by itself gives just the 2 lines that are different:

    [hubble-e0:~] palm% diff file1 file2
    3c3
    < ccccccccccc
    ---
    > 33333333333

diff -u gives the changes in unified context where "-" is file1 and "+" is file2:

    [hubble-e0:~] palm% diff -u file1 file2
    --- file1       Fri Mar 25 17:56:14 2005
    +++ file2       Fri Mar 25 17:56:27 2005
    @@ -1,4 +1,4 @@
     aaaaaaaaaaa
     bbbbbbbbbbb
    -ccccccccccc
    +33333333333
     ddddddddddd

diff -c gives the changes in context, showing the lines surrounding each change, where *** is file1 and --- is file2:

    [hubble-e0:~] palm% diff -c file1 file2
    *** file1       Fri Mar 25 17:56:14 2005
    --- file2       Fri Mar 25 17:56:27 2005
    ***************
    *** 1,4 ****
      aaaaaaaaaaa
      bbbbbbbbbbb
    ! ccccccccccc
      ddddddddddd
    --- 1,4 ----
      aaaaaaaaaaa
      bbbbbbbbbbb
    ! 33333333333
      ddddddddddd

and, finally, diff -y shows the changes side by side, where file1 is on the left and file2 is on the right, and differing lines are identified with "|":

    [hubble-e0:~] palm% diff -y file1 file2
    aaaaaaaaaaa                          aaaaaaaaaaa
    bbbbbbbbbbb                          bbbbbbbbbbb
    ccccccccccc                        | 33333333333
    ddddddddddd                          ddddddddddd

If you want a graphical interface, check out "xdiff". It can handle up to 4 files with the text presented side-by-side with the differences color coded.

#
# And in the spirit of our April 1st issue:
#

Bob Robins reports that like the editors, he prints the two files, but then he uses a light table. "I just put the two versions one on top of the other and the differences pop right out."

Q: Emacs isn't available on the Cray X1. Thus, I must edit source code on my desktop workstation and move it to the X1 for compilation and testing. It goes like this:

    ---WORKSTATION---
    edit
    save
    forget to move updated file to X1
    ---X1---
    make (Ooops! now I realize I don't have the updated file.)
    ---WORKSTATION---
    ftp file to X1
    ---X1---
    make
    run
    ... REPEAT ...
    ... REPEAT ...
    ... REPEAT ...

There must be a better way!
[[ Answers, Questions, and Tips Graciously Accepted ]]
Ed Kornkven
ARSC HPC Specialist
ph: 907-450-8669

Kate Hedstrom
ARSC Oceanographic Specialist
ph: 907-450-8678

Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Subscribe to (or unsubscribe from) the e-mail edition of the
ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.