ARSC HPC Users' Newsletter 266, April 4, 2003
Chilkoot/Yukon PrgEnv Upgraded to 3.6
Programming Environment 3.6 (PE3.6) was made the default on March 12. The previous default, PE3.5, remains available as PrgEnv.old.
To switch back to PrgEnv.old, execute this command prior to compiling your code:
module switch PrgEnv PrgEnv.old
Here are some of the changes in the new programming environment... lots of good stuff for C/C++ programmers. ARSC users, see the man pages or contact us for more detail. Enjoy!
- f90 command now named ftn (but "f90" remains available in this release).
- The Cray Standard C compiler now supports the C99 standard (ISO/IEC 9899:1999). (You might review Jim Long's article on C99 in issue 245.)
- The Cray Standard C++ compiler now supports the C++98 standard (ISO/IEC FDIS 14882:1998).
- Fortran and C/C++ programs developed on Cray SV1ex systems can now use the new bte_move intrinsic command for faster data transfers (about 50 times faster).
- A new infinite vector length optimization feature was added to the Cray Fortran Compiler and Cray Standard C/C++ compilers.
- Fortran aggressive inlining option changes. Because the Cray Fortran Compiler -O inline4 option has new functionality (described in Section 2.2.1), aggressive inlining is now activated by the -O inline5 option.
(Not) Vectorizing Loops With Pointer Arrays
Should a compiler vectorize a loop that contains possible data dependencies arising from references to pointer arrays? Interestingly, NEC's f90 and Cray's ftn compilers answer this question differently. NEC f90 lives dangerously, opting for performance; Cray ftn opts for safety.
The problem is that pointers might address overlapping memory, which in some cases will create dependencies within a loop which are unknowable at compile time. Here's an example:
    module gridtype
      integer :: size
      real (kind=8), allocatable, dimension (:) :: A
      real (kind=8), pointer, dimension (:) :: B
      real (kind=8), allocatable, target, dimension (:) :: C
    end module gridtype

    ...

      do i = 3, size
        A(i) = A(i-1) + A(i-2)
      enddo

      do i = 3, size
        C(i) = B(i-1) + B(i-2)
      enddo
The first loop has an obvious dependency, requiring that the computations proceed in serial order. A(n) can't be computed before A(n-1) and A(n-2) are computed. It's not vectorizable.
The second loop would have no dependency if we were guaranteed that C and B were disjoint in memory, but given that B is a pointer, we lack that guarantee. In particular, if B addresses the same location as C, then the two loops are equivalent and suffer the same dependency.
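To make the hazard concrete, here is a minimal sketch (not part of the original test code; the program name alias_demo and the parameter n are made up for illustration). It points B at C and runs the second loop twice: once as an ordinary DO loop, and once as an array assignment. The array assignment is used only to mimic vector execution, since Fortran evaluates the whole right-hand side before storing anything, much as a vector unit loads its operands before it stores results.

```fortran
program alias_demo                 ! hypothetical name, for illustration only
  implicit none
  integer, parameter :: n = 10
  real (kind=8), target  :: c(n)
  real (kind=8), pointer :: b(:)
  integer :: i

  b => c                           ! B and C now name the same memory

  ! Serial order: each iteration sees the values stored by earlier ones.
  c = 1.0d0
  do i = 3, n
     c(i) = b(i-1) + b(i-2)
  enddo
  print '(a,10f6.1)', 'serial: ', c

  ! Vector-like order: the right-hand side is evaluated for all elements
  ! before any element of C is stored.
  c = 1.0d0
  c(3:n) = b(2:n-1) + b(1:n-2)
  print '(a,10f6.1)', 'vector: ', c
end program alias_demo
```

With a compiler that executes the DO loop serially, the first line should show 1, 1, 2, 3, 5, 8, 13, 21, 34, 55 and the second 1, 1 followed by eight 2s. A compiler that vectorizes the DO loop despite the pointer may print the second pattern on both lines, which is exactly the failure discussed here.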
From Cray's loopmark listing, with "negmsgs" shown,

    ftn -rm -Omsgs,negmsgs -o tst4a tst4a.f90

we see that Cray ftn doesn't vectorize either loop:
       55.
       56.  1---<   do i = 3, size
       57.  1         A(i) = A(i-1) + A(i-2)
       58.  1--->   enddo
       59.
       60.  f---<   do i = 3, size
       61.  f         C(i) = B(i-1) + B(i-2)
       62.  f--->   enddo
       63.        end subroutine work

    ftn-6254 f90: VECTOR File = tst4a.f90, Line = 56
      A loop starting at line 56 was not vectorized because a recurrence
      was found on "A" at line 57.

    ftn-6004 f90: SCALAR File = tst4a.f90, Line = 60
      A loop starting at line 60 was fused with the loop starting at
      line 56.
No matter what I tried, including the advisory directive to ignore vector dependencies,

    !CDIR$ IVDEP

I couldn't persuade the Cray compiler to vectorize the second loop. (If anyone knows how to do this, let us know.)
It's a different story on the SX-6. The "fullmsg" output,

    f90 -Wf"-pvctl fullmsg" -o tst4a tst4a.f90

shows that the SX-6 compiler vectorizes the second loop:
    f90: vec(3): tst4a.f90, line 56: Unvectorized loop.
    f90: vec(13): tst4a.f90, line 56: Overhead of loop division is too large.
    f90: opt(1037): tst4a.f90, line 57: Feedback of array elements.
    f90: vec(20): tst4a.f90, line 57: Unvectorizable dependency.:a
    f90: vec(1): tst4a.f90, line 60: Vectorized loop.
    f90: tst4a.f90, work: There are 5 diagnoses.
The SX-6 compiler can be inhibited from vectorizing the second loop by inserting this directive:

    !CDIR NOVECTOR

directly above the second "DO".
Whether or not vectorizing the loop will give bad results depends on what the pointer addresses. As you may suspect, we investigated this issue because it arose in a real user application.
In that case, the pointer array was allocated separately and occupied different memory from its companion on the left-hand side of the expression, so it was correct to vectorize the loop. Performance on the Cray therefore suffered until the code was rewritten to eliminate the pointer. Since there really was no dependency, the compiler could then vectorize the loop.
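The user's code isn't reproduced in this article, so the following is only a sketch of the kind of rewrite described, under the assumption that the pointer could simply become an allocatable array of its own. The names gridtype_nopointer, tst_nopointer, and n are made up for illustration. Two distinct allocatable arrays without the TARGET attribute cannot overlap, so the compiler no longer has to assume a dependency between B and C.

```fortran
module gridtype_nopointer          ! hypothetical name, not the user's code
  integer :: n
  real (kind=8), allocatable, dimension (:) :: B   ! was: pointer
  real (kind=8), allocatable, dimension (:) :: C   ! target no longer needed
end module gridtype_nopointer

program tst_nopointer
  use gridtype_nopointer
  implicit none
  integer :: i

  n = 10
  allocate (B(n))
  allocate (C(n))
  B = 1.
  C = 1.

  ! B and C are separate allocatable arrays, so no iteration reads a
  ! value that another iteration of this loop has written.
  do i = 3, n
     C(i) = B(i-1) + B(i-2)
  enddo

  print*, "C:", C      ! elements 3..n are all 2.0, in any execution order
end program tst_nopointer
```

Whether a given compiler actually vectorizes this loop is best confirmed from its own listing or diagnostic output, as with the loopmark and "fullmsg" options shown earlier.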
For the curious, here's the complete test code used in this article. It gives incorrect results on the SX-6 under the default compiler behavior, which is to vectorize. It always gives correct results on the Cray.
    module gridtype
      integer :: size
      real (kind=8), pointer, dimension (:) :: B
      real (kind=8), allocatable, dimension (:) :: A
      real (kind=8), allocatable, target, dimension (:) :: C
    end module gridtype

    program tst
      use gridtype
      implicit none
      integer i

      call init ()

      print*, "before A:", (A)
      print*, "before B:", (B)
      print*, "before C:", (C)

      call work ()

      print*, "after: A", (A)
      print*, "after: B", (B)
      print*, "after: C", (C)
    end

    subroutine init ()
      use gridtype
      implicit none
      integer :: i

      size = 10
      allocate (A(size))
      allocate (C(size))
      C = 1.
      A = 1.
      B => C
    end subroutine init

    subroutine work ()
      use gridtype
      implicit none
      integer :: i

      do i = 3, size
        A(i) = A(i-1) + A(i-2)
      enddo

      do i = 3, size
        C(i) = B(i-1) + B(i-2)
      enddo
    end subroutine work
Quick-Tip Q & A
A:[[ In vi, and perl for that matter, it's a nuisance to search and
  [[ replace on strings which contain forward slashes. Is there an easier
  [[ way to do what I want here? Some Unix trick, maybe? (Please don't
  [[ say "emacs"...)
  [[
  [[   :%s/\/lib\//\/usr\/lib\//gc

  You're not stuck using the "/" to delimit the expression. For
  instance, this example uses a "," which makes it unnecessary to
  escape the forward slashes:

    :%s,/lib/,/usr/lib/,gc

  #
  # Thanks to Rich Griswold for this response:
  #

  I'm not sure about vi, but perl is pretty flexible about the
  delimiter characters for regular expressions. From the perlop
  manpage:

    Any non-alphanumeric, non-whitespace delimiter may replace the
    slashes.

  For example, you can use any of the following:

    $foo =~ s /lib/ /usr/lib/ g;
    $foo =~ s-/lib/-/usr/lib/-g;
    $foo =~ s(/lib/)(/usr/lib/)g;

  You can even mix delimiters like this:

    $foo =~ s</lib/>/\/usr\/lib\//g;

  There are special rules for some delimiters. Check the
  m/PATTERN/cgimosx and s/PATTERN/REPLACEMENT/egimos entries in the
  perlop manpage for a full explanation.

Q: I did "du -sk" of a directory, I got 320 Mbytes, then I made a
   tarfile of the directory, and it was near 1 GByte! What happened?
[[ Answers, Questions, and Tips Graciously Accepted ]]
Ed Kornkven     ARSC HPC Specialist             ph: 907-450-8669
Kate Hedstrom   ARSC Oceanographic Specialist   ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.