ARSC HPC Users' Newsletter 266, April 4, 2003

Chilkoot/Yukon PrgEnv Upgraded to 3.6

Programming Environment 3.6 (PE3.6) was made the default on March 12. The previous default, PE3.5, remains available as PrgEnv.old.

To switch back to PrgEnv.old, execute this command prior to compiling your code:

module switch PrgEnv PrgEnv.old

Here are some of the changes in the new programming environment... lots of good stuff for C/C++ programmers. ARSC users, see the man pages or contact us for more detail. Enjoy!

  • f90 command now named ftn (but "f90" remains available in this release).
  • The Cray Standard C compiler now supports the C99 standard (ISO/IEC 9899:1999). (You might review Jim Long's article on C99 in issue 245.)
  • The Cray Standard C++ compiler now supports the C++98 standard (ISO/IEC FDIS 14882:1998).
  • Fortran and C/C++ programs developed on Cray SV1ex systems can now use the new bte_move intrinsic command for faster data transfers (about 50 times faster).
  • A new infinite vector length optimization feature was added to the Cray Fortran Compiler and Cray Standard C/C++ compilers.
  • Fortran aggressive inlining option changes: because the Cray Fortran Compiler -O inline4 option has new functionality (described in Section 2.2.1), aggressive inlining is now activated by the -O inline5 option.

(Not) Vectorizing Loops With Pointer Arrays

Should a compiler vectorize a loop which contains possible data dependencies due to references to pointer arrays? Interestingly, NEC's f90 and Cray's ftn compilers answer this question differently. NEC f90 lives dangerously, opting for performance; Cray ftn opts for safety.

The problem is that pointers might address overlapping memory, which in some cases will create dependencies within a loop which are unknowable at compile time. Here's an example:


  module gridtype
    integer :: size
    real (kind=8), allocatable, dimension (:) :: A
    real (kind=8), pointer, dimension (:) :: B
    real (kind=8), allocatable, target, dimension (:) :: C
  end module gridtype

  ... 

  do i = 3, size
    A(i) = A(i-1) + A(i-2)
  enddo

  do i = 3, size
    C(i) = B(i-1) + B(i-2)
  enddo

The first loop has an obvious dependency, which requires the computations to proceed in serial order: A(i) can't be computed before A(i-1) and A(i-2) are computed. It's not vectorizable.

The second loop would have no dependency if we were guaranteed that C and B were disjoint in memory, but given that B is a pointer, we lack that guarantee. In particular, if B addresses the same location as C, then the two loops are equivalent and suffer the same dependency.
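To see what is at stake, here is a minimal sketch (not part of the original test code) that runs the second loop both ways with B pointing at C. The program name and the parameter n are illustrative. The "all loads first" version uses a Fortran array assignment, whose right-hand side is evaluated completely before anything is stored; that is only an approximation of what a vectorized loop might do, assuming the whole loop fits in a single vector operation.

```fortran
program alias_demo
  implicit none
  integer, parameter :: n = 10          ! illustrative size, as in the test code
  real (kind=8), target  :: C(n)
  real (kind=8), pointer :: B(:)
  integer :: i

  ! Serial order: each iteration sees the values stored by earlier ones.
  C = 1.0d0
  B => C
  do i = 3, n
    C(i) = B(i-1) + B(i-2)
  enddo
  print *, "serial order:   ", C

  ! Vector-like order: every operand is loaded before any result is stored.
  ! A Fortran array assignment is defined to behave this way.
  C = 1.0d0
  C(3:n) = B(2:n-1) + B(1:n-2)
  print *, "all loads first:", C
end program alias_demo
```

The serial loop produces the Fibonacci-like sequence 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, while the "all loads first" version produces 1, 1, 2, 2, 2, 2, 2, 2, 2, 2. When B and C really do overlap, only the serial answer is the one the loop was written to compute.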

From Cray's loopmark listing, with "negmsgs" enabled,

     ftn -rm -Omsgs,negmsgs -o tst4a tst4a.f90

we see that Cray ftn doesn't vectorize either loop:


   55.        
   56.  1---<   do i = 3, size
   57.  1         A(i) = A(i-1) + A(i-2)
   58.  1--->   enddo
   59.        
   60.  f---<   do i = 3, size
   61.  f         C(i) = B(i-1) + B(i-2)
   62.  f--->   enddo
   63.        end subroutine work

  ftn-6254 f90: VECTOR File = tst4a.f90, Line = 56 
    A loop starting at line 56 was not vectorized because a recurrence 
                was found on "A" at line 57.

  ftn-6004 f90: SCALAR File = tst4a.f90, Line = 60 
    A loop starting at line 60 was fused with the loop starting at line 56.
  

No matter what I tried, including the advisory directive to ignore vector dependencies,

     !CDIR$ IVDEP

I couldn't persuade the Cray compiler to vectorize the second loop. (If anyone knows how to do this, let us know.)

It's a different story on the SX-6. The "fullmsg" output,

     f90 -Wf"-pvctl fullmsg" -o tst4a tst4a.f90

shows that the SX-6 compiler vectorizes the second loop:


  f90: vec(3): tst4a.f90, line 56: Unvectorized loop.
  f90: vec(13): tst4a.f90, line 56: Overhead of loop division is too large.
  f90: opt(1037): tst4a.f90, line 57: Feedback of array elements.
  f90: vec(20): tst4a.f90, line 57: Unvectorizable dependency.:a
  f90: vec(1): tst4a.f90, line 60: Vectorized loop.
  f90: tst4a.f90, work: There are 5 diagnoses.

The SX-6 compiler can be inhibited from vectorizing the second loop by inserting this directive,

     !CDIR NOVECTOR

directly above the second "DO".
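As a sketch of the placement, here is a small standalone version of the second loop with the NOVECTOR directive on the line directly above the DO. The program name and n are illustrative, and starting the directive in column 1 is an assumption on our part; check the SX-6 compiler documentation for the exact rules. Compilers that don't recognize the directive simply treat it as a comment.

```fortran
program novector_demo
  implicit none
  integer, parameter :: n = 10          ! illustrative size
  real (kind=8), target  :: C(n)
  real (kind=8), pointer :: B(:)
  integer :: i

  C = 1.0d0
  B => C                                ! B and C now share memory

  ! The directive applies to the loop that immediately follows it.
!CDIR NOVECTOR
  do i = 3, n
    C(i) = B(i-1) + B(i-2)
  enddo

  print *, C
end program novector_demo
```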

Whether or not vectorizing the loop will give bad results depends on what the pointer addresses. As you may suspect, we investigated this issue because it arose in a real user application.

In that case, the pointer array was allocated separately and occupied different memory from its companion on the LHS of the expression, so it was correct to vectorize the loop. Performance on the Cray therefore suffered until the code was rewritten to eliminate the pointer. Since there really was no dependency, the compiler could then vectorize the loop.
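The user's actual rewrite isn't reproduced here. As a hypothetical sketch of the general idea, assuming the data can live in an ordinary allocatable array rather than behind a pointer, the declarations might change like this (program and variable names are illustrative):

```fortran
program no_pointer_demo
  implicit none
  integer, parameter :: n = 10          ! illustrative size
  ! Both arrays are plain allocatables. Neither has the POINTER or
  ! TARGET attribute, so the compiler may assume they never overlap.
  real (kind=8), allocatable :: B(:), C(:)
  integer :: i

  allocate (B(n), C(n))
  B = 1.0d0
  C = 1.0d0

  ! C is only written and B is only read, so no iteration depends on another.
  do i = 3, n
    C(i) = B(i-1) + B(i-2)
  enddo

  print *, C
end program no_pointer_demo
```

With the pointer gone, the language itself rules out any overlap between B and C, so the compiler no longer has to choose between safety and speed.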

For the curious, here's the complete test code used in this article. On the SX-6, it gives incorrect results under the default compiler behavior, which is to vectorize. On the Cray, it always gives the correct results.


module gridtype
    integer :: size
    real (kind=8), pointer, dimension (:) :: B
    real (kind=8), allocatable, dimension (:) :: A
    real (kind=8), allocatable, target, dimension (:) :: C
end module gridtype

program tst
  use gridtype
  implicit none

  integer i

  call init ()

  print*, "before A:", (A)
  print*, "before B:", (B)
  print*, "before C:", (C)

  call work ()

  print*, "after A:", (A)
  print*, "after B:", (B)
  print*, "after C:", (C)
end

subroutine init ()
  use gridtype
  implicit none 
  integer :: i

  size = 10
  allocate (A(size))
  allocate (C(size))
  C = 1.
  A = 1.
  B => C
end subroutine init

subroutine work ()
  use gridtype
  implicit none 
  integer :: i 

  do i = 3, size
    A(i) = A(i-1) + A(i-2)
  enddo

  do i = 3, size
    C(i) = B(i-1) + B(i-2)
  enddo
end subroutine work



Quick-Tip Q & A


A:[[ In vi, and perl for that matter, it's a nuisance to search and
  [[ replace on strings which contain forward slashes. Is there an easier
  [[ way to do what I want here?  Some Unix trick, maybe?  (Please don't
  [[ say "emacs"...)
  [[
  [[   :%s/\/lib\//\/usr\/lib\//gc


  You're not stuck using the "/" to delimit the expression. For instance,
  this example uses a "," which makes it unnecessary to escape the
  forward slashes:

  :%s,/lib/,/usr/lib/,gc


  # 
  # Thanks to Rich Griswold for this response:
  # 

  I'm not sure about vi, but perl is pretty flexible about the
  delimiter characters for regular expressions.  From the perlop
  manpage:

     Any non-alphanumeric, non-whitespace delimiter may replace the slashes.

  For example, you can use any of the following:

     $foo =~ s|/lib/|/usr/lib/|g;
     $foo =~ s-/lib/-/usr/lib/-g;
     $foo =~ s(/lib/)(/usr/lib/)g;

  You can even mix delimiters like this:

     $foo =~ s</lib/>/\/usr\/lib\//g;

  There are special rules for some delimiters.  Check the
  m/PATTERN/cgimosx and s/PATTERN/REPLACEMENT/egimos entries in the
  perlop manpage for a full explanation.



Q: I did "du -sk" of a directory, I got 320 Mbytes, then I made a tarfile
   of the directory, and it was near 1 GByte!  What happened?

[[ Answers, Questions, and Tips Graciously Accepted ]]

Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.
