ARSC HPC Users' Newsletter 215, March 9, 2001

SGI Origin Introduction

Here are some highlights from the Origin2000/Origin3000 training held at ARSC this week:

  • Although the Origin has physically distributed memory, it's a shared-memory system. The OS maps memory across the entire multiprocessor system, attempts to store data near the processors that are using it, and otherwise tries to minimize NUMA effects. Thus, you can program the Origin with shared-memory models, using OpenMP or compiler auto-parallelization (as on the SV1). You can also program it with distributed-memory models, using MPI, PVM, or SHMEM (as on the T3E). For the best scalability, treat the Origin as a distributed-memory system.

  • The default compiler optimization level on Origins is only zero. You should normally compile with -O2, or at least -O1.

  • As on the T3E, single-CPU optimization should be your first step in assessing/improving performance. In particular, use the cache effectively. The compiler will try to help with this (if you override the default optimization level of 0, that is), but you might have to help. The most basic advice: stick with stride-1 array access.

    A nice example of cache optimization in a real user code appears in the article "Another Cache Related Example," in T3D Newsletter #76:

    /arsc/support/news/t3dnews/t3dnews76/index.xml

    The cache lines on the R10000 and R12000 are 32 bytes (L1 cache) and 128 bytes (L2 cache).

  • Memory locality can become a big issue if caching is poor. The OS can allocate pages of your program's memory anywhere on the Origin, but it attempts to place them near the processor using them. Why is this important? The latency to access "home" memory is about 75 clock periods (cp). Add about 75 cp to make one router hop to "non-home" memory, and add another 25 cp per additional router hop.

    The OS's default algorithm for allocating memory is called "first-touch". A given page of memory will be allocated near the first processor to access, or "touch," it. This is great for serial programs and MPI programs, in which every process has its own private data. It's not always good for OpenMP or auto-MP programs, in which some master process might initialize, and thus "touch" first, entire arrays in advance of the first parallel region. You can force the OS to use a different memory placement policy (like "round-robin"), or you might restructure the code to take the first-touch policy into account (for example, by initializing in parallel).

  • As on the Crays, tools are available to help analyze your code. In particular:

    timex --
        time your run

    perfex --
        dump counter info for entire run (or instrumented sections). See the article on "perfex" in:

        /arsc/support/news/hpcnews/hpcnews211/index.xml

    Speed Shop --
        extensive, detailed performance analysis. (See: "man speedshop".)

    Debugger --
        The debugger ("cvd") works on multiprocess programs and includes performance analysis tools in addition to the expected debugging tools. (See: "man cvd".)

  • SGI's online manuals tend to be well-written and are often more informative than man pages. Search for a given topic at:

    http://techpubs.sgi.com/

    If you're willing to share your list of gotchas, tips, and experience gained working on Origins, please send it in!
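
To illustrate the stride-1 advice above, here is a minimal sketch in fixed-form Fortran 90. The program name, array name, and size are made up for the illustration. Fortran stores arrays in column-major order, so the first subscript should vary in the innermost loop. Both loop nests compute the same sum; only the memory access pattern differs.

```fortran
      PROGRAM STRIDE
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 1000
      REAL :: A(N,N), TOTAL
      INTEGER :: I, J

      A = 1.0

!     Good: the first subscript varies fastest, so consecutive
!     iterations touch adjacent memory locations (stride 1).
      TOTAL = 0.0
      DO J = 1, N
         DO I = 1, N
            TOTAL = TOTAL + A(I,J)
         END DO
      END DO
      PRINT *, 'Column order (stride 1) sum: ', TOTAL

!     Poor: the inner loop now walks along a row, so consecutive
!     iterations are N elements apart in memory (stride N).
      TOTAL = 0.0
      DO I = 1, N
         DO J = 1, N
            TOTAL = TOTAL + A(I,J)
         END DO
      END DO
      PRINT *, 'Row order (stride N) sum:    ', TOTAL

      END PROGRAM STRIDE
```

At higher optimization levels a compiler may interchange the second loop nest on its own, so any timing difference is likely to be clearest at low optimization.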

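The "initialize in parallel" idea from the first-touch discussion above can be sketched as follows. This is only an illustration: the program name, array names, and size are hypothetical, and it assumes an OpenMP-aware compile (without one, the directives are treated as comments and the program runs serially). The point is that the initialization loop uses the same static schedule as the compute loop, so each thread first touches the part of the arrays it will later work on.

```fortran
      PROGRAM TOUCH
      IMPLICIT NONE
      INTEGER, PARAMETER :: N = 4000000
      REAL, ALLOCATABLE :: A(:), B(:)
      INTEGER :: I

      ALLOCATE (A(N), B(N))

!     Initialize in parallel: under a first-touch policy, each page
!     should be placed near the thread that writes it first.
!$OMP PARALLEL DO SCHEDULE(STATIC)
      DO I = 1, N
         A(I) = 0.0
         B(I) = REAL(I)
      END DO
!$OMP END PARALLEL DO

!     Compute with the same schedule, so each thread works mostly
!     on the pages it touched first.
!$OMP PARALLEL DO SCHEDULE(STATIC)
      DO I = 1, N
         A(I) = A(I) + 2.0 * B(I)
      END DO
!$OMP END PARALLEL DO

      PRINT *, 'A(N) = ', A(N)
      DEALLOCATE (A, B)
      END PROGRAM TOUCH
```

If the first loop were instead run serially by the master thread, first-touch would tend to place both arrays near a single processor, which is the situation described above.
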
Cray Programming Environment 3.5

ARSC will be installing PE3.5 on the T3E and SV1 in the next couple of weeks. As usual, we'd appreciate it if users could run their codes through the new environment. This is advance notice--watch the "Message Of The Day" and "news" items for an announcement.

What follows are some excerpts from the release info on this PE:


> Chapter 2. Programming Environment 3.5 Features 
> 
> This chapter lists the new features of the CF90 and Cray C/C++ Programming
> Environments.
> 
> 2.1 CF90 Programming Environment
> 
> The following subsections describe the new features of the CF90 Programming
> Environment.
> 
> 2.1.1 New Features of the Cray Fortran Language
> 
> The following extensions and features are new to the Cray Fortran language:
> 
>    * Cray extensions to the Fortran language:
> 
>         o STATIC attribute and statement (allows a variable to retain its
>           value, association, definition, and allocation status after the
>           subprogram in which it was declared completes execution)
> 
>         o PROTECTED attribute (protects module members by declaring them as
>           read-only; only objects within the module can modify protected
>           objects)
> 
>    * C interoperability features from the Fortran 2000 draft:
> 
>         o BIND attribute and statement (binds a Fortran function,
>           subroutine, or variable name to a C function or extern variable)
> 
>         o Enumerations (defines a group name for a set of values and a name
>           for each value within the set)
> 
>         o TYPEALIAS statement (defines another name for an intrinsic data
>           type or user-defined type)
> 
>         o VALUE attribute and statement (Deferred implementation)
> 
>    * Other features from the Fortran 2000 draft:
> 
>         o ALLOCATABLE attribute (components, dummy arguments, scalars, and
>           function results can now use this attribute)
> 
>         o INTENT statement (this statement can now use the POINTER
>           attribute)
> 
> C interoperability allows you to access C functions and variables from
> Fortran and to access Fortran functions, subroutines, and variables from C.
> 
> For more information about the new features, see Fortran Language Reference
> Manual, Volume 1. For more information about interoperating Fortran
> functions or subroutines with C functions, see Fortran Language Reference
> Manual, Volume 2.
> 
> 2.1.2 CF90 Compiler Enhancements
> 
> The following enhancements apply to the CF90 3.5 compiler on the UNICOS and 
> UNICOS/mk systems:
> 
>    * New compiler options:
> 
>         o -e L, -d L: Allow zero-trip shortloops. This option allows you to
>           use the !DIR$ SHORTLOOP directive on shortloops that do not
>           execute (called a zero-trip shortloop). (Disabled by default)
> 
>         o -r o: Show all options used by the compiler during compilation.
>           The compilation section of the lister output file (a file that has
>           a .lst extension) contains this information together with a
>           listing of CIFs generated. This option invokes the ftnlx command.
> 
>         o -r t: Calls the ftnlx command to produce a lint report that
>           includes a common block report. For more information, see the
>           ftnlx(1) man page.
> 
>         o Cautionary bit matrix message: The CF90 compiler now issues a
>           cautionary message for bit matrix operations when the bit matrix
>           hardware is not initialized in the same function that uses it. To
>           see the message, use the -m 2 option.
> 
>    * New directive: A new directive, CONCURRENT, lets you tell the compiler
>      that a loop is parallel. Use this directive when the compiler cannot
>      determine on its own that a loop is parallel. For more information
>      about the directive, see Cray SV1 Optimization Guide.
> 
>    * Changed error-message prefix: The prefix for error message numbers
>      displayed by the compiler is now f90 instead of cf90. For example, the
>      compiler now displays f90-xxxx instead of cf90-xxxx. (xxxx is the error
>      message number.)
> 
> The CF90 Commands and Directives Reference Manual and f90(1) man page were
> corrected to state that the -e v option allocates all variables to static
> storage unless they are explicitly or implicitly defined as automatic
> variables.
> 
> 
> 2.2 Enhancements to the Cray C and C++ Compilers
> 
> The Cray C and C++ compilers now issue a cautionary message for bit matrix
> operations when the bit matrix hardware is not initialized by the same
> function that uses it. To see the message, use the -m 2 option.
> 
> 2.3 CF90 and/or Cray C/C++ Compiler Optimizations
> 
> Enhancements to the CF90 and Cray C/C++ compilers generate smaller code
> and/or improve code performance. You must recompile your code to take
> advantage of the compiler enhancements. Programs can take advantage of the
> enhancements if they have one or more of the following characteristics and
> can use the specified Programming Environment and system:
> 
>                                   Table 1.
> 
>   ------------------------------------------------------------------------
> 
>                   Characteristics                  CF90   C/C++   SV1   T3E
> 
>   Bit matrix multiplication                          x      x      x
> 
>   Multistreamed code                                 x      x      x
> 
>   Scalar code(1)                                     x      x      x
> 
>   Loops that are vectorizable                        x      x      x
> 
>   Loops that modify bit fields                              x      x     x
> 
>   Nested vector loops with abundant memory                  x      x
>   references
> 
>   Zero length arrays and character strings           x             x     x
> 
>      (1) Code that is not vectorized
> 
>   ------------------------------------------------------------------------
> 
> Other compiler enhancements, such as better loop selection for
> vectorization, allow some programs to run faster.

Announcements: SGI User Group/Parallel Computing/Burton Smith

There's a lot happening the last week of March. Mark your calendar!

UAF SGI User's Group Wednesday, March 28, 2001, 10am-12pm

The next UAF SGI User's Group meeting will be held March 28th, 10am to 12pm in the Board of Regents Conference Room (Butrovich 109). Topics will include the O2K/O3K Applications Development Course, Visualization Projects, and Experiences and Successes.

ARSC Training: "Parallel Computing Concepts" Wednesday, March 28, 2001, 2-4pm

In this course, Jeff McAllister will introduce parallel computing concepts and message-passing algorithms (using MPI) for new and existing codes.

Colloquium: Burton Smith Thursday, March 29, 2001, 1-2pm

Dr. Burton Smith is currently the Chief Scientist with Cray Inc., and has been a leading, innovative force in high-performance computing throughout his career. He'll be addressing the question: "How Shall We Program High Performance Computers?"

Quick-Tip Q & A



A:

[[ Here's one for the Fortran 90 programmers out there.
[[
[[ This situation appeared when porting a good-sized code from the Cray
[[ SV1 to an Origin 3000.  It's been mightily condensed, but the program
[[ below duplicates the problem and compiler error message.
[[
[[ What's wrong on the Origin (or with the code) and how would you 
[[ fix it? 


[[ ==================
[[ selected_kinds.f ====================================================
[[ ==================
[[       MODULE SELECTED_KINDS
[[ 
[[       IMPLICIT NONE
[[ 
[[       ! The kind of this data type must represent integer values N, 
[[       !  where  -10**16 < N < 10**16
[[ 
[[       INTEGER, PARAMETER :: 
[[      &         TYPE_INT = SELECTED_INT_KIND (16)
[[ 
[[       END MODULE SELECTED_KINDS
[[ ==============
[[ interfaces.h ========================================================
[[ ==============
[[       INTERFACE INCR
[[       FUNCTION INCR (BASE, I)
[[         USE SELECTED_KINDS
[[         INTEGER(TYPE_INT), INTENT(INOUT) :: BASE
[[         INTEGER(TYPE_INT), INTENT(IN) :: I
[[         INTEGER(TYPE_INT) :: INCR
[[       END FUNCTION INCR
[[       END INTERFACE INCR
[[ ========
[[ incr.f ==============================================================
[[ ========
[[       FUNCTION INCR (BASE, I)
[[       USE SELECTED_KINDS
[[       
[[       INTEGER(TYPE_INT), INTENT(INOUT) :: BASE
[[       INTEGER(TYPE_INT), INTENT(IN) :: I
[[       INTEGER(TYPE_INT) :: INCR
[[ 
[[       BASE = BASE + I
[[       INCR = BASE 
[[ 
[[       END FUNCTION INCR
[[ =====
[[ t.f =================================================================
[[ =====
[[       PROGRAM TEST
[[       USE SELECTED_KINDS
[[       IMPLICIT NONE
[[       INCLUDE 'interfaces.h'
[[ 
[[       INTEGER(TYPE_INT), SAVE :: J = 0
[[ 
[[       PRINT*, "TYPE_INT==", TYPE_INT
[[       PRINT*, "Incremented J : ", INCR (J, 1)
[[ 
[[       END PROGRAM TEST
[[ ==========
[[ makefile ============================================================
[[ ==========
[[ all:
[[      f90 -c selected_kinds.f
[[      f90 -c incr.f
[[      f90 -o t t.f incr.o selected_kinds.o
[[ 
[[ --------------------------
[[ Make and run on Cray SV1 --------------------------------------------
[[ --------------------------
[[ chilkoot$ make
[[         f90 -c selected_kinds.f
[[         f90 -c incr.f
[[         f90 -o t t.f incr.o selected_kinds.o
[[ chilkoot$ ./t
[[  TYPE_INT== 8
[[  Incremented J :  1
[[ chilkoot$ 
[[ 
[[ -------------------------------------------
[[ Failed make and explain on SGI Origin 3000 ----------------------------
[[ -------------------------------------------
[[ sard$ make
[[         f90 -c selected_kinds.f
[[         f90 -c incr.f
[[         f90 -o t t.f incr.o selected_kinds.o
[[ 
[[       print*, "Incremented j : ", incr (j, 1)
[[                                   ^           
[[ f90-389 f90: ERROR TEST, File = t.f, Line = 9, Column = 35 
[[   No specific match can be found for the generic subprogram call "INCR".
[[ 
[[ f90: MIPSpro Fortran 90 Version 7.3  (f61) Thu Feb 22, 2001  11:35:42
[[ f90: 21 source lines
[[ f90: 1 Error(s), 0 Warning(s), 0 Other message(s), 0 ANSI(s)
[[ cf90: "explain cf90-message number" gives more information about each message
[[ *** Error code 2 (bu21)
[[ 
[[ 
[[ sard$ explain cf90-389
[[ Error : No specific match can be found for the generic subprogram call "%s".
[[ 
[[ A function or subroutine call which invokes the name of a generic
[[ interface does not match any specific subprogram interfaces in the
[[ generic interface block.  All dummy arguments that do not have the
[[ OPTIONAL attribute must match exactly all corresponding actual arguments
[[ in type, kind type, and rank.




From Trey White of ORNL:
============================================

>        PRINT*, "Incremented J : ", INCR (J, 1)

The kind of "1" is default "INTEGER", which is probably "4" on the
Origin. "INCR" is expecting a kind that can handle 16 digits of
precision, like "8". Since the default "INTEGER" is "8" on the Cray, it
works.

To fix it, use "INCR (J, 1_TYPE_INT)" to indicate the constant "1" of
kind "TYPE_INT".

Technically, you should also use "0_TYPE_INT" in the initialization of
"J".




From Alan Wallcraft of NAVO:
============================================

The error message is unclear because the named interface block defines a
generic function.  The normal use of a generic function is to have
different actual functions called based on the types used, e.g.

       INTERFACE INCR
         FUNCTION INCR_INT (BASE, I)
           USE SELECTED_KINDS
           INTEGER(TYPE_INT), INTENT(INOUT) :: BASE
           INTEGER(TYPE_INT), INTENT(IN) :: I
           INTEGER(TYPE_INT) :: INCR_INT
         END FUNCTION INCR_INT
         FUNCTION INCR_INTEGER (BASE, I)
           INTEGER, INTENT(INOUT) :: BASE
           INTEGER, INTENT(IN) :: I
           INTEGER :: INCR_INTEGER
         END FUNCTION INCR_INTEGER
       END INTERFACE INCR

Then (assuming INCR_INT and INCR_INTEGER exist) the following will work:
      
       PROGRAM TEST
       USE SELECTED_KINDS
       IMPLICIT NONE
       INCLUDE 'interfaces.h'
       INTEGER(TYPE_INT), SAVE :: J = 0
       INTEGER,           SAVE :: K = 0
       PRINT*, "TYPE_INT==", TYPE_INT
       PRINT*, "Incremented J : ", INCR (J, 1_TYPE_INT)
       PRINT*, "Incremented K : ", INCR (K, 1)
       END PROGRAM TEST

Note that incrementing J and incrementing K invoke different actual
functions via the same generic function.

In the example, the interface block only contains one actual function so
there is no need to define a generic function.  Removing the name from
the interface block removes the generic function:

       INTERFACE
         FUNCTION INCR (BASE, I)
           USE SELECTED_KINDS
           INTEGER(TYPE_INT), INTENT(INOUT) :: BASE
           INTEGER(TYPE_INT), INTENT(IN) :: I
           INTEGER(TYPE_INT) :: INCR
         END FUNCTION INCR
       END INTERFACE

Now the error message (on a Sun) is clearer:

       PRINT*, "Incremented J : ", INCR (J, 1)
                                            ^ 
  "t.f", Line = 12, Column = 44: ERROR: The kind (4) of this actual 
  argument does not match that of its associated dummy argument (8).

The problem is that the constant 1 has default integer kind (32 bits),
but INTEGER(TYPE_INT) is 64 bits.  This can be fixed by using a constant
of type INTEGER(TYPE_INT) as follows:

       PRINT*, "Incremented J : ", INCR (J, 1_TYPE_INT)

This will work with either the original (named) interface block or the
modified (unnamed) version.  Passing constants through argument lists
can be dangerous, but in this case passing a constant is known to be
safe because the corresponding dummy argument has INTENT(IN).

The reason that the error did not happen on the SV1 is that its default
integer kind is 64 bits.  It is actually a good feature of Fortran 90
that this bug caused a compile-time error.  Passing an INTEGER*4 in place
of an INTEGER*8 in Fortran 77 won't be detected at compile time and can
cause very strange run-time errors.
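
To check how these kinds come out on a particular system, something like the following self-contained sketch may help. It is not the original code: the module name KINDS_DEMO is made up, and INCR has been moved into the module so that its interface is explicit without a separate interface block. It prints the default integer kind next to TYPE_INT and then makes the corrected call with a TYPE_INT constant.

```fortran
      MODULE KINDS_DEMO
      IMPLICIT NONE
!     Smallest integer kind that can hold -10**16 < N < 10**16
      INTEGER, PARAMETER :: TYPE_INT = SELECTED_INT_KIND (16)
      CONTAINS
      FUNCTION INCR (BASE, I)
      INTEGER(TYPE_INT), INTENT(INOUT) :: BASE
      INTEGER(TYPE_INT), INTENT(IN) :: I
      INTEGER(TYPE_INT) :: INCR
      BASE = BASE + I
      INCR = BASE
      END FUNCTION INCR
      END MODULE KINDS_DEMO

      PROGRAM TEST
      USE KINDS_DEMO
      IMPLICIT NONE
      INTEGER(TYPE_INT), SAVE :: J = 0_TYPE_INT

!     If the first two values differ, a bare literal such as 1
!     will not match a dummy argument of kind TYPE_INT.
      PRINT *, 'Default integer kind : ', KIND (1)
      PRINT *, 'TYPE_INT             : ', TYPE_INT
      PRINT *, 'Kind of 1_TYPE_INT   : ', KIND (1_TYPE_INT)
      PRINT *, 'Incremented J : ', INCR (J, 1_TYPE_INT)
      END PROGRAM TEST
```

Kind numbers are processor-dependent, so the printed values are only meaningful when compared with each other on the same compiler.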




Q: What's up with this?

  chilkoot% rm ldat.199801.*
    Arguments too long.
  chilkoot% 
  chilkoot% ls ldat.199801.*
    Arguments too long.

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.