ARSC T3D Users' Newsletter 22, February 13, 1995

ARSC T3D Upgrades

In the next month we will be upgrading the T3D Programming Environment (libraries, tools and compilers) from P.E. 1.1 to P.E. 1.2.

Users will be notified of the upgrade dates in mailings to the ARSC T3D users' group (i.e., those who receive this newsletter).

Upgrade to the T3D Memory

On February 7th, ARSC upgraded the memory on each PE from 2MWs to 8MWs. If any users have questions about this, please contact Mike Ess.

Upgrade to MAX 1.2

On January 31st, ARSC upgraded to the 1.2 version of MAX, the T3D operating system. If any users notice differences in how their codes run on the T3D, they should notify Mike Ess.

What's in the Libraries

In the directory /mpp/bin there is an MPP utility called nm that prints the object files that are in a library. It performs the same functions for T3D libraries that /bin/nm performs for Y-MP libraries; those functions are described in the man page on denali.

In going from P.E. 1.1 to P.E. 1.2 this utility provides a quick method to see what's new in the library. First we run:


  /mpp/bin/nm /mpp/lib/lib*.a | grep lib > pe11.objs
then after ARSC has upgraded to P.E. 1.2 we run:

 
  /mpp/bin/nm /mpp/lib/lib*.a | grep lib > pe12.objs
Now a diff between pe11.objs and pe12.objs will show us what's been added in the new release.
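
The comparison step can be sketched as a small shell function. This is only a sketch: the function name new_objs is hypothetical, and it assumes both listings were produced by the same nm and grep command, so that their lines are directly comparable.

```shell
# new_objs OLD NEW
# Print the lines that diff reports as present only in NEW.
# (new_objs is a hypothetical helper name, not an ARSC or Cray tool.)
new_objs() {
    # diff exits with status 1 when the files differ; that is the
    # expected case here, so the status is ignored. In diff's default
    # output, lines found only in the second file start with "> ".
    { diff "$1" "$2" || true; } | sed -n 's/^> //p'
}
```

For example, "new_objs pe11.objs pe12.objs" would list the entries that appear only in the P.E. 1.2 listing.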

More Speed

Once a T3D application is up and running we always need more speed. One place to look for increased speed is in the libsci routines. Libsci has single PE versions of all of the BLAS 1, 2, and 3 routines and many of the LAPACK routines. However, before we add them to our code expecting a speed improvement, we should time both the code they replace and the routine itself. It may be that calling the routine in libsci is actually slower than the code it replaces. For example, the overhead of the subroutine call to libsci could be as significant as the function performed.

Here is a small program I've used to see whether replacing code with a libsci routine will help:


      real a( 2000 ), b( 2000 )
      integer index( 16 )
      data index / 0, 1, 2, 3, 4, 5, 10, 20, 40, 50, 100, 200, 400,
     +  500, 1000, 2000 /

      do 10 i = 1, 2000
            a( i ) = i
            b( i ) = i 
   10 continue

      do 100 i = 1, 16
         t1 = second()
            s1 = sdot( index( i ), a, 1, b, 1 ) ! libsci replacement
         t2 = second()
            s2 = 0.0
            do 20 j = 1, index( i )
               s2 = s2 + a( j ) * b( j )        ! code replaced
   20       continue
         t3 = second()
            s3 = snrm2( index( i ), a, 1 )      ! libsci replacement
         t4 = second()
            s4 = 0.0
            do 30 j = 1, index( i )
               s4 = s4 + a( j ) * a( j )        ! code replaced
   30       continue
            s4 = sqrt( s4 )
         t5 = second()
            
         if( s1 .ne. s2 ) then
            write( 6, 600 ) s1, s2              ! same answers ?
            stop
         endif
         if( s3 .ne. s4 ) then
            write( 6, 601 ) s3, s4              ! same answers ?
            stop
         else
            write( 6, 602 ) i, index( i ), t2-t1,t3-t2,t4-t3,t5-t4
         endif

  100 continue

  600 format( " Error in sdot , found ", f10.2, " should be ", f10.2 )
  601 format( " Error in snrm2, found ", f10.2, " should be ", f10.2 )
  602 format( I3, i8, 4f10.6 )
      end
      real function second( )
      second = dble( irtc( ) ) / 150000000.0
      end
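The results for the above program are shown below. The columns are the test number, the vector length, and the times in seconds for the libsci sdot, the Fortran dot product loop, the libsci snrm2, and the Fortran norm loop: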

   1     0  0.000009  0.000001  0.000004  0.000007
   2     1  0.000010  0.000002  0.000017  0.000006
   3     2  0.000011  0.000002  0.000017  0.000007
   4     3  0.000011  0.000002  0.000018  0.000007
   5     4  0.000011  0.000002  0.000017  0.000007
   6     5  0.000011  0.000002  0.000017  0.000007
   7    10  0.000012  0.000003  0.000020  0.000008
   8    20  0.000015  0.000004  0.000023  0.000009
   9    40  0.000020  0.000006  0.000027  0.000011
  10    50  0.000020  0.000007  0.000030  0.000011
  11   100  0.000028  0.000020  0.000038  0.000015
  12   200  0.000031  0.000035  0.000055  0.000023
  13   400  0.000043  0.000067  0.000086  0.000038
  14   500  0.000048  0.000083  0.000102  0.000046
  15  1000  0.000079  0.000163  0.000184  0.000084
  16  2000  0.000138  0.000323  0.000426  0.000231
In this program we are investigating replacing the code for computing the dot product of two vectors and the Euclidean (2-) norm of a vector with calls to the libsci routines sdot and snrm2 (sdot and snrm2 are described in man pages on denali). From the times above it looks like replacing the Fortran code for a dot product with the libsci routine doesn't pay until the vectors are longer than 100 elements or so; the crossover falls between lengths 100 and 200. For snrm2 it doesn't look like using the libsci version will ever pay off: the Fortran loop is faster at every nonzero length tested, up to 2000. With these small test cases it's easier to decide which libsci routines will improve the speed of a T3D program.
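
To read the crossover points off the table mechanically, the six output columns can be piped through a small awk filter. This is a sketch only: the function name compare_times is hypothetical, and it assumes the column order written by format 602 in the program above (test number, vector length, libsci sdot, Fortran dot product loop, libsci snrm2, Fortran norm loop).

```shell
# compare_times: read the six-column timing output on standard input
# and report, for each vector length, which version was faster.
# (compare_times is a hypothetical helper name.)
compare_times() {
    awk '{
        # Columns 3 and 4: libsci sdot vs. the Fortran dot product loop.
        dot = ($3 < $4) ? "libsci" : "Fortran"
        # Columns 5 and 6: libsci snrm2 vs. the Fortran norm loop.
        nrm = ($5 < $6) ? "libsci" : "Fortran"
        # A tie counts for the Fortran loop: no gain from switching.
        printf "%d sdot=%s snrm2=%s\n", $2, dot, nrm
    }'
}
```

Fed the results above, it should report libsci as the faster dot product only from length 200 upward, and the Fortran loop as faster for the norm at every nonzero length.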

Reminders

List of Differences Between T3D and Y-MP

The current list of differences between the T3D and the Y-MP is:
  1. Data type sizes are not the same (Newsletter #5)
  2. Uninitialized variables are different (Newsletter #6)
  3. The effect of the -a static compiler switch (Newsletter #7)
  4. There is no GETENV on the T3D (Newsletter #8)
  5. Missing routine SMACH on T3D (Newsletter #9)
  6. Different Arithmetics (Newsletter #9)
  7. Different clock granularities for gettimeofday (Newsletter #11)
  8. Restrictions on record length for direct I/O files (Newsletter #19)
  9. Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
I encourage users to e-mail in differences that they have found, so we all can benefit from each other's experience.
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.