ARSC T3D Users' Newsletter 34, May 5, 1995

T3D Jobs Are Lost When the Y-MP Goes Down

Once a job begins on the T3D, it cannot be interrupted and restarted later. So when the Y-MP goes down for preventive maintenance or testing, any jobs executing on the T3D are aborted when the Y-MP is shut down and are not restarted. All planned shutdowns are announced in the logon MotD ("Message of the Day"). Usually testing on ARSC's machine is scheduled for Tuesday night starting at 5:30 PM Alaska time, but check the MotD in /etc/motd to be sure.

MPI on the T3D

The following announcement appeared on Newsnet in the past week:

  > 
  >                  MPI for the Cray T3D Release Notice
  >               (MPICH with Cray T3D Shared Memory Driver)
  >                           Alpha Version 0.1a
  >                        
  >                              April 19, 1995
  >  
  > We are happy to announce alpha release 0.1a of the MPICH on the Cray
  > T3D.  Previously, only a subset of the MPI specification existed on
  > the T3D using the native message passing software T3DPVM.  This
  > new technology offers a more complete implementation of the MPI
  > standard and is expected to offer substantially higher performance.
  >  
  > MPICH, the model implementation of MPI, is a joint research effort
  > project between Argonne (Bill Gropp and Rusty Lusk) and Mississippi
  > State (Tony Skjellum and Nathan Doss).   MPICH is the most widely used
  > public implementation of MPI.  For more information on MPICH, please
  > see http://www.info.mcs.anl.gov/mpi/ or http://www.erc.msstate.edu/mpi/.
  >  
  > Current status of the Cray T3D implementation (as of April 19, 1995):
  >  
  > 1.  Most MPI functions are supported.  The only functions known not
  >     to work are MPI_Rsend, MPI_Irsend, MPI_Ssend, and MPI_Issend.
  > 
  > 2.  The current limit on the number of processes is 256. 
  >  
  > 3.  There are quite a few known optimizations that have not yet been 
  >     done.  The main goal of this initial release is to provide a
  >     functional, working version of MPI.  
  >  
  > 4.  The code has not been thoroughly or systematically tested.
  >  
  > 5.  This software is not yet part of the standard MPICH release,
  >     and will not be until it has been tested and upgraded further.
  >  
  > 6.  Optimization of collective operations remains to be done.
  >  
  > Note: We will work quickly to remove these limitations.
  >                                                                   
  > ------------------------------------------------------------------------
  > Ron Brightwell                                          Anthony Skjellum
  > bright@ERC.MsState.Edu                               tony@CS.MsState.Edu      
  >  
  >      Mississippi State University NSF Engineering Research Center
  > P.O. Box 6176                                          Fax: 601-325-7692    
  > Mississippi State, MS 39762                      Telephone: 601-325-2497
  > ------------------------------------------------------------------------
From Ron Brightwell I got the ftp address for downloading the T3D version of MPICH, and I have installed the libraries and include files in:

  /usr/local/examples/mpp/mpich
on denali. This directory also contains a subdirectory of examples that compile and execute correctly on the ARSC T3D. By examining the Makefile and the examples in that subdirectory, a user could start using MPI on the T3D now. This is a preliminary version, and I haven't done any timings or testing other than running the example programs.
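
As a first taste of the interface, here is a minimal MPI program in Fortran. It is only a sketch using the standard MPI calls; the correct compile and load lines for the T3D should be taken from the Makefile in the examples subdirectory rather than from me.

      program hello
      include 'mpif.h'
      integer ierr, mype, npes
c     start up MPI and find out which PE we are
      call MPI_INIT( ierr )
      call MPI_COMM_RANK( MPI_COMM_WORLD, mype, ierr )
      call MPI_COMM_SIZE( MPI_COMM_WORLD, npes, ierr )
      print *, 'hello from PE ', mype, ' of ', npes
c     shut down MPI
      call MPI_FINALIZE( ierr )
      end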

There is an MPI homepage at http://www.mcs.anl.gov under MCS Research Topics, then under Computer Sciences, then under Programming Tools. It has reports, manuals, documentation, and more examples on MPI.

GAMESS on the T3D and Denali

From Mike Schmidt of the Iowa State Quantum Chemistry Group I got a copy of the GAMESS ab initio quantum chemistry code. I have compiled, linked, run, tested and timed the program on both the Y-MP and various T3D configurations. It is a big program of almost 175K lines of Fortran with many options and capabilities. It comes in both uniprocessor and multiprocessor versions and was ported to the T3D by Nick Nystrom of the Pittsburgh Supercomputing Center and Carlos Sosa of Cray Research. If you are interested in running GAMESS on the ARSC T3D, please contact Mike Ess.

The package comes with a benchmark suite of problems and a table of execution times for that suite on many different platforms. I have augmented that table with the execution times I got on ARSC machines, and I can e-mail the table and a graph of the ARSC timings to anyone who is interested.

The vector_fastmath Routines of benchlib

In newsletter #29 (3/31/95) I announced the availability of benchlib on the ARSC T3D. The sources for these libraries are available on the ARSC ftp server in the file:

  pub/submissions/libbnch.tar.Z
The compiled libraries are also available on Denali in:

  /usr/local/examples/mpp/lib/lib_32.a
  /usr/local/examples/mpp/lib/lib_scalar.a
  /usr/local/examples/mpp/lib/lib_util.a
  /usr/local/examples/mpp/lib/lib_random.a  
  /usr/local/examples/mpp/lib/lib_tri.a 
  /usr/local/examples/mpp/lib/lib_vect.a
and the sources are available in:

  /usr/local/examples/mpp/src.
In previous newsletters I've described the contents of some of the libraries:

  #30 (4/7/95)  the "pref" routine of lib_util.a
  #32 (4/28/95) the fast scalar math routines in lib_scalar.a
In this newsletter, I describe the fast vector routines of lib_vect.a. This library provides routines that take an input vector and produce an output vector, computing one result for each element of the input. The general calling sequence is:

  call routine_v( vectorlength, inputvector, outputvector )
and there is no restriction on the vector length (a call with a vector length of zero or less produced no error message). They can be used to replace loops that call the corresponding intrinsic; for example, this loop:

      do 10 i = 1,n
        y(i) = sqrt(x(i))
          .
          .
          .
   10 continue

  becomes:

      call sqrt_v(n,x,y)
      do 10 i = 1,n
          .
          .
          .
   10 continue
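
To make the calling sequence concrete, here is a small driver that compares sqrt_v with the Fortran intrinsic. It is only a sketch: the test values and the use of the Cray real-time clock function IRTC() for timing are my own choices, and the program must be loaded with lib_vect.a from the directory listed above.

      program tryvec
      integer n
      parameter ( n = 1000 )
      real x(n), y(n), z(n)
      integer i, t0, t1, t2

c     fill the input vector with some positive test values
      do 10 i = 1, n
        x(i) = 1.0 + float(i)
   10 continue

c     time the benchlib vector routine
      t0 = irtc()
      call sqrt_v( n, x, y )
      t1 = irtc()

c     time the equivalent loop using the Fortran intrinsic
      do 20 i = 1, n
        z(i) = sqrt( x(i) )
   20 continue
      t2 = irtc()

      print *, 'sqrt_v    ticks per element: ', float(t1-t0)/float(n)
      print *, 'intrinsic ticks per element: ', float(t2-t1)/float(n)
      end
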
Like all vector operations, the cost (in clock ticks) per element computed goes down as the length of the vector increases. For the table below, I ran each routine and computed the cost per element for vector lengths of 1, 10, 100 and 1000. As expected, the cost per element decreases as a function of vector length, and the savings can be substantial compared to the cost of calling the default Fortran intrinsic. The crossover point between these vector routines and the Fortran intrinsics is always less than 10 elements.

Cost (in clock ticks) per element computed for different vector lengths and the default Fortran intrinsics:


  Performance of benchlib's fast vector routines in lib_vect.a
  -------------------------------------------------------------
                      vector length           default
  routine      1      10      100     1000   intrinsic

  sqrt_v    1261.0   126.2    57.3    39.5   226.0
  sqrti_v    821.0   168.5    54.5    37.2   300.7
  exp_v      909.0   163.8    61.5    45.6   198.8
  alog_v    1178.0   169.1    67.9    52.6   246.7
  aln2_v     552.0    92.6    35.2    43.7   309.3
  rtor_v    1482.0   215.9    95.0    99.2  1473.5
  twotox_v   602.0   112.9    41.6    36.0  1404.0
  oneover_   819.0   143.1    41.2    21.9    75.0
  sin_v     1092.0   200.4    68.3    57.0   203.1
  cos_v      802.0   163.8    64.0    56.7   195.9
  atan_v     952.0   234.5    78.5    61.6   258.5
  vscale     537.0    31.8     9.5     6.0    10.8
  vset       279.0    26.2     4.4     2.8     5.4
  zcopy      465.0    44.9     9.0     4.3     8.4
The performance case for the last three entries in the table is not so clear, since these operations are simple enough that the compiled in-line loops are already fast:

  call vscale(n,s,a,b) is  do i = 1, n
                             a(i) = s * b(i)
                           enddo

  call vset(n,s,a)     is  do i = 1, n
                             a(i) = s
                           enddo

  call zcopy(n,a,b)    is  do i = 1, n
                             a(i) = b(i)
                           enddo
Also note that the positions of the input and output vectors in vscale and zcopy are reversed from what they are in the first 11 routines in the table.

I have not done extensive testing of the benchlib routines to see how accurate they are. For the test cases I tried, I didn't see any difference, but I suspect the default libraries are more accurate.
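
For anyone who wants to check the accuracy on their own data, one simple approach is to compare a vector routine directly against the corresponding intrinsic. The sketch below does this for sqrt_v; the test values are an arbitrary choice of mine, and again the program must be loaded with lib_vect.a.

      program acctest
      integer n
      parameter ( n = 1000 )
      real x(n), y(n)
      real diff, maxdiff
      integer i

c     some positive test values; an arbitrary choice for illustration
      do 10 i = 1, n
        x(i) = 0.001 * float(i)
   10 continue

      call sqrt_v( n, x, y )

c     largest relative difference against the Fortran intrinsic
      maxdiff = 0.0
      do 20 i = 1, n
        diff = abs( y(i) - sqrt(x(i)) ) / sqrt(x(i))
        if ( diff .gt. maxdiff ) maxdiff = diff
   20 continue

      print *, 'largest relative difference for sqrt_v: ', maxdiff
      end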

List of Differences Between T3D and Y-MP

The current list of differences between the T3D and the Y-MP is:
  1. Data type sizes are not the same (Newsletter #5)
  2. Uninitialized variables are different (Newsletter #6)
  3. The effect of the -a static compiler switch (Newsletter #7)
  4. There is no GETENV on the T3D (Newsletter #8)
  5. Missing routine SMACH on T3D (Newsletter #9)
  6. Different Arithmetics (Newsletter #9)
  7. Different clock granularities for gettimeofday (Newsletter #11)
  8. Restrictions on record length for direct I/O files (Newsletter #19)
  9. Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
  10. Missing Linpack and Eispack routines in libsci (Newsletter #25)
  11. F90 manual for Y-MP, no manual for T3D (Newsletter #31)
I encourage users to e-mail in differences that they have found, so we all can benefit from each other's experience.
Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.