ARSC T3D Users' Newsletter 37, May 26, 1995
New T3D Batch PE Limits
In the past week all active users of the ARSC T3D had their batch PE limit increased to 128. This allows these users access to the 128-PE, 8-hour queues that run on the weekends. If you need your T3D UDB limits changed, please contact Mike Ess.
New Fortran Compiler
An upgraded version of the cf77 compiler is available on Denali with the paths:

  /mpp/bin/cft77new and /mpp/bin/cf77new

For the default version we have:

  /mpp/bin/cf77 -V
  Cray CF77_M   Version 6.0.4.1 (6.59)     05/25/95 13:36:39
  Cray GPP_M    Version 6.0.4.1 (6.16)     05/25/95 13:36:39
  Cray CFT77_M  Version 6.2.0.4 (227918)   05/25/95 13:36:39

and for this new version:

  /mpp/bin/cf77new -V
  Cray CF77_M   Version 6.0.4.1 (6.59)     05/25/95 13:37:26
  Cray GPP_M    Version 6.0.4.1 (6.16)     05/25/95 13:37:26
  Cray CFT77_M  Version 6.2.0.9 (259228)   05/25/95 13:37:27

This new compiler fixes a potential race condition in shared memory accesses and also fixes an inlining problem with the F90 intrinsics MINLOC and MAXLOC.
This compiler will become the default after we finish testing it and users will be notified before that happens. I encourage users to try this compiler before it becomes the default.
Random Number Generation on the T3D and YMP
In newsletter #29 (3/31/95), I announced the availability of benchlib on the ARSC T3D. The sources for these libraries are available on the ARSC ftp server in the file:

  pub/submissions/libbnch.tar.Z

The compiled libraries are also available on Denali in:

  /usr/local/examples/mpp/lib/lib_32.a
  /usr/local/examples/mpp/lib/lib_scalar.a
  /usr/local/examples/mpp/lib/lib_util.a
  /usr/local/examples/mpp/lib/lib_random.a
  /usr/local/examples/mpp/lib/lib_tri.a
  /usr/local/examples/mpp/lib/lib_vect.a

and the sources are available in: /usr/local/examples/mpp/src. In previous newsletters, I've described the contents of some of the libraries:

  #30 (4/7/95)   the "pref" routine of lib_util.a
  #33 (4/28/95)  the fast scalar math routines in lib_scalar.a
  #34 (5/05/95)  the fast vector math routines in lib_vector.a
  #35 (5/12/95)  the tridiagonal solvers in lib_tri.a

In this newsletter, I will describe the routines in lib_random.a and compare them to the other random number generators on the T3D and YMP. This is the last library from benchlib. I welcome any user comment or experience with these libraries and I will pass it on to readers of the ARSC T3D newsletter.
Random Number Generators
Of course, a 'random' generator doesn't actually produce random numbers but a sequence of pseudorandom numbers that has the characteristics of a sequence of random numbers. These sequences are necessarily reproducible so that computer experiments can be run over and over. As in most areas of computing, there is always a tradeoff between speed and quality, and so it is with these pseudorandom number generators (RNGs). Speed is the easiest to measure, and that is what is presented here. (Analyzing the quality of their random sequences is left for some Ph.D. thesis.)

On the T3D there are 3 available random number generators:
rand: rand() is supplied with most implementations of C in libc.a. It usually produces a 16 bit integer that can be converted to a double in the range 0.0 to 1.0, i.e.:

  random_real = rand() / (double)RAND_MAX;

where RAND_MAX is defined in <stdlib.h>. Because only 16 bits can change from call to call, it's usually not considered "random" enough, but its implementation is probably the same on all machines. It is the same on both the YMP and the T3D. There is a man page on Denali that describes rand(). (The division to obtain a random real number is not the same on each machine.)

ranf: RANF is the random number generator on the YMP. It exists in both scalar and vector versions in libm.a and is written in highly optimized assembly language. This routine is described in a man page on Denali, and in that manpage there is a Fortran version that mimics the assembly language. That Fortran version does not run on the T3D because of differences in normalizing floating point numbers, but the T3D does have a version in /mpp/lib/libm.a that produces results similar to those on the YMP. It is a little inconsistent that the manpage shared by the YMP and T3D describes the function with a program that runs only on the YMP.

rantom: The versions in benchlib's lib_random.a are different from both of the above options but have been written for FAST execution on the T3D processor. lib_random.a contains both Fortran and assembly language versions, and a manpage describing the algorithm and its speed is in: /usr/local/examples/mpp/src/random
Timing Routines
Below is the program I used to time the T3D routines:

    #include <stdio.h>
    #include <stdlib.h>

    main()
    {
      /* double, not int: the generators return reals between 0.0 and 1.0 */
      double a[ 1000000 ], b[ 1000000 ], c[ 1000000 ], d[ 1000000 ];
      int nlog, n, i;
      double t1, second(), t2, t3, t4;
      int rand();
      fortran double RANF();
      fortran double RANTOM();
      fortran double RANTOMS();  /* declaration assumed to match RANTOM */
      double denom;

      denom = (double)RAND_MAX;
      printf( " RAND_MAX = %d %f\n", RAND_MAX, denom );
      n = 1;
      for( nlog = 0; nlog < 7; nlog++ ) {
        /* indices run 0..n-1 so that n = 1000000 stays inside the arrays */
        t1 = second();
        for( i = 0; i < n; i++ ) {
          a[ i ] = rand() / denom;
        }
        t1 = second() - t1;
        t2 = second();
        for( i = 0; i < n; i++ ) {
          b[ i ] = RANF();
        }
        t2 = second() - t2;
        t3 = second();
        for( i = 0; i < n; i++ ) {
          c[ i ] = RANTOM();
        }
        t3 = second() - t3;
        t4 = second();
        for( i = 0; i < n; i++ ) {
          d[ i ] = RANTOMS();
        }
        t4 = second() - t4;
        /* each time is followed by its rate in millions of numbers per second */
        printf( "%3d %10d %10.6f %4.1f %10.6f %4.1f %10.6f %4.1f %10.6f %4.1f\n",
                nlog, n, t1, n/(t1*1000000), t2, n/(t2*1000000),
                t3, n/(t3*1000000), t4, n/(t4*1000000) );
        n = n * 10;
      }
    }

    /* real-time clock ticks converted to seconds (150 MHz clock) */
    double second()
    {
      fortran irtc();
      return( irtc( ) / 150000000.0 );
    }

The timing program used on the YMP was:

    #include <stdio.h>
    #include <stdlib.h>

    main()
    {
      double a[ 1000000 ], b[ 1000000 ], c[ 1000000 ], d[ 1000000 ];
      int nlog, n, i;
      double t1, SECOND(), t2, t3, t4;
      int rand();
      fortran double ranf();
      fortran double RANFF();    /* Fortran version of ranf; declaration assumed */
      fortran double RANTOM();
      fortran double SECOND();
      double denom;
      int zero = 0;

      denom = (double)RAND_MAX;
      printf( " RAND_MAX = %d %f\n", RAND_MAX, denom );
      n = 1;
      for( nlog = 0; nlog < 7; nlog++ ) {
        t1 = SECOND();
        for( i = 0; i < n; i++ ) {
          a[ i ] = rand() / denom;
        }
        t1 = SECOND() - t1;
        t2 = SECOND();
        for( i = 0; i < n; i++ ) {
          b[ i ] = RANFF();
        }
        t2 = SECOND() - t2;
        RANSET( &zero );         /* reset the seed so c and d get the same sequence */
        t3 = SECOND();
        for( i = 0; i < n; i++ ) {
          c[ i ] = ranf();
        }
        t3 = SECOND() - t3;
        RANSET( &zero );
        t4 = SECOND();
        for( i = 0; i < n; i++ ) {
          d[ i ] = _ranf();      /* not declared in the original listing */
        }
        t4 = SECOND() - t4;
        /* the scalar and vector versions should produce identical values */
        for( i = 0; i < n; i++ ) {
          if( c[ i ] != d[ i ] ) {
            printf( "diff in ranf %d %f %f\n", i, c[ i ], d[ i ] );
          }
        }
        printf( "%3d %10d %10.6f %4.1f %10.6f %4.1f %10.6f %4.1f %10.6f %4.1f\n",
                nlog, n, t1, n/(t1*1000000), t2, n/(t2*1000000),
                t3, n/(t3*1000000), t4, n/(t4*1000000) );
        n = n * 10;
      }
    }
Timing Results
Usually the results of an RNG are given in terms of millions of random numbers per second, and that is how I've arranged the table below. I also like to time a loop's worth of results and then divide by the length of the loop. This gives some feel for the overhead of the loop compared to the work of the loop body and also shows the asymptotic speed for a large number of calls.

      The speed of random generators on the T3D and YMP
      ==================================================
          (in millions of random numbers per second)

  T3D routines:

    RNG ->     rand()     ranf()    rantom()    rantom()
                                    Fortran     Assembler
    loops
    -------    ------     ------    --------    ---------
          1      0.2        0.2        0.3         0.3
         10      1.1        0.9        1.1         1.5
        100      2.3        1.2        1.5         2.3
       1000      2.6        1.3        1.6         2.4
      10000      2.6        1.3        1.6         2.5
     100000      2.6        1.3        1.6         2.4
    1000000      2.6        1.3        1.6         2.4

  YMP routines:

    RNG ->     rand()     ranf()     ranf() library routines
                          Fortran     Scalar      Vector
    loops
    -------    ------     -------    --------    --------
          1      0.2        0.1         0.1         0.2
         10      0.6        0.3         0.6         1.5
        100      0.8        0.3         0.9        10.5
       1000      0.9        0.3         0.9        18.5
      10000      0.9        0.3         0.9        19.3
     100000      0.9        0.3         0.9        19.1
    1000000      0.9        0.3         0.9        19.4

Observations:
 Rand() runs faster on a single PE of the T3D than on the YMP. The price of portability often obscures performance differences.
 The difference between rand() and ranf() on both machines shows that the quality of a random sequence does not come without cost.
 The difference between the Fortran and assembler versions of rantom() on the T3D is smaller than the difference between the ranf() versions on the YMP, which may be a sign that the days of assembly language writing are on the wane.

 The last three columns show a range of YMP performance, all computing the same sequence of random numbers. The performance follows the effort:
   Fortran -> assembler -> vectorized assembler
 The last two loops for the YMP timing program show: "What a difference an underscore makes!" (The underscore invokes the vectorized version.)
List of Differences Between T3D and YMP
The current list of differences between the T3D and the YMP is:
 Data type sizes are not the same (Newsletter #5)
 Uninitialized variables are different (Newsletter #6)
 The effect of the -a static compiler switch (Newsletter #7)
 There is no GETENV on the T3D (Newsletter #8)
 Missing routine SMACH on T3D (Newsletter #9)
 Different Arithmetics (Newsletter #9)
 Different clock granularities for gettimeofday (Newsletter #11)
 Restrictions on record length for direct I/O files (Newsletter #19)
 Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
 Missing Linpack and Eispack routines in libsci (Newsletter #25)
 F90 manual for YMP, no manual for T3D (Newsletter #31)
 RANF() and its manpage differ between machines (Newsletter #37)
Current Editors:
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020

Email Subscriptions:
Subscribe to (or unsubscribe from) the email edition of the ARSC HPC Users' Newsletter.

Back issues of the ASCII email edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.