ARSC T3D Users' Newsletter 36, May 17, 1995
New T3D Batch Queues
The T3D batch queues were changed on May 16, 1995. The current T3D queues are:
    16pe_24h   1 job using at most  16 PEs for 24 hours
    32pe_24h   1 job using at most  32 PEs for 24 hours
    64pe_24h   1 job using at most  64 PEs for 24 hours
    64pe_10m   1 job using at most  64 PEs for 10 minutes
    128pe_5m   1 job using at most 128 PEs for  5 minutes
There is one additional queue that is enabled on Friday at 6:00 PM and disabled at 4:00 AM on Sunday:
    128pe_8h   1 job using at most 128 PEs for  8 hours
A request made to these queues will be run as soon as enough PEs are available to satisfy the request. The intent of this change is to provide more production access to users who are moving from development work to production runs. We will be closely monitoring how this new queue structure is working and we may need to modify it in the future. Please contact Mike Ess if you have any concerns about the batch queues.
User's UDBSEE Limits
Most T3D users currently have a limit of 32 PEs for batch access. Users can check their limits with the udbsee command:
    udbsee | grep jpelimit
The output shows the limits for interactive (i) and batch (b) use. For example:
    jpelimit[b] :32:
    jpelimit[i] :8:
If your batch PE limit is too small to access these new NQS queues and you would like to use them, please contact Mike Ess, either by phone at 907-474-5404 or email to firstname.lastname@example.org, to have your PE batch limits increased.
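As an illustration of reading that output, here is a minimal Python sketch that parses the jpelimit lines and lists the queues whose full PE count fits within a batch limit. The queue names and sizes come from the list above. The helper names (parse_jpelimits, usable_queues) are hypothetical, as is the rule that a queue is "usable" only when its PE count does not exceed the batch limit. The sketch works on saved udbsee output and does not run on the T3D itself.

```python
import re

# Queue names and PE counts as listed in this newsletter.
QUEUE_PES = {
    "16pe_24h": 16,
    "32pe_24h": 32,
    "64pe_24h": 64,
    "64pe_10m": 64,
    "128pe_5m": 128,
    "128pe_8h": 128,   # weekend-only queue
}

def parse_jpelimits(text):
    """Return e.g. {'b': 32, 'i': 8} from lines like 'jpelimit[b] :32:'."""
    limits = {}
    for kind, value in re.findall(r"jpelimit\[([bi])\]\s*:(\d+):", text):
        limits[kind] = int(value)
    return limits

def usable_queues(batch_limit):
    """Queues whose full PE count fits within batch_limit (assumed rule)."""
    return [name for name, pes in QUEUE_PES.items() if pes <= batch_limit]

# Sample text copied from the example output above.
sample = "jpelimit[b] :32:\njpelimit[i] :8:\n"
limits = parse_jpelimits(sample)
print(limits)
print(usable_queues(limits["b"]))
```

With the sample limits above, only the 16-PE and 32-PE queues pass this check, which matches the advice to ask for a larger batch limit before using the bigger queues.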
Users can query the NQS batch system with the command:
    qstat -a
to see what other NQS T3D jobs are scheduled to run on the T3D. The utility mppmon is available to see what jobs are currently running on the T3D. T3D jobs are scheduled on a "first fit" basis and run to completion without interruption.
A Barrier Routine with a Fixed Delay
In developing code for the T3D, it often happens that not all PEs reach a barrier point. When this happens the program appears to hang: the PEs that have reached the barrier spin while they wait, and the PE that has not reached the barrier holds everyone up.
One of our users, Dr. Alan Wallcraft, a scientist with the Naval Research Center in Stennis, Mississippi, and I needed a barrier function that waits at most a specified number of seconds. If the delay is exceeded, the T3D job is aborted, and the user can tell which PEs reached the barrier and which one(s) were holding up the show.
Below are two implementations of this "barrier with delay". The first uses PVM and the SET_BARRIER/TEST_BARRIER functions, and can be used with Fortran 90. The second is written in Craft Fortran. A driver program and the complete source are listed for each. (These routines are also available in /usr/local/examples/mpp/src as barrier1.f and barrier2.f.)
      program test
      real a( 100000 )
      intrinsic irtc
      include '/usr/include/mpp/fpvm3.h'
c
      call pvmfmytid(itid)
      call pvmfgetpe(itid, mype)
c
      irtc0 = irtc()
c
c     loop on idelay.
c
      do idelay= 5,1,-1
c
c        generate some unequal size tasks
c        (a holds 100000 elements, so this driver assumes at most 20 PEs)
c
         do i = 1, 5000 * (mype+1)
            a(i) = sqrt( real(i+idelay)**3)
         enddo
         do k= 1,35
            do i = 2, 5000 * (mype+1)
               a(i) = a(i-1) + a(i) + sqrt( real(i)**3 )
            enddo
         enddo
c
c        call a barrier that aborts if any processor waits more than
c        idelay seconds
c
         call debug_barrier(idelay)
         if (mype.eq.0) then
            write(6,*) 'idelay=',idelay,' ok at ',
     +                 (irtc()-irtc0)/150000000.0 ,' sec'
            call flush(6)
         endif
c
c        a test that is always .false., to prevent optimizing away a(:)
c
         if (a(5000*(mype+1)).eq.-999.9) then
            write(6,*) a(1),a(99),a(5000*(mype+1))
         endif
      enddo
      end
      SUBROUTINE DEBUG_BARRIER(IDELAY)
      IMPLICIT NONE
      INTEGER IDELAY
C
C     A VERSION OF BARRIER THAT ABORTS AFTER IDELAY SECONDS.
C
      INTRINSIC IRTC
      INTEGER   IRTC
      LOGICAL   TEST_BARRIER
C
      INTEGER ITICK,NTICK, ITID,MYPE
C
      INCLUDE '/usr/include/mpp/fpvm3.h'
C
      CALL SET_BARRIER()
      IF (TEST_BARRIER()) THEN
         RETURN
      ENDIF
C
      ITICK = IRTC()
      NTICK = ITICK + IDELAY*150000000
C
      DO WHILE (ITICK.LE.NTICK)
         IF (TEST_BARRIER()) THEN
            RETURN
         ELSE
            ITICK = IRTC()
         ENDIF
      ENDDO
C
C     ONLY GET HERE AFTER IDELAY SECONDS.
C
      CALL PVMFMYTID(ITID)
      CALL PVMFGETPE(ITID, MYPE)
      WRITE(0,*) 'ERROR - DEBUG_BARRIER(',IDELAY,
     +           ') TIMED OUT ON PE ',MYPE
      CALL FLUSH(0)
      CALL ABORT()
      STOP
C     END OF DEBUG_BARRIER.
      END
A version using Craft Fortran:
      program test
      real a( 100000 )
      intrinsic irtc
      intrinsic my_pe
c
c     loop on idelay.
c
      call mybarriersetup
c
      mype = my_pe()
      irtc0 = irtc()
c
      do idelay= 5,1,-1
c
c        generate some unequal size tasks
c        (a holds 100000 elements, so this driver assumes at most 20 PEs)
c
         do i = 1, 5000 * (mype+1)
            a(i) = sqrt( real(i+idelay)**3)
         enddo
         do k= 1,35
            do i = 2, 5000 * (mype+1)
               a(i) = a(i-1) + a(i) + sqrt( real(i)**3 )
            enddo
         enddo
c
c        call a barrier that aborts if any processor waits more than
c        idelay seconds
c
         delay = idelay
         call mybarrier(delay)
         if (mype.eq.0) then
            write(6,*) 'idelay=',idelay,' ok at ',
     +                 (irtc()-irtc0)/150000000.0 ,' sec'
            call flush(6)
         endif
         if (a(5000*(mype+1)).eq.-999.9) then
            write(6,*) a(1),a(99),a(5000*(mype+1))
         endif
      enddo
      end
      subroutine mybarrier(delay)
c
c this subroutine is a replacement for the standard call barrier routine
c if any processor waits at a barrier more than delay seconds a call to
c abort is made and all PEs dump core
c
      integer flags( 0:127 )
      common /mine/ flags
CDIR$ shared flags(:block)
      intrinsic my_pe
      mype = my_pe()
      flags( mype ) = flags( mype ) + 1
      t1 = real( irtc( ) ) / 150000000.0
   10 continue
      et = real( irtc( ) ) / 150000000.0 - t1
      if( et .gt. delay ) then
         write(0,*) 'error - mybarrier(',delay,
     +              ') timed out on pe ',mype
         do i = 0, N$PES - 1
            if( flags( i ) .lt. flags( mype ) ) then
               write(0,*) 'pe ',i,' not at the barrier'
            endif
         enddo
         call flush(0)
         call abort()
      endif
      do i = 0, N$PES - 1
         if( flags( i ) .lt. flags( mype ) ) then    ! barrier
            goto 10
         endif
      enddo
      end
      subroutine mybarriersetup()
c
c an initialization routine for the status flags
c
      integer flags( 0:127 )
      common /mine/ flags
CDIR$ shared flags(:block)
      intrinsic my_pe
      flags( my_pe() ) = 0
      call barrier()                   ! make sure we're in sync
      end
With the 1.2 PE version of totalview, a user can invoke these test programs as:
    totalview a.out
When executed from within totalview, the progress of each PE is shown at the time of the abort initiated by the one PE that has waited at the barrier more than "delay" seconds.
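For readers who want to experiment with the idea away from the T3D, the sketch below shows the same "barrier with delay" pattern using Python threads in place of PEs. It is an analogy under stated assumptions, not a port of either Fortran routine: threading.Barrier.wait(timeout=...) stands in for the spin loop on the real-time clock, a threading.Event stands in for the call to abort, and the arrived list plays the role of the shared flags array. The function name run_with_debug_barrier and its arguments are hypothetical.

```python
import threading

def run_with_debug_barrier(work_seconds, delay):
    """Run one thread per entry of work_seconds. Each thread "works" for its
    number of seconds, then waits at a barrier for at most `delay` seconds.

    Returns a sorted list of (worker, missing_workers) reports, one for each
    worker that gave up waiting. An empty list means the barrier completed.
    """
    n = len(work_seconds)
    arrived = [0] * n                  # analogue of the shared flags array
    barrier = threading.Barrier(n)
    aborted = threading.Event()        # stands in for "call abort()"
    reports = []
    lock = threading.Lock()

    def debug_barrier(me):
        arrived[me] += 1               # only thread `me` writes this slot
        try:
            barrier.wait(timeout=delay)
        except threading.BrokenBarrierError:
            # This waiter (or another one) timed out: list who is behind.
            missing = [i for i in range(n) if arrived[i] < arrived[me]]
            with lock:
                reports.append((me, missing))
            aborted.set()

    def worker(me):
        # Event.wait doubles as an interruptible sleep: it returns True
        # early if another worker has already aborted the run.
        if aborted.wait(work_seconds[me]):
            return
        debug_barrier(me)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(reports)

if __name__ == "__main__":
    # All workers arrive promptly, so the barrier completes.
    print(run_with_debug_barrier([0.0, 0.0, 0.0, 0.0], delay=2.0))
    # Worker 3 is far too slow, so the others time out and report it.
    print(run_with_debug_barrier([0.0, 0.0, 0.0, 5.0], delay=0.5))
```

As in the Craft version, each waiter that times out reports which participants have a lower arrival count than its own, so the slow participant is named even though it never reaches the barrier.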
If you know of similar techniques on the T3D please send them to me and I'll pass them on to the readers of the ARSC T3D newsletter.
List of Differences Between T3D and Y-MP
The current list of differences between the T3D and the Y-MP is:
- Data type sizes are not the same (Newsletter #5)
- Uninitialized variables are different (Newsletter #6)
- The effect of the -a static compiler switch (Newsletter #7)
- There is no GETENV on the T3D (Newsletter #8)
- Missing routine SMACH on T3D (Newsletter #9)
- Different Arithmetics (Newsletter #9)
- Different clock granularities for gettimeofday (Newsletter #11)
- Restrictions on record length for direct I/O files (Newsletter #19)
- Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
- Missing Linpack and Eispack routines in libsci (Newsletter #25)
- F90 manual for Y-MP, no manual for T3D (Newsletter #31)
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.