ARSC T3D Users' Newsletter 65, December 15, 1995
NQS - T3D Miscommunication
At ARSC, we recently had an interesting situation where PEs were available but couldn't be used. At that time, the T3D mppview display looked something like this:
 _________________________________________________________________________
 \    .        .        .        .      smith    smith    smith    smith  \
  \    .        .        .        .      smith    smith    smith    smith  \
   \________________________________________________________________________\
 _________________________________________________________________________
 \    .        .        .        .      smith    smith    smith    smith  \
  \    .        .        .        .      smith    smith    smith    smith  \
   \________________________________________________________________________\
 _________________________________________________________________________
 \   ess      ess      ess      ess       .        .        .        .    \
  \   ess      ess       .        .        .        .        .        .    \
   \________________________________________________________________________\
 _________________________________________________________________________
 \   ess      ess      ess      ess       .        .        .        .    \
  \   ess      ess       .        .        .        .        .        .    \
   \________________________________________________________________________\

A user then submitted a 64 PE job through the NQS queues, but his job did not run. From NQS's point of view (as shown by the qstat -a command), the 64 PE job was running! But for the T3D, the job could not run because the two idle blocks of 32 PEs were not 'torus contiguous' (a term I just made up).
NQS seems only to keep track of how many PEs are in use and not whether they are arranged in blocks that can be used on the T3D. This miscommunication between NQS and the T3D is a known problem but there's not much that can be done. What made this case even worse was that the 64 PE job not only could not run but was blocking smaller 2 and 8 PE NQS jobs from running also.
At ARSC, I try to watch out for these situations but there is no operator watching the T3D and there's not much an operator could do anyway. If you feel that the T3D is in this situation please call Mike Ess at 907-474-5405 or send e-mail and I'll see what I can do. In the above case, if either ess or smith stopped their jobs then the 64 PE job could find a 'torus contiguous' block of PEs and everything would be OK.
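The mismatch described above, where a scheduler counts idle PEs while the machine needs them to be adjacent, can be illustrated with a small Python sketch. This is a toy model under loud assumptions: the PEs sit on a one-dimensional ring rather than the T3D's three-dimensional torus, the job layout is hypothetical, and the function names are made up for illustration. It does not reproduce how NQS or the T3D actually allocate partitions.

```python
# Toy model: 128 PEs on a 1-D ring (the real T3D is a 3-D torus with
# its own partition-shape rules, which this sketch does not model).
# None marks an idle PE; a string marks the owner of a busy PE.

def free_pe_count(pes):
    """What a count-only scheduler looks at: the number of idle PEs."""
    return sum(1 for owner in pes if owner is None)

def largest_free_run(pes):
    """Longest run of adjacent idle PEs, allowing wraparound on the ring."""
    n = len(pes)
    if all(owner is None for owner in pes):
        return n
    best = run = 0
    # Scanning the list twice lets a run continue across the wraparound.
    for owner in pes + pes:
        run = run + 1 if owner is None else 0
        best = max(best, run)
    return best

def count_check(pes, request):
    """Count-only test: are there enough idle PEs in total?"""
    return free_pe_count(pes) >= request

def placement_check(pes, request):
    """Contiguity test: is there one idle run big enough?"""
    return largest_free_run(pes) >= request

# Hypothetical layout: two idle blocks of 32 PEs separated by busy jobs.
pes = [None] * 32 + ["smith"] * 32 + [None] * 32 + ["ess"] * 32

print("idle PEs:", free_pe_count(pes))
print("largest contiguous idle run:", largest_free_run(pes))
print("count check for 64 PEs:", count_check(pes, 64))
print("placement check for 64 PEs:", placement_check(pes, 64))
```

In this toy layout the count check accepts a 64 PE request while the placement check rejects it. Freeing either busy block joins the idle PEs into one run, and the two checks then agree.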
HPF and CRI
The following article, which appeared in HPCwire, describes the recent announcement made by the Portland Group and CRI at Supercomputing '95:
> Subject: 7573 Cray Research Chooses The Portland Group's HPF Product 12.06.95
>
> Cray Research Chooses The Portland Group's HPF Product 12.06.95
> NEWS BRIEFS LIVEwire
> =============================================================================
>
> San Diego, Calif. -- Cray Research and The Portland Group Inc. (PGI)
> announced at Supercomputing '95 that they have signed a letter of intent for
> Cray to offer PGI's pghpf High Performance Fortran (HPF) compiler on all of
> its computing systems, including the recently announced CRAY T3E(TM) scalable
> parallel system.
>
> "We are looking forward to offering our customers and prospects an HPF
> product," said Mike Booth, vice president of Cray's software division. "Our
> corporate strategy is to continue to leverage leading technologies from other
> companies, while applying Cray core competencies in parallel computing. We
> were seeing interest from our customers in HPF and conducted a very thorough
> technical evaluation. PGI's product clearly emerged as the HPF of choice and
> we believe that this product can provide ease-of-use and robustness to our
> users wanting HPF."
>
> HPF extends ISO/ANSI Fortran 90 to support implicit data parallel
> programming. It provides all the power of Fortran 90, including array syntax,
> array intrinsics, and dynamic storage allocation. In addition, HPF directives
> support the distribution of data among processors, the alignment of data
> objects to one another, and assertion of the independence of parallel loop
> iterations. HPF is the de facto standard for implicit parallel programming
> for shared- and distributed-memory systems.
>
> PGI's pghpf product allows users to run applications unchanged on Cray
> systems ranging from the CRAY J90(TM) low-cost compact supercomputer to the
> company's high-end CRAY T90 system and the company's scalable parallel
> systems, the current CRAY T3D and the new CRAY T3E systems. The pghpf
> compiler has already proven effective on several large applications in the
> areas of fluid flow, wave simulation, particle simulation, and 3D reservoir
> modeling.
>
> "This is a real milestone for HPF as well as PGI," said Douglas Miles,
> director of marketing at PGI. "We are pleased to see Cray Research offer HPF
> on its systems and that our product, after rigorous analysis, emerged as the
> leader for Cray. We look forward to seeing our product move into Cray's
> customer base and to serving the High Performance Fortran needs of these
> users. The fact that Cray, a leader in high-performance parallel computing,
> has selected pghpf is a strong vote of confidence in our HPF technology."
>
> The PGI HPF compiler is currently available through PGI on the CRAY CS6400
> symmetric multi-processing server and the CRAY T3D system. Users of pghpf on
> the CRAY T3D system can port and develop HPF applications that will run
> unchanged on the soon-to-be-available CRAY T3E. Cray said it expects to
> directly offer pghpf on these systems and on Cray's parallel vector
> supercomputers beginning in early 1996. pghpf will be available on the CRAY
> T3E when volume shipments begin next year.
>
> PGI is offering product demonstrations of pghpf and related HPF program
> development tools in its booth (#911).
>
> HPCwire has released all copyright restrictions for this item. Please feel
> free to distribute this article to your friends and colleagues. For a free
> trial subscription, send e-mail to firstname.lastname@example.org.
Using the PGI HPF Compiler at ARSC
Since May of this year, I have been working with the PGI HPF compiler for the T3D. Below is a description of how to access the 2.0 version of this compiler on Denali. PGI has given ARSC permission to use an evaluation copy and PGI and I would like to hear about your experiences.
accessing the PGI HPF compilers
Add these lines to the end of your .cshrc file:
setenv PGI /tmp/ess/pgi
setenv MANPATH "$MANPATH":/tmp/ess/pgi/man
setenv PATH "$PATH":/tmp/ess/pgi/t3d/bin
setenv LM_LICENSE_FILE /tmp/ess/pgi/license.dat
A typical makefile for using pghpf might be:
F77 = /mpp/bin/cf77
HPF = pghpf

.SUFFIXES: .o .hpf .f

.hpf.o:
	$(HPF) -c -Minfo -Mautopar -Mkeepftn $<

.f.o:
	$(F77) -c $<

smooth: smooth.o second.o
	/mpp/bin/mppldr -o smooth smooth.o second.o
	(export MPP_NPES=1 ; smooth )

hpf: hpf.o second.o
	pghpf -o hpf hpf.o second.o
	(export MPP_NPES=8 ; hpf )

clean:
	-rm -r *.o smooth hpf.f hpf mppcore

The command 'make hpf' produces:
12, 2 FORALLs generated
39, 1 FORALL generated
54, 1 FORALL generated
70, SUM reduction generated
/mpp/bin/cf77 -c second.f
pghpf -o hpf hpf.o second.o
Linking:
(export MPP_NPES=8 ; hpf )
  221  0.14845792E-07  0.000070  0.295780  0.339684

Several compiler switches are immediately useful:
-Minfo     gives a summary of the transformations that were done, such as the FORALLs generated
-Mautopar  tells pghpf to generate code for multiple processors
-Mkeepftn  has pghpf keep the parallelized Fortran 77 program it generates so that the user can examine it
The program smooth.f (see below) is the same program discussed in the Fall CUG article "An Evaluation of High Performance Fortran Products on the Cray-T3D." Besides being an HPF compiler, pghpf can be used to parallelize Fortran 77 programs with minimal effort. A diff between smooth.f and hpf.hpf shows:
4a5,6
> !HPF$ PROCESSORS NUM_PROC(8)
> !HPF$ DISTRIBUTE (*,BLOCK) ONTO NUM_PROC :: P0, P1
35a38
> !HPF$ INDEPENDENT
49a53
> !HPF$ INDEPENDENT

So these minimal changes are:
- describe the number of processors
- specify which arrays are to be distributed
- specify which DO loops have independent iterations
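To make the DISTRIBUTE (*,BLOCK) directive more concrete, the following Python sketch shows which processor would own each column of a 100-column array spread over 8 processors. It assumes the standard HPF definition of BLOCK, in which each processor receives a contiguous chunk of ceiling(N/P) indices. The function names are hypothetical, and the layout pghpf actually produces on the T3D was not checked here.

```python
# Sketch of an HPF-style BLOCK mapping for the second array dimension.
# Assumption: block size is ceiling(n / p), as in the HPF definition of
# BLOCK; the names below are illustrative and not part of pghpf.

def block_size(n, p):
    """Number of consecutive indices assigned to each processor."""
    return -(-n // p)  # integer ceiling of n / p

def block_owner(j, n, p):
    """Zero-based processor that owns 1-based index j."""
    return (j - 1) // block_size(n, p)

def columns_per_processor(n, p):
    """How many of the n indices land on each of the p processors."""
    counts = [0] * p
    for j in range(1, n + 1):
        counts[block_owner(j, n, p)] += 1
    return counts

# Values taken from smooth.f (N = 100) and the PROCESSORS directive (8).
N, NUM_PROC = 100, 8
print("block size:", block_size(N, NUM_PROC))
print("columns per processor:", columns_per_processor(N, NUM_PROC))
```

Under this assumption the first seven processors each hold 13 whole columns and the last holds the remaining 9. Because the first dimension is not distributed, the stencil in smooth.f would need data from another processor only for the columns at a block boundary.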
Table 1
Times (seconds) for each of the 3 phases of the relaxation example on the T3D

                initialization   relaxation   residual
smooth.f           0.000571       1.343860    0.448627
hpf.hpf, 1PE       0.000494       1.456697    1.002058
hpf.hpf, 2PE       0.000283       0.783687    0.600970
hpf.hpf, 4PE       0.000155       0.454907    0.417283
hpf.hpf, 8PE       0.000070       0.296023    0.338785
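One way to read Table 1 is to convert the times into speedup and parallel efficiency. The Python sketch below does that arithmetic using the hpf.hpf times copied from the table, with the 1 PE hpf.hpf run as the baseline. The choice of baseline and the function names are assumptions made for illustration; measuring against smooth.f instead would give different ratios.

```python
# Times in seconds copied from Table 1 for hpf.hpf:
# (initialization, relaxation, residual) keyed by number of PEs.
times = {
    1: (0.000494, 1.456697, 1.002058),
    2: (0.000283, 0.783687, 0.600970),
    4: (0.000155, 0.454907, 0.417283),
    8: (0.000070, 0.296023, 0.338785),
}
phases = ("initialization", "relaxation", "residual")

def speedup(npes, phase):
    """Time on 1 PE divided by time on npes PEs for the given phase index."""
    return times[1][phase] / times[npes][phase]

def efficiency(npes, phase):
    """Speedup divided by the number of PEs (1.0 would be ideal scaling)."""
    return speedup(npes, phase) / npes

for npes in sorted(times):
    cells = []
    for idx, name in enumerate(phases):
        s = speedup(npes, idx)
        e = efficiency(npes, idx)
        cells.append(f"{name} {s:5.2f}x ({e:4.0%})")
    print(f"{npes} PE: " + ", ".join(cells))
```

On 8 PEs the relaxation phase comes out at a speedup of about 4.9 and the residual phase at about 3.0, so the residual phase scales noticeably worse than the relaxation phase in these runs.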
There is a lot of documentation that goes with the pghpf compiler. Potential users should start with:

man pghpf

There is also a host of manpages for individual HPF functions:
ls /tmp/ess/pgi/man/man3
all_prefix.3f         iall_prefix.3f        minval_prefix.3f
all_scatter.3f        iall_scatter.3f       minval_scatter.3f
all_suffix.3f         iall_suffix.3f        minval_suffix.3f
any_prefix.3f         iany.3f               number_of_processors.3f
any_scatter.3f        iany_prefix.3f        parity.3f
any_suffix.3f         iany_scatter.3f       parity_prefix.3f
copy_prefix.3f        iany_suffix.3f        parity_scatter.3f
copy_scatter.3f       ilen.3f               parity_suffix.3f
copy_suffix.3f        iparity.3f            popcnt.3f
count_prefix.3f       iparity_prefix.3f     poppar.3f
count_scatter.3f      iparity_scatter.3f    processors_shape.3f
count_suffix.3f       iparity_suffix.3f     product_prefix.3f
grade_down.3f         leadz.3f              product_scatter.3f
grade_up.3f           maxloc.3f             product_suffix.3f
hpf_alignment.3f      maxval_prefix.3f      sum_prefix.3f
hpf_distribution.3f   maxval_scatter.3f     sum_scatter.3f
hpf_template.3f       maxval_suffix.3f      sum_suffix.3f
iall.3f               minloc.3f

In the directory /tmp/ess/pgi/doc/hpf/html there is a whole collection of manuals and manpages available in html format:
drwxr-xr-x   2 ess      uaf         4096 Dec 11 16:01 faq
drwxr-xr-x   2 ess      uaf         4096 Dec 11 16:01 man1
drwxr-xr-x   2 ess      uaf         4096 Dec 11 16:01 man3
-r--r--r--   1 ess      uaf          964 Dec 11 16:01 pghpf.index.html
drwxr-xr-x   2 ess      uaf         4096 Dec 11 16:01 ref_manual
drwxr-xr-x   2 ess      uaf         4096 Dec 11 16:01 release_notes
drwxr-xr-x   2 ess      uaf         4096 Dec 11 16:01 users_guide

Be careful with these html documents as they can be big:
faq                  5 pages
release notes       33 pages
users_guide        137 pages
reference manual   263 pages

If you have any questions about using pghpf on Denali, please e-mail me your questions and I'll get an answer.
      parameter( M = 100, N = 100, MAXTIME = 1000 )
      real p0( M, N ), p1( M, N )
      real t1, second, tset, tupdate, terror, totupdate, toterror
      real slamch, error
      integer i, j, k
c
c set initial conditions domain and spike
c
      t1 = second( )
      do 20 j = 1, N
         do 10 i = 1, M
            p0( i, j ) = 0.0
            p1( i, j ) = 0.0
 10      continue
 20   continue
      tset = second( ) - t1
      p0( M / 2, N / 2 ) = 1.0
c
c time step loop
c
      totupdate = 0.0
      toterror = 0.0
      do 100 k = 1, MAXTIME
c
c smoothing stencil
c
         if( k/2*2 .ne. k ) then
            t1 = second()
c
c odd time step
c
            do 40 j = 2, N-1
               do 30 i = 2, M-1
                  p1(i,j) = ( p0(i+1,j)+p0(i-1,j)+p0(i,j+1)+p0(i,j-1)
     &                        + 4.0 * p0(i,j) ) / 8.0
 30            continue
 40         continue
            tupdate = second() - t1
         else
            t1 = second()
c
c even time step
c
            do 60 j = 2, N-1
               do 50 i = 2, M-1
                  p0(i,j) = ( p1(i+1,j)+p1(i-1,j)+p1(i,j+1)+p1(i,j-1)
     &                        + 4.0 * p1(i,j) ) / 8.0
 50            continue
 60         continue
            tupdate = second() - t1
         endif
c
c calculate change
c
         error = 0.0
         t1 = second()
         do 80 j = 2, N-1
            do 70 i = 2, M-1
               error = error + ( p0( i, j ) - p1( i, j ) ) ** 2
 70         continue
 80      continue
         terror = second( ) - t1
c        write( 6, 600 ) k, error, tupdate, terror
c
c stopping criteria
c error is less than the square root of the machine epsilon
c
         totupdate = totupdate + tupdate
         toterror = toterror + terror
         if( error .le. sqrt( slamch( 'e' ) ) ) goto 101
 100  continue
 101  continue
      write( 6, 600 ) k, error, tset, totupdate, toterror
 600  format( i5, e16.8, f10.6, f10.6, f10.6 )
      end