ARSC HPC Users' Newsletter 227, Aug 31, 2001
ARSC IBM Workshop, Sept 4-7
Here is the list of talks planned for the ARSC IBM Workshop next week. The final schedule will be available shortly, at:
Hands-on SP training conducted by IBM's ACTC will also be noted in the final schedule.
DoD Challenge Projects and Chemistry Applications on Tempest (16-way Power3 IBM SP) & Huinalu (2-way Pentium III Linux Cluster) at Maui High Performance Computing Center
James Newhouse, Maui High Performance Computing Center
Dr. James Newhouse will present case studies and performance comparisons from DoD Challenge Project codes such as Gamess ("New Materials Design") and Cobalt ("Analysis of Full Aircraft with Massive Separation Using Detached-Eddy Simulation"). He will also cover experiences with production codes such as Noam Bernstein's ("Atomistic Simulation of MEMS Devices via the Coupling of Length Scales") on the 16-way Power3 nodes, and with Cobalt, Gaussian and Gamess on Huinalu.
Clusters and IBM SPs
Frank Gilfeather, Albuquerque High Performance Computing Center
Dr. Frank Gilfeather will discuss current HPC initiatives and the state of cluster computing.
Bioinformatics at IBM
Peter Ungaro, IBM
Presentation of IBM initiatives related to BioInformatics.
Bioinformatics and Astrophysics
George Lake, Institute for Systems Biology
Factors Affecting Decision For A Distributed Advanced Cluster Computer System (DACCS)
Kevin Kennedy, Redstone Arsenal
This presentation will include a multi-DACCS comparison using the fluid dynamic flow solver Craft. It will address the technical issues involving computer performance, the Craft code's computational methodology, the run-time effects of using multiple processors, and the results for each test case. In addition to the computer performance recommendations, the computer cost and future supportability factors will be presented.
HPCMP Benchmarking Activities
Bill Ward, ERDC DSRC (High Performance Computing Modernization Program)
The objective of the Technology Insertion 2001 (TI-01) benchmark project was to provide the Department of Defense (DoD) High Performance Computing Modernization Program (HPCMP) with performance data on available high performance computing systems, the results of which would provide guidance for system procurement. To that end, a benchmark test package was designed, implemented, tested, and distributed to prospective vendors. The test package itself consists of synthetic tests to measure peak system performance, dedicated tests to measure peak single-application performance, and throughput tests to measure typical operational performance. User surveys and system utilization data were used to identify the most highly used codes as candidates for inclusion in the test package. Furthermore, the same utilization data were used to produce a system usage profile in order to construct two generic throughput tests representative of daily DSRC production workloads. Details of the design process and test package timing results from a number of Government-owned systems are presented.
Using Exact Solution Test Problems for System Validation, Algorithm Testing and Performance Tuning
Lorie Liebrock, Liebrock-Hicks Research
Test problems with known exact solutions provide a solid foundation for many areas of development in computational science. They can be used to validate the correct operation of new computer systems and to track down errors in such systems. They provide a quantitative measure of the difference between algorithms, rather than the traditional approach of making a new algorithm match the results of the old accepted one. They provide a basis for comparing performance of different algorithms on different machines, with the results in terms of efficacy, which combines both runtime and accuracy measures. These uses of exact solution test problems will be discussed. Further, a few exact solution test problems will be presented that span the spectrum from simple integration to material dynamics.
The SMS Library for Portable Parallelism
Guy Robinson & Kate Hedstrom, Arctic Region Supercomputing Center
Some classes of problems are computed using large rectangular arrays. For many of these, the optimal parallelization is to use domain decomposition into equal sized blocks. The Forecast System Laboratory (FSL) has devised a software library to ease the development of parallel programming for this class of problems. It sits as a software layer between the user code and the underlying MPI or SHMEM code. The goal is to allow portable code development with minimal overhead.
The SMS library has recently been ported to the IBM SP architecture, and the ROMS ocean model has been ported to use SMS. We will describe the SMS library and our experiences with it on the IBM.
Multilevel Parallelism: Case Studies and Lessons Learned
Daniel Duffy (speaker), Tom Oppe, Rebecca Fahey, and Mark Fahey, Computational Science and Engineering Group, U.S. Army Engineer Research and Development Center DSRC, Vicksburg, MS
A major focus of the Computational Science and Engineering (CS&E) Group at the U.S. Army Engineer Research and Development Center (ERDC) DoD Supercomputing Resource Center (DSRC) has been the migration of codes to the various platforms at the ERDC DSRC. Since each computer has unique properties that may be exploited in different ways, different parallel programming paradigms must be adopted in order to utilize the machines to their fullest potential. Recently, the combination of the Message Passing Interface (MPI) with OpenMP threads has resulted in multilevel parallel (MP) codes that run in a distributed environment while taking advantage of the shared memory within compute nodes. In many cases, the resulting MP programs offered a significant speedup compared with codes using MPI or OpenMP alone.
This presentation will discuss the benefits and pitfalls of multilevel parallelism using MPI combined with OpenMP threads. Simple representative codes along with three specific examples that the CS&E Group has worked on over the past year will be discussed. In particular, OpenMP directives have been included in the MPI versions of STWAVE, CGWAVE, and SARA-3D in order to significantly reduce the elapsed wall-clock time of the code.
Lodging and Local Attractions:
A Bug's Life, MPI, PVM, and MPI_PACKED
Local experience plus anecdotal evidence suggests that some message passing codes work fine on one platform and fall apart on others. Some synchronization errors might be hidden on a low latency platform, others might be hidden on a high latency platform, and there are issues of MPI implementation. For more, see the article, "Portable MPI?" in:
What follows is an example of a programming error that was hidden when the code was tested on the T3E, but quickly manifested itself on the SP. The bug, we think, is no longer with us.
The code was being converted from PVM to MPI. For correct operation, it required that pairs of messages sent from worker processes to a master process arrive as a pair.
The first version of the MPI code lacked the logic to guarantee this ordering, and it was possible for pairs of messages from different workers to get interleaved (or shuffled).
The pair of messages is:
- an offset into an array
- data to be stored in the array, offset by the value received in the first message
Here's the incorrect version. This snippet is from the code for the master process (the worker has a matching pair of MPI_Sends):
  MPI_Recv (&recv_datastart, 1, MPI_INT, MPI_ANY_SOURCE, RESULTS,
            MPI_COMM_WORLD, &status);
  MPI_Recv (data + recv_datastart, datasz, MPI_INT, MPI_ANY_SOURCE, RESULTS,
            MPI_COMM_WORLD, &status);
This shows the danger of using MPI_ANY_SOURCE. Here's the updated version, which uses one of the useful fields available in the MPI_Status structure. (The code for the workers did not require any change):
  MPI_Recv (&recv_datastart, 1, MPI_INT, MPI_ANY_SOURCE, RESULTS,
            MPI_COMM_WORLD, &status);
  worker_pe = status.MPI_SOURCE;
  MPI_Recv (data + recv_datastart, datasz, MPI_INT, worker_pe, RESULTS,
            MPI_COMM_WORLD, &status);
It would be comforting to use different tags for the two types of messages. It doesn't appear to be necessary, though, because MPI guarantees that messages from one process will be received in the order sent, and, at this point in the logic, there are no other message types possible from any of the workers.
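To make the failure concrete, here is a small MPI-free sketch in C of the master's receive loop. It is an illustration only, not the original code: the message queue, sim_recv, and run_master are hypothetical names, and the arrival order is an assumed worst case in which both workers' offsets reach the master before either worker's data. The simulation simply takes the first int of whatever message it matched; a real MPI library could instead report a truncation error when a long message meets a short receive.

```c
#include <assert.h>
#include <stddef.h>

#define ANY_SOURCE (-1)   /* stand-in for MPI_ANY_SOURCE */
#define DATASZ     4      /* ints per worker block (assumed for the sketch) */
#define NWORKERS   2

/* One message waiting at the master: sender, length, payload. */
struct msg {
    int source;
    int count;
    int payload[DATASZ];
    int consumed;
};

/* Take the earliest unconsumed message that matches `source`.
   Scanning in arrival order preserves per-sender ordering, which is
   the guarantee MPI gives for messages from one process. */
struct msg *sim_recv(struct msg *q, size_t n, int source)
{
    for (size_t i = 0; i < n; i++) {
        if (!q[i].consumed &&
            (source == ANY_SOURCE || q[i].source == source)) {
            q[i].consumed = 1;
            return &q[i];
        }
    }
    return NULL;
}

/* Run the master's loop over an interleaved arrival order.
   match_source == 0: second receive uses ANY_SOURCE (the bug).
   match_source == 1: second receive names the first message's sender.
   Returns 1 if every block landed intact at its own offset, else 0. */
int run_master(int match_source)
{
    struct msg q[] = {
        {1, 1,      {0},          0},   /* worker 1: offset 0 */
        {2, 1,      {DATASZ},     0},   /* worker 2: offset 4 */
        {1, DATASZ, {1, 1, 1, 1}, 0},   /* worker 1: data     */
        {2, DATASZ, {2, 2, 2, 2}, 0},   /* worker 2: data     */
    };
    const size_t nq = sizeof q / sizeof q[0];
    const int expected[NWORKERS * DATASZ] = {1, 1, 1, 1, 2, 2, 2, 2};
    int data[NWORKERS * DATASZ];

    for (int i = 0; i < NWORKERS * DATASZ; i++)
        data[i] = -1;

    for (int w = 0; w < NWORKERS; w++) {
        struct msg *first = sim_recv(q, nq, ANY_SOURCE);
        assert(first != NULL);
        /* Like the master, trust the first int of whatever arrived. */
        int offset = first->payload[0];

        struct msg *second =
            sim_recv(q, nq, match_source ? first->source : ANY_SOURCE);
        assert(second != NULL);
        if (offset < 0 || offset + second->count > NWORKERS * DATASZ)
            return 0;
        for (int i = 0; i < second->count; i++)
            data[offset + i] = second->payload[i];
    }

    for (int i = 0; i < NWORKERS * DATASZ; i++)
        if (data[i] != expected[i])
            return 0;
    return 1;
}
```

With this arrival order, run_master(0) pairs worker 1's offset with worker 2's offset message and the array ends up corrupted, while run_master(1) pairs each offset with the data from the same sender.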
A look at the original PVM logic may prove interesting. Using the standard PVM paradigm, the pair of messages was packed up, sent, and received as a unit. Thus, there was no chance of interleaving. The code used pvm_recv.
The call "pvm_recv (-1, -1)" accepts a message with any tag from any task. The message is unpacked with subsequent invocations of pvm_upk. (The workers, of course, are doing two symmetric pvm_pk's and a pvm_send.) Here's the logic used by the master process:
  bufid = pvm_recv (-1, -1);
  if ((err = pvm_bufinfo (bufid, &bytes, &type, &worker_tid)) != PvmOk)
    printf ("Error: master: pvm_bufinfo: %d\n", err);
  pvm_upklong (&recv_datastart, 1, 1);
  pvm_upklong (data + recv_datastart, datasz, 1);
Seeing the original PVM code, it's clear that a direct translation from PVM was actually available. MPI provides functions for packing and unpacking messages and the MPI_PACKED type for sending and receiving packed messages.
Coded up this way, here is the master's side of this exchange:
  MPI_Recv (packbuf, PACKBUFSZ, MPI_PACKED, MPI_ANY_SOURCE, RESULTS,
            MPI_COMM_WORLD, &status);
  packbuf_pos = 0;
  MPI_Unpack (packbuf, PACKBUFSZ, &packbuf_pos, &recv_datastart, 1,
              MPI_INT, MPI_COMM_WORLD);
  MPI_Unpack (packbuf, PACKBUFSZ, &packbuf_pos, data + recv_datastart, datasz,
              MPI_INT, MPI_COMM_WORLD);
Other advantages of the packed approach would have been a 2-fold reduction in the number of messages sent, possibly less time spent waiting, and increased bandwidth (since larger messages get better bandwidth--to a point). A disadvantage is the overhead of packing and unpacking.
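The packed exchange relies on a running position that is advanced by each pack or unpack call, so that the offset and the data travel in one buffer and cannot be separated. The sketch below shows that bookkeeping in plain C so it compiles without an MPI library. It is not MPI code: memcpy stands in for MPI_Pack and MPI_Unpack (which also handle data representation), and pack_bytes, unpack_bytes, pack_result, and unpack_result are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <string.h>

/* Append nbytes from src at *pos and advance *pos, in the spirit of the
   position argument of MPI_Pack. Returns 0 on success, -1 if no room. */
int pack_bytes(void *buf, int bufsz, int *pos, const void *src, int nbytes)
{
    if (nbytes < 0 || *pos < 0 || *pos > bufsz - nbytes)
        return -1;
    memcpy((char *)buf + *pos, src, (size_t)nbytes);
    *pos += nbytes;
    return 0;
}

/* Copy nbytes out of buf at *pos and advance *pos (cf. MPI_Unpack). */
int unpack_bytes(const void *buf, int bufsz, int *pos, void *dst, int nbytes)
{
    if (nbytes < 0 || *pos < 0 || *pos > bufsz - nbytes)
        return -1;
    memcpy(dst, (const char *)buf + *pos, (size_t)nbytes);
    *pos += nbytes;
    return 0;
}

/* Worker side: pack the offset, then the data, into one buffer.
   Returns the number of bytes to send, or -1 if the buffer is too small. */
int pack_result(void *buf, int bufsz, int datastart, const int *chunk, int n)
{
    int pos = 0;
    if (pack_bytes(buf, bufsz, &pos, &datastart, (int)sizeof datastart) != 0)
        return -1;
    if (pack_bytes(buf, bufsz, &pos, chunk, (int)(n * sizeof *chunk)) != 0)
        return -1;
    return pos;
}

/* Master side: unpack the offset, then store n ints at data + offset.
   Returns the offset, or -1 on a short buffer or out-of-range offset. */
int unpack_result(const void *buf, int bufsz, int *data, int datalen, int n)
{
    int pos = 0;
    int datastart;
    if (unpack_bytes(buf, bufsz, &pos, &datastart, (int)sizeof datastart) != 0)
        return -1;
    if (n < 0 || datastart < 0 || datastart > datalen - n)
        return -1;
    if (unpack_bytes(buf, bufsz, &pos, data + datastart,
                     (int)(n * sizeof *data)) != 0)
        return -1;
    return datastart;
}
```

In an MPI code, the worker would make the two corresponding MPI_Pack calls and then send the buffer once with type MPI_PACKED, using the final position as the count.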
Another MPI approach is available: derived datatypes (the MPI_Type_* routines). But this brings us back to a motivation for sticking with MPI_Send and MPI_Recv in the first place: the observation that MPI codes can be difficult to port, and the more esoteric the functions used, the more likely you are to run into trouble.
Let us know if you have any little lessons like this to share, or if you have a better way to solve the problem above. We welcome contributions!
Parallel Scientific Computation Course: UAF/UM/UNM
A course in Parallel Scientific Computation is being taught collaboratively over the Access Grid between UAF, the University of Montana, and the University of New Mexico.
At UAF, it is listed as follows:
Parallel Scientific Computation
PHYS 693-FO1, CRN 86657
This course will introduce the concepts of parallel scientific computation to students; the course is primarily in support of graduate students in the physical sciences with research interests requiring the application of parallel computation techniques to specific science applications. Topics will include the basics of problem decomposition and how to identify the necessary communication, with particular attention to scalability and portability of the algorithm. Techniques to assess the reliability, stability and validity of large-scale scientific computations will also be covered. After successful completion of the course, students will be able to solve scientific problems on the parallel computers commonly found in the modern research environment.
- Permission of instructor
- Graduate standing in physical sciences
- Experience with FORTRAN or C programming
Contact Guy Robinson, firstname.lastname@example.org for details.
How to Change Your Kerberos Passphrase
ARSC has instituted aging for Kerberos passphrases. With this change, they will expire after six months unless changed. Most people can use software already installed on their local workstation or PC.
Unix/Linux:
Installed along with kinit, krlogin, etc., on your Unix or Linux host should be a copy of "kpasswd." Invoke it with your Kerberos principal. For instance:
$ kpasswd myname@ARSC.EDU
You will be prompted for your old passphrase, a new passcode from your SecurID card, and finally, the new desired passphrase (twice).
Mac/PC:
Click the "change password" button in the Macintosh "Kerberos5 Configuration Manager" or the Windows "KRB5.EXE" dialog window.
Notes:
=======================================================
If you're behind a firewall, note that while port 88 must be opened up to use kinit, ktelnet, and krlogin, an additional port, 769, must be opened up to use kpasswd.
Misleading error messages:
Entering a bad passcode can result in a message like this:
kpasswd: Cannot establish a session with the Kerberos administrative server for realm ARSC.EDU.
kpasswd is not available under UNICOS or UNICOS/mk:
(Use your local workstation/PC)
Options for ARSC Users without SGI Accounts:
If, due to a firewall or other problem, you can't get your password changed from your local workstation/PC, we can fire up your ARSC SGI account (and inactivate it again when you're done). Contact:
Requests for ARSC Seminars?
ARSC is putting together its schedule for fall training sessions. If there's some topic you'd like covered, send email to:
Quick-Tip Q & A
A:[[ My favorite web site uses html frames. Can I capture the URL of a
  [[ frame, so I can send it to someone? Or do I have to recite every
  [[ link I follow?

  # Five readers responded: Nic Brummell, Evelyn Price, Derek
  # Bastille, Brad Chamberlain, and Rich Griswold.
  #
  # Here are all the solutions, with the (surprisingly little)
  # duplication removed:

  Netscape:
  =========
    o Yes, if you're using netscape. Move the mouse arrow into the
      frame, right click on the mouse and drag down to "send page".
      Then, follow the usual routine for sending mail from your browser.

    o With Netscape, right click in the frame and select "View Frame
      Info". The browser allows you to copy the URL.

    o In Netscape: right-click on the frame and select "Open Frame in
      New Window." The URL of the frame will appear in netscape's
      location bar, from which it can be cut and pasted into other
      documents.

  Lynx:
  =====
    o If you use a newer version of Lynx, you will be given a navigation
      menu that will allow you to view each frame individually. Select
      the frame you want the URL for, then hit the equals (=) key to
      view the frame info.

  IE:
  ===
    o In IE: right-click on the frame and select "Properties." The URL
      of the frame will appear in the properties window that pops up and
      can be cut and pasted into other documents.

  general...
  ==========
    o Cut-n-paste the address from the page source. For example,
      www.wired.com uses frames. If you want to, say, get rid of the
      ads, do a 'View Source' in your favorite browser, copy the
      /news/nc_index.html src address and paste it onto the end of the
      main URL (some sites put the whole URL into the <frame src="" tag>).
      Et Voila! you now have just the URL you want.

      Example:

      URL: http://www.wired.com

      Page Source:
        <!-- Vignette StoryServer 5.0 Fri Aug 17 16:21:13 2001 -->
        <html>
        <head>
        <title>Wired News</title>
        <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
        </head>
        <frameset rows="1*,78" topmargin="0" leftmargin="0" marginwidth="0" marginheight="0" framespacing="0">
        <frame src="/news/nc_index.html">
        <frame src="/news/ot_ad.html" scrolling="NO">
        </frameset>
        </html>

      After Cut-n-paste: http://www.wired.com/news/nc_index.html

Q: Is RESHAPE broken on every system? Here's what it's supposed to do,
   from my documentation:

     "The RESHAPE intrinsic function constructs an array of a
     specified shape from the elements of a given array."

   Here's my test program:

!--------------------------------------------------------------------
      program shape_change
      implicit none

      real, dimension (30,40) :: X

      X = 1
      print*, "Old shape: ", shape (X)

      X = reshape (X, (/ 20,60 /), (/ 0.0 /) )
      print*, "New shape: ", shape (X)
      end
!--------------------------------------------------------------------

   And here's the result from an SP (Crays and SGIs give exactly the
   same result):

     ICEHAWK1$ xlf90 shape.f
     ** shape_change === End of Compilation 1 ===
     1501-510 Compilation successful for file shape.f.
     ICEHAWK1$ ./a.out
      Old shape: 30 40
      New shape: 30 40
[[ Answers, Questions, and Tips Graciously Accepted ]]
Ed Kornkven
ARSC HPC Specialist
ph: 907-450-8669
Kate Hedstrom
ARSC Oceanographic Specialist
ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Subscribe to (or unsubscribe from) the e-mail edition of the
ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.