ARSC HPC Users' Newsletter 291, April 30, 2004
ARSC Migration to New Storage Server and System-Wide Outage
Start Time:       Thursday, 5/6/04, 8:00 a.m. Alaska time
End Time:         Monday, 5/10/04, 8:00 a.m. Alaska time
Duration:         96 hours
Affected Systems: seawolf, chilkoot, yukon, klondike, iceberg, icehawk
                  (These systems will not be accessible during the outage.)
Reason for Outage:
The Arctic Region Supercomputing Center (ARSC) is bringing on board its new storage server, seawolf. This storage server will service all sensitive systems at ARSC, including chilkoot, yukon, icehawk, klondike, and iceberg. Seawolf will provide more storage capacity as well as improved bandwidth for file transfers. In addition, chilkoot and yukon will be decommissioned this fall.
On May 6-10, the directory trees on chilkoot and yukon will be duplicated on seawolf. (The seawolf directories will initially contain pointers to the actual files on the old Crays. Accessing a file on seawolf for the first time will cause its data to be retrieved from the Cray; subsequent accesses of that file on seawolf will not involve the Cray.)
The directories on chilkoot and yukon will be moved as follows:
- All files on /allsys$HOME will be moved to $ARCHIVE/allsys, also referenced as $OLDALLSYS.
- All yukon home directory files will be moved to $ARCHIVE/yukon, also referenced as $OLDYUKONHOME.
- All chilkoot home directory files will be moved to $ARCHIVE/chilkoot, also referenced as $OLDCHILKOOTHOME.
After this change, icehawk and klondike users will need to look for their /allsys$HOME files in $ARCHIVE/allsys (also referenced as $OLDALLSYS).
Users need not take any action to move their data from chilkoot or yukon to seawolf. ARSC will be running scripts in the background to retrieve user data. Because more than 60 TB of data must be moved off of chilkoot and yukon, the transfer process is expected to take 3-4 months for large users of DMF. For most users, the transfer of data will take place during the downtime. Disk quotas for $ARCHIVE have been deferred until later.
As of May 10, chilkoot and yukon users will have new home directories with 100 MB quotas. All existing dot files (.login, .cshrc, .forward, etc.) will be moved to the $OLDCHILKOOTHOME or $OLDYUKONHOME directories. New dot files will be created in your new home directories, but you may want to replace these with any custom dot files you have already created.
Note: Our 10-day purge policy is still in effect for $WRKDIR and $SCRATCH, so be sure your files are saved prior to this downtime.
We will be updating our web pages and news items soon. Please watch for more information on our new storage capabilities. If you have any questions or concerns, please contact ARSC Consult at "firstname.lastname@example.org" or (907) 450-8602.
IBM Large Page Memory
[ Thanks to Jeff McAllister for this article... ]
Introduction To Large Page Memory
IBM POWER4 processors (as found in ARSC's iceflyer and iceberg machines) support both "large" and "small" page memory access. On iceberg, ARSC is evaluating letting users choose the type of memory access that best suits their applications.
Large page memory will improve runtimes for only a subset of codes. In many cases it will offer no improvement and may even make the code run slower.
In theory, since large pages map memory in 16 MB chunks (vs. 4 KB for small pages), the page table for an application could have about 4096 times (16 MB / 4 KB) fewer entries. Translation lookaside buffer (TLB) misses are also reduced, because each TLB entry maps a larger range of memory. Of course, using large pages also carries some overhead. In practice, it is difficult to predict whether the net effect will be a gain or a loss in performance without actually running the program.
As an added complication, the iceberg nodes are not homogeneous. All have 128 MB of large page memory configured for MPI buffers, but while some nodes have most of their memory configured as large pages, others have only the large page MPI buffers. Check "news queues" on iceberg for the latest information on how to specify that your job should run on nodes configured for large page memory.
Enabling Large Pages
There are three ways of "suggesting" that an application should use large pages:
- compile with -b lpdata:
    xlf90 myprogram.f90 -qsuffix=f=f90 -b lpdata -o myprogram
- modify the large page data flag in the executable with ldedit:
    ldedit -b lpdata ./myprogram
- set the LDR_CNTRL environment variable. *** this mode is not recommended ***
Executables can be returned to small page memory access (the default state) by recompiling without -b lpdata (and without -qlargepage, if you used it) or by turning the large page data flag off with:
ldedit -b nolpdata ./myprogram
The following types of applications should benefit most:
- codes heavily dependent on MPI internode bandwidth. With our current incarnation of Federation high-performance switch technology, applications MUST be enabled for large page memory to achieve the best performance. Our tests indicate approximately 50% better MPI bandwidth is possible if an application is large page enabled.
- codes heavily dependent on memory bandwidth that use heap (dynamically allocated) rather than stack (statically allocated) arrays. This includes programs that make extensive use of Fortran 90 array syntax.
Here are some guidelines for using large pages on iceberg:
- If your code does not fit the two scenarios above, you probably don't need to worry about large pages.
- You can enable (that is, "suggest") large page use for your application and still run it on nodes configured without much large page memory. If the configured large page memory is not sufficient, your large-page-enabled application will run in small page memory. Main code performance will then be the same as if large page memory had not been enabled, but you will still get the benefit of the large page buffers for MPI bandwidth.
- Some run characteristics may be a little quirky on the large page nodes. If your application was compiled without large pages enabled, the node may run out of small page memory. Even if your application is large page enabled, stack memory (static allocations) is taken from the small page pool. In addition, there may be problems with library calls and debuggers.
Watch for more info. Please let the ARSC consultants (email@example.com) know if you have any questions, concerns, or observations about large pages.
Klondike/Iceberg Allocations and Usage
Klondike is scheduled to transition from pioneer/early-user status to a fully allocated system as of May 10th, 2004. This means that all CPU usage in the foreground queues (default, Special, and gcp) will be deducted from your FY2004 allocation. Once your project has run out of allocation, you will need either to contact your S/AAA (for DoD-sponsored projects) or the ARSC Help Desk (for non-DoD projects) to be granted an additional allocation, or to submit your work to the background queue (bkg).
Iceberg is scheduled to make a similar transition to a fully allocated system as of 1 June, 2004. All usage in the foreground classes (default, data, debug, lpage, p690, special, and single) will be counted against your allocation.
We will also begin reporting usage for DoD projects to the Information Environment (IE) database on May 10th (klondike) and June 1st (iceberg).
Please note that, at the time of this writing, the getUsage and getFYrpt tools are not yet available on either klondike or iceberg. We hope to have them available in time for the transition but may not be able to do so. Please contact "firstname.lastname@example.org" if you need to know the current allocation status for your project, or log in to IE and refer to the cumulative usage for your project.
X1: New OS and Programming Environment
Klondike has been running under UNICOS/mp 2.4.14 since last Friday.
Programming Environment 5.2 (PE5.2) is now available for user and staff testing. At this point:
  PrgEnv.old : points to the former PrgEnv (5.0)
  PrgEnv     : points to PE 5.1
  PrgEnv.new : points to PE 5.2
There has been no change to the default programming environment, PrgEnv, or to PrgEnv.old.
All interested users are encouraged to try the new PE. To do so, switch to the new PE using the following command, and then recompile your code:
module switch PrgEnv PrgEnv.new
Verify your results, and let us know of any problems or improvements you detect. PE5.2 will be available for testing (as a non-default programming environment module) for a limited time before we make it the default.
Hopeful Faculty Campers : We Need To Hear From You!
Faculty Camp 2004 Dates: August 2 - August 20
Expression of Interest Due: May 19, 2004
ARSC's three-week Faculty Camp, which begins on August 2, covers the basics of programming high performance computers, familiarization with visualization software, and an introduction to the use of collaborative environments such as data sharing or the Access Grid Node.
- Need your programs to run faster?
- Overwhelmed by the quantity of your data and frustrated searching through it?
- Tired of using flat graphs and contour plots to explain your discoveries?
- Want to see your data visualized in 3D?
Faculty Camp is a series of seminars and hands-on experiences. It is presented by ARSC staff, UAF/ARSC Joint Faculty and current users and provides "campers" with assistance and expertise while they focus on independent work and individual projects.
Those interested are invited to submit a short (250-word) description of the skills they want to develop and the project they intend to pursue. This will assist ARSC in organizing events and speakers to address the specific needs of the attendees. Please submit your text in ASCII or PDF format no later than May 19.
Successful applicants will be notified by May 28. Those accepted are expected to participate full-time during the three weeks of Faculty Camp.
For more information, contact Tom Logan: email@example.com, 907-450-8624
ARSC Staff Move Is "Complete"
Folks are still unpacking and setting up, but we've essentially completed the move to the West Ridge Research Building (WRRB).
The HPC systems and some staff members who "touch" them remain in the Butrovich Building. Everyone has a new phone number. See the directory:
CUG and Newsletter
The Cray User Group conference begins May 17th in Knoxville, hosted by ORNL. There's still time to register:
I'll be there and doing some extra travel, so this newsletter is taking a month off. Enjoy springtime, everyone; I look forward to seeing some of you in the Smokies.
Quick-Tip Q & A
A:
[[ MPI_Rsend ("ready" send) is supposed to give better performance than
[[ MPI_Send, but the documentation tells me it's an error to call it
[[ unless the matching receive has already been posted.
[[
[[ How can I make sure the receive has already been posted? Is it
[[ really worth it?

Here's one approach:

  Sender:
  -------
  MPI_Recv    : Blocking receive. Wait here until the task to which
                I'm sending data tells me it's ready.
  MPI_Rsend   : Send the data.

  Receiver:
  ---------
  MPI_Irecv   : Post a non-blocking receive for the data.
  MPI_Isend   : Tell the task from which I will receive the data
                that I'm ready.
  [... do other work if possible ...]
  MPI_Waitall : Make sure data arrives before I attempt to use it.

--
To avoid confusion, it's probably best to not use MPI_ANY_SOURCE or MPI_ANY_TAG, and instead code with explicit values for "source", "dest", and "tag" arguments.

Q: I want the X1 and SV1 compilers to completely unwind this loop nest:

     do i=1,3
       do j=1,3
         do k=1,3
           ...
         enddo
       enddo
     enddo

There's no "!DIR$ UNWIND" directive, however. Any suggestions?
[[ Answers, Questions, and Tips Graciously Accepted ]]
Ed Kornkven
ARSC HPC Specialist
ph: 907-450-8669

Kate Hedstrom
ARSC Oceanographic Specialist
ph: 907-450-8678

Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.