ARSC HPC Users' Newsletter 294, June 25, 2004
ARSC Welcomes Three New Senior Staff Members
Chief Scientist, Dr. James O'Dell
Jim O'Dell is a recognized leader in the fields of education, aerospace, defense and electronics. Formerly the Deputy Director of Brown University's Technology Center for Advanced Scientific Computing and Visualization, O'Dell brings with him solid experience in the HPC community, including work at MIT, Los Alamos, and Brown. Among other achievements, O'Dell played a significant role in establishing the Advanced Computing Laboratory at Los Alamos National Laboratory. He received his PhD from Columbia University.
Part of Jim's role as Chief Scientist will be helping to set and implement a deeper scientific research focus within the center and to expand the center's role within the state and broader research communities. (Jim will start at ARSC on July 5th.)
Research Liaison, Richard Barrett
Richard Barrett specializes in parallel computing, the use of advanced computer architectures, programming methodology, tools for parallel computers, and numerical algorithms in linear algebra. His research includes the development, testing, and documentation of high-quality software involved in many aspects of scientific computing. He has contributed to the design and implementation of several large-scale applications, mainly high-fidelity multiphysics simulations, as well as several open source or licensed software packages, including UPS, L7, Maya, and MCNP.
Barrett has been a technical staff member at Los Alamos National Laboratory for the past ten years, contributing to a variety of code development projects, mainly within the ASCI program. Prior to going to Los Alamos he was a charter member of the Innovative Computing Laboratory at the University of Tennessee.
As Research Liaison, Richard's focus will include working with a variety of researchers on significant and collaborative problems in UAF's primary computational areas. He also hopes to participate in and contribute to the computational science program offered through the UAF physics department.
Vector Specialist, Lee Higbie
Lee worked at Cray Research for several years vectorizing codes for the Cray-1 and Cray-1S before joining the design team for the Cray X-MP. In other stops along the way, he worked at Seki Systems and, most recently, at the Engineer Research and Development Center (ERDC) DSRC.
Lee has lived in most sections of the country and has visited Alaska several times, but has not lived here before. He has made decennial trips to Katmai, following his grandfather, who discovered the Valley of Ten Thousand Smokes. Lee is apprehensive about the winter because he is coming from the hottest state. It has been years since he experienced sub-zero temperatures and decades since he's seen thirty below.
As Vector Specialist, Lee will lend his extensive experience on Cray systems to porting and optimizing today's large programs for the X1.
Report on FVCOM Workshop
[ Thanks to Kate Hedstrom of ARSC for sharing her trip report! ]
Last week I went to the first-ever FVCOM workshop in New Bedford, Massachusetts. FVCOM is the Finite Volume Coastal Ocean Model, which I picked up last year on my previous trip to New Bedford. FVCOM is a derivative of the Princeton Ocean Model (POM), but it uses an unstructured triangular mesh. It is quite a new model and has recently been rewritten in F95 as a parallel code. I had been waiting to get my hands on the new code, and being there was the best way to get it and the new manual.
Tuesday was all lectures, on topics including the model physics, numerical methods, coding, visualization with Matlab, and a few coastal applications.
On Wednesday we were split into three groups and rotated through three presentations. The first was on grid generation using a PC Windows package called SMS. I don't use that package, but the tips on grid quality were good to know: don't let the minimum angle get below 30 degrees, and don't let adjacent triangles differ in area by more than a factor of two. The second was a hands-on session going through the input files, compilation, and running the model, using a predefined setup on their Linux systems. The third was a more detailed look at the Matlab postprocessing tools written by Jamie Pringle.
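The two grid-quality rules of thumb from the SMS session (minimum angle of at least 30 degrees, adjacent triangle areas within a factor of two) are easy to check programmatically. Here is a minimal Python sketch, not part of SMS or FVCOM, assuming the mesh is given as a list of (x, y) node coordinates plus triangles as node-index triples, treating two triangles as adjacent when they share an edge, and assuming no degenerate (zero-area) triangles. The function names are hypothetical.

```python
import math
from itertools import combinations


def triangle_angles(p, q, r):
    """Return the three interior angles (degrees) of triangle pqr."""
    def angle_at(a, b, c):
        # Angle at vertex a between edges a->b and a->c.
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - a[0], c[1] - a[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        cross = v1[0] * v2[1] - v1[1] * v2[0]
        return math.degrees(math.atan2(abs(cross), dot))
    return (angle_at(p, q, r), angle_at(q, r, p), angle_at(r, p, q))


def triangle_area(p, q, r):
    """Unsigned area of triangle pqr."""
    return 0.5 * abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1]))


def check_mesh(nodes, triangles, min_angle=30.0, max_area_ratio=2.0):
    """Return (indices of triangles with a too-small angle,
    pairs of edge-sharing triangles whose area ratio is too large)."""
    bad_angles = [i for i, (a, b, c) in enumerate(triangles)
                  if min(triangle_angles(nodes[a], nodes[b], nodes[c])) < min_angle]
    areas = [triangle_area(nodes[a], nodes[b], nodes[c]) for a, b, c in triangles]
    # Map each undirected edge to the triangles that use it.
    edge_owners = {}
    for i, tri in enumerate(triangles):
        for u, v in combinations(tri, 2):
            edge_owners.setdefault(frozenset((u, v)), []).append(i)
    bad_pairs = []
    for owners in edge_owners.values():
        if len(owners) == 2:  # interior edge shared by two triangles
            i, j = owners
            ratio = max(areas[i], areas[j]) / min(areas[i], areas[j])
            if ratio > max_area_ratio:
                bad_pairs.append((i, j))
    return bad_angles, bad_pairs
```

For example, a unit square split along one diagonal, check_mesh([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)], [(0, 1, 2), (0, 2, 3)]), returns ([], []): both triangles have a 45-degree minimum angle and equal areas.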
On Thursday, many people still wanted to learn more about the grid generation. Some of the rest of us wanted to talk more about the numerical methods. I won a hat for pointing out that at the edge of the domain there is excessive sensitivity to the choice of grid, such as this:
-------------------
\     \     \
 \     \     \
  \     \     \
   \     \     \
    \     \     \
-------------------
as opposed to this:
-------------------
\         / \
 \       /   \
  \     /     \
   \   /       \
    \ /         \
-------------------
Dr. Changsheng Chen, head of the FVCOM model development project, encourages bug reports by handing out hats for them. In this case, he had a sneaking suspicion that there was a problem, but no clear example of when it happened. The team's policy is to always edit the grids interactively so that they look like the upper grid around the open boundaries.
On Friday I talked to FVCOM team member Dr. Haosheng Huang about my test problems. He has since pointed out a mistake I made when trying to linearize the model for one test; my basin test case now runs much better, but is still sensitive to the grid.
A note on grid generation: the SMS package doesn't interest me; for one thing, it doesn't support a sizing function. Another available option is a Matlab interface to a code called "triangle". I have been using Cubit, which has a new GUI version for Sun, Linux, and Windows. The old Cubit doesn't allow interactively moving nodes or changing connectivity; I'm not sure about the new one. However, the main reason to move nodes is to compensate for the boundary sensitivity mentioned above, and a better solution would be for the model to have a more robust boundary condition.
The new parallel F95 code was produced by Geoff Cowles, who did extensive testing of the parallel code on ARSC's IBM p690 server, iceflyer. His speedup numbers are:
  CPUs   Speedup
     1      1.0
     2      2.02
     4      3.94
     6      6.05
     8      7.81
    10      9.5
    12     10.56
    14     13.31
This is for a Gulf of Maine grid with 60,000 triangles and 31 vertical levels.
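One common way to read a speedup table is through parallel efficiency, the measured speedup divided by the number of CPUs (ideal linear speedup gives 1.0). A small Python sketch computing it; the dictionary simply transcribes the speedup numbers reported for the Gulf of Maine grid, and nothing else is assumed.

```python
# Speedups reported for the Gulf of Maine grid on iceflyer (CPUs -> speedup).
speedup = {1: 1.0, 2: 2.02, 4: 3.94, 6: 6.05, 8: 7.81, 10: 9.5, 12: 10.56, 14: 13.31}


def efficiency(cpus, s):
    """Parallel efficiency: measured speedup divided by ideal (linear) speedup."""
    return s / cpus


for cpus, s in sorted(speedup.items()):
    print(f"{cpus:3d} CPUs  speedup {s:6.2f}  efficiency {efficiency(cpus, s):5.2f}")
```

By this measure, efficiency is slightly above 1.0 at 2 and 6 CPUs (2.02/2 and 6.05/6), about 0.95 at 10 CPUs, 0.88 at 12, and about 0.95 again at 14 (13.31/14).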
To Dump or Not to Dump Core Files on the X1
When a parallel job crashes on the X1, each parallel process dumps its own core file. For a large job, this can be a very expensive operation, as all of the processors simultaneously write potentially very large files to the same file system over the same I/O network. It can easily put your account over quota, as well.
Unless you plan to do post-mortem debugging with the core files, or your code rarely or never crashes, we strongly encourage you to add the option "-c core=0" to your aprun or mpirun command. This prevents the core files from being written.
If you need a call-stack traceback (the usual reason for saving core files), set the environment variable "TRACEBK" to 16 (for instance) in your PBS script. The traceback will go to the stderr file.
The TRACEBK setting and mpirun command will look something like this:
csh users:
  setenv TRACEBK 16
  mpirun -np 10 -c core=0 ./a.out
ksh users:
  export TRACEBK=16
  mpirun -np 10 -c core=0 ./a.out
[ Thanks to Don Bahls of ARSC for this. ]
Recently, the HPCMPO required ARSC and other centers to reduce Kerberos ticket lifetimes. This change has especially affected the way people move data to and from ARSC systems. Many people had become accustomed to using the Kerberized ftp client (kftp). Unfortunately, kftp checks for a valid ticket each time a file is transferred, so operations such as mget or mput can fail partway through a multi-file transfer if the current Kerberos ticket expires. What used to be a trivial operation with kftp can now be a major headache. When the ticket expires, kftp gives a message something like this:
  ftp> mget *
  mget nr? y
  227 Entering Passive Mode (199,165,85,231,151,219)
  150 Opening ASCII mode data connection for nr (924411457 bytes).
  GSSAPI error major: A token was invalid
  GSSAPI error minor: Token header is malformed or corrupt
  GSSAPI error: failed unsealing reply
  934916817 bytes received in 3.4e+02 seconds (2.7e+03 Kbytes/s)
Fortunately, there is an alternative to kftp that behaves better when transferring large numbers of files: krcp, which is available with the HPCMPO Kerberos kit (for Linux, UNIX, and Macs). krcp works much like the standard cp command, but allows a remote host to be specified. For example:
  don's computer> krcp "iceberg.arsc.edu:/u1/uaf/don/myfile" .
This command copies the file "myfile" from iceberg to the working directory on the local host. Better yet, krcp supports recursive copying and wildcards. For example:
  don's computer> krcp -r "iceberg.arsc.edu:~/mydirectory/" .
  don's computer> krcp "iceberg.arsc.edu:/wrkdir/don/mydirectory/ABC*" mydirectory
As you might expect, the remote computer can be either the source or the destination, but not both.
A valid Kerberos ticket is still required when the krcp command starts, so you may still need to get one with 'kinit'. However, wildcarding and recursive copying make krcp a powerful alternative to kftp, and it has become even more useful with shortened ticket lifetimes.
Quick-Tip Q & A
A:[[ How can I determine which shell I'm using? Nothing seems to work!
  [[ Nothing!
#
# Three people, four different answers. Amazing how that works...
#
# Thanks first to Jim Long...
#
  ps -p $$
"$$" is the pid of the shell you're using, and "ps -p" gives info about a process, so "ps -p $$" will give you a field with the command associated with the pid. E.g.:
  % ps -p $$
    PID  TT  STAT     TIME COMMAND
   3384 std  S     0:00.11 -tcsh
Even more specific: "ps -p $$ -o command"
#
# And thanks to Greg Newby:
#
This works from any shell:
  echo $0
$0 (dollar-zero) is a Unix/Linux shorthand for the command used for the current process. Sometimes there's a little variety in the output format (for example, one system might show "/bin/tcsh" while another shows "-tcsh"), but the output should make it obvious which shell you're using.
Alternate: If you'd just like to know what your default shell is on a particular Unix/Linux system, you can often find it in the /etc/passwd file (some systems do not use /etc/passwd for all users, however). Try:
  grep USER /etc/passwd
substituting your username for USER. The last field of the output is your default login shell. You can also try "finger USER" from the command line for slightly more human-readable output.
#
# And from Brad Chamberlain
#
I use:
  echo $shell
which works for the 2-3 shells I use most (tcsh, csh, bash), but I'm not sure how portable it is to all shells...
Q: I've gotten some Fortran code that's difficult to read. It could stand consistent indentation, capitalization, etc... Can someone recommend a "pretty-printer" for Fortran? Preferably one that can handle both free- and fixed-form.
[[ Answers, Questions, and Tips Graciously Accepted ]]
Ed Kornkven
ARSC HPC Specialist
ph: 907-450-8669
Kate Hedstrom
ARSC Oceanographic Specialist
ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.