ARSC HPC Users' Newsletter 243, April 12, 2002
Etnus Totalview 5.0 Installed on Icehawk
ARSC has just installed the Etnus Totalview 5.0 parallel debugger on its IBM SP Cluster, icehawk.
To use it, add this:

    /usr/local/adm/pkg/flexlm/license.dat

to the settings of your LM_LICENSE_FILE environment variable. Recompile with the -g compiler option, and launch totalview against the resulting executable. E.g.,

    icehawk1$ totalview ./a.out

Totalview documentation is built in. Click on "Help".
(When the bugs have been exterminated and you start production runs, be sure to recompile again WITHOUT the -g option. -g disables optimizations and kills performance.)
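For sh/ksh users, the license setting above can be sketched as follows. This is a minimal sketch, not an official ARSC script: the helper name append_path is hypothetical, and it assumes LM_LICENSE_FILE is a colon-separated list, as is usual for FlexLM on Unix systems.

```shell
# License file path given in this article.
TV_LICENSE=/usr/local/adm/pkg/flexlm/license.dat

# Hypothetical helper: print list $1 with entry $2 appended
# (colon-separated), unless $2 is already in the list.
append_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;      # already present, unchanged
        "::")     printf '%s\n' "$2" ;;      # list was empty
        *)        printf '%s\n' "$1:$2" ;;   # append to existing entries
    esac
}

# Keep whatever license files were already configured.
LM_LICENSE_FILE=$(append_path "${LM_LICENSE_FILE:-}" "$TV_LICENSE")
export LM_LICENSE_FILE
```

Putting these lines in your shell startup file avoids repeating the step in every session. The helper leaves the variable unchanged if the path is already listed, so sourcing the file twice is harmless.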
Upcoming Visitors, Talks, and Events
ARSC is hosting several visitors in the next couple of weeks to participate in evaluation of large-scale storage systems. They include Robert Bell (Bureau of Meteorology / CSIRO), David McGee (NAVO DSRC), and Roy Campbell (ERDC DSRC).
The visitors will all be giving presentations open to the wider UAF community.
Title: "Our NEC of the Woods: Oz, CSIRO, HPCCC and NEC"
Date: Thursday 25th April 2002
Speaker: Dr. Robert Bell

Abstract:
After a brief introduction to Australia and CSIRO, which is the largest research agency in Australia, this talk will give some background on the computing history of CSIRO. This will lead on to the formation of the Bureau of Meteorology / CSIRO High Performance Computing and Communications Centre (HPCCC).
The HPCCC acquired its first system, an NEC SX-4 in August 1997, and now has a core of two NEC SX-5 systems, and associated storage and front-end systems.
The talk will describe the HPCCC core and associated systems, and some of the applications which use the HPCCC facilities.

Biographical Note:
Dr. Robert Bell is Deputy Manager of the HPCCC. He has worked for CSIRO since 1974, firstly in numerical modeling for atmospheric research, and since 1987 in the management of computing facilities for scientific research.
He is Asia/Pacific Representative on the Cray User Group Board of Directors.
As soon as it's available, the full schedule of talks will be posted in the "Hot Topics" section of http://www.arsc.edu.
ARSC Faculty Camp:
Expression of interest is required by May 1st to participate in ARSC's 2002 "Faculty Camp." See: http://www.arsc.edu/pubs/bulletins/FacultyCamp2002.shtml

WOMPAT 2002 at ARSC/UAF, Aug 5-7:

Check our "vacancies" page, http://www.arsc.edu/misc/jobs.html . Within a week or so, we expect to announce summer/fall openings for student positions.

ARSC Summer Tours:
ARSC welcomes tourists and other drop-in visitors at 1:00 pm every Wednesday afternoon, June 12 - Aug 28, for a one-hour tour. Just show up at the ARSC machine-room viewing window in the basement of the Butrovich Building on the UAF West-Ridge.
SV1 Craylib Problem Isolated
In issue #224, we noted this:
> Unresolved issues in two SV1 user codes were cleared up recently when
> the users switched back from the default craylibs version to craylibs
> 188.8.131.52. It is suspected that this is an issue with the FFT routines,
> but investigation is ongoing.
>
> If you feel the need to try this, you should use the command:
>
>   module switch craylib.184.108.40.206
Through a lot of difficult troubleshooting on one of the two user codes, Tom Logan of ARSC was able to narrow the problem down to CTRSM, a low-level Level 3 BLAS routine that is distributed with LAPACK. The code reaches CTRSM through a call to the LAPACK routine CHEGST. The problem showed up as a failure of the algorithm to converge when run on one CPU, while runs on multiple CPUs gave correct and repeatable results.
It turns out that Cray already had a problem report open on CTRSM; a fix is in testing and will be integrated into a future release of craylib. The Cray SPR gives a more precise statement of the problem: "For the argument N > 64 and odd, the libsci_sv1 version of CTRSM gives wrong answers."
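The condition in the SPR can be written as a small shell predicate. This is only an illustration of the stated condition: the function name ctrsm_n_affected is hypothetical, and N means the argument passed to CTRSM itself, which may differ from your overall problem size when CTRSM is reached through a higher-level routine such as CHEGST.

```shell
# Hypothetical helper: exit status 0 when N matches the condition quoted
# from the Cray SPR (N greater than 64 and odd), non-zero otherwise.
ctrsm_n_affected() {
    [ "$1" -gt 64 ] && [ $(( $1 % 2 )) -eq 1 ]
}

# Try a few sample values of N.
for n in 64 65 66 129; do
    if ctrsm_n_affected "$n"; then
        echo "N=$n: affected"
    else
        echo "N=$n: not affected"
    fi
done
```

Of the sample values, only 65 and 129 are both odd and greater than 64, so only those two are reported as affected.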
For now, you can download the netlib LAPACK Fortran source file, ctrsm.f, and add it to your compilation. ARSC users can contact firstname.lastname@example.org for this Fortran file. Linking your own ctrsm.o file will preempt the libsci routine of the same name and thus allow you to safely use everything else from the latest installation of craylib (which, on chilkoot, is release 220.127.116.11).
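The workaround might look like the sketch below. The compiler driver name (f90), the program source (myprog.f), and the executable name (myprog) are assumptions made for illustration; substitute your own. The helper functions are hypothetical too. They only wrap two ordinary compile and link commands, and setting DRY_RUN prints the commands without running them.

```shell
# Assumed compiler driver; override by setting FC before running this.
FC=${FC:-f90}

# Print each command, then execute it unless DRY_RUN is set.
run() {
    printf '%s\n' "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# $1 = main program source, $2 = name of the executable to produce.
build_with_local_ctrsm() {
    src=$1
    exe=$2
    # Compile the netlib copy of CTRSM into ctrsm.o.
    run "$FC" -c ctrsm.f || return 1
    # Name ctrsm.o explicitly on the link line so that CTRSM is resolved
    # from it rather than from the library version.
    run "$FC" -o "$exe" "$src" ctrsm.o
}
```

For example, after setting DRY_RUN=1, calling "build_with_local_ctrsm myprog.f myprog" prints the two commands that would be run. Once the fixed craylib is installed, drop ctrsm.o from the link line to return to the library version.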
As usual, please report problems and mysteries to us (email@example.com).
UAF Computational Physics Programs
The Sloan Foundation maintains web page listings for a number of Physics and Chemistry Masters programs:
MS Program in Computational Physics. University of Alaska at Fairbanks.
A new professional masters degree is offered by the Physics Department for students with undergraduate backgrounds in physics or a closely related discipline.
The degree is appropriate for students seeking careers in industry, government, and research that require expertise in modeling and simulation of physical systems. Many department faculty have joint appointments with the Geophysical Institute and International Arctic Research Center and provide a range of interesting computational research projects in the fields of space physics, atmospheric physics, complex system dynamics and turbulence, data analysis techniques, ice-mechanics, and ice-ocean dynamics.
Local access to advanced high-performance computational resources is provided through ready student access to the Arctic Region Supercomputing Center. As well as courses in physics, mathematics, and numerical methods, additional special topics courses such as parallel processing techniques are offered.

Contact: Brenton Watkins, Professor of Physics
E-mail: firstname.lastname@example.org
Web: http://www.uaf.edu/physics/
Quick-Tip Q & A
A:[[ SV1 Totalview question from last issue ... How to view entire
  [[ automatic arrays, like:
  [[
  [[     COMPLEX Z( LDZ, * )

  #
  # Thanks to Ed Anderson:
  #
  To view the full array, dive on one of the variables, such as A. The window shows "Type: COMPLEX(147,1)". Click on COMPLEX(147,1) (or go to the Edit->Type menu), and change the type to COMPLEX(147,147). The data object window should update automatically. You might find it easier to view the array with the menu option Display->Array Browser. Unfortunately, the debugger doesn't remember this info when you close the data object window.

  #
  # Editor's Note:
  #
  # This works on the T3E, fails on the SV1. It looks like a problem
  # with the SV1 totalview, but is under investigation.
  #

A:[[ I've been connecting remotely to an SGI Octane2. I use the DISPLAY
  [[ environment variable to export the X Windows display back to my
  [[ personal workstation.
  [[
  [[ For some reason, when I sit down at this SGI and log onto the
  [[ console, the screen flashes, and it immediately logs me off. I'm
  [[ definitely not over-quota, my account is active, and everything works
  [[ perfectly when I connect remotely again.
  [[
  [[ Any ideas what's up?

  #
  # Thanks to Richard Griswold:
  #
  I had a similar problem when my home directory was shared between Linux and AIX. Something about the .Xauthority file was different between the two OSes, so when I logged in on the console of one system, I had to delete the file before I could log in on the console of the other system and run X apps. You could check for this file and try deleting it.

  #
  # Editor's answer:
  #
  The specific incident hit an ARSC user because he had put this:

      setenv DISPLAY <HIS_PERSONAL_WORKSTATION>

  in his SGI .cshrc file. When logging directly onto the SGI at the console, the IRIX window manager objected to its display being sent elsewhere, and bailed.

Q: Am I going nuts?

      $ ls -d DATA
      DATA
      $ ls DATA
      D2001  index.txt
      $ cd DATA
      sh-56 ksh: DATA: not found.

   Why is it doing this to me?
[[ Answers, Questions, and Tips Graciously Accepted ]]
Ed Kornkven
ARSC HPC Specialist
ph: 907-450-8669

Kate Hedstrom
ARSC Oceanographic Specialist
ph: 907-450-8678

Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.