ARSC HPC Users' Newsletter Issue 434 2014-07-16
Table of Contents
- 1. Guest Lecture: Julia For High Performance Computing
- 2. Registration Opens For SC14
- 3. Transfer Queue Now Available On Fish
- 4. ROMS: More Debugging Fun
- 5. Networking Connectivity Problems: Information Gathering
- 6. More Information
A publication of the Arctic Region Supercomputing Center.
1 Guest Lecture: Julia For High Performance Computing
ARSC will be hosting a guest lecture with Alan Edelman on August 7, from 1pm to 2pm in the GI Globe Room. Edelman is Professor of Applied Mathematics with the MIT Computer Science and Artificial Intelligence Laboratory. Edelman will be discussing the Julia programming language for High Performance Computing.
Julia is a high-level, high-performance dynamic programming language for technical computing. It provides a sophisticated compiler, distributed parallel execution, numerical accuracy, and an extensive mathematical function library. The standard library also integrates mature, best-of-breed C and Fortran libraries for linear algebra, random number generation, signal processing, and regular expression matching.
Registration is not required for this event.
2 Registration Opens For SC14
3 Transfer Queue Now Available On Fish
A "transfer" queue has now been made available on our Cray XK6m-200
supercomputer, fish.arsc.edu. The transfer queue allows for automated
pre- and post-processing to and from the $ARCHIVE file system. Use
of this transfer queue is similar to use of the transfer queue on the
4 ROMS: More Debugging Fun
Katherine Hedstrom, Oceanographer, UAF Institute of Marine Science
(The following article is also available from the ROMS/TOMS Developers blog.)
This time the question came up about perfect restarts. In ROMS, the
PERFECT_RESTART option promises that a run from beginning to end all
in one go should match one made with a restart in the middle. With
this option, more fields need to be written to the restart file, but
were we saving enough? I had done this exercise before for the sea ice
model, but not with WET_DRY as well.
The way to test for perfect restarts is to run for some number of time-steps X, save a restart file, run for one more step and save a history file. Then in another directory, copy the first restart file and use it to start at time-step X and run for one step, saving a history file. Now compare the history files with ncdiff and ncview.
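The comparison step can be illustrated with a minimal sketch. It assumes the fields from the two history files have already been read into plain Python dictionaries of flat lists; the function names and the sample values are hypothetical, and in practice ncdiff and ncview remain the tools for the job. A perfect restart means every field matches bit for bit, so any nonzero difference is reported.

```python
def max_abs_diff(a, b):
    """Largest absolute difference between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("fields have different sizes")
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)


def compare_history(straight, restarted):
    """Return {field: max abs difference} for fields that are not identical."""
    mismatches = {}
    for name in sorted(set(straight) & set(restarted)):
        diff = max_abs_diff(straight[name], restarted[name])
        if diff != 0.0:  # perfect restart requires exact agreement
            mismatches[name] = diff
    return mismatches


# Hypothetical stand-ins for fields read from the two history files.
straight = {"u": [0.10, 0.20, 0.30], "w": [0.01, 0.02, 0.03]}
restarted = {"u": [0.10, 0.20, 0.30], "w": [0.01, 0.025, 0.03]}

report = compare_history(straight, restarted)
for name, diff in report.items():
    print(name, "differs, max abs difference", diff)
```

Here only the stand-in vertical velocity `w` is flagged, which mirrors the kind of result that points the debugger at one field.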
What value of X to use? I didn’t try X=1 because ROMS doesn’t get into its default time-step until time-step 3. I therefore started with X=3 and found lots of differences between the two history files.
Time to go to a debugger - actually, dueling debuggers. One debugger starts from time 0, the other from step X. To do this, you don’t want to try a large X because then you’d be running all those steps in the debugger for the first debugger window. I found that for X=3, the u and v fields matched perfectly at step X, but the value of nrhs differed. nrhs is used in the computation of W, the vertical velocity. With the vertical velocities mismatched, everything evolved differently.
Rather than fix this, I decided to try again with an even value for X,
say X=2. After all, I usually save after an even number of time-steps
and I just wanted to see if something else was off. Indeed, the
differences were more localized, but there still were differences,
especially along the shallow parts where WET_DRY would be involved.
After much digging around, I found two things going on. The first is that the wet-dry masks are computed during initialization and also in calls to wetdry.F from step2d. On restart, one must either have saved all those masks or recompute them consistently.
The other source of trouble comes from how ROMS writes out fields to
the NetCDF files. Most fields are written with a mask applied. With
WET_DRY, many are being saved with masks that mask out the dry cells.
This can cause a loss of information about the state of the dry cells.
For instance, tracer values in the dry cells are time-stepped in ROMS,
including advection terms if flow happens to be entering a dry cell.
If these dry cell tracers are read from masked out values in the
restart file, they will be set to zero.
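The loss of dry-cell state can be shown with a small sketch. It is not ROMS code: the function `write_masked`, the two masks, and the tracer values are all hypothetical, and the only behavior taken from the text is that masked-out values come back as zero on restart.

```python
def write_masked(values, mask, fill=0.0):
    """Mimic a masked write: cells where the mask is 0 come back as `fill`."""
    return [v if m else fill for v, m in zip(values, mask)]


tracer = [10.0, 10.5, 11.2, 9.8]  # tracer value in four cells
land_mask = [1, 1, 1, 0]          # 1 = water (wet or dry), 0 = land
wet_mask = [1, 1, 0, 0]           # 1 = wet; cell 2 is water but currently dry

# Saving with the wet/dry mask throws away the dry cell's tracer value.
restored_wet = write_masked(tracer, wet_mask)
# Saving with the land mask keeps it, so the restart sees the same state.
restored_land = write_masked(tracer, land_mask)

print(restored_wet)
print(restored_land)
```

The dry cell's value survives only when the land mask is used for the write, which is why the choice of mask matters for a perfect restart.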
In the end, I didn’t have to save any more fields, but simply to update the mask initialization and to change the masks used in the saving of restart fields. I also found that I had a lingering ice restart bug, also fixed by the writing of restart fields. You might find it to be interesting, so I’ll try to describe it:
- The ice is time-stepped before the call to output because it contributes to the computation of surface fluxes.
- With PERFECT_RESTART, I wrote a routine to overwrite surface fluxes from the saved ones in the restart file, skipping the call to seaice on the first restart step.
- I found a mismatch at one lone point next to a land mask (not near
- The mismatch stemmed from the computation of the vertical mixing (GLS_MIXING) near the surface.
- Near-surface mixing gets a contribution from the surface stresses.
- The stresses are averaged from velocity points to rho points.
- On restart, the stresses were set to zero for the velocity points at the land-sea mask boundary.
- Turning off the masking while writing the stresses fixed this restart issue.
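The chain of events in the list above can be sketched in one dimension. This is an illustration under stated assumptions, not the ROMS averaging code: `u_to_rho` is a hypothetical helper, the stress values are arbitrary, and the row is assumed to end at a velocity point on the land-sea mask boundary.

```python
def u_to_rho(stress_u):
    """Average a 1-D row of velocity-point values onto the rho points between them."""
    return [0.5 * (left + right) for left, right in zip(stress_u, stress_u[1:])]


# Stress at three velocity points; the last one sits on the mask boundary.
in_memory = [0.25, 0.5, 0.75]
# The same row after a masked write and read zeroed the boundary point.
from_restart = [0.25, 0.5, 0.0]

straight_rho = u_to_rho(in_memory)
restart_rho = u_to_rho(from_restart)

print(straight_rho)
print(restart_rho)
```

Only the rho point next to the boundary changes, which matches a mismatch at one lone point beside the land mask.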
5 Networking Connectivity Problems: Information Gathering
The Internet is a dangerous place, if you are one of the trillions of tiny data packets racing across cables, switches and routers. Between complicated routing paths, discriminating firewalls, and aging network equipment, it is a real jungle out there. Beyond this, stories are told among network administrators of a race of tiny, mischievous creatures known as "server gremlins", who sneak about in the darkness, unplugging network cables and tweaking router configurations without proper documentation.
Consequently, we at the ARSC Help Desk are not surprised to occasionally receive support requests from users who are unable to access a remote network resource, such as a Web page or a file server. Typically, the user is working from an ARSC system and attempting to access an external resource (such as the Web page for another campus department) or is at her personal computer and attempting to access an ARSC resource.
We are eager to help, but often users do not come prepared with the information we need to track down the problem. It can be difficult for us to gather this information on our own, because the networks involved can be quite complicated, and many of the network devices are not under the direct control of our department.
If you experience difficulty connecting to a network resource, you can speed up resolution of your support request by doing some preliminary information gathering on the computer from which you are connecting. To do this, you will need to open up a terminal, enter some commands, and copy the commands and the output they produce into an e-mail to the ARSC Help Desk.
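For readers who would rather not copy text out of a terminal by hand, the same transcript can be collected with a short script. The following is a minimal sketch, assuming Python 3 is installed on the computer in question; the function names `run_and_capture` and `build_report` are hypothetical, and the script is not something the Help Desk requires.

```python
import subprocess


def run_and_capture(command, timeout=60):
    """Run one command (a list of arguments) and return it with its output as text."""
    try:
        result = subprocess.run(
            command,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep error messages in the transcript
            universal_newlines=True,   # decode output to str
            timeout=timeout,
        )
        output = result.stdout
    except (OSError, subprocess.TimeoutExpired) as error:
        # Command missing or hung: record that fact instead of stopping.
        output = "[could not run: {}]\n".format(error)
    return "$ {}\n{}".format(" ".join(command), output)


def build_report(commands):
    """Join the transcripts of several commands into one block of text."""
    return "\n".join(run_and_capture(command) for command in commands)


# "hostname" exists on Linux, OS X, and Windows; add further commands as needed.
report = build_report([["hostname"]])
print(report)
```

Each command is passed as a list of arguments, and a command that is missing or hangs is noted in the transcript rather than aborting the run, so the resulting text can be pasted into an e-mail as is.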
For the sake of brevity, we will assume you are working from a Red Hat 6 Linux system, an OS X system, or a Windows 7 system.
For Red Hat 6 systems with the Gnome desktop, a Terminal application
is located in the Applications >> System Tools menu. For Mac OS X,
look in the Finder for Applications >> Utilities >> Terminal. For
Windows 7, select the Start Menu and enter cmd in the search
box. Windows 7 users will need to learn how to copy text out of the
terminal, which is described in a howtogeek.com article.
5.2 Identifying Your System
These commands provide basic information about your computer's network location and identification.
Red Hat 6:
ip addr
hostname
5.3 Supporting Network Systems
Here we learn about your network name servers and gateway, which are your computer's "sign posts", as it were, to the outside world.
Red Hat 6:
ip route
cat /etc/resolv.conf
OS X:
netstat -nr
networksetup -getdnsservers Ethernet
networksetup -getdnsservers Wi-Fi
5.4 Following The Network Trail
These commands can give us some notion of the network path between
your computer and the remote resource. Replace www.example.com with
the hostname of the network resource you are trying to access. For
example, the hostname of www.linux.com/news would be
Red Hat 6 or OS X:
ping -c 4 www.example.com
ping6 -c 4 www.example.com
traceroute www.example.com
traceroute6 www.example.com
Windows 7:
ping www.example.com
tracert www.example.com
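If it is unclear which part of an address is the hostname, the rule can be sketched with the Python standard library. The helper name `hostname_of` is hypothetical; `urlsplit` is the standard `urllib.parse` function. The sketch assumes the input is either a full URL or a bare "host/path" string.

```python
from urllib.parse import urlsplit


def hostname_of(resource):
    """Return the hostname part of a URL or of a bare 'host/path' string."""
    if "//" not in resource:
        # No scheme given: prefix "//" so urlsplit reads the leading
        # part as the network location rather than as a path.
        resource = "//" + resource
    return urlsplit(resource).hostname


for address in ("www.linux.com/news", "http://www.linux.com/news"):
    print(address, "->", hostname_of(address))
```

The path after the first slash and any port number are dropped, leaving only the name to give to ping and traceroute.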
5.5 Can You Connect From Another Location?
If you move to a computer on another network, are you able to connect to the network resource? For example, if you cannot view an ARSC Web page from your office PC, try viewing it from your PC at home. This will help us narrow down the location of the network problem.
If it is inconvenient to wait until you leave the office, online programs are available allowing you to connect to a resource through a public Internet server. For Web pages, use a proxy browser:
For other types of resources, you can test the connection through the Web portal ping.eu.
5.6 Who Runs The Network?
Which organization or department owns your computer network, or the network you are trying to connect to (outside of ARSC)? Does it have its own IT department or call center? Armed with this knowledge, we can work with network administrators on the other end of the connection, to diagnose and resolve your networking problem as quickly as possible.
6 More Information
Christopher Howard mailto:email@example.com
Oralee Nudson, ARSC Lead User Consultant. Reviewer and insider source for ARSC news and tips.
6.3 Publication Schedule
The newsletter is usually released on the third Wednesday of each month.
6.4 Subscription Information
6.5 Archived Newsletters
6.6 Questions, Comments, And Submissions
Do you want to find out what our readers know about a particular subject? Submit a question about HPC or ARSC software, and we will feature it in a Q&A section in the newsletter.