ARSC HPC Users' Newsletter 403, May 8, 2009
ARSC Summer Visitors Arriving in June
Summer 2009 is shaping up to have a rich array of visitors and projects. Over ten weeks, ARSC will host twelve interns, four computer engineering graduate students, two Cadets from the US Military Academy at West Point, and a visiting faculty member. These visitors bring diverse skills and interests, and will work on projects with ARSC staff members, as well as with UAF faculty.
Major research themes at ARSC include climate (encompassing weather, oceanography, sea ice, and other high-latitude phenomena), and next-generation computing systems and technologies. Both themes will get major attention from visitors, as will topics such as geographic information systems, computational chemistry, and computing portals.
Visitors will utilize ARSC's supercomputers and other systems, and will receive the training and mentoring needed to be successful.
More Cachegrind Feedback
Here are more hot memory analysis tips from Newsletter readers. Jed Brown recommends the tool KCachegrind for visualizing the output of Cachegrind and other profiling tools. It is installed as part of the K Desktop Environment (KDE) as of KDE version 3.2. Jed also likes Oprofile, a system-wide Linux profiler, because of its low overhead. So many tools, so little time.... It looks like we've got some more work to do here at the HPC Newsletter.
We also got a great pointer from reader Martin Lüthi. At LWN.net is a massive article titled "What every programmer should know about memory" ( http://lwn.net/Articles/250967/ ). It's a nine-part treatise on, well, memory, covering everything from schematic diagrams and timing charts to performance tools, such as Cachegrind and Oprofile.
Many thanks to Jed and Martin for their helpful contributions!
Raw Binary Data Analysis with Paraview
Raw binary data is easy to produce but can be difficult to visualize. There are methods and tools that are very capable of visualizing and analyzing binary data, and in this article I will focus on one specific tool: Paraview. Paraview is a versatile and powerful visualization and analysis tool developed by Kitware and is available freely as open-source from www.paraview.org . Paraview is also installed on all ARSC workstations. It is accessible by typing "module load paraview" to load the module file for the latest installed version, and then typing "paraview" to launch Paraview.
Paraview is based on the VTK library to provide its engine for rendering data to the screen. The basic idea behind VTK is to organize the rendering into a 'pipeline' where data enters at one end, goes through several transformations applied in order, and is then rendered in its final form. Because Paraview is built on VTK, the majority of its data-handling is based around reading and writing VTK or VTK-derived file formats.
Paraview is quite capable of reading raw binary data. For simple flat binary files (such as a single frame) the process is rather intuitive. However, there are a few tricks that can make reading 3D and time-stepped binary data much easier.
First, let's discuss the loading of binary data. Open the binary data file. If it has the file extension of ".raw" then Paraview will treat it as a raw binary file. If the data file does not have the ".raw" extension, you will be presented with a large menu of data readers to choose from. You can press 'R' to skip down to the raw (binary) data reader.
The Object Inspector Properties pane will now contain the fields necessary to describe the incoming binary data. Let's decipher each one in turn. The File Prefix field contains the path to the file in most cases. It works with the File Pattern field to point Paraview to the correct file. The File Pattern field uses a syntax much like C's printf function. "%s" will be replaced with the text from the File Prefix field. Other conversions, such as "%d" for an integer, and plain text can be added to the File Pattern field. For most purposes, these two fields can be left alone.
Data Scalar Type tells Paraview what data type the data is (float, char, unsigned int, etc.). Data Byte Order tells Paraview whether the data is big endian or little endian. File Dimensionality defines whether the data is 2D or 3D. Having the file dimensionality wrong will not necessarily cause a failure to load the data, but the wrong dimensionality can cause problems if you need to handle slices or multiple files.
Data Origin defines the position offset when rendering the data. The three fields in order define the X, Y and Z offset. Data Spacing defines the size of each grid point, in the X, Y, and Z direction.
Data Extent defines the size of the data in terms of the number of values along each axis. In order, they are Xmin, Xmax, Ymin, Ymax, Zmin, Zmax. The Zmin and Zmax fields are used when dealing with 2D data that has multiple frames or 3D data that has been separated into slices.
Scalar Array Name allows you to define the name that is attached to the data array that is read in. Labeling the data array is helpful when dealing with multiple data files, so as to be able to keep track of which field is which.
Finally, the File Lower Left check box allows you to tell Paraview that the data file starts at the lower left corner of the image as opposed to the default upper left corner.
Fill in or ensure that each of these fields is set appropriately for the binary data file, and hit "Apply" to load the data!
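To have a file to experiment with, here is a minimal sketch that writes a small 3D block of 32-bit floats as a flat raw binary file. The file name, dimensions, and values are made-up choices for illustration; the trailing comments note the reader settings that would match such a file.

```python
# Write a 32x32x16 block of float32 values as a flat raw binary file.
# "demo.raw" and the dimensions are arbitrary illustrative choices.
import struct

NX, NY, NZ = 32, 32, 16

with open("demo.raw", "wb") as f:
    for k in range(NZ):
        for j in range(NY):
            for i in range(NX):
                # "<f" packs a little-endian 4-byte float; use ">f"
                # instead if you want big-endian data.
                f.write(struct.pack("<f", float(i + j + k)))

# Matching settings in the raw (binary) reader would then be:
#   Data Scalar Type:    float
#   Data Byte Order:     LittleEndian
#   File Dimensionality: 3
#   Data Extent:         0 31  0 31  0 15
```

Note that the Data Extent values are inclusive index ranges, which is why a 32-point axis runs from 0 to 31.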
To handle 2D time-stepped data, an extra step needs to be taken. Paraview does not recognize a 3rd dimension as a time dimension, so animating along the 3rd dimension will require some work outside of Paraview. First, split the binary file into separate files so that each file consists of a single frame. It would be a good idea to label them along the lines of filename.123 where "123" is the number of the frame.
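One way to do the split (a sketch, assuming the frames are all the same size; the frame dimensions and file names below are made up for illustration) is a short script that chops the file into frame-sized pieces:

```python
import os

def split_frames(infile, frame_size):
    """Split a flat binary file into one file per frame, named
    <infile>.0, <infile>.1, ... to suit a "%s.%d" File Pattern."""
    size = os.path.getsize(infile)
    assert size % frame_size == 0, "not a whole number of frames"
    nframes = size // frame_size
    with open(infile, "rb") as f:
        for n in range(nframes):
            with open("%s.%d" % (infile, n), "wb") as out:
                out.write(f.read(frame_size))
    return nframes

# Illustration: a fake input file of 3 frames, each a 64x64 grid of
# 4-byte floats (all sizes here are made-up assumptions).
frame_size = 64 * 64 * 4
with open("demo_frames", "wb") as f:
    f.write(b"\x00" * (3 * frame_size))
print(split_frames("demo_frames", frame_size))   # -> 3
```

The frame size is just (points per frame) x (bytes per value), so adjust it to match your own grid and data type.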
In Paraview, load one of the frame files (the first is fine, but any will do), and navigate to the Object Inspector, "Properties" pane. The "File Prefix" field contains the path and name to the data file. Change it from "/path/to/filename.123" to "/path/to/filename". Note that the only thing removed is the number at the end of the file prefix. In the "File Pattern" field, change "%s" to "%s.%d". The "%s" will be replaced with the file prefix and the "%d" will be replaced with a digit that will correspond to the frame number.
Now the frames can be animated by controlling the Zmin and Zmax fields of the Data Extent fields. Open the Animation Inspector panel and set up two keyframe tracks, one for Data Extent(4) and one for Data Extent(5). Keyframes are part of the animation controls of Paraview. They are the points in an animation that have concrete values associated with them, while regular frames take interpolated values between each keyframe. The Data Extent(4) and Data Extent(5) fields need to be kept in sync in order to properly step through the frames. Set up your keyframes to animate your data as you see fit!
Book Review: The Principles of Successful Freelancing
Because of the sagging economy, many computer engineers must be considering striking out on their own. I recommend Miles Burke's book for them. It discusses many aspects of working as a consultant, a "freelancer" in Australian English, I gather. He has one chapter suggesting how to get started as a consultant, which may be out of date for you: it assumes you have a job you're considering leaving, while the departure choice is being forced on many of us.
After the chapters discussing the considerations and preparation for going into business by yourself, Burke covers all (so far as I could see) of the areas where many engineers are weak:
- Managing your money, both for yourself and for the tax man
- Setting up an ergonomic office
- Managing to have a life (Don't you need to get one, even with a job?)
- Selling, the weakest skill area for many engineers
- How to keep your customers, your career, and your family
When he discusses ergonomics, he points out the importance of moving around. For some reason, perhaps it's different down under, he misses the most obvious solution for me and for most software and web developers: because we live on cola and pizza, we drink lots, and hence are forced to get up often. I would simply recommend more coffee, Coke, or Pepsi so you are forced to get up from your computer regularly.
Not only do engineers typically lack sales skills, we are often arrogant, thinking "we can do it" and "why should salespeople be the only ones to receive commissions?" As Burke points out, good salespeople are good listeners, something they all know but many engineers don't. A potential consultant may be asked for a presentation. The best sales tactic is usually a short presentation and a long time listening to the client's problem. No listening, no understanding of what they think they need, and probably no sale.
Burke stresses the importance of integrity and maintaining a good reputation: be proactive, give great service, communicate clearly, and so on. One argument he missed is the effort required to undo a slip. Seems to me I've heard that it is 100, or maybe it was 1000, times as hard to recover a good reputation after a minor sin as to avoid the sin in the first place. Most customers assume honesty, for example, but one lie and they'll remember that transgression and avoid future business with the liar.
The only criticism I have of the book concerns the two case studies it follows, describing how Emily and Jacob deal with the problems of each chapter. I would like to have seen more discussion of the merits and shortcomings of the approaches they use; the "case studies" were too sterile and rah-rah for me.
Quick-Tip Q & A
A:[[ My code just crashed and generated over a hundred core files. The
 [[ log files don't have anything meaningful, so I have no idea which
 [[ task(s) were having problems. How can I get a stack trace from each
 [[ of these core files in an automated fashion? I'm getting really
 [[ tired of running gdb on one core file at a time!

#
# Don Bahls combined shell mastery with gdb cleverness to produce stack
# traces labeled by their core file name:
#
#   # bash syntax
#   for f in core.*; do
#       echo "core file= " $f
#       gdb ./a.out $f <<< "where"
#       echo; echo "=================================="
#   done
#
# Since the core file name format is set to core.<host>.<executable>.<pid>
# on pingo, you can make an educated guess as to which MPI task generated
# each core file, as long as you aren't reordering tasks with
# MPICH_RANK_REORDER_METHOD. Here's an improvement with an iterator:
#
#   # bash syntax
#   ii=0
#   for f in core.nid*; do
#       echo "core file= " $f "task ($ii)"
#       ii=$(( ii + 1 ))
#       gdb ./a.out $f <<< "where"
#       echo; echo "=================================="
#   done

Q: Is there a way to get the processor count for a PBS job on the Cray
   XT5? On midnight I use something like this to get the processor
   count for my PBS job:

      NP=$( cat $PBS_NODEFILE | wc -l )
      mpirun -np $NP ./a.out

   When I tried this on pingo, $NP is always set to 1. Why is that? Is
   there any way to get the value of "mppwidth" from my PBS script so I
   can use that value with aprun? E.g.,

      NP=???
      aprun -n $NP ./a.out

   Currently I'm creating a different script for each processor count
   (i.e. mppwidth value) and it's driving me crazy!
[[ Answers, Questions, and Tips Graciously Accepted ]]
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678

Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Subscribe to (or unsubscribe from) the e-mail edition of the
ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.