ARSC HPC Users' Newsletter 297, August 13, 2004
Announcing ARSC Fall 2004 Science Seminars
You are invited to attend ARSC's Fall Science Seminars:
September 8-10, 2004
Arctic Region Supercomputing Center
University of Alaska Fairbanks
Fairbanks, Alaska

Wednesday September 8th:
- James O'Dell, Chief Scientist, ARSC
"A Perspective on Ocean Modeling and Sea Ice Modeling and the Importance of Supercomputer Power to Their Evolution"
- Albert Semtner, Professor, Naval Postgraduate School
"Development of the Earth System Modeling Framework"
- Nancy Collins, Software Engineer, National Center for Atmospheric Research
"Use of Computers for Paleoclimate Research"
- John Kutzbach, Professor, University of Wisconsin
- Buck Sharpton, President's Professor, University of Alaska Fairbanks
"Little Quarks and Big Computers"
- Gerry Guralnik, Professor, Brown University
"On Molecular Sequence Alignment: Some Reasons for Doing It Well"
- Tom Marr, President's Professor, University of Alaska Fairbanks
A final schedule will be released later. Check at: /arsc/support/news/hpcnews/hpcnews296/index.xml
Lectures by Visiting Mathematicians
Jerome Percus and Ora Percus of The Courant Institute of Mathematical Sciences and Department of Physics, New York University, will present the following series of lectures. All four talks will be held in the Elvey Auditorium, in the UAF Geophysical Institute:

August 24, 11:00 am
Fluids Under Tight Confinement
Jerome K. Percus
A preliminary study is made of the new phenomenology encountered when classical fluids are confined to enclosures that are of the order of the particle size in all but one spatial dimension. Self-diffusion is taken as the indicator of this phenomenology. The anomalous diffusion occurring in strictly one-dimensional flow is first reviewed, and then its extension to the single-file regime in which particles cannot pass each other. When the system enters the parametric regime in which particle exchange is first possible, a rapid transition to the characteristics of normal diffusion takes place, which is organized by the concept of "hopping time".

August 24, 3:00 pm
Can Two Wrongs Make a Right? Coin Tossing Games and Parrondo's Paradox.
A number of natural and man-made activities can be cast in the form of various one-person games, and many of these appear as sequences of transitions without memory, or Markov chains. It has been observed, initially with surprise, that losing "games" can often be combined by selection, or even randomly, to result in winning games. Here, we present the analysis of such questions in concise mathematical form (exemplified by one nearly trivial case and one which has received a fair amount of prior study), showing that two wrongs can indeed make a right -- but also that two rights can make a wrong!

August 25, 11:00 am
Piecewise Homogeneous Random Walk with a Moving Boundary
Ora E. Percus
We study a random walk with nearest neighbor transitions on a one-dimensional lattice. The walk starts at the origin, as does a dividing line which moves with constant speed gamma, but the outward transition probabilities p_A and p_B differ on the right- and left-hand sides of the dividing line. This problem is solved formally by taking advantage of the analytical properties in the complex plane of an added variable generating function, and it is found that (p_A, p_B) space decomposes into four regions of distinct qualitative properties. The asymptotic probability of the walk being to the right of the moving boundary is obtained explicitly in three of the four regions. However, analysis in the fourth region is a sensitive function of the denominator of the rational fraction gamma, and encounters some surprises. Applications of random walk problems to sequential clinical trials will be mentioned.

August 25, 3:00 pm
Small Population Effects in Stochastic Population Dynamics
Jerome K. Percus
We focus on several biologically relevant situations in which small populations play a significant qualitative role, and take some first steps to incorporate such situations in the continuous dynamics format that has been so elegantly developed in the past. We first describe a small number of model systems in which the influence of small populations is evident. Then we analyze in detail a toy model, exactly solvable, that suggests a path towards the attainment of our goal, and follow this by a formal vehicle for doing so. Application to model systems, and comparison with numerical solutions, indicates the potential utility of this approach.
Tips for Better I/O Performance on the Cray X1: Part II
[ Thanks to U.S. Naval Academy Midshipman Nathan Brasher for this report on his work at ARSC this summer. See: Part I of this two-part series. ]
As described in Part I of this study, I collected I/O bandwidth data on the X1 by writing files (10 MB in the formatted cases, 100 MB in the unformatted cases) and recording the elapsed time per write. I varied several parameters, all of which are under the user's control, to determine their effect on I/O performance. These parameters and the means by which an X1 user can vary them are shown here:
I/O Parameter       Method of setting
=================   ======================================
Access method       Fortran open statement
File format         Fortran open statement
Buffer size         Unicos/mp assign command (assign -b)
Chunk size          Fortran source code logic
Blocking scheme     Unicos/mp assign command (assign -s)
In graphing the results, I only show the maximum speed obtained for each test over several runs. The maximum was chosen because it was felt that this best represented the potential for high bandwidth. Also by taking the maximum bandwidth and discarding other results I was able to ignore abnormally slow times resulting from sharing disk access with other users.
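The "keep only the maximum" reduction described above can be sketched in a few lines of shell. This is only an illustration, not the script used in the study: the function name max_bandwidth and the sample timings are made up, and it assumes one elapsed time in seconds per line of input.

```shell
#!/bin/sh
# max_bandwidth SIZE_MB
# Reads one elapsed time (seconds) per line on stdin and prints the
# highest bandwidth seen, in MB/s. Slow runs (for example, ones that
# shared the disk with other users) simply never become the maximum.
max_bandwidth() {
    awk -v size_mb="$1" '
        $1 > 0 {
            bw = size_mb / $1
            if (bw > best) best = bw
        }
        END { printf "%.2f\n", best }
    '
}

# Hypothetical timings for three runs of a 100 MB write.
printf '2.0\n0.5\n1.0\n' | max_bandwidth 100   # prints 200.00
```

Taking the maximum rather than the mean means one contended run does not pull the reported figure down, at the cost of reporting a best case rather than a typical one.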
The four graphs all show I/O bandwidth as a function of buffer size, and have five curves plotted, one for each of five different chunk sizes. Each graph shows results for one combination of access method and file format.

[Graph: Sequential, Formatted]
I found three primary contributors to high I/O performance:
- unformatted files,
- large buffer sizes,
- few large write statements as opposed to many small writes.
The first effect is particularly dramatic. Writing unformatted files is typically 100-200 times faster than writing their formatted equivalents.
When the computer encounters a formatted write statement, it must interpret the format before outputting results. As you can see by comparing the graphs for unformatted I/O to their counterparts, factors like buffer size and chunk size do not have quite the same effect on formatted I/O as they do on unformatted I/O. This is because the overhead created by a format statement overwhelms the other factors in terms of significance. The result is that formatted output is slow regardless of the circumstances. To achieve top performance, only use formatted I/O if you absolutely have to.
Direct access when combined with unformatted data is even faster. I have been unable to figure out exactly why direct unformatted produced the best results, but it was consistently faster than the other formats, often as much as twice as fast as sequential unformatted.
I tried changing the record access order, thinking that this speedup was a result of the somewhat artificial conditions of our test (direct access records were, in fact, accessed in sequential order). However, direct unformatted ran at the same speed regardless of the order in which the records were written. Thus, if you are looking for every possible tweak to cut down on I/O access time on klondike, try direct access, unformatted files.
Buffer size is another important consideration. As you can see from all four graphs, larger buffer sizes always result in better performance. This was anticipated from the knowledge that memory is much faster than disk access and by storing results in a large buffer one can cut down on the frequency of disk accesses. It is worth noting as well that the Cray default buffer size of 64 KB works well in most circumstances.
It is also interesting that while large buffers can partially compensate for small chunk sizes, they do not completely solve the problem (compare the curves for "whole array" and "10,000 elem chunks" on the two graphs of unformatted I/O). The data also show that more, smaller chunks always degrade performance.
I have chosen to describe, rather than graph, the effects of the final parameter in this study, the Fortran file blocking scheme. The reason is that, in all tests, the different blocking schemes (f77, f90, COS, and unblocked file I/O) performed within 3% or so of each other. If you need to specify a blocking scheme for portability, then go ahead and do it. Otherwise, the Cray defaults work fine.
In conclusion, this study mostly confirmed what we already suspected about the slowness of mechanical disk drives. A few large writes will run faster than many small writes.
If possible, store your data in large arrays and output it all at once using unformatted I/O. If you absolutely cannot use large arrays and write statements, at least allocate a large buffer size in order to compensate.
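The "few large writes versus many small writes" effect can be explored from the shell as well. The sketch below is a loose analogy using dd, not the Fortran test from this study: write_chunks is a made-up helper, the block sizes are arbitrary, and dd goes through the operating system's own buffering rather than the Fortran I/O library, so absolute numbers will not match the graphs.

```shell
#!/bin/sh
# write_chunks FILE BLOCK_BYTES COUNT
# Writes COUNT blocks of BLOCK_BYTES zero bytes to FILE, one write
# call per block, which mimics the "chunk size" parameter.
write_chunks() {
    dd if=/dev/zero of="$1" bs="$2" count="$3" 2>/dev/null
}

workdir=$(mktemp -d)

# The same 10 MB written two ways.
write_chunks "$workdir/large" 1048576 10      # 10 writes of 1 MB
write_chunks "$workdir/small" 1024    10240   # 10240 writes of 1 KB

# Both files end up the same size; only the number of writes differs.
wc -c < "$workdir/large"
wc -c < "$workdir/small"

rm -rf "$workdir"
```

To compare the two cases, each write_chunks call can be prefixed with "time" in ksh or bash, where "time" is a shell keyword and can time a function.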
I hope that the production of this hard data on I/O performance will aid the users of klondike to improve their computational performance.
--PROGRAM CODE --
program test_prog
  implicit none
  include 'mpif.h'

  ! ntests is a parameter so that writetime is always sized to match it
  ! (the original declared 10 elements but ran 20 tests).
  integer, parameter :: ntests = 20
  real (kind=4), dimension(:), allocatable :: array
  real, dimension(ntests) :: writetime
  integer :: nelts, chunksz, i, rep, j, ierr
  double precision :: start, finish   ! mpi_wtime() returns double precision

  nelts   = 250000
  chunksz = 250000

  open(unit=1, status='replace', form='formatted', file='results')
  write(unit=1,fmt='(I4)') ntests

  allocate(array(nelts))
  do i=1, nelts
     array(i) = mod(i,10) / 10.0
  end do

  call mpi_init(ierr)

  ! Sequential unformatted: 100 repetitions of the whole array per test
  do i=1,ntests
     open(unit = 11, status='replace', form = 'unformatted', &
          access = 'sequential', file = 'outfile1')
     start = mpi_wtime()
     do rep = 1, 100
        do j=1, nelts, chunksz
           write(unit = 11) array(j:min(j+chunksz-1,nelts))
        end do
     end do
     finish = mpi_wtime()
     writetime(i) = finish-start
     close(unit = 11)
  end do
  write(unit = 1, fmt='(f10.6)') writetime
  print *, 'SEQUENTIAL UNFORMATTED'
  print *, 'Fastest Time : ',minval(writetime),' sec'
  print *, 'Slowest Time : ',maxval(writetime),' sec'
  print *, 'Average Time : ',sum(writetime)/ntests,' sec'

  ! Sequential formatted: 10 repetitions of the whole array per test
  do i=1,ntests
     open(unit = 11, status='replace', form = 'formatted', &
          access = 'sequential', file = 'outfile2')
     start = mpi_wtime()
     do rep = 1, 10
        do j=1, nelts, chunksz
           write(unit = 11, fmt='(100f4.1)') &
                array(j:min(j+chunksz-1,nelts))
        end do
     end do
     finish = mpi_wtime()
     writetime(i) = finish-start
     close(unit = 11)
  end do
  write(unit = 1, fmt='(f10.6)') writetime
  print *, 'SEQUENTIAL FORMATTED'
  print *, 'Fastest Time : ',minval(writetime),' sec'
  print *, 'Slowest Time : ',maxval(writetime),' sec'
  print *, 'Average Time : ',sum(writetime)/ntests,' sec'

  ! Direct unformatted: one record per chunk, 100 repetitions per test
  do i=1,ntests
     open(unit = 11, status = 'replace', form = 'unformatted', &
          access='direct', recl=4*chunksz, file = 'outfile3')
     start = mpi_wtime()
     do rep = 1, 100
        do j=1,nelts/chunksz
           write(unit = 11, rec=nelts*(rep-1)/chunksz+j) &
                array(j:min(j+chunksz-1,nelts))
        end do
     end do
     finish = mpi_wtime()
     writetime(i) = finish-start
     close(unit = 11)
  end do
  write(unit = 1, fmt='(f10.6)') writetime
  print *, 'DIRECT UNFORMATTED'
  print *, 'Fastest Time : ',minval(writetime),' sec'
  print *, 'Slowest Time : ',maxval(writetime),' sec'
  print *, 'Average Time : ',sum(writetime)/ntests,' sec'

  ! Direct formatted: one record per chunk, 10 repetitions per test
  do i=1,ntests
     open(unit = 11, status = 'replace', form = 'formatted', &
          access='direct', recl=4*chunksz, file = 'outfile4')
     start = mpi_wtime()
     do rep = 1,10
        do j=1,nelts/chunksz
           write(unit = 11, rec=nelts*(rep-1)/chunksz+j, &
                fmt='(250000f4.1)') array(j:min(j+chunksz-1,nelts))
        end do
     end do
     finish = mpi_wtime()
     writetime(i) = finish-start
     close(unit = 11)
  end do
  write(unit = 1, fmt='(f10.6)') writetime
  print *, 'DIRECT FORMATTED'
  print *, 'Fastest Time : ',minval(writetime),' sec'
  print *, 'Slowest Time : ',maxval(writetime),' sec'
  print *, 'Average Time : ',sum(writetime)/ntests,' sec'

  call mpi_finalize(ierr)
end program
Scripted Chaining of Batch Jobs and File Checks
[ Many thanks to Kate Hedstrom of ARSC, for yet another article! ]
In the new ARSC storage environment, files are now purged on the work areas. I'll be describing one way to deal with this on the IBM in your loadleveler job script. Something similar should work on the Crays.
The new mode of working might involve copying files from $ARCHIVE to the work area, for instance configuration files for a model. This can be done by hand before the job is submitted, or it can happen in a batch script. However, it can't happen in the main batch script since the compute nodes can't see $ARCHIVE. The purpose of the "data" loadleveler class on iceberg and the "work" class on iceflyer is to allow $ARCHIVE to be visible from batch jobs. We can manage this by job chaining:
- First phase fetches the files from $ARCHIVE and submits the second phase.
- Second phase does the big computation and submits the third phase.
- Third phase moves files to $ARCHIVE.
If the computation is really long and takes more time than the eight hours allowed by the standard class, the first job could check to see if the forcing files are still there from the previous stage before fetching them.
If the fetch fails, the batch script should not try to submit the second phase.
If the files aren't there for the second phase, the big computation should not try to run.
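The "check before fetching" idea for the first phase might look something like the sketch below. The function names fetch_if_missing and fetch_all, and the file name forcing_file, are invented for illustration. It assumes a non-empty local copy means the earlier fetch is still good.

```shell
#!/bin/sh
# fetch_if_missing SRC_DIR FILE
# Copies FILE from SRC_DIR into the current directory unless a
# non-empty copy is already here (for example, left over from the
# previous stage of a long computation).
fetch_if_missing() {
    if [ -s "$2" ]
    then
        echo "$2 already present, skipping fetch"
        return 0
    fi
    cp "$1/$2" .
}

# fetch_all SRC_DIR FILE...
# Stops at the first failure so the caller knows not to submit
# the next phase.
fetch_all() {
    src_dir=$1
    shift
    for f in "$@"
    do
        if ! fetch_if_missing "$src_dir" "$f"
        then
            echo "problem fetching $f"
            return 1
        fi
    done
}

# Hypothetical use in the first phase:
# if fetch_all "$ARCHIVE/my_dir" forcing_file
# then
#     /var/loadl/bin/llsubmit phase_2
# fi
```

Because fetch_all returns non-zero as soon as one copy fails, the second phase is only submitted when every file made it to the work area.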
"if" in Shell Scripts
Let us review the syntax of "if" statements in shell scripts. Although I use tcsh for my interactive shell, I use the Korn shell (ksh) for scripts. In this case, it is exactly like the Bourne shell (sh). The only reason I switched to ksh is that you can export variables all in one line:

    export MP_SHARED_MEMORY=yes

rather than the sh version:

    MP_SHARED_MEMORY=yes
    export MP_SHARED_MEMORY
Back to the "if" statement. The general form is:
    if <some check>
    then
        do something
    fi
with optional elif and else clauses. The simplest form of <some check> is simply a Unix command, which returns a value to the shell indicating success or failure:
    if date
    then
        echo "date ran"
    else
        echo "date didn't run"
    fi

Another form of <some check> involves a test on some condition. There are two equivalent forms of this:

    if [ -d /usr ]
    then
        ls /usr
    fi

or

    if test -d /usr
    then
        ls /usr
    fi
Note that the shell is fussy about whitespace in this example.
Here, "-d" is a test for whether the argument is a directory. There are other tests. An incomplete list is:
    -d    directory
    -f    regular file
    -r    readable file
    -w    writable file
    -a    file exists
    -s    file exists and is not empty
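As a small illustration of these tests together with the optional elif and else clauses, here is a sketch that classifies a path. The function name describe is made up for this example.

```shell
#!/bin/sh
# describe PATH
# Reports what kind of thing PATH is, using the -d, -f and -s tests.
describe() {
    if [ -d "$1" ]
    then
        echo "directory"
    elif [ -f "$1" ] && [ ! -s "$1" ]
    then
        echo "empty file"
    elif [ -f "$1" ]
    then
        echo "non-empty file"
    else
        echo "missing or special"
    fi
}

describe /usr
```

The spaces just inside the square brackets are required: "[" is a command name and "]" is its last argument, so "[-d" would be looked up as a different (nonexistent) command.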
Putting "if" to Work
The script for the first phase (in the data/work class) could contain lines such as:
    # Copy file to current directory then submit the big job phase
    if cp $ARCHIVE/my_dir/my_file .
    then
        /var/loadl/bin/llsubmit phase_2
    fi
If more than one file has to be copied, we could check for success after each one (here, "!" is a logical "NOT"):
    # Copy files to current directory or die
    if ! cp $ARCHIVE/my_dir/my_file_1 .
    then
        echo problem copying my_file_1
        exit
    fi
    if ! cp $ARCHIVE/my_dir/my_file_2 .
    then
        echo problem copying my_file_2
        exit
    fi
    /var/loadl/bin/llsubmit phase_2
For phase two, we want to make sure the files are there:
    # Check for needed files
    if [ ! -f my_file_1 ]
    then
        echo my_file_1 not found
        exit
    fi
    if [ ! -f my_file_2 ]
    then
        echo my_file_2 not found
        exit
    fi

    # Run the job and submit the cleanup phase if successful
    if ./my_big_job
    then
        /var/loadl/bin/llsubmit phase_3
    fi
Phase three, moving results back to $ARCHIVE, is left as an exercise for the reader (if you get stuck, contact ARSC consulting which knows how to find me).
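For readers who want a starting point, one possible shape for phase three is sketched below. It is an assumption-laden sketch, not the author's solution: archive_results is a made-up helper, the result file names are placeholders, and it chooses to copy first and delete the local file only after the copy succeeds.

```shell
#!/bin/sh
# archive_results DEST_DIR FILE...
# Copies each result file into DEST_DIR and removes the local copy
# only if the copy succeeded, so nothing is lost when the archive
# is unavailable.
archive_results() {
    dest=$1
    shift
    for f in "$@"
    do
        if cp "$f" "$dest/"
        then
            rm -f "$f"
        else
            echo "problem archiving $f; local copy kept"
            return 1
        fi
    done
}

# Hypothetical use in phase_3 (file names are placeholders):
# archive_results "$ARCHIVE/my_dir" my_result_1 my_result_2
```

Since files on the work areas are purged, a failed phase three is worth noticing quickly. The non-zero return value gives the batch script something to test.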
X1: Don't Forget to Link for SSP Mode
If your code multi-streams poorly, you should test it in single-streaming, or SSP, mode. To do so, compile all source files with "ftn -O ssp ..." or "cc -h ssp ...", and don't forget to link with "-O ssp" or "-h ssp" as well. If you link without the "ssp" option, your application will run in the default MSP mode using only one SSP per MSP.
A user had this problem last week and discovered it as follows. His was an OpenMP program, and he expected to run it with 16 OpenMP threads, one per SSP on a single shared-memory X1 node. The following command is correct, and tells the scheduler to run the application with 16 threads:
    aprun -d 16 ./a.out

It, however, reported the error message:
"aprun -d cannot exceed node size"
The problem: while he'd compiled for SSP mode, he'd linked for MSP mode, and there are only 4 MSPs per node. Relinking with "-O ssp" immediately solved the problem.
The "file" Unix command will immediately tell you if an executable file was linked in SSP or MSP mode:
    % ftn -O ssp -c tt.f
    % ftn -o tt tt.o
    %
    % file tt
    tt: ELF 64-bit MSB executable (not stripped) MSP application NV1 - version 1
    %
    % ftn -O ssp -o tt tt.o
    % file tt
    tt: ELF 64-bit MSB executable (not stripped) SSP application NV1 - version 1
    %
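This check is easy to script, for instance as a guard in a job script before calling aprun. The sketch below is an illustration only: link_mode is a made-up helper, and it assumes the "file" output contains the phrases "SSP application" or "MSP application" exactly as in the session above.

```shell
#!/bin/sh
# link_mode
# Reads the output of the "file" command on stdin and prints
# SSP, MSP, or unknown.
link_mode() {
    case $(cat) in
        *"SSP application"*) echo SSP ;;
        *"MSP application"*) echo MSP ;;
        *) echo unknown ;;
    esac
}

# On the X1 this would be used as:   file ./a.out | link_mode
# Here the sample output from the session above stands in for it.
echo "tt: ELF 64-bit MSB executable (not stripped) SSP application NV1 - version 1" | link_mode
```

A job script could compare the result with "SSP" and stop with a reminder to relink before aprun ever runs.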
Quick-Tip Q & A
A:[[ C and Fortran compilers let me define pre-processor macros on the
  [[ command line, like this, for instance:
  [[
  [[   cc -D VERBOSE -c mysource.c
  [[
  [[ But I use makefiles, and would prefer this:
  [[
  [[   make -D VERBOSE myapp
  [[
  [[ Is there a way to pass macro setting "through" a make command to
  [[ be used as compiler options?

  #
  # Many thanks to Brad Chamberlain:
  #

  My strategy is to define a Makefile variable like "MYFLAGS". You can
  then set Makefile variables on the command line. For example,
  consider the following Makefile:

    test:
            @echo $(MYFLAGS)

  Running this normally results in a blank line:

    > make

  Running it with a command-line assignment to MYFLAGS yields:

    > make MYFLAGS=-foo
    -foo

  You can also give multiple options using quotes:

    > make "MYFLAGS=-foo -bar"
    -foo -bar

  Thus, you can take your Makefile commands for compiling a .c file
  and add a variable name like this to the cc command specification in
  order to add additional flags (like your -D definitions) at a make
  command line.

  Note that command-line settings override those in files, thus I
  could have added a line like:

    MYFLAGS=-default-flags

  to the Makefile so that if I didn't specify anything on the
  command-line, I would've gotten the default behavior:

    > make
    -default-flags

  The other two examples would work as before, though.

  #
  # From Jed Brown:
  #

  The quick and portable option is to do

    % make DEFS='-DFOO=1 -DBAR=0 -DQUX=1' target

  where makefile contains something like

    file.o : file.c
            $(CC) $(CFLAGS) $(DEFS) -c file.c

  or

    CFLAGS+= $(DEFS)

  if applicable. If you really want to use syntax as stated, then put

    .for VAR in FOO BAR BAZ QUX
    .ifdef $(VAR)
    DEFS+= -D$(VAR)=1
    .endif
    .endfor

  towards the top of the makefile. This will work with BSD style make,
  but GNU make has different syntax and does not support defining
  variables using "make -D VAR" syntax since "make VAR=1" is
  equivalent.

A:[[ I've noticed that I can sometimes get colors to work with emacs, but
  [[ other times I cannot. Colors are great for syntax and variable
  [[ highlighting.
  [[
  [[ A remote linux system + a Mac terminal window ($TERM=linux) does great
  [[ with colors, but many other combinations do not (including the emacs
  [[ that ships with the Macs). Does anyone know good ways to find out
  [[ whether color is available in a particular emacs installation, and if
  [[ so how to get colors to display?

  #
  # Better late than never! In fact, if you've got a really good answer to
  # a question from 7 years ago, send it in.
  #
  # Thanks first to Martin Luthi:
  #

  Make sure that global-font-lock-mode is enabled, as many
  distributions have it disabled as default.

    M-x global-font-lock-mode

  toggles that behaviour. You can either customize the variable
  global-font-lock-mode, change it in the Options menu (Syntax
  highlighting) in modern Emacs (>22), or enable it in your .emacs by
  calling the mode function (setting the variable directly with setq
  does not turn the mode on):

    (global-font-lock-mode 1)

  If you run emacs within a terminal (emacs -nw), the font lock
  capabilities depend on the terminal's capabilities.

  #
  # And another thanks to Brad Chamberlain:
  #

  Try using "M-x list-colors-display". This should open a *Colors*
  buffer that lists all the color names and how they look as
  foreground and background colors. I don't have a Mac, and all my
  emacs buffers support color, but I suspect that this would be one
  way of checking whether colors are supported by an implementation.

Q: Uh oh... I just deleted some files that I don't even own! How'd
   that happen? Why'd it let me do that! (Oh boy... I think I'm in
   trouble.)

   % ls -l
   total 2048
   -rw-------  1 fred    puffball        30 Aug 11 16:57 file.junk
   -rw-------  1 horace  heatrash  21922042 Mar 18 10:01 file.priceless
   -rw-------  1 fred    puffball        30 Aug 11 16:57 file2.junk
   -rw-------  1 horace  heatrash   4808440 Mar 19 11:21 file2.priceless
   %
   % rm -f file*
   % ls -l
   total 0
   %
[[ Answers, Questions, and Tips Graciously Accepted ]]
Ed Kornkven
ARSC HPC Specialist
ph: 907-450-8669

Kate Hedstrom
ARSC Oceanographic Specialist
ph: 907-450-8678

Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Subscribe to (or unsubscribe from) the e-mail edition of the
ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.