ARSC HPC Users' Newsletter 355, January 26, 2007
Volcano Research and ARSC Post-Doc
Here's an interesting feature on one of the ARSC Post-Doctoral Fellows and his work:
Passing Environment Variables to LoadLeveler Jobs
One of the things that surprised me when I started using LoadLeveler was that you could not pass command line arguments to a script when it was submitted. If you try, you will get an error like this:
    iceberg2 1% llsubmit myjob.ll 100 200
    llsubmit: 2512-034 File 200 not found.
    llsubmit: 2512-055 Unable to process the job command file "200" for input, the error is: A file or directory in the path name does not exist..
    llsubmit: 2512-051 This job has not been submitted to LoadLeveler.
I've used a couple of methods to solve this:
(a) hard code the information into the job script, or
(b) read the information from a text file.
Neither of the options is really elegant. With option (a), you need a new script for each set of command line arguments you use. With option (b), you need a new input file for each additional job you run.
However, there's a slightly more graceful way to pass information to LoadLeveler scripts: environment variables.
If the LoadLeveler keyword "environment" is set to "COPY_ALL", all environment variables from your login environment will be copied to your LoadLeveler script when you submit it. With this feature you can set an arbitrary environment variable to be used by your job script:
    iceberg2 2% export NUM_ITERATIONS=19
    iceberg2 3% llsubmit myjob.ll
    ...
Where the job script looks like this:
    iceberg2 4% cat myjob.ll
    #!/bin/bash
    # @ error = $(executable).$(jobid).eo
    # @ output = $(executable).$(jobid).eo
    # @ notification = never
    # @ job_type = parallel
    # @ environment = COPY_ALL
    # @ node = 1
    # @ tasks_per_node = 8
    # @ network.MPI = sn_all,shared,us
    # @ node_usage = not_shared
    # @ class = standard
    # @ wall_clock_limit=3600
    # @ queue

    # Check to see if $NUM_ITERATIONS is set:
    if [ ! -z "$NUM_ITERATIONS" ]; then
        echo Job input=$NUM_ITERATIONS
        # Run a job using $NUM_ITERATIONS as a command line argument
        poe ./a.out $NUM_ITERATIONS
        # Pass the exit status of poe back to LoadLeveler
        exit $?
    else
        echo "Error \$NUM_ITERATIONS not set" 1>&2
        echo "exiting..." 1>&2
        exit 1
    fi
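One detail worth keeping in mind: only exported variables are part of the environment, so a plain shell assignment without "export" would presumably never reach the job. The sketch below illustrates the difference using a child bash process as a stand-in for the job. It does not involve LoadLeveler at all, and the helper name child_sees is made up for this illustration.

```shell
#!/bin/bash
# Stand-in for a batch job: a child process that reports whether it
# inherited NUM_ITERATIONS. (child_sees is a hypothetical helper.)
child_sees() {
    bash -c 'echo "${NUM_ITERATIONS:-unset}"'
}

unset NUM_ITERATIONS
NUM_ITERATIONS=19       # plain assignment: a shell variable only
child_sees              # prints "unset"

export NUM_ITERATIONS   # now part of the environment
child_sees              # prints "19"
```

If a job reports that a variable is not set even though you assigned it, a missing "export" is one of the first things to check.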
There's one minor annoyance with this design. If you forget to set the environment variables (e.g., $NUM_ITERATIONS) before you submit the job, the job will exit as soon as it starts running. One improvement would be to write a simple wrapper script that ensures all of the necessary variables are set, for example:
    iceberg2 5% cat run_job.bash
    #!/bin/bash
    if [ $# -eq 1 ]; then
        export NUM_ITERATIONS=$1
        llsubmit myjob.ll
    else
        BASE=$(basename $0)
        # Send the usage message to stderr
        echo "Usage: $BASE iterations" 1>&2
        echo "" 1>&2
        exit 1
    fi
Using this wrapper script, if you happen to forget the command-line argument, a usage message will be displayed:
    iceberg2 6% ./run_job.bash
    Usage: run_job.bash iterations
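A wrapper is also a convenient place to catch bad values before the job ever reaches the queue. As a rough sketch, assuming the iteration count must be a positive integer (the newsletter does not say so), a check like the following could be called before the export. The function name is_positive_int is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if $1 is a positive integer.
is_positive_int() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *) [ "$1" -gt 0 ] ;;       # all digits: require a value above zero
    esac
}

# Try a few candidate values
for candidate in 19 1x9 ""; do
    if is_positive_int "$candidate"; then
        echo "ok: '$candidate'"
    else
        echo "rejected: '$candidate'" 1>&2
    fi
done
```

In a wrapper, the test "if is_positive_int "$1"; then ..." could sit alongside the existing argument-count check.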
Environment variables can also be used in chained jobs to pass information between successive runs. Just be sure to include the "environment = COPY_ALL" keyword and value in all of the LoadLeveler scripts, and update the environment variables you are using before submitting the next script in the chain.
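To make the chaining idea concrete, here is a minimal sketch of what the tail end of a chained job script might look like. The names CHAIN_STEP, CHAIN_LIMIT, LLSUBMIT, and submit_next are all invented for this illustration, and it assumes that llsubmit can be run from inside a running job on your system.

```shell
#!/bin/bash
# Sketch only: CHAIN_STEP, CHAIN_LIMIT, and LLSUBMIT are hypothetical names.
LLSUBMIT=${LLSUBMIT:-llsubmit}   # overridable, e.g. LLSUBMIT=echo for a dry run

submit_next() {
    # $1 = step just completed, $2 = last step in the chain
    local next=$(( $1 + 1 ))
    if [ "$next" -le "$2" ]; then
        export CHAIN_STEP=$next      # update the variable before resubmitting
        "$LLSUBMIT" myjob.ll         # myjob.ll must contain environment = COPY_ALL
    else
        echo "chain finished after step $1"
    fi
}

# At the end of the job script one might call:
#   submit_next "${CHAIN_STEP:-1}" "${CHAIN_LIMIT:-5}"
```

Setting LLSUBMIT=echo before calling submit_next gives a dry run that prints the script name instead of submitting anything, which is handy for checking the stopping condition.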
Iceberg, Multiple Compiler Versions Available
Occasionally you may want to reproduce an earlier compilation of your code, using an old compiler version (for comparison with a new compiler, for instance). More often, you may want to try the very latest compiler, which has not yet been made the general default.
"Modules," which Cray has used for almost a decade to manage multiple versions of programming environments and packages on its systems, are gaining popularity elsewhere. E.g., they're now included in the SUSE Linux distribution.
ARSC has installed modules on both iceflyer and iceberg as a local modification.
To give you the flavor of modules on iceberg, here's a sample session in which the user does the following:
- Checks the current mpxlf90 version (mpxlf90 -qversion)
- Learns about the modules command (news modules)
- Initializes modules for ksh user (. /usr/local/pkg/modules/init/ksh)
- Determines which modules are available (module avail)
- Determines which modules are already loaded (module list)
- Loads the latest xlf module (module load xlf.10.1.0.3)
- Checks which modules are already loaded, again (module list)
- Checks the current mpxlf90 version, again (mpxlf90 -qversion)
    b1n1 % mpxlf90 -qversion
    IBM XL Fortran Enterprise Edition V10.1 for AIX
    Version 10.01.0000.0001
    b1n1 % news modules
    [... cut ...]
    b1n1 % . /usr/local/pkg/modules/init/ksh
    b1n1 % module avail

    --------------- /usr/local/pkg/modules/modules-3.2.2/Modules/versions ----------------
    3.2.2

    ----------- /usr/local/pkg/modules/modules-3.2.2/Modules/3.2.2/modulefiles -----------
    dot          module-cvs   module-info  modules      null         use.own

    ---------------------------- /usr/local/pkg/modulefiles/ -----------------------------
    SciPy              idl                ncview             tau                vac.220.127.116.11
    ferret             idl-6.2            ncview-1.92e       tau_64             xlf.10.1.0.0
    ferret-5.41b       ncl                ncview-1.93b       vac.18.104.22.168  xlf.10.1.0.1
    gnuplot            ncl-4.2.0.a030_64  numpy-0.9.8        vac.22.214.171.124 xlf.10.1.0.2
    gnuplot-4.0.0      ncl-4.2.0.a033_64  python-2.4.3       vac.126.96.36.199  xlf.10.1.0.3
    b1n1 % module list
    No Modulefiles Currently Loaded.
    b1n1 % module load xlf.10.1.0.3
    b1n1 % module list
    Currently Loaded Modulefiles
    1) /xlf.10.1.0.3
    b1n1 % mpxlf90 -qversion
    IBM XL Fortran Enterprise Edition V10.1 for AIX
    Version 10.01.0000.0003
Stay tuned to "news compilers" for announcements of updates to the default compilers. We recommend you avail yourself of the "modules" command to test new compilers, prior to upgrades. If you use "modules" frequently, you might add the appropriate initialization step to your shell startup file.
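For ksh users, the startup-file addition might look something like the fragment below. It is only a sketch: the init path is the one used in the session above, the variable name MODULES_INIT is made up here, and which startup file applies (for instance ~/.profile) depends on how your account is set up.

```shell
# Fragment for a ksh startup file. MODULES_INIT is a hypothetical
# variable name; the path is the one shown in the sample session.
MODULES_INIT=/usr/local/pkg/modules/init/ksh

# Source the init script only if it exists, so that logins on systems
# without modules are unaffected.
if [ -r "$MODULES_INIT" ]; then
    . "$MODULES_INIT"
fi
```

The guard keeps the same startup file usable on machines where modules are not installed.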
Quick-Tip Q & A
A:[[ Too many times, I've wanted to clean out a few files with a command
  [[ like this:
  [[    % rm 200612*.dat old *.txt junk*
  [[
  [[ but was typing too fast and entered something like this,
  [[    % rm 200612*.dat old * txt junk*
  [[
  [[ thus deleting everything.
  [[
  [[ I absolutely do not have the patience to make a habit of "rm -i". I'd
  [[ just like to disable "rm *". Is there a way to do this? Any other
  [[ great ideas to keep me from deleting everything with the flub of
  [[ a finger?

#
# Thanks to Liam Forbes:
#

Write your own rm command to first echo the globbed command line and then prompt only once whether to continue or not. A simple script placed in one's own bin directory that does this can then be aliased to "rm".

    #!/bin/bash
    # Show the expanded command line, then ask once before deleting.
    # "$@" keeps each file name intact, even if it contains spaces.
    echo /bin/rm "$@"
    echo "Are you sure [y/N]"
    read answer
    if [ ":y:" == ":$answer:" ]; then
        /bin/rm "$@"
    else
        echo "okay then"
    fi

Here's an example; I want to delete all the numbered ones but keep the original, unnumbered copy.

    nelchina$ alias rm="~/bin/rm"
    nelchina$ ls -l
    total 8
    drwx------ 2 lforbes staff 4096 2007-01-12 18:11 ./
    drwxr-xr-x 15 lforbes staff 4096 2007-01-12 18:10 ../
    -rw------- 1 lforbes staff 0 2007-01-12 18:10 test
    -rw------- 1 lforbes staff 0 2007-01-12 18:11 test.1
    -rw------- 1 lforbes staff 0 2007-01-12 18:11 test.2
    -rw------- 1 lforbes staff 0 2007-01-12 18:11 test.3
    -rw------- 1 lforbes staff 0 2007-01-12 18:11 test.4
    -rw------- 1 lforbes staff 0 2007-01-12 18:11 test.5
    nelchina$ rm * .*
    /bin/rm test test.0 test.1 test.2 test.3 test.4 test.5 . ..
    Are you sure [y/N]
    n
    okay then

The nice thing is that any command line options you would use with rm can be used this way because they are passed to the command by "$@". So if you wanted to use "-i" with rm, you would be prompted one extra time.

    nelchina$ rm -i *.*
    /bin/rm -i test.0 test.1 test.2 test.3 test.4 test.5
    Are you sure [y/N]
    y
    /bin/rm: remove regular empty file `test.0'? y
    /bin/rm: remove regular empty file `test.1'? y
    /bin/rm: remove regular empty file `test.2'? y
    /bin/rm: remove regular empty file `test.3'? y
    /bin/rm: remove regular empty file `test.4'? y
    /bin/rm: remove regular empty file `test.5'? y

#
# Rich Griswold gives us the ZSH answer:
#

ZSH handles rm as a special case:

    % rm 200612*.dat old * txt junk*
    zsh: sure you want to delete all the files in /home/richard/data [yn]? n
    %

#
# And Don Bahls stumbled on this tcsh answer:
#

If you are using tcsh, you can set rmstar=1 to have tcsh prompt you when "rm *" is used:

    nelchina% set rmstar=1
    nelchina% touch 1 2 3 4 5
    nelchina% rm *
    Do you really want to delete all files? [n/y] y

If you don't want tcsh to prompt, simply do this:

    nelchina% unset rmstar

Q: I'd like to pipe "ls -l" into "cut" to extract certain fields, like
   group, size, and name, and I'd like to use spaces as delimiters,
   because it's easier to count the fields. So with this "ls -l" output:

    -rw------- 1 robert jstme 3183 2007-01-23 14:57 file1
    -rw------- 1 robert them 973 2006-09-15 08:19 file3
    -rw------- 5 bert justm 15096 2006-11-16 13:04 file2

   I tried:

    ls -l | cut -f4,5,8 -d' '

   but it gave me this garbage:

    robert jstme 2007-01-23 robert 15096

   Am I doing something wrong?
[[ Answers, Questions, and Tips Graciously Accepted ]]
Ed Kornkven    ARSC HPC Specialist             ph: 907-450-8669
Kate Hedstrom  ARSC Oceanographic Specialist   ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Subscribe to (or unsubscribe from) the e-mail edition of the ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.