ARSC HPC Users' Newsletter 395, Sep 26, 2008

Midnight Programming Environment Updates

There will be several programming environment updates on Midnight during the scheduled maintenance in October. The following updates will be made to the PathScale and PGI programming environments:



Module Name      Will Alias To          Previously Aliased To
---------------  -------------------    ----------------------
PrgEnv.old       PrgEnv.path.old        PrgEnv.path-2.5
PrgEnv.path.old  PrgEnv.path-3.0        PrgEnv.path-2.5
PrgEnv           PrgEnv.path            PrgEnv.path-3.0
PrgEnv.path      PrgEnv.path-3.1        PrgEnv.path-3.0 
PrgEnv.new       PrgEnv.path.new        PrgEnv.path-3.1
PrgEnv.path.new  PrgEnv.path-3.2        PrgEnv.path-3.1

PrgEnv.pgi.old   PrgEnv.pgi-7.0.2       nothing -- new module
PrgEnv.pgi       PrgEnv.pgi-7.1.6       PrgEnv.pgi-7.0.2
PrgEnv.pgi.new   PrgEnv.pgi-7.2.2       nothing -- new module

If you use the default Programming Environment and you do not wish to use the newer version of the PathScale compiler, you will need to update your ~/.login (csh/tcsh users) or ~/.profile (ksh/bash users) to load PrgEnv.path-3.0 explicitly. For example:


# module load PrgEnv
# explicitly load PrgEnv.path-3.0
module load PrgEnv.path-3.0

For more information on the Midnight PrgEnv modules, see these resources:

Combining Gridded Data and Grid Data Using ncatted and ncks

[ By: Patrick Webb ]

Many analysis tools and visualization tools prefer to have the grid and the gridded data together in the same file. However, it is not always the case that the data is in the same file as the file containing the grid. In this article I will show how to put the grid and data together into the same file. The example I will show uses the netCDF file format, two netCDF file manipulation utilities ncatted and ncks, and two simple steps.

The step-by-step flow starts with two files, a grid file and a data file. 1) Use ncks to copy the grid data from the grid file into the data file. 2) Use ncatted to modify the attributes of variables in the data file to reference the grid data.

Here is the example: We have two files, grid.nc and data.nc. The grid.nc file has our grid data in terms of latitudes and longitudes of points, and the data.nc file has our data. The dimensions are the same for the grid and the data. If we did an ncdump of the headers of the grid.nc and data.nc files, we would expect to see dimensions and variables something like this:


    netcdf grid {
    dimensions:
        x_point = 100;
        y_point = 100;
    variables: 
        lat(x_point, y_point);
            lat:units = "degree_north";
        lon(x_point, y_point);
            lon:units = "degree_east";
    ...
    
    netcdf data {
    dimensions:
        x_point = 100;
        y_point = 100;
    variables:
        temp(x_point, y_point);
            temp:units = "Celsius";
    ...

What we want to do is copy the lat and lon variables from the grid.nc file into the data.nc file, and make sure that the variables in the data.nc file have a properly defined coordinates attribute.

Step one is to copy the lat and lon variables from grid.nc to data.nc. The utility used to do this is ncks (which stands for NC Kitchen Sink), which has a wide variety of options for working with netCDF files. This is how the ncks command is used.


    ncks -v lat,lon -c -h grid.nc data.nc

Let's break that down. The '-v' option defines a list of variables to be operated on. The list of variables is comma-separated, with no spaces. '-c' controls which variables are extracted, in this case the variables in the argument and no other related variables. Leaving '-c' out will copy the variables in the argument as well as any related variables. The '-h' option turns off writing to the history attribute, which can grow very large if many operations are done. The last two arguments are the source file (grid.nc) and the destination file (data.nc).

If you do an ncdump of data.nc, this is what you will see:


    netcdf data {
    dimensions:
        x_point = 100;
        y_point = 100;
    variables:
        temp(x_point, y_point);
            temp:units = "Celsius";
    
        lat(x_point, y_point);
            lat:units = "degree_north";
    
        lon(x_point, y_point);
            lon:units = "degree_east";
    ....

Now the grid data is copied to the data.nc file. The next step is to edit the attributes of the variables in the data.nc file that will use the grid. Once this is complete, the data.nc file will contain both the grid data and the necessary variable attributes. The command used is ncatted, and here's how it works.


    ncatted -h -a coordinates,temp,o,c,"lon lat" data.nc

The '-h' option again turns off writing to the history attribute. '-a' starts the attribute description. The attribute description is composed of five parts in this order: attribute name, variable name, mode, attribute type, attribute value. For this example:


    Attribute name = "coordinates"
    Variable name = "temp"
    Mode = "o", which stands for "overwrite", overwriting the named
        attribute, if it exists.
    Attribute Type = "c", which denotes the "char" type, since the
        attribute in this case is a character string.
    Attribute Value = "lon lat", a character string.

The last argument is the input file. When the command finishes, you will end up with a header that looks like this:


    netcdf data {
    dimensions:
        x_point = 100;
        y_point = 100;
    variables:
        temp(x_point, y_point);
            temp:units = "Celsius";
            temp:coordinates = "lon lat";
    
        lat(x_point, y_point);
            lat:units = "degree_north";
        
        lon(x_point, y_point);
            lon:units = "degree_east";
    ....

The new 'coordinates' attribute added to temp will allow it to use the grid data that has been copied into the data.nc file.
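
If several data variables need the same 'coordinates' attribute, the ncatted command above can be generated once per variable. The following is only a sketch: it prints the commands for review rather than running them, and the helper name print_coord_cmds and the extra variable names salt and u_wind are made up for illustration.

```shell
#!/bin/sh
# Sketch: print one ncatted command per data variable that should reference
# the grid. Nothing is executed here; review the output before running it.
# print_coord_cmds, salt, and u_wind are hypothetical names.
print_coord_cmds() {
    file=$1
    shift
    for var in "$@"; do
        # Same attribute description as in the article:
        # name, variable, mode (o = overwrite), type (c = char), value.
        printf 'ncatted -h -a coordinates,%s,o,c,"lon lat" %s\n' "$var" "$file"
    done
}

cmds=$(print_coord_cmds data.nc temp salt u_wind)
echo "$cmds"
```

Once the printed commands look right, they can be run by pasting them into a shell, assuming ncatted is in your path.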

File Links and the readlink Command

[ By: Ed Kornkven ]

Linux (or Unix) links provide a mechanism for referring to a file by more than one name. Links come in two varieties, hard links and soft links (aka symbolic links or symlinks).

For more introductory discussion of hard and soft links, I'll refer the interested reader to a couple of links (pun intended):

Functionally, soft links differ from hard links in a couple of ways. First, we can't make hard links to directories:


    mg57> ln .. hardparent
    ln: `..': hard link not allowed for directory

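Second, we can't make a hard link across file systems, whereas a soft link is permitted. If we try to make a link called "myhome" to our $HOME directory, the soft link will succeed but the hard link will fail (in this case because $HOME is a directory; a hard link to a file on a different file system would be refused as well):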


    mg57> ln $HOME myhome
    ln: `/u1/uaf/kornkven': hard link not allowed for directory
    mg57> ln -s $HOME myhome
    mg57> ls -g
    total 0
    lrwxrwxrwx  1 staff 16 2008-09-16 12:09 myhome -> /u1/uaf/kornkven/

These functional differences are due to differences in the way the two links are implemented. Hard links are simply alternate names to the file. Along with a pointer to the actual file data, the file system saves information about the file, including a count of the number of links to the file (the original name counts as one). One name is as good as the other and it doesn't matter which name was the original. When one of the names is deleted, the count is decremented. When the count goes to zero, the file contents are irretrievable (there are no more names for it) and the file is therefore "deleted".
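
The link count can be watched directly. Here is a minimal sketch, assuming a POSIX shell and an "ls -l" whose second column is the link count (as in the listings in this article). The scratch directory, the file names, and the nlinks helper are illustrative only.

```shell
#!/bin/sh
# Sketch: watch the hard-link count change as names are added and removed.
# The scratch directory, file names, and nlinks helper are hypothetical.
scratch=$(mktemp -d)

# Print the link count of a file (second field of a long listing).
nlinks() {
    ls -ld "$1" | awk '{print $2}'
}

echo "some data" > "$scratch/original"
count_one_name=$(nlinks "$scratch/original")    # only the original name

ln "$scratch/original" "$scratch/alias"
count_two_names=$(nlinks "$scratch/original")   # two names, one file

rm "$scratch/original"
count_after_rm=$(nlinks "$scratch/alias")       # one name left
contents_after_rm=$(cat "$scratch/alias")       # data is still reachable

echo "links: $count_one_name -> $count_two_names -> $count_after_rm"
echo "contents: $contents_after_rm"
rm -rf "$scratch"
```

Removing the "original" name does not remove the data; it only drops the count back to one, and the contents remain readable through the other name.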

A soft link, on the other hand, can be thought of as a file which contains the path to the original file (the actual implementation may differ from this simplified description). The important thing is that, to access the file, the contents of the link file must be followed, akin to the indirect access provided by C or Fortran pointers. Like a C or Fortran pointer, the distinction between the link and the target is important. If a symbolic link is deleted, the original file still exists by its original name. If the original file is deleted instead, the symlink is "dangling" -- it still exists but points to nothing.
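
A dangling symlink can be detected from a script with the standard shell tests -L (is this name a symlink?) and -e (does the path exist, after following links?). The sketch below assumes a POSIX shell; link_state is a made-up helper name, not a standard command.

```shell
#!/bin/sh
# Sketch: classify a path as a live symlink, a dangling symlink, or neither.
# link_state is a hypothetical helper, not a standard command.
link_state() {
    if [ -L "$1" ]; then
        # -e follows the link, so it fails when the target is missing.
        if [ -e "$1" ]; then
            echo "live symlink"
        else
            echo "dangling symlink"
        fi
    elif [ -e "$1" ]; then
        echo "not a symlink"
    else
        echo "missing"
    fi
}

scratch=$(mktemp -d)
echo "data" > "$scratch/target"
ln -s target "$scratch/link"                  # relative to the link's directory
state_before=$(link_state "$scratch/link")    # target exists
rm "$scratch/target"
state_after=$(link_state "$scratch/link")     # link remains, target is gone
echo "before: $state_before, after: $state_after"
rm -rf "$scratch"
```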

Let's try out some file link operations. Many operations can operate on symbolic links as if they were the original files -- that is what makes them useful. Look at this example, in which we create a dummy file in a subdirectory of our $WORKDIR and then create some links to it. To create the file, we just arbitrarily redirect the output of an "ls -g" just to give us something to look at.


    mg57> ls -g > dummy
    mg57> ln dummy hard
    mg57> ln -s dummy soft
    mg57> ls -g
    total 0
    -rw-------  2 staff 53 2008-09-25 18:12 dummy
    -rw-------  2 staff 53 2008-09-25 18:12 hard
    lrwxrwxrwx  1 staff  5 2008-09-25 18:13 soft -> dummy

We see that dummy and hard are both 53-byte files, whereas soft is only five bytes long. If we display the contents of hard we see the same thing as when we look into dummy:


    mg57> cat hard
    total 0
    -rw-------  1 staff 0 2008-09-16 11:27 dummy

But in spite of the apparent file size difference, the same is true for soft:


    mg57> cat soft
    total 0
    -rw-------  1 staff 0 2008-09-16 11:27 dummy

Linux commands understand the difference between link types and try to do the "right" thing for soft links by using the link reference directly or indirectly. In the example of "ls" above, the direct reference was used and we saw the name of the target of the symlink. On the other hand, the "cat" command followed the link and displayed the contents of the source file. The default behavior of such commands is often, but not always, what we want when using symbolic links. For those exceptions, many commands offer options that say, "don't (or do) follow the link". For example, if a soft link points to a directory and we do an "ls -g" on it, we get a listing similar to the previous one:


    mg57> ln -s .. parent
    mg57> ls -g parent
    lrwxrwxrwx  1 staff 2 2008-09-25 18:20 parent -> ../

But what if we want to see the contents of the directory that parent references? For that situation, the "-H" option to "ls" gives us what we want:


    mg57> ls -gH parent
      [... long listing of my $WORKDIR comes out here ...]

We can get the same effect by appending a "/" at the end of the directory name:


    mg57> ls -g parent/

One command that does not have an option for changing the way a soft link is handled is "mv". Let's see what happens to the links when we move our original file to the parent directory and then reference it using the links.


    mg57> mv dummy ..
    mg57> ls -g
    total 4
    -rw-------  2 staff 53 2008-09-25 18:12 hard
    lrwxrwxrwx  1 staff  2 2008-09-25 18:20 parent -> ../
    lrwxrwxrwx  1 staff  5 2008-09-25 18:13 soft -> dummy
    mg57> cat hard
    total 0
    -rw-------  1 staff 0 2008-09-25 18:12 dummy
    mg57> cat soft
    cat: soft: No such file or directory

The hard link is still an alias for the file "dummy" even though we moved it and we can still access "dummy" through it. However the soft link is still pointing to a file named dummy in the current directory. Since we moved the target file, "soft" can no longer reference it. Another way to look at it is that the user has to keep symbolic links and their targets consistent, but the OS keeps track of the hard links to a file. If you look at the above listing, the entry for "hard" has a "2" after the permissions. For a hard link, that is the number of links to the file that we mentioned earlier. When that number goes to 0, the file has been deleted -- i.e., there are no more links to it.

Now delete the original and see what happens:


    mg57> rm ../dummy
    mg57> ls -g
    total 4
    -rw-------  1 staff 53 2008-09-25 18:12 hard
    lrwxrwxrwx  1 staff  2 2008-09-25 18:20 parent -> ../
    lrwxrwxrwx  1 staff  5 2008-09-25 18:13 soft -> dummy
    mg57> cat hard
    total 0
    -rw-------  1 staff 0 2008-09-25 18:12 dummy
    mg57> rm hard
    mg57> ls -g
    total 0
    lrwxrwxrwx  1 staff 2 2008-09-25 18:20 parent -> ../
    lrwxrwxrwx  1 staff 5 2008-09-25 18:13 soft -> dummy

Note that "soft" still points to a file named "dummy", even though it no longer exists. If we were to create a new file "dummy", "soft" would again be valid:


    mg57> echo "This is a new dummy file" > dummy
    mg57> cat soft
    This is a new dummy file

That behavior can be welcome or not, depending on what one is expecting. The point is, it is up to the user to keep soft links consistent with their source files. But it definitely presents a problem if we really do want to mv the file that a symlink points to, and not the link itself. For example, suppose we create a file in another place, and we have a link to it, and we want to move the file using the link. First, create a new dummy file in $HOME and make a new soft link to it called "soft2":


    mg57> echo "One more dummy file" > $HOME/dum2
    mg57> ln -s $HOME/dum2 soft2
    mg57> cat soft2
    One more dummy file
    mg57> ls -g
    total 4
    -rw-------  1 staff 25 2008-09-25 18:32 dummy
    lrwxrwxrwx  1 staff  2 2008-09-25 18:20 parent -> ../
    lrwxrwxrwx  1 staff  5 2008-09-25 18:13 soft -> dummy
    lrwxrwxrwx  1 staff 23 2008-09-25 18:46 soft2 -> /u1/uaf/kornkven/dum2

Now try to move it:


    mg57> mv soft2 ./dum2
    mg57> ls -g
    total 4
    -rw-------  1 staff 25 2008-09-25 18:32 dummy
    lrwxrwxrwx  1 staff 23 2008-09-25 18:46 dum2 -> /u1/uaf/kornkven/dum2
    lrwxrwxrwx  1 staff  2 2008-09-25 18:20 parent -> ../
    lrwxrwxrwx  1 staff  5 2008-09-25 18:13 soft -> dummy

The "mv" didn't work! All "mv" does to a symlink is rename it. What we need is the "readlink" command which returns the name of the file that a symlink points to, for example:


    mg57> readlink dum2
    /u1/uaf/kornkven/dum2

Now use the name returned by "readlink" as the source file in our "mv" command:


    mg57> mv `readlink dum2` ./dum2
    mg57> ls -g
    total 4
    -rw-------  1 staff 25 2008-09-25 18:32 dummy
    -rw-------  1 staff 20 2008-09-25 18:45 dum2
    lrwxrwxrwx  1 staff  2 2008-09-25 18:20 parent -> ../
    lrwxrwxrwx  1 staff  5 2008-09-25 18:13 soft -> dummy
    mg57> cat dum2
    One more dummy file

Voila! We moved the file the symlink points to, not the symlink itself.
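
The same trick can be wrapped in a small function for reuse. This is a sketch under a few assumptions: a POSIX shell, a "readlink" command as used above, and a made-up helper name, move_link_target. It quotes the file names so that paths containing spaces survive, and it handles a relative link target, which is interpreted relative to the directory holding the link rather than the current directory.

```shell
#!/bin/sh
# Sketch: move the file a symlink points to (not the symlink itself).
# move_link_target is a hypothetical helper name.
move_link_target() {
    link=$1
    dest=$2
    if [ ! -L "$link" ]; then
        echo "move_link_target: $link is not a symbolic link" >&2
        return 1
    fi
    target=$(readlink "$link")
    # A relative target is relative to the link's directory, not to the cwd.
    case $target in
        /*) ;;
        *) target=$(dirname "$link")/$target ;;
    esac
    mv -- "$target" "$dest"
}

# Demonstration in a scratch directory standing in for $HOME and $WORKDIR.
scratch=$(mktemp -d)
mkdir "$scratch/home" "$scratch/work"
echo "One more dummy file" > "$scratch/home/dum2"
ln -s "$scratch/home/dum2" "$scratch/work/dum2"

# As in the article, the destination is the link's own name, so the
# symlink is replaced by the real file.
move_link_target "$scratch/work/dum2" "$scratch/work/dum2"

if [ -L "$scratch/work/dum2" ]; then still_link=yes; else still_link=no; fi
moved_contents=$(cat "$scratch/work/dum2")
if [ -e "$scratch/home/dum2" ]; then old_exists=yes; else old_exists=no; fi
echo "still a link: $still_link; old copy exists: $old_exists"
rm -rf "$scratch"
```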

Quick-Tip Q & A


A:[[ My advisor just asked me to run a hundred different simulations.
  [[ Is there an easy way to generate a PBS script for each of the input
  [[ files he gave me?  If it helps, the input file is passed on the 
  [[ command line to the program.
  [[
  [[ e.g.
  [[ ./a.out input000.nc
  [[
  [[ And each input file is in a separate directory.
  [[
  [[ input001/input001.nc
  [[ input002/input002.nc
  [[ ...
  [[ input100/input100.nc
  [[
  [[ So how do I make this happen without wearing out my keyboard?


#
# Thanks to Jed Brown for sharing the following bash solution which
# puts all of the simulations in a single file.
#


A quick and dirty bash solution:

$ for n in {1..100}; do \
    printf './a.out input%03d/input%03d.nc\n' $n $n; done > joblist


#
# Ryan Czerwiec shared this solution which uses vi to edit the individual
# batch files.
#

This is a straightforward and fairly simple problem to solve with a
great variety of methods.  I can see quick solutions with awk, sed, and
the like, but I thought I'd submit a more oddball approach that can
prove quite useful sometimes.

I assume: parent directory contains a.out and a template batch file for
case 000 (I'm calling it batchfile once the number is truncated), file
naming convention always pads with zeroes as in the example file names
given.

#!/bin/csh -f
echo >! temp.vi
set istart = $1
set iend = $2
set idigit = `printf $iend | wc -c`
@ index = $istart + 1
set istart = `printf "%${idigit}s" $istart | tr " " "0"`
while ( $index <= $iend )
  @ idum = $index + -1
  set idum = `printf "%${idigit}s" $idum | tr " " "0"`
  set idum2 = `printf "%${idigit}s" $index | tr " " "0"`
  echo :1,\$ s/input${idum}/input${idum2}/g >>! temp.vi
  echo :w\! input${idum2}/batchfile${idum2} >>! temp.vi
  @ index++
end
echo :q\! >>! temp.vi
vi input${istart}/batchfile${istart} < temp.vi
\rm temp.vi
exit


This script will take two arguments, the start (template file) and end
numbers, so "./script.csh 0 100" in this case.


#
# Here's a solution from the editor which uses a template PBS file and sed
# to build a PBS script for each input file.
#

This solution uses sed to replace the string "INPUT_FILE" in a template
file with the input filename in a directory.  This technique builds
a PBS script for each input file so you can run the jobs independently.


mg56 % cat template.pbs 
#!/bin/bash
#PBS -l walltime=8:00:00
#PBS -j oe
#PBS -q standard

cd $PBS_O_WORKDIR

./a.out INPUT_FILE


mg56 % for d in *; do if [ -d $d ]; then cd $d; f=$d.nc; \
   cat ../template.pbs | sed -e "s/INPUT_FILE/$f/" > $d.pbs; cd ..; fi; done
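
The same loop can be written as a function that avoids changing directories and quotes its file names. This is only a sketch: make_pbs_scripts is a made-up name, the demonstration runs in a scratch directory with a shortened stand-in for template.pbs, and it assumes, as in the question, that each directory name matches its input file name.

```shell
#!/bin/sh
# Sketch: build one PBS script per input directory from a template.
# make_pbs_scripts is a hypothetical helper; INPUT_FILE is the placeholder
# string used in template.pbs.
make_pbs_scripts() {
    template=$1
    shift
    for d in "$@"; do
        [ -d "$d" ] || continue            # skip anything that is not a directory
        name=$(basename "$d")
        sed -e "s/INPUT_FILE/$name.nc/" "$template" > "$d/$name.pbs"
    done
}

# Demonstration with two stand-in input directories.
scratch=$(mktemp -d)
mkdir "$scratch/input001" "$scratch/input002"
printf '%s\n' '#!/bin/bash' '#PBS -q standard' 'cd $PBS_O_WORKDIR' \
    './a.out INPUT_FILE' > "$scratch/template.pbs"

make_pbs_scripts "$scratch/template.pbs" "$scratch"/input*

generated_line=$(grep 'a.out' "$scratch/input002/input002.pbs")
echo "$generated_line"
rm -rf "$scratch"
```

Each generated script could then be submitted from within its own directory, so that $PBS_O_WORKDIR points at the directory holding the input file.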



Q: I am running on a multicore Opteron processor.  I need to know which one
   of the processor cores is running my program.  How can I print from my
   program which core is executing it?
   

[[ Answers, Questions, and Tips Graciously Accepted ]]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.