ARSC HPC Users' Newsletter 363, June 08, 2007

ARSC Summer Science Seminar Series, 2007, Begins Next Tuesday

ARSC is hosting a weekly, public science seminar series. The focus is on the application of computer and information-based technologies to solving real-world problems.

Dates:    Tuesdays, June 12 - July 31 (except July 3)
Time:     1:00 pm
Location: West Ridge Research Building (WRRB), Room 010
          (See map, at: http://www.arsc.edu/ )

First Seminar of 2007:


  "IPY Explorations via Virtual Globes." 
    Presented by: Institute of Northern Engineering Associate 
      Research Professor, Matt Nolan
    June 12, 1:00 pm, WRRB 010.

Extended presentation for UAF researchers, this Tuesday:


  "Getting IPY Research Highlighted in GoogleEarth"
    Presented by: Matt Nolan 
    June 12, 2:30 pm, WRRB 010.

For more information on the science seminar series, contact ARSC Chief Scientist Greg Newby at newby@arsc.edu .

Resolving Linker Errors - Part I


  [by: Don Bahls]

If you've been programming for any amount of time, it's likely that you've run into a linking error at one time or another. Sometimes it's apparent why the linking error is occurring. Perhaps the simplest error is forgetting to include a required library or object file during linking. Here's a simple demonstration of this error:


   mg56 % pathcc dot_product.c -I . -c
   mg56 % pathcc myprog.c -I . -c
   mg56 % pathcc myprog.o -o myprog
   myprog.o(.text+0x3b): In function `main':
   : undefined reference to `dot_product'
   collect2: ld returned 1 exit status

If "dot_product.o" is added to the linking line above, the undefined reference error goes away. This example is simple, but in codes with numerous object files, tracking down linking errors can become difficult.

Tools of the Trade

There are a few tools that are useful for inspecting libraries and object files. These programs can be a great place to start when you have no idea why a linking error is occurring. The commands below should be available on any Linux system. In cases where the command is different on an AIX system, the AIX version is listed in parentheses.

nm: The "nm" command will list the symbols from an object file or archive file.

E.g.:

   
   mg56 % nm dot_product.o
   0000000000000000 T dot_product

   mg56 % nm myprog.o
                    U dot_product
   0000000000000000 T main

Generally, each routine defined in a source file produces a corresponding text symbol, denoted with "T" above. Each routine that the file calls but does not define results in an undefined symbol, denoted with "U". At link time, every undefined symbol must be matched by a defined symbol in some other object file or library. There are slightly different rules for dynamic libraries, which we won't discuss in this article.
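The U/T matching described above can be checked mechanically across a set of object files: collect every symbol marked "U", then drop those that some file in the set defines. This is a sketch assuming bash (for process substitution) and GNU nm's -u and --defined-only options; the function name unresolved_symbols is hypothetical.

```bash
# unresolved_symbols FILE...: print symbols that the given object files
# reference ("U") but that none of them defines.
# Sketch: assumes bash and GNU nm; dynamic libraries are not examined.
unresolved_symbols() {
    comm -23 \
        <(nm -u "$@" 2>/dev/null | awk '$1 == "U" { print $2 }' | sort -u) \
        <(nm --defined-only "$@" 2>/dev/null | awk 'NF == 3 { print $3 }' | sort -u)
}
```

Symbols supplied by system libraries (for example, printf from the C library) are also listed, because only the named files are examined. An empty result means the named files satisfy each other's references.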

file: The "file" command displays information about a file. For object files, it shows whether the file uses the 32-bit or 64-bit ABI.

E.g.:

   mg56 % file dot_product.o
   dot_product.o: ELF 64-bit LSB relocatable, AMD x86-64, version 1 (SYSV), not stripped

   mg56 % file dot_product.o
   dot_product.o: ELF 32-bit LSB relocatable, Intel 80386, version 1 (SYSV), not stripped

When linking occurs, all of the files need to use the same ABI. The example above shows dot_product.o built first as an AMD x86-64 object and then in the standard 32-bit Intel object format. Objects in these two formats cannot be linked into the same executable.

objdump: The "objdump" command will display information from an object file or library.

(AIX command: dump )

An archive file can contain multiple object files. If you aren't sure what ABI was used to compile the object files, the objdump command will list this information:

E.g.:

   mg56 % objdump -a /usr/local/pathscale/lib/libnetcdf.a           
   In archive /usr/local/pathscale/lib/libnetcdf.a:

   attr.o:     file format elf64-x86-64
   rw------- 2640/206  33368 Sep 15 08:12 2006 attr.o


   dim.o:     file format elf64-x86-64
   rw------- 2640/206   9536 Sep 15 08:12 2006 dim.o
   ...
   ...

On AIX systems, "dump -a" provides similar information.

Finding Undefined Symbols

When a linking error occurs, I usually start by trying to determine whether or not the symbol is defined in an object file or library within the source tree.

Here's a linking error I ran into recently:


   mpif90  HIM_error_handler.o user_initialization.o ...
   mpp.o(.text+0x3692): In function `MPP_NODE.in.MPP_MOD':
   : undefined reference to `mld_id__'
   collect2: ld returned 1 exit status
   gmake: *** [fms.x] Error 1

Combining the find command with nm shows the information we're looking for. The command below finds all object files under the directory "exec" and, for each file whose symbol table contains the string "mld_id" (ignoring case), prints the file name followed by the matching symbols.

E.g.:

   mg56 % find exec -name \*.o | while read f; do
     if [ $(nm $f | grep -i mld_id | wc -l ) -gt 0 ]; then
       echo $f;
       nm $f | grep -i mld_id;
     fi
   done
   exec/mpp.o
                    U mld_id__
   exec/threadloc.o
   0000000000000000 T mld_id_

I'm actually searching for "mld_id" rather than "mld_id__" in case there's a name mangling issue of some sort. Many Fortran compilers add one or more underscores to the end of a subroutine name to make the symbol names different from C symbols. The search above also ignores case, since Fortran compilers commonly make symbol names all lower case or all upper case.

Notice in the example above, the undefined symbol has two underscores, while the text symbol has a single underscore. So it is likely that this is a name mangling issue. At this point, since we know the name of the object file, we can look for the source file that created it. In the case of this code, the object files all get placed in a single build directory, while the source is left in the original directory.

The find command can also locate the source file that generated this symbol.

E.g.:

   mg56 % find $WRKDIR/code/src -name threadloc.\*
   /wrkdir/bahls/code/src/shared/mpp/threadloc.c

When I looked in this file, I found that there was indeed a C routine called "mld_id_". I could add a second underscore to the routine name and recompile threadloc.c; however, it isn't always practical to make code modifications like this. In the next part of this series, we will discuss various name mangling schemes and how this particular linking error might be handled.

If the symbol had not been found by the first search, the search could be repeated over library files instead. Here's an example:


   mg56 % pathcc simple_wr.c -c
   mg56 % pathcc simple_wr.o -L/usr/local/pathscale/lib -o simple_wr
   simple_wr.o(.text+0x85): In function `main':
   : undefined reference to `nc_create'
   simple_wr.o(.text+0x91): In function `main':
   : undefined reference to `nc_strerror'
   ...
   ...

   mg56 % nm simple_wr.o | grep nc_create
                    U nc_create

   mg56 % find /usr/local/pathscale/lib/ -name \*.a | while read f; do
     if [ $(nm $f | grep -i nc_create | wc -l ) -gt 0 ]; then
       echo $f;
       nm $f | grep -i nc_create;
     fi
   done
   /usr/local/pathscale/lib/libnetcdf.a
   0000000000001856 T nc_create
                    U nc_create
                    U nc_create

The find command above shows that the symbol "nc_create" is in libnetcdf.a. I simply forgot to include the netcdf library (-lnetcdf) on the linking line.

In the next part of this series we'll look at ways to handle name mangling issues.

When is No-Data Zero?


  [ by: Lee Higbie ]

I have been rewriting a code that has some Fortran syntax that bothered me. One routine called another using array parameters, but there was an array size mismatch between the actual and dummy arguments. I encapsulated the issue in a short program:


      program main
           implicit none
           integer, dimension (5,3,2) :: x, y
           integer i, j, k            
           do k = 1, 5
              do j = 1, 3
                 do i = 1, 2
                    x(k,j,i) = k*100 + j*10 + i    ! so it's easy to see where from
                    y(k,j,i) = x(k,j,i)
                 enddo      ! end loop on i
              enddo      ! end loop on j
           enddo         ! end loop on k
           do k = 1, 5
              print 1, k, (j, (x(k, j, i), i= 1,2), j= 1,3)
        1     format(i3, ': ', 3(i6, 2i4, ',') )
           enddo      ! end loop on k

           do j = 1, 3
              do k = 1, 5
                 call see (x(k, j, :), k, j)       ! 2 non-contiguous memory elements
              enddo      ! end loop on k
           enddo      ! end loop on j

        end program

        subroutine see(x, i, j)
           implicit none
           integer x(7)               ! means that x has more elements than in caller
           integer i, j
           print 1, 'See ', i, j, ': ', x      
        1  format(a, i2, ',', i2, a, 7i7)
        end subroutine

One interesting aspect of this program is that it behaves differently with our various compilers. I ran it on Iceflyer (using xlf90), Midnight (using pathf90), and Nelchina (using pgf90). On each machine, I ran it once with compiler defaults and once with array bounds checking enabled, and then repeated both runs with a variant of the program that omits the array y, for 12 cases in all.

None of the 12 cases complained about the array size change across the call. The first two elements of x in subroutine "see" were correct in all cases. The remaining five elements, which lie beyond the two values actually passed, were zero in some cases and other values in others.

   Machine    With Defaults     With Bounds Checking
   --------   ---------------   --------------------
   Iceflyer   all 5 vals = 0    various small values
   Midnight   various values    various values
   Nelchina   all 5 vals = 0    all 5 vals = 0

On both Iceflyer/xlf90 and Nelchina/pgf90, the results happened to be the same with or without the declaration of array y. On Midnight/pathf90 the results were different when y was removed from the program.

The specific code that prompted this little test program is a production code running on an XD1. I suspect that the users of the badly written program were lucky: the off-the-end values of the arrays happened not to affect their computations noticeably. Many programs can tolerate missing values being set to zero; many others will produce rather more interesting and surprising results.

More seriously, the conclusion is that you can't rely on compilers to catch out-of-bounds errors like this one, even with bounds checking enabled. When passing arrays as arguments, you can use modules, use automatic arrays, or be very careful about array lengths.

Quick-Tip Q & A


A: [[ What's your favorite shell alias or shell one-liner? ]]


#
# Greg Newby
#

This sets my prompt for tcsh and variants:

  set hn=`hostname -s`
  alias sp  'set prompt="${hn}(${USER}) [\\!] `dirs|sed -e '\''s| .*||'\'' -e '\''s|.*[^/]\(/[^/]*/[^/]*\)|..\1|'\''` } "'
  alias cd 'cd \!*;sp'

In zsh, something similar in cyan:

  PS1=$'%{\e[1;36m%}%m(%n) %~ [%!] > %{\e[1;0m%}'

I have a variation for ksh, too.


#
# Ryan Czerwiec
#

My favorite alias would be:

  alias CLS 'CLEAR ; echo "            Turn off the Caps Lock, dummy\! "'


#
# One from the Editor
#

Here's how to make your own chipotle sauce.  

1) Open a can of Chipotle Peppers in Adobo Sauce (Embasa is the better
brand, because their peppers are skinned). 2) Mash or blend everything
to the desired consistency. 3) Store in a small jar in the fridge.
Awesome on burritos and veggie burgers.

Hey, wait a minute, what was the question?


#
# Kate Hedstrom
#

I mostly use aliases to remember common options:

  alias xfig 'xfig -dontshowballoons -startg 2 -fg black -bg LightYellow'

I like to start up xfig without the tooltip hints because I learned
it before the tooltips - I could probably handle the tips now. I also
like to have it start up with a background grid.

  alias knitfig 'xfig -metric -startposnmode 3 -startgridmode 2'

In addition to the above, this will put xfig in metric mode and also
constrain the positioning of objects to fit with its library of
knitting symbols.

  alias ispell 'aspell -c'

There was once a program called ispell which has been replaced by
aspell. The -c option makes it behave much like the old ispell.

  alias psg           'ps -elf | grep \!* | grep -v grep'

This has probably been mentioned before. Use it like "psg matlab" to
see all the matlab processes on a system - except the grep for matlab.
I think I got it from an old Sun tutorial.


#
# Ed Kornkven
#

I use aliases for shortcuts and ease of typing, but also for
remembering stuff.  In the latter case, it isn't so much that
I type the alias frequently, but that I can list my aliases and
remember how to do something.

Examples of useful (for me) shortcuts:
  alias amke=make
  alias mkae=make

In the memory category, I sometimes want to know which macros are
predefined by the C preprocessor:
  alias list_predef_macros='touch n_o__n_o__.h; cpp -dM n_o__n_o__.h; rm n_o__n_o__.h'


#
# Martin Luthi
#

  alias psg='ps -elf | grep $1 '

Example usage: 

  mybox> psg ntop

shows all running processes (on Linux) that contain a certain
pattern. On different systems (BSD syntax), ps might need other
parameters, for example:

  alias psg='ps -aux | grep $1'
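In bash (and ksh or zsh), aliases don't take arguments: the $1 in the aliases above expands to the shell's own, usually empty, first positional parameter, and "psg ntop" works only because the word ntop is appended to the end of the expanded alias. A shell function receives the pattern properly. This is a sketch assuming System V style ps options (Linux procps); the name psg follows the aliases above.

```bash
# psg PATTERN: list processes whose ps line matches PATTERN, without the
# grep process itself. A function sees its argument as "$1"; an alias does not.
# Assumes "ps -elf" (Linux/procps); on BSD systems use "ps aux" instead.
psg() {
    ps -elf | grep -- "$1" | grep -v grep
}
```

As with the tcsh version, "grep -v grep" also hides any process whose command line happens to contain the word grep.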


#
# Follow-up from Martin Luthi to last week's answer: 
#

> #
> # Thanks to Martin Luthi for this python solution. 
> #
> If you have Python 2.4 or 2.5 available, the built-in set type does
> all you need: 
> 
> ====
> #!/usr/bin/env python
> 
> file1 = set([line for line in file('old.txt')])

  This line of course should be:
  
    file1 = set(file('old.txt').readlines())
  
  If the order of lines is important, here is a straightforward but
  maybe slow (for *huge* files) version:
  
  ====
  #!/usr/bin/env python
  
  lines = file('old.txt').readlines()
  for line in file('additions.txt'):
      if line not in lines:
          print line,
  
  ====        
  
  Thinking about the problem somewhat more, the obvious solution is
  to use diff. If you are using Emacs, ediff (M-x ediff) is especially
  nice to highlight and easily modify two somewhat different files.




Q: I share a directory with my group. It contains files which are all
   in the same Unix group, but with several different owners (including
   my girlfriend AND ex-girlfriend!).

   Now I need to copy the entire directory to a new host, but 
   "scp -rp" changes the ownership of all the files to **ME** and so
   does tar/untar.  Is there any way to preserve file ownership across
   this move!?

   Everyone's mad at me!




  [[ Answers, Questions, and Tips Graciously Accepted ]]

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

Editors:
--------
   Tom Baring, ARSC HPC Specialist, baring@arsc.edu, 907-450-8619
   Don Bahls, ARSC HPC Specialist, bahls@arsc.edu, 907-450-8674

Subscription Information:
-------------------------
   Subscribing: send this message to: "majordomo@arsc.edu":
     subscribe hpc_users
     end
   Unsubscribing: send this:
     unsubscribe hpc_users
     end
   For help with majordomo, send this:
     help
     end
   In all cases, leave the "subject" line of your message blank.

   Messages sent to "owner-hpc_users@arsc.edu" will be forwarded to 
   the editors.  Let us know if you have problems with majordomo.

Back Issues are Available:
--------------------------
   - Web edition:   http://www.arsc.edu/support/news/HPCnews.shtml
   - E-mail edition archive:
                    ftp://ftp.arsc.edu/pub/publications/newsletters/

-----------------------------------------------------------------------
Arctic Region Supercomputing Center          ARSC HPC Users' Newsletter
-----------------------------------------------------------------------



Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.