ARSC T3E Users' Newsletter 195, May 12, 2000

SHMEM_BROADCAST vs MPI_Bcast

[ Brad Chamberlain (U. Washington)'s answer to last week's Quick-Tip is worthy of a full-blown article. Here it is: ]

Both SHMEM_BROADCAST and MPI_Bcast are designed for the same purpose, but they make different assumptions about the set of processors involved.

For shmem_broadcast, the processors to which you are broadcasting must have my_pe() values that are strided by a power of two. The reason for this is that the interface requires you to specify the processor set as a starting index and the log_2 of the stride between the indices.

This is a piece of cake if you are broadcasting to everyone -- simply specify 0 as the start and 0 as the log_2 stride (2^0 = 1, i.e., every processor from 0..p-1).

It also means that if you've thought of your processors as being laid out in a 2D grid, where the size of each grid dimension is a power of two, you can broadcast to all processors in your "row" or "column" pretty easily. For example, given 16 processors in a virtual 4x4 configuration:


         0  1  2  3
         4  5  6  7
         8  9 10 11
        12 13 14 15

You can broadcast to your row by specifying 0, 4, 8, or 12 as the start and 0 as the log_2 stride (2^0 = 1). Or broadcast to your column by specifying 0, 1, 2, or 3 as the start and 2 as the log_2 stride (2^2 = 4). (The same idea applies to higher-dimensional grids that satisfy this constraint.)
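
To make the start/log_2-stride interface concrete, here is a minimal C sketch (ours, not part of Brad's note) of a broadcast down one column of the 4x4 grid above. The header path, the _SHMEM_BCAST_SYNC_SIZE and _SHMEM_SYNC_VALUE constants, and the exact argument order follow the Cray SHMEM man pages as we recall them, so check shmem_broadcast(3) on your system before borrowing any of it. (On systems other than the T3E you may also need an initialization call such as start_pes.)

  /* Assumes the job is running on 16 PEs (e.g., mpprun -n 16).         */
  #include <mpp/shmem.h>

  #define NWORDS 128

  long source[NWORDS];          /* symmetric (static) data objects      */
  long target[NWORDS];
  long pSync[_SHMEM_BCAST_SYNC_SIZE];

  int
  main(void)
  {
      int i, mype, col;

      for (i = 0; i < _SHMEM_BCAST_SYNC_SIZE; i++)
          pSync[i] = _SHMEM_SYNC_VALUE;   /* init pSync on every PE     */

      mype = shmem_my_pe();     /* _my_pe() on some Cray headers        */
      col  = mype % 4;          /* my column, 0..3, in the 4x4 grid     */

      shmem_barrier_all();      /* pSync ready everywhere before use    */

      /* Active set: start PE = col, log2(stride) = 2 (stride 4),
       * 4 PEs, i.e. {col, col+4, col+8, col+12}.  PE_root = 0 says the
       * first PE of that set (the top of the column) supplies the
       * data, NWORDS 64-bit words of it.
       */
      shmem_broadcast(target, source, NWORDS, 0, col, 2, 4, pSync);

      shmem_barrier_all();
      return 0;
  }

Because each column is a disjoint active set, the four simultaneous broadcasts can share the one symmetric pSync array.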

However, imagine that you were using 9 processors and thought of them as being in a 3x3 configuration:


        0  1  2
        3  4  5
        6  7  8

There's now no simple way to broadcast only to your column, because consecutive processors in a column are strided by 3 (which isn't a power of two, so its log_2 is not an integer).

MPI_Bcast, on the other hand, is completely general. It uses the MPI notion of a communicator (MPI_Comm), which is an arbitrary set of processors. Thus, there is a slight overhead of learning how to set up a new communicator (and MPI supplies lots of interesting ways to do this), but it can include any subset of the processors. And once it's set up, you can use it for the duration of your program.
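
For the 3x3 case above, for instance, a "column" communicator can be built once with MPI_Comm_split and reused for every broadcast. A minimal C sketch (ours, with illustrative buffer names):

  #include <mpi.h>

  int
  main(int argc, char **argv)
  {
      int rank, col, col_rank;
      double buf[1000];
      MPI_Comm col_comm;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      col = rank % 3;            /* columns {0,3,6}, {1,4,7}, {2,5,8}   */

      /* Ranks with the same "color" land in the same communicator;
       * the key (here, the world rank) orders them within it.
       */
      MPI_Comm_split(MPI_COMM_WORLD, col, rank, &col_comm);
      MPI_Comm_rank(col_comm, &col_rank);  /* my rank within the column */

      /* Root 0 of each column communicator (world ranks 0, 1, and 2)
       * broadcasts to the rest of its column.
       */
      MPI_Bcast(buf, 1000, MPI_DOUBLE, 0, col_comm);

      MPI_Comm_free(&col_comm);
      MPI_Finalize();
      return 0;
  }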

Since MPI_Bcast is more general than shmem_broadcast, it's safe to say that MPI_Bcast is not implemented as a simple shmem_broadcast in the general case, and therefore will incur some extra overhead. Thus, for a given program, you should decide whether shmem's constraints are something you can live with. If they are, use it. If not, and you want to be fully flexible, use the MPI call.


  # Editor's note: See issues #124 and #146 for more on MPI
  # Communicators:
  #
  #   /arsc/support/news/t3enews/t3enews124/index.xml
  #   /arsc/support/news/t3enews/t3enews146/index.xml

Clarification: NQS Limits Article (Previous Newsletter)

The article, "Specifying Front-End NQS Limits Unnecessary on Yukon," in newsletter #194 , created a little confusion. We rewrote, and hopefully improved, the article for the "Web Edition."

Basically, every T3E NQS job does some work on a single processor (the "front-end"). This might include commands like cd, mv, f90, cc, etc. It also does work on the parallel processors; such work is initiated with an mpprun or mpirun command.

It is VERY important to specify time and number-of-PEs requests for the parallel portion of your work. These requests are made with the qsub options:


  -l mpp_t=
  -l mpp_p=

The optional requests, which can cause problems, are those applied to the single-processor portion of the job (typically, these are -lM, -lm, -lt, -lT).
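
In other words, a script that skips the front-end limits can be as simple as this sketch (ours; the PE count, time limit, and paths are illustrative, and issue #191 has complete, tested examples):

  #QSUB -l mpp_p=8
  #QSUB -l mpp_t=3600

  cd $HOME/myrun              # front-end (single-PE) work
  mpprun -n 8 ./a.out         # parallel work on 8 application PEs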

Reminder to ARSC users:

Once your qsub script completes its parallel work, it should call "qalter -l mpp_p=0" and then submit a null job. This releases the parallel processors to other users while your job finishes its single-processor work on the front-end.

The technique, and examples, are described in the article: "Yukon is busy! Help Us Provide More Cycles!" in issue #191:

/arsc/support/news/t3enews/t3enews191/index.xml

HUG2000


>            HUG2000: The 4th Annual HPF User Group meeting
>            (http://www.tokyo.rist.or.jp/jahpf/hug2000)
> 
> The Fourth Annual High Performance Fortran User Group (`HUG') meeting
> will be held on October 19-20, 2000 at Hotel Intercontinental Tokyo
> Bay, Tokyo, Japan. This meeting follows the first three meetings in the
> series held in 1997 in Santa Fe, New Mexico, USA, in 1998 in Porto,
> Portugal, and 1999 in Redondo Beach, California, USA.
> 
> It provides an opportunity for users of HPF to meet each other, share
> ideas and experiences, and obtain up-to-date information on current HPF
> implementations and future plans.
> 
> Important dates
>   July   31, 2000:  deadline for submission of abstracts
>   August 23, 2000:  notification of acceptance
>   September 22, 2000:  deadline for camera ready manuscripts

BTW: We'd love to print your reviews of the MPP-related conferences you attend.

Quick-Tip Q & A



A:{{ Shouldn't I use SHMEM_BROADCAST instead of MPI_Bcast?  Don't they
  {{ do the same thing?  And isn't SHMEM always fastest?


  To Brad's comments, above, we might add that MPI is the most common
  and portable communication library. If you replace your MPI_Bcasts
  with SHMEM_BROADCASTs, do it with pre-processor #ifdef's so the code
  will use the MPI version on non-Cray systems.
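
  For example, a guard along these lines (a sketch of ours; the macro
  name and buffer names are invented, and the SHMEM header path and
  constants are from memory of the Cray man pages) keeps one source
  file portable:

    /* cc -DUSE_SHMEM bcast.c  on the T3E;  mpicc bcast.c  elsewhere.  */
    #ifdef USE_SHMEM
    #include <mpp/shmem.h>
    long src[1000], dst[1000];             /* symmetric static arrays  */
    long pSync[_SHMEM_BCAST_SYNC_SIZE];
    #else
    #include <mpi.h>
    long src[1000];
    #endif

    int
    main(int argc, char **argv)
    {
    #ifdef USE_SHMEM
        int i;
        for (i = 0; i < _SHMEM_BCAST_SYNC_SIZE; i++)
            pSync[i] = _SHMEM_SYNC_VALUE;
        shmem_barrier_all();
        /* PE 0 sends 1000 64-bit words to all PEs: start 0, log2
         * stride 0.  shmem_n_pes() may be _num_pes() on older headers.
         */
        shmem_broadcast(dst, src, 1000, 0, 0, 0, shmem_n_pes(), pSync);
        shmem_barrier_all();
    #else
        MPI_Init(&argc, &argv);
        /* Rank 0 sends the same 1000 words to all of MPI_COMM_WORLD.  */
        MPI_Bcast(src, 1000, MPI_LONG, 0, MPI_COMM_WORLD);
        MPI_Finalize();
    #endif
        return 0;
    }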

  Also, in a stride one test, SHMEM_BROADCAST does appear to be faster
  than MPI_Bcast, especially for very small message sizes. For messages
  over 1000 words, the differences become negligible.

  Sample output from one run of the test program, and the test code,
  appeared in newsletter #166, in the article "VAMPIR Images of MPI,
  SHMEM, and Co-Array Fortran Broadcast":

    /arsc/support/news/t3enews/t3enews166/index.xml


Q: My computer center uses Kerberos/SecurID for authentication.
   Sometimes, I need to use my SecurID card to generate two or three
   valid cardcodes, all within a few minutes. Do I really have to
   re-enter my SecurID PIN over and over? Can't I just let the 6-digit
   code roll over? (I tried it once, and wound up in "next-token" code
   mode.)

[ Answers, questions, and tips graciously accepted. ]


Current Editors:
Ed Kornkven ARSC HPC Specialist ph: 907-450-8669
Kate Hedstrom ARSC Oceanographic Specialist ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
E-mail Subscriptions:
Archives:
    Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.