ARSC T3D Users' Newsletter 54, September 30, 1995
New Charging Algorithm at ARSC
As of October 1st, 1995 ARSC will be changing its algorithm for calculating Service Units (SUs). The current algorithm is:
    1.000 * Y-MP (CPU-hours)
  + 0.050 * T3D (CPU-hours)
  + 0.005 * Memory (MWord-hours)
  + 0.003 * your Denali files (GByte-hours)
  ------------------------
  = total SU's

The new algorithm as of October 1st is:
    1.000 * Y-MP (CPU-hours)
  + 0.010 * T3D (CPU-hours)
  + 0.005 * Memory (MWord-hours)
  + 0.003 * your Denali files (GByte-hours)
  ------------------------
  = total SU's

The only change is the charge for using the ARSC T3D:
  from: 0.050 * T3D (CPU-hours)
  to:   0.010 * T3D (CPU-hours)

This reflects a depreciation of T3D time in line with our attempts to boost T3D utilization. It brings our valuations in line with other T3D sites and advertises our wish to become a production site for T3D applications. If you have questions about this change, contact Mike Ess.
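As a worked illustration of the change, the two formulas can be written as a small function. This is only a sketch in Python, not an ARSC accounting tool: the function name, the dictionary keys and the sample usage figures are made up for the example, and only the rates come from the formulas above.

```python
# Rates per resource-hour, copied from the current and the new formula.
OLD_RATES = {"ymp_cpu": 1.000, "t3d_cpu": 0.050, "memory_mword": 0.005, "denali_gbyte": 0.003}
NEW_RATES = {"ymp_cpu": 1.000, "t3d_cpu": 0.010, "memory_mword": 0.005, "denali_gbyte": 0.003}


def service_units(usage, rates):
    """Return total SUs: the sum of rate * hours over the four resources."""
    return sum(rates[name] * usage.get(name, 0.0) for name in rates)


# Hypothetical usage figures (illustrative numbers only).
usage = {
    "ymp_cpu": 2.0,         # Y-MP CPU-hours
    "t3d_cpu": 1280.0,      # T3D CPU-hours
    "memory_mword": 100.0,  # MWord-hours
    "denali_gbyte": 500.0,  # GByte-hours of files on Denali
}

old_total = service_units(usage, OLD_RATES)
new_total = service_units(usage, NEW_RATES)
print(f"old: {old_total:.2f} SUs, new: {new_total:.2f} SUs")
```

For any usage mix the T3D term shrinks by a factor of five and the other three terms are unchanged, so the saving depends on how much of a project's work runs on the T3D.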
T3E Information at the Alaska CUG
CUG is the best place to get information about CRI and their users. In the coming weeks I'll be sending out some of what I learned at this CUG, but for this issue I'd like to share what I learned about the T3E. These are just my notes from various speakers and I couldn't write as fast as they could talk.
From Jeff Brooks' talk on single PE optimization:
  T3D                     T3E
  ---                     ---
  DEC's EV4 chip          DEC's EV5 chip
  6 clock pipeline        4 clock pipeline
  150 MHz                 300 MHz
  8KB data cache          first 8KB data cache
    direct mapped           direct mapped
    write back              write back
                          secondary 96KB cache
                            3 way set associative
                            write thru
                          cache coherency by CRI
  1 read ahead stream     6 read ahead streams

From the streams benchmark (simulated results for T3E), in MB/s:

  operation    IBM 6000/590    T3D    T3E
  copy              600        205    543
  scale             533        151    514
  summation         655        151    583
  triad             655        106    606

(This benchmark measures the bandwidth of the memory system as timed by a DO loop executing the above operations.)

From Steve Johnson's talk on new hardware from CRI:
  Design started in 4Q 1992
  Targeted to be 3X better than T3D in the components:
    CPU speed
    Memory
    Interconnect speed
  Entry level price $0.5M
  1 PE per node (currently 2 PEs per node on T3D)
  1 I/O controller per 4 PEs (currently 1 I/O node per 64 PEs on T3D)
  Can use CRI's SCX connection to a CRI PVP machine
  Available March 1996

From Irene Qualters' talk in the General Session:
  Target is 3X the performance of the T3D
  From simulations:
    5X faster on some applications
    4.2X faster on the harmonic mean for the Livermore Loops
  Will have "serverized Unix" I/O nodes and system server nodes

From Gary Geissler's talk "T3E Overview and Status":
The current status is that they are waiting for custom chips to come back from their IC vendor. The official product announcement will be at the end of November (so all that is in this report is subject to change.) CRI is on target with their long term goal for the T3X family (based on 2048 PEs):
  1993  T3D   Peak performance       300 Gflop/s
  1996  T3E   Peak performance       1 Tflop/s
  1998  T3SN  Sustained performance  1 Tflop/s
  (SN = scalable node)

All of the T3E is implemented in CMOS, which is a proven, mature technology. The CMOS implementation permitted an air-cooled version. The T3E will be "binary compatible" with T3D executables, but additional performance will come from recompiling the source. It is not intended to be a collection of workstations but can be used in that mode. Available with 16 to 2048 PEs in multiples of 4 PEs. Available in late March 1996.
Uses the DEC Alpha 21164 (EV5) chip. The design starts with the 300 MHz version but can use the 380 MHz and 450 MHz versions of the chip. This chip can execute four instructions simultaneously: 1 floating point add, 1 floating point multiply and 2 integer instructions. This is in contrast to the T3D, which can execute only 2 instructions per clock period.
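The peak floating point rate implied by these notes can be worked out directly. The Python sketch below assumes that the add and multiply units both deliver a result every clock period, which is an idealized peak rather than anything measured; the function names are invented for the example.

```python
# Floating point results per clock, assuming the add unit and the multiply
# unit each complete one operation every clock period (idealized peak).
FLOPS_PER_CLOCK = 2

# Clock rates (MHz) mentioned for the chip versions the design can use.
CLOCKS_MHZ = [300, 380, 450]


def peak_mflops_per_pe(clock_mhz, flops_per_clock=FLOPS_PER_CLOCK):
    """Peak Mflop/s of one PE = clock rate in MHz * flops per clock."""
    return clock_mhz * flops_per_clock


def peak_gflops_system(clock_mhz, pes):
    """Peak Gflop/s of a system with `pes` processing elements."""
    return peak_mflops_per_pe(clock_mhz) * pes / 1000.0


for mhz in CLOCKS_MHZ:
    print(f"{mhz} MHz: {peak_mflops_per_pe(mhz)} Mflop/s per PE, "
          f"{peak_gflops_system(mhz, 2048):.1f} Gflop/s on 2048 PEs")
```

Under that assumption the 300 MHz part gives 600 Mflop/s per PE, or roughly 1.2 Tflop/s on 2048 PEs, which is the same order as the 1 Tflop/s peak goal quoted for the T3E.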
With a single PE per node, the memory is now 8-bank interleaved with a 1.2 GB/s peak bandwidth. This memory will have 6 streams as opposed to the 1 stream on the T3D. These streams are high speed registers that perform like the single stream on the T3D, which is accessible with the rdahead flag on the loader. In simulation, their prefetch capability gives better performance than an off-chip cache.
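The streams figures quoted earlier from the talk on single PE optimization can be put next to this peak bandwidth. The Python sketch below computes the T3E/T3D ratio for each operation and the fraction of the 1.2 GB/s peak that the simulated T3E figures represent. It assumes 1 GB/s means 1000 MB/s; the names are invented for the example.

```python
# Bandwidths in MB/s from the streams table (the T3E figures are simulated).
STREAMS_MB_S = {
    "copy":      {"T3D": 205, "T3E": 543},
    "scale":     {"T3D": 151, "T3E": 514},
    "summation": {"T3D": 151, "T3E": 583},
    "triad":     {"T3D": 106, "T3E": 606},
}

# Peak memory bandwidth quoted for a T3E PE: 1.2 GB/s, taking 1 GB/s as 1000 MB/s.
T3E_PEAK_MB_S = 1200.0


def speedup(op):
    """Ratio of T3E to T3D bandwidth for one streams operation."""
    row = STREAMS_MB_S[op]
    return row["T3E"] / row["T3D"]


def fraction_of_peak(op):
    """Fraction of the quoted T3E peak bandwidth reached by one operation."""
    return STREAMS_MB_S[op]["T3E"] / T3E_PEAK_MB_S


for op in STREAMS_MB_S:
    print(f"{op:<10} {speedup(op):4.1f}X  {100 * fraction_of_peak(op):3.0f}% of peak")
```

The ratios run from about 2.6X for copy to about 5.7X for triad, and the simulated T3E numbers sit at roughly half of the quoted peak.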
The torus interconnect is more efficient and consistent than that on the T3D. It has an adaptive capability to deviate from the usual path between specific PEs to avoid contention at an intermediate PE.
With 4 PEs per board (or module) there will be 1 SCX channel to control I/O. (SCX is CRI's I/O technology beyond HIPPI; maybe more details in future newsletters.)
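The sizing rules in these notes (16 to 2048 PEs, in multiples of 4, with 4 PEs and 1 SCX channel per board) can be summarized in a few lines. This Python sketch is only a reading of the notes above, which are themselves subject to change before the product announcement; the function name and error messages are invented for the example.

```python
MIN_PES = 16
MAX_PES = 2048
PES_PER_BOARD = 4  # per the notes: 4 PEs per board (module), 1 SCX channel each


def t3e_layout(pes):
    """Return (boards, scx_channels) for a PE count, or raise ValueError."""
    if not MIN_PES <= pes <= MAX_PES:
        raise ValueError(f"PE count must be between {MIN_PES} and {MAX_PES}")
    if pes % PES_PER_BOARD != 0:
        raise ValueError(f"PE count must be a multiple of {PES_PER_BOARD}")
    boards = pes // PES_PER_BOARD
    # Assumption: exactly one SCX channel per board, as the notes suggest.
    return boards, boards


for pes in (16, 128, 2048):
    boards, channels = t3e_layout(pes)
    print(f"{pes} PEs: {boards} boards, {channels} SCX channels")
```

For comparison, the notes from Steve Johnson's talk give 1 I/O node per 64 PEs on the T3D, so a 128 PE system would go from 2 I/O nodes to 32 I/O channels under this reading.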
Hardware on Display
There was a T3E chassis on the floor in the commons area at the Alaska CUG. On the last day of CUG there was a table with the CPU board, memory daughter cards, power supply and cooling elements on display next to the chassis.
Announcement from the Pharoh: a T3D Optimization Conference
This message is addressed to those working on application development on the CRI T3D computer system. Please forward it freely to anyone who might be interested at your site.
Pittsburgh Supercomputing Center (PSC),
Ohio Supercomputing Center (OSC), and
Arctic Region Supercomputing Center (ARSC)
(a MetaCenter Regional Alliance)

----together with----

CRAY Research, Inc. (CRI)

are pleased to announce a

***************************************************************
*                                                             *
*  Meeting on the Optimization of Codes for CRAY MPP Systems  *
*                     January 24-26, 1996                     *
*                  Pittsburgh, Pennsylvania                   *
*                                                             *
***************************************************************

The purpose of this meeting is to bring together developers of T3D code and promote discussions of their experiences on the T3D, enabling them to further optimize the performance of their codes on the T3D and T3E.

Selected presenters will deliver brief talks describing T3D projects, implementation design decisions, optimization strategies, resulting code performance, and any circumstances inhibiting further optimization of the code. In addition to the talks, there will be opportunities for formal and informal discussions among the participants.

For those participants interested in collaborating on code development or testing their newly acquired ideas, our state-of-the-art training facility will be available for the duration of the meetings.

Registrations are currently being accepted from those interested in presenting and/or attending the meeting.

----- Application Deadline: October 16, 1995 -----

More details can be found by opening http://www.psc.edu/ and following the "Hotlist" link.

===============================================================

REGISTRATION INFORMATION:

The registration fee for this 3-day meeting is $75, which includes breakfast and lunch for the 3 days and the cost of handout materials. Housing and travel are the responsibility of participants, but we will assist you in making reservations. Group rates on local hotel accommodations are available on a first-come, first-served basis.
If you are interested in being a presenter and/or attending the meeting, please return your completed registration form by October 16, 1995 to:

  Workshop Application Committee
  ATTN: Anne Marie Zellner
  Pittsburgh Supercomputing Center
  4400 Fifth Avenue
  Pittsburgh, PA, 15213

You may also apply by sending the requested information via electronic mail to email@example.com or via fax (412/268-5832).

Specific questions should be directed to Cathy Milligan, Pittsburgh Supercomputing Center's Education and Training Coordinator, at firstname.lastname@example.org (412/268-8263).

===============================================================

REGISTRATION FORM:

Name:
Department:
Univ/Ind/Gov Affiliation:
Address:
Telephone:
Electronic Mail Address:
Social Security Number:
Citizenship:

Academic Status (please mark one):
  F  - Faculty
  PD - Postdoctorate
  GS - Graduate Student
  UG - Undergraduate
  UR - University Research Staff
  UN - University Non-Research Staff
  GV - Government
  I  - Industrial
  O  - Other

Would you like assistance with arranging hotel accommodations (yes/no)?

Briefly describe your computing background and research interests.

Are you interested in being a presenter and attending the meeting, or just attending the meetings?
  ___ presenter
  ___ just attending

If you are interested in being a presenter at this meeting, please submit a brief abstract of your talk describing your T3D project and your involvement in it. Be sure to include details on the project's goals, implementation design decisions, optimization strategies, performance, and any circumstances inhibiting further optimization of the code.

Please note that if there are more offers of presentations than can be accommodated, the organizers will select those that they judge have interest to the widest audience.
List of Differences Between T3D and Y-MP
The current list of differences between the T3D and the Y-MP is:
- Data type sizes are not the same (Newsletter #5)
- Uninitialized variables are different (Newsletter #6)
- The effect of the -a static compiler switch (Newsletter #7)
- There is no GETENV on the T3D (Newsletter #8)
- Missing routine SMACH on T3D (Newsletter #9)
- Different Arithmetics (Newsletter #9)
- Different clock granularities for gettimeofday (Newsletter #11)
- Restrictions on record length for direct I/O files (Newsletter #19)
- Implied DO loop is not "vectorized" on the T3D (Newsletter #20)
- Missing Linpack and Eispack routines in libsci (Newsletter #25)
- F90 manual for Y-MP, no manual for T3D (Newsletter #31)
- RANF() and its manpage differ between machines (Newsletter #37)
- CRAY2IEG is available only on the Y-MP (Newsletter #40)
- Missing sort routines on the T3D (Newsletter #41)
- Missing compiler allocation flags (Newsletter #52)
- Missing compiler listing flags (Newsletter #53)
Ed Kornkven, ARSC HPC Specialist, ph: 907-450-8669
Kate Hedstrom, ARSC Oceanographic Specialist, ph: 907-450-8678
Arctic Region Supercomputing Center
University of Alaska Fairbanks
PO Box 756020
Fairbanks AK 99775-6020
Subscribe to (or unsubscribe from) the e-mail edition of the
ARSC HPC Users' Newsletter.
Back issues of the ASCII e-mail edition of the ARSC T3D/T3E/HPC Users' Newsletter are available by request. Please contact the editors.