Special Report

Riding the WAVE: I-WAY in Real Time at SC'95

by Holly Korab

In December 1995, the high-speed experimental distributed computing project called I-WAY (Information Wide Area Year) made its long-anticipated debut at SC'95. It was not flawless, but it propelled collaboration technology years ahead by presenting a workable prototype for distributed computing.

Ian Foster, an ANL computer scientist, describes the software effort at SC'95. Foster led 30 software designers in developing I-WAY's distributed computing environment, which includes the I-WAY Point-of-Presence (I-POP) machines and I-Soft software.

Though not with the power of a tsunami, the next wave of high-performance networking that debuted at SC'95 in December proved, to anyone who still held doubts, that distributed metacomputing is on its way. During three-and-a-half frantic days of the conference, researchers and networkers demonstrated all the essential functions for distributed computing that they had built from scratch during the previous nine months. For instance, they used a single interface to schedule and initiate runs on remote computing resources, and they used the Message Passing Interface (MPI) -- a standard message-passing library that lets parallel applications be ported across different computer architectures -- to run applications in heterogeneous computing environments. They accomplished this by winning the support and cooperation of telecommunications carriers and 30 different academic and research institutions [see access, Fall 1995].

In his keynote address, William A. Wulf, professor of engineering and applied science at the University of Virginia and former NSF assistant director for Computer and Information Science and Engineering (CISE), crowned I-WAY "a national treasure." Another observer likened it to inventing the internal combustion engine, the automobile, and the interstate highway system all at once.

"We wanted to show that we could push the concept of high-speed ATM networks across the country and create the framework that would make this happen -- all at the same time," I-WAY cofounder Tom DeFanti said. "We knew it was possible, but others said it was not. We built the pieces, and this was the test. And it worked." DeFanti is associate director of the Virtual Environments Group at NCSA and director of the Electronic Visualization Laboratory at UIC.

Lessons learned
Exactly what worked -- and particularly what did not -- will be dissected and researched over the coming year. More than just a demonstration, I-WAY was a testbed with two goals: to present a prototype for high-speed distributed computing and to identify its weaknesses.

Distributed computing links remote visualization and computing resources via high-speed networks into a single virtual computing system. Because this approach is a faster, more realistic means of assembling computing power than trying to amass it in one place, many people believe distributed computing is the future of high-performance computing. The problem is that high-speed networks are not interoperable. Like the railroads of the 19th century, which were built to different gauges, high-speed networks use varying protocols and routing and switching technologies. I-WAY was the "first draft" of standard gauges and switches for distributed computing.

"I've always viewed I-WAY as a process, not a thing," said Gary Minden, program manager for the Advanced Research Projects Agency's (ARPA) Information Technology Office and a principal supporter of I-WAY. "Its purpose is to find out what the right questions are."

Although the right questions will be winnowed out during a follow-up meeting being arranged by Minden and representatives from NSF -- I-WAY's other primary backer -- a few of the likely topics already can be gleaned from the experiences at SC'95.

A topic likely to appear at the top of the list, because it touches so many aspects of the project, is promoting better collaboration among all layers and entities of a distributed system. Of the 60 large-scale 2D and 3D applications scheduled to run over the I-WAY (jointly called the GII Testbed and HPC Challenge events), only about half did so. Some researchers changed their plans and ran their applications locally using precomputed data. Of those who attempted to use the I-WAY, a handful were forced to resort to precomputed data or to run their applications over the Internet. Many of their problems were traced to glitches in their applications. Others were due to inconsistencies elsewhere in the network, such as an incorrectly configured router at one institution that caused one researcher's application to creep across the I-WAY at only 5% of its anticipated bandwidth. More often than not, problems with the I-WAY stemmed from fledgling interconnections between the networks, pieces of which had been assembled in less than nine months and cobbled together in a week.

The amount of information required to engineer the final stage of network assembly was enormous. Many of the configurations were complicated by the need to access remote resources from showfloor booths as well as from the GII Testbed, and many of these requirements surfaced only once the WAVE team assembled in San Diego. (WAVE -- short for Wide Area Visualization Experimental -- was the name given to the portion of the I-WAY inside the convention hall.) With time at a premium and complications mounting, the vBNS (very high-speed Backbone Network Service) was the only one of 10 networks operating at the start of the GII Testbed. The other networks came up over the next three days, and eventually all carried GII applications. Still, the vBNS was used most extensively.

That the vBNS worked as well as it did owed much to collaboration, said Linda Winkler, a computer scientist at Argonne National Laboratory (ANL) and one of the primary architects of the I-WAY network. She is convinced the vBNS would not have been running by the first day of the GII Testbed if specialists from the NSF supercomputing centers and MCI had not pitched in. "MCI made sure we understood the connectivity, and the centers made sure their resources were accessible. When two of the routers weren't responding, they all poked their heads into the problem to solve it," said Winkler.

Collaboration among 30 software designers led by Ian Foster at ANL produced one of the most creative components of I-WAY. The I-WAY Point-of-Presence (I-POP) machines and I-Soft software that made up the I-WAY distributed computing environment shielded researchers from many of the intricacies of the networks. Researchers in the GII Testbed could log into any one of 17 I-POPs, schedule time on "I-WAY virtual machines," and initiate runs on the required resources without worrying about issues such as authorization, scheduling, and network interfaces at each remote site. That, at least, was what happened when everything worked as planned. As with the networks, circumstances often conspired to complicate this scenario. With application requirements and network connections changing hourly, Foster's team often found itself manually reconfiguring virtual machines and network interfaces as well as chasing bugs in user programs and in the I-Soft software.

These on-the-fly changes had their benefits, such as an improvement to the I-Soft MPI developed for the I-WAY's Asynchronous Transfer Mode (ATM) networks. NCSA astrophysicist Ed Seidel and NCSA computational biologist Marcus Wagner both used the specialized MPI. In doing so, Seidel discovered two MPI bugs, which Foster's team corrected in time for Wagner's demonstration.

Interplay like that among Seidel, Foster, and Wagner, which produced a better MPI, shows why collaboration is essential to the development of distributed computing. Networkers need feedback. "Researchers will have to delve more deeply into the workings of these networks if their applications are to perform successfully," said Rick Stevens, director of the Mathematics and Computer Science Division and the Computing and Communications Center at ANL, who cofounded I-WAY with DeFanti and NCSA Director Larry Smarr. "Researchers have to understand that the network is introducing a whole new set of variables into their application. It's like the adjustments they had to make when switching from vector to massively parallel programming -- there are more things that can go wrong."

Another topic certain to dominate follow-up discussions about I-WAY is the need for management tools that simplify interactions with the network. Foster envisions automated programming tools that will perform many of the I-POP tasks now done manually, such as scheduling resources and configuring virtual machines. "Eventually you want the applications people to specify the computing resources they need and have the automated scheduler reserve time and resources rather than requiring the researchers to select the machines and capabilities they want, which was an interim measure we adopted for I-WAY," said Foster.

Remy Evard would like to see tools developed for ATM, the networking technology being tested on the I-WAY. He spent hours manually debugging the ATM-based network. Evard, director of technology at Northeastern University and team leader for WAVE, likened debugging ATM to editing a book written on index cards. "Imagine if someone can only give you a yes or no answer as to whether they like your book. If they answer no, then you have to go back through the stack of cards one-by-one to discover where the error lies. If a conjunction were missing and you put it in, suddenly the book works. That's debugging with ATM. If in the middle of all this a piece of hardware goes down, that is like someone flicking 10 cards out of the middle of the stack," said Evard.

I-WAY Heroes
In addition to teaching everyone a lot about constructing a distributed system, I-WAY generated its own brand of legends. Few people operating behind the scenes will forget the researchers and programmers who debugged all day and all night, earning reputations as I-WAY heroes for their endurance. Others earned reputations for grace under pressure. When the network finally came up at 10:05 a.m. on the opening day of the conference, researchers scheduled for the first three slots in the GII Testbed had only one hour and 50 minutes to test and debug their applications. They did it.

Special praise is due to those responsible for networking during the conference. In addition to I-WAY, they integrated three networks: SCinet (the onsite production network), a wireless local area network, and WAVE. Strung through the rafters were 150 miles of fiber-optic cable with 840 tails (or connections). Connectivity within the convention center was likened to that of a small city.

Networking crews began arriving the week before the conference. These were the tactical assault units from the five supercomputing centers as well as from ANL and Northeastern University. As the week progressed, the population inside the Plexiglas-enclosed network operations center swelled to between 50 and 75 people and included additional help, such as two network engineers from the National Institutes of Health and six students from the Naval Postgraduate School in Monterey, who sacrificed study time for their final exams to take part in marathon networking.

Twenty-hour days were the norm. Flexibility was the key as they bandaged connections and tried to satisfy ever-changing requirements and expectations. Evard summed up the experience as "10 days of nonstop problem solving and knowing exactly what your priorities were."

Behind the scenes at SC'95, technical staff set up the CAVE for the GII Testbed exhibits.


Graduate Research Assistant Daniel Weber (left) and NCSA Research Scientist Ed Seidel (right) debug their gravitational wave application.


Birds of a Feather
Setting priorities will be essential to continuing the work begun with I-WAY. During a birds-of-a-feather session at SC'95, participants fresh from the GII Testbed outlined what they thought the next steps in distributed computing should be. On their list were further automation of the I-POPs and the development of network management tools. Also mentioned was network-based collaborative software for designing and managing distributed computing networks.

A priority for NSF and ARPA was maintaining the enthusiasm for high-bandwidth distributed computing that I-WAY evoked. That enthusiasm will be essential if science is to turn the spurts of flawless networking witnessed at SC'95 into a dependable, sustained capability. DeFanti believes that goal is within science's grasp. He reminded participants of his earlier experiences with virtual environments. When three CAVEs running 40 applications from researchers across the country were demonstrated at SIGGRAPH '94, those systems, like I-WAY, were beset with problems. This year they performed flawlessly. "The virtual environments came a long way in a year," said DeFanti. "In five years distributed computing will be commonplace, and you will look back on this event as a watershed in networking."


access / Spring 1996 / Email comments to NCSA Publications Group: pubs@ncsa.uiuc.edu