Computational astrophysics code runs on 1,500 processors at SDSC and NCSA
The vision of scientific computing in the future relies on computational grids: powerful processors, research tools, and huge data archives linked by fast networks and advanced software. These grids will be as easy to use as the Web and as convenient as turning on your kitchen faucet to get water. In a tour de force of massively parallel computing, the San Diego Supercomputer Center (SDSC) at UC San Diego, the National Center for Supercomputing Applications (NCSA) at the University of Illinois at Urbana-Champaign, Argonne National Laboratory, and the Max Planck Institute for Gravitational Physics (Albert Einstein Institute) in Potsdam, Germany, collaborated in a grid computing demonstration that brings that vision one step closer to reality.
On three occasions in April, researchers in Germany and the United States ran a massive relativistic astrophysics simulation at the two supercomputer centers. "We used the Cactus Computational Toolkit to compute the evolution of gravitational waves according to Einstein's theory of General Relativity," said Thomas Dramlitsch, the researcher at the Max Planck Institute who coordinated the run. "Since the experimental proof of the existence of gravitational waves is a major challenge in theoretical and experimental physics and a truly exact computation of these waves is still not possible due to insufficient computational power, these simulation runs are very important for us."
The runs were the largest simulations involving Einstein's General Relativity equations to date, according to Ed Seidel, an astrophysicist at the Max Planck Institute and NCSA and head of the research team based in Germany. Each four-hour run of the Cactus code package was set up to use three supercomputers at NCSA and one at SDSC, linked across the continent by an OC-12 network carrying data at 622 megabits per second. NCSA used 480 CPUs of three SGI Origin2000 computers running the Irix 6.5 operating system. One of the systems is a 256-processor array used exclusively for large compute runs, and the other two are Origin arrays of 128 processors each. SDSC used 128 nodes of Blue Horizon, or 1,020 processors of the big IBM machine's total of 1,152. The entire configuration involved 1,500 processors, each as powerful as a high-end desktop computer but far more useful due to the software that coordinated them.
Software toolkits make run possible
In addition to the Cactus Toolkit, two other pieces of advanced software made the distributed simulation run feasible. One was Globus, a toolkit for programming grid computing systems and the basic software infrastructure for systems that integrate geographically distributed computational and information resources. Globus development is centered at Argonne National Laboratory and the University of Southern California's Information Sciences Institute, and major partners include the National Computational Science Alliance, the National Partnership for Advanced Computational Infrastructure, the NASA Information Power Grid project, the University of Chicago, and the University of Wisconsin. Over the past several years, Globus has been deployed at more than 100 sites around the world.
The second software tool that enabled the runs was MPICH-G2, a grid-enabled implementation of Message Passing Interface (MPI) version 1.1. Message passing is a standard technique for coordinating applications run on massively parallel supercomputers or workstation clusters. MPICH-G2 allows MPI applications to run on multiple computer systems at the same time, including machines of different architectures that utilize different scheduling systems. It uses services provided by the Globus Toolkit to coordinate and manage work on multiple computer systems.
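For readers unfamiliar with message passing, the idea can be sketched in a few lines. The example below is only an illustration and is not the MPICH-G2 or Cactus code: it uses Python threads and queues as stand-ins for MPI processes and messages, and the function names (`run_rank`, `distributed_smooth`, `serial_smooth`) are hypothetical. Each worker owns one slab of an array, sends its edge values to its neighbors, receives theirs, and then computes a three-point average on its own slab.

```python
import queue
import threading


def smooth(padded):
    """Three-point average of every interior point of a padded list."""
    return [(padded[i - 1] + padded[i] + padded[i + 1]) / 3
            for i in range(1, len(padded) - 1)]


def serial_smooth(data):
    """Reference version: one worker smooths the whole array.

    The outer edge values are repeated as padding.
    """
    return smooth([data[0]] + data + [data[-1]])


def run_rank(rank, size, local, inboxes, results):
    """One worker: send edge values to neighbors, receive theirs, compute."""
    # "Send" step: put our edge values into the neighbors' inboxes.
    if rank > 0:
        inboxes[rank - 1].put(("from_right", local[0]))
    if rank < size - 1:
        inboxes[rank + 1].put(("from_left", local[-1]))

    # "Receive" step: the outermost workers reuse their own edge value.
    left, right = local[0], local[-1]
    expected = (rank > 0) + (rank < size - 1)
    for _ in range(expected):
        tag, value = inboxes[rank].get()
        if tag == "from_left":
            left = value
        else:
            right = value

    results[rank] = smooth([left] + local + [right])


def distributed_smooth(data, size):
    """Split data into equal slabs, one per worker, and smooth in parallel."""
    if size < 1 or len(data) % size != 0:
        raise ValueError("data length must be a multiple of the worker count")
    chunk = len(data) // size
    inboxes = [queue.Queue() for _ in range(size)]
    results = [None] * size
    workers = [
        threading.Thread(
            target=run_rank,
            args=(r, size, data[r * chunk:(r + 1) * chunk], inboxes, results),
        )
        for r in range(size)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return [value for part in results for value in part]
```

In this sketch the workers exchange only their edge values, so the amount of data crossing the "network" stays small compared with the amount of computation each worker does locally. That property is generally what lets a message-passing code tolerate a slower long-distance link.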
"We ran each of these very large simulations as a single Globus job, and they performed very well. Best of all, even though the code had been scaled up to run on 1,500 processors and utilized a long-distance high-performance network connection, it executed at better than 70 percent efficiency," said John Towns, director of NCSA's Scientific Computing division.
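The article does not say how the 70 percent figure was measured. Assuming the common definition, in which parallel efficiency is the achieved speedup divided by the number of processors, the short sketch below shows what such a figure implies for the configuration described above. The function names are hypothetical.

```python
def parallel_efficiency(serial_time, parallel_time, processors):
    """Efficiency = speedup / processors, with speedup = serial / parallel time.

    This is the common textbook definition, assumed here for illustration.
    """
    speedup = serial_time / parallel_time
    return speedup / processors


def effective_processors(efficiency, processors):
    """Number of ideally used processors that would do the same work."""
    return efficiency * processors


# Processor counts reported in the article.
ncsa_cpus = 480
sdsc_cpus = 1020
total_cpus = ncsa_cpus + sdsc_cpus  # 1,500 processors in all

# At 70 percent efficiency, the run does the work of roughly
# 1,050 perfectly utilized processors.
equivalent = effective_processors(0.70, total_cpus)
```

Under this assumed definition, "better than 70 percent" means the distributed run delivered at least the throughput of about 1,050 processors working with no communication overhead at all.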
"There can be many reasons to distribute a large processing job among several computers on a grid instead of running it at a single site," said Phil Andrews, advanced systems manager at the San Diego Supercomputer Center. "The most common justification would be availability: you might get scheduled for 1,024 processors of a single big machine in three or four weeks, while smaller blocks of CPUs at three or four sites might be available immediately. Or it could turn out to be less expensive to run across two sites rather than at one big machine. Or a truly complex simulation might require 6,000 or 8,000 processors to achieve the necessary accuracy or to arrive at a result within the lifetime of the researcher."
Simulating events that have never been seen
The simulation calculated by the Cactus researchers involved the propagation of gravitational waves. According to General Relativity, violent events such as colliding black holes emit large amounts of gravitational radiation, which, although predicted for a century, has not yet been seen. With the advent of new detection technology, scientists hope they will be able to detect gravitational waves resulting from the collision and merger of two black holes within the next several years. But such collisions are rare events (according to current estimates, perhaps only one per year will be detectable with current technology), and it is important for scientists to be able to recognize their "signatures" when they do occur. The Max Planck group and other researchers around the world are working to develop the capability to simulate these events to help experimental physicists know what signatures to look for.
Cactus is a computational science toolkit for scientists and engineers that can tackle complex three-dimensional simulations, from the effects of General Relativity to chemical reactor flows. The code runs on many architectures, and applications developed on standard workstations or even on laptops can be seamlessly run on clusters or supercomputers. The modular structure of Cactus encourages both parallel computation across different machine architectures and collaborative code development among different groups. Cactus provides easy access to many cutting-edge software technologies, including the Globus Metacomputing Toolkit, HDF5, parallel file I/O, the PETSc scientific library, adaptive mesh refinement, Web interfaces, and advanced visualization tools.
"The Cactus code should be viewed as a framework for all kinds of numerical simulations," Dramlitsch said. "It is useful not only in the theoretical physics of gravitational waves or in astrophysical simulations of cosmology, neutron stars, black holes, and so on, but also in hydrodynamics, quantum mechanics, and other fields. All the capabilities built into Cactus that allow it to do our General Relativity runs can be used by other codes almost immediately."
The relativity simulation run across SDSC and NCSA was done without modifying the physics code; other codes inserted into the Cactus framework could be run in this manner as well. The code originated in the academic research community and is open source software.
"Together, Globus, MPICH-G2, and Cactus form a very powerful tool for distributed computing," Dramlitsch said. "At present many things still have to be adjusted by hand, but in the near future we will be able to run Cactus and similar codes on a regular basis. With the usage of resource brokers and portals, simulations like this will not be necessarily bound to a single machine. Users will just specify the requirements they need (number of CPUs, amount of memory, execution time, et cetera), and then resource brokers will find the best match among all the machines in the grid."
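The matching step Dramlitsch describes can be pictured with a small sketch. It does not use the Globus API and is not the brokers the researchers had in mind. The `Machine` and `Request` types, their fields, and the greedy "fewest machines first" rule are all assumptions made for illustration, and the machine values in the usage note are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    free_cpus: int
    memory_gb_per_cpu: float
    max_hours: float


@dataclass
class Request:
    cpus: int
    memory_gb_per_cpu: float
    hours: float


def find_allocation(request, machines):
    """Return a list of (machine name, CPUs taken), or None if no match exists.

    Assumed policy: drop machines that cannot meet the memory or run-time
    requirement, then fill the request from the largest free blocks first so
    that the job spans as few machines as possible.
    """
    eligible = [
        m for m in machines
        if m.memory_gb_per_cpu >= request.memory_gb_per_cpu
        and m.max_hours >= request.hours
    ]
    eligible.sort(key=lambda m: m.free_cpus, reverse=True)

    allocation = []
    remaining = request.cpus
    for machine in eligible:
        if remaining <= 0:
            break
        take = min(machine.free_cpus, remaining)
        if take > 0:
            allocation.append((machine.name, take))
            remaining -= take
    return allocation if remaining == 0 else None
```

A user would describe the job once, for example `Request(cpus=1500, memory_gb_per_cpu=0.5, hours=4)`, and pass it to `find_allocation` together with a list of `Machine` entries describing what is currently free at each site. A real broker would also have to weigh network bandwidth, cost, and queue wait times, which this sketch ignores.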
"Although we didn't model collisions of black holes on this particular run, we proved what we could do with such distributed simulations, if we had regular access to such a machine," said Seidel. "We could run scenarios at least five times larger than we've ever done before! All of our proven, tested routines would actually run quite well in such an environment."
Access Online | Posted 5-22-2001