Cyberenvironments at NCSA
Imaginations unbound
A spark and your mind's on fire. The possibilities are endless. The resources are plentiful. From imagination's chaotic abundance, from this whir of ideas and tools, you'll mine fresh insights, illuminate intractable problems, and uncover new knowledge.
How? Cyberenvironments, built with the National Center for Supercomputing Applications at your side.
With our partners, NCSA creates secure, easy-to-use interfaces to instruments, data, computing systems, networks, applications, analysis and visualization tools, and services. Whether they're atmospheric scientists or astronomers or environmental engineers, people use these cyberenvironments to get a systemic view of their entire discipline, to manage complex projects, and to automate and combine processes.
Here, we show you some of the key parts of the cyberenvironment landscape. What researchers need. What some of the component parts are. And what researchers can make of the tools we offer.
Cyberenvironments at NCSA. Unite resources. Unleash your mind.
Keen insight
Cyberenvironment will assist with complex water issues
The goal of the National Science Foundation's Collaborative Large-Scale Engineering Analysis Network for Environmental Research (CLEANER) is to bring together sensors, data management and mining techniques, and modeling to enable scientists and engineers to collect, integrate, and analyze data and to better collaborate and share information regardless of geographic boundaries.
NCSA is taking a lead role in developing a cyberenvironment to support this research. The prototype CLEANER cyberenvironment includes:
- CyberCollaboratory, a Web portal to provide access to tools and data and to enable individuals separated by geography to collaborate in a common digital lab.
- CyberIntegrator, which provides a mechanism for easy integration of heterogeneous software tools to support modeling and analysis of complex environmental systems.
- Metadata repository, which stores information on the activities in each component of the cyberenvironment.
- CI-Know, a tool that supports social and knowledge networking.
For example, consider hypoxia (oxygen depletion) in Corpus Christi Bay. Currently, researchers are unable to adapt their monitoring efforts to unfolding events. Manual sampling should be increased when the possibility of hypoxia is high, but the researchers cannot integrate the diverse sensor data (some downloaded only once a week) and models to predict when they should send people into the field to collect samples.
The CLEANER cyberenvironment addresses this need. The CyberCollaboratory will alert researchers when data received from sensors indicates that hypoxic conditions are expected. Scientists will then be able to discuss the predictions using the portal's chat and message board features, developing a plan to step up their data-gathering efforts. The data collected from the manual sampling effort could then be transmitted back to the cyberenvironment's data store, perhaps triggering simulations and models via the CyberIntegrator. These results could, in turn, be discussed through the CyberCollaboratory.
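The alerting step in the Corpus Christi Bay scenario can be pictured with a minimal sketch. It assumes a simple threshold rule on dissolved oxygen. The source does not say how the CyberCollaboratory decides that hypoxic conditions are expected, so the `Reading` type, the `stations_at_risk` function, the station names, and the 2.0 mg/L cutoff are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List

# Assumption: 2.0 mg/L is used here as an illustrative hypoxia cutoff.
# The source text does not give a threshold.
HYPOXIA_THRESHOLD_MG_L = 2.0


@dataclass
class Reading:
    """One dissolved-oxygen measurement from a (hypothetical) sensor station."""
    station: str
    dissolved_oxygen_mg_l: float


def stations_at_risk(
    readings: Iterable[Reading],
    threshold: float = HYPOXIA_THRESHOLD_MG_L,
) -> List[str]:
    """Return the sorted names of stations whose latest reading is below threshold."""
    latest: Dict[str, float] = {}
    for reading in readings:
        # Readings are assumed to arrive in time order, so later ones overwrite.
        latest[reading.station] = reading.dissolved_oxygen_mg_l
    return sorted(name for name, value in latest.items() if value < threshold)


# Illustrative data only; these are not real measurements.
readings = [
    Reading("A", 5.1),
    Reading("B", 1.4),
    Reading("A", 1.9),
    Reading("C", 3.0),
]
alerts = stations_at_risk(readings)
print("Stations to sample manually:", alerts)
```

A real system would presumably combine sensor data with model predictions rather than rely on a single cutoff. The sketch only shows where an alert could originate.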
New vistas opened
Collaboration streamlines workflows
High-impact science often starts with a challenge in technology. But it can't stop there. Klaus Schulten, of the University of Illinois at Urbana-Champaign's Theoretical and Computational Biophysics Group, recognized this when his group set out to help the National Renewable Energy Laboratory understand how algae proteins could be redesigned so that they would permit hydrogen to bubble out without letting oxygen in.
A satisfactory answer would require the study of a large number of proteins. The simulations themselves don't require much computer time, but managing between 20 and 50 different proteins would take an enormous number of person-hours. Schulten's group knew that this problem would involve using multiple computing resources to perform thousands of calculations simultaneously.
Building on more than a decade of work with Schulten's group and months of intense collaboration between the group and NCSA, the team developed a code that significantly streamlined the simulation workflow. The result, NAMD-G, automatically handles authentication, file transfer, job submission, and job chaining. It also alerts users when something goes wrong, notifies them when the job is completed, and transfers all files onto users' computers for analysis. Furthermore, NAMD-G interacts with queuing systems and can distribute jobs to multiple sites around the world.
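The job-chaining idea described above can be sketched in a few lines. This is not NAMD-G's code or interface. It is a hedged illustration under assumed names (`run_chain`, `authenticate`, `transfer_inputs`, `submit_job`, `fetch_results`), with placeholder functions standing in for real authentication, file transfer, and queue submission.

```python
from typing import Callable, Dict, List, Tuple

# A step takes the shared job state and returns an updated copy.
State = Dict[str, str]
Step = Tuple[str, Callable[[State], State]]


def run_chain(steps: List[Step], notify: Callable[[str], None]) -> Tuple[bool, State]:
    """Run steps in order; alert and stop at the first failure."""
    state: State = {}
    for name, step in steps:
        try:
            state = step(state)
        except Exception as exc:
            notify(f"failed at '{name}': {exc}")
            return False, state
    notify("all steps completed")
    return True, state


# Hypothetical stand-ins for the real work; each just records what it did.
def authenticate(state: State) -> State:
    return {**state, "credential": "demo-token"}


def transfer_inputs(state: State) -> State:
    return {**state, "inputs": "staged"}


def submit_job(state: State) -> State:
    if "credential" not in state:
        raise RuntimeError("no credential")
    return {**state, "job": "finished"}


def fetch_results(state: State) -> State:
    return {**state, "results": "downloaded"}


messages: List[str] = []
ok, final_state = run_chain(
    [
        ("authenticate", authenticate),
        ("transfer", transfer_inputs),
        ("submit", submit_job),
        ("fetch", fetch_results),
    ],
    messages.append,
)
print(ok, messages)
```

Passing the notifier in as a function keeps the chain independent of how users are alerted, whether by e-mail, a log file, or a portal message.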
This collaboration and its fruits illustrate the potential value of community cyberenvironments in two important ways. Cyberenvironments give scientists the capability to do things they have never been able to do before. They also streamline the work scientists are already doing, which gives them more time to spend on the science instead of on mundane tasks.
Unencumbered outlook
Explorations of new technologies are a key strength and mandate
NCSA's Innovative Systems Lab recognizes that future advances cannot rely on continuing increases in computer chips' clock speed to drive performance increases. Instead, tomorrow's computing systems will include processors with many cores on each chip, reprogrammable logic devices based on field-programmable gate arrays, heterogeneous computing elements interconnected with high-performance communications fabrics, or some other transformative technology that has yet to emerge.
With that in mind, the ISL is working with Microsoft to introduce academic high-performance computing's first production system based on the Windows Compute Cluster Server. Lincoln is a dual-boot system capable of running both Linux and Windows. It will significantly increase the opportunity for commercial and technical applications to tap into high-performance computing.
"We understand that there are technical applications -- many of interest to our Private Sector Partners -- that run Windows. And we expect that the need for high-performance computing among these applications is going to be huge. It's an area that gets no respect, but it's an important area to explore. These sorts of explorations are a key strength and mandate of our Innovative Systems Laboratory," says Rob Pennington, NCSA's Chief Technology Officer and leader of the ISL.
Visionaries set free
High-impact science requires high-performance computing
NCSA's machine room is home to almost 50 teraflops of computing power. The center also provides the experts who keep that power turned on and help the nation's researchers exploit it. In April 2006, one of those researchers hit the pages of Nature with a report on the effectiveness of different mitigation strategies in the face of a global influenza pandemic. Using NCSA's Cobalt cluster, Neil Ferguson and his team simulated every individual in the United States or other countries, together with every school and workplace and the journeys people make.
Features of virus spread are based on historical data from previous influenza outbreaks. But, as the Nature paper points out, "Although...using data from past pandemics should be a priority, it will be impossible to predict the exact characteristics of any future pandemic virus...It will be imperative to collect the most detailed data on the clinical and epidemiological characteristics of a new virus and the impact of control measures early in the emergence of a pandemic and to analyze those data in real time to allow interventions to be tuned to match the virus the world faces."
With this in mind, NCSA is helping other members of the National Institutes of Health's Models of Infectious Disease Agent Study develop schemes for overflow computing, which allows new processors to be introduced into a calculation on the fly, thus making on-demand computing of the sort described easier and more powerful. NCSA is also assisting in profiling the models' performance and optimizing the code being used.
The future, close to home
To exploit new architectures, we experience new architectures
With the birth of the Innovative Systems Laboratory last year, NCSA has begun bringing itself closer than ever to some of the leading lights of the University of Illinois at Urbana-Champaign. Take the Cell processors recently released by IBM, Sony, and Toshiba. Cell technology was developed with a broad range of applications in mind -- from cryptography to gaming. The Innovative Systems Lab is working closely with Marc Snir, head of the university's computer science department, and his team. We're getting a few key scientific applications up and running on the system. We're also trying out the tools they're creating -- like compilers and debuggers -- for the new architecture.
Field-programmable gate arrays (FPGAs), another area of interest, have been around a bit longer and offer us the opportunity to work with the university's electrical and computer engineering department. Currently it takes our team at the Innovative Systems Lab several months to get a code up and running on our existing, experimental FPGA-based systems. That's a large investment, and it typically yields a relatively small improvement in performance. We're working with the ECE department's Wen-Mei Hwu and his team. They're experts in compiler technologies, and they're interested in building compilers that will make it easier to port code to FPGAs and will make that code run better once it's been ported. Together, we're exploring ways to make applications work across multiple FPGA nodes and ways to measure the performance of those applications.
It is crucial that we invest in these collaborations now, while we're working with a relatively small number of applications and before we're attempting to scale these technologies up and inject them into production-quality systems.
Fresh prospects exposed
Orchestrated TeraGrid simulations and dedicated time at NCSA are key
An ambitious group of more than 40 institutions, together called the Southern California Earthquake Center (SCEC), is building earthquake modeling capabilities to transform seismology into a predictive science similar to weather forecasting. A series of simulations based on SCEC's grid-based scientific workflow tools -- TeraShake 1, TeraShake 2, and the most recent CyberShake -- began in 2004. They've run on TeraGrid resources across the country and are already yielding significant results.
TeraShake 2, for example, simulated a series of earthquakes along the San Andreas Fault. Run at NCSA and SDSC, it revealed a striking contrast in ground motion between ruptures that started at the northwestern end of the fault and those that started at the southeastern end. In earthquakes that start at the southeast end, a chain of sedimentary basins traps seismic energy and channels it into the Los Angeles area.
Simulations of that scale require the computing power of the TeraGrid. CyberShake ran on 288 processors at NCSA, and each processor had 500 megabytes of RAM dedicated to it. NCSA also devoted more than 80 terabytes of storage space to the run, so that the team could stage data waiting for post-processing. These runs also require systems that can handle high-capacity input/output, so specialized I/O nodes on NCSA's Mercury system are crucial.
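To put the CyberShake figures in perspective, the aggregate memory follows from simple multiplication. The short sketch below uses only the two numbers quoted above. The variable names are illustrative, and it assumes decimal units (1 gigabyte = 1,000 megabytes), which the text does not specify.

```python
# Figures quoted in the text above.
PROCESSORS = 288
RAM_PER_PROCESSOR_MB = 500

# Assumption: decimal units (1 GB = 1000 MB); the text does not say which it uses.
total_ram_mb = PROCESSORS * RAM_PER_PROCESSOR_MB
total_ram_gb = total_ram_mb / 1000

print(f"Aggregate RAM: {total_ram_mb} MB (about {total_ram_gb:.0f} GB)")
```

Under that assumption, the run had roughly 144 gigabytes of memory dedicated to it across all processors.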
NCSA gave the SCEC team dedicated time in its computing queues to debug the final implementation of the Condor glide-ins and to integrate them into the larger workflow. Tailored allocations that give computing time when and how it is needed are an NCSA specialty.
An eye on the future
Cyberenvironment transforms science into practice
NCSA technologists and earthquake engineers at the Mid-America Earthquake Center (MAE) have joined forces to develop the MAEviz loss assessment system, a cyberenvironment that gives decision makers access to tools aimed at helping them assess earthquake hazard and determine how to allocate resources for mitigating risk.
MAEviz's developers envision the system as a collaborative bridge between researchers, engineers, and planners. Researchers create new analysis modules and data sets, engineers create and post scientifically rigorous scenarios based on the new information, and planners download appropriate scenarios to assess the impact on plans. If the potential impact is significant, the groups can coordinate to speed additional research and validation efforts.
Already, collaborations with federal, state, and city governments, utility companies, key industries, and other decision makers are underway. For example, MAE is currently working with Memphis Light, Gas, and Water to evaluate the risk the New Madrid Seismic Zone poses to parts of MLGW's gas network. Now, MAEviz is about to go global, with modules being developed for use in Istanbul, Turkey, and Islamabad, Pakistan -- two cities that are all too familiar with the need for earthquake readiness.
Panoramas revealed
Collaborative effort provides the backbone of an emerging cyberenvironment
NCSA is collaborating with the National Optical Astronomy Observatory (NOAO) to develop solutions for managing the tens to hundreds of gigabytes of data generated each night by its observatories. To lay the foundation for a cyberenvironment for the Large Synoptic Survey Telescope, NCSA and NOAO are developing a prototype data pipeline using the vast stores of data generated by the ground-based observatories NOAO oversees.
For LSST, community access will be provided through a Web-based virtual observatory (VO), including an authentication and authorization framework for the NOAO portal, an online tool that enables users to find, access, and analyze the data available through multiple public archives. By leveraging grid technologies -- including the Globus Toolkit, NCSA's MyProxy, and PURSe (Portal-based User Registration System) -- NCSA and developers from Argonne National Laboratory and the National Virtual Observatory project have simplified many processes for users.
In July, the LSST project held its first Data Challenge, a test designed to evaluate the development of the data pipeline. The challenge provided feedback for the team and helped the collaborators further refine the requirements of the LSST pipeline.