An Interview with Argonne's Steve Tuecke
by Alan Beck, Editor-in-Chief
HPCwire 06.14.02, reprinted by permission
Steve Tuecke, lead software architect in the Distributed Systems Laboratory at Argonne National Laboratory, spoke with HPCwire about the issues and challenges of Grid development and technology.
HPCwire: What technical challenges to Grid development have you found most persistent and intractable? How have you approached them, and what results have appeared most encouraging?
Tuecke: I believe what defines the Grid, apart from other forms of distributed computing, is resource sharing across organizational boundaries, creating what we call "virtual organizations." In an often somewhat limited form, such boundaries may occur within a single enterprise (i.e., an intra-Grid), such as between departments within a company, where a primary source of heterogeneity may be policy on the use of resources. But in its full grandeur, the boundaries may occur across multiple institutions, such as we are seeing in the numerous large Grid deployments within the scientific community, where heterogeneity is not only in policy, but also in the significantly different capabilities of the technologies employed by each of those institutions.
The challenge is to define common, interoperable protocols without simply resorting to least-common-denominator designs that are unable to exploit the capabilities of the various technologies employed by the organizations. It is critical to the success of the Grid that there be a small number of ubiquitously available, interoperable Grid protocols. Without such protocols, growth of the Grid will be hampered by the need for an organization to potentially deploy different Grid infrastructures for each virtual organization that the organization wishes to join, as opposed to deploying just a single infrastructure which is simply configured with different policies for that organization's participation in each virtual organization.
Our approach is to define protocols that factor out as much commonality as possible from the various technologies, yet permit controlled extensibility of the protocol to allow exploitation of the unique features of each implementation. The difficulty is in finding the right balance between being prescriptive and extensible. If you lean too far toward prescriptive, it is difficult to allow the exploitation of the unique, and often important, features of a particular implementation. If you lean too far toward extensibility, such that most features are invoked through implementation-specific extensions, then interoperability is hindered.
We are very encouraged in this area by the successful use of GRAM, the Globus Toolkit's protocol for job submission and management, in very heterogeneous environments. On the server side, the Globus Toolkit's GRAM implementation interfaces to about a dozen different local scheduling systems, including LSF, Condor, PBS, and Grid Engine. While there is considerable commonality amongst these systems, there are also significant, unique features of each. Likewise, on the client side, various job managers and brokers use GRAM in a variety of distinct ways to talk to these scheduling systems. Yet, despite its successful use in large, heterogeneous Grid deployments, GRAM also provides a good lesson on the difficulties in striking the right balance between being prescriptive and extensible. Through our experiences in deploying large-scale, production Grid infrastructure and applications, we have found that GRAM still errs on the side of prescriptive, so we are in the midst of improving its extensibility to better accommodate local scheduling system capabilities.
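The balance described in this answer, a common core plus controlled, implementation-specific extensions, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the JobRequest fields, the "alpha" and "beta" schedulers, and their command-line flags are hypothetical and do not reflect GRAM's actual message format or any real scheduler's interface.

```python
from dataclasses import dataclass, field


@dataclass
class JobRequest:
    """Hypothetical scheduler-neutral job description (not the GRAM format)."""
    executable: str
    cpu_count: int = 1
    # Extensions are keyed "<scheduler>:<option>" so each adapter can
    # recognize its own options and skip everyone else's.
    extensions: dict = field(default_factory=dict)


class SchedulerAdapter:
    """Translates the common fields, then applies the extensions it knows."""
    name = "generic"
    submit_command = "submit"   # hypothetical command name
    cpu_flag = "--cpus"         # hypothetical flag
    known_extensions = {}       # option name -> hypothetical flag

    def translate(self, job):
        options = [self.cpu_flag, str(job.cpu_count)]
        ignored = []
        for key, value in job.extensions.items():
            scheduler, _, option = key.partition(":")
            if scheduler == self.name and option in self.known_extensions:
                options += [self.known_extensions[option], str(value)]
            else:
                # Unknown or foreign extensions are reported, not fatal,
                # so the common part of the request still interoperates.
                ignored.append(key)
        return [self.submit_command, *options, job.executable], ignored


class AlphaAdapter(SchedulerAdapter):
    name = "alpha"
    submit_command = "alpha-submit"
    known_extensions = {"queue": "--queue"}


class BetaAdapter(SchedulerAdapter):
    name = "beta"
    submit_command = "beta-run"
    cpu_flag = "-n"
    known_extensions = {"project": "-P"}


job = JobRequest(
    "/bin/simulate",
    cpu_count=4,
    extensions={"alpha:queue": "long", "beta:project": "climate"},
)
alpha_cmd, alpha_ignored = AlphaAdapter().translate(job)
beta_cmd, beta_ignored = BetaAdapter().translate(job)
print(alpha_cmd, alpha_ignored)
print(beta_cmd, beta_ignored)
```

In this sketch the same request is accepted by both adapters: the common fields always translate, while each scheduler-specific option is honored only by the adapter that understands it. Moving more behavior into the extensions dictionary would make the request more expressive but less portable, which is the trade-off discussed above.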
HPCwire: What is the current status of Grid standards? How can the HPC community best effect standards that are sensible, efficient, and widely acceptable?
Tuecke: The open source Globus Toolkit has emerged as the de facto standard software for deploying Grid infrastructure and building Grid applications that can exploit this infrastructure. It addresses fundamental issues of security; resource monitoring, discovery, and management; and large-scale data transport and management. Along with this acceptance of the Globus Toolkit by the Grid community has come considerable consensus on the overall model, and the nature of the protocols required to build the Grid, as described in "The Anatomy of the Grid: Enabling Scalable Virtual Organizations" [Foster, Kesselman, Tuecke, International Journal of High Performance Computing Applications, 15(3), 2001]. Work has proceeded during the last couple of years in the Global Grid Forum (GGF) to concisely document and standardize various Globus Toolkit protocols, including X.509 Proxy Certificates for delegated security, LDAP for resource discovery, and GridFTP for data transport.
Meanwhile, Web services (e.g., WSDL and SOAP) have been emerging as a ubiquitous basis for constructing messaging protocols between application services and clients. About a year ago, the Globus Project began work on refactoring its current Grid model and protocols to exploit Web services: that is, to preserve the nature of the interactions used in today's Globus Toolkit, but retarget them to exploit Web services. This work gained considerable momentum during Fall 2001 when IBM began collaborating with us on this effort, leading to the joint announcement of the Open Grid Services Architecture (OGSA) and initial GGF drafts for OGSA in February 2002. Numerous vendors, including IBM, Platform Computing, Entropia, United Devices, and Avaki, have endorsed OGSA. The OGSA specifications will be developed and standardized within the Global Grid Forum, and will exploit Web services and other standards defined in GGF and other standards organizations such as the W3C, IETF, and OASIS.
The HPC community can best effect Grid standards by getting involved in these GGF activities. In March the first OGSA-related working group was formed—the Open Grid Services Infrastructure (OGSI) working group, of which I am co-chair with Jeff Frey of IBM—to develop the core specifications of OGSA (see http://www.gridforum.org/ogsi-wg/). We expect other OGSA-related working groups to be proposed at GGF5 in July and for some other existing GGF working groups to shift their focus toward OGSA.
HPCwire: Do you believe that the Globus Toolkit will remain the leading Grid middleware toolkit for Grid programming for the foreseeable future? Why or why not?
Tuecke: The Globus Toolkit currently defines both a set of interoperable Grid protocols and an open source reference implementation of those protocols. With OGSA, we are making this distinction and separation much clearer. As described above, indications are that OGSA (which is heavily influenced by the current Globus Toolkit protocols) is on track to becoming the new standard for Grid protocols. The Globus Toolkit will, in turn, continue to evolve to support the OGSA protocols.
We will hopefully see multiple, independent, interoperable implementations of the OGSA protocols. Nonetheless, I believe the Globus Toolkit will remain a leading Grid middleware toolkit for three reasons. First, we already have a large installed base and considerable mind share as a provider of high-quality, open source Grid middleware. Second, the availability of commercial service and support for the Globus Toolkit from Platform Computing, IBM, and others makes the Globus Toolkit viable for a whole new set of users and should even further improve the quality of the software as they increasingly make contributions to the open source base. Third, as the Linux and Apache phenomena have shown, there is a large market for open source software, and this market can peacefully coexist with, and even be synergistic with, commercial implementations of similar products. So I expect to see the Globus Toolkit continue to grow in adoption, while at the same time I hope to see a vibrant market for independent, interoperable, commercial implementations of OGSA.
HPCwire: What will Grid technology look like in five years?
Tuecke: To the average user, Grid technology will be invisible. It will be part of the common, ubiquitous infrastructure that is used by a myriad of applications. To a user, the Grid will look like a B2B application, or a scientific portal, or a job management system for a supercomputing center, or an online multiplayer game.
Under the covers, Grid technology in five years will be based on a small set of standard, interoperable protocols, much like the Internet and Web are today. Of course, I think the protocols will come from OGSA. There will hopefully be multiple implementations, some of which ship as a standard feature with every platform, and some of which are sold based on differentiators such as the ability to deliver higher qualities of service.
HPCwire: How has your selection as one of Technology Review's 100 Top Young Innovators changed your perspectives and plans?
Tuecke: I am honored to have been recognized by Technology Review for my work over the last seven years on Grid computing. However, my plans have not changed. My immodest goal all along with the Globus Project has been to change the way that computing and other resources are used. We have been working on this mission since before it was called "Grid" computing, and it is amazing to see this vision starting to become a reality. I continue to believe I am in the best place (Argonne National Laboratory) and best role (lead software architect) to make this vision a reality. I see this award as an affirmation of the hard work and decisions that I, and the rest of the Globus Project team, have made to get this far.
HPCwire: Is there anything else you would like our readers to know?
Tuecke: The journey is far from over. OGSA is a big step in the right direction, toward standard, interoperable Grid protocols. But the Grid marriage with Web services is only one step. As I said at the start of this interview, the heart of the Grid problem is resource sharing: that is, resource monitoring, discovery, and management, with the goal of dynamic provisioning of resources, to run real applications with the required qualities of service. The use of Web services will hopefully allow us to stop worrying about some of the lower level messaging details and instead focus on these hard resource sharing problems. We have several layers to go in the protocol stack above Web services before we can call the Grid problem solved. And it is going to take a lot of hard work by a lot of smart people to get there.
Copyright 1993-2002 HPCwire. Redistribution of this article is forbidden by law without the expressed written consent of the publisher. For a free trial subscription to HPCwire, email trial@hpcwire.tgc.com.