
data link Story: Alliance on Track to Enhance Services


Editor's Note: The Alliance is currently developing and deploying technologies to build and support the National Technology Grid. The three thrust activities involve collaborations with other groups establishing similar grid environments. The Alliance is working with NPACI, NASA IPG, and Argonne National Lab to increase the services available to the national high-performance user community. The next three issues of data link will give background information about the thrust areas: Virtual Machine Room in May, the user portal in June, and the Access Grid in July.

"Our goal is to help researchers manage their science,
not the resources they use to do their science."
--John Towns
NCSA Division Director for Scientific Computing
April 2000

Alliance on Track to Enhance Services

Virtual Machine Room

NCSA and its Alliance partners are currently deploying the infrastructure necessary to establish the Alliance Virtual Machine Room (VMR). The goal is to make the many resources available at Alliance sites appear to be managed in a single coherent fashion, as if they were co-located and under the direct management of a single center. The VMR consists of six sites – from Boston to Maui – with 24x7 operations and resource and usage monitoring. Distributed research environments like the VMR will likely emerge as the collaborative spaces in which science and interdisciplinary research will be conducted in the future.

Grid Security Infrastructure

One of the essential pieces for enabling use of a wide variety of resources in a distributed environment is solid security with mechanisms that permit access to all resources. The Alliance is participating in the development of, and building on, the Globus project's Grid Security Infrastructure (GSI) package. The authentication mechanism of choice is the Public Key Infrastructure (PKI). An Alliance certificate authority, which authorizes access to Alliance systems, is already providing limited production services.

Layered on top of this are some basic utilities necessary for VMR users. The first is a mechanism for users to manage their certificates, which function as passports to Alliance resources. A significant part of this is simply providing adequate documentation, which is currently under development. Also necessary is deployment of server and client software for FTP (File Transfer Protocol) and ssh (secure shell) on VMR systems. Versions of this software have been identified, but they require additional development.

Allocation and Account Management

A set of base infrastructure services is needed to support the management of allocations and individual user accounts within the VMR. One example is the need to track the usage of various resources by individual users. PIs also need to review users on their projects and check usage against the allocation total. This involves a considerable amount of work in identifying allocated VMR users in this distributed set of resources and reporting their usage appropriately.
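
As a rough illustration of the reporting described above, the following sketch merges usage records from several sites, totals them per user for one project, and compares the result with the project's allocation. The record layout, site and user names, and service-unit figures are all hypothetical; the actual Alliance accounting system is not described here.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    # Hypothetical record layout, not the Alliance's actual accounting schema.
    site: str
    user: str
    project: str
    service_units: float


def summarize_project(records, project, allocation):
    """Return per-user usage, total usage, and remaining allocation for a project."""
    per_user = defaultdict(float)
    for rec in records:
        if rec.project == project:
            # Usage from every site is folded into a single per-user total.
            per_user[rec.user] += rec.service_units
    used = sum(per_user.values())
    return dict(per_user), used, allocation - used


# Made-up records as they might arrive from two different sites.
records = [
    UsageRecord("site-a", "alice", "proj1", 120.0),
    UsageRecord("site-b", "alice", "proj1", 30.0),
    UsageRecord("site-b", "bob", "proj1", 50.0),
    UsageRecord("site-a", "carol", "proj2", 10.0),
]

per_user, used, remaining = summarize_project(records, "proj1", 500.0)
print(per_user, used, remaining)
```

A PI reviewing "proj1" would see each user's combined usage regardless of which site the jobs ran on, which is the view the distributed environment makes hard to assemble.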

More technically difficult is account creation on remote systems through a centralized mechanism. Within the PACI program, as in many others, resources are allocated on a peer review basis. Allocations to a VMR require a centralized process by which allocations are awarded and access provided.

VMR Operations

Just as any machine room requires a variety of operational support services, the VMR requires an integrated set of support services from all of the member sites:
  • shared help desk
  • common software installed
  • shared policies and procedures
In order for the VMR to appear as a single integrated computing environment, a single entry point for users and operators to report problems or pose questions is needed. Creating a VMR help desk based on a new NCSA-developed, web-based ticket system will give users and staff a single point to report problems and give VMR triage staff a simpler way to route problem tickets to the appropriate site for resolution. Using a common ticket system gives the Alliance a way to centrally track the status of all problems related to the VMR.
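
The single-entry-point idea can be sketched in a few lines. The example below is only an illustration, not the NCSA ticket system: the class names, the resource-to-site table, and the "triage" fallback queue are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Ticket:
    ticket_id: int
    reporter: str
    resource: str
    summary: str
    site: str
    status: str = "open"


class HelpDesk:
    """Hypothetical single entry point that routes tickets to the owning site."""

    def __init__(self, resource_to_site):
        self._resource_to_site = dict(resource_to_site)
        self._ids = count(1)
        self.tickets = {}

    def submit(self, reporter, resource, summary):
        # Unknown resources stay with the triage staff for manual routing.
        site = self._resource_to_site.get(resource, "triage")
        ticket = Ticket(next(self._ids), reporter, resource, summary, site)
        self.tickets[ticket.ticket_id] = ticket
        return ticket

    def open_by_site(self):
        """Central view: number of open tickets per site."""
        counts = {}
        for ticket in self.tickets.values():
            if ticket.status == "open":
                counts[ticket.site] = counts.get(ticket.site, 0) + 1
        return counts


desk = HelpDesk({"machine-1": "site-a", "machine-2": "site-b"})
t1 = desk.submit("alice", "machine-1", "job stuck in queue")
t2 = desk.submit("bob", "machine-2", "login fails")
t3 = desk.submit("carol", "unknown-box", "which site runs this?")
print(desk.open_by_site())
```

Because every ticket passes through one object, the per-site summary comes for free, which mirrors the central status tracking a common ticket system provides.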

Having each participating site in the VMR running a suite of common infrastructure software is also critical to smooth operations. Under development is a small database that collects and communicates the current list of infrastructure software needed at each VMR site. This database will eventually migrate to an Alliance RIB (Repository in a Box) framework running on top of Sybase. Part of this database will be organized and available to system administrators; another part will have information about software for end users.
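
One simple query such a database could answer is which required packages a site still lacks. The sketch below shows that check with plain Python sets; the package and site names are placeholders, and it makes no claim about how the RIB-based database is actually structured.

```python
def missing_software(required, installed_by_site):
    """Map each site to the sorted list of required packages it lacks.

    Sites that already have everything are omitted from the result.
    """
    report = {}
    for site, installed in installed_by_site.items():
        missing = set(required) - set(installed)
        if missing:
            report[site] = sorted(missing)
    return report


# Placeholder names, not an actual Alliance software list.
required = {"secure-ftp", "secure-shell", "job-launcher"}
installed_by_site = {
    "site-a": ["secure-ftp", "secure-shell", "job-launcher", "extra-tool"],
    "site-b": ["secure-ftp", "job-launcher"],
    "site-c": [],
}

report = missing_software(required, installed_by_site)
print(report)
```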

Centers participating in the VMR have, over time, developed their own policies, processes, and specialized software to ensure a smooth-running environment on behalf of their users. The Alliance challenge is to integrate these various practices to establish the VMR and make it work for users and staff. Working toward integration first consists of accessing and understanding site-specific system documentation about local sites' practices. Next comes the development of processes and practices for coordinated operational activities. This will form the basis of a prototype set of policies, procedures, and tools for managing such distributed environments.

Storage

As is the case with nearly all developing grid environments, easy access to data is one of the most important components for users. Efforts are initially focused on remote access to the various mass store systems that exist within the partner sites. The first step will be providing remote access to NCSA's mass store. A single command-line interface that can be used on any of the VMR systems to access and use mass store systems at any of the partner sites is under development. GASS (Globus Access to Secondary Storage) software will be used for data movement around the VMR. Although GASS is not fully developed, Alliance staff are collaborating with the Globus team to harden this infrastructure software.

Currently, NCSA's mass store system is based on the UniTree software from UniTree, Inc. As part of this long-term project, we also plan to explore the capabilities of a product called Distributed UniTree, which would allow remote data servers with local caches at partner sites to be integrated with potentially multiple archive sites within a single filename space.


--John Towns and Ginny Hudak-David