Editor's Note: The Alliance is currently developing and deploying
technologies to build and support the National
Technology Grid. The three thrust activities
involve collaborations with other groups
establishing similar grid environments. The
Alliance is working with NPACI, NASA IPG, and
Argonne National Lab to increase the services
available to the national high-performance user
community. The next three issues of datalink will give
background information about the thrust areas:
Virtual Machine Room in May, the user portal in
June, and the Access Grid in July.
"Our goal is to help researchers manage their science,
not the resources they use to do their science."
--John Towns
NCSA Division Director for Scientific Computing
April 2000
Alliance on Track to Enhance Services
Virtual Machine Room
NCSA and its Alliance partners are currently deploying the
infrastructure necessary to establish the Alliance Virtual Machine
Room (VMR). The goal is to make the many resources available at
Alliance sites appear to be managed in a single coherent fashion as if
they were co-located and under the direct management of a single
center. The VMR consists of six sites, from Boston to Maui, with
24x7 operations and resource and usage monitoring. Distributed
research environments like the VMR will likely emerge as the
collaborative spaces in which science and interdisciplinary research
will be conducted in the future.
Grid Security Infrastructure
Using a wide variety of resources in a distributed environment requires
solid security, with mechanisms that permit access to all of those
resources. The Alliance is participating in the
development of and building on the Globus project's Grid Security
Infrastructure (GSI) package. The authentication mechanism of choice
is the Public Key Infrastructure (PKI).
An Alliance certificate authority,
which authorizes access to Alliance systems, is already providing
limited production services.
Layered on top of this are some basic utilities necessary for VMR
users. The first is a mechanism for users to manage their certificates,
which function as passports to Alliance resources. A significant part of
this is simply providing adequate documentation, which is currently under
development. Also necessary is deployment of server and client
software for FTP (File Transfer Protocol) and ssh (secure shell) on
VMR systems. Versions of this software have been identified but they
require additional development.
Allocation and Account Management
A set of base infrastructure services is needed to support the
management of allocations and individual user accounts within the
VMR. One example is the need to track the usage of various resources
by individual users. PIs also need to review users on their projects and
check usage against the allocation total. This involves a considerable
amount of work in identifying allocated VMR users in this distributed
set of resources and reporting their usage appropriately.
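The usage check described above can be sketched in a few lines of Python. This is only an illustration, not the Alliance accounting system: the record layout, site labels, project names, and numbers are all invented placeholders, and the units are left abstract.

```python
from collections import defaultdict

# Hypothetical usage records: (site, project, user, units_used).
# All names and values here are placeholders for illustration.
USAGE_RECORDS = [
    ("site-a", "proj-a", "alice", 120.0),
    ("site-b", "proj-a", "alice", 30.0),
    ("site-a", "proj-a", "bob", 50.0),
    ("site-c", "proj-b", "carol", 75.0),
]

# Hypothetical allocation totals per project, in the same units.
ALLOCATIONS = {"proj-a": 250.0, "proj-b": 60.0}


def usage_by_project(records):
    """Sum usage per project and per (project, user) across all sites."""
    per_project = defaultdict(float)
    per_user = defaultdict(float)
    for _site, project, user, units in records:
        per_project[project] += units
        per_user[(project, user)] += units
    return dict(per_project), dict(per_user)


def allocation_report(records, allocations):
    """Return {project: (used, allocated, remaining)} for each allocation."""
    per_project, _ = usage_by_project(records)
    report = {}
    for project, allocated in allocations.items():
        used = per_project.get(project, 0.0)
        report[project] = (used, allocated, allocated - used)
    return report


if __name__ == "__main__":
    for project, (used, allocated, remaining) in allocation_report(
        USAGE_RECORDS, ALLOCATIONS
    ).items():
        print(f"{project}: used {used} of {allocated}, remaining {remaining}")
```

The point of the sketch is that usage must be merged across sites before it is compared with the allocation total; a negative remainder would flag a project that has overrun its award.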
More technically difficult is account creation on remote systems
through a centralized mechanism. Within the PACI program, as in
many others, resources are allocated on a peer review basis. Allocations
to a VMR require a centralized process by which allocations are
awarded and access provided.
VMR Operations
Just as any machine room requires a variety of operational support
services, the VMR requires an integrated set of support services from
all of the member sites:
- shared help desk
- common software installed
- shared policies and procedures
In order for the VMR to appear as a single integrated computing
environment, a single entry point for users and operators to report
problems or pose questions is needed. Creating a VMR help desk based
on a new NCSA-developed, web-based ticket system will give users
and staff a single point to report problems and give VMR triage staff a
simpler way to route problem tickets to the appropriate site for
resolution. Using a common ticket system gives the Alliance a way to
centrally track the status of all problems related to the VMR.
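As a rough illustration of the single-entry-point idea, the Python sketch below routes each reported problem to the site that hosts the affected resource and keeps one central view of open tickets. It does not describe the NCSA ticket system itself; the class names, the resource-to-site map, and the "triage" fallback queue are assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Ticket:
    ticket_id: int
    reporter: str
    resource: str
    summary: str
    assigned_site: str = "triage"  # hypothetical default queue
    status: str = "open"


class HelpDesk:
    """Single entry point that routes tickets by affected resource."""

    def __init__(self, resource_to_site):
        # Hypothetical map from resource name to the site that runs it.
        self._resource_to_site = dict(resource_to_site)
        self._ids = count(1)
        self.tickets = {}

    def report(self, reporter, resource, summary):
        """Record a problem and assign it to the owning site, if known."""
        ticket = Ticket(next(self._ids), reporter, resource, summary)
        ticket.assigned_site = self._resource_to_site.get(resource, "triage")
        self.tickets[ticket.ticket_id] = ticket
        return ticket

    def open_by_site(self):
        """Count open tickets per site for central status tracking."""
        counts = {}
        for ticket in self.tickets.values():
            if ticket.status == "open":
                counts[ticket.assigned_site] = counts.get(ticket.assigned_site, 0) + 1
        return counts


if __name__ == "__main__":
    desk = HelpDesk({"cluster-1": "site-a", "archive-1": "site-b"})
    desk.report("alice", "cluster-1", "job stuck in queue")
    desk.report("bob", "unknown-box", "cannot log in")
    print(desk.open_by_site())
```

Tickets for resources the map does not recognize stay in the triage queue, which mirrors the role of the VMR triage staff in deciding where a problem should go.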
Having each participating site in the VMR running a suite of common
infrastructure software is also critical to smooth operations. Under
development is a small database that collects and communicates the
current list of infrastructure software needed at each VMR site. This
database will eventually migrate to an Alliance RIB (Repository in a Box)
framework running on top of Sybase. Part of this database will be
organized and available to system administrators; another part will have
information about software for end users.
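One way such a database might be used is to compare what each site has installed against the common list. The Python sketch below assumes a simple name-to-version mapping; the package names, versions, and site labels are invented placeholders and do not reflect the actual VMR software list or the RIB schema.

```python
# Hypothetical common infrastructure list: package name -> required version.
REQUIRED = {"pkg-auth": "1.1", "pkg-transfer": "2.0"}

# Hypothetical per-site inventories, as a site might report them.
INSTALLED = {
    "site-a": {"pkg-auth": "1.1", "pkg-transfer": "2.0"},
    "site-b": {"pkg-auth": "1.0"},
}


def software_gaps(required, installed_by_site):
    """Return {site: {package: (required, found)}} for mismatches only.

    `found` is None when the package is not installed at that site.
    """
    gaps = {}
    for site, installed in installed_by_site.items():
        problems = {
            name: (version, installed.get(name))
            for name, version in required.items()
            if installed.get(name) != version
        }
        if problems:
            gaps[site] = problems
    return gaps


if __name__ == "__main__":
    for site, problems in software_gaps(REQUIRED, INSTALLED).items():
        for name, (wanted, found) in problems.items():
            print(f"{site}: {name} requires {wanted}, found {found}")
```

A report like this would serve the system administrators' side of the database; the end-user side would more likely list what is available rather than what is missing.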
Centers participating in the VMR have, over time, developed their own
policies, processes, and specialized software to ensure a smooth-running
environment on behalf of their users. The Alliance challenge is
to integrate these various practices to establish the VMR and make it
work for users and staff. Working toward integration first consists of
accessing and understanding site-specific system documentation about
local sites' practices. Next comes the development of processes and
practices for coordinated operational activities. This will form the basis
of a prototype set of policies, procedures, and tools for managing such
distributed environments.
Storage
As is the case with nearly all developing grid environments, easy
access to data is one of the most important components to users. Efforts
are initially focused on remote access to various mass store systems
that exist within the partner sites. The first step will be providing
remote access to NCSA's mass store. A single command-line interface
that may be used on any of the VMR systems to access and use mass
store systems at any of the partner sites is under development.
GASS (Global Access to Secondary Storage) software will be used for data
movement around the VMR. Although GASS is not fully developed,
Alliance staff are collaborating with the Globus team to harden this
infrastructure software.
Currently, NCSA's mass store system is based on the UniTree software
from UniTree, Inc.
As part of this long-term project, we also plan to
explore the capabilities of a product called Distributed UniTree, which
would allow for remote data servers with local caches at partner sites
to be integrated with potentially multiple archive sites within a single
filename space.
--John Towns and Ginny Hudak-David