
data link Story: Archive Storage Trends And Storage Strategies At NCSA

datalink 0004

Archive Storage Trends
And Storage Strategies At NCSA

What a difference fifteen years makes!

No user data were stored when NCSA opened its doors in 1985. Over the last decade and a half, however, the amount of stored data has grown from zero to 150 terabytes (TB). The center initially used HYPERchannel connectivity at more than 2.5 kilobytes per second, a good speed for the time. Now the center moves data at lightning speeds of 40 megabytes per second.
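To put the two rates in perspective, the short Python sketch below computes the speedup and the time needed to move one file at each rate. Treating kilobytes and megabytes as decimal units is an assumption, the 1 GB file size is a hypothetical example, and `transfer_seconds` is an illustrative helper, not an NCSA tool.

```python
# Transfer rates quoted in the article. Decimal units (1 KB = 1,000 bytes,
# 1 MB = 1,000,000 bytes) are an assumption; the article does not say.
HYPERCHANNEL_RATE_BPS = 2.5e3  # "more than 2.5 kilobytes per second" (1985)
CURRENT_RATE_BPS = 40e6        # "40 megabytes per second" (today)

def transfer_seconds(size_bytes: float, rate_bps: float) -> float:
    """Time to move size_bytes at a sustained rate of rate_bps bytes/second."""
    return size_bytes / rate_bps

speedup = CURRENT_RATE_BPS / HYPERCHANNEL_RATE_BPS
one_gb = 1e9  # hypothetical 1 GB file, chosen only for illustration

print(f"speedup: {speedup:,.0f}x")  # speedup: 16,000x
print(f"1 GB then: {transfer_seconds(one_gb, HYPERCHANNEL_RATE_BPS) / 86400:.1f} days")  # 4.6 days
print(f"1 GB now: {transfer_seconds(one_gb, CURRENT_RATE_BPS):.0f} seconds")  # 25 seconds
```

Because 2.5 KB/s is a lower bound on the early rate, the 1985 transfer time is an upper bound.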

Volume Discount

The volume of stored files NCSA manages grows every year; for the last several years, the archive has doubled annually. The average file size is also increasing. With larger and faster systems, research scientists are producing much more data, and producing it more frequently. Staff in NCSA's High Performance Data Management (HPDM) group, which administers the center's storage facilities, predict that by next year they will be overseeing more than 300 TB of stored files in the NCSA archive.
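The growth pattern described above compounds quickly. A minimal Python sketch, assuming the archive simply keeps doubling each year from today's 150 TB (the article itself predicts only next year's figure), shows how the projection unfolds; `project_archive_tb` is a hypothetical helper.

```python
def project_archive_tb(current_tb: float, years: int,
                       growth_factor: float = 2.0) -> list[float]:
    """Projected archive size (TB) at the end of each future year, assuming
    the archive keeps growing by growth_factor per year (2.0 = doubling)."""
    sizes = []
    size = current_tb
    for _ in range(years):
        size *= growth_factor
        sizes.append(size)
    return sizes

print(project_archive_tb(150, 3))  # [300.0, 600.0, 1200.0]
```

The first value matches the HPDM prediction of roughly 300 TB next year; the later values hold only if the doubling trend continues.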

The storage habits of NCSA users are fairly stable, according to analysis performed by HPDM staff. As files grow older, the rate of access plummets, and many stored files are never retrieved at all. The statistics are striking: only 33% of files are retrieved during their first three months in storage. The share drops to 18% over the next six months, and a mere 3% of files are ever accessed after one year in storage! HPDM staff say these numbers have changed little in the last decade.
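One way to read these statistics is as a retrieval fraction for each age window. The Python sketch below encodes that reading as a lookup table; the window boundaries are an interpretation of the article's wording, and because the article gives no figure for months nine through twelve, the lookup deliberately returns `None` there rather than guessing. `retrieval_fraction` is a hypothetical name.

```python
from typing import Optional

# Fraction of files retrieved in each age window, as quoted in the article.
# Ages are months since a file entered the archive. The article reports no
# figure for months 9-12, so that window is intentionally absent.
RETRIEVAL_WINDOWS = [
    (0, 3, 0.33),              # first three months
    (3, 9, 0.18),              # next six months
    (12, float("inf"), 0.03),  # after one year
]

def retrieval_fraction(age_months: float) -> Optional[float]:
    """Quoted retrieval fraction for a file of this age, or None if the
    article reports no figure for that window."""
    for start, end, fraction in RETRIEVAL_WINDOWS:
        if start <= age_months < end:
            return fraction
    return None

for age in (1, 6, 10, 24):
    print(age, retrieval_fraction(age))
```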

Weighing Options

Given the statistics on file retrieval over time, what is NCSA's strategy for the future? The main considerations in designing the next storage system are cost and speed.

Tapes themselves are inexpensive. But the real price tag for a tape system must include the expenses for all of its components: the physical library of 60,000+ tapes, duplicate copies maintained on tape as a fail-safe, and the reliable, fast drives needed to automate the system. When these components are included, the cost of a tape system moves closer to the price of disks. As more automation and faster tape drives are purchased, the gap narrows even more. And because tapes are more easily destroyed than disks, replacement expenses must also be included in the total price.
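The cost argument above can be made concrete with a simple total-cost model. The Python sketch below adds up the components the article names (media, fail-safe duplicates, drives and automation, replacements) and compares the total with the media-only price. The structure of the model is an assumption, and every number in the usage lines is hypothetical; the article gives no prices.

```python
from dataclasses import dataclass

@dataclass
class TapeSystemCosts:
    """Cost components the article lists for a tape archive.
    All values are supplied by the reader; the article gives no prices."""
    tapes: int                      # cartridges in the library (60,000+ at NCSA)
    price_per_tape: float           # media price per cartridge
    copies: int                     # 2 when every tape has a fail-safe duplicate
    drives_and_automation: float    # fast drives plus robotic library hardware
    annual_replacement_rate: float  # fraction of cartridges replaced per year
    years: int                      # period over which the system is costed

    def media_only(self) -> float:
        """The 'inexpensive tape' figure: one copy of the media, nothing else."""
        return self.tapes * self.price_per_tape

    def total(self) -> float:
        """Media for every copy, plus drives and automation, plus replacements."""
        media = self.media_only() * self.copies
        replacements = (self.tapes * self.copies * self.annual_replacement_rate
                        * self.years * self.price_per_tape)
        return media + self.drives_and_automation + replacements

# Hypothetical inputs for illustration only; they are not NCSA figures.
example = TapeSystemCosts(tapes=60_000, price_per_tape=50.0, copies=2,
                          drives_and_automation=2_000_000.0,
                          annual_replacement_rate=0.05, years=3)
print(f"total / media-only = {example.total() / example.media_only():.2f}")
```

With any realistic inputs, the ratio printed at the end is well above 1, which is the article's point: the sticker price of the media understates what a tape archive costs.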

HPDM staff expect tape drive transfer speeds to keep improving, but tapes will probably always be slower than disks unless follow-on technologies such as RAIT become more stable and reliable. Naturally, users would prefer NCSA transfer rates in the speed-of-disk category. RAIT (redundant array of inexpensive tapes), the tape counterpart of RAID (redundant array of inexpensive disks), is difficult to make work in practice.
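RAIT borrows the RAID idea of striping data across several devices and adding parity so that a lost device can be rebuilt. The Python sketch below shows that idea with byte-wise XOR parity, using in-memory lists to stand in for tape drives. It is a minimal illustration of RAID-style parity striping, not an NCSA or vendor RAIT design; the function names are hypothetical, and it does not attempt to model the practical problems that make RAIT hard to deploy.

```python
from functools import reduce

def xor_bytes(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def stripe_with_parity(data: bytes, n_data_drives: int,
                       block_size: int) -> list[list[bytes]]:
    """Split data into stripes of n_data_drives blocks and write one XOR
    parity block per stripe to an extra drive (the last list)."""
    drives: list[list[bytes]] = [[] for _ in range(n_data_drives + 1)]
    stripe_len = n_data_drives * block_size
    for offset in range(0, len(data), stripe_len):
        # Pad the final stripe with zero bytes so every block has equal length.
        stripe = data[offset:offset + stripe_len].ljust(stripe_len, b"\0")
        blocks = [stripe[i * block_size:(i + 1) * block_size]
                  for i in range(n_data_drives)]
        for i, block in enumerate(blocks):
            drives[i].append(block)
        drives[-1].append(xor_bytes(blocks))
    return drives

def rebuild_drive(drives: list[list[bytes]], lost: int) -> list[bytes]:
    """Reconstruct every block of the lost drive by XOR-ing the survivors."""
    survivors = [d for i, d in enumerate(drives) if i != lost]
    return [xor_bytes(list(blocks)) for blocks in zip(*survivors)]

data = b"NCSA archive"
drives = stripe_with_parity(data, n_data_drives=3, block_size=2)
assert rebuild_drive(drives, lost=1) == drives[1]  # drive 1 recovered from the others
```

Because XOR is its own inverse, the same rebuild works whether the lost drive holds data or parity.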

Ideas for the Future

Because of the substantial difference observed in file retrieval rates over time, HPDM staff are evaluating different technologies for the first few months of a file's life in the archive. Storage area networks (SANs) are high-speed, special-purpose networks that interconnect different kinds of data storage devices with data servers. Staff are investigating technologies to establish a SAN fast-disk archive for use during the first six months a file is in storage, when the odds of retrieval are highest. After that point, files would be archived to less expensive commodity tapes, and users would experience a delay when retrieving data more than six months old.
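The age-based placement described above reduces to a simple rule. The Python sketch below assumes "six months" means 182 days and that a file's archive timestamp is the only input; both are simplifications, and `Tier`, `placement`, and the dates used are illustrative, not part of any NCSA system.

```python
from datetime import datetime, timedelta
from enum import Enum

class Tier(Enum):
    SAN_DISK = "SAN fast-disk archive"
    TAPE = "commodity tape"

# "The first six months" approximated as 182 days (an assumption).
DISK_RESIDENCY = timedelta(days=182)

def placement(archived_at: datetime, now: datetime) -> Tier:
    """Tier a file should occupy under the proposed age-based policy:
    fast disk while retrieval odds are highest, tape afterwards."""
    return Tier.SAN_DISK if now - archived_at < DISK_RESIDENCY else Tier.TAPE

now = datetime(2000, 4, 1)  # illustrative date
print(placement(datetime(2000, 1, 15), now))  # Tier.SAN_DISK
print(placement(datetime(1999, 6, 1), now))   # Tier.TAPE
```

A real archive would run a rule like this periodically and migrate files whose tier has changed; retrieving a file that has moved to tape is where users would see the delay.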

Where is NCSA on this path? SAN technologies have been shown to perform well when connected to a single type of system in a configuration called disk sharing, in which each system gets its own piece of disk but the systems do not share files. Sharing the files themselves is called data sharing. NCSA wants its SAN implementation to support data sharing, not just disk sharing. But making such a file system shareable across different platforms is another story. SAN vendors consider this a real challenge: most are not ready to begin implementation, and some are still in the evaluation process.

At NCSA, development is in the initial planning stages, and SAN is only one option for the future. SAN technology is very young but has great potential for doing exactly what NCSA needs. HPDM staff are looking at SAN technology initially for the center's heterogeneous cluster systems. As the technology matures, other environments can be added.

Other archive storage strategies are also being considered, including distributed disk servers to parallelize the archive system's data I/O. Distributed disk servers could be implemented in a wide-area fashion, bringing in other Alliance sites. HPDM staff are also examining the idea of a file system "managed" by an archive server. In this case, a remote file system would have a preset utilization threshold. When that disk reached the threshold, data would automatically flow off to tape.
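The "managed" file system idea can be sketched as a threshold-driven migration policy. The Python example below makes two assumptions the article does not state: migration continues until utilization falls to a lower target (a second, "low-water" threshold), and the least recently accessed files move first, which fits the retrieval statistics quoted earlier. `FileEntry`, `select_for_migration`, and the threshold values are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FileEntry:
    name: str
    size_gb: float
    last_access: float  # e.g. a POSIX timestamp; smaller means older

def select_for_migration(files: list[FileEntry], capacity_gb: float,
                         high_water: float = 0.90,
                         low_water: float = 0.75) -> list[str]:
    """Return the names of files to move off disk to tape.

    Nothing is migrated until utilization reaches high_water; then the least
    recently accessed files are chosen until utilization drops to low_water.
    """
    used = sum(f.size_gb for f in files)
    if used / capacity_gb < high_water:
        return []
    to_tape = []
    for entry in sorted(files, key=lambda e: e.last_access):  # oldest access first
        if used / capacity_gb <= low_water:
            break
        to_tape.append(entry.name)
        used -= entry.size_gb
    return to_tape

files = [FileEntry("a", 40, 1), FileEntry("b", 30, 3), FileEntry("c", 25, 2)]
print(select_for_migration(files, capacity_gb=100))  # ['a']
```

Using two thresholds instead of one keeps the server from migrating a single file every time utilization creeps back over the limit.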

Checking a Crystal Ball

What kind of mass storage system will you see at the Alliance in a few years? The systems described above are currently the front-runners, but industry developments and research initiatives keep the question open. NCSA's HPDM staff are not making their predictions public at this point!


--Michelle Butler and Ginny Hudak-David