Folk Addresses the HDF Project Approach to Software Engineering
By Herbert Morgan, NCSA
Storing data in ever-increasing amounts, processing and analyzing it, and accessing it efficiently are crucial tasks—and crucial goals of the Hierarchical Data Format (HDF) Project at NCSA. The HDF project involves developing and supporting software and file formats for scientific data management. The HDF software includes I/O libraries and tools for analyzing, visualizing, and converting scientific data. HDF's users range far and wide, throughout academia, industry and government; NASA alone estimates that it has 1.6 million users for its HDF-based products.
Engagement with users and a solid understanding of their needs are key to the success of the HDF Group, according to its leader, Mike Folk. During a talk at NCSA in October about HDF's approach to software engineering, he described a number of factors that contributed to that success. In addition to strong, responsible, and continuing relationships with users, he cited HDF's approach to needs identification, software design, and software implementation, which is based on sound principles of software engineering and combined with effective technical processes for developing, testing, integrating, and maintaining software.
"Our user community is really important," says Folk, "and we consider them part of our team."
Folk said that when it comes to measuring success among HDF's users, the group looks not only at the number of happy and unhappy users, but also at their level of satisfaction. "When you have funding," says Folk, "the users who are paying you are important, and [they] need to be important." The group conducts an annual workshop with NASA, HDF's largest sponsor, from which it receives about $800,000 annually. The value of the workshop lies not only in letting the group discover where its users' frustrations are, but also in the opportunity to learn about users' newest needs.
The Testing Challenge
Testing is a major part of the HDF group's responsibilities. Supporting every language that users require increases the amount of testing that needs to be done. So does the range of machines and operating systems (Solaris 2.7, 2.8; IRIX6.5/64 6.5; HPUX 11.00; AIX 5.1; OSF1; FreeBSD; Linux SUSE, RH8, RH9; Altix; Crays T3E, SV1, T90IEEE; Linux clusters; DOE Labs machines; Windows 2000, XP; and Mac OS X), the different compilers, and the serial and parallel configurations. This diversity leads to a testing challenge. To drive home this point, Folk presented the following equation: machines x operating systems x compilers x languages x serial and parallel x compression options x configuration options x virtual file options x backward compatibility = a very large number. "We are not talking about how many features do you test," says Folk, "but how many architectural environments do you have to think about." The group conducts both weekly and daily tests on the major software packages, distributing them across various machines at NCSA and elsewhere. In itself, this is a major effort. Consider the breakdown of HDF5 component categories that Folk presented to his audience:
- Documentation: 33%
- Libraries: 30%
- Library tests: 13%
- Tools: 4%
- Tools tests: 4%
- Configuration: 15%
- Examples: 1%
By Folk's admittedly rough estimate, the HDF5 source code distribution alone consists of more than 2,000 files and close to a million lines of code. The entire HDF project contains about three million lines of code.
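To get a feel for how quickly Folk's product of environment dimensions grows, here is a minimal Python sketch. The dimension names follow his equation, but the values listed under each one and the helper names count_configurations and iter_configurations are illustrative assumptions, not the HDF Group's actual test matrix or tooling.

```python
from itertools import product
from math import prod

# Illustrative values only (assumptions): the real matrix has more
# dimensions and more entries per dimension than are shown here.
dimensions = {
    "machine": ["sun", "sgi", "ibm", "linux-cluster"],
    "compiler": ["vendor-cc", "gcc"],
    "language": ["C", "Fortran", "C++", "Java"],
    "mode": ["serial", "parallel"],
    "compression": ["none", "gzip"],
}


def count_configurations(dims):
    """Number of distinct environments: the product of the dimension sizes."""
    return prod(len(values) for values in dims.values())


def iter_configurations(dims):
    """Yield every environment as a dict mapping dimension name to value."""
    names = list(dims)
    for combo in product(*dims.values()):
        yield dict(zip(names, combo))


print(count_configurations(dimensions))  # 4 * 2 * 4 * 2 * 2 = 128
```

Even this small, made-up matrix yields 128 environments, and each additional dimension multiplies the total rather than adding to it.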
Software Development
Last but not least, the development process requires careful planning. According to the example that Folk presented, an HDF project plan may consist of around 600 tasks. A successful plan requires the group to start by discovering a need, identifying a sponsor, entering tasks into the project plan, estimating time and resources, assigning a priority to each task, and identifying a lead and team member(s).
To ensure a timely release, Folk emphasized that the priority of each task is rigorously evaluated to determine whether its completion is necessary to a particular software release, or whether it is merely desirable and can be delayed until the next release. Prioritizing tasks is so vital that the HDF group developed four different types of priorities with which to categorize its tasks.
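The release triage described above can be sketched as a small data structure. This is only an illustration: the article does not name the group's four priority types, so the sketch models just the necessary-versus-desirable distinction. The Task fields, the Need enum, the split_for_release helper, and the sample tasks are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Need(Enum):
    """Hypothetical two-way split; the group's four priority types are not given."""
    REQUIRED = "required"    # must be completed for this release
    DESIRABLE = "desirable"  # may be delayed until the next release


@dataclass
class Task:
    name: str
    sponsor: str
    lead: str
    estimate_days: float
    need: Need


def split_for_release(tasks):
    """Separate tasks that gate the release from those that can be deferred."""
    required = [t for t in tasks if t.need is Need.REQUIRED]
    deferred = [t for t in tasks if t.need is Need.DESIRABLE]
    return required, deferred


# Made-up sample plan, for illustration only.
plan = [
    Task("fix file-close bug", "sponsor A", "lead 1", 3, Need.REQUIRED),
    Task("add new tool option", "sponsor B", "lead 2", 10, Need.DESIRABLE),
    Task("update user guide", "sponsor A", "lead 3", 5, Need.REQUIRED),
]

required, deferred = split_for_release(plan)
print(sum(t.estimate_days for t in required))  # days of work gating the release
```

Summing the estimates of the required tasks gives a rough idea of how much work stands between the plan and the release date.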
The group then writes two types of requests for comments (RFCs): one for needs and one for design. They actively seek feedback from developers and sponsors and make revisions until all are satisfied. The project plan is revised according to the RFC results, and the RFCs are archived for future reference. Only after all of these steps are accomplished does the group move into the implementation and maintenance phases of the project.
This fervent attention to planning is what makes the HDF Group, which consists of 15 full-time employees and three to five students, so successful.