Screaming through samples, screaming for Condor

With the new mass spectrometer online, the Kelleher team is eschewing tradition and using their top-down approach. Kelleher, then a PhD candidate, was part of the Cornell University team that first proposed the method in a 1999 Journal of the American Chemical Society article.

The Fourier-transform mass spectrometer used by the Kelleher team. This instrument includes a 9.4-tesla superconducting magnet and is one of only a half dozen such instruments in the world.

Inside the mass spectrometer, an infrared laser cuts nearly every protein ion into two pieces. As in traditional methods, the team compares predicted masses with masses derived from the spectra. However, the fragments are much larger than those produced by traditional methods. More importantly, each fragment includes one of the two terminal ends of the protein's sequence. Both features make it easier to determine which modifications have taken place and where. By comparing data from different copies of the same protein cut in different places, the team can characterize multiple modifications more readily and with markedly better accuracy.
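The idea of cutting different copies of a protein in different places can be illustrated with a toy example. This is a hypothetical sketch, not the team's software. It assumes a single modification and uses a simplified mass model: a fragment's mass is just the sum of its residue masses, with no terminal-group, proton, or ion-type corrections. It looks only at the N-terminal piece of each cut. The names `n_terminal_mass`, `localize`, and the sample sequence are all invented for illustration.

A cut whose N-terminal piece matches its predicted mass means the modification lies after that cut. A cut whose piece carries a mass shift means the modification lies at or before it. Combining several cuts narrows the site to a small window.

```python
# Hypothetical sketch, not the Kelleher team's software: localize one
# mass shift by comparing N-terminal pieces cut at different positions.

# Monoisotopic residue masses in daltons (subset of the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "K": 128.09496,
    "E": 129.04259, "F": 147.06841, "R": 156.10111, "Y": 163.06333,
}


def n_terminal_mass(sequence: str, cut: int) -> float:
    """Simplified predicted mass of the N-terminal piece when the chain is
    cut after residue `cut` (1-based): the sum of its residue masses,
    ignoring terminal groups, protons, and ion type."""
    return sum(RESIDUE_MASS[aa] for aa in sequence[:cut])


def localize(sequence: str, observations: dict, tolerance: float = 0.02):
    """observations maps cut position -> measured N-terminal mass, each from
    a different copy of the same protein. Returns (shift, lo, hi): the mass
    shift found (None if none) and the 1-based residue range lo..hi that
    must contain the single modification."""
    lo, hi, shift = 1, len(sequence), None
    for cut, measured in sorted(observations.items()):
        delta = measured - n_terminal_mass(sequence, cut)
        if abs(delta) <= tolerance:
            lo = max(lo, cut + 1)   # unshifted: modification lies after the cut
        else:
            hi = min(hi, cut)       # shifted: modification lies at or before the cut
            shift = delta
    return shift, lo, hi


# Hypothetical data: a phosphate (+79.96633 Da) on residue 6 (T).
seq = "GASPVTLKEFRY"
PHOSPHO = 79.96633
obs = {
    3: n_terminal_mass(seq, 3),
    5: n_terminal_mass(seq, 5),
    7: n_terminal_mass(seq, 7) + PHOSPHO,
    9: n_terminal_mass(seq, 9) + PHOSPHO,
}
shift, lo, hi = localize(seq, obs)
print(round(shift, 3), lo, hi)  # 79.966 6 7
```

In this example, the cuts after residues 5 and 7 bracket the modification to residues 6 and 7. The complementary C-terminal pieces would carry the same information in mirror image.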

The Kelleher team first used the system to study the proteins of Methanococcus jannaschii, an autotrophic archaeon that lives at the bottom of the ocean, and Saccharomyces cerevisiae, better known as baker's yeast. The results were published in 2002 in the journals Nature Biotechnology and Analytical Chemistry, respectively.

Today, the targets are proteins from human cells. The team hopes to process some 100 million cells, identifying and characterizing the modifications of every protein that is present at more than 1,000 copies per cell and is fewer than 600 amino acids long. A Web portal called ProSightPTM serves as a clearinghouse for the data and provides tools for others doing protein analysis.
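The selection rule above can be written as a simple filter. A short back-of-envelope calculation also shows how much material it implies. The following is a minimal sketch. The `Protein` record and the candidate entries are hypothetical, and the calculation ignores any losses during sample preparation.

A protein sitting exactly at the 1,000-copy threshold would amount to 10^11 molecules across 100 million cells. That is roughly 166 femtomoles.

```python
from dataclasses import dataclass

AVOGADRO = 6.02214076e23  # molecules per mole


@dataclass
class Protein:
    name: str             # hypothetical identifier
    copies_per_cell: int  # estimated abundance
    length_aa: int        # sequence length in amino acids


def is_target(p: Protein, min_copies: int = 1000, max_length: int = 600) -> bool:
    """Selection rule as stated: more than 1,000 copies per cell and
    fewer than 600 amino acids."""
    return p.copies_per_cell > min_copies and p.length_aa < max_length


def total_femtomoles(copies_per_cell: int, cells: int = 100_000_000) -> float:
    """Total amount of one protein across all cells, before any losses."""
    return copies_per_cell * cells / AVOGADRO * 1e15


candidates = [
    Protein("protein_a", 5000, 350),
    Protein("protein_b", 400, 200),
    Protein("protein_c", 20000, 900),
]
print([p.name for p in candidates if is_target(p)])  # ['protein_a']
print(round(total_femtomoles(1000)))                 # 166
```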

ProSightPTM website.

Once it's running at full capacity, the team's mass spectrometer will create about one gigabyte of data per day and will operate 24-7. With numbers like that, traditional methods of converting the newly produced data into masses and equating those masses to particular proteins and protein fragments are not an option.

The calculations that go into analyzing an individual sample are neither computationally intensive nor especially time-consuming. "You can do it on a fast desktop in less than 10 minutes," says Brooks. "But, by the end of the year, they're going to be producing five to seven hundred datasets a day… At that rate, you're looking at one to four days of computing time if the calculations are run serially."
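Brooks's estimate comes from multiplying the number of datasets by the time each one takes. The per-dataset times of about 3 and 8 minutes used below are assumptions. They were chosen to reproduce his one-to-four-day range, and both fall under his 10-minute ceiling.

```latex
T_{\text{serial}} = N \, t_{\text{job}}, \qquad 1\ \text{day} = 1440\ \text{min}

N = 500,\ t_{\text{job}} \approx 3\ \text{min} \;\Rightarrow\; T_{\text{serial}} \approx 1500\ \text{min} \approx 1.0\ \text{day}

N = 700,\ t_{\text{job}} \approx 8\ \text{min} \;\Rightarrow\; T_{\text{serial}} \approx 5600\ \text{min} \approx 3.9\ \text{days}
```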

Rather than let all that excess mass spectrometer capacity go to waste, Kelleher and company teamed up with NCSA to port the analysis software, called THRASH, to the Alliance's Condor system. Condor pools idle time on desktop computers to enable high-throughput computing. One of Kelleher's graduate students, Jeff Johnson, completed part of the work in a matter of days, working with Brooks and Peter Andrews of Eastern Illinois University.

"It was a problem that just screamed out for Condor," says Brooks. "You don't need a huge amount of memory. The jobs have short run times and are easily crunched. What you want here is not one computer that can crunch one big problem, but a lot of computers that can crunch lots of little ones."
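The pattern Brooks describes can be sketched in miniature: many small, independent jobs handed to whatever workers are free. This is an illustration of the general farm-out idea, not Condor itself and not THRASH. `process_dataset` and `run_batch` are hypothetical names. Local threads stand in for pool machines here.

```python
from concurrent.futures import ThreadPoolExecutor


def process_dataset(dataset_id: int) -> tuple[int, str]:
    """Hypothetical stand-in for one THRASH run: turns one raw dataset into
    database-ready queries. Here it just returns a label."""
    return dataset_id, f"queries-for-dataset-{dataset_id}"


def run_batch(dataset_ids, workers: int = 8) -> dict:
    """Farm out independent jobs and collect the results.

    Each job is short and needs little memory, and no job depends on
    another, so jobs can go to whichever worker is free. Local threads
    stand in for pool machines here; Condor would instead dispatch each
    job to a separate idle desktop."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(process_dataset, dataset_ids))


results = run_batch(range(600))  # one day's worth of hypothetical datasets
print(len(results))  # 600
```

Because no job waits on another, adding workers shortens the batch without any change to the per-job code. That property is what makes the workload a good fit for pooled desktops.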

As a result of this natural fit, the portion of the analysis that converts the raw mass spectrometric data into database-ready queries is screaming along on Condor. With some 200 processors working in tandem, analyses that might have taken days to complete are now finished in 30 minutes. Other portions of the analysis process are likely to be moved to Condor in the future, according to Kelleher and Brooks, providing plenty of opportunities for interaction between NCSA and Kelleher's rapidly maturing group.
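The reported drop from days to about 30 minutes matches a simple idealized model. In that model, independent jobs of equal length run in waves of at most one job per processor. The following sketch assumes a hypothetical batch of 700 datasets at 8 minutes each. It ignores queueing, file transfer, and desktops reclaimed by their owners, so real runs would take somewhat longer.

```python
import math


def wall_clock_minutes(n_jobs: int, minutes_per_job: float, processors: int) -> float:
    """Idealized finishing time for n independent, equal-length jobs:
    they run in ceil(n / processors) waves. Ignores queueing, file
    transfer, and desktops reclaimed by their owners mid-job."""
    waves = math.ceil(n_jobs / processors)
    return waves * minutes_per_job


# Hypothetical batch: 700 datasets at 8 minutes each.
serial = wall_clock_minutes(700, 8, 1)      # 5600 minutes, about 3.9 days
pooled = wall_clock_minutes(700, 8, 200)    # 4 waves -> 32 minutes
print(serial, pooled)  # 5600 32
```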

This research is supported by the Packard Foundation, the Burroughs Wellcome Fund, the Chicago Community Trust's Searle Scholars Program, the Research Corporation's Cottrell Scholar Award, the University of Illinois at Urbana-Champaign, and the Sloan Foundation.

Team members

Andrew Birck
Ian Brooks
Yi Du
Jonathan T. Ferguson
Andy Forbes
Nicole Friel
Leslie M. Hicks
Lihua Jiang
Yong-Bin Kim
Neil Kelleher
Ryan McCarthy
Fanyu Meng
Leah Miller
Steven Patrie
Dana Robinson
Michael Roth
Gregory K. Taylor
Dan Wright