New data algorithm can identify twice as many proteins in mass spectrometry data

Written by Cameron Low, Future Science Group


A team of researchers from the University of California, San Diego (CA, USA), the Lunenfeld-Tanenbaum Research Institute (Canada) and SCIEX (MA, USA) has developed a method that can identify up to twice as many proteins and peptides in mass spectrometry data as conventional approaches. The method’s improved performance is due to its ability to compare data to spectral libraries, rather than to individual spectra or a database of sequences. The findings were recently published in Nature Methods.

The advance is important because many research teams now perform data-independent acquisitions, which generate a vast amount of raw data compared with analyses that cover only a few elements chosen at random. The increase in the amount of data collected has created a bottleneck for existing computational tools. Nuno Bandeira of UC San Diego, the study’s senior author, commented: “We needed a new data analysis method.”

Bandeira’s research group focuses on creating spectral network algorithms, which analyze pairs of overlapping peptide spectra produced during MS experiments. The spectra are produced when an enzyme breaks down a protein into its subcomponents, including peptides. The algorithms detect patterns that the pairs have in common and then search for these patterns in other spectra. This method considerably speeds up the process of identifying peptides, and thus proteins, as traditional methods compare spectra against databases or against already identified spectra.
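To give a rough sense of what comparing spectra to a library can look like in code, the following Python sketch scores a query spectrum against library entries using cosine similarity on binned peaks. This is a generic, textbook-style illustration and not the MSPLIT-DIA algorithm from the study. The function names, the 1.0 m/z bin width, the peptide labels and all peak values are assumptions made up for the example.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=1.0):
    """Sum peak intensities into fixed-width m/z bins.

    `peaks` is a list of (m/z, intensity) pairs. The bin width is an
    illustrative assumption, not a value taken from the study.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[int(mz / bin_width)] += intensity
    return dict(binned)


def cosine_similarity(a, b):
    """Cosine of the angle between two binned spectra (0.0 to 1.0)."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_library_match(query_peaks, library, bin_width=1.0):
    """Return (score, name) of the most similar library entry, or None."""
    if not library:
        return None
    query = bin_spectrum(query_peaks, bin_width)
    scored = [
        (cosine_similarity(query, bin_spectrum(peaks, bin_width)), name)
        for name, peaks in library.items()
    ]
    return max(scored)


if __name__ == "__main__":
    # Hypothetical library: labels and peaks are invented for illustration.
    library = {
        "PEPTIDE_A": [(175.1, 30.0), (262.2, 100.0), (375.3, 55.0)],
        "PEPTIDE_B": [(147.1, 80.0), (248.2, 40.0), (361.3, 100.0)],
    }
    # Query resembling PEPTIDE_A at half intensity, plus one noise peak.
    query = [(175.1, 15.0), (262.1, 50.0), (375.3, 27.5), (500.4, 2.0)]
    score, name = best_library_match(query, library)
    print(f"best match: {name} (cosine similarity {score:.3f})")
```

Cosine similarity ignores overall intensity scale, so a weaker copy of the same fragmentation pattern still scores close to 1. A real pipeline would also have to handle multiplexed spectra, mass tolerance and false-discovery control, none of which is attempted here.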

The researchers demonstrated that they were capable of identifying twice as many peptides in human samples, compared with traditional methods. Additionally, they observed 40% more protein–protein interactions. “The results are more stable and easier to reproduce,” Bandeira added.

The next steps involve speeding up the process, which takes approximately twice as long as traditional methods, and fine-tuning the method for the next generation of mass spectrometers. The team also wants to see if they can capitalize on the technique to analyze data cohorts.

Sources: This New Method Identifies Up to Twice as Many Proteins and Peptides in Mass Spectrometry Data; Wang J, Tucholska M, Knight JD et al. MSPLIT-DIA: sensitive peptide identification for data-independent acquisition. Nature Methods. doi:10.1038/nmeth.3655 (Epub ahead of print) (2015).