To: tuck who wrote (199) 2/29/2004 4:59:33 PM
From: tuck

More from Baggerly's group regarding reproducibility:

>>Proteomics. 2003 Sep;3(9):1667-72.

A comprehensive approach to the analysis of matrix-assisted laser desorption/ionization-time of flight proteomics spectra from serum samples.

Baggerly KA, Morris JS, Wang J, Gold D, Xiao LC, Coombes KR.

Department of Biostatistics, UT M.D. Anderson Cancer Center, Houston, TX 77030, USA. kabagg@mdanderson.org

For our analysis of the data from the First Annual Proteomics Data Mining Conference, we attempted to discriminate between 24 disease spectra (group A) and 17 normal spectra (group B). First, we processed the raw spectra by (i) correcting for additive sinusoidal noise (periodic on the time scale) affecting most spectra, (ii) correcting for the overall baseline level, (iii) normalizing, (iv) recombining fractions, and (v) using variable-width windows for data reduction. Also, we identified a set of polymeric peaks (at multiples of 180.6 Da) that is present in several normal spectra (B1-B8). After data processing, we found the intensities at the following mass to charge (m/z) values to be useful discriminators: 3077, 12 886 and 74 263. Using these values, we were able to achieve an overall classification accuracy of 38/41 (92.6%). Perfect classification could be achieved by adding two additional peaks, at 2476 and 6955. We identified these values by applying a genetic algorithm to a filtered list of m/z values using Mahalanobis distance between the group means as a fitness function.<<

>>Clin Chem. 2003 Oct;49(10):1615-23.

Quality control and peak finding for proteomics data collected from nipple aspirate fluid by surface-enhanced laser desorption and ionization.

Coombes KR, Fritsche HA Jr, Clarke C, Chen JN, Baggerly KA, Morris JS, Xiao LC, Hung MC, Kuerer HM.

Department of Biostatistics, University of Texas M. D.
Anderson Cancer Center, 1515 Holcombe Blvd., Box 447, Houston TX 77030, USA. krc@odin.mdacc.tmc.edu

BACKGROUND: Recently, researchers have been using mass spectroscopy to study cancer. For use of proteomics spectra in a clinical setting, stringent quality-control procedures will be needed.

METHODS: We pooled samples of nipple aspirate fluid from healthy breasts and breasts with cancer to prepare a control sample. Aliquots of the control sample were used on two spots on each of three IMAC ProteinChip arrays (Ciphergen Biosystems, Inc.) on 4 successive days to generate 24 SELDI spectra. In 36 subsequent experiments, the control sample was applied to two spots of each ProteinChip array, and the resulting spectra were analyzed to determine how closely they agreed with the original 24 spectra.

RESULTS: We describe novel algorithms that (a) locate peaks in unprocessed proteomics spectra and (b) iteratively combine peak detection with baseline correction. These algorithms detected approximately 200 peaks per spectrum, 68 of which are detected in all 24 original spectra. The peaks were highly correlated across samples. Moreover, we could explain 80% of the variance, using only six principal components. Using a criterion that rejects a chip if the Mahalanobis distance from both control spectra to the center of the six-dimensional principal component space exceeds the 95% confidence limit threshold, we rejected 5 of the 36 chips.

CONCLUSIONS: Mahalanobis distance in principal component space provides a method for assessing the reproducibility of proteomics spectra that is robust, effective, easily computed, and statistically sound.<<

Cheers, Tuck
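
For anyone who wants to play with the two Mahalanobis-distance ideas in the abstracts above, here is a rough sketch in Python with NumPy. It is not the authors' code. The function names are made up for illustration, the pooled covariance in the fitness function is an assumption (the first abstract does not say which covariance estimate was used), and the chi-square cutoff is only one plausible reading of the "95% confidence limit threshold" in the second abstract.

```python
import numpy as np


def mean_separation(group_a, group_b):
    """Mahalanobis distance between two group means.

    Rows are spectra, columns are intensities at the candidate m/z values.
    A genetic algorithm could use this value as the fitness of a peak subset.
    """
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    diff = a.mean(axis=0) - b.mean(axis=0)
    # Assumption: pooled within-group covariance.
    pooled = ((len(a) - 1) * np.cov(a, rowvar=False)
              + (len(b) - 1) * np.cov(b, rowvar=False)) / (len(a) + len(b) - 2)
    pooled = np.atleast_2d(pooled)
    return float(np.sqrt(diff @ np.linalg.solve(pooled, diff)))


def fit_pc_space(reference, n_components=6):
    """Center the reference spectra and keep the leading principal axes."""
    ref = np.asarray(reference, dtype=float)
    center = ref.mean(axis=0)
    # SVD of the centered data: rows of vt are the principal axes.
    _, s, vt = np.linalg.svd(ref - center, full_matrices=False)
    axes = vt[:n_components]
    # Variance of the reference scores along each retained axis.
    variances = s[:n_components] ** 2 / (len(ref) - 1)
    return center, axes, variances


def pc_mahalanobis(spectra, center, axes, variances):
    """Mahalanobis distance of each spectrum to the center of PC space."""
    scores = (np.atleast_2d(spectra) - center) @ axes.T
    # PC scores are uncorrelated, so the covariance matrix is diagonal.
    return np.sqrt((scores ** 2 / variances).sum(axis=1))


# Assumption: squared distances compared with the 95% point of a
# chi-square distribution with 6 degrees of freedom (about 12.592).
CUTOFF_6PC = np.sqrt(12.592)


def reject_chip(spot_distances, cutoff=CUTOFF_6PC):
    """Reject a chip only if every control spot on it exceeds the cutoff."""
    return bool(np.all(np.asarray(spot_distances) > cutoff))
```

In use, you would call `fit_pc_space` once on the peak intensities from the 24 original control spectra, then run `pc_mahalanobis` on the two control spots from each new chip and pass the pair of distances to `reject_chip`. Requiring both spots to exceed the cutoff follows the wording of the abstract; a stricter rule would reject on either spot.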