SI
SI
discoversearch

We've detected that you're using an ad content blocking browser plug-in or feature. Ads provide a critical source of revenue to the continued operation of Silicon Investor.  We ask that you disable ad blocking while on Silicon Investor in the best interests of our community.  If you are not using an ad blocker but are still receiving this message, make sure your browser's tracking protection is set to the 'standard' level.
Biotech / Medical : Bioinformatics

 Public ReplyPrvt ReplyMark as Last ReadFilePrevious 10Next 10PreviousNext  
From: tuck12/16/2004 10:43:01 AM
   of 80
 
[Assessing Chance Correlation and Prediction Confidence Using "Decision Forest"]

>>Environ Health Perspect. 2004 Nov;112(16):1622-7.

Using Decision Forest to Classify Prostate Cancer Samples on the Basis of SELDI-TOF MS Data: Assessing Chance Correlation and Prediction Confidence.

Tong W, Xie Q, Hong H, Shi L, Fang H, Perkins R, Petricoin EF.

Center for Toxicoinformatics, Division of Biometry and Risk Assessment, National Center for Toxicological Research, Food and Drug Administration, Jefferson, Arkansas, USA.

Class prediction using "omics" data is playing an increasing role in toxicogenomics, diagnosis/prognosis, and risk assessment. These data are usually noisy and represented by relatively few samples and a very large number of predictor variables (e.g., genes of DNA microarray data or (Italic)m(/Italic)/(Italic)z(/Italic) peaks of mass spectrometry data). These characteristics manifest the importance of assessing potential random correlation and overfitting of noise for a classification model based on omics data. We present a novel classification method, decision forest (DF), for class prediction using omics data. DF combines the results of multiple heterogeneous but comparable decision tree (DT) models to produce a consensus prediction. The method is less prone to overfitting of noise and chance correlation. A DF model was developed to predict presence of prostate cancer using a proteomic data set generated from surface-enhanced laser deposition/ionization time-of-flight mass spectrometry (SELDI-TOF MS). The degree of chance correlation and prediction confidence of the model was rigorously assessed by extensive cross-validation and randomization testing. Comparison of model prediction with imposed random correlation demonstrated biologic relevance of the model and the reduction of overfitting in DF. Furthermore, two confidence levels (high and low confidences) were assigned to each prediction, where most misclassifications were associated with the low-confidence region. For the high-confidence prediction, the model achieved 99.2% sensitivity and 98.2% specificity. The model also identified a list of significant peaks that could be useful for biomarker identification. DF should be equally applicable to other omics data such as gene expression data or metabolomic data. The DF algorithm is available upon request. (Italic)Key words: (/Italic)bioinformatics, chance correlation, class prediction, classification, decision forest, prediction confidence, prostate cancer, proteomics, SELDI-TOF. (Italic)Environ Health Perspect (/Italic)112:1622-1627 (2004). [Online 5 August 2004].<<

There's a joke here somewhere . . . about not seeing the trees for the forest?

Cheers, Tuck
Report TOU ViolationShare This Post
 Public ReplyPrvt ReplyMark as Last ReadFilePrevious 10Next 10PreviousNext