Differentially expressed gene selection algorithms for unbalanced gene datasets by maximizing the area under the ROC curve

XIE Juanying, WANG Mingzhao, HU Qiufeng

Journal of Shaanxi Normal University (Natural Science Edition), 2017, Vol. 45, Issue 1: 13-22.


Abstract

The ARCO (AUC and rank correlation coefficient optimization) algorithm may lose information because it measures the redundancy among selected features with Spearman's rank correlation coefficient, and the values it uses to measure feature-class relevance and feature-feature redundancy lie in different ranges. To overcome these shortcomings, we propose a revised Pearson correlation coefficient to assess the correlation between features and unify the ranges of relevance and redundancy, which yields the APCO (AUC and improved Pearson correlation coefficient optimization) algorithm. For multi-class feature selection, neither MAUCD (which uses MAUC as the relevance metric to rank features directly) nor MDFS (the MAUC decomposition based feature selection method) considers the redundancy between features, and MDFS moreover converges easily to a locally optimal subset of differentially expressed genes. To avoid these deficiencies, we measure feature redundancy in multi-class problems with our revised Pearson coefficient and propose the MAUCP and MDFSP algorithms within the mRMR (maximal relevance-minimal redundancy) framework. SVM, NB and KNN classifiers are used as classification tools, and AUC (or MAUC for multi-class problems) is used to assess the classifiers built on the selected feature subsets. Experimental results on seven two-class and three multi-class unbalanced gene datasets show that the proposed APCO, MAUCP and MDFSP algorithms are superior to the original ARCO, MAUCD and MDFS algorithms and outperform other classic gene selection algorithms.
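
The general idea of combining an AUC-based relevance score with a correlation-based redundancy penalty in an mRMR-style greedy search can be illustrated with a minimal Python sketch for the two-class case. This is not the authors' APCO implementation: the abstract does not define the revised Pearson coefficient, so the plain absolute Pearson correlation stands in for it here, and the rescaling of AUC to |2·AUC − 1| is only one assumed way of putting relevance and redundancy on the same [0, 1] range. The function names `auc_score` and `select_features` are hypothetical.

```python
import numpy as np


def auc_score(feature, labels):
    """AUC of a single feature used directly as a ranking score (labels are 0/1)."""
    pos = feature[labels == 1]
    neg = feature[labels == 0]
    # Fraction of (positive, negative) pairs ranked correctly; ties count half.
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties


def select_features(X, y, k):
    """Greedy mRMR-style selection: AUC relevance minus mean |Pearson| redundancy."""
    n_features = X.shape[1]
    # Assumed rescaling of relevance to [0, 1]:
    # 0 = uninformative (AUC 0.5), 1 = perfect separation in either direction.
    relevance = np.array(
        [abs(2.0 * auc_score(X[:, j], y) - 1.0) for j in range(n_features)]
    )
    # Plain absolute Pearson correlation, also in [0, 1]
    # (a stand-in for the paper's revised coefficient, which is not given here).
    redundancy = np.abs(np.corrcoef(X, rowvar=False))
    # Start from the single most relevant feature.
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        candidates = [j for j in range(n_features) if j not in selected]
        # Score = relevance minus average redundancy with already selected features.
        scores = [relevance[j] - redundancy[j, selected].mean() for j in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return selected
```

Called as `select_features(X, y, k)` with `X` a samples-by-genes array and `y` a 0/1 label vector, the sketch returns the indices of `k` genes. Because a gene that merely duplicates an already selected one receives a redundancy penalty close to 1, it tends to be passed over in favour of a less correlated gene. Constant-valued columns would produce undefined correlations and should be filtered out beforehand.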

Key words

gene selection / differentially expressed genes / AUC / mRMR / unbalanced datasets

Cite this article

XIE Juanying, WANG Mingzhao, HU Qiufeng. Differentially expressed gene selection algorithms for unbalanced gene datasets by maximizing the area under the ROC curve. Journal of Shaanxi Normal University (Natural Science Edition). 2017, 45(1): 13-22
