1 Data and methods
1.1 Data collection
Tab.1 The details of the datasets (number of features per data type)
| Dataset | mRNA | DNA methylation | miRNA | Clinical |
|---|---|---|---|---|
| LGG | 19 962 | 17 028 | 1 881 | 5 |
| SKCM | 22 149 | 24 533 | 827 | 5 |
| OV | 46 691 | 27 578 | 1 181 | 5 |
| BRCA | 20 531 | 22 124 | 740 | 5 |
| LUSC | 20 232 | 24 776 | 739 | 5 |
1.2 Data preprocessing
Tab.2 The details of the datasets after preprocessing (number of features per data type)
| Dataset | Classes | mRNA | DNA methylation | miRNA | Clinical |
|---|---|---|---|---|---|
| LGG | IDHmut-Codel: 165, IDH wt: 92, IDHmut-Non-codel: 239 | 16 615 | 8 049 | 434 | 5 |
| SKCM | keratin: 98, immune: 163, MITF-low: 58 | 20 111 | 20 055 | 528 | 5 |
| OV | MES: 84, IMM: 79, DIF: 98, PRO: 100 | 14 850 | 13 915 | 463 | 5 |
| BRCA | Normal: 115, Basal: 131, Her2: 46, LumA: 436, LumB: 147 | 19 227 | 20 106 | 503 | 5 |
| LUSC | Basal: 32, Classical: 41, Secretory: 44, Primitive: 33 | 20 232 | 15 510 | 739 | 5 |
1.3 Chi-square test
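As an illustration only, the sketch below shows one common way to use a chi-square statistic for feature selection: each non-negative feature is scored against the subtype labels, and the k highest-scoring features are kept (the k whose choice is discussed in Section 2.1.2). The function names `chi2_scores` and `select_top_k` and the toy data are hypothetical, and the scoring formulation is an assumption; the procedure actually used in this work may differ.

```python
from collections import defaultdict


def chi2_scores(X, y):
    """Chi-square score of each non-negative feature against class labels.

    Assumed formulation: for every feature, the observed value is the sum of
    that feature within each class, and the expected value is the class
    frequency times the feature's total over all samples.
    """
    n_samples = len(X)
    n_features = len(X[0])
    classes = sorted(set(y))
    class_prob = {c: sum(1 for label in y if label == c) / n_samples for c in classes}

    scores = []
    for j in range(n_features):
        total = sum(row[j] for row in X)
        observed = defaultdict(float)
        for row, label in zip(X, y):
            observed[label] += row[j]
        score = 0.0
        for c in classes:
            expected = class_prob[c] * total
            if expected > 0:  # skip classes with zero expectation
                score += (observed[c] - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(X, y, k):
    """Return the indices of the k best-scoring features and the reduced data."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


# Toy data (hypothetical): 4 samples, 3 features, 2 classes.
X_toy = [[5, 1, 0], [4, 1, 1], [0, 1, 5], [1, 1, 4]]
y_toy = ["A", "A", "B", "B"]
toy_scores = chi2_scores(X_toy, y_toy)
kept, X_reduced = select_top_k(X_toy, y_toy, k=2)
```

In this toy case the second feature is identical across samples, so it receives a score of zero and is the one dropped when k = 2.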
1.4 Gene attention network
1.5 Embedding network
1.6 Dense deep neural network
1.7 Performance metrics
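The result tables report three metrics: ACC, F1_weighted, and F1_macro. The sketch below shows how these are conventionally computed for multi-class predictions, assuming the standard definitions: accuracy is the fraction of correct predictions, F1_macro is the unweighted mean of the per-class F1 scores, and F1_weighted averages the per-class F1 scores by the number of true samples in each class. The function name `classification_metrics` and the toy labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, weighted F1, macro F1) for multi-class labels."""
    labels = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    pairs = list(zip(y_true, y_pred))

    acc = sum(t == p for t, p in pairs) / n

    f1_per_class = {}
    support = {}
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        f1_per_class[c] = 2 * tp / denom if denom else 0.0
        support[c] = tp + fn  # number of true samples of class c

    f1_macro = sum(f1_per_class.values()) / len(labels)
    f1_weighted = sum(f1_per_class[c] * support[c] for c in labels) / n
    return acc, f1_weighted, f1_macro


# Toy labels (hypothetical) with an imbalanced three-class layout.
y_true_toy = ["a", "a", "a", "b", "b", "c"]
y_pred_toy = ["a", "a", "b", "b", "b", "c"]
acc_toy, f1w_toy, f1m_toy = classification_metrics(y_true_toy, y_pred_toy)
```

Because F1_macro gives every class equal weight, it is more sensitive than F1_weighted to errors on small classes, which matters for imbalanced subtype distributions such as those in Tab.2.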
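1.8 Model training
2 Experimental results and discussion
2.1 Model parameter selection
2.1.1 Selection of the number of dense layers
2.1.2 Selection of k in the chi-square test
2.2 Performance comparison of single-omics and multi-omics data in three-class classification
2.3 Ablation experiments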
Tab.3 Subtype classification results of different omics data combinations after integrating different modules
| Setting | Metric | miRNA | mRNA | Meth | miRNA+mRNA | miRNA+Meth | mRNA+Meth | miRNA+mRNA+Meth |
|---|---|---|---|---|---|---|---|---|
| DNN (no_Chi) | ACC | 0.854 | 0.876 | 0.892 | 0.899 | 0.895 | 0.905 | 0.914 |
| | F1_weighted | 0.853 | 0.876 | 0.890 | 0.899 | 0.894 | 0.905 | 0.913 |
| | F1_macro | 0.847 | 0.857 | 0.884 | 0.887 | 0.873 | 0.902 | 0.908 |
| DNN | ACC | 0.854 | 0.898 | 0.902 | 0.905 | 0.908 | 0.913 | 0.924 |
| | F1_weighted | 0.853 | 0.898 | 0.900 | 0.905 | 0.906 | 0.913 | 0.924 |
| | F1_macro | 0.847 | 0.897 | 0.902 | 0.902 | 0.908 | 0.912 | 0.919 |
| DNN + gene attention | ACC | 0.923 | 0.938 | 0.945 | 0.947 | 0.950 | 0.955 | 0.966 |
| | F1_weighted | 0.915 | 0.938 | 0.948 | 0.947 | 0.932 | 0.947 | 0.963 |
| | F1_macro | 0.927 | 0.937 | 0.927 | 0.938 | 0.916 | 0.943 | 0.969 |
| DNN + dense block | ACC | 0.893 | 0.908 | 0.915 | 0.917 | 0.920 | 0.925 | 0.936 |
| | F1_weighted | 0.885 | 0.908 | 0.918 | 0.917 | 0.902 | 0.917 | 0.933 |
| | F1_macro | 0.897 | 0.907 | 0.897 | 0.908 | 0.886 | 0.913 | 0.941 |
| DNN + clinical data | ACC | 0.903 | 0.918 | 0.925 | 0.927 | 0.930 | 0.935 | 0.946 |
| | F1_weighted | 0.895 | 0.918 | 0.928 | 0.927 | 0.912 | 0.927 | 0.943 |
| | F1_macro | 0.907 | 0.917 | 0.907 | 0.918 | 0.896 | 0.923 | 0.949 |
| DNN + gene attention + dense block | ACC | 0.940 | 0.951 | 0.961 | 0.963 | 0.966 | 0.970 | 0.975 |
| | F1_weighted | 0.940 | 0.949 | 0.958 | 0.963 | 0.965 | 0.967 | 0.975 |
| | F1_macro | 0.938 | 0.957 | 0.963 | 0.958 | 0.961 | 0.963 | 0.961 |
| DNN + gene attention + dense block | ACC | 0.933 | 0.948 | 0.955 | 0.957 | 0.960 | 0.965 | 0.972 |
| | F1_weighted | 0.925 | 0.948 | 0.958 | 0.957 | 0.942 | 0.957 | 0.971 |
| | F1_macro | 0.937 | 0.947 | 0.937 | 0.948 | 0.926 | 0.953 | 0.968 |
| DNN + dense block + clinical data | ACC | 0.903 | 0.920 | 0.918 | 0.925 | 0.927 | 0.930 | 0.931 |
| | F1_weighted | 0.895 | 0.920 | 0.918 | 0.928 | 0.909 | 0.922 | 0.928 |
| | F1_macro | 0.907 | 0.911 | 0.917 | 0.907 | 0.893 | 0.918 | 0.919 |
| DNN + gene attention + dense block + clinical data | ACC | 0.943 | 0.958 | 0.965 | 0.968 | 0.971 | 0.977 | 0.982 |
| | F1_weighted | 0.935 | 0.958 | 0.968 | 0.968 | 0.953 | 0.969 | 0.982 |
| | F1_macro | 0.947 | 0.957 | 0.947 | 0.959 | 0.937 | 0.965 | 0.978 |
2.4 Comparative experiments
Tab.4 Classification performance results of different classification methods in the LGG dataset
| Method | ACC | F1_weighted | F1_macro |
|---|---|---|---|
| SVM | 0.767 | 0.770 | 0.742 |
| KNN | 0.751 | 0.751 | 0.766 |
| DeepMO | 0.821 | 0.821 | 0.813 |
| P-NET | 0.922 | 0.922 | 0.913 |
| MOMA | 0.939 | 0.939 | 0.923 |
| MOGONET | 0.943 | 0.947 | 0.928 |
| MODILM | 0.975 | 0.960 | 0.945 |
| MODAA | 0.982 | 0.982 | 0.978 |
2.5 Validating the generalization performance of MODDA classification on external datasets
Tab.5 Classification results of different cancer datasets
| Dataset | Metric | miRNA | mRNA | Meth | miRNA+mRNA | miRNA+Meth | mRNA+Meth | miRNA+mRNA+Meth |
|---|---|---|---|---|---|---|---|---|
| SKCM | ACC | 0.889 | 0.905 | 0.913 | 0.919 | 0.928 | 0.939 | 0.942 |
| | F1_weighted | 0.897 | 0.891 | 0.913 | 0.920 | 0.927 | 0.932 | 0.939 |
| | F1_macro | 0.879 | 0.886 | 0.912 | 0.902 | 0.925 | 0.926 | 0.928 |
| OV | ACC | 0.864 | 0.878 | 0.892 | 0.903 | 0.917 | 0.923 | 0.933 |
| | F1_weighted | 0.863 | 0.878 | 0.892 | 0.902 | 0.917 | 0.923 | 0.930 |
| | F1_macro | 0.857 | 0.877 | 0.890 | 0.902 | 0.910 | 0.912 | 0.925 |
| BRCA | ACC | 0.823 | 0.838 | 0.845 | 0.847 | 0.850 | 0.855 | 0.884 |
| | F1_weighted | 0.823 | 0.838 | 0.845 | 0.847 | 0.852 | 0.855 | 0.879 |
| | F1_macro | 0.817 | 0.837 | 0.827 | 0.838 | 0.846 | 0.848 | 0.843 |
| LUSC | ACC | 0.873 | 0.888 | 0.895 | 0.897 | 0.900 | 0.907 | 0.915 |
| | F1_weighted | 0.865 | 0.888 | 0.898 | 0.897 | 0.900 | 0.907 | 0.915 |
| | F1_macro | 0.867 | 0.887 | 0.887 | 0.888 | 0.896 | 0.903 | 0.913 |
Tab.6 Comparison of classification results of different methods on different cancer datasets
| Method | SKCM ACC | SKCM F1_weighted | SKCM F1_macro | OV ACC | OV F1_weighted | OV F1_macro | BRCA ACC | BRCA F1_weighted | BRCA F1_macro | LUSC ACC | LUSC F1_weighted | LUSC F1_macro |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SVM | 0.813 | 0.812 | 0.805 | 0.831 | 0.831 | 0.827 | 0.729 | 0.702 | 0.640 | 0.774 | 0.771 | 0.754 |
| KNN | 0.852 | 0.852 | 0.835 | 0.853 | 0.853 | 0.853 | 0.782 | 0.782 | 0.780 | 0.776 | 0.776 | 0.758 |
| DeepMO | 0.879 | 0.873 | 0.859 | 0.897 | 0.895 | 0.887 | 0.789 | 0.787 | 0.775 | 0.785 | 0.785 | 0.782 |
| P-NET | 0.905 | 0.903 | 0.901 | 0.901 | 0.901 | 0.899 | 0.816 | 0.814 | 0.809 | 0.840 | 0.838 | 0.834 |
| MOMA | 0.908 | 0.905 | 0.908 | 0.912 | 0.911 | 0.908 | 0.829 | 0.829 | 0.824 | 0.851 | 0.851 | 0.847 |
| MOGONET | 0.919 | 0.919 | 0.917 | 0.920 | 0.919 | 0.917 | 0.836 | 0.832 | 0.815 | 0.853 | 0.853 | 0.842 |
| MODILM | 0.928 | 0.927 | 0.927 | 0.930 | 0.930 | 0.928 | 0.845 | 0.840 | 0.804 | 0.865 | 0.855 | 0.833 |
| MODAA | 0.942 | 0.939 | 0.928 | 0.933 | 0.930 | 0.925 | 0.884 | 0.879 | 0.843 | 0.915 | 0.915 | 0.913 |