Journal of Shandong University (Health Sciences), 2025, Vol. 63, Issue 8: 51-60. doi: 10.6040/j.issn.1671-7554.0.2025.0510
• Clinical Research •
ZHANG Runze1, XUE Fuzhong1,2,3, YANG Fan1,2,3
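Abstract: Objective To propose a cancer subtype clustering model that combines graph convolutional networks, a self-attention mechanism, and decoupled contrastive learning, using multi-omics data for five cancer types from The Cancer Genome Atlas (TCGA). Methods The model takes four types of omics data for five cancers from the TCGA database as input. For each omics type, a network of relationships between samples is constructed, and a graph convolutional network extracts the structural information within that omics type so that feature differences between samples are better preserved. The features from the different omics types are then concatenated and fused by attention-based weighting, which automatically learns the importance of each omics type and the complementary relationships among them. Finally, decoupled contrastive learning is applied: the model is trained without supervision on different views produced by sample augmentation, which guides it to identify latent cancer subtypes in the absence of true labels. Results The model clustered well on all five cancer datasets and partitioned the samples into distinct subtypes. In survival analysis, the survival curves of the subtypes were significantly separated, indicating that the identified subtypes differ in prognosis. Some subtypes also showed strong discrimination on clinical features. Compared with several existing methods, the model achieved good results on multiple evaluation metrics, produced more stable clustering, and offered stronger biological interpretability. Conclusion Through the combined action of graph convolutional networks, self-attention, and contrastive learning, the proposed model integrates multi-omics data effectively and markedly improves the accuracy and clinical interpretability of cancer subtype clustering. It offers a new approach to studying cancer heterogeneity and may support the design of individualized treatment strategies in precision medicine.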
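The Methods pipeline can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract gives no architectural details, so every function name, the cosine k-nearest-neighbour graph, the single untrained graph-convolution layer, the simple softmax attention over omics types (standing in for the self-attention module, and replacing concatenation with a weighted sum), the Gaussian-noise augmentation, and the values of `k`, `tau`, and `hidden` are assumptions made for illustration. The loss follows the general decoupled contrastive idea of leaving the positive pair out of the denominator; training and the final clustering step are omitted.

```python
import numpy as np

def knn_cosine_graph(x, k=3):
    """Symmetric kNN adjacency built from cosine similarity between samples (rows)."""
    unit = x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)             # a sample is never its own neighbour
    nbrs = np.argsort(-sim, axis=1)[:, :k]     # k most similar samples per row
    adj = np.zeros_like(sim)
    rows = np.repeat(np.arange(x.shape[0]), k)
    adj[rows, nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)              # symmetrise the graph

def gcn_layer(adj, h, w):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    a_hat = adj + np.eye(adj.shape[0])         # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)

def attention_fuse(embeddings, q):
    """Softmax-weighted sum of per-omics embeddings; q is a scoring vector."""
    stacked = np.stack(embeddings, axis=1)     # (n_samples, n_omics, hidden)
    scores = np.tanh(stacked) @ q              # (n_samples, n_omics)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    fused = (weights[:, :, None] * stacked).sum(axis=1)
    return fused, weights

def decoupled_contrastive_loss(z1, z2, tau=0.5):
    """Contrastive loss whose denominator excludes the positive pair."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    n = z1.shape[0]
    z = np.concatenate([z1, z2])
    sim = z @ z.T / tau
    # Positive pairs: sample i in view 1 with sample i in view 2, both directions.
    pos = np.concatenate([np.diag(sim, n), np.diag(sim, -n)])
    idx = np.arange(2 * n)
    mask = np.ones((2 * n, 2 * n), dtype=bool)
    mask[idx, idx] = False                     # drop self-similarity
    mask[idx, (idx + n) % (2 * n)] = False     # drop the positive (decoupling)
    neg = np.where(mask, np.exp(sim), 0.0).sum(axis=1)
    return float(np.mean(-pos + np.log(neg)))

# Toy data: 12 samples, 4 omics matrices with different feature counts.
rng = np.random.default_rng(0)
n, hidden = 12, 8
omics = [rng.normal(size=(n, p)) for p in (30, 20, 25, 15)]
layer_weights = [rng.normal(size=(x.shape[1], hidden)) * 0.1 for x in omics]
q = rng.normal(size=hidden)

def encode(mats):
    """Per-omics graph + GCN layer, then attention fusion across omics."""
    embs = [gcn_layer(knn_cosine_graph(x), x, w)
            for x, w in zip(mats, layer_weights)]
    return attention_fuse(embs, q)

# Two augmented views of the same samples (noise augmentation is an assumption).
view1 = [x + 0.1 * rng.normal(size=x.shape) for x in omics]
view2 = [x + 0.1 * rng.normal(size=x.shape) for x in omics]
fused1, attn = encode(view1)
fused2, _ = encode(view2)
loss = decoupled_contrastive_loss(fused1, fused2)
print(fused1.shape, attn.shape, loss)
```

In a trained model, the layer weights and the scoring vector would be optimized to minimize the loss, and the fused embeddings would then be clustered into subtypes; here they stay random, so the printed loss only shows that the pieces fit together.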