Journal of Shandong University (Health Sciences), 2025, Vol. 63, Issue (8): 51-60. doi: 10.6040/j.issn.1671-7554.0.2025.0510

• Clinical Research •

Cancer subtype clustering via multimodal decoupled contrastive learning

ZHANG Runze1, XUE Fuzhong1,2,3, YANG Fan1,2,3   

  1. Department of Medical Dataology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, Shandong, China;
  2. National Institute of Health and Medical Big Data, Jinan 250003, Shandong, China;
  3. Qilu Hospital of Shandong University, Jinan 250012, Shandong, China
  • Published: 2025-08-25

Abstract: Objective To propose a cancer subtype clustering model that integrates graph convolutional networks, self-attention mechanisms, and decoupled contrastive learning, based on multi-omics data from five cancer types in The Cancer Genome Atlas (TCGA).
Methods The model took four types of omics data from five cancer types in the TCGA database as input. For each omics type, it constructed a sample-wise relational graph and used a graph convolutional network (GCN) to extract intra-omics structural information, thereby better preserving feature differences between samples. The features from the different omics types were concatenated and further fused through an attention mechanism, which automatically learned the relative importance of, and complementary relationships among, the omics modalities. Finally, a decoupled contrastive learning strategy was applied: different augmented views of the same sample were used for unsupervised training, guiding the model to identify potential cancer subtypes without ground-truth labels.
Results The model showed good clustering performance across the five cancer datasets, effectively dividing samples into distinct subtypes. In survival analysis, the survival curves of different subtypes showed significant separation, indicating that the identified subtypes were associated with different prognoses. Some subtypes also differed markedly in clinical characteristics. Compared with several existing methods, the proposed model achieved favorable results on multiple evaluation metrics, yielded more stable clustering outcomes, and showed stronger biological interpretability.
Conclusion This study proposes a cancer subtype clustering model that effectively integrates multi-omics data through the synergistic use of a GCN, self-attention, and contrastive learning. The model significantly improves the accuracy and clinical interpretability of cancer subtype clustering, offers a new perspective for research on cancer heterogeneity, and contributes to the development of personalized treatment strategies in precision medicine.
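The per-omics graph step described in Methods can be illustrated with a small NumPy sketch. The abstract does not say how the sample-wise relational graph is built, so this assumes a symmetric k-nearest-neighbor graph on cosine similarity followed by one standard GCN propagation step, ReLU(D^-1/2 (A + I) D^-1/2 H W). The function names `knn_cosine_adjacency` and `gcn_layer`, the neighborhood size `k`, and the random weights are hypothetical and are not taken from the paper.

```python
import numpy as np


def knn_cosine_adjacency(x: np.ndarray, k: int) -> np.ndarray:
    """Symmetric k-nearest-neighbor graph over samples for one omics matrix.

    x: (n_samples, n_features). Assumed construction, not taken from the paper.
    """
    xn = x / np.clip(np.linalg.norm(x, axis=1, keepdims=True), 1e-12, None)
    sim = xn @ xn.T                                # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)                 # a sample is not its own neighbor
    nbrs = np.argsort(-sim, axis=1)[:, :k]         # k most similar samples per row
    adj = np.zeros(sim.shape)
    adj[np.repeat(np.arange(x.shape[0]), k), nbrs.ravel()] = 1.0
    return np.maximum(adj, adj.T)                  # make the graph undirected


def gcn_layer(adj: np.ndarray, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W) with self-loops added."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(a_norm @ h @ w, 0.0)


# Usage on random data standing in for one omics matrix (20 samples, 50 features).
rng = np.random.default_rng(0)
x = rng.normal(size=(20, 50))
adj = knn_cosine_adjacency(x, k=5)
w = rng.normal(size=(50, 16)) * 0.1                # hypothetical weights; learned in practice
h = gcn_layer(adj, x, w)
print(h.shape)  # (20, 16)
```

In a trained model the weight matrix would be learned and several such layers might be stacked; the random weights here only demonstrate the shapes and the symmetric normalization.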
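The attention-based fusion of omics features can be sketched in the same spirit. Here each sample's M omics embeddings are treated as M tokens, scaled dot-product self-attention is applied across them, and the attended outputs are concatenated; the row-normalized attention matrix is one way to read off how much each omics type draws on the others. This is an assumed reading of the abstract, not the authors' architecture, and `omics_self_attention` and the projection matrices `wq`, `wk`, `wv` are hypothetical.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)        # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def omics_self_attention(embs, wq, wk, wv):
    """Scaled dot-product self-attention across omics types, per sample.

    embs: list of M arrays of shape (n_samples, d), one per omics type.
    Returns the concatenated attended features (n_samples, M * d_v) and the
    attention weights (n_samples, M, M); row j of a sample's weights says how
    much omics j draws on each omics type.
    """
    tokens = np.stack(embs, axis=1)                # (n, M, d)
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])   # (n, M, M)
    attn = softmax(scores, axis=-1)
    out = attn @ v                                 # (n, M, d_v)
    return out.reshape(out.shape[0], -1), attn


rng = np.random.default_rng(0)
n, d, d_k = 8, 16, 8
embs = [rng.normal(size=(n, d)) for _ in range(4)]   # e.g. four omics embeddings
wq, wk, wv = (rng.normal(size=(d, d_k)) * 0.1 for _ in range(3))
fused, attn = omics_self_attention(embs, wq, wk, wv)
print(fused.shape, attn.shape)  # (8, 32) (8, 4, 4)
```

Flattening the attended tokens plays the role of the concatenation step; in a trained model the projections would be learned jointly with the per-omics encoders.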
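The decoupled contrastive objective can be sketched as follows, assuming the common decoupled variant of the NT-Xent (InfoNCE) loss in which the positive pair is removed from the denominator, and assuming random feature masking plus Gaussian noise as the augmentation. The abstract gives neither the exact loss nor the augmentations, so `augment`, `decoupled_contrastive_loss`, `tau`, `noise_std`, and `drop_prob` are illustrative assumptions rather than the paper's settings.

```python
import numpy as np


def augment(x, rng, noise_std=0.1, drop_prob=0.2):
    """Hypothetical augmentation: random feature masking plus Gaussian noise."""
    keep = rng.random(x.shape) >= drop_prob
    return x * keep + rng.normal(scale=noise_std, size=x.shape)


def l2_normalize(z):
    return z / np.clip(np.linalg.norm(z, axis=1, keepdims=True), 1e-12, None)


def decoupled_contrastive_loss(z1, z2, tau=0.5):
    """Decoupled NT-Xent loss for two views z1, z2 of shape (n, d), n >= 2.

    Row i of z1 and row i of z2 form the positive pair; every other row in
    either view is a negative. Unlike NT-Xent, the positive pair is excluded
    from the log-sum-exp denominator.
    """
    z = np.concatenate([l2_normalize(z1), l2_normalize(z2)], axis=0)  # (2n, d)
    n2 = z.shape[0]
    n = n2 // 2
    sim = z @ z.T / tau                            # cosine similarity / temperature
    rows = np.arange(n2)
    partner = (rows + n) % n2                      # index of each row's positive
    pos = sim[rows, partner]
    mask = np.ones((n2, n2), dtype=bool)
    mask[rows, rows] = False                       # drop self-similarity
    mask[rows, partner] = False                    # decoupling: drop the positive too
    neg = np.where(mask, sim, -np.inf)
    m = neg.max(axis=1)                            # stabilize the log-sum-exp
    lse = m + np.log(np.exp(neg - m[:, None]).sum(axis=1))
    return float(np.mean(lse - pos))


# Usage: two augmented views of the same (here random) features.
rng = np.random.default_rng(0)
x = rng.normal(size=(32, 16))
loss = decoupled_contrastive_loss(augment(x, rng), augment(x, rng))
print(np.isfinite(loss))  # True
```

In the full model the two views would pass through the encoder before the loss is computed; applying the loss directly to augmented inputs here only keeps the sketch self-contained.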

Key words: Cancer subtype clustering, Multi-omics, Graph convolutional network, Self-attention mechanism, Decoupled contrastive learning

CLC Number: R730.43