Journal of Shandong University (Health Sciences) ›› 2025, Vol. 63 ›› Issue (8): 79-85. doi: 10.6040/j.issn.1671-7554.0.2025.0118

• Clinical Research •

Multi-cancer risk prediction model based on multi-modal data fusion

LI Qian1, YANG Fan1,2,3, XUE Fuzhong1,2,3   

  1. Department of Medical Dataology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, Shandong, China;
    2. National Institute of Health and Medical Big Data, Jinan 250003, Shandong, China;
    3. Qilu Hospital of Shandong University, Jinan 250012, Shandong, China
  • Published: 2025-08-25

Abstract: Objective To develop a multi-cancer risk prediction model for 15 common cancers using UK Biobank data and a multi-modal data fusion approach, in order to explore the use of genomic and clinical data in cancer risk prediction, improve the accuracy of early cancer detection, and inform personalized medicine. Methods Rigorous quality control was performed on the data. High-dimensional genomic data were then transformed into image representations and processed with convolutional neural networks, while clinical data were modeled with a multi-layer perceptron. An attention mechanism performed weighted fusion of the features from the genomic and clinical modalities to optimize predictive performance. Results Integrating genomic and clinical data in the multi-modal fusion model significantly improved cancer prediction accuracy. Features extracted by the convolutional neural networks from genomic data and by the multi-layer perceptron from clinical data strengthened the model's predictive capability, improving both the accuracy and the robustness of its predictions. Conclusion This study introduces a novel multi-cancer risk prediction framework that integrates genomic and clinical data. Multi-modal deep learning techniques, including convolutional neural networks, multi-layer perceptrons, and attention mechanisms, significantly improve the accuracy of early cancer prediction. The findings provide robust support for early cancer diagnosis and personalized treatment strategies and demonstrate the potential of multi-modal approaches in precision oncology.
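
The attention-based weighted fusion described in the Methods can be illustrated with a minimal sketch. The abstract does not specify how the attention weights are computed, so everything below is an assumption made for illustration: each modality's feature vector is scored by a dot product with a query vector (which would be learned in a real model), the scores are normalized with softmax, and the fused vector is the weighted sum. The function names, the query vector, and the toy feature values are hypothetical and are not taken from the study.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(genomic_feat, clinical_feat, query):
    """Fuse two equal-length feature vectors with attention weights.

    Assumption: each modality receives one scalar score, the dot product
    of its feature vector with `query`. Softmax converts the two scores
    into weights that sum to 1, and the fused vector is the weighted sum.
    """
    feats = [genomic_feat, clinical_feat]
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in feats]
    weights = softmax(scores)
    fused = [
        sum(w * feat[i] for w, feat in zip(weights, feats))
        for i in range(len(query))
    ]
    return fused, weights


# Hypothetical toy inputs standing in for CNN (genomic) and MLP (clinical) outputs.
genomic_feat = [0.2, 0.8, 0.1]
clinical_feat = [0.5, 0.1, 0.4]
query = [1.0, 0.0, 0.0]  # stands in for a learned parameter

fused, weights = attention_fuse(genomic_feat, clinical_feat, query)
print("weights:", weights)
print("fused:", fused)
```

In this toy case the clinical vector scores higher against the query, so it receives the larger weight. In a trained model the query, and therefore the balance between modalities, would be learned from the data.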

Key words: Multi-cancer risk prediction, Multi-modal data, Genomics, Clinical data, Convolutional neural networks, Multi-layer perceptron

CLC Number: TP391