Journal of Shandong University (Health Sciences), 2025, Vol. 63, Issue (8): 79-85. doi: 10.6040/j.issn.1671-7554.0.2025.0118

• Clinical Research •

Multi-cancer risk prediction model based on multi-modal data fusion

LI Qian1, YANG Fan1,2,3, XUE Fuzhong1,2,3

  1. Department of Medical Dataology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, Shandong, China;
  2. National Institute of Health and Medical Big Data, Jinan 250003, Shandong, China;
  3. Qilu Hospital of Shandong University, Jinan 250012, Shandong, China
  • Published: 2025-08-25
  • Corresponding authors: YANG Fan (E-mail: fanyang@sdu.edu.cn); XUE Fuzhong (E-mail: xuefzh@sdu.edu.cn)
  • Funding: National Natural Science Foundation of China (82273736)

Abstract: Objective To develop a multi-cancer risk prediction model for 15 common cancers using UK Biobank data and a multi-modal data fusion approach, to explore how genomic and clinical data can be applied in cancer risk prediction, with the goal of improving the accuracy of early cancer prediction and providing data support for personalized medicine. Methods Rigorous quality control was first performed on the data. High-dimensional genomic data were then transformed into image representations and processed with convolutional neural networks (CNNs), while clinical data were modeled with a multi-layer perceptron (MLP). An attention mechanism was incorporated to perform weighted fusion of the features from the genomic and clinical modalities in order to optimize predictive performance. Results Fusing genomic and clinical data in the multi-modal model significantly improved cancer prediction accuracy. Image features extracted by the CNN from genomic data and clinical features extracted by the MLP effectively strengthened the predictive capability of the model, improving both the accuracy and the robustness of its predictions. Conclusion This study proposes a multi-cancer risk prediction framework that fuses genomic and clinical data and verifies the effectiveness of multi-modal deep learning for early cancer prediction. Combining CNNs, MLPs, and an attention mechanism significantly improved prediction accuracy, providing early support for future cancer diagnosis and personalized treatment and demonstrating the potential of multi-modal approaches in precision oncology.

Key words: Multi-cancer risk prediction, Multi-modal data, Genomics, Clinical data, Convolutional neural networks, Multi-layer perceptron

CLC number:
  • TP391
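The abstract states that high-dimensional genomic data were converted to an image format before being passed to a CNN, but it does not say how SNPs were arranged on the grid. As a minimal sketch, assuming genotypes coded as additive allele dosages (0, 1, 2) and a simple row-major layout, the hypothetical helper `genotypes_to_image` below scales dosages to pixel intensities in [0, 1] and pads the vector to the smallest square grid that can hold it:

```python
import math


def genotypes_to_image(genotypes, max_dosage=2, pad_value=0.0):
    """Lay a 1-D genotype vector out on a square 2-D grid, row by row.

    Hypothetical helper: genotypes are assumed to be additive dosages
    (0, 1 or 2 copies of the alternate allele). Dosages are divided by
    max_dosage so pixel values fall in [0, 1]; cells after the last SNP
    are filled with pad_value so the grid is square.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("need at least one genotype")
    side = math.isqrt(n - 1) + 1  # smallest side with side * side >= n
    cells = [g / max_dosage for g in genotypes]
    cells += [pad_value] * (side * side - n)
    # Slice the flat list into `side` rows of length `side`.
    return [cells[r * side:(r + 1) * side] for r in range(side)]


# Usage: 10 SNPs give a 4 x 4 grid (10 scaled dosages followed by 6 padding cells).
image = genotypes_to_image([0, 1, 2, 2, 1, 0, 0, 1, 2, 1])
for row in image:
    print(row)
```

Row-major placement ignores relationships between SNPs. Feature-placement methods such as DeepInsight instead position features on the grid using a 2-D embedding (for example t-SNE), so that related features land near each other and convolutions can exploit local structure.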
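The Methods describe two modality-specific encoders: a CNN over the genotype image and an MLP over the clinical variables. To illustrate what each branch computes, the sketch below uses plain Python with hand-set weights: one convolutional layer with ReLU and global average pooling turns an image into one feature per kernel, and one fully connected ReLU layer (a single MLP layer) encodes a clinical vector. The function names, layer counts, and weights are hypothetical and are not the authors' architecture; a trained model would stack more layers and learn its weights by backpropagation.

```python
def relu(x):
    return max(0.0, x)


def conv2d_valid(image, kernel):
    """2-D cross-correlation (what deep-learning 'convolution' layers compute),
    stride 1, no padding, so the output shrinks by kernel size - 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def cnn_features(image, kernels):
    """Genomic branch: conv layer + ReLU + global average pooling, one feature per kernel."""
    feats = []
    for kernel in kernels:
        fmap = conv2d_valid(image, kernel)
        acts = [relu(v) for row in fmap for v in row]
        feats.append(sum(acts) / len(acts))
    return feats


def mlp_features(x, weights, biases):
    """Clinical branch: one fully connected layer + ReLU; weights[j] is unit j's weight row."""
    return [relu(sum(w * xi for w, xi in zip(row, x)) + b) for row, b in zip(weights, biases)]


# Usage with made-up inputs and weights (all hypothetical).
genotype_image = [[0.0, 0.5, 1.0], [1.0, 0.5, 0.0], [0.5, 0.5, 0.5]]
edge_kernels = [[[1.0, -1.0], [1.0, -1.0]], [[-1.0, 1.0], [-1.0, 1.0]]]
clinical = [0.2, -1.3, 0.8]  # e.g. standardized age, BMI, smoking score
print(cnn_features(genotype_image, edge_kernels))
print(mlp_features(clinical, [[0.4, -0.2, 0.1], [0.3, 0.5, -0.6]], [0.0, 0.1]))
```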
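The abstract reports attention-based weighted fusion of genomic and clinical features but not the exact scoring function. One common pattern, sketched here under that assumption, scores each modality embedding against a learned query vector (a simplified tanh-based additive score), normalizes the scores with a softmax, and takes the weighted sum as the fused representation, which would then be passed to a prediction layer. `attention_fuse` and its `query` argument are hypothetical names, and the embeddings are assumed to be already projected to a shared dimension d.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(modal_features, query):
    """Fuse equal-length modality embeddings with softmax attention weights.

    modal_features: one vector per modality (e.g. genomic, clinical), length d.
    query: scoring vector of length d (would be learned; fixed here).
    Returns (fused_vector, attention_weights).
    """
    d = len(query)
    if any(len(h) != d for h in modal_features):
        raise ValueError("all modality embeddings must have length d")
    # Score each modality: query . tanh(h), a simplified additive-attention score.
    scores = [sum(q * math.tanh(x) for q, x in zip(query, h)) for h in modal_features]
    alphas = softmax(scores)
    # Weighted sum of the modality embeddings, one output dimension at a time.
    fused = [sum(a * h[i] for a, h in zip(alphas, modal_features)) for i in range(d)]
    return fused, alphas


# Usage with made-up embeddings (hypothetical values).
genomic_emb = [0.8, -0.1, 0.3]
clinical_emb = [0.2, 0.9, -0.4]
query = [0.5, 1.0, -0.2]
fused, alphas = attention_fuse([genomic_emb, clinical_emb], query)
print("attention weights:", alphas)
print("fused embedding:", fused)
```

Because the weights are computed per participant, such a model can lean on clinical features for some individuals and on genomic features for others; the returned `alphas` give a rough indication of which modality dominated a given fused representation.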