Journal of Shandong University (Health Sciences), 2025, Vol. 63, Issue (8): 79-85. doi: 10.6040/j.issn.1671-7554.0.2025.0118

• Clinical Research •

Multi-cancer risk prediction model based on multi-modal data fusion

LI Qian1, YANG Fan1,2,3, XUE Fuzhong1,2,3

  1. Department of Medical Dataology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, Shandong, China;
  2. National Institute of Health and Medical Big Data, Jinan 250003, Shandong, China;
  3. Qilu Hospital of Shandong University, Jinan 250012, Shandong, China
  • Published: 2025-08-25
  • Corresponding authors: YANG Fan. E-mail: fanyang@sdu.edu.cn; XUE Fuzhong. E-mail: xuefzh@sdu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (82273736)

Abstract: Objective To develop a multi-cancer risk prediction model from UK Biobank data on 15 common cancers using multi-modal data fusion, to explore the use of genomic and clinical data in cancer risk prediction, and thereby to improve the accuracy of early cancer prediction and provide data support for personalized medicine. Methods Rigorous quality control was first performed on the data. High-dimensional genomic data were then transformed into image representations and modeled with a convolutional neural network, while clinical data were modeled with a multi-layer perceptron. An attention mechanism was introduced to fuse the features of the two modalities by weighting, with the aim of optimizing predictive performance. Results Fusing genomic and clinical data in the multi-modal model significantly improved cancer prediction accuracy. The image features extracted by the convolutional neural network and the clinical features extracted by the multi-layer perceptron strengthened the predictive capability of the model and improved both the accuracy and the robustness of its predictions. Conclusion This study proposes a multi-cancer risk prediction method that fuses genomic and clinical data and verifies the effectiveness of multi-modal deep learning for early cancer prediction. Combining convolutional neural networks, a multi-layer perceptron, and an attention mechanism significantly improves cancer prediction accuracy and provides early support for cancer diagnosis and personalized treatment, demonstrating the potential of multi-modal approaches in precision oncology.

Key words: Multi-cancer risk prediction, Multi-modal data, Genomics, Clinical data, Convolutional neural networks, Multi-layer perceptron
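
The attention-weighted fusion step described in the Methods can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the scoring function, the feature dimensions, or how the weights are trained. The sketch assumes scaled dot-product scoring against a single hypothetical query vector (`query`), with short hand-written lists standing in for the CNN output on the genomic image and the MLP output on the clinical data.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(
    features: Sequence[Sequence[float]], query: Sequence[float]
) -> Tuple[List[float], List[float]]:
    """Fuse one feature vector per modality into a single vector.

    Assumption: each modality is scored by a scaled dot product with a
    hypothetical query vector; in a trained model this vector would be learned.
    """
    dim = len(query)
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must match the query length")
    # One scalar relevance score per modality.
    scores = [
        sum(q * x for q, x in zip(query, f)) / math.sqrt(dim) for f in features
    ]
    # Normalise scores into weights that sum to 1.
    weights = softmax(scores)
    # Weighted sum of the modality vectors, component by component.
    fused = [
        sum(w * f[i] for w, f in zip(weights, features)) for i in range(dim)
    ]
    return fused, weights


# Toy stand-ins (assumed values, not data from the study).
genomic_features = [0.2, 0.8, 0.1, 0.5]   # placeholder for CNN image features
clinical_features = [0.6, 0.1, 0.4, 0.3]  # placeholder for MLP clinical features
query = [1.0, 0.5, -0.5, 0.0]             # hypothetical scoring vector

fused, weights = attention_fuse([genomic_features, clinical_features], query)
print("modality weights:", weights)
print("fused features:", fused)
```

The fused vector would then feed a classification layer. Because the weights come from a softmax, each fused component lies between the corresponding genomic and clinical values, which makes the contribution of each modality easy to inspect.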

CLC Number:

  • TP391