
Journal of Shandong University (Health Sciences) ›› 2025, Vol. 63 ›› Issue (8): 1-16. doi: 10.6040/j.issn.1671-7554.0.2025.0568

• Big data-empowered, AI large model-driven multimodal cohort design and analysis: Expert Review •

Theoretical and methodological framework for multimodal big data cohort design based on AI language representation

XUE Fuzhong1,2,3

  1. Department of Medical Dataology, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, Shandong, China;
  2. National Institute of Health and Medical Big Data, Jinan 250003, Shandong, China;
  3. Qilu Hospital of Shandong University, Jinan 250012, Shandong, China
  • Published: 2025-08-25
  • Corresponding author: XUE Fuzhong. E-mail: xuefzh@sdu.edu.cn
  • Funding:
    Key Program of the National Natural Science Foundation of China (82330108); General Program of the National Natural Science Foundation of China (82173625); Weifang Demonstration Project for Public Hospital Reform and High-Quality Development Supported by Central Government Finance (ZFCG-2024-0000505); Major Science and Technology Special Project of Henan Province (241100310300)

Abstract: This paper proposes a theoretical and methodological framework for multimodal cohort design based on artificial intelligence (AI) language representation, moving beyond the theoretical framework of traditional epidemiological cohort design toward a new paradigm of multimodal cohorts built on AI language modeling. The framework integrates multi-source heterogeneous data (health records, electronic medical records, medical imaging, and genomic data) and uses Transformer-based and related AI models to embed them in a low-dimensional space, quantifying them uniformly as multimodal embedding vectors. Centered on a three-layer architecture of "Digital Omics-Digital Biomarkers-Digital Phenotypes", it proposes key methods including multimodal fusion, embedding vector generation, and causal inference. The paper proposes that digital biomarkers should satisfy the PICLS criteria: predictable, interpretable, computable, latent-variable, and stable. Digital phenotypes must additionally satisfy an endpoints criterion, giving the PICLSE criteria, which are intended to ensure the clinical utility of the multimodal cohort in disease prediction and intervention. On the technical side, the paper details the workflows for embedding generation, data encoding/decoding, database construction, and biomarker extraction. A case study on active surveillance of scarlet fever illustrates the application of the multimodal embedding cohort to clinical screening and intelligent early warning. The framework offers a new paradigm for epidemiological cohort research and methodological support for advancing precision medicine and intelligent public health.

Key words: AI language representation, Multimodal cohort, Digital omics, Digital biomarkers, Digital phenotypes, PICLS/PICLSE criteria
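
The relationship between the two criteria sets named in the abstract can be pictured as a checklist in which a digital phenotype inherits the five PICLS items and adds the endpoints item. The Python sketch below is illustrative only and is not taken from the paper: the class names, the field names, and the use of plain yes/no flags are assumptions, and the abstract does not say how each criterion is actually assessed.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class PICLS:
    """Checklist for a candidate digital biomarker (hypothetical boolean flags)."""
    predictable: bool
    interpretable: bool
    computable: bool
    latent_variable: bool
    stable: bool

    def unmet(self) -> List[str]:
        # Names of the criteria that are not satisfied, in declaration order.
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def satisfied(self) -> bool:
        # True only when every criterion on the checklist is met.
        return not self.unmet()


@dataclass(frozen=True)
class PICLSE(PICLS):
    """Checklist for a digital phenotype: PICLS plus the endpoints criterion."""
    endpoint: bool = False


if __name__ == "__main__":
    marker = PICLS(True, True, True, True, stable=False)
    print(marker.satisfied(), marker.unmet())
    phenotype = PICLSE(True, True, True, True, True, endpoint=True)
    print(phenotype.satisfied(), phenotype.unmet())
```

Modeling PICLSE as a subclass mirrors the abstract's wording that digital phenotypes meet the biomarker criteria "on this basis" plus one more; a real assessment would presumably replace each flag with a quantitative test.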

CLC Number: 

  • R181.2+3