
Journal of Shandong University (Health Sciences), 2024, Vol. 62, Issue 11: 73-84. doi: 10.6040/j.issn.1671-7554.0.2024.0723

• Public Health and Preventive Medicine •

Development of the Bayesian network-based screening model for ischemic stroke

ZHANG Botao1,2,3, ZHANG Shuaijie1,2,3, SUN Shuangshuang1,2,3, YUAN Ying1,2,3, HU Xifeng1,2,3, JIA Xiaofeng4, YU Yuanyuan2,5, XUE Fuzhong1,2,3

  1. Department of Biostatistics, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250012, Shandong, China;
  2. Healthcare Big Data Research Institute, School of Public Health, Cheeloo College of Medicine, Shandong University, Jinan 250003, Shandong, China;
  3. Qilu Hospital of Shandong University, Jinan 250012, Shandong, China;
  4. Health and Wellness Assurance Center Network Information Office of Boxing County, Binzhou 256500, Shandong, China;
  5. Data Science Institute, Shandong University, Jinan 250100, Shandong, China
  • Published: 2024-11-25
  • Corresponding authors: XUE Fuzhong, E-mail: xuefzh@sdu.edu.cn; YU Yuanyuan, E-mail: yu_yy_1993@163.com
  • Funding:
    Key Program of the National Natural Science Foundation of China (82330108); General Program of the National Natural Science Foundation of China (82173625); Key Research and Development Program of Shandong Province (2021SFGC0504); China Postdoctoral Science Foundation General Grant (2022M721921); Youth Program of the Natural Science Foundation of Shandong Province (ZR2023QH236)

Abstract: Objective To develop an ischemic stroke screening model from large-scale electronic health records, exploiting the ability of Bayesian networks to reason under uncertainty. Methods The derivation cohort was drawn from the Cheeloo Lifespan Electronic Health Research Data-library (Cheeloo LEAD) and split into training and testing sets at a 7∶3 ratio. The external validation cohort came from the Boxing Collaboration Center Database of the National Healthcare Big Data Research Institute (Boxing Database). Univariate Logistic regression was used to identify factors significantly associated with ischemic stroke, and these screening factors were then used to build a Bayesian network: the tabu search algorithm was used for structure learning and Bayesian estimation for parameter learning, yielding the ischemic stroke screening model. Model performance was evaluated in terms of discrimination and calibration and compared with that of a conventional Logistic regression model. Results The derivation cohort included 1,067,609 individuals, of whom 31,019 had ischemic stroke; the external validation cohort included 386,773 individuals, of whom 13,393 had ischemic stroke. Univariate screening identified 67 screening factors. The final Bayesian network contained 68 nodes and 440 directed edges. The parent nodes of the ischemic stroke node were age; hypertensive diseases; ischemic heart diseases; chronic lower respiratory diseases; other cerebrovascular diseases; episodic and paroxysmal disorders; and symptoms and signs involving cognition, perception, emotional state and behavior.
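The univariate screening step can be sketched for a single binary candidate factor, such as a recorded diagnosis. This is a minimal illustration, not the authors' code: it assumes binary factors, a Wald test and a p < 0.05 cut-off, none of which the abstract specifies, and the 2×2 counts are invented. For one binary predictor the Logistic regression MLE has a closed form (the slope is the log odds ratio of the 2×2 table), so no iterative fitting is needed; continuous factors such as age would need a numerical fit instead.

```python
import math


def univariate_logistic_binary(a, b, c, d):
    """Univariate Logistic regression for one binary factor x, fitted in closed form.

    a = exposed cases, b = exposed non-cases, c = unexposed cases, d = unexposed non-cases.
    The MLE slope of logit P(IS=1) = b0 + b1 * x is the log odds ratio ln(ad / bc),
    and its Wald standard error is sqrt(1/a + 1/b + 1/c + 1/d).
    Returns (slope, Wald z, two-sided p-value).
    """
    if min(a, b, c, d) == 0:
        raise ValueError("a zero cell means the MLE does not exist")
    beta = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return beta, z, p


def screen(tables, threshold=0.05):
    """Keep factors whose univariate p-value is below `threshold` (an assumed cut-off)."""
    kept = {}
    for name, (a, b, c, d) in tables.items():
        beta, _, p = univariate_logistic_binary(a, b, c, d)
        if p < threshold:
            kept[name] = (math.exp(beta), p)  # (odds ratio, p-value)
    return kept


# Invented 2x2 counts (a, b, c, d) for two hypothetical candidate factors.
TABLES = {
    "hypertension": (300, 2700, 200, 6800),
    "hypothetical_factor_x": (52, 948, 448, 8552),
}
print(sorted(screen(TABLES)))  # ['hypertension']
```

The second factor has an odds ratio close to 1 and a p-value far above the cut-off, so only the first would be passed on to network construction.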
The AUCs in the training set, testing set and external validation cohort were 0.840 (95%CI: 0.838-0.843), 0.839 (95%CI: 0.836-0.843) and 0.811 (95%CI: 0.808-0.814), respectively, indicating good discrimination; calibration was also good. The screening model still outperformed the conventional Logistic regression model when data were missing. Conclusion Leveraging the uncertainty inference of Bayesian networks, this study developed an ischemic stroke screening model with good discrimination and calibration, providing a convenient and efficient tool for early ischemic stroke screening.
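The abstract reports each AUC with a 95% CI but does not state how the intervals were obtained. The sketch below assumes a percentile bootstrap over individuals; DeLong's method is a common analytic alternative. The AUC itself is computed from ranks (the Mann-Whitney U statistic), which runs in O(n log n). The function names, the toy scores and the defaults (`n_boot`, `level`, `seed`) are illustrative.

```python
import random


def auc(scores, labels):
    """Rank-based (Mann-Whitney U) AUC; tied scores share their mid-rank. O(n log n)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of tied scores starting at sorted position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mid_rank = (i + j) / 2 + 1  # average 1-based rank of the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = mid_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one non-case")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def bootstrap_auc_ci(scores, labels, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for the AUC, resampling individuals with replacement."""
    n = len(scores)
    if not 0 < sum(labels) < n:
        raise ValueError("need at least one case and one non-case")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auc([scores[i] for i in idx], ys))
    stats.sort()
    tail = (1 - level) / 2
    lower = stats[int(tail * n_boot)]
    upper = stats[int((1 - tail) * n_boot) - 1]
    return lower, upper


scores = [0.1, 0.4, 0.35, 0.8, 0.65, 0.2, 0.9, 0.3]
labels = [0, 0, 1, 1, 1, 0, 1, 0]
print(auc(scores[:4], labels[:4]))  # 0.75
print(bootstrap_auc_ci(scores, labels, n_boot=200))
```

For cohorts of around a million records the pure-Python resampling loop is slow, which is one reason analytic intervals such as DeLong's are often preferred at that scale.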

Key words: Electronic health records, Bayesian network, Logistic regression, Ischemic stroke, Screening model
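Parameter learning by Bayesian estimation, and the way a Bayesian network copes with an unrecorded factor, can be illustrated on a toy fragment of the model. Everything below is hypothetical: the three-node structure `age_group -> IS <- hypertension` stands in for the learned 68-node network (whose structure the authors obtained by tabu search), the records are invented, and the BDeu-style Dirichlet prior with equivalent sample size `ess` is an assumption, since the abstract does not name the prior. Each CPT entry is a posterior mean (observed counts plus pseudo-counts); when hypertension is missing, the risk is obtained by summing over its states rather than by dropping the record.

```python
from collections import Counter
from itertools import product

# Hypothetical toy records (age_group, hypertension, ischemic_stroke), all coded 0/1.
# Invented for illustration; these are NOT data from the study.
RECORDS = [
    (0, 0, 0), (0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1),
    (1, 0, 0), (1, 0, 1), (1, 1, 1), (1, 1, 1), (1, 1, 0),
]


def estimate_stroke_cpt(records, ess=1.0):
    """Posterior-mean CPT P(IS=1 | age_group, hypertension).

    Assumes a BDeu-style Dirichlet prior: the equivalent sample size `ess`
    is spread evenly over 4 parent configurations x 2 child states.
    """
    alpha = ess / (4 * 2)  # pseudo-count added to every cell
    counts = Counter(records)
    cpt = {}
    for age, htn in product((0, 1), repeat=2):
        n1 = counts[(age, htn, 1)]
        n0 = counts[(age, htn, 0)]
        cpt[(age, htn)] = (n1 + alpha) / (n0 + n1 + 2 * alpha)
    return cpt


def estimate_root(records, index, ess=1.0):
    """Posterior-mean P(X=1) for a binary root node stored at position `index`."""
    alpha = ess / 2
    n1 = sum(r[index] for r in records)
    return (n1 + alpha) / (len(records) + 2 * alpha)


def stroke_risk(cpt, p_htn, age, htn=None):
    """P(IS=1 | evidence); a missing hypertension value (None) is summed out.

    Weighting by the marginal P(htn) is exact here only because, in the
    assumed structure age_group -> IS <- hypertension, the two root nodes
    are independent when IS is unobserved.
    """
    if htn is not None:
        return cpt[(age, htn)]
    return cpt[(age, 1)] * p_htn + cpt[(age, 0)] * (1 - p_htn)


cpt = estimate_stroke_cpt(RECORDS)
p_htn = estimate_root(RECORDS, index=1)
print(round(stroke_risk(cpt, p_htn, age=1, htn=1), 3))  # 0.654
print(round(stroke_risk(cpt, p_htn, age=1), 3))  # 0.577 (hypertension unrecorded)
```

A Logistic regression fitted on complete cases has no such built-in rule for a missing covariate and needs imputation or case deletion. In a network of the paper's size, exact summation over many unrecorded nodes is often replaced by approximate inference such as likelihood weighting.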

CLC number: R743.3