Journal of Shandong University (Health Sciences) ›› 2026, Vol. 64 ›› Issue (5): 88-95. doi: 10.6040/j.issn.1671-7554.0.2025.0262
• Clinical Medicine •
陈雨梦1,2,张越2,张武林2,杨国兴2,许衍辉2,韩爱军2,刘彩娟2,郭雨语1,2,陈志敏2
CHEN Yumeng1,2, ZHANG Yue2, ZHANG Wulin2, YANG Guoxing2, XU Yanhui2, HAN Aijun2, LIU Caijuan2, GUO Yuyu1,2, CHEN Zhimin2
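Abstract: Objective To evaluate the accuracy, completeness, and emotional supportiveness of Chinese open-source large language models (LLMs) in answering common diagnosis and treatment questions from parents of children with congenital ectopia lentis (CEL), and to explore their feasibility as intelligent health-education assistants for these parents. Methods A question bank of 33 CEL diagnosis and treatment questions was constructed. Three senior cataract specialists used Likert scales to rate, under blinded conditions, the answers from three LLMs: Kimi chat, Doubao (豆包), and DeepSeek-R1. Based on the preliminary results, DeepSeek-R1, which showed the best overall performance, was selected for comprehensive evaluation on the full question bank. Results Among the three LLMs, DeepSeek-R1 performed best. Across all questions, the proportions of its answers rated accurate (≥5 points), complete (≥2 points), and emotionally supportive (≥2 points) were 78.8%, 87.9%, and 69.7%, respectively, and the proportion of its answers recommended by the evaluators was 75.8% (150/198). Its answers performed very well on treatment and prognosis and on symptoms, but were somewhat weaker on disease diagnosis. DeepSeek-R1's answers had a higher word count than the human-written answers (P<0.05), and word count was positively correlated with answer completeness (rs≈0.608, P<0.05). Agreement among the three raters was above 0.700 in all cases, indicating good reliability. Conclusion DeepSeek-R1 answers CEL-related diagnosis and treatment questions with relatively high accuracy, completeness, and emotional supportiveness, but its use for disease diagnosis warrants caution.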