
Journal of Shandong University (Health Sciences), 2026, Vol. 64, Issue 5: 88-95. doi: 10.6040/j.issn.1671-7554.0.2025.0262

• Clinical Medicine •

Evaluating the efficacy of large language models in answering questions from parents of children with congenital lens dislocation

CHEN Yumeng1,2, ZHANG Yue2, ZHANG Wulin2, YANG Guoxing2, XU Yanhui2, HAN Aijun2, LIU Caijuan2, GUO Yuyu1,2, CHEN Zhimin2

  1. Department of Ophthalmology, Hebei Medical University, Shijiazhuang 050017, Hebei, China;
  2. Hebei Eye Hospital/Key Laboratory of Ophthalmology in Hebei Province/Hebei Provincial Clinical Research Center for Ocular Diseases, Xingtai 054001, Hebei, China
  • Published: 2026-05-13
  • Corresponding author: CHEN Zhimin. E-mail: ykyyczm@126.com
  • Funding:
    Central Government-Guided Local Science and Technology Development Fund Project (246Z7710G)

Abstract: Objective To evaluate the accuracy, completeness, and emotional supportiveness of domestic open-source large language models (LLMs) in answering common diagnostic and therapeutic questions from parents of children with congenital ectopia lentis (CEL), and to explore the feasibility of using LLMs as intelligent health education assistants for these parents. Methods A question bank of 33 CEL-related diagnosis and treatment questions was constructed. Three senior ophthalmologists specializing in cataract evaluated, under blinded conditions, the answers generated by three LLMs (Kimi chat, Doubao, and DeepSeek-R1) using Likert scales (1-6 for accuracy, 1-3 for completeness and for emotional support). Based on the preliminary results, DeepSeek-R1, the best-performing model overall, was selected for a comprehensive evaluation on the entire question bank. Results Among the three LLMs, DeepSeek-R1 performed best. Across all questions, the proportions of its answers reaching the thresholds for accuracy (≥5 points), completeness (≥2 points), and emotional support (≥2 points) were 78.8%, 87.9%, and 69.7%, respectively. The evaluators recommended its answers in 75.8% of ratings (150/198). Its responses were excellent on treatment and prognosis and on symptoms, but slightly weaker on disease diagnosis. DeepSeek-R1's responses had a significantly higher word count than the human answers (P<0.05), and word count was positively correlated with completeness score (rs = 0.608, P<0.05). The intraclass correlation coefficients among the three raters were all above 0.700, indicating good reliability. Conclusion DeepSeek-R1 shows high accuracy, completeness, and emotional support in answering CEL-related diagnosis and treatment questions. However, its use for disease diagnosis calls for caution and should take place under professional guidance.

Key words: Congenital ectopia lentis, Large language model, DeepSeek-R1, Health education, Question-answering performance, Generation quality
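The summary statistics reported in the abstract (the share of answers at or above a score threshold, and the rank correlation between word count and completeness) can be illustrated with a minimal Python sketch. The rating values below are hypothetical placeholders, not the study's data, and the function names are introduced here for illustration only. The abstract does not state which software or tie-handling method the authors used; this sketch assumes Spearman's coefficient computed as the Pearson correlation of average ranks. The reported 78.8%, 87.9%, and 69.7% are arithmetically consistent with 26, 29, and 23 of 33 questions, although the abstract does not give those counts.

```python
from statistics import mean


def proportion_at_least(scores, threshold):
    """Share of scores that are greater than or equal to the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs, computed as the Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical ratings for six answers (NOT the study's data).
accuracy = [6, 5, 4, 5, 3, 6]                 # 1-6 Likert scale
completeness = [3, 3, 1, 2, 1, 3]             # 1-3 Likert scale
word_counts = [420, 380, 150, 300, 120, 510]  # answer length

print(proportion_at_least(accuracy, 5))          # share with accuracy >= 5
print(spearman_rs(word_counts, completeness))    # rank correlation

# Arithmetic check on a figure quoted in the abstract.
print(round(150 / 198 * 100, 1))  # 75.8
```

A rank-based coefficient suits this setting because the completeness scale has only three ordered levels, so many answers tie and a linear relationship with word count cannot be assumed.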

CLC number: R776