Journal of Shandong University (Health Sciences), 2026, Vol. 64, Issue (5): 88-95. doi: 10.6040/j.issn.1671-7554.0.2025.0262

• Clinical Medicine •

Evaluating the efficacy of large language models in answering questions from parents of children with congenital lens dislocation

CHEN Yumeng1,2, ZHANG Yue2, ZHANG Wulin2, YANG Guoxing2, XU Yanhui2, HAN Aijun2, LIU Caijuan2, GUO Yuyu1,2, CHEN Zhimin2   

  1. Department of Ophthalmology, Hebei Medical University, Shijiazhuang 050017, Hebei, China;
  2. Hebei Eye Hospital/Key Laboratory of Ophthalmology in Hebei Province/Hebei Provincial Clinical Research Center for Ocular Diseases, Xingtai 054001, Hebei, China
  • Online: 2026-05-13 Published: 2026-05-13

Abstract: Objective To evaluate the accuracy, completeness, and emotional supportiveness of domestic open-source large language models (LLMs) in answering common diagnostic and therapeutic questions from parents of children with congenital ectopia lentis (CEL), and to explore the feasibility of using LLMs as intelligent health education assistants for these parents. Methods A question bank of 33 CEL-related diagnosis and treatment questions was constructed. Three senior attending ophthalmologists specializing in cataract independently evaluated, under blinded conditions, the answers generated by three LLMs (Kimi chat, Doubao, and DeepSeek-R1) using Likert scales (1-6 for accuracy, 1-3 for completeness and for emotional support). Based on the preliminary evaluation, the best-performing model overall, DeepSeek-R1, was selected for a comprehensive evaluation on the entire question bank. Results Among the three LLMs, DeepSeek-R1 performed best. The proportions of its answers scoring ≥5 points for accuracy, ≥2 points for completeness, and ≥2 points for emotional support were 78.8%, 87.9%, and 69.7%, respectively. The evaluators' recommendation rate for its answers was 75.8% (150/198). Its responses were excellent on treatment, prognosis, and symptoms, but slightly weaker on disease diagnosis. The word count of DeepSeek-R1's responses was significantly higher than that of human answers (P<0.05), and word count correlated positively with completeness scores (rs = 0.608, P<0.05). The intraclass correlation coefficient among the three raters was above 0.700 for all ratings, indicating good reliability. Conclusion DeepSeek-R1 demonstrates high accuracy, completeness, and emotional support in answering CEL-related diagnosis and treatment questions. However, its answers on disease diagnosis should be interpreted cautiously and used under professional guidance.

Key words: Congenital ectopia lentis, Large language model, DeepSeek-R1, Health education, Question-answering performance, Generation quality

CLC Number: R776
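
The abstract reports two kinds of summary statistics: the share of answers at or above a Likert threshold, and a Spearman rank correlation (rs) between word count and completeness. The sketch below illustrates how such figures can be computed. It is not the authors' analysis code; the function names and the six example answers are hypothetical and are not data from the study.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman's rs: the Pearson correlation of the tie-averaged ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def share_at_or_above(scores, threshold):
    """Fraction of scores that meet or exceed the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Hypothetical ratings for six answers (illustration only, not study data).
word_counts = [120, 340, 210, 480, 150, 390]
completeness = [1, 3, 2, 3, 2, 2]   # 1-3 Likert scale
accuracy = [6, 5, 4, 6, 3, 5]       # 1-6 Likert scale

print(f"accuracy >= 5: {share_at_or_above(accuracy, 5):.1%}")
print(f"completeness >= 2: {share_at_or_above(completeness, 2):.1%}")
print(f"rs(word count, completeness): {spearman_rs(word_counts, completeness):.3f}")
```

Ranking with tie averaging matters here because Likert scores take only a few distinct values, so ties are the norm rather than the exception.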