Journal of Shandong University (Health Sciences), 2026, Vol. 64, Issue (2): 104-110. doi: 10.6040/j.issn.1671-7554.0.2025.1327

Clinical Medicine

Decision performance and auxiliary value of large language models in preoperative management of orthopedic surgery

WEI Shusheng, WU Haibo, LI Songlin, WEN Zhenlin, YANG Changao, LU Qunshan, LIU Peilai   

1. Department of Orthopedics, Qilu Hospital of Shandong University, Jinan 250012, Shandong, China
Published: 2026-02-10

Abstract: Objective To explore the effectiveness of large language models (LLMs, e.g., DeepSeek and ChatGPT) in different generation modes for preoperative management, and their value in supporting decision-making by junior physicians. Methods A total of 100 medical records of orthopedic inpatients at Qilu Hospital of Shandong University from January to August 2025 were randomly selected. Patients scheduled for Grade I, II, or III surgeries or for non-joint-replacement surgeries were excluded, leaving 87 patients. Guidelines on perioperative management were retrieved from databases such as PubMed and UpToDate. After text processing and vectorization, these guidelines were used to build a perioperative management knowledge base that provided external knowledge for subsequent model calls and question-answering tasks. The anonymized patient records were submitted to four versions of the DeepSeek model [DeepSeek Chat (V3), DeepSeek Chat + knowledge base, DeepSeek Deep Thinking (R1), and DeepSeek R1 + knowledge base], and questions were posed under the same Instruction-Context-Input-Output (ICIO) prompt framework. Model outputs were evaluated both objectively and subjectively. Results The DeepSeek R1 model achieved accuracies of 75.86% and 78.16% in Revised Cardiac Risk Index (RCRI) scoring and risk classification, respectively, significantly outperforming the Chat series models. All four versions showed moderate accuracy in American Society of Anesthesiologists (ASA) physical status classification and surgical feasibility judgment, with the R1 version performing slightly better. Adding the knowledge base slightly improved RCRI scoring accuracy only in the Chat version (+4.6%), whereas it reduced performance in the R1 version.
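The RCRI tasks reported above rest on a simple additive score. The sketch below encodes the six predictors of the published index (Lee et al., 1999) and its four risk classes, together with the accuracy arithmetic. It is an illustration under stated assumptions, not the study's scoring procedure: the `RcriInputs` type and its field names are hypothetical, the abstract does not describe how reference answers were adjudicated, and the `accuracy` helper assumes each rate was computed over all 87 included records, in which case 75.86% and 78.16% correspond to 66 and 68 correct answers.

```python
from dataclasses import dataclass


@dataclass
class RcriInputs:
    """The six RCRI predictors; field names are hypothetical, for illustration."""
    high_risk_surgery: bool         # intraperitoneal, intrathoracic or suprainguinal vascular
    ischemic_heart_disease: bool
    congestive_heart_failure: bool
    cerebrovascular_disease: bool
    insulin_treated_diabetes: bool
    creatinine_mg_dl: float         # preoperative serum creatinine in mg/dL


def rcri_score(p: RcriInputs) -> int:
    """One point per positive predictor (0 to 6); creatinine counts when > 2.0 mg/dL."""
    return sum([
        p.high_risk_surgery,
        p.ischemic_heart_disease,
        p.congestive_heart_failure,
        p.cerebrovascular_disease,
        p.insulin_treated_diabetes,
        p.creatinine_mg_dl > 2.0,
    ])


def rcri_class(score: int) -> str:
    """Map a score to risk class I (0), II (1), III (2) or IV (3 or more)."""
    return ["I", "II", "III", "IV"][min(score, 3)]


def accuracy(correct: int, total: int) -> float:
    """Percentage of correct answers, rounded to two decimals as in the abstract.
    Assumption: the denominator is the full set of 87 included records."""
    return round(100 * correct / total, 2)


if __name__ == "__main__":
    # Hypothetical patient: insulin-treated diabetes and creatinine 2.3 mg/dL.
    patient = RcriInputs(False, False, False, False, True, 2.3)
    score = rcri_score(patient)
    print(score, rcri_class(score))            # 2 III
    print(accuracy(66, 87), accuracy(68, 87))  # 75.86 78.16
```

Encoding the index as explicit boolean predictors makes each point auditable, which is useful when comparing a model's stated RCRI against a reference answer item by item rather than only by total score.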
Subjective evaluation showed that junior physicians generally rated the answers of the R1 series models as having greater clinical reference value, with a mean score (4.19±0.72) significantly higher than those of the Chat series (Chat: 3.06±0.06; Chat + knowledge base: 2.97±0.03; P<0.05). This suggests that the R1 model offers greater practicality and acceptability for preoperative decision support. Conclusion The DeepSeek R1 model shows good potential for orthopedic preoperative anesthesia risk assessment and clinical decision support. However, knowledge base construction and task adaptation require further optimization to improve the model's reliability and generalizability in real clinical scenarios.
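The knowledge-base workflow described in the Methods (guideline text processed, vectorized, retrieved as external context, and placed into an ICIO-structured prompt) can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract does not name the embedding model, chunking scheme, or number of retrieved passages, so bag-of-words cosine similarity stands in for a real embedding, and `retrieve`, `build_icio_prompt`, the section labels, and the guideline snippets are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case alphanumeric tokens; a stand-in for the study's text processing."""
    return re.findall(r"[a-z0-9]+", text.lower())


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts, substituted here for an (unspecified) embedding."""
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k knowledge-base chunks most similar to the query (hypothetical helper)."""
    q = vectorize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, vectorize(c)), reverse=True)
    return ranked[:k]


def build_icio_prompt(instruction: str, context: list[str],
                      patient_record: str, output_spec: str) -> str:
    """Assemble an Instruction-Context-Input-Output prompt (labels are illustrative)."""
    ctx = "\n".join(f"- {c}" for c in context)
    return (
        f"Instruction:\n{instruction}\n\n"
        f"Context:\n{ctx}\n\n"
        f"Input:\n{patient_record}\n\n"
        f"Output:\n{output_spec}"
    )


if __name__ == "__main__":
    # Placeholder guideline chunks; not taken from the study's knowledge base.
    guidelines = [
        "RCRI assigns one point for insulin-treated diabetes.",
        "Venous thromboembolism prophylaxis after hip arthroplasty.",
        "ASA physical status classification ranges from I to VI.",
    ]
    question = "Calculate the RCRI for a patient with insulin-treated diabetes"
    print(build_icio_prompt(
        instruction="You are assisting with preoperative risk assessment.",
        context=retrieve(question, guidelines, k=1),
        patient_record="Anonymized record text goes here.",
        output_spec="Report the RCRI score and risk class.",
    ))
```

Keeping retrieval separate from prompt assembly means the same ICIO template can be filled with or without retrieved context, which mirrors the paired "with and without knowledge base" model versions compared in the study.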

Key words: Large language model, DeepSeek, Preoperative decision-making, Knowledge base, Revised cardiac risk index score

CLC Number: R684