
Journal of Shandong University (Health Sciences) ›› 2015, Vol. 53 ›› Issue (2): 92-96. doi: 10.6040/j.issn.1671-7554.0.2014.476

• Public Health and Management •

Serum metabolic profiling of schizophrenia based on random forest

LIU Yingjun1, ZHANG Tao1, WANG Lu2, LIU Jia2, CHANG Xuerun2, ZHANG Jingxuan2, XUE Fuzhong1

  1. Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Jinan 250012, Shandong, China;
  2. Shandong Mental Health Center, Jinan 250014, Shandong, China
  • Received: 2014-07-21 Published: 2015-02-10
  • Corresponding author: XUE Fuzhong. E-mail: xuefzh@sdu.edu.cn
  • Funding: National Natural Science Foundation of China (81273177); Natural Science Foundation of Shandong Province (ZR2013HQ056)

Abstract: Objective To evaluate how well random forest classifies serum metabolomics data from schizophrenia patients and healthy controls, and to screen for differential metabolites. Methods The case group comprised 50 patients with schizophrenia and the control group comprised 62 healthy individuals. Serum samples from both groups were analyzed on an RRLC-QTOF/MS platform, and random forest was used to classify the resulting metabolomics data. Classification performance was evaluated with the out-of-bag (OOB) estimate of the error rate and 5-fold cross-validation, and the random forest variable importance measure (VIM) was used to identify important differential metabolites. Results Random forest classified the serum metabolomics data of the two groups well. The misclassification rate was 4.0% in the case group and 1.6% in the control group, the OOB estimate of the error rate was 2.68%, and the area under the ROC curve from 5-fold cross-validation was 0.99. Fifteen important differential metabolites were selected according to VIM. Conclusion Combining liquid chromatography-mass spectrometry metabolomics with random forest can identify metabolites with potential clinical application value, and the approach is suitable for metabolomics research.

Key words: Metabolomics, Classification, Schizophrenia, Random forest, Variable selection

CLC number:

  • R749.3
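
The error rates reported in the abstract can be cross-checked with simple arithmetic. The sketch below assumes that 2 of the 50 patients and 1 of the 62 controls were misclassified on out-of-bag samples. These counts are not stated in the abstract; they are back-calculated from the reported percentages and should be read as an assumption rather than as the authors' confusion matrix.

```python
# Group sizes as reported in the abstract.
n_case, n_control = 50, 62

# Assumed misclassification counts (not given in the abstract):
# chosen because 2/50 = 4.0% and 1/62 rounds to 1.6%.
miss_case, miss_control = 2, 1

# Per-group misclassification rates.
case_error = miss_case / n_case
control_error = miss_control / n_control

# Overall error rate pooled across both groups.
oob_error = (miss_case + miss_control) / (n_case + n_control)

print(f"case error:    {case_error:.1%}")
print(f"control error: {control_error:.1%}")
print(f"overall error: {oob_error:.2%}")
```

Under these assumed counts, the pooled rate of 3 errors in 112 samples rounds to the reported OOB estimate of 2.68%, so the three figures in the Results are mutually consistent.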