
Journal of Shandong University (Health Sciences), 2015, Vol. 53, Issue (2): 92-96. doi: 10.6040/j.issn.1671-7554.0.2014.476

• Public Health and Management •

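Serum metabolic profiling of schizophrenia based on random forest

LIU Yingjun1, ZHANG Tao1, WANG Lu2, LIU Jia2, CHANG Xuerun2, ZHANG Jingxuan2, XUE Fuzhong1

  1. Department of Epidemiology and Biostatistics, School of Public Health, Shandong University, Jinan 250012, Shandong, China;
    2. Shandong Mental Health Center, Jinan 250014, Shandong, China
  • Received: 2014-07-21 Published: 2015-02-10
  • Corresponding author: XUE Fuzhong. E-mail: xuefzh@sdu.edu.cn
  • Funding:
    National Natural Science Foundation of China (81273177); Natural Science Foundation of Shandong Province (ZR2013HQ056)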



Abstract: Objective To evaluate how well random forest distinguishes the serum metabolic profiles of schizophrenia patients from those of healthy controls, and to identify differential metabolites. Methods The case group consisted of 50 patients with schizophrenia and the control group of 62 healthy individuals. Serum samples from both groups were collected and analyzed on an RRLC-QTOF/MS platform. Random forest was used to classify the serum metabolic data of the two groups. The out-of-bag (OOB) error estimate and five-fold cross-validation were used to evaluate classification performance, and the random forest variable importance measure (VIM) was used to select important metabolites. Results Random forest classified the serum metabolic data of the schizophrenia and control groups well. The misclassification rates in the case and control groups were 4.0% and 1.6%, respectively; the OOB error estimate was 2.68%; and the area under the ROC curve from five-fold cross-validation was 0.99. In addition, 15 important differential metabolites were selected according to the VIM. Conclusion Combining liquid chromatography-mass spectrometry (LC-MS) metabolomics with random forest can identify metabolites of potential clinical value and is a useful approach for metabolomics research.
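The reported figures can be cross-checked with a little arithmetic. Assuming the per-group misclassification rates come from the OOB confusion matrix (the abstract does not say so explicitly), 4.0% of 50 cases is 2 patients and 1.6% of 62 controls is 1 individual, so 3 of 112 samples are misclassified, which matches the 2.68% OOB error. The sketch below also shows one standard way to compute an ROC AUC from classifier scores, the rank (Mann-Whitney) formulation, with ties counted as 1/2. The function names `error_rates` and `roc_auc` are hypothetical, and the toy scores are made-up values, not study data.

```python
def error_rates(n_case, wrong_case, n_ctrl, wrong_ctrl):
    """Return per-group and overall misclassification rates as percentages."""
    case_err = 100 * wrong_case / n_case
    ctrl_err = 100 * wrong_ctrl / n_ctrl
    overall = 100 * (wrong_case + wrong_ctrl) / (n_case + n_ctrl)
    return case_err, ctrl_err, overall


def roc_auc(scores_pos, scores_neg):
    """ROC AUC as the probability that a random positive sample scores higher
    than a random negative one (Mann-Whitney U / (n_pos * n_neg)).
    Tied pairs contribute 1/2."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Counts inferred (an assumption) from the abstract:
# 2 of 50 cases and 1 of 62 controls misclassified.
case_err, ctrl_err, overall = error_rates(50, 2, 62, 1)
print(f"case {case_err:.1f}%, control {ctrl_err:.1f}%, overall {overall:.2f}%")

# Hypothetical toy scores, e.g. the fraction of trees voting "schizophrenia".
cases = [0.95, 0.90, 0.85, 0.40]
controls = [0.10, 0.20, 0.45, 0.05]
print(f"toy AUC = {roc_auc(cases, controls):.4f}")
```

Running it prints `case 4.0%, control 1.6%, overall 2.68%` and `toy AUC = 0.9375` (15 of the 16 case/control pairs are ranked correctly). In a real analysis, the scores passed to `roc_auc` would be the held-out predictions from each cross-validation fold.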

Key words: Metabolomics, Classification, Schizophrenia, Random forest, Variable selection

CLC number: R749.3