JOURNAL OF SHANDONG UNIVERSITY (HEALTH SCIENCES) ›› 2017, Vol. 55 ›› Issue (6): 47-55. doi: 10.6040/j.issn.1671-7554.0.2017.365


A collecting and processing system for health care big data based on web crawler technology

BIAN Weiwei1,2, WANG Yongchao2,3, CUI Lizhen2,4, GUO Wei2,4, LI Hui2,4, ZHOU Miao1,2, XUE Fuzhong1,2, LIU Jing1,2   

    1. Department of Biostatistics, School of Public Health, Shandong University, Jinan 250012, Shandong, China;
    2. Cheeloo Research Center for Biomedical Big Data, Shandong University, Jinan 250012, Shandong, China;
    3. Kangping Health Care Big Data Technology Company Limited, Jinan 250101, Shandong, China;
    4. School of Computer Science and Technology, Shandong University, Jinan 250101, Shandong, China
  • Received: 2017-04-27 Online: 2017-06-10 Published: 2017-06-10

Abstract: Objective To collect and process medical data from the public health service system rapidly and accurately, and to provide a data foundation for establishing population health risk assessment models. Methods The algorithm and program were based on a focused web crawler. The algorithm was improved in three respects: automatic recording and correction of URL anomalies, archiving of the original data, and maintaining the login state. Medical data from authorized websites were obtained with the improved web crawler, then parsed and organized by a medical database system. Results Data from several public health service bases were acquired to provide data analysis reports for local government, and multiple health risk assessment models were constructed from the processed data. Conclusion A data collecting and processing system based on web crawler technology can address the problem of acquiring and organizing the data available in practice. The technology can be applied in the medical and health field, where it makes fuller use of the existing rich medical data resources and greatly improves their utilization efficiency.
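The three crawler improvements named in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`make_session_fetcher`, `correct_url`, `crawl`, `fake_fetch`), the URL correction rule, the archive file naming, and the `example.org` address are all assumptions made for illustration, and the topic-relevance filtering of a focused crawler is omitted. The sketch keeps the login state by reusing one cookie jar, records every URL anomaly and retries with a corrected URL, and archives the raw response before any parsing.

```python
import hashlib
import http.cookiejar
import tempfile
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit, urlunsplit


def make_session_fetcher():
    """Return fetch(url) -> bytes that reuses one cookie jar, so a login
    cookie set by an earlier response is sent with later requests."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

    def fetch(url):
        with opener.open(url, timeout=10) as resp:
            return resp.read()

    return fetch


def correct_url(url):
    """Hypothetical correction rule: trim whitespace, default the scheme
    to http, and drop the fragment."""
    url = url.strip()
    if "://" not in url:
        url = "http://" + url
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path or "/", parts.query, ""))


def crawl(urls, fetch, archive_dir):
    """Fetch each URL, log anomalies, retry once with a corrected URL,
    and archive the raw bytes before any parsing takes place."""
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    anomalies = []  # one record per failed first attempt
    pages = {}      # final URL -> path of the archived raw response
    for url in urls:
        final_url = url
        try:
            body = fetch(url)
        except Exception as exc:
            record = {"url": url, "error": str(exc), "corrected": None}
            anomalies.append(record)
            fixed = correct_url(url)
            if fixed == url:
                continue  # nothing to correct, skip this URL
            try:
                body = fetch(fixed)
            except Exception:
                continue  # the corrected URL failed as well
            record["corrected"] = fixed
            final_url = fixed
        name = hashlib.sha256(final_url.encode("utf-8")).hexdigest()[:16] + ".raw"
        path = archive_dir / name
        path.write_bytes(body)
        pages[final_url] = path
    return pages, anomalies


def fake_fetch(url):
    """Offline stand-in for a real site, used only to exercise crawl()."""
    site = {"http://example.org/records": b"<html>record list</html>"}
    if url not in site:
        raise ValueError("cannot fetch " + repr(url))
    return site[url]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pages, anomalies = crawl([" example.org/records#top"], fake_fetch, tmp)
        print(len(pages), "page(s) archived;", len(anomalies), "anomaly record(s)")
```

Against a real authorized site, `make_session_fetcher()` would be passed to `crawl` in place of `fake_fetch`, after a login request made through the same fetcher. Passing the fetch function in as a parameter is a design choice of this sketch so that the anomaly handling and archiving can be exercised without network access.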

Key words: Web crawler, Data parsing, Database system, Focused web crawler, Data collecting, Data processing

CLC Number: R319