
Journal of Shandong University (Health Sciences) ›› 2020, Vol. 58 ›› Issue (8): 61-66. doi: 10.6040/j.issn.1671-7554.0.2020.0607

• Special Topic: Brain Science and Brain-inspired Intelligence •

Locomotion control of quadruped robots based on deep reinforcement learning: review and prospect

Wei ZHANG*, Wenhao TAN, Yibin LI

  1. School of Control Science and Engineering, Shandong University, Jinan 250061, Shandong, China
  • Received: 2020-04-16 Online: 2020-08-01 Published: 2020-08-07
  • Corresponding author: Wei ZHANG, E-mail: davidzhang@sdu.edu.cn
  • About the author: Wei ZHANG is a professor and doctoral supervisor at Shandong University. He received his Ph.D. from The Chinese University of Hong Kong and was a postdoctoral researcher at the University of California, Berkeley. His research covers pattern recognition, computer vision, machine learning, and robotics. He has led more than 10 projects, including a subproject of a National Natural Science Foundation of China major project, a key joint-fund project, a National Key R&D Program subproject, and a Shandong Province major special project. He has published over 80 papers in leading journals and conferences in artificial intelligence and robotics, including IEEE TPAMI, TNNLS, TIP, TCYB, CVPR, ICCV, ECCV, IJCAI, AAAI, ICRA, and IROS, and holds more than 10 invention patents granted in the United States, China, and other countries. His honors include a nomination for the Hong Kong Young Scientist Award, two IEEE best paper awards, and first place in an international academic competition. He serves on the technical committees for Pattern Recognition and Machine Intelligence, Computer Vision, and Imaging Detection and Perception, and is an editorial board member or guest editor of journals including PRL, Neurocomputing, and Control Theory & Applications.
  • Supported by:
    National Key R&D Program of the Ministry of Science and Technology (2017YFB1300205); Shandong Provincial Major Science and Technology Project (Emerging Industries) (2018CXGC1503)

Abstract:

Inspired by brain-like computing, deep reinforcement learning has achieved remarkable success in many fields, including artificial intelligence and robotics. By combining the strengths of deep learning and reinforcement learning, it attains strong capabilities for scene perception and task decision-making. This paper first introduces two widely used classes of deep reinforcement learning methods and their fundamentals, then reviews the research progress of deep reinforcement learning applied to the locomotion control of quadruped robots, and finally, by summarizing existing methods and the characteristics of legged-robot control, discusses the application prospects of deep reinforcement learning for quadruped robots.

Key words: machine learning, deep reinforcement learning, quadruped robot, locomotion control, gait learning
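
The abstract does not name the two method classes, but judging from the field's standard taxonomy they are presumably value-based methods (e.g., DQN) and policy-gradient methods (e.g., PPO). As an illustrative aside rather than material from the paper, the sketch below shows the tabular Q-learning update that value-based deep reinforcement learning generalizes with a neural network, on a hypothetical one-dimensional "walk to the goal" chain; every name in it is invented for illustration:

```python
# Minimal sketch, not from the paper: tabular Q-learning on a toy chain.
# Value-based deep RL (e.g., DQN) replaces the Q table with a neural network.
import random

GAMMA, ALPHA, EPSILON = 0.99, 0.1, 0.1   # discount, learning rate, exploration
N_STATES, ACTIONS = 5, (0, 1)            # toy chain; reaching state 4 pays reward 1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Toy dynamics: action 1 moves right, action 0 moves left."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < EPSILON \
            else max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # temporal-difference target: r + gamma * max_a' Q(s', a')
        target = r + (0.0 if done else GAMMA * max(Q[(s2, x)] for x in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

print(max(ACTIONS, key=lambda x: Q[(0, x)]))  # learned first move: 1 (toward the goal)
```

The second family adjusts a parameterized policy directly. The sketch below, again purely illustrative, applies a REINFORCE-style policy-gradient update to the same toy chain; actor-critic algorithms such as A3C and PPO refine this idea with learned baselines and constrained updates:

```python
# Minimal sketch, not from the paper: REINFORCE on the same toy chain.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, LR, GAMMA = 5, 0.1, 0.99
theta = np.zeros((N_STATES, 2))          # one logit per (state, action)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(state, action):
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

for episode in range(500):
    s, done, traj = 0, False, []
    while not done:                       # roll out one episode with the current policy
        a = int(rng.choice(2, p=softmax(theta[s])))
        s2, r, done = step(s, a)
        traj.append((s, a, r))
        s = s2
    G = 0.0
    for s, a, r in reversed(traj):        # discounted return-to-go from each step
        G = r + GAMMA * G
        grad = -softmax(theta[s])         # d log pi(a|s) / d logits = onehot(a) - probs
        grad[a] += 1.0
        theta[s] += LR * G * grad         # gradient ascent on E[G * log pi(a|s)]
```

In actual quadruped locomotion work the state would instead be joint angles, body pose, and foot-contact readings, and the action a vector of joint torques or position targets, but the two update rules above are the core that the reviewed methods build on.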

CLC number: R574