
Journal of Shandong University (Health Sciences) ›› 2020, Vol. 58 ›› Issue (8): 61-66. doi: 10.6040/j.issn.1671-7554.0.2020.0607

• Special Topic: Brain Science and Brain-inspired Intelligence •

Locomotion control of quadruped robots based on deep reinforcement learning: review and prospect

Wei ZHANG*, Wenhao TAN, Yibin LI

  1. School of Control Science and Engineering, Shandong University, Jinan 250061, Shandong, China
  • Received: 2020-04-16 Online: 2020-08-01 Published: 2020-08-07
  • Corresponding author: Wei ZHANG, E-mail: davidzhang@sdu.edu.cn
  • About the author: Wei ZHANG is a professor and doctoral supervisor at Shandong University. He received his Ph.D. from The Chinese University of Hong Kong and was a postdoctoral researcher at the University of California, Berkeley. His research covers pattern recognition, computer vision, machine learning, and robotics. He has led more than 10 projects, including a subproject of a National Natural Science Foundation of China major project, a key joint-fund project, a National Key R&D Program subproject, and a Shandong Province major special project. He has published over 80 papers in leading journals and conferences in artificial intelligence and robotics, including IEEE TPAMI, TNNLS, TIP, TCYB, CVPR, ICCV, ECCV, IJCAI, AAAI, ICRA, and IROS, and holds more than 10 invention patents granted in the United States, China, and other countries. His honors include a nomination for the Hong Kong Young Scientist Award, two IEEE best paper awards, and first place in an international academic competition. He serves on the technical committees for Pattern Recognition and Machine Intelligence, Computer Vision, and Imaging Detection and Perception, and is an editorial board member or guest editor of journals including PRL, Neurocomputing, and Control Theory & Applications.
  • Supported by:
    National Key R&D Program of the Ministry of Science and Technology (2017YFB1300205); Shandong Provincial Major Science and Technology Project (Emerging Industries) (2018CXGC1503)

Abstract:

Inspired by brain-like computing, deep reinforcement learning has achieved remarkable success in many fields, including artificial intelligence and robotics. By combining the strengths of deep learning and reinforcement learning, it attains strong capabilities for scene perception and task decision-making. This paper first introduces two widely used classes of deep reinforcement learning methods and their fundamentals, then reviews the research progress of deep reinforcement learning applied to the locomotion control of quadruped robots, and finally, by summarizing existing methods and the characteristics of legged-robot control, discusses the application prospects of deep reinforcement learning for quadruped robots.

Key words: machine learning, deep reinforcement learning, quadruped robot, locomotion control, gait learning
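
The abstract does not name the two method classes, but judging from the field's standard taxonomy they are presumably value-based methods (e.g., DQN) and policy-gradient methods (e.g., PPO). As an illustrative aside rather than material from the paper, the sketch below shows the tabular Q-learning update that value-based deep reinforcement learning generalizes with a neural network, on a hypothetical one-dimensional "walk to the goal" chain; every name in it is invented for illustration:

```python
# Minimal sketch, not from the paper: tabular Q-learning on a toy chain.
# Value-based deep RL (e.g., DQN) replaces the Q table with a neural network.
import random

GAMMA, ALPHA, EPSILON = 0.99, 0.1, 0.1   # discount, learning rate, exploration
N_STATES, ACTIONS = 5, (0, 1)            # toy chain; reaching state 4 pays reward 1
Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    """Toy dynamics: action 1 moves right, action 0 moves left."""
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

for episode in range(200):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < EPSILON \
            else max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # temporal-difference target: r + gamma * max_a' Q(s', a')
        target = r + (0.0 if done else GAMMA * max(Q[(s2, x)] for x in ACTIONS))
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

print(max(ACTIONS, key=lambda x: Q[(0, x)]))  # learned first move: 1 (toward the goal)
```

The second family adjusts a parameterized policy directly. The sketch below, again purely illustrative, applies a REINFORCE-style policy-gradient update to the same toy chain; actor-critic algorithms such as A3C and PPO refine this idea with learned baselines and constrained updates:

```python
# Minimal sketch, not from the paper: REINFORCE on the same toy chain.
import numpy as np

rng = np.random.default_rng(0)
N_STATES, LR, GAMMA = 5, 0.1, 0.99
theta = np.zeros((N_STATES, 2))          # one logit per (state, action)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def step(state, action):
    nxt = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    return nxt, (1.0 if nxt == N_STATES - 1 else 0.0), nxt == N_STATES - 1

for episode in range(500):
    s, done, traj = 0, False, []
    while not done:                       # roll out one episode with the current policy
        a = int(rng.choice(2, p=softmax(theta[s])))
        s2, r, done = step(s, a)
        traj.append((s, a, r))
        s = s2
    G = 0.0
    for s, a, r in reversed(traj):        # discounted return-to-go from each step
        G = r + GAMMA * G
        grad = -softmax(theta[s])         # d log pi(a|s) / d logits = onehot(a) - probs
        grad[a] += 1.0
        theta[s] += LR * G * grad         # gradient ascent on E[G * log pi(a|s)]
```

In actual quadruped locomotion work the state would instead be joint angles, body pose, and foot-contact readings, and the action a vector of joint torques or position targets, but the two update rules above are the core that the reviewed methods build on.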

CLC number: R574