Locomotion control of quadruped robot based on deep reinforcement learning: review and prospect

doi:10.6040/j.issn.1671-7554.0.2020.0607

Journal of Shandong University (Health Sciences), 2020, Vol. 1, Issue 8: 61-66.

Special Topic: Brain Science and Brain-Inspired Intelligence Research


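Wei ZHANG*, Wenhao TAN, Yibin LI

School of Control Science and Engineering, Shandong University, Jinan 250061, Shandong, China

Received: 2020-04-16 Online: 2020-08-01 Published: 2020-08-07
Corresponding author: Wei ZHANG E-mail: davidzhang@sdu.edu.cn
About the author: Wei ZHANG is a professor and doctoral supervisor at Shandong University, holds a PhD from The Chinese University of Hong Kong, and was a postdoctoral researcher at the University of California, Berkeley. Zhang's research covers pattern recognition, computer vision, machine learning, and robotics. Zhang has led more than 10 projects, including a sub-project of a Major Program of the National Natural Science Foundation of China, a Key Program of the Joint Funds, a sub-project of the National Key R&D Program, and a Shandong Province Major Science and Technology Project. Zhang has published more than 80 papers in leading artificial intelligence and robotics journals and conferences, including IEEE TPAMI, TNNLS, TIP, TCYB, CVPR, ICCV, ECCV, IJCAI, AAAI, ICRA, and IROS, and holds more than 10 granted invention patents in the United States, China, and elsewhere. Honors include a nomination for the Hong Kong Young Scientist Award, two IEEE best paper awards, and first place in an international academic competition. Zhang serves on the technical committees on Pattern Recognition and Machine Intelligence, Computer Vision, and Imaging Detection and Perception, and as an editorial board member or guest editor for journals including PRL, Neurocomputing, and Control Theory & Applications.
Funding: National Key R&D Program of the Ministry of Science and Technology (2017YFB1300205); Shandong Province Major Science and Technology Project (Emerging Industries) (2018CXGC1503)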

Abstract

Deep reinforcement learning, inspired by brain-like computing, has achieved notable success in fields such as artificial intelligence and robotics. By combining deep learning with reinforcement learning, it gains strong capabilities in scene perception and task decision-making. This paper first introduces two widely used classes of deep reinforcement learning methods and their basic principles. It then reviews how deep reinforcement learning has been applied to locomotion control of quadruped robots and discusses the research progress of the approach. Finally, drawing on a summary of existing methods and the control characteristics of legged robots, it discusses the prospects for deep reinforcement learning on quadruped robots.

Key words: Machine learning, Deep reinforcement learning, Quadruped robot, Locomotion control, Gait learning

CLC number: R574

Wei ZHANG, Wenhao TAN, Yibin LI. Locomotion control of quadruped robot based on deep reinforcement learning: review and prospect[J]. Journal of Shandong University (Health Sciences), 2020, 1(8): 61-66.



