Journal of Shandong University (Health Sciences) ›› 2020, Vol. 58 ›› Issue (8): 61-66.doi: 10.6040/j.issn.1671-7554.0.2020.0607

• Special Topic on Brain Science and Brain-Like Intelligence •

Locomotion control of quadruped robot based on deep reinforcement learning: review and prospect

Wei ZHANG*, Wenhao TAN, Yibin LI

  1. School of Control Science and Engineering, Shandong University, Jinan 250061, Shandong, China
  • Received: 2020-04-16 Online: 2020-08-01 Published: 2020-08-07
  • Contact: Wei ZHANG E-mail: davidzhang@sdu.edu.cn

Abstract:

Brain-inspired deep reinforcement learning has recently led to a wide range of successes in domains such as artificial intelligence and robotics. By combining the advantages of deep learning and reinforcement learning, the approach achieves strong capabilities in both perception and decision-making. In this paper, we first provide a brief overview of two widely used classes of deep reinforcement learning methods and their fundamentals, and then review the current status of deep reinforcement learning applied to quadruped robots. Finally, by summarizing the existing methods and the characteristics of quadruped locomotion, we discuss the future potential of deep reinforcement learning for quadruped robots.
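For orientation, the two widely used families of deep reinforcement learning methods mentioned above, value-based and policy-gradient methods, rest on the following standard update rules (standard textbook forms from refs. 20 and 23):

Q-learning temporal-difference update (value-based, ref. 20):
$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t) \right]$

Policy gradient theorem (policy-based, ref. 23):
$\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta} \left[ \nabla_\theta \log \pi_\theta(a \mid s) \, Q^{\pi_\theta}(s, a) \right]$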

Key words: Machine learning, Deep reinforcement learning, Quadruped robot, Locomotion control, Gait learning

CLC Number: 

  • R574
1 Hopfield JJ. Neural networks and physical systems with emergent collective computational abilities[J]. PNAS, 1982, 79(8): 2554-2558.
doi: 10.1073/pnas.79.8.2554
2 Krizhevsky A, Sutskever I, Hinton GE. ImageNet classification with deep convolutional neural networks[C]// Pereira F, Burges CJC, Bottou L. Advances in neural information processing systems. Lake Tahoe: Neural Information Processing Systems Conference, 2012: 1097-1105.
3 Russakovsky O, Deng J, Su H, et al. ImageNet large scale visual recognition challenge[J]. IJCV, 2015, 115(3): 211-252.
doi: 10.1007/s11263-015-0816-y
4 Graves A, Mohamed A, Hinton G. Speech recognition with deep recurrent neural networks[C]// Krishnamurthy V, Plataniotis K. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Vancouver: IEEE, 2013: 6645-6649.
5 Cho K, Van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation[J]. arXiv preprint arXiv: 1406.1078, 2014.
6 Karpathy A, Toderici G, Shetty S, et al. Large-scale video classification with convolutional neural networks[C]// Basri R, Fermuller C, Martinez A. Proceedings of the IEEE conference on Computer Vision and Pattern Recognition. Columbus: IEEE, 2014: 1725-1732.
7 Mnih V, Kavukcuoglu K, Silver D, et al. Playing Atari with deep reinforcement learning[J]. arXiv preprint arXiv: 1312.5602, 2013.
8 Wu Y, Zhang W, Song K. Master-slave curriculum design for reinforcement learning[C]// Lang J. IJCAI. Stockholm: International Joint Conferences on Artificial Intelligence, 2018: 1523-1529.
9 Mnih V, Kavukcuoglu K, Silver D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
doi: 10.1038/nature14236
10 Silver D, Huang A, Maddison CJ, et al. Mastering the game of Go with deep neural networks and tree search[J]. Nature, 2016, 529(7587): 484-489.
doi: 10.1038/nature16961
11 Lillicrap TP, Hunt JJ, Pritzel A, et al. Continuous control with deep reinforcement learning[J]. arXiv preprint arXiv: 1509.02971, 2015.
12 Zhang J, Zhang W, Song R, et al. Grasp for stacking via deep reinforcement learning[C]// Kallio P, Burdet E. 2020 IEEE International Conference on Robotics and Automation (ICRA). Virtual Conference: IEEE, 2020.
13 Duan Y, Chen X, Houthooft R, et al. Benchmarking deep reinforcement learning for continuous control[C]// Balcan N, Weinberger K. International Conference on Machine Learning. New York City: PMLR, 2016: 1329-1338.
14 Gu S, Lillicrap T, Sutskever I, et al. Continuous deep Q-learning with model-based acceleration[C]// Balcan N, Weinberger K. International Conference on Machine Learning. New York City: PMLR, 2016: 2829-2838.
15 Hansen S. Using deep Q-learning to control optimization hyperparameters[J]. arXiv preprint arXiv: 1602.04062, 2016.
16 Wu Y, Rao Z, Zhang W, et al. Exploring the task cooperation in multi-goal visual navigation[C]// Hentenryck PV, Zhou ZH. Proceedings of the 28th International Joint Conference on Artificial Intelligence. Hawaii: AAAI Press, 2019: 609-615.
17 Andrychowicz M, Denil M, Gomez S, et al. Learning to learn by gradient descent by gradient descent[C]// Lee DD, Sugiyama M, Luxburg UV. Advances in neural information processing systems. Barcelona: Neural Information Processing Systems Conference, 2016: 3981-3989.
18 Oh J, Guo X, Lee H, et al. Action-conditional video prediction using deep networks in Atari games[C]// Cortes C, Lawrence ND, Lee DD. Advances in neural information processing systems. Montréal: Neural Information Processing Systems Conference, 2015: 2863-2871.
19 Caicedo JC, Lazebnik S. Active object localization with deep reinforcement learning[C]// Ikeuchi K, Schnörr C, Sivic J. Proceedings of the IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 2488-2496.
20 Watkins CJCH, Dayan P. Q-learning[J]. Machine Learning, 1992, 8(3-4): 279-292.
doi: 10.1007/BF00992698
21 Van Hasselt H, Guez A, Silver D. Deep reinforcement learning with double Q-learning[C]// Schuurmans D, Wellman M. Thirtieth AAAI Conference on Artificial Intelligence. Phoenix: AAAI Press, 2016.
22 Hausknecht M, Stone P. Deep recurrent Q-learning for partially observable MDPs[C]// Bonet B, Koenig S. 2015 AAAI Fall Symposium Series. Austin: AAAI Press, 2015.
23 Sutton RS, McAllester DA, Singh SP, et al. Policy gradient methods for reinforcement learning with function approximation[C]// Leen TK, Dietterich TG, Tresp V. Advances in neural information processing systems. Denver: Neural Information Processing Systems Conference, 2000: 1057-1063.
24 Mnih V, Badia AP, Mirza M, et al. Asynchronous methods for deep reinforcement learning[C]// Balcan N, Weinberger K. International Conference on Machine Learning. New York City: PMLR, 2016: 1928-1937.
25 Schulman J, Wolski F, Dhariwal P, et al. Proximal policy optimization algorithms[J]. arXiv preprint arXiv: 1707.06347, 2017.
26 Fujimoto S, Van Hoof H, Meger D. Addressing function approximation error in actor-critic methods[J]. arXiv preprint arXiv: 1802.09477, 2018.
27 Jenelten F, Hwangbo J, Tresoldi F, et al. Dynamic locomotion on slippery ground[J]. IEEE Robotics and Automation Letters, 2019, 4(4): 4170-4176.
doi: 10.1109/LRA.2019.2931284
28 Jin B, Sun C, Zhang A, et al. Joint torque estimation toward dynamic and compliant control for gear-driven torque sensorless quadruped robot[C]// Arai F. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Macao: IEEE, 2019: 4630-4637.
29 Peng XB, Abbeel P, Levine S, et al. DeepMimic: example-guided deep reinforcement learning of physics-based character skills[J]. ACM Transactions on Graphics (TOG), 2018, 37(4): 1-14.
30 Hwangbo J, Lee J, Dosovitskiy A, et al. Learning agile and dynamic motor skills for legged robots[J]. Science Robotics, 2019, 4(26): eaau5872.
doi: 10.1126/scirobotics.aau5872
31 Haarnoja T, Zhou A, Abbeel P, et al. Soft actor-critic: off-policy maximum entropy deep reinforcement learning with a stochastic actor[J]. arXiv preprint arXiv: 1801.01290, 2018.
32 Tan J, Zhang T, Coumans E, et al. Sim-to-real: learning agile locomotion for quadruped robots[J]. arXiv preprint arXiv: 1804.10332, 2018.
33 Hwangbo J, Lee J, Hutter M. Per-contact iteration method for solving contact dynamics[J]. IEEE Robotics and Automation Letters, 2018, 3(2): 895-902.
doi: 10.1109/LRA.2018.2792536
34 Haarnoja T, Zhou A, Hartikainen K, et al. Soft actor-critic algorithms and applications[J]. arXiv preprint arXiv: 1812.05905, 2018.
35 Jain D, Iscen A, Caluwaerts K. Hierarchical reinforcement learning for quadruped locomotion[J]. arXiv preprint arXiv: 1905.08926, 2019.
36 Iscen A, Caluwaerts K, Tan J, et al. Policies modulating trajectory generators[J]. arXiv preprint arXiv: 1910.02812, 2019.
37 Frans K, Ho J, Chen X, et al. Meta learning shared hierarchies[J]. arXiv preprint arXiv: 1710.09767, 2017.
38 Kolvenbach H, Hampp E, Barton P, et al. Towards jumping locomotion for quadruped robots on the moon[C]// Arai F. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Macao: IEEE, 2019: 5459-5466.
39 Saputra AA, Toda Y, Takesue N, et al. A novel capabilities of quadruped robot moving through vertical ladder without handrail support[C]// Arai F. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Macao: IEEE, 2019: 1448-1453.
40 Lee YH, Lee YH, Lee H, et al. Whole-body motion and landing force control for quadrupedal stair climbing[C]// Arai F. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Macao: IEEE, 2019: 4746-4751.
41 Jenelten F, Hwangbo J, Tresoldi F, et al. Dynamic locomotion on slippery ground[J]. IEEE Robotics and Automation Letters, 2019, 4(4): 4170-4176.
doi: 10.1109/LRA.2019.2931284
42 Ha S, Xu P, Tan Z, et al. Learning to walk in the real world with minimal human effort[J]. arXiv preprint arXiv: 2002.08550, 2020.