Journal of Shandong University (Health Sciences), 2020, Vol. 58, Issue (8): 61-66. doi: 10.6040/j.issn.1671-7554.0.2020.0607
Wei ZHANG*, Wenhao TAN, Yibin LI
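Abstract:
Deep reinforcement learning, inspired by brain-like computing, has achieved great success in artificial intelligence, robotics, and many other fields. By combining deep learning with reinforcement learning, the approach attains strong scene perception and task decision-making capabilities. This paper first introduces two widely used classes of deep reinforcement learning methods and their basic principles. It then discusses research progress by reviewing the current state of deep reinforcement learning applied to quadruped robot locomotion control. Finally, by summarizing existing methods and the characteristics of legged-robot control, it offers an outlook on the application prospects of deep reinforcement learning for quadruped robots.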