Journal of Shandong University (Health Sciences) ›› 2020, Vol. 1 ›› Issue (8): 61-66. doi: 10.6040/j.issn.1671-7554.0.2020.0607
Wei ZHANG*, Wenhao TAN, Yibin LI
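Abstract:
Deep reinforcement learning, inspired by brain-like computing, has achieved great success in artificial intelligence, robotics, and many other fields. By combining deep learning with reinforcement learning, it attains strong scene perception and task decision-making capabilities. This paper first introduces two widely used classes of deep reinforcement learning methods and their basic principles. It then reviews the current state of deep reinforcement learning in quadruped robot locomotion control to discuss research progress on the method. Finally, drawing on a summary of existing methods and the control characteristics of legged robots, it discusses the prospects for applying deep reinforcement learning to quadruped robots.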