Q-LEARNING, POLICY ITERATION AND ACTOR-CRITIC REINFORCEMENT LEARNING COMBINED WITH METAHEURISTIC ALGORITHMS IN SERVO SYSTEM CONTROL

Iuliu Alexandru Zamfirache, Radu-Emil Precup, Emil M. Petriu

DOI Number
https://doi.org/10.22190/FUME231011044Z
First page
615
Last page
630

Abstract


This paper carries out a performance analysis of three control system structures and approaches, which combine Reinforcement Learning (RL) and Metaheuristic Algorithms (MAs) as representative optimization algorithms. In the first approach, the Gravitational Search Algorithm (GSA) is employed to initialize the parameters (weights and biases) of the Neural Networks (NNs) involved in Deep Q-Learning, replacing the traditional initialization of the NNs based on randomly generated values. In the second approach, the Grey Wolf Optimizer (GWO) algorithm is employed to train the policy NN in Policy Iteration RL-based control. In the third approach, the GWO algorithm is employed as a critic in an Actor-Critic framework, where it is used to evaluate the performance of the actor NN. The goal of this paper is to analyze all three RL-based control approaches, aiming to determine which one is best suited to solving the proposed control optimization problem. The performance analysis is based on non-parametric statistical tests conducted on the data obtained from real-time experimental results specific to nonlinear servo system position control.
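To illustrate the first approach, the sketch below shows how a metaheuristic such as the GSA can search for an initial NN weight vector instead of drawing it at random: each agent is a candidate flat weight vector, and agents attract each other with forces proportional to fitness-derived masses. This is a minimal, generic GSA sketch, not the authors' implementation; the fitness function (a simple sphere function standing in for the Q-network's loss on sampled transitions), agent count, iteration budget, and decay constants are illustrative assumptions.

```python
import numpy as np

def gsa_initialize(fitness, dim, n_agents=20, n_iter=50, g0=100.0, alpha=20.0, seed=0):
    """Search for a good initial NN weight vector with the Gravitational
    Search Algorithm (GSA). `fitness` maps a flat weight vector to a scalar
    cost (lower is better); returns (best_weights, best_cost)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=(n_agents, dim))  # agents = candidate weight vectors
    v = np.zeros_like(x)
    best_x, best_cost = x[0].copy(), np.inf
    for t in range(n_iter):
        f = np.array([fitness(xi) for xi in x])
        i = int(np.argmin(f))
        if f[i] < best_cost:
            best_cost, best_x = float(f[i]), x[i].copy()
        # Masses: the best agent gets mass 1, the worst gets mass 0.
        f_best, f_worst = f.min(), f.max()
        denom = f_best - f_worst
        m = (f - f_worst) / denom if denom != 0 else np.ones(n_agents)
        M = m / m.sum()
        # Gravitational constant decays over the run (exploration -> exploitation).
        G = g0 * np.exp(-alpha * t / n_iter)
        # Only the Kbest heaviest agents exert force; Kbest shrinks toward 1.
        kbest = max(1, int(round(n_agents * (1 - t / n_iter))))
        a = np.zeros_like(x)
        for j in np.argsort(f)[:kbest]:
            diff = x[j] - x
            r = np.linalg.norm(diff, axis=1, keepdims=True)
            a += rng.random((n_agents, 1)) * G * M[j] * diff / (r + 1e-12)
        v = rng.random((n_agents, 1)) * v + a  # stochastic velocity update
        x = x + v
    return best_x, best_cost

# Demo: a sphere function stands in for the Q-network's initial loss.
w0, c0 = gsa_initialize(lambda w: float(np.sum(w ** 2)), dim=5, seed=1)
```

The resulting `w0` would then seed the Q-network's weights before Deep Q-Learning proceeds with its usual gradient-based updates, the idea being that a well-chosen starting point reduces sensitivity to unlucky random initializations.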

Keywords

Reinforcement Learning, Policy Iteration, Actor-Critic, Q-learning, Gravitational Search Algorithm, Grey Wolf Optimizer


References


Precup, R.-E., Roman, R.-C., Safaei, A., 2021, Data-Driven Model-Free Controllers, 1st Edition. CRC Press, Taylor & Francis, Boca Raton, FL.

Sutton, R.S., Barto, A.G., 2017, Reinforcement Learning: An Introduction, 2nd Edition. MIT Press, Cambridge, MA, London.

Sutton, R.S., Barto, A.G., Williams, R.J., 1992, Reinforcement learning is direct adaptive optimal control, IEEE Control Systems Magazine, 12(2), pp. 19-22.

Busoniu, L., de Bruin, T., Tolić, D., Kober, J., Palunko, I., 2018, Reinforcement learning for control: performance, stability, and deep approximators, Annual Reviews in Control, 46(1), pp. 8-28.

Ganaie, M.A., Hu, M.-H., Malik, A.K., Tanveer, M., Suganthan, P.N., 2022, Ensemble deep learning: A review, Engineering Applications of Artificial Intelligence, 115, paper 105151.

Precup, R.-E., David, R.-C., 2019, Nature-inspired Optimization Algorithms for Fuzzy Controlled Servo Systems. Butterworth-Heinemann, Elsevier, Oxford.

Precup, R.-E., Angelov, P., Costa, B.S.J., Sayed-Mouchaweh, M., 2015, An overview on fault diagnosis and nature-inspired optimal control of industrial process applications, Computers in Industry, 74, pp. 75-94.

Stanley, K.O., Clune, J., Lehman, J., Miikkulainen, R., 2019, Designing neural networks through neuroevolution, Nature Machine Intelligence, 1, pp. 24-35.

Sehgal, A., La, H.M., Louis, S.J., Nguyen, H., 2019, Deep reinforcement learning using genetic algorithm for parameter optimization, Proc. 2019 3rd IEEE International Conference on Robotic Computing, Naples, Italy, pp. 596-601.

Ajani, O.S., Mallipeddi, R., 2022, Adaptive evolution strategy with ensemble of mutations for Reinforcement Learning, Knowledge-Based Systems, 245, paper 108624.

Goulart, D.A., Pereira, R.D., 2020, Autonomous pH control by reinforcement learning for electroplating industry wastewater, Computers & Chemical Engineering, 140, paper 106909.

Lin, H.-W., Wu, Q.-Y., Liu, D.-R., Zhao, B., Yang, Q.-M., 2019, Fault tolerant control for nonlinear systems based on adaptive dynamic programming with particle swarm optimization, Proc. 10th International Conference on Intelligent Control and Information Processing, Marrakesh, Morocco, pp. 322-326.

Liu, X., Zhao, B., Liu, D., 2020, Fault tolerant tracking control for nonlinear systems with actuator failures through particle swarm optimization-based adaptive dynamic programming, Applied Soft Computing, 97(A), paper 106766.

Hein, D., Hentschel, A., Runkler, T., Udluft, S., 2017, Particle swarm optimization for generating interpretable fuzzy reinforcement learning policies, Engineering Applications of Artificial Intelligence, 65, pp. 87-98.

Piperagkas, G.S., Georgoulas, G., Parsopoulos, K.E., Stylios, C.D., Likas, A.C., 2012, Integrating particle swarm optimization with reinforcement learning in noisy problems, Proc. 14th Annual Conference on Genetic and Evolutionary Computation, Philadelphia, PA, USA, pp. 65-72.

Iima, H., Kuroe, Y., 2008, Swarm reinforcement learning algorithms based on particle swarm optimization, Proc. 2008 IEEE International Conference on Systems, Man and Cybernetics, Singapore, Singapore, pp. 1110-1115.

Hein, D., Hentschel, A., Runkler, T., Udluft, S., 2016, Reinforcement learning with Particle Swarm Optimization Policy (PSO-P) in continuous state and action spaces, International Journal of Swarm Intelligence Research, 7(3), pp. 23-42.

Meerza, S.I., Islam, M., Uzzal, M.M., 2019, Q-learning based particle swarm optimization algorithm for optimal path planning of swarm of mobile robots, Proc. 2019 1st International Conference on Advances in Science, Engineering and Robotics Technology, Dhaka, Bangladesh, pp. 1-5.

Gao, Y.-Z., Ye, J.-W., Chen, Y.-M., Liang, F.-L., 2009, Q-learning based on particle swarm optimization for positioning system of underwater vehicles, Proc. 2009 IEEE International Conference on Intelligent Computing and Intelligent Systems, Shanghai, China, vol. 2, pp. 68-71.

Zhang, P., Li, H., Ha, Q.P., Yin, Z.-Y., Chen, R.-P., 2020, Reinforcement learning based optimizer for improvement of predicting tunneling-induced ground responses, Advanced Engineering Informatics, 45, paper 101097.

Mirjalili, S., 2015, How effective is the grey wolf optimizer in training multi-layer perceptrons, Applied Intelligence, 43(1), pp. 150-161.

Zamfirache, I.A., Precup, R.-E., Roman, R.-C., Petriu, E.M., 2022, Reinforcement learning-based control using Q-learning and gravitational search algorithm with experimental validation on a nonlinear servo system, Information Sciences, 583, pp. 99-120.

Zamfirache, I.A., Precup, R.-E., Roman, R.-C., Petriu, E.M., 2022, Policy iteration reinforcement learning-based control using a grey wolf optimizer algorithm, Information Sciences, 585, pp. 162-175.

Zamfirache, I.A., Precup, R.-E., Roman, R.-C., Petriu, E.M., 2023, Neural network-based control using actor-critic reinforcement learning and grey wolf optimizer with experimental servo system validation, Expert Systems with Applications, 225, paper 120112.

Precup, R.-E., David, R.-C., Petriu, E.M., 2017, Grey wolf optimizer algorithm-based tuning of fuzzy control systems with reduced parametric sensitivity, IEEE Transactions on Industrial Electronics, 64(1), pp. 527-534.

Zamfirache, I.A., Precup, R.-E., Petriu, E.M., 2022, Data obtained by 30 independent runs of all algorithms. [Online]. Available: http://www.aut.upt.ro/~rprecup/Data_FUME.m.

Božanić, D., Tešić, D., Marinković, D., Milić, A., 2021, Modeling of neuro-fuzzy system as a support in decision-making processes, Reports in Mechanical Engineering, 2(1), pp. 222-234.

Filip, F.G., 2021, Automation and computers and their contribution to human well-being and resilience, Studies in Informatics and Control, 30(4), pp. 5-18.

Milićević, I., Popović, M., Dučić, N., Vujičić, V., Stepanić, P., Marinković, D., Ćojbašić, Ž., 2022, Improving the mechanical characteristics of the 3D printing objects using hybrid machine learning approach, Facta Universitatis, Series: Mechanical Engineering, DOI: 10.22190/FUME220429036M.

Bejinariu, S.I., Costin, H., Rotaru, F., Niţă, C., Luca, R., Lazăr, C., 2014, Parallel processing and bio-inspired computing for biomedical image registration, Computer Science Journal of Moldova, 22(2), pp. 253-277.

Rigatos, G., Siano, P., Selisteanu, D., Precup, R.-E., 2017, Nonlinear optimal control of oxygen and carbon dioxide levels in blood, Intelligent Industrial Systems, 3(2), pp. 61-75.

Gerger, M., Gumuscu, A., 2022, Diagnosis of Parkinson’s disease using spiral test based on pattern recognition, Romanian Journal of Information Science and Technology, 25(1), pp. 100-113.

Ogutcu, S., Inal, M., Celikhasi, C., Yildiz, U., Dogan N.O., Pekdemir, M., 2022, Early detection of mortality in COVID-19 patients through laboratory findings with factor analysis and artificial neural networks, Romanian Journal of Information Science and Technology, 25(3-4), pp. 290-302.

Haber-Haber, R., Haber, R., Schmittdiel, M., del Toro, R.M., 2007, A classic solution for the control of a high-performance drilling process, International Journal of Machine Tools and Manufacture, 47(15), pp. 2290-2297.

Precup, R.-E., Preitl, S., Balas, M., Balas, V., 2004, Fuzzy controllers for tire slip control in anti-lock braking systems, Proc. 2004 IEEE International Conference on Fuzzy Systems, Budapest, Hungary, vol. 3, pp. 1317-1322.

Tomescu, M.L., Preitl, S., Precup, R.-E., Tar, J.K., 2007, Stability analysis method for fuzzy control systems dedicated controlling nonlinear processes, Acta Polytechnica Hungarica, 4(3), pp. 127-141.

Precup, R.-E., Preitl, S., Petriu, E.M., Bojan-Dragos, C.-A., Szedlak-Stinean, A.-I., Roman, R.-C., Hedrea E.-L., 2020, Model-based fuzzy control results for networked control systems, Reports in Mechanical Engineering, 1(1), pp. 10-25.

Škrjanc, I., Blažič, S., Angelov, P., 2014, Robust evolving cloud-based PID control adjusted by gradient learning method, Proc. 2014 IEEE Conference on Evolving and Adaptive Intelligent Systems, Linz, Austria, pp. 1-6.

Vaščák, J., Hvizdoš, J., Puheim, M., 2016, Agent-based cloud computing systems for traffic management, Proc. 2016 International Conference on Intelligent Networking and Collaborative Systems, Ostrava, Czech Republic, pp. 73-79.

Zhang, L.-Y., Ma, J., Liu, X.-F., Zhang, M., Duan, X.-K., Wang, Z., 2022, A novel support vector machine model of traffic state identification of urban expressway integrating parallel genetic and C-means clustering algorithm, Tehnički vjesnik - Technical gazette, 29(3), pp. 731-741.

Osaba, E., Villar-Rodriguez, E., Oregi, I., Moreno-Fernandez-de-Leceta, A., 2021, Hybrid quantum computing-tabu search algorithm for partitioning problems: preliminary study on the traveling salesman problem, Proc. 2021 IEEE Congress on Evolutionary Computation, Kraków, Poland, pp. 351-358.

Precup, R.-E., Haidegger, T., Preitl, S., Benyó, B., Paul, A.S., Kovács, L., 2012, Fuzzy control solution for telesurgical applications, Applied and Computational Mathematics, 11(3), pp. 378-397.





ISSN: 0354-2025 (Print)

ISSN: 2335-0164 (Online)

COBISS.SR-ID 98732551

ZDB-ID: 2766459-4