Analysis of Model-Free Reinforcement Learning Algorithm for Target Tracking

Muhammad Fikry, Rizal Tjut Adek, Zulfhazli Zulfhazli, Subhan Hartanto, Taufiqurrahman Taufiqurrahman, Dyah Ika Rinawati

Abstract


Target tracking is the process of locating target points across different domains. During tracking, some locations carry rewards (positive or negative values) that are initially unknown to the agent. The agent, a learning system, must therefore learn to obtain the maximum value under various learning rates. Reinforcement learning is a machine learning technique in which an agent learns by interacting with its environment, using reward functions and probabilistic dynamics to explore and learn about the environment over many iterations. For each action taken, the agent receives a reward from the environment that indicates whether the behavior was positive or negative. The agent's goal is to maximize the total reward received during the interaction. In this study, the agent learns three different modules, namely sidewalk, obstacle, and product, using the Q-learning algorithm. Each module is trained with various learning rates and rewards. Q-learning works effectively, achieving the highest final reward at a learning rate of 0.8 over 500 rounds with an epsilon of 0.9.
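The training procedure described above can be sketched as a minimal tabular Q-learning loop. Only the learning rate (0.8), epsilon (0.9), and episode count (500) come from the abstract; the one-dimensional track environment, the reward placement, the discount factor of 0.9, and the use of epsilon as the greedy-action probability are illustrative assumptions, not the paper's sidewalk/obstacle/product modules.

```python
import random

# Illustrative environment (an assumption, not the paper's setup):
# a 1-D track with a target at the right end (+1 reward) and an
# obstacle at the left end (-1 reward). Actions: 0 = left, 1 = right.
N_STATES = 6                                # states 0..5
ALPHA, EPISODES, EPSILON = 0.8, 500, 0.9    # values from the abstract
GAMMA = 0.9                                 # discount factor (assumed)

def step(state, action):
    """Apply an action; the episode ends at either end of the track."""
    nxt = max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))
    if nxt == N_STATES - 1:
        return nxt, 1.0, True               # target reached: positive reward
    if nxt == 0:
        return nxt, -1.0, True              # obstacle hit: negative reward
    return nxt, 0.0, False

def train(seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # Q[state][action]
    for _ in range(EPISODES):
        state, done = 2, False              # fixed interior start state
        while not done:
            # Epsilon-greedy: act greedily with probability EPSILON,
            # otherwise (or on ties) pick a random action.
            if random.random() < EPSILON and q[state][0] != q[state][1]:
                action = 0 if q[state][0] > q[state][1] else 1
            else:
                action = random.choice((0, 1))
            nxt, reward, done = step(state, action)
            # Q-learning update: move Q toward the bootstrapped target.
            target = reward + (0.0 if done else GAMMA * max(q[nxt]))
            q[state][action] += ALPHA * (target - q[state][action])
            state = nxt
    return q

q = train()
# After training, the greedy policy from the start state moves toward the target.
```

With these settings the learned values converge so that, from every interior state, the action leading toward the positive reward dominates, mirroring the abstract's claim that the agent maximizes total reward through repeated interaction.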


Keywords


Algorithm; Machine learning; Probabilistic; Q-Learning; Reinforcement learning; Target tracking




DOI: https://doi.org/10.17509/coelite.v1i1.43795



Journal of Computer Engineering, Electronics and Information Technology (COELITE)


is published by UNIVERSITAS PENDIDIKAN INDONESIA (UPI),
and managed by the Department of Computer Engineering.
Jl. Dr. Setiabudi No.229, Kota Bandung, Indonesia - 40154
email: coelite@upi.edu
e-ISSN: 2829-4149
p-ISSN: 2829-4157