COMPARATIVE ANALYSIS OF SOFTMAX AND UPPER CONFIDENCE BOUND IN GAME-PLAYING AGENTS FOR FLAPPY BIRD
Keywords:
Q-Learning, Reinforcement Learning, UCB, Softmax, Flappy Bird
Abstract
Training game-playing agents is an appealing research topic because human players struggle to maintain consistent performance, particularly in the game Flappy Bird. This study compares two action selection methods, Softmax and Upper Confidence Bound (UCB), with the aim of improving agent performance. Both methods were tested in a Gymnasium-based Flappy Bird environment and evaluated on metrics including average score, highest score, average steps, and Q-value. The results indicate that Softmax tends to explore early in training and converges toward the end, whereas UCB tends to exploit early, which leads to stagnant scores. A t-test found no significant difference in performance between the two action selection methods. The study offers guidance on choosing an action selection method for agents in simple games.
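To make the two action selection rules concrete, the following is a minimal sketch of how each could choose an action from the Q-values of a single state. It is not the study's implementation: the function names, the temperature and exploration constant `c`, the per-state visit counts, and the two-action layout (do nothing, flap) are all assumptions made for illustration.

```python
import math
import random


def softmax_action(q_values, temperature=1.0, rng=random):
    """Sample an action with probability proportional to exp(Q / temperature).

    Returns the sampled action index and the full probability vector.
    """
    # Subtracting the maximum keeps exp() numerically stable and leaves
    # the resulting probabilities unchanged.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    probs = [w / total for w in weights]
    action = rng.choices(range(len(q_values)), weights=probs, k=1)[0]
    return action, probs


def ucb_action(q_values, counts, t, c=2.0):
    """Pick the action maximising Q + c * sqrt(ln(t) / N).

    counts[a] is how often action a was taken in this state (assumed to be
    tracked per state), and t is the total number of selections so far.
    """
    # Any untried action is selected first so every count becomes positive.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    q = [0.2, 0.5]  # hypothetical Q-values for [do nothing, flap]
    action, probs = softmax_action(q, temperature=0.5)
    print("softmax probabilities:", probs, "sampled action:", action)
    print("UCB choice:", ucb_action(q, counts=[10, 1], t=11))
```

In this sketch a higher temperature flattens the Softmax probabilities and so increases exploration, while the UCB bonus shrinks for an action as its count grows, which shifts the choice toward the action with the highest Q-value.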




