BAT Q-LEARNING ALGORITHM


(Received: 2016-11-30, Revised: 2017-02-01, Accepted: 2017-02-23)
The cooperative Q-learning approach allows multiple learners to learn independently and then share their Q-values with one another using a Q-value sharing strategy. One problem with this approach is that the learners' solutions may not converge to optimality, because the optimal Q-values may never be found. Another problem is that some cooperative algorithms perform very well on single-task problems but quite poorly on multi-task problems. This paper proposes a new cooperative Q-learning algorithm, the Bat Q-learning algorithm (BQ-learning), which implements a Q-value sharing strategy based on the Bat algorithm. The Bat algorithm is a powerful optimization algorithm that increases the likelihood of finding the optimal Q-values by balancing the exploration and exploitation of actions through the tuning of its parameters. BQ-learning was tested on two problems: the shortest-path problem (a single-task problem) and the taxi problem (a multi-task problem). The experimental results suggest that BQ-learning performs better than single-agent Q-learning and several well-known cooperative Q-learning algorithms.
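To make the sharing strategy concrete, the following is a minimal Python sketch of what one bat-algorithm-style Q-value sharing round among independent Q-learners could look like. It is an illustrative sketch only, not the authors' exact formulation: the learner count, the frequency range, the loudness and pulse-rate schedules, and the placeholder `fitness` measure are all assumptions introduced here for demonstration.

```python
import numpy as np

# Hypothetical sketch: each learner's Q-table is treated as a "bat position"
# that is pulled toward the best learner's Q-table, following the standard
# frequency / velocity / loudness / pulse-rate mechanics of the Bat algorithm.
rng = np.random.default_rng(0)

N_STATES, N_ACTIONS, N_LEARNERS = 25, 4, 5   # illustrative sizes (assumed)
F_MIN, F_MAX = 0.0, 2.0                      # frequency range (assumed)
ALPHA_LOUD, GAMMA_PULSE = 0.9, 0.9           # loudness decay / pulse growth (assumed)

q = [rng.random((N_STATES, N_ACTIONS)) for _ in range(N_LEARNERS)]  # "positions"
v = [np.zeros((N_STATES, N_ACTIONS)) for _ in range(N_LEARNERS)]    # velocities
loudness = np.ones(N_LEARNERS)     # A_i: probability of accepting a new solution
pulse_rate = np.zeros(N_LEARNERS)  # r_i: probability of a local walk near the best

def fitness(q_table):
    """Placeholder quality measure for a Q-table; in practice this would be
    the return the learner's greedy policy achieves on the task."""
    return q_table.max(axis=1).sum()

def bat_sharing_step(t):
    """One sharing round: every learner's Q-table moves toward the best one."""
    best = max(range(N_LEARNERS), key=lambda i: fitness(q[i]))
    for i in range(N_LEARNERS):
        freq = F_MIN + (F_MAX - F_MIN) * rng.random()   # random frequency
        v[i] = v[i] + (q[i] - q[best]) * freq           # velocity update
        candidate = q[i] + v[i]                         # move toward the best
        if rng.random() > pulse_rate[i]:                # occasional local walk
            candidate = q[best] + 0.01 * rng.standard_normal(q[best].shape)
        # Accept only if the candidate improves and the loudness test passes.
        if rng.random() < loudness[i] and fitness(candidate) > fitness(q[i]):
            q[i] = candidate
            loudness[i] *= ALPHA_LOUD                    # quieter over time
            pulse_rate[i] = 1 - np.exp(-GAMMA_PULSE * t)  # more local search

# Usage: interleave independent Q-learning with sharing rounds.
for t in range(1, 11):
    # ... each learner runs its own Q-learning episodes here ...
    bat_sharing_step(t)
```

The exploration/exploitation balance the abstract refers to comes from the loudness and pulse-rate schedules: early rounds accept large jumps toward the best learner's Q-table (exploration), while later rounds increasingly favor small local refinements around it (exploitation).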
