Research Article

Exploration and Exploitation Tradeoff using Fuzzy Reinforcement Learning

by Seyed Mohammad Hossein Nabavi, Somayeh Hajforoosh
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 59 - Number 5
Year of Publication: 2012
Authors: Seyed Mohammad Hossein Nabavi, Somayeh Hajforoosh
DOI: 10.5120/9545-3994

Seyed Mohammad Hossein Nabavi, Somayeh Hajforoosh. Exploration and Exploitation Tradeoff using Fuzzy Reinforcement Learning. International Journal of Computer Applications 59, 5 (December 2012), 26-31. DOI=10.5120/9545-3994

@article{10.5120/9545-3994,
  author     = {Seyed Mohammad Hossein Nabavi and Somayeh Hajforoosh},
  title      = {Exploration and Exploitation Tradeoff using Fuzzy Reinforcement Learning},
  journal    = {International Journal of Computer Applications},
  issue_date = {December 2012},
  volume     = {59},
  number     = {5},
  month      = {December},
  year       = {2012},
  issn       = {0975-8887},
  pages      = {26-31},
  numpages   = {6},
  url        = {https://ijcaonline.org/archives/volume59/number5/9545-3994/},
  doi        = {10.5120/9545-3994},
  publisher  = {Foundation of Computer Science (FCS), NY, USA},
  address    = {New York, USA}
}
%0 Journal Article
%A Seyed Mohammad Hossein Nabavi
%A Somayeh Hajforoosh
%T Exploration and Exploitation Tradeoff using Fuzzy Reinforcement Learning
%J International Journal of Computer Applications
%@ 0975-8887
%V 59
%N 5
%P 26-31
%D 2012
%I Foundation of Computer Science (FCS), NY, USA
Abstract

Striking a balance between exploration and exploitation in a multiagent environment is a dilemma with no clear answer, and many different methods continue to be investigated to address it. In this paper, we provide a method based on fuzzy variables for managing exploration and exploitation in a multiagent environment. In this method, an effective exploration parameter (ε in the ε-greedy method) is obtained and updated using fuzzy variables at each step to manage the tradeoff between exploration and exploitation. The proposed algorithm is evaluated on finding an optimized path in the Grid World, where agents attempt to reach the locations with the highest gain in a cooperative environment. Outcomes of the suggested fuzzy-based algorithm are compared with the results of the conventional ε-greedy method. In addition, the improvement in the quality of the interaction between exploration and exploitation is discussed.
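The core mechanism described in the abstract, an ε-greedy policy whose ε is adapted on every step by fuzzy variables rather than held fixed, can be illustrated with a small sketch. The sketch below is only a minimal illustration of that general idea: the 4x4 grid layout, reward values, membership functions, and rule base (keyed to the recent average absolute temporal-difference error) are assumptions made for the example, not the authors' exact formulation.

import random

# Triangular and shoulder membership functions for the fuzzy sets.
def tri(x, a, b, c):
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def left_shoulder(x, a, b):
    if x <= a:
        return 1.0
    if x >= b:
        return 0.0
    return (b - x) / (b - a)

def right_shoulder(x, a, b):
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def fuzzy_epsilon(avg_abs_td):
    # Assumed rule base: LOW recent |TD error| -> exploit (small epsilon),
    # MEDIUM -> moderate epsilon, HIGH -> explore (large epsilon).
    low = left_shoulder(avg_abs_td, 0.1, 0.5)
    med = tri(avg_abs_td, 0.1, 0.6, 1.2)
    high = right_shoulder(avg_abs_td, 0.6, 1.2)
    # Weighted-average defuzzification over singleton outputs 0.05, 0.25, 0.5.
    den = low + med + high
    return (0.05 * low + 0.25 * med + 0.5 * high) / den if den > 0 else 0.1

# Toy 4x4 Grid World with a single rewarding goal cell (assumed layout).
SIZE, GOAL = 4, (3, 3)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    r, c = state
    dr, dc = ACTIONS[action]
    nxt = (min(max(r + dr, 0), SIZE - 1), min(max(c + dc, 0), SIZE - 1))
    reward = 10.0 if nxt == GOAL else -1.0
    return nxt, reward, nxt == GOAL

def run(episodes=300, alpha=0.1, gamma=0.95):
    Q = {(r, c): [0.0] * len(ACTIONS) for r in range(SIZE) for c in range(SIZE)}
    epsilon, recent = 0.3, []
    for _ in range(episodes):
        state, done = (0, 0), False
        while not done:
            # epsilon-greedy selection with the fuzzily adapted epsilon.
            if random.random() < epsilon:
                action = random.randrange(len(ACTIONS))
            else:
                action = max(range(len(ACTIONS)), key=lambda a: Q[state][a])
            nxt, reward, done = step(state, action)
            td = reward + gamma * max(Q[nxt]) * (not done) - Q[state][action]
            Q[state][action] += alpha * td
            recent = (recent + [abs(td)])[-50:]   # sliding window of |TD errors|
            epsilon = fuzzy_epsilon(sum(recent) / len(recent))
            state = nxt
    return Q, epsilon

if __name__ == "__main__":
    Q, final_eps = run()
    print("final adapted epsilon:", round(final_eps, 3))

In this sketch the fuzzy input is the recent average absolute TD error, so ε grows while the value estimates are still unreliable and shrinks as they stabilize; the paper's actual fuzzy variables may differ, but the structure of updating ε on every step matches what the abstract describes.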

Index Terms

Computer Science
Information Sciences

Keywords

Reinforcement learning, Multiagent environment, Balance between exploration and exploitation, Q-Learning