Research Article

Use of Reinforcement Learning as a Challenge: A Review

by Rashmi Sharma, Manish Prateek, Ashok K. Sinha
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 69 - Number 22
Year of Publication: 2013
DOI: 10.5120/12105-8332

Rashmi Sharma, Manish Prateek, and Ashok K. Sinha. Use of Reinforcement Learning as a Challenge: A Review. International Journal of Computer Applications 69(22):28-34, May 2013. DOI: 10.5120/12105-8332

@article{ 10.5120/12105-8332,
author = { Rashmi Sharma, Manish Prateek, Ashok K. Sinha },
title = { Use of Reinforcement Learning as a Challenge: A Review },
journal = { International Journal of Computer Applications },
issue_date = { May 2013 },
volume = { 69 },
number = { 22 },
month = { May },
year = { 2013 },
issn = { 0975-8887 },
pages = { 28-34 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume69/number22/12105-8332/ },
doi = { 10.5120/12105-8332 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
Abstract

Reinforcement learning (RL) has its origins in animal learning theory. RL requires no prior knowledge of its environment; it autonomously acquires an optimal policy from the knowledge obtained through trial and error while continuously interacting with a dynamic environment. Owing to these characteristics of self-improvement and online learning, reinforcement learning has become one of the core technologies of intelligent agents. This paper gives an introduction to reinforcement learning, discusses its basic model and the optimal policies used in RL, and surveys the main model-free and model-based methods used to reward the agent: the temporal difference method, Q-learning, average reward, certainty equivalent methods, Dyna, prioritized sweeping, and queue-Dyna. Finally, the paper briefly describes applications of reinforcement learning and some directions for future research.
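
To make the model-free methods named above concrete, here is a minimal tabular Q-learning sketch in Python. It is not code from the paper: the toy chain environment and all hyperparameter values (learning rate, discount factor, exploration rate) are illustrative assumptions.

```python
import random

# Minimal tabular Q-learning sketch on a toy 5-state chain.
# Everything here (environment, alpha, gamma, epsilon) is an
# illustrative assumption, not taken from the paper under review.
N_STATES = 5
ACTIONS = [0, 1]                       # 0 = step left, 1 = step right
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1  # learning rate, discount, exploration

def step(state, action):
    """One move along the chain; reaching the right end pays reward 1."""
    nxt = max(state - 1, 0) if action == 0 else min(state + 1, N_STATES - 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

def greedy(state):
    """Greedy action with random tie-breaking."""
    best = max(Q[(state, a)] for a in ACTIONS)
    return random.choice([a for a in ACTIONS if Q[(state, a)] == best])

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

for episode in range(500):
    s, done = 0, False
    while not done:
        # epsilon-greedy: explore with probability EPSILON, else exploit
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy(s)
        s2, r, done = step(s, a)
        # Off-policy temporal-difference update:
        # Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)]
        target = r + GAMMA * max(Q[(s2, a2)] for a2 in ACTIONS)
        Q[(s, a)] += ALPHA * (target - Q[(s, a)])
        s = s2

# Learned greedy policy: every non-terminal state should prefer "right" (1).
print({s: greedy(s) for s in range(N_STATES)})
```

After a few hundred episodes the printed greedy policy should choose "right" in every non-terminal state, even though the agent was never given a model of the chain; this is exactly the trial-and-error, model-free learning the abstract describes.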

Index Terms

Computer Science
Information Sciences

Keywords

Reinforcement Learning, Q-Learning, temporal difference, robot control