CFP last date
22 April 2024
Reseach Article

A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes

by Sanaa Chafik, Cherki Daoui
International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Volume 133 - Number 10
Year of Publication: 2016
Authors: Sanaa Chafik, Cherki Daoui
10.5120/ijca2016908033

Sanaa Chafik, Cherki Daoui . A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes. International Journal of Computer Applications. 133, 10 ( January 2016), 28-33. DOI=10.5120/ijca2016908033

@article{ 10.5120/ijca2016908033,
author = { Sanaa Chafik, Cherki Daoui },
title = { A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes },
journal = { International Journal of Computer Applications },
issue_date = { January 2016 },
volume = { 133 },
number = { 10 },
month = { January },
year = { 2016 },
issn = { 0975-8887 },
pages = { 28-33 },
numpages = {9},
url = { https://ijcaonline.org/archives/volume133/number10/23823-2016908033/ },
doi = { 10.5120/ijca2016908033 },
publisher = {Foundation of Computer Science (FCS), NY, USA},
address = {New York, USA}
}
%0 Journal Article
%1 2024-02-06T23:30:49.018903+05:30
%A Sanaa Chafik
%A Cherki Daoui
%T A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes
%J International Journal of Computer Applications
%@ 0975-8887
%V 133
%N 10
%P 28-33
%D 2016
%I Foundation of Computer Science (FCS), NY, USA
Abstract

The running time of the classical algorithms of the Markov Decision Process (MDP) typically grows linearly with the state space size, which makes them frequently intractable. This paper presents a Modified Policy Iteration algorithm to compute an optimal policy for large Markov decision processes in the discounted reward criteria and under infinite horizon. The idea of this algorithm is based on the topology of the problem; moreover, an Open Multi-Processing (Open-MP) programming model is applied to attain efficient parallel performance in solving the Modified algorithm.

References
  1. Alagoz, O., Maillart, L. M., Schaefer, A. J., & Roberts, M. S. 2007. Determining the acceptance of cadaveric livers using an implicit model of the waiting list. Operations Research, 55(1), 24-36.
  2. Bellman, R. 1957. Dynamic Programming. Princeton University Press.
  3. Benjaafar, S., & ElHafsi, M. 2006. Production and inventory control of a single product assemble-to-order system with multiple customer classes.Management Science, 52(12), 1896-1912.
  4. Chafik, S., & Daoui, C. 2015. A Modified Value Iteration Algorithm for Discounted Markov Decision Processes. Journal of Electronic Commerce in Organizations (JECO), 13(3), 47-57.
  5. Daoui, C., Abbad, M., & Tkiouat, M. 2010. Exact decomposition approaches for Markov decision processes: A survey. Advances in Operations Research,2010.
  6. Dean, T., & Lin, S. H. 1995. Decomposition techniques for planning in stochastic domains. In IJCAI (Vol. 2, p. 3).
  7. Galoppo, N., Govindaraju, N. K., Henson, M., & Manocha, D. 2005. LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware. In Proceedings of the 2005 ACM/IEEE conference on Supercomputing (p. 3). IEEE Computer Society.
  8. Goto, J. H. 1999. A Markov decision process model for airline meal provisioning (Doctoral dissertation, University of British Columbia).
  9. Lewis, M. E., Ayhan, H., & Foley, R. D. 2002. Bias optimal admission control policies for a multiclass nonstationary queueing system. Journal of Applied Probability, 20-37.
  10. Munos, R. Introduction à l’apprentissage par renforcement, from http://researchers.lille.inria.fr/ ∼ munos/master-mva/.
  11. Papadimitriou, C. H., & Tsitsiklis, J. N. 1987. The complexity of Markov decision processes. Mathematics of operations research, 12(3), 441-450.
  12. Pavitsos, A., & Kyriakidis, E. G. 2009. Markov decision models for the optimal maintenance of a production unit with an upstream buffer. Computers & Operations Research, 36(6), 1993-2006.
  13. PDMIA, G. (2008). Processus décisionnels de Markov en intelligence artificielle. Edité par Olivier Buffet et Olivier Sigaud, 1.
  14. Preux, P. (2008). Apprentissage par renforcement, GRAPA.
  15. Puterman, M. L. (2014). Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons.
  16. Sato, M. (2002, October). OpenMP: parallel programming API for shared memory multiprocessors and on-chip multiprocessors. In Proceedings of the 15th international symposium on System Synthesis (pp. 109-111). ACM.
Index Terms

Computer Science
Information Sciences

Keywords

Markov Decision Processe Discounted reward criterion Policy Iteration algorithm Open Multi-Processing shared memory Parallelizing.