Call for Paper - January 2024 Edition
IJCA solicits original research papers for the January 2024 Edition. Last date of manuscript submission is December 20, 2023. Read More

A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes

International Journal of Computer Applications
Foundation of Computer Science (FCS), NY, USA
Year of Publication: 2016
Sanaa Chafik, Cherki Daoui

Sanaa Chafik and Cherki Daoui. Article: A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes. International Journal of Computer Applications 133(10):28-33, January 2016. Published by Foundation of Computer Science (FCS), NY, USA. BibTeX

	author = {Sanaa Chafik and Cherki Daoui},
	title = {Article: A Modified Policy Iteration Algorithm for Discounted Reward Markov Decision Processes},
	journal = {International Journal of Computer Applications},
	year = {2016},
	volume = {133},
	number = {10},
	pages = {28-33},
	month = {January},
	note = {Published by Foundation of Computer Science (FCS), NY, USA}


The running time of the classical algorithms of the Markov Decision Process (MDP) typically grows linearly with the state space size, which makes them frequently intractable. This paper presents a Modified Policy Iteration algorithm to compute an optimal policy for large Markov decision processes in the discounted reward criteria and under infinite horizon. The idea of this algorithm is based on the topology of the problem; moreover, an Open Multi-Processing (Open-MP) programming model is applied to attain efficient parallel performance in solving the Modified algorithm.


  1. Aurélie, B. 2006. Une contribution à la résolution des processus décisionnels de Markov décentralisés avec contraintes temporelles. Caen university. French.
  2. Alagoz, O., Maillart, L. M., Schaefer, A. J., & Roberts, M. S. 2007. Determining the acceptance of cadaveric livers using an implicit model of the waiting list. Operations Research, 55(1), 24-36.
  3. Bellman, R. 1957. Dynamic Programming. Princeton University Press.
  4. Benjaafar, S., & ElHafsi, M. 2006. Production and inventory control of a single product assemble-to-order system with multiple customer classes.Management Science, 52(12), 1896-1912.
  5. Chafik, S., & Daoui, C. 2015. A Modified Value Iteration Algorithm for Discounted Markov Decision Processes. Journal of Electronic Commerce in Organizations (JECO), 13(3), 47-57.
  6. Daoui, C., Abbad, M., & Tkiouat, M. 2010. Exact decomposition approaches for Markov decision processes: A survey. Advances in Operations Research,2010.
  7. Dean, T., & Lin, S. H. 1995. Decomposition techniques for planning in stochastic domains. In IJCAI (Vol. 2, p. 3).
  8. Galoppo, N., Govindaraju, N. K., Henson, M., & Manocha, D. 2005. LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware. In Proceedings of the 2005 ACM/IEEE conference on Supercomputing (p. 3). IEEE Computer Society.
  9. Goto, J. H. 1999. A Markov decision process model for airline meal provisioning (Doctoral dissertation, University of British Columbia).
  10. Lewis, M. E., Ayhan, H., & Foley, R. D. 2002. Bias optimal admission control policies for a multiclass nonstationary queueing system. Journal of Applied Probability, 20-37.
  11. Munos, R. Introduction à l’apprentissage par renforcement, from ∼ munos/master-mva/.
  12. Papadimitriou, C. H., & Tsitsiklis, J. N. 1987. The complexity of Markov decision processes. Mathematics of operations research, 12(3), 441-450.
  13. Pavitsos, A., & Kyriakidis, E. G. 2009. Markov decision models for the optimal maintenance of a production unit with an upstream buffer. Computers & Operations Research, 36(6), 1993-2006.
  14. PDMIA, G. (2008). Processus décisionnels de Markov en intelligence artificielle. Edité par Olivier Buffet et Olivier Sigaud, 1.
  15. Preux, P. (2008). Apprentissage par renforcement, GRAPA.
  16. Puterman, M. L. (2014). Markov decision processes: discrete stochastic dynamic programming. John Wiley & Sons.
  17. Sato, M. (2002, October). OpenMP: parallel programming API for shared memory multiprocessors and on-chip multiprocessors. In Proceedings of the 15th international symposium on System Synthesis (pp. 109-111). ACM.


Markov Decision Processe; Discounted reward criterion; Policy Iteration algorithm; Open Multi-Processing; shared memory; Parallelizing.