References
[CaR11] Canbolat, P. G., and Rothblum, U. G., 2011. “(Approximate) Iterated Successive Approximations Algorithm for Sequential Decision Processes,” Technical Report, The Technion - Israel Institute of Technology; Annals of Operations Research, to appear.
[Den67] Denardo, E. V., 1967. “Contraction Mappings in the Theory Underlying Dynamic Programming,” SIAM
Review, Vol. 9, pp. 165-177.
[Har72] Harrison, J. M., 1972. “Discrete Dynamic Programming with Unbounded Rewards,” Ann. Math. Statist., Vol. 43, pp. 636-644.
[Lip73] Lippman, S. A., 1973. “Semi-Markov Decision Processes with Unbounded Rewards,” Management Sci., Vol. 19, pp. 717-731.
[Lip75] Lippman, S. A., 1975. “On Dynamic Programming with Unbounded Rewards,” Management Sci., Vol. 21, pp. 1225-1233.
[Put94] Puterman, M. L., 1994. Markov Decision Processes: Discrete Stochastic Dynamic Programming, J. Wiley,
N.Y.
[Rot79] Rothblum, U. G., 1979. “Iterated Successive Approximation for Sequential Decision Processes,” in Stochastic Control and Optimization, J. W. B. van Overhagen and H. C. Tijms (eds.), Vrije University, Amsterdam.
[Sch11] Scherrer, B., 2011. “Performance Bounds for λ-Policy Iteration and Application to the Game of Tetris,”
INRIA Lorraine Report, France.
[Sch12] Scherrer, B., 2012. “On the Use of Non-Stationary Policies for Infinite-Horizon Discounted Markov Decision
Processes,” INRIA Lorraine Report, France.
[ThS10a] Thiery, C., and Scherrer, B., 2010. “Least-Squares λ-Policy Iteration: Bias-Variance Trade-off in Control
Problems,” in ICML’10: Proc. of the 27th Annual International Conf. on Machine Learning.
[ThS10b] Thiery, C., and Scherrer, B., 2010. “Performance Bound for Approximate Optimistic Policy Iteration,”
Technical Report, INRIA.
[Tse90] Tseng, P., 1990. “Solving H-Horizon, Stationary Markov Decision Problems in Time Proportional to log(H),”
Operations Research Letters, Vol. 9, pp. 287-297.
[Vei69] Veinott, A. F., Jr., 1969. “Discrete Dynamic Programming with Sensitive Discount Optimality Criteria,” Ann.
Math. Statist., Vol. 40, pp. 1635-1660.
[VeP84] Verdú, S., and Poor, H. V., 1984. “Backward, Forward, and Backward-Forward Dynamic Programming Models under Commutativity Conditions,” Proc. 1984 IEEE Decision and Control Conference, Las Vegas, NV, pp. 1081-1086.
[VeP87] Verdú, S., and Poor, H. V., 1987. “Abstract Dynamic Programming Models under Commutativity Conditions,” SIAM J. on Control and Optimization, Vol. 25, pp. 990-1006.
[WiB93] Williams, R. J., and Baird, L. C., 1993. “Analysis of Some Incremental Variants of Policy Iteration: First
Steps Toward Understanding Actor-Critic Learning Systems,” Report NU-CCS-93-11, College of Computer Science,
Northeastern University, Boston, MA.
[YuB11] Yu, H., and Bertsekas, D. P., 2011. “Q-Learning and Policy Iteration Algorithms for Stochastic Shortest Path Problems,” Lab. for Information and Decision Systems Report LIDS-P-2871, MIT; Annals of Operations Research, to appear.
[YuB12] Yu, H., and Bertsekas, D. P., 2012. “Weighted Bellman Equations and their Applications in Dynamic Programming,” Lab. for Information and Decision Systems Report LIDS-P-2876, MIT.