TY - GEN
T1 - Stochastic enforced hill-climbing
AU - Wu, Jia Hong
AU - Kalyanam, Rajesh
AU - Givan, Robert
PY - 2008
Y1 - 2008
N2 - Enforced hill-climbing is an effective deterministic hillclimbing technique that deals with local optima using breadth-first search (a process called "basin flooding"). We propose and evaluate a stochastic generalization of enforced hill-climbing for online use in goal-oriented probabilistic planning problems. We assume a provided heuristic function estimating expected cost to the goal with flaws such as local optima and plateaus that thwart straightforward greedy action choice. While breadth-first search is effective in exploring basins around local optima in deterministic problems, for stochastic problems we dynamically build and solve a local Markov-decision process model of the basin in order to find a good escape policy exiting the local optimum. We evaluate our proposal in a wide range of recent probabilistic planning-competition benchmark domains. For evaluation, we show that stochastic enforced hill-climbing produces better policies than greedy action choice for value functions derived in two very different ways. First, we propose a novel heuristic function derived from the ideas in the effective re-planner FF-Replan. This new "controlled-randomness FF heuristic" is the deterministic FF heuristic computed on the simple determinization of the probabilistic problem that makes available a deterministic transition wherever a probabilistic transition was possible. Our results show that stochastic enforced hill-climbing with this heuristic significantly outperforms simply greedily following the heuristic, and also substantially outperforms FF-Replan. We additionally evaluate our technique on automatically learned value functions that on their own perform at the state-of-the-art when used to construct a greedy policy, and again show significant improvement over greedy action selection.
AB - Enforced hill-climbing is an effective deterministic hillclimbing technique that deals with local optima using breadth-first search (a process called "basin flooding"). We propose and evaluate a stochastic generalization of enforced hill-climbing for online use in goal-oriented probabilistic planning problems. We assume a provided heuristic function estimating expected cost to the goal with flaws such as local optima and plateaus that thwart straightforward greedy action choice. While breadth-first search is effective in exploring basins around local optima in deterministic problems, for stochastic problems we dynamically build and solve a local Markov-decision process model of the basin in order to find a good escape policy exiting the local optimum. We evaluate our proposal in a wide range of recent probabilistic planning-competition benchmark domains. For evaluation, we show that stochastic enforced hill-climbing produces better policies than greedy action choice for value functions derived in two very different ways. First, we propose a novel heuristic function derived from the ideas in the effective re-planner FF-Replan. This new "controlled-randomness FF heuristic" is the deterministic FF heuristic computed on the simple determinization of the probabilistic problem that makes available a deterministic transition wherever a probabilistic transition was possible. Our results show that stochastic enforced hill-climbing with this heuristic significantly outperforms simply greedily following the heuristic, and also substantially outperforms FF-Replan. We additionally evaluate our technique on automatically learned value functions that on their own perform at the state-of-the-art when used to construct a greedy policy, and again show significant improvement over greedy action selection.
UR - https://www.scopus.com/pages/publications/58849135844
M3 - Conference contribution
AN - SCOPUS:58849135844
SN - 9781577353867
T3 - ICAPS 2008 - Proceedings of the 18th International Conference on Automated Planning and Scheduling
SP - 396
EP - 403
BT - ICAPS 2008 - Proceedings of the 18th International Conference on Automated Planning and Scheduling
T2 - 18th International Conference on Automated Planning and Scheduling, ICAPS 2008
Y2 - 14 September 2008 through 18 September 2008
ER -