|Partially Observable Markov Decision Processes|
Command Line Arguments (v. 5.0 and earlier)
|The command line options below pertain only to pre-5.2 versions. As of the 5.2 version, command line options (and configuration file options) are represented in an XML format. This is still a rough page, but see: Version 5.2 Options.|
There are a lot of command line options to the program which break down roughly into a number of categories:
appears anywhere on the command line, then you get the full list of all available options, grouped in the categorizes shown above. The command line arguments are described below in alphabetical order.
Some algorithms can save some LPs by initializing the set of vectors it produces by generating vectors for a set of random points. This argument specifies how many random points to use in this initialization. The default is zero.
The POMDP parameter file specified with the '-p' argument has a discount factor in it. If this argument is used, the given value will over-ride the one in that file. Default is not to override and valid values are between zero and one.
The single most important contribution to reducing the number of LPs required is to use a sinple componentwise domination check. Anywhere a set of vectors is pruned down from another set, you have the option of doing this domination check. The default is true.
Must be used in conjunction with the '-vi_variation adjustable_epsilon' argument. For the automatic epsilon adjustment, you need to tell the algorithm at what point to stop adjusting the epsilon. This defines the limits on the epsilon precision that is used. The default value is 1e-3.
Must be used in conjunction with the '-method enum' argument. This defines what type of purging is used to reduce a set of vectors down to its parsimonious representation. The 'none' option means that no pruning is done and the vectors are passed to the next value iteration epoch. The 'domonly' option uses only simple component-wise vector domination checks to prune the set. The default is 'normal_prune' which is a domination check followed by using the LP-based Lark/White pruning algorithm. Finally, 'epsilon_prune' uses an algorithm which approximates the set to within epsilon of the true set.
This argument defines the precision with which the pruning operations occur. The exception is the epsilon pruning algorithm which uses the argument '-prune_epsilon' to define its tolerance. The default is 1e-9.
Must be used in conjunction with the '-vi_variation adjustable_epsilon' argument. For the automatic epsilon adjustment, you need to tell the algorithm how the epsilon should be adjusted. This is the adjustment factor which is used to divide the current epsilon. The default is '10', which reduces epsilon to 1/10 its value every time it is adjusted.
Must be used in conjunction with the '-vi_variation adjustable_epsilon' argument. For the automatic epsilon adjustment, you need to tell the algorithm when to adjust the epsilon. This is a function of looking at the last N epochs and seeing if the differenceqs in size are within 'delta' of each other. If so, then the epsilon is changed. This defines the 'delta' value and its default value is '3'.
Must be used in conjunction with the '-vi_variation adjustable_epsilon' argument. For the automatic epsilon adjustment, you need to tell the algorithm when to adjust the epsilon. This is a function of looking at the last N epochs and seeing if the differences in size are within 'delta' of each other. If so, then the epsilon is changed. This defines the 'N' parameter and its default value is '5'.
Allows you to specify how many iterations (epochs) of value iteration to run, thereby making it a finite horizon solution. If this value is less than zero, then an infinite horizon solution is assumed. The default value for this is '-1', i.e., inifnite horizon.
Must be used in conjunction with the '-method incprune' argument. The Incremental Pruning algorithm has a basic normal form and a more generalized form. Somewhere in between is the restricted region variation. The normal one is conceptually and computationally much simpler, but the genralized form has better theoretical properties. This option selects which version of incremental pruning to use and the default is 'normal'.
Without this argument, value iteration assumes that the terminal values (or initial values, depending on which side you view the iteration from) are all zero. You can specify different terminal values as a piecewise linear set of vectors that are in the specified file. The format of this file needs to be the same as what the program outputs for the value functions answers. Thus the answer to one run of the program can be used as the input to another, essentially continuing the work from the place that it was left off. It currently requires a file in a very specific, somewhat brittle format described in 'alpha-file-spec.txt'.
It sets the precision of the LP solver. Currently this option does nothing for non-CPLEX versions. For CPLEX versions is sets some internal precision parameters, but I do not really know what affect it has.
This argument will result in the program adjusting the precision to try to attain a set of vectors close to this value. (Currently unimplemented.)
Limits the amount of memory used by the program to this number of bytes. Default is to have unlimited memory (up to virtual memory size.) Currently this is untested on non-Sun Solaris OSes.
Defines which algorithm to use to solve the POMDP. The 'enum' option is the exhaustive enumeration methods as first described by Sondik in his 1971 thesis and later discussed by Monahan in his 1982 survey paper. The 'twopass' is described by Sondik in his thesis as well. The 'linsup' is the linear support algorithm developed by Cheng in his 1988 thesis. The linear support is only currently supported for CPLEX versions and the new code is untested at this time. The 'witness' algorithm was developed by Littman, Cassandra and Kaelbling and is described in Littman's (1996) and Cassandra's (1998) theses. The 'incprune' is incremental pruning first developed by Zhang and Liu in 1996 and refined and genralized in a UAI'97 paper by Cassandra, Littman and Zhang. The default algorithm is 'incprune'.
The result of solving a POMDP is a set of vectors representing the value function and also a policy graph file mapping the solution to the vectors of the previous solution. These files are suffixed with '.alpha' and '.pg' respectively and use the prefix specified here. The format of these files are described in the files 'alpha-file-spec.txt' and 'pg-file-spec.txt'. The default output file prefix is 'solution'.
This is the filename of the POMDP problem to be solved. Usually this ends in the suffix '.POMDP'. The POMDP file must conform to the specific syntax and semantics, which is described in 'pomdp-file-spec.txt'. This parameter is required.
The first step of a single DP iteration does a projection of the previous iterations value function. After this projection, so of the resulting vectors might be useless and can be purged away. This option says how to purge these projection sets.
This goes with the '-vi_variation epsilon_prune' arguemnt and defines what level of precision to use in the approximation. The default value is 1e-3.
The pruning of sets of vectors can save a few LPs by having a set of point from which to initialize the process. One way to generate these point is randomly. This argument defines how many random point to use for the initialization of the prune operations. The default value is zero.
The incremental pruning, witness and two pass algorithms solve the problem one action at a time and then merge the results into a set (called the Q value functions). This merged set can then be pruned to a parsimonious set. This argument defines how the pruning of this set should be done.
To make the execution reproducible, the rand seed used by the program can be set using this argument. The program will print out the random seed to start with so this can be used as the input. This will only affect runs which use random numbers (e.g., using random point initializations.) If this argument is not used, it initializes to some (pseudo) random seed.
Normally, only the value function and policy graph solution of the last iteration is saved in a file. However, with this command line flag you can have the program save every iterations' solution. The files will have the prefix (either the default 'solution' or that specified with the '-o' option) and the '.alpha' or '.pg' suffix with the epoch number appended to that name.
Must be used in conjunction with the '-vi_variation adjustable_epsilon' argument. For the automatic epsilon adjustment, you need to tell the algorithm when to adjust the epsilon. This is a function of looking at the last N epochs and seeing if the differenceqs in size are within 'delta' of each other. If so, then the epsilon is changed. This argument defines the the value of epsilon for the first ietration.
As the algorithm progresses through the epochs, it maintain information about each epoch such as the size of the solution, the times for each aspect of the program and the actual epsilon from optimal. With this command line flag, when the program finishes you get a summary of the execution in a tabular format.
Typically the program prints out some trace of the execution as it solves the problem. The amount that gets printed out depends upon the verboseness set (the '-verbose' command line argument). All of this information is, by default, written to stdout. If you want to write it to a file, use this option to give the name of that file.
When the finite horizon is run for longer and longer epochs, the value function approaches the infinite horizon value function (assuming a discount factor of less than 1.0 is used.) When solving the infinite horizon version of the problem, you need to define when the iteration has converged. Since the finite precision of the machine makes the real true exact solution nearly impossible to get, we use one of two different stopping criteria to terminate with an answer close to the true infinite horizon answer. There are currently two stopping criteria. The 'exact' criteria requires that successive iterations match identicalls, vector for vector and component for component. The 'weak' option is a little more flexible and just ensures that the two sets do not differ more than some 'delta' in value from each other. This delta is specified with the '-delta' argument. This is optional and the default value is 'weak'.
When this option is specified, the output of the program becomes very terse, listing only a single line for each epoch. The exact format depends on the method being used to solve the pomdp. They are as follows:
linsup: NV V T
enum: NE ND V T
witness/incprune: Q1 Q2 ... QN V T
NV - number of vertices enumerated in linear support algorithm.
V - number of vectors in current value function
T - time required to solve for this epoch
NE - number of vectors enumerated with enumeration algorithm.
ND - number of vectors dominated with enumeration algorithm.
QX - number of vectors in the Q functions for action X (witness and incprune algorithms only.)
This option is good for use with scripts that compile the data, since it presents an easy to parse output. The '-stat_summary' makes this a bit unnecessary since it provides the relevant data in a sparse format
Limits the amount of time used by the program to this number of seconds. (Currently untested on non-Sun Solaris OSes.) This only guarantees a minimal time, since the time-out signal generated might take a while to take effect. The program prints out a message about timing out and the total execution time, which will be slightly more than the preset limit.
The program offers the generation of more output which is roughly divided into a bunch of classes roughly corresponding to the differnt source files. To turn on a more verbose reporting of the internal operation of these differnt aspects of the program, you use this command line option. If you want to specify multiple verbose arguments, then use a comma delimited list of the ones form the list above. Note that on the command line, there can be *no* spaces on either side of the commas (or anywhere else in the list of verbose arguments).
Although the basic dynamic programming (DP) algorithm for POMDPs is used (i.e., value iteration), the program offers a few variations of how this proceeds. The 'normal' option means just that. The 'adjustable_epsilon' uses a simple algorithm for adjusting the precision of the solution as the number of DP stages increase. Essentially, its purpose is to start at a coarse aproximation, and when it looks like that solution is settling, make the next stage's solution more precise. The user has control over a few of the parameters of this algorithm. See the command line arguments:
Finally, an unimplemented feature is the '-fixed_soln_size' argument. This will eventually be a simple algorithm that tries to adjust the solution precision to keep the solution size a constant size. A sort of way to answer the question: "How precise a solution can be achieved with N vectors?"
When solving the infinite horizon version of the problem, you need to define when the iteration has converged. There are currently two stopping criteria (see the '-stop_criteria' option). For the 'weak' stopping criteria is just ensures that the two sets do not differ more than some 'delta' in value from each other. This argument defines what that delta is and has default value 1e-9.
There are a number of unimplemented optimizations which will be added to the code concerning the use of "witness" points to save from soing a certain amount of LPs. Essentially, you can use points generated from LPs to avoid future LPs, if you save the LP results the right way. The magic to save the points is there in the code, but not the magic to actually use the points.
|© 2003-2005, Anthony R. Cassandra|