# This is an example that appears in Lonnie Chrisman's paper
# "Reinforcement Learning with Perceptual Aliasing: The
# Perceptual Distinctions Approach", AAAI-92. The actual values
# were sent to Michael from Lonnie via email and taken directly
# from Lonnie's code.

# LRV - least recently visited, MRV - most recently visited
# Backing up while docked has no effect (except to change LRV to MRV)
# Turning around while docked leaves you in front of station, facing it

discount: 0.95
values: reward
states: 8
# 0 Docked in LRV
# 1 Just outside space station MRV, front of ship facing station
# 2 Space, facing MRV
# 3 Just outside space station LRV, back of ship facing station
# 4 Just outside space station MRV, back of ship facing station
# 5 Space, facing LRV
# 6 Just outside space station LRV, front of ship facing station
# 7 Docked in MRV
actions: TurnAround GoForward Backup
observations: 5
# 0 See LRV forward
# 1 See MRV forward
# 2 See that we are docked in MRV
# 3 See nothing
# 4 See that we are docked in LRV

start:
0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0

T: TurnAround
0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 1.0 0.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0

T: GoForward
0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 1.0 0.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 1.0 0.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 0.0 1.0 0.0 0.0 0.0

T: Backup
0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0
0.0 0.4 0.3 0.0 0.3 0.0 0.0 0.0
0.0 0.0 0.1 0.8 0.0 0.0 0.1 0.0
0.7 0.0 0.0 0.3 0.0 0.0 0.0 0.0
0.0 0.0 0.0 0.0 0.3 0.0 0.0 0.7
0.0 0.1 0.0 0.0 0.8 0.1 0.0 0.0
0.0 0.0 0.0 0.3 0.0 0.3 0.4 0.0
0.0 0.0 0.0 0.0 0.0 0.0 0.0 1.0

O: *
0.0 0.0 0.0 0.0 1.0
0.0 1.0 0.0 0.0 0.0
0.0 0.7 0.0 0.3 0.0
0.0 0.0 0.0 1.0 0.0
0.0 0.0 0.0 1.0 0.0
0.7 0.0 0.0 0.3 0.0
1.0 0.0 0.0 0.0 0.0
0.0 0.0 1.0 0.0 0.0

R: GoForward : 1 : 1 : * -3

# R: GoForward : 7 : 6 : * -3  # What Chrisman specifies
R: GoForward : 6 : 6 : * -3  # What I think it should be

R: Backup : 3 : 0 : * 10