POMDP-Solver/input-format.txt at master · preet021/POMDP-Solver · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
* WHAT IS POMDP?

	POMDP is a seven tuple: (S, A, T, R, O, _||_, d).
	S: set of states
	A: set of actions
	T: transition function...gives Pr(s'| a, s)
	R: reward function...gives R(s, a)
	O: set of observations
	_||_: gives Pr(o | a, s')
	d: discount factor


* INPUT FORMAT

	1. discount: %f
	2. states: <space seperated list of states>
	3. actions: <space seperated list of actions>
	4. observations: <space seperated list of observations>
	5. T : <action> : <startState> : <end-state> %f
	6. O : <action> : <end-state> : <observation> %f
	7. R : <action> : <start-state> %f
	8. start: <space seperated |S| values denoting the belief value of each state initially>. If no start state is provided then agent can be in any state with equal probability.
	Note: the above 8 parameters can be in any order.
	9. Wildcard interpretation is implemented.
	10. Lines starting with # will be treated as comments

* MEMORY REQUIREMENTS

	1. O(|S||S||A| + |O||A||S|) for storing the POMDP + run-time requirements.