A Julia framework for solving and simulating Partially Observable Markov Decision Processes (POMDPs) for project end-time estimation problems.
This package models a scenario where a project has an uncertain completion time. The goal is to accurately estimate (or announce) when the project will finish based on partial observations of its progress.
- States: Represented as tuples `(t, Ta, Tt)` where:
  - `t`: Current timestep in the project (from 0 to the true end time)
  - `Ta`: Currently announced completion time
  - `Tt`: True end time (unknown to the decision-maker)
- Actions:
  - `AnnounceAction(announced_time)`: Announce or revise the estimated completion time
- Transitions:
  - Time moves forward deterministically with each step
  - The announced time (`Ta`) is updated based on the chosen action
- Observations:
  - After each step, a noisy estimate of the true end time is generated
  - These observations become more accurate as the project approaches completion
  - Observations follow a truncated normal distribution centered on the true end time
- Reward:
  - Penalties for inaccurate announcements (proportional to the difference from the true end time)
  - Larger penalties for announcing impossible times (e.g., a time that has already passed)
  - Different penalties for overestimating vs. underestimating the completion time
  - Small ongoing costs to encourage fewer announcement changes (see the sketch below)
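The sketch below illustrates the observation and reward models described above. All names and constants in it (`observation_dist`, `sigma0`, the penalty weights) are illustrative assumptions for exposition, not the package's actual parameters:

```julia
using Distributions

# Illustrative sketch of the observation and reward models described above.
# All names and constants (sigma0, over_w, under_w, ...) are assumptions made
# for exposition, not the package's actual parameters.

# Noisy end-time estimate: a truncated Normal centered on the true end time Tt,
# whose spread shrinks as the current timestep t approaches completion.
function observation_dist(t, Tt, max_end_time; sigma0 = 5.0)
    sigma = sigma0 * (1 - t / Tt) + 0.1    # accuracy improves as t approaches Tt
    return truncated(Normal(Tt, sigma), t, max_end_time)
end

# Reward sketch: asymmetric penalty on announcement error, a large penalty for
# announcing a time that has already passed, and a small cost for every change.
function reward(t, Ta, Tt, announced; over_w = 1.0, under_w = 2.0,
                impossible_pen = 50.0, change_cost = 0.5)
    err = announced - Tt
    r = -(err >= 0 ? over_w * err : under_w * abs(err))   # announcement error
    r -= announced < t ? impossible_pen : 0.0             # impossible announcement
    r -= announced != Ta ? change_cost : 0.0              # cost per revision
    return r
end
```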
To install the package:

- Clone this repository:

  ```bash
  git clone https://github.com/yourusername/POMDPPlanning.git
  cd POMDPPlanning
  ```

- Install as a local package (optional):

  ```julia
  using Pkg
  Pkg.develop(path=".")
  ```
The package includes a command-line interface for solving POMDPs and evaluating policies.
```bash
julia bin/cli.jl solve --solvers SARSOP --min-end-time 10 --max-end-time 20 --discount 0.99
```

Options:

- `--solvers, -s`: Solver type (SARSOP, FIB, QMDP, PBVI, POMCPOW)
- `--min-end-time, -l`: Minimum possible end time
- `--max-end-time, -u`: Maximum possible end time
- `--discount, -d`: Discount factor (between 0 and 1)
- `--verbose, -v`: Enable verbose output
- `--debug, -D`: Enable debug output
- `--output-dir, -o`: Directory for saving results
- `--seed, -r`: Random seed for reproducibility
To evaluate a previously saved policy, use the evaluate command. This command simulates the policy over a specified number of trials and generates evaluation metrics.
```bash
julia bin/cli.jl evaluate --policy-file results/policy_sarsop.jld2 --num_simulations 10 -r 0
```

To evaluate a new policy generated by a solver, specify the solver type and the number of simulations:

```bash
julia bin/cli.jl evaluate --solvers SARSOP --num_simulations 10 -r 0
```

Options:

- `--policy-file, -p`: Path to the saved policy file (.jld2)
- `--solvers, -s`: Solver type (SARSOP, FIB, QMDP, PBVI, POMCPOW), used to generate and evaluate a new policy
- `--true-end-time, -t`: Fixed true end time for evaluation
- `--initial-announce, -a`: Initial announced time
- Plus the options from the `solve` command
Run an experiment comparing specific solvers (with fixed seed for reproducibility):
```bash
julia bin/cli.jl experiments --solvers=QMDP,SARSOP --min-end-time=10 --max-end-time=20 --num_simulations=5 --verbose -r 0
```

Run an experiment with all solvers:

```bash
julia bin/cli.jl experiments --solvers=all --min-end-time=10 --max-end-time=20 --num_simulations=5 --verbose
```

The repository is organized as follows:

```
POMDPPlanning/
├── bin/
│   └── cli.jl             # Command-line interface entry point
├── src/
│   ├── POMDPPlanning.jl   # Main module file
│   ├── problem.jl         # POMDP definition
│   ├── solvers.jl         # Solver implementations
│   ├── simulation.jl      # Policy simulation functions
│   ├── analysis.jl        # Evaluation and plotting functions
│   ├── utils.jl           # Utility functions
│   └── experiments.jl     # Experiment definitions
├── Project.toml           # Project configuration
└── README.md              # This file
```
- POMDPPlanning.jl: Main module that integrates all components
- problem.jl: Defines the POMDP structure with state space, transition function, observation model, and reward function
- solvers.jl: Implements different solvers including FIB, PBVI, POMCPOW, QMDP, and SARSOP
- simulation.jl: Functions for single and batch simulations of policies
- analysis.jl: Metrics calculation and visualization tools
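As a complement to the CLI, the solvers listed above follow the standard POMDPs.jl interface. The snippet below sketches that workflow; `build_pomdp` is a hypothetical stand-in for the problem constructor in problem.jl, and the names actually exported by this package may differ:

```julia
using POMDPs, POMDPTools
using QMDP, SARSOP    # solver packages from the POMDPs.jl ecosystem

# `build_pomdp` is a hypothetical stand-in for the problem constructor defined
# in problem.jl; the actual name and keyword arguments may differ.
pomdp = build_pomdp(min_end_time = 10, max_end_time = 20, discount = 0.99)

# Offline solvers follow the standard POMDPs.jl interface: construct a solver,
# then call `solve` to obtain a policy.
policy = solve(QMDPSolver(), pomdp)    # or SARSOPSolver(), FIBSolver(), ...

# Roll the policy forward and accumulate the discounted reward.
sim = RolloutSimulator(max_steps = 50)
r = simulate(sim, pomdp, policy)
```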
Each run writes its results to the output directory:

```
output/
├── policy_sarsop.jld2         # Saved policy file
├── policy_sarsop.json         # Policy metadata (human-readable)
├── evaluation_results.json    # Detailed evaluation metrics
└── plots/                     # Generated visualizations
    ├── reward_distribution.png
    ├── error_metrics.png
    ├── num_changes.png
    └── undershoot_overshoot.png
```
The evaluation outputs include comprehensive metrics such as initial vs. final errors, number of announcement changes, and whether the final estimate was an undershoot or overshoot of the true end time.
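As an illustration of how such metrics can be derived, the helper below computes them from a single simulation's sequence of announced times. The function name and its assumed data layout are for exposition only and are not part of the package API:

```julia
# Illustrative metric computation, assuming each simulation yields the sequence
# of announced times and the true end time. Not part of the package API.
function announcement_metrics(announcements::Vector{Int}, true_end::Int)
    initial_error = abs(announcements[1] - true_end)
    final_error   = abs(announcements[end] - true_end)
    # count how many times the announced completion time was revised
    num_changes   = count(i -> announcements[i] != announcements[i-1],
                          2:length(announcements))
    outcome = announcements[end] < true_end ? :undershoot :
              announcements[end] > true_end ? :overshoot  : :exact
    return (; initial_error, final_error, num_changes, outcome)
end

announcement_metrics([15, 14, 17, 18], 18)
# -> (initial_error = 3, final_error = 0, num_changes = 3, outcome = :exact)
```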
The framework provides tools for visualizing and analyzing policy performance:
- Reward distribution: Histogram showing the performance across simulations
- Error metrics: Visualization of initial errors, final errors, and change magnitudes
- Number of changes: Distribution of how many times estimates were revised
- Undershoot vs. overshoot: Analysis of whether policies tend to underestimate or overestimate completion times
These visualizations help in comparing different solvers and understanding their behavior in project end-time estimation tasks.