File: examples/survival_analysis/bayes_param_survival.myst.md
Lines changed: 55 additions & 51 deletions
@@ -5,7 +5,7 @@ jupytext:
     format_name: myst
     format_version: 0.13
 kernelspec:
-  display_name: default
+  display_name: pymc
   language: python
   name: python3
 ---
@@ -17,12 +17,12 @@ kernelspec:
 ```{code-cell} ipython3
 import warnings
 
-import arviz as az
+import arviz.preview as az
 import numpy as np
 import pymc as pm
 import pytensor.tensor as pt
 import scipy as sp
-import seaborn as sns
+import xarray as xr
 
 from matplotlib import pyplot as plt
 from matplotlib.ticker import StrMethodFormatter
@@ -33,7 +33,7 @@ print(f"Running on PyMC v{pm.__version__}")
 
 ```{code-cell} ipython3
 %config InlineBackend.figure_format = 'retina'
-az.style.use("arviz-darkgrid")
+az.style.use("arviz-variat")
 warnings.filterwarnings("ignore")
 ```
 
@@ -45,9 +45,6 @@ This post illustrates a parametric approach to Bayesian survival analysis in PyM
 We will analyze the [mastectomy data](https://vincentarelbundock.github.io/Rdatasets/doc/HSAUR/mastectomy.html) from `R`'s [`HSAUR`](https://cran.r-project.org/web/packages/HSAUR/index.html) package.
 The $\hat{R}$ statistics also indicate convergence.
 
 ```{code-cell} ipython3
-max(np.max(gr_stats) for gr_stats in az.rhat(weibull_trace).values())
+az.rhat(weibull_trace).to_array().max()
 ```
 
 Below we plot posterior distributions of the parameters.
 
 ```{code-cell} ipython3
-az.plot_forest(weibull_trace, figsize=(10, 4));
+az.plot_forest(weibull_trace);
 ```
 
 These are somewhat interesting (especially the fact that the posterior of $\beta_1$ is fairly well-separated from zero), but the posterior predictive survival curves will be much more interpretable.
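Note on the $\hat{R}$ check in the hunk above: both the old and the new line reduce the per-parameter $\hat{R}$ values to a single maximum, which is then compared against 1. For readers unfamiliar with the statistic itself, here is a minimal NumPy sketch of the classic Gelman-Rubin $\hat{R}$ for one scalar parameter. It is an illustration only: the function name `basic_rhat` and the simulated chains are hypothetical, and ArviZ's default is a rank-normalized split variant, so its numbers will differ slightly from this basic form.

```python
import numpy as np


def basic_rhat(chains: np.ndarray) -> float:
    """Classic (non-split, non-rank-normalized) Gelman-Rubin R-hat.

    `chains` has shape (n_chains, n_draws) for a single scalar parameter.
    """
    n_chains, n_draws = chains.shape
    # W: average of the within-chain sample variances.
    within = chains.var(axis=1, ddof=1).mean()
    # B / n: sample variance of the per-chain means.
    between_over_n = chains.mean(axis=1).var(ddof=1)
    # Pooled estimate of the posterior variance.
    var_hat = (n_draws - 1) / n_draws * within + between_over_n
    return float(np.sqrt(var_hat / within))


rng = np.random.default_rng(0)
# Hypothetical chains: four chains sampling the same distribution ...
mixed = rng.normal(size=(4, 1000))
# ... and four chains centred at 0, 1, 2 and 3 (they disagree with each other).
stuck = mixed + np.arange(4)[:, None]

rhat_mixed = basic_rhat(mixed)
rhat_stuck = basic_rhat(stuck)
print(f"well-mixed chains: {rhat_mixed:.3f}")
print(f"shifted chains:    {rhat_stuck:.3f}")
```

Values close to 1 mean the chains agree with each other; values well above 1 mean the between-chain spread is large relative to the within-chain spread, which is why taking the maximum over all parameters is a reasonable single-number check.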
@@ -268,34 +265,32 @@ with weibull_model:
 The posterior predictive survival times show that, on average, patients whose cancer had not metastized survived longer than those whose cancer had metastized.
title="Weibull and log-logistic\nsurvival regression models",
+xlim=(0, 230),
+ylim=(0, 1),
+)
 
-ax.legend(loc=1)
-ax.set_title("Weibull and log-logistic\nsurvival regression models");
+ax.legend()
+ax.yaxis.set_major_formatter(pct_formatter)
 ```
 
 This post has been a short introduction to implementing parametric survival regression models in PyMC with a fairly simple data set. The modular nature of probabilistic programming with PyMC should make it straightforward to generalize these techniques to more complex and interesting data set.
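Note on the plotting hunk above: the axes are fixed to `xlim=(0, 230)` and `ylim=(0, 1)` with a percent formatter on the y-axis, which is consistent with plotting a survival probability against time. The part of the notebook that computes the curves is not shown in this diff, so the following is only a sketch of one common way to estimate a survival curve from sampled survival times. The names `survival_times`, `t_plot`, and `survival_curve`, and the Weibull shape and scale, are hypothetical stand-ins rather than values from the notebook.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for posterior predictive survival times (hypothetical values):
# Weibull draws with shape 1.5, scaled to roughly 100 time units.
survival_times = 100.0 * rng.weibull(1.5, size=2_000)

# Evaluation grid matching the x-axis limits used in the plotting hunk.
t_plot = np.linspace(0, 230, 100)

# S(t) = P(T > t), estimated as the fraction of draws exceeding each t.
survival_curve = (survival_times[None, :] > t_plot[:, None]).mean(axis=1)

print(survival_curve[:3], survival_curve[-3:])
```

The resulting array starts near 1, never increases, and stays in [0, 1], so it fits the fixed y-limits and percent formatting without further scaling.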
@@ -400,8 +395,17 @@ This post has been a short introduction to implementing parametric survival regr
 - Originally authored as a blog post by [Austin Rochford](https://austinrochford.com/posts/2017-10-02-bayes-param-survival.html) on October 2, 2017.
 - Updated by [George Ho](https://eigenfoo.xyz/) on July 18, 2018.
-The problem however, is that in censored data contexts, we do not have access to the true values. If we were to use the same uncensored model on the censored data, we would anticipate that our parameter estimates will be biased. If we calculate point estimates for the mean and std, then we can see that we are likely to underestimate the mean and std for this particular dataset and censor bounds.
+The problem however, is that in censored data contexts, we do not have access to the true values. If we were to use the same uncensored model on the censored data, we would anticipate that our parameter estimates will be biased. If we calculate point estimates for the mean and standard deviation, then we can see that we are likely to underestimate the mean and standard deviation for this particular dataset and censor bounds.
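Note on the paragraph changed above: the claimed bias is easy to check numerically. The sketch below assumes censoring means that out-of-range values are recorded at the bound itself. The distribution parameters and the bounds `low` and `high` are hypothetical values picked for illustration; the notebook's actual data and bounds are not visible in this diff.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical values chosen for illustration only.
true_mu, true_sigma = 13.0, 5.0
low, high = 3.0, 16.0

samples = rng.normal(true_mu, true_sigma, size=5_000)
# Censoring: values outside [low, high] are recorded at the bound itself.
censored = np.clip(samples, low, high)

print(f"uncensored: mean={samples.mean():.2f}, std={samples.std(ddof=1):.2f}")
print(f"censored:   mean={censored.mean():.2f}, std={censored.std(ddof=1):.2f}")
```

Clipping can only shrink the spread, so the standard deviation is always underestimated. The direction of the bias in the mean depends on the bounds: here the upper bound cuts off far more probability mass than the lower one, so the mean is pulled down, which matches the paragraph's "for this particular dataset and censor bounds" qualifier.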
-The models below show 2 approaches to dealing with censored data. First, we need to do a bit of data pre-processing to count the number of observations that are left or right censored. We also also need to extract just the non-censored data that we observe.
+The models below show two approaches to dealing with censored data. First, we need to do a bit of data pre-processing to count the number of observations that are left or right censored. We also need to extract just the non-censored data that we observe.
 
 +++
 
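Note on the pre-processing step described in the hunk above: a minimal sketch of the counting and filtering, assuming censored observations are stored exactly at the bounds. All names here (`censored`, `low`, `high`, `n_left_censored`, `n_right_censored`, `uncensored`) and the simulated data are hypothetical and may not match the notebook's own variables.

```python
import numpy as np

rng = np.random.default_rng(0)

low, high = 3.0, 16.0  # hypothetical censoring bounds
# Hypothetical censored data: normal draws clipped to [low, high].
censored = np.clip(rng.normal(13.0, 5.0, size=500), low, high)

# With clipped data, censored observations sit exactly on a bound.
left_mask = censored <= low
right_mask = censored >= high

n_left_censored = int(left_mask.sum())
n_right_censored = int(right_mask.sum())
# Keep only the observations whose true value was actually recorded.
uncensored = censored[~left_mask & ~right_mask]

print(n_left_censored, n_right_censored, uncensored.size)
```

The three quantities partition the data set: every observation is either left censored, right censored, or observed.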
@@ -187,11 +192,14 @@ with pm.Model() as imputed_censored_model: