Commit 0b47e94

update more examples

1 parent 4b3eb8c commit 0b47e94
File tree: 5 files changed (+1090, -459 lines)

examples/diagnostics_and_criticism/model_averaging.ipynb

Lines changed: 1 addition & 1 deletion
@@ -759,7 +759,7 @@
 "name": "python",
 "nbconvert_exporter": "python",
 "pygments_lexer": "ipython3",
-"version": "3.13.5"
+"version": "3.11.5"
 }
 },
 "nbformat": 4,
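The hunk above only changes the notebook's recorded interpreter version. That value lives under `metadata.language_info` in the `.ipynb` JSON, so it can be inspected with the standard library alone. A minimal sketch; the JSON string is a hypothetical stand-in for a real notebook file:

```python
import json

# Hypothetical stand-in for the tail of an .ipynb file; real notebooks
# record the interpreter version in the same metadata.language_info slot.
nb_json = '{"metadata": {"language_info": {"name": "python", "version": "3.11.5"}}, "nbformat": 4}'

nb = json.loads(nb_json)
version = nb["metadata"]["language_info"]["version"]
# version == "3.11.5"
```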

examples/survival_analysis/bayes_param_survival.ipynb

Lines changed: 849 additions & 167 deletions
Large diffs are not rendered by default.

examples/survival_analysis/bayes_param_survival.myst.md

Lines changed: 55 additions & 51 deletions
@@ -5,7 +5,7 @@ jupytext:
     format_name: myst
     format_version: 0.13
 kernelspec:
-  display_name: default
+  display_name: pymc
   language: python
   name: python3
 ---
@@ -17,12 +17,12 @@ kernelspec:
 ```{code-cell} ipython3
 import warnings
 
-import arviz as az
+import arviz.preview as az
 import numpy as np
 import pymc as pm
 import pytensor.tensor as pt
 import scipy as sp
-import seaborn as sns
+import xarray as xr
 
 from matplotlib import pyplot as plt
 from matplotlib.ticker import StrMethodFormatter
@@ -33,7 +33,7 @@ print(f"Running on PyMC v{pm.__version__}")
 
 ```{code-cell} ipython3
 %config InlineBackend.figure_format = 'retina'
-az.style.use("arviz-darkgrid")
+az.style.use("arviz-variat")
 warnings.filterwarnings("ignore")
 ```
 
@@ -45,9 +45,6 @@ This post illustrates a parametric approach to Bayesian survival analysis in PyM
 We will analyze the [mastectomy data](https://vincentarelbundock.github.io/Rdatasets/doc/HSAUR/mastectomy.html) from `R`'s [`HSAUR`](https://cran.r-project.org/web/packages/HSAUR/index.html) package.
 
 ```{code-cell} ipython3
-sns.set()
-blue, green, red, purple, gold, teal = sns.color_palette(n_colors=6)
-
 pct_formatter = StrMethodFormatter("{x:.1%}")
 ```
 
@@ -228,19 +225,19 @@ with weibull_model:
 The energy plot and Bayesian fraction of missing information give no cause for concern about poor mixing in NUTS.
 
 ```{code-cell} ipython3
-az.plot_energy(weibull_trace, fill_color=("C0", "C1"));
+az.plot_energy(weibull_trace);
 ```
 
 The $\hat{R}$ statistics also indicate convergence.
 
 ```{code-cell} ipython3
-max(np.max(gr_stats) for gr_stats in az.rhat(weibull_trace).values())
+az.rhat(weibull_trace).to_array().max()
 ```
 
 Below we plot posterior distributions of the parameters.
 
 ```{code-cell} ipython3
-az.plot_forest(weibull_trace, figsize=(10, 4));
+az.plot_forest(weibull_trace);
 ```
 
 These are somewhat interesting (especially the fact that the posterior of $\beta_1$ is fairly well-separated from zero), but the posterior predictive survival curves will be much more interpretable.
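The new `az.rhat(...).to_array().max()` call summarizes convergence with the largest $\hat{R}$ across all parameters. As a rough illustration of the underlying idea only (ArviZ actually uses a rank-normalized split-$\hat{R}$, not this formula), the classic Gelman-Rubin statistic compares between-chain and within-chain variance using nothing but the standard library:

```python
import statistics

def gelman_rubin(chains):
    # Classic (non-rank-normalized) R-hat for equal-length chains:
    # the pooled variance estimate divided by the mean within-chain variance.
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean(statistics.variance(c) for c in chains)
    between_over_n = statistics.variance(chain_means)
    var_hat = (n - 1) / n * within + between_over_n
    return (var_hat / within) ** 0.5

# Chains exploring the same region give R-hat near (or just under) 1 ...
good = gelman_rubin([[1.0, 2.0, 3.0, 4.0], [1.5, 2.5, 2.0, 3.5]])
# ... while chains stuck in separate regions give R-hat far above 1.
bad = gelman_rubin([[0.0, 0.1, -0.1, 0.05], [10.0, 10.1, 9.9, 10.05]])
```

Values well above 1 indicate the chains have not mixed; hypothetical toy chains are used here in place of a real trace.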
@@ -268,34 +265,32 @@ with weibull_model:
 The posterior predictive survival times show that, on average, patients whose cancer had not metastized survived longer than those whose cancer had metastized.
 
 ```{code-cell} ipython3
-t_plot = np.linspace(0, 230, 100)
+t_plot = xr.DataArray(np.linspace(0, 230, 100), dims=["time"])
+pp_samples = az.extract(pp_weibull_trace.posterior_predictive["events"])
 
-weibull_pp_surv = np.greater_equal.outer(
-    np.exp(
-        y.mean()
-        + y.std() * az.extract(pp_weibull_trace.posterior_predictive["events"])["events"].values
-    ),
-    t_plot,
-)
-weibull_pp_surv_mean = weibull_pp_surv.mean(axis=1)
+linear_pred = y.mean() + y.std() * pp_samples
+
+weibull_pp_surv = np.exp(linear_pred) >= t_plot
+
+weibull_pp_surv_mean = weibull_pp_surv.mean(dim="sample")
 ```
 
 ```{code-cell} ipython3
 fig, ax = plt.subplots(figsize=(8, 6))
 
 
-ax.plot(t_plot, weibull_pp_surv_mean[0], c=blue, label="Not metastized")
-ax.plot(t_plot, weibull_pp_surv_mean[1], c=red, label="Metastized")
-
-ax.set_xlim(0, 230)
-ax.set_xlabel("Weeks since mastectomy")
+ax.plot(t_plot, weibull_pp_surv_mean[0], label="Not metastized")
+ax.plot(t_plot, weibull_pp_surv_mean[1], label="Metastized")
 
-ax.set_ylim(top=1)
+ax.set(
+    xlabel="Weeks since mastectomy",
+    ylabel="Survival probability",
+    title="Weibull survival regression model",
+    xlim=(0, 230),
+    ylim=(0, 1),
+)
+ax.legend()
 ax.yaxis.set_major_formatter(pct_formatter)
-ax.set_ylabel("Survival probability")
-
-ax.legend(loc=1)
-ax.set_title("Weibull survival regression model");
 ```
 
 ### Log-logistic survival regression
@@ -342,11 +337,11 @@ with log_logistic_model:
 All of the sampling diagnostics look good for this model.
 
 ```{code-cell} ipython3
-az.plot_energy(log_logistic_trace, fill_color=("C0", "C1"));
+az.plot_energy(log_logistic_trace);
 ```
 
 ```{code-cell} ipython3
-max(np.max(gr_stats) for gr_stats in az.rhat(log_logistic_trace).values())
+az.rhat(log_logistic_trace).to_array().max()
 ```
 
 Again, we calculate the posterior expected survival functions for this model.
@@ -360,35 +355,35 @@ with log_logistic_model:
 ```
 
 ```{code-cell} ipython3
-log_logistic_pp_surv = np.greater_equal.outer(
-    np.exp(
-        y.mean()
-        + y.std()
-        * az.extract(pp_log_logistic_trace.posterior_predictive["events"])["events"].values
-    ),
-    t_plot,
-)
-log_logistic_pp_surv_mean = log_logistic_pp_surv.mean(axis=1)
+pp_samples = az.extract(pp_log_logistic_trace.posterior_predictive["events"])
+
+linear_pred = y.mean() + y.std() * pp_samples
+
+log_logistic_pp_surv = np.exp(linear_pred) >= t_plot
+
+log_logistic_pp_surv_mean = log_logistic_pp_surv.mean(dim="sample")
 ```
 
 ```{code-cell} ipython3
 fig, ax = plt.subplots(figsize=(8, 6))
 
-ax.plot(t_plot, weibull_pp_surv_mean[0], c=blue, label="Weibull, not metastized")
-ax.plot(t_plot, weibull_pp_surv_mean[1], c=red, label="Weibull, metastized")
+ax.plot(t_plot, weibull_pp_surv_mean[0], c="C0", label="Weibull, not metastized")
+ax.plot(t_plot, weibull_pp_surv_mean[1], c="C1", label="Weibull, metastized")
 
-ax.plot(t_plot, log_logistic_pp_surv_mean[0], "--", c=blue, label="Log-logistic, not metastized")
-ax.plot(t_plot, log_logistic_pp_surv_mean[1], "--", c=red, label="Log-logistic, metastized")
+ax.plot(t_plot, log_logistic_pp_surv_mean[0], "--", c="C0", label="Log-logistic, not metastized")
+ax.plot(t_plot, log_logistic_pp_surv_mean[1], "--", c="C1", label="Log-logistic, metastized")
 
-ax.set_xlim(0, 230)
-ax.set_xlabel("Weeks since mastectomy")
 
-ax.set_ylim(top=1)
-ax.yaxis.set_major_formatter(pct_formatter)
-ax.set_ylabel("Survival probability")
+ax.set(
+    xlabel="Weeks since mastectomy",
+    ylabel="Survival probability",
+    title="Weibull and log-logistic\nsurvival regression models",
+    xlim=(0, 230),
+    ylim=(0, 1),
+)
 
-ax.legend(loc=1)
-ax.set_title("Weibull and log-logistic\nsurvival regression models");
+ax.legend()
+ax.yaxis.set_major_formatter(pct_formatter)
 ```
 
 This post has been a short introduction to implementing parametric survival regression models in PyMC with a fairly simple data set. The modular nature of probabilistic programming with PyMC should make it straightforward to generalize these techniques to more complex and interesting data set.
@@ -400,8 +395,17 @@ This post has been a short introduction to implementing parametric survival regr
 - Originally authored as a blog post by [Austin Rochford](https://austinrochford.com/posts/2017-10-02-bayes-param-survival.html) on October 2, 2017.
 - Updated by [George Ho](https://eigenfoo.xyz/) on July 18, 2018.
 - Updated by @fonnesbeck on September 11, 2024.
+- Updated by Osvaldo Martin in December 2025.
 
 ```{code-cell} ipython3
 %load_ext watermark
 %watermark -n -u -v -iv -w
 ```
+
+```{code-cell} ipython3
+pp_weibull_dt = az.convert_to_datatree(pp_weibull_trace)
+```
+
+```{code-cell} ipython3
+pp_weibull_dt.posterior_predictive
+```
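The refactor in the hunks above swaps `np.greater_equal.outer` for xarray broadcasting over named dimensions, but the estimator itself is unchanged: the posterior-predictive survival probability at time $t$ is the fraction of sampled survival times that reach $t$. A toy pure-Python sketch of that computation, with hypothetical numbers in place of real posterior draws:

```python
# Posterior-predictive survival curve, the quantity the refactored cell computes:
# S(t) is estimated by the fraction of sampled survival times that are >= t.
sampled_times = [12.0, 45.0, 80.0, 200.0, 31.0]  # hypothetical posterior draws
t_grid = [0, 50, 100, 150, 200]

surv = [sum(t_i >= t for t_i in sampled_times) / len(sampled_times) for t in t_grid]
# surv == [1.0, 0.4, 0.2, 0.2, 0.2]
```

In the notebook the same comparison runs over a `sample` dimension and a `time` dimension at once, which is why `np.exp(linear_pred) >= t_plot` yields a 2-D boolean array to average over `sample`.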

examples/survival_analysis/censored_data.ipynb

Lines changed: 158 additions & 219 deletions
Large diffs are not rendered by default.

examples/survival_analysis/censored_data.myst.md

Lines changed: 27 additions & 21 deletions
@@ -22,19 +22,18 @@ kernelspec:
 ```{code-cell} ipython3
 from copy import copy
 
-import arviz as az
+import arviz.preview as az
 import matplotlib.pyplot as plt
 import numpy as np
 import pymc as pm
-import seaborn as sns
 
 from numpy.random import default_rng
 ```
 
 ```{code-cell} ipython3
 %config InlineBackend.figure_format = 'retina'
 rng = default_rng(1234)
-az.style.use("arviz-darkgrid")
+az.style.use("arviz-variat")
 ```
 
 [This example notebook on Bayesian survival
@@ -59,7 +58,7 @@ Censored data arises in many modelling problems. Two common examples are:
    range of temperatures.
 
 This example notebook presents two different ways of dealing with censored data
-in PyMC3:
+in PyMC:
 
 1. An imputed censored model, which represents censored data as parameters and
    makes up plausible values for all censored values. As a result of this
@@ -101,8 +100,8 @@ censored = censor(samples, low, high)
 # Visualize uncensored and censored data
 _, ax = plt.subplots(figsize=(10, 3))
 edges = np.linspace(-5, 35, 30)
-ax.hist(samples, bins=edges, density=True, histtype="stepfilled", alpha=0.2, label="Uncensored")
-ax.hist(censored, bins=edges, density=True, histtype="stepfilled", alpha=0.2, label="Censored")
+ax.hist(samples, bins=edges, density=True, histtype="stepfilled", alpha=0.4, label="Uncensored")
+ax.hist(censored, bins=edges, density=True, histtype="stepfilled", alpha=0.4, label="Censored")
 [ax.axvline(x=x, c="k", ls="--") for x in [low, high]]
 ax.legend();
 ```
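The `censor` helper called in the hunk above (defined earlier in that notebook) clips draws to the observation bounds. A plain-Python sketch of the idea, with hypothetical values, also shows why naive point estimates computed on censored data shrink toward the bounds:

```python
import statistics

def censor(values, low, high):
    # Clip observations outside [low, high] to the bounds (interval censoring).
    return [min(max(v, low), high) for v in values]

samples = [-8.0, 2.0, 10.0, 18.0, 40.0]  # hypothetical uncensored draws
censored = censor(samples, low=-5.0, high=35.0)
# censored == [-5.0, 2.0, 10.0, 18.0, 35.0]

# The spread of the censored data understates the true spread, which is
# exactly the bias the censored models below are designed to remove.
naive_spread = statistics.pstdev(censored)  # < statistics.pstdev(samples)
```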
@@ -124,13 +123,16 @@ We should predict that running the uncensored model on uncensored data, we will
 uncensored_model_1 = uncensored_model(samples)
 with uncensored_model_1:
     idata = pm.sample()
+```
 
-az.plot_posterior(idata, ref_val=[true_mu, true_sigma], round_to=3);
+```{code-cell} ipython3
+pc = az.plot_dist(idata)
+az.add_lines(pc, {"mu": true_mu, "sigma": true_sigma});
 ```
 
 And that is exactly what we find.
 
-The problem however, is that in censored data contexts, we do not have access to the true values. If we were to use the same uncensored model on the censored data, we would anticipate that our parameter estimates will be biased. If we calculate point estimates for the mean and std, then we can see that we are likely to underestimate the mean and std for this particular dataset and censor bounds.
+The problem, however, is that in censored data contexts, we do not have access to the true values. If we were to use the same uncensored model on the censored data, we would anticipate that our parameter estimates will be biased. If we calculate point estimates for the mean and standard deviation, then we can see that we are likely to underestimate the mean and standard deviation for this particular dataset and censor bounds.
 
 ```{code-cell} ipython3
 print(f"mean={np.mean(censored):.2f}; std={np.std(censored):.2f}")
@@ -140,15 +142,18 @@ print(f"mean={np.mean(censored):.2f}; std={np.std(censored):.2f}")
 uncensored_model_2 = uncensored_model(censored)
 with uncensored_model_2:
     idata = pm.sample()
-
-az.plot_posterior(idata, ref_val=[true_mu, true_sigma], round_to=3);
 ```
 
-The figure above confirms this.
+As expected, we see that both the mean and standard deviation are underestimated when using the uncensored model on censored data.
+
+```{code-cell} ipython3
+pc = az.plot_dist(idata)
+az.add_lines(pc, {"mu": true_mu, "sigma": true_sigma});
+```
 
 ## Censored data models
 
-The models below show 2 approaches to dealing with censored data. First, we need to do a bit of data pre-processing to count the number of observations that are left or right censored. We also also need to extract just the non-censored data that we observe.
+The models below show two approaches to dealing with censored data. First, we need to do a bit of data pre-processing to count the number of observations that are left or right censored. We also need to extract just the non-censored data that we observe.
 
 +++
 
@@ -187,11 +192,14 @@ with pm.Model() as imputed_censored_model:
     )
     observed = pm.Normal("observed", mu=mu, sigma=sigma, observed=uncensored, shape=int(n_observed))
     idata = pm.sample()
+```
 
-az.plot_posterior(idata, var_names=["mu", "sigma"], ref_val=[true_mu, true_sigma], round_to=3);
+```{code-cell} ipython3
+pc = az.plot_dist(idata, var_names=["mu", "sigma"])
+az.add_lines(pc, {"mu": true_mu, "sigma": true_sigma});
 ```
 
-We can see that the bias in the estimates of the mean and variance (present in the uncensored model) have been largely removed.
+We can see that the bias in the estimates of the mean and standard deviation (present in the uncensored model) has been largely removed.
 
 +++
 
@@ -205,15 +213,12 @@ with pm.Model() as unimputed_censored_model:
     sigma = pm.HalfNormal("sigma", sigma=(high - low) / 2.0)
     y_latent = pm.Normal.dist(mu=mu, sigma=sigma)
    obs = pm.Censored("obs", y_latent, lower=low, upper=high, observed=censored)
+    idata = pm.sample()
 ```
 
-Sampling
-
 ```{code-cell} ipython3
-with unimputed_censored_model:
-    idata = pm.sample()
-
-az.plot_posterior(idata, var_names=["mu", "sigma"], ref_val=[true_mu, true_sigma], round_to=3);
+pc = az.plot_dist(idata, var_names=["mu", "sigma"])
+az.add_lines(pc, {"mu": true_mu, "sigma": true_sigma});
 ```
 
 Again, the bias in the estimates of the mean and variance (present in the uncensored model) have been largely removed.
@@ -231,7 +236,8 @@ As we can see, both censored models appear to capture the mean and variance of t
 - Originally authored by [Luis Mario Domenzain](https://github.com/domenzain) on Mar 7, 2017.
 - Updated by [George Ho](https://github.com/eigenfoo) on Jul 14, 2018.
 - Updated by [Benjamin Vincent](https://github.com/drbenvincent) in May 2021.
-- Updated by [Benjamin Vincent](https://github.com/drbenvincent) in May 2022 to PyMC v4.
+- Updated by [Benjamin Vincent](https://github.com/drbenvincent) in May 2022.
+- Updated by [Osvaldo Martin](https://github.com/aloctavodia) in Dec 2025.
 
 +++
 
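The `pm.Censored` wrapper used in the hunk above assigns an interval-censored observation the probability mass beyond the bound, rather than the density at the clipped value. A rough single-observation sketch of that likelihood for a normal latent variable, using only the standard library (this is an illustration of the principle, not PyMC's implementation):

```python
import math

def normal_cdf(x, mu, sigma):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def censored_normal_loglik(obs, mu, sigma, low, high):
    # Observations at (or clipped to) a bound contribute the tail probability;
    # interior observations contribute the ordinary normal log-density.
    if obs <= low:
        return math.log(normal_cdf(low, mu, sigma))
    if obs >= high:
        return math.log(1.0 - normal_cdf(high, mu, sigma))
    z = (obs - mu) / sigma
    return -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
```

Summing this over the data gives the model's likelihood without imputing latent values, which is why the unimputed model above needs no per-observation censored parameters.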
