Implement Urban Mental Health Option 1: Tree Canopy Cover + NDVI Inputs #2314

Draft
claire-simpson wants to merge 13 commits into natcap:feature/urban-health-model from claire-simpson:feature/2141-umh-tcc-option1

@claire-simpson (Contributor) commented Jan 24, 2026

Description

Implements Option 1 (tree canopy cover–based scenarios) for the Urban Mental Health model by translating a user-defined tree cover target into an NDVI-based nature exposure scenario (to create 'alternate NDVI').

This PR adds a population-weighted, non-linear translation between tree canopy cover (%) and NDVI exposure, following the framework described in the UMH design document.

The steps are as follows:

  1. Align input rasters
  2. Mask baseline NDVI (mask out water or other excluded LULC classes or mask by 0 threshold)
  3. Compute neighborhood NDVI exposure by convolving masked NDVI using the search_radius
  4. Compute neighborhood tree canopy cover (TCC) exposure by convolving TCC using the same search_radius
  5. Extract exposure values block-wise: iterate over aligned blocks of buffer-mean NDVI exposure, buffer-mean TCC exposure, and population (masking no data values and pixels with no population)
  6. Bin TCC values: assign each valid pixel to a TCC bin (range: [0, 100])
  7. Compute population-weighted mean NDVI per bin
  8. Fit a linear GAM using population per bin as weights (so bins with more people influence the fit more), to get a function f mapping TCC exposure to NDVI exposure
  9. Evaluate the fitted function at the user-specified tree cover target value to get the NDVI target
  10. Generate the alternate NDVI exposure raster via: NDVI_alt = NDVI_base + (NDVI_target - f(TCC_pixel))
  11. Compute change in nature exposure via NE_delta = NDVI_alt - NDVI_base
  12. Mask out negative values in NE_delta
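
As a rough illustration of steps 6 through 12, the NumPy sketch below bins TCC exposure, computes the population-weighted mean NDVI per bin, and applies the translation formula. It is not the code in this PR: `weighted_bin_means` and `ndvi_delta` are hypothetical names, `numpy.interp` stands in for the fitted GAM, NaN stands in for a raster nodata value, and the toy pixel values are made up.

```python
import numpy


def weighted_bin_means(tcc, ndvi, pop, bin_edges):
    """Population-weighted mean NDVI per TCC bin (steps 6 and 7)."""
    n_bins = len(bin_edges) - 1
    # digitize returns 1-based bin numbers; clip so TCC == 100 lands in the last bin.
    idx = numpy.clip(numpy.digitize(tcc, bin_edges) - 1, 0, n_bins - 1)
    # In a block-wise workflow these two sums would be accumulated across blocks.
    pop_sum = numpy.bincount(idx, weights=pop, minlength=n_bins)
    ndvi_sum = numpy.bincount(idx, weights=pop * ndvi, minlength=n_bins)
    with numpy.errstate(invalid='ignore', divide='ignore'):
        mean_ndvi = ndvi_sum / pop_sum  # NaN where a bin holds no population
    return mean_ndvi, pop_sum


def ndvi_delta(ndvi_base, tcc, f, tc_target):
    """Steps 9 to 12: translate a TCC target into a change in NDVI exposure."""
    ndvi_target = f(tc_target)
    ndvi_alt = ndvi_base + (ndvi_target - f(tcc))
    delta = ndvi_alt - ndvi_base
    # Mask negative changes; NaN is used here in place of a raster nodata value.
    return numpy.where(delta < 0, numpy.nan, delta)


# Toy flattened pixels (already masked for nodata and zero population).
tcc = numpy.array([5.0, 15.0, 15.0, 25.0])
ndvi = numpy.array([0.1, 0.2, 0.4, 0.5])
pop = numpy.array([10.0, 30.0, 10.0, 5.0])
edges = numpy.array([0.0, 10.0, 20.0, 30.0])

mean_ndvi, pop_per_bin = weighted_bin_means(tcc, ndvi, pop, edges)
centers = (edges[:-1] + edges[1:]) / 2


def f(x):
    # Piecewise-linear stand-in for the fitted TCC-to-NDVI curve.
    return numpy.interp(x, centers, mean_ndvi)


delta = ndvi_delta(ndvi, tcc, f, tc_target=15.0)
```

In this toy case the pixel whose TCC exposure already exceeds the target gets a negative change and is masked, while pixels at the target get a change of zero.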

Notes:

  • Population is used only as a weighting factor, not as a spatial transform of TCC or NDVI.
  • Both NDVI and TCC are evaluated on the same neighborhood (buffer) scale, so:
    • The GAM learns a relationship between experienced canopy cover and experienced greenness, not pixel-level vegetation
    • Alternate NDVI is generated directly on the exposure scale, so no additional convolution is required after translation.
  • Tests are not complete (see Add/update tests for Urban Mental Health #2316), and I haven't added pygam as an InVEST dependency (see below)

Open Questions

  1. Is the above workflow/math correct? Specifically: (a) computing the mean within a buffer distance (i.e., a 2D convolution) for both NDVI and TCC before fitting the GAM and translating TCC to alternate NDVI, and (b) using population both to compute a population-weighted conditional mean of NDVI for each TCC "bin" and as a weight when fitting the GAM (population is not explicitly used to create a population-weighted TCC layer).
  2. Are we ok adding pygam as a dependency? There are definitely alternative options like using scipy.interpolate.UnivariateSpline. If so, I'd need to add pygam to requirements.txt.
  3. Are we ok with the binning approach to reduce memory use? In a comment in the design doc, Yingjie clarified that their inputs to the GAM were aggregated at the tract level before fitting this model (to avoid loading the entire NDVI, TCC, and population rasters into memory). However, in our implementation we are not requiring users to input tracts (though we could!). There are other approaches we could take to fitting the TCC-NDVI relationship, including:
    • Random spatial sampling of NDVI and TCC rasters (within iterblocks)
    • Spatial window aggregation: iterate over fixed-size windows, compute mean NDVI and TCC per window, and fit the GAM on window-level summaries (or just downsample both rasters to have fewer pixels)
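
For comparison, the spatial window aggregation alternative could look roughly like the sketch below. `block_mean` is a hypothetical helper, not part of this PR. It assumes nodata has already been converted to NaN, trims any partial windows at the raster edge, and ignores population weighting.

```python
import numpy


def block_mean(array, window):
    """Mean of each non-overlapping window x window block of a 2D array."""
    rows, cols = array.shape
    # Trim rows and columns that do not fill a whole window (a simplification).
    rows -= rows % window
    cols -= cols % window
    trimmed = array[:rows, :cols]
    blocks = trimmed.reshape(rows // window, window, cols // window, window)
    # nanmean skips NaN pixels, which stand in for nodata here.
    return numpy.nanmean(blocks, axis=(1, 3))


toy_ndvi = numpy.arange(16, dtype=float).reshape(4, 4)
ndvi_windows = block_mean(toy_ndvi, window=2)
```

Applying the same aggregation to the TCC raster would give paired window-level values to fit the curve on.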

Fixes #2141

Checklist

  • Updated HISTORY.rst and link to any relevant issue (if these changes are user-facing)
  • Updated the user's guide (if needed)
  • Tested the Workbench UI (if relevant)

@claire-simpson changed the base branch from main to feature/urban-health-model January 24, 2026 00:01
numpy.testing.assert_allclose(
    actual_mean_ndvi[key], expected_mean_ndvi[key], atol=1e-6)

# def test_option1_tcc_input(self):
Contributor Author

Template for test for whole model, but I commented it out because there is still uncertainty around population weighting and whether to convolve TCC before fitting GAM - see #2316


curve_smooth = gam.predict(centers.reshape(-1, 1))

fig, ax = plt.subplots()
Contributor Author

I believe matplotlib is a new dependency as of the reports PR so I assume this would be the first time a graph is created within a model. This was an intermediate output of the demo model and seems useful for interpreting the relationship between TCC and NDVI, so I think it'd be great to ultimately include in the report. However maybe saving this as a standalone isn't needed?

Member

If it feels like a useful standalone output, I think it's fine to save on its own along with adding to the report if it'd be useful there too.

It might be worth thinking about whether we should be saving the numpy arrays for centers, curve, and others as intermediate outputs.
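
If those arrays were saved as intermediate outputs, one option would be a single `.npz` file written with `numpy.savez`. The sketch below is only an illustration: the file name, the temporary directory, and the placeholder `centers` and `curve` values are assumptions, not outputs of the model.

```python
import os
import tempfile

import numpy

# Placeholder arrays standing in for the fitted curve (not real model output).
centers = numpy.linspace(0.5, 99.5, 100)
curve = 0.1 + 0.004 * centers

# Hypothetical intermediate file; a real model would use its workspace directory.
curve_path = os.path.join(tempfile.mkdtemp(), 'tc_ndvi_curve.npz')
numpy.savez(curve_path, centers=centers, curve=curve)

# Reading the arrays back by name.
with numpy.load(curve_path) as data:
    loaded_centers = data['centers']
    loaded_curve = data['curve']
```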

@claire-simpson marked this pull request as ready for review January 29, 2026 19:06
@dcdenu4 (Member) left a comment

Thanks @claire-simpson! I don't have too many comments but think it'd be best to walk through the curve fitting part on a call, after we talk with Yingjie, or with Yingjie too!

import matplotlib.pyplot as plt
import numpy
import pandas
from pygam import LinearGAM, s  # Are we ok to add pygam as a new InVEST dependency? Alternatively, could use scipy.interpolate.UnivariateSpline
Member

I'd be interested in talking about how scipy.interpolate.UnivariateSpline could be an alternative.
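
As a starting point for that discussion, a weighted smoothing spline over the binned values might look like the sketch below. The bin centers, mean NDVI values, and populations are made up, and the smoothing factor `s` is an arbitrary tuning assumption. `UnivariateSpline` multiplies each residual by `w` before squaring, so the square root of the population share is used here to make squared errors scale with population.

```python
import numpy
from scipy.interpolate import UnivariateSpline

# Made-up binned inputs: TCC bin centers, weighted mean NDVI, population per bin.
centers = numpy.array([5.0, 15.0, 25.0, 35.0, 45.0, 55.0])
mean_ndvi = numpy.array([0.10, 0.18, 0.27, 0.33, 0.38, 0.41])
pop_per_bin = numpy.array([100.0, 400.0, 900.0, 600.0, 200.0, 50.0])

# w scales residuals before squaring, hence the square root.
weights = numpy.sqrt(pop_per_bin / pop_per_bin.sum())

# Larger s gives a smoother curve; s=0 would interpolate the bin means exactly.
spline = UnivariateSpline(centers, mean_ndvi, w=weights, k=3, s=1e-4)

# Evaluate the fitted curve at a hypothetical tree cover target of 30%.
ndvi_target = float(spline(30.0))
```

The x values must be increasing, which sorted bin centers satisfy, and empty bins would need to be dropped before fitting.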

mental disorder cases at the pixel level, based on the selected urban
greening scenario.

Args:
Member

Has this docstring been keeping pace with the MODEL_SPEC updates? In terms of descriptive text and required / optional flags.

if args['scenario'] == 'tcc_ndvi':
    LOGGER.info("Using Tree Canopy Cover and NDVI inputs")
    mean_buffered_tcc_task = task_graph.add_task(
        func=pygeoprocessing.convolve_2d,
Member

Yep, I think using a dichotomous kernel and convolving with normalize_kernel=True gets you a mean value within the given kernel radius.
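
A small NumPy check of that reasoning, independent of pygeoprocessing: a kernel that is 1 inside the radius and 0 outside, divided by its sum, yields the plain mean of the pixels within the radius. `disk_kernel` and `kernel_value_at` are hypothetical helpers. The sketch evaluates a single interior pixel and ignores nodata and raster edges.

```python
import numpy


def disk_kernel(radius_px):
    """Dichotomous kernel: 1 inside the radius, 0 outside, normalized to sum to 1."""
    y, x = numpy.ogrid[-radius_px:radius_px + 1, -radius_px:radius_px + 1]
    kernel = (x**2 + y**2 <= radius_px**2).astype(float)
    return kernel / kernel.sum()


def kernel_value_at(array, kernel, row, col):
    """Kernel-weighted sum at one interior pixel.

    The disk is symmetric, so this equals the convolution result there.
    """
    r = kernel.shape[0] // 2
    window = array[row - r:row + r + 1, col - r:col + r + 1]
    return float((window * kernel).sum())


toy = numpy.arange(25, dtype=float).reshape(5, 5)
kernel = disk_kernel(1)  # plus-shaped: the center pixel and its 4 neighbors
value = kernel_value_at(toy, kernel, 2, 2)

# Plain mean of the same five pixels, for comparison.
expected = numpy.mean([toy[1, 2], toy[2, 1], toy[2, 2], toy[2, 3], toy[3, 2]])
```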

file_registry['tree_cover_buffer_mean'],
args['tree_cover_target'],
file_registry['ndvi_alt_buffer_mean'],
file_registry['result_fig_tc_ndvi_plot']),
Member

Should the plot output also be in the target_path_list below?

Comment on lines 1363 to 1365
Writes alt NDVI raster where each pixel's NDVI is increased based on
the difference between the target NDVI (based on tc_target) and the
NDVI predicted by the TC-->NDVI curve at that pixel's tree cover value.
Member

Should we mention how population is used at all for weighting?

population_path (str): path to population raster
tree_cover_path (str): path to tree cover raster with pixels in
range [0, 100]
tc_target (float): target tree canopy cover value (in range [0,100])
Member

Suggested change
- tc_target (float): target tree canopy cover value (in range [0,100])
+ tc_target (float): target tree canopy cover value as a percentage (in range [0,100])

None
"""

centers, curve = _fit_tc_to_ndvi_curve(
Member

I'm wondering if this should be its own taskgraph step in the workflow instead of being called in here? Benefits could include avoiding re-computation, a more modular step-by-step breakout in execute, and maybe more targeted testing.

Args:
base_ndvi_path (str): path to baseline NDVI raster
tree_cover_path (str): path to tree cover raster
population_path (str): path to population raster
Member

Density or count? For future us mostly.


file_registry['population_aligned'],
file_registry['tree_cover_buffer_mean'],
args['tree_cover_target'],
file_registry['ndvi_alt_buffer_mean'],
Member

Maybe it's worth updating ndvi_alt_buffer_mean, since this function isn't really returning a buffered mean, right?

@claire-simpson marked this pull request as draft February 18, 2026 23:41
@claire-simpson added the "on hold" label (There's a reason we're not working on this yet) Feb 18, 2026
Labels

on hold (There's a reason we're not working on this yet)

Development

Successfully merging this pull request may close these issues.

Urban Mental Health Model: implement scenario 1 (NDVI + TCC)

2 participants