
Enable running subsets of input data. #1492

@davidorme


At the moment, there isn't a clean path to run a spatial or temporal subset of a dataset for a site. This would be an invaluable exploratory tool for calibration, validation and profiling, because it reduces the computational demand of runs that are intended to explore model behaviour rather than simulate the entire site.

We do have some elements in place:

  • The grid config sets the coordinates of the cells expected in spatial inputs. Under the hood, the validation of those coordinates for the gridded data complains if data is missing, but not if additional data is present. I don't think this currently carries forward into loading only the requested cells, but it could.

  • The only other spatially explicit data at the moment is the plant community data. That currently uses cell_id but needs to switch to using XY coordinates for cells (cell_id is supposed to be an internal mechanism and that genie needs stuffing back in the bottle). We could equally filter this data by the requested cells.

  • We have a debug mechanism in place for truncating the number of time steps. It would be better to have this select dates from within the provided data, but we do at least have something functional for now.
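
As a rough illustration of how the three pieces above could fit together, here is a minimal sketch in plain Python. It is not the existing loader code: the function names (`subset_grid`, `subset_communities`, `subset_dates`), the dict-based data layout and the example site are all hypothetical, and it assumes cell coordinates match exactly (real data would probably need a tolerance).

```python
from datetime import date

# Hypothetical type alias: a cell is identified by its XY coordinates.
Coord = tuple[float, float]


def subset_grid(data: dict[Coord, float], requested: list[Coord]) -> dict[Coord, float]:
    """Keep only the requested cells; fail if a requested cell has no data."""
    missing = [xy for xy in requested if xy not in data]
    if missing:
        raise ValueError(f"Input data is missing requested cells: {missing}")
    # Extra cells present in the input are dropped rather than loaded.
    return {xy: data[xy] for xy in requested}


def subset_communities(records: list[dict], requested: list[Coord]) -> list[dict]:
    """Filter plant community records by cell XY coordinates, not cell_id."""
    wanted = set(requested)
    return [rec for rec in records if (rec["x"], rec["y"]) in wanted]


def subset_dates(times: list[date], start: date, end: date) -> list[int]:
    """Return the indices of time steps that fall within [start, end]."""
    return [i for i, t in enumerate(times) if start <= t <= end]


# Hypothetical 2 x 2 site with one variable and monthly time steps.
site = {
    (0.0, 0.0): 21.5,
    (100.0, 0.0): 22.1,
    (0.0, 100.0): 20.9,
    (100.0, 100.0): 21.7,
}
communities = [
    {"x": 0.0, "y": 0.0, "pft": "tree", "n": 12},
    {"x": 100.0, "y": 100.0, "pft": "shrub", "n": 40},
]
time_axis = [date(2014, month, 1) for month in range(1, 13)]

# The subset a configuration might request: two cells and three months.
requested_cells = [(0.0, 0.0), (100.0, 0.0)]
grid_subset = subset_grid(site, requested_cells)
community_subset = subset_communities(communities, requested_cells)
time_indices = subset_dates(time_axis, date(2014, 3, 1), date(2014, 5, 31))
```

The same requested cell list drives both the gridded data and the community data, so a single configuration entry would be enough to define the spatial subset, and the date window replaces a fixed count of time steps.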

These changes would give us a huge advantage in terms of data preparation effort. Rather than having to generate and maintain cut-down versions of data files containing spatial subsets, we could keep a single site directory and use the model configuration to run spatial subsets flexibly.

@kirkImperial This could be particularly useful for profiling - is it something you could scope out?
