Memory explosion when disaggregating shapefiles with many regions #15

@irm-codebase

I ran into this while trying to disaggregate Mexico's rooftop PV potential on a decently sized laptop.
Disaggregating onto the raster consumes a very large amount of memory, which leads to crashes.

The files themselves are not that large (the GeoTIFF is ~30 MB, the GeoParquet is ~40 MB).
What sets this case apart is the number of regions (2000+).

I suspect this may be caused by rioxarray expanding the data along an extra dimension, with one entry per shape, during processing.
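To put a rough number on that suspicion, here is a back-of-the-envelope sketch of what a dense (region, y, x) array would cost. The raster dimensions below are hypothetical (the real size of the GeoTIFF is not stated in this issue), and `dense_mask_bytes` is just an illustrative helper, not part of gregor or rioxarray.

```python
import numpy as np


def dense_mask_bytes(n_regions, ny, nx, dtype=np.float64):
    """Bytes needed to hold one full-size raster layer per region."""
    return n_regions * ny * nx * np.dtype(dtype).itemsize


# Hypothetical raster size, chosen only for illustration.
ny, nx = 10_000, 10_000

single = dense_mask_bytes(1, ny, nx)
stacked = dense_mask_bytes(2000, ny, nx)

print(f"one layer:   {single / 2**30:.2f} GiB")
print(f"2000 layers: {stacked / 2**30:.0f} GiB")
```

If something like this happens internally, memory grows linearly with the number of shapes, so a raster that fits comfortably on its own would stop fitting once it is stacked 2000 times.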

Specs

  • OS: Fedora 42
  • RAM: 16 GB DDR5 SODIMM
  • Swap: 8 GB
  • CPU: i7 13k w/ 20 CPU threads

Reproduction

Input files

The input files can be found here:
https://surfdrive.surf.nl/files/index.php/s/Z4bLHF38tc87T6J

Script

# %%
import math

import geopandas as gpd
import gregor
import pandas as pd
import rioxarray as rxr
from matplotlib import pyplot as plt

case = "MEX"
year = 2023

# %%
shapes_df = gpd.read_parquet(f"downloads/{case}/{case}.parquet")
countries_df = shapes_df[["country_id", "geometry"]].dissolve("country_id").reset_index()
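# Take an explicit copy so that adding a column later does not trigger SettingWithCopyWarning
case_df = countries_df[countries_df["country_id"] == case].copy()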
case_df.plot()

# %%
area_potential = rxr.open_rasterio(
    f"downloads/{case}/{case}.tif",
    chunks={"x": 1024, "y": 1024},
).squeeze()
area_potential = area_potential.rio.write_crs("EPSG:4326")

# %%
# Decide on a maximum number of pixels in the final plot
max_pixels = 50_000_000  # tweak this to taste

# Compute needed coarsening factor
nx, ny = area_potential.sizes["x"], area_potential.sizes["y"]
factor = math.ceil(math.sqrt((nx * ny) / max_pixels))
pixel_count = (nx // factor) * (ny // factor)
print(
    f"Downsampling factor: {factor} (output will be ~{pixel_count} pixels)"
)

# Coarsen (block-average) the data
coarse = area_potential.coarsen(x=factor, y=factor, boundary="trim").mean()

# Set up plot
fig, ax = plt.subplots(figsize=(8, 6), layout="constrained")


# Plot full extent of the coarsened raster
case_df.to_crs(area_potential.rio.crs).geometry.boundary.plot(
    ax=ax, color="black", aspect=None, linewidth=0.3, alpha=0.2
)
coarse.plot.imshow(
    ax=ax,
    cmap="Oranges",
    vmax=500,
    add_colorbar=True,
    cbar_kwargs={"location": "bottom", "label": "Area potential for PV"},
    alpha=1
)
ax.set_aspect("equal")
ax.set_xlabel("Longitude")
ax.set_ylabel("Latitude")
ax.set_title(f"{case} Rooftop PV Potential used for aggregation\n"
             f"(figure coarsened to ~{pixel_count:.1e} pixels)")

plt.show()

# %%
# WARNING: crash below!
case_df["to_disaggregate"] = 1282  # dummy value
aggregated_pv = gregor.disaggregate.disaggregate_polygon_to_raster(
    case_df, column="to_disaggregate", proxy=area_potential
)
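For comparison, a minimal sketch of a disaggregation that never builds a per-region raster stack. It assumes the shapes have already been burned into a single 2D integer label array (for example with a rasterization step such as `rasterio.features.rasterize`), which is not shown here. The function `disaggregate_by_labels` and its arguments are hypothetical and are not gregor's API; this only illustrates that memory can stay at a few raster-sized arrays regardless of the number of regions.

```python
import numpy as np


def disaggregate_by_labels(labels, proxy, values):
    """Spread each region's total over its pixels, weighted by the proxy.

    labels: 2D int array with the region index per pixel (-1 = outside all regions)
    proxy:  2D float array of the same shape (NaN is treated as 0)
    values: 1D array with one total per region
    """
    values = np.asarray(values, dtype=float)
    inside = labels >= 0
    flat_labels = labels[inside]
    flat_proxy = np.nan_to_num(proxy[inside], nan=0.0)

    # Per-region proxy sums, computed in one pass over the pixels.
    sums = np.bincount(flat_labels, weights=flat_proxy, minlength=len(values))

    # Regions whose proxy sums to zero receive no allocation.
    safe_sums = np.where(sums > 0, sums, 1.0)
    share = flat_proxy / safe_sums[flat_labels]

    out = np.zeros(proxy.shape, dtype=float)
    out[inside] = share * values[flat_labels]
    return out


# Tiny example with two regions.
labels = np.array([[0, 0, 1], [-1, 1, 1]])
proxy = np.array([[1.0, 3.0, 2.0], [5.0, 2.0, 0.0]])
result = disaggregate_by_labels(labels, proxy, values=[8.0, 4.0])
print(result)
```

Each region's total is preserved as long as its proxy sum is non-zero, and peak memory is a handful of arrays the size of the raster instead of one layer per shape.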

Dependencies

❯ pixi list --explicit
Package                Version     Build               Size       Kind   Source
cartopy                0.24.0      py312hf9745cd_0     1.5 MiB    conda  https://conda.anaconda.org/conda-forge/
click                  8.2.1       pyh707e725_0        85.7 KiB   conda  https://conda.anaconda.org/conda-forge/
clio-tools             2025.03.03  pyhd8ed1ab_0        14.2 KiB   conda  https://conda.anaconda.org/conda-forge/
conda                  25.3.1      py312h7900ff3_0     1.1 MiB    conda  https://conda.anaconda.org/conda-forge/
contextily             1.6.2       pyhd8ed1ab_1        20.3 KiB   conda  https://conda.anaconda.org/conda-forge/
dask                   2025.5.1    pyhe01879c_1        11.1 KiB   conda  https://conda.anaconda.org/conda-forge/
gdal                   3.10.3      py312hf1b357c_11    1.7 MiB    conda  https://conda.anaconda.org/conda-forge/
geopandas              1.0.1       pyhd8ed1ab_3        7.4 KiB    conda  https://conda.anaconda.org/conda-forge/
gregor                 0.0.3.dev0                                 pypi   git+https://github.com/jnnr/gregor.git?rev=4d54d11#4d54d1167ebb78de553c0439374ab936c03923ad
ipdb                   0.13.13     pyhd8ed1ab_1        18.3 KiB   conda  https://conda.anaconda.org/conda-forge/
ipykernel              6.29.5      pyh3099207_0        116.3 KiB  conda  https://conda.anaconda.org/conda-forge/
libgdal-arrow-parquet  3.10.3      h8ae71d8_11         807.9 KiB  conda  https://conda.anaconda.org/conda-forge/
libgdal-core           3.10.3      hcac4edf_11         10.3 MiB   conda  https://conda.anaconda.org/conda-forge/
mypy                   1.15.0      py312h66e93f0_0     17.8 MiB   conda  https://conda.anaconda.org/conda-forge/
pandera-geopandas      0.24.0      hd8ed1ab_2          7.3 KiB    conda  https://conda.anaconda.org/conda-forge/
pandera-pandas         0.24.0      hd8ed1ab_2          7.3 KiB    conda  https://conda.anaconda.org/conda-forge/
powerplantmatching     0.7.1       pyhd8ed1ab_0        661.1 KiB  conda  https://conda.anaconda.org/conda-forge/
pyarrow                19.0.1      py312h7900ff3_0     24.7 KiB   conda  https://conda.anaconda.org/conda-forge/
pycountry              24.6.1      pyhd8ed1ab_0        3 MiB      conda  https://conda.anaconda.org/conda-forge/
pystac-client          0.8.6       pyhd8ed1ab_0        35 KiB     conda  https://conda.anaconda.org/conda-forge/
pytest                 8.3.5       pyhd8ed1ab_0        253.7 KiB  conda  https://conda.anaconda.org/conda-forge/
python                 3.12.9      h9e4cc4f_1_cpython  30.2 MiB   conda  https://conda.anaconda.org/conda-forge/
rasterio               1.4.3       py312h021bea1_1     7.6 MiB    conda  https://conda.anaconda.org/conda-forge/
richdem                2.3.0       py312h546fd74_12    5.1 MiB    conda  https://conda.anaconda.org/conda-forge/
ruff                   0.11.4      py312h286b59f_0     8.6 MiB    conda  https://conda.anaconda.org/conda-forge/
snakefmt               0.11.0      pyhdfd78af_0        31.2 KiB   conda  https://conda.anaconda.org/bioconda/
snakemake-minimal      9.1.9       pyhdfd78af_0        848.4 KiB  conda  https://conda.anaconda.org/bioconda/
