Add kernel density estimation (KDE)#1170

Merged
brendancol merged 7 commits into master from issue-1143
Apr 6, 2026

@brendancol
Contributor

The repo has rasterize() for burning discrete values onto a grid but nothing for continuous density surfaces. This adds that.

Closes #1143

Summary

  • kde() turns point coordinates into a density raster. Gaussian, Epanechnikov, and quartic kernels. Automatic bandwidth (Silverman's rule) or manual. Optional per-point weights. All four backends: numpy, cupy, dask+numpy, dask+cupy.
  • line_density() does the same for line segments. numpy only for now.
  • 35 tests: correctness, edge cases, kernel types, bandwidth, weights, cross-backend parity (dask, cupy).
  • User guide notebook (examples/user_guide/49_KDE.ipynb) with earthquake cluster and road network examples.
  • Docs: docs/source/reference/kde.rst added, README feature matrix updated.
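As a rough illustration of what the summary describes, here is a minimal, dependency-free sketch of point KDE onto a grid. It is not the code from this PR: `kde_grid`, `silverman_bandwidth`, and all parameter names are hypothetical, the kernels use the standard 2D normalisations, and the bandwidth uses one common 2D form of Silverman's rule (h = sigma * n^(-1/6)) with the two axis spreads averaged, which is an assumption. Whether the real kde() normalises by total weight is not stated here, so this sketch simply sums weighted kernels.

```python
import math
from statistics import pstdev

# 2D kernel profiles as functions of u = distance / bandwidth.
# Each integrates to 1 over the plane (standard textbook normalisations).
KERNELS = {
    "gaussian": lambda u: math.exp(-0.5 * u * u) / (2.0 * math.pi),
    "epanechnikov": lambda u: (2.0 / math.pi) * (1.0 - u * u) if u < 1.0 else 0.0,
    "quartic": lambda u: (3.0 / math.pi) * (1.0 - u * u) ** 2 if u < 1.0 else 0.0,
}


def silverman_bandwidth(xs, ys):
    """One common 2D form of Silverman's rule: h = sigma * n**(-1/6)."""
    n = len(xs)
    # Assumption: use the mean of the per-axis standard deviations as sigma.
    sigma = 0.5 * (pstdev(xs) + pstdev(ys))
    return sigma * n ** (-1.0 / 6.0)


def kde_grid(xs, ys, grid_x, grid_y, bandwidth=None, kernel="gaussian", weights=None):
    """Return density as a list of rows, one row per value in grid_y."""
    h = silverman_bandwidth(xs, ys) if bandwidth is None else bandwidth
    w = [1.0] * len(xs) if weights is None else list(weights)
    k = KERNELS[kernel]
    out = []
    for gy in grid_y:
        row = []
        for gx in grid_x:
            total = 0.0
            for px, py, pw in zip(xs, ys, w):
                u = math.hypot(gx - px, gy - py) / h
                total += pw * k(u)
            # Divide by h**2 so each point contributes unit mass in 2D.
            row.append(total / (h * h))
        out.append(row)
    return out


# Three clustered points and one outlier, evaluated on a 21 x 21 grid.
xs = [0.0, 1.0, 0.5, 4.0]
ys = [0.0, 0.5, 1.0, 4.0]
grid = [i * 0.5 for i in range(-4, 17)]
density = kde_grid(xs, ys, grid, grid)
```

The triple loop is O(cells x points) and only meant to make the arithmetic visible; a real implementation would vectorise it or restrict each point to the cells within its kernel radius.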

Test plan

  • pytest xrspatial/tests/test_kde.py -- 35/35 passing
  • Notebook executes via jupyter nbconvert --execute
  • Verify docs build with make html

_extract_transect was calling .compute() on the full dask array just to
read a handful of transect cells. Now uses vindex fancy indexing so only
the relevant chunks are materialized.
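The vindex approach mentioned above can be sketched as follows. This is an illustration on a synthetic array, not the actual _extract_transect code; the array, chunk sizes, and cell coordinates are made up for the example.

```python
import numpy as np
import dask.array as da

# Synthetic 100 x 100 raster split into 25 x 25 chunks (hypothetical sizes).
elevation = da.from_array(
    np.arange(10_000, dtype="float64").reshape(100, 100),
    chunks=(25, 25),
)

# Row/column indices of the transect cells to read.
rows = np.array([3, 40, 99])
cols = np.array([7, 41, 0])

# vindex does pointwise (NumPy-style fancy) indexing and stays lazy,
# so the graph only pulls the chunks that contain these cells.
transect = elevation.vindex[rows, cols]
values = transect.compute()
```

By contrast, calling elevation.compute() first and indexing the result would load every chunk into memory before reading three cells.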

cumulative_viewshed was allocating a full-size np.zeros count array and
calling .values on each viewshed result, forcing materialization every
iteration. Now accumulates lazily with da.zeros and dask array addition
when the input is dask-backed.

The dask Tier B memory guard underestimated peak usage at 280 bytes/pixel.
Actual peak during lexsort reaches ~360 bytes/pixel (sorted + unsorted
event_list coexist) plus 8 bytes/pixel for the computed raster. Updated
estimate to 368 bytes/pixel to prevent borderline OOM.
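The arithmetic behind the revised estimate can be written out as a small sketch. The constant and function names below are hypothetical and only restate the figures in this commit message; they are not taken from the library.

```python
# Per-pixel figures quoted in the commit message.
SORT_PEAK_BYTES_PER_PIXEL = 360   # sorted + unsorted event_list coexist during lexsort
RASTER_BYTES_PER_PIXEL = 8        # one float64 per pixel for the computed raster
OLD_BYTES_PER_PIXEL = 280         # previous (too low) estimate

NEW_BYTES_PER_PIXEL = SORT_PEAK_BYTES_PER_PIXEL + RASTER_BYTES_PER_PIXEL  # 368


def estimated_peak_bytes(height, width, bytes_per_pixel=NEW_BYTES_PER_PIXEL):
    """Estimated peak memory for a raster of the given shape."""
    return height * width * bytes_per_pixel


# For a 10,000 x 10,000 raster the two estimates differ by 8.8 GB.
new_estimate = estimated_peak_bytes(10_000, 10_000)
old_estimate = estimated_peak_bytes(10_000, 10_000, OLD_BYTES_PER_PIXEL)
shortfall = new_estimate - old_estimate
```

A guard based on the old figure would admit rasters whose true peak exceeds available memory by that shortfall, which is the borderline OOM case the commit addresses.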

Also use astype(copy=False) to skip the float64 copy when data is already
float64.

Implements kde() and line_density() for point-to-raster and
line-to-raster density surfaces.  Supports Gaussian, Epanechnikov,
and quartic kernels with automatic bandwidth selection via
Silverman's rule.  All four backends: numpy, cupy, dask+numpy,
dask+cupy.

35 tests covering correctness, edge cases, kernel types,
bandwidth selection, weights, and cross-backend parity
(dask+numpy, cupy). Removes hard cutoff from GPU Gaussian
kernel to avoid box-vs-circle mismatch with CPU.

Creates docs/source/reference/kde.rst with autosummary entries
for kde() and line_density(). Adds both functions to __init__.py
and the docs toctree.

Covers Gaussian/Epanechnikov/quartic kernels, bandwidth effects,
weighted KDE, and line density with synthetic earthquake and
road network data.

github-actions bot added the performance label (PR touches performance-sensitive code) on Apr 6, 2026.
brendancol merged commit 46ee269 into master on Apr 6, 2026.
11 checks passed.
Labels

performance PR touches performance-sensitive code

Development

Successfully merging this pull request may close these issues.

Add kernel density estimation (KDE) for point-to-raster conversion

1 participant