Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -517,6 +517,15 @@ Same-CRS tiles skip reprojection entirely and are placed by direct coordinate al

--------

### **Kernel Density Estimation**

| Name | Description | Source | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
|:-----|:------------|:------:|:------------------:|:-----------------:|:---------------------:|:---------------------:|
| [KDE](xrspatial/kde.py) | Point-to-raster kernel density estimation (Gaussian, Epanechnikov, quartic) | Silverman 1986 | ✅️ | ✅️ | ✅️ | ✅️ |
| [Line Density](xrspatial/kde.py) | Line-segment-to-raster density estimation | Standard | ✅️ | | | |

--------

### **Multivariate**

| Name | Description | Source | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
Expand Down
1 change: 1 addition & 0 deletions docs/source/reference/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@ Reference
focal
hydrology
interpolation
kde
mcda
morphology
multispectral
Expand Down
24 changes: 24 additions & 0 deletions docs/source/reference/kde.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,24 @@
.. _reference.kde:

***
KDE
***

.. note::

Kernel density estimation converts point or line data into
continuous density surfaces on a raster grid.

KDE
===
.. autosummary::
:toctree: _autosummary

xrspatial.kde.kde

Line Density
============
.. autosummary::
:toctree: _autosummary

xrspatial.kde.line_density
177 changes: 177 additions & 0 deletions examples/user_guide/49_KDE.ipynb
Original file line number Diff line number Diff line change
@@ -0,0 +1,177 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "bfd6d891",
"source": "# Xarray-Spatial KDE: Point density, kernel shapes, and line density\n\nTurning scattered point data into a continuous surface is one of the most common spatial analysis tasks. It comes up whenever you have event locations (earthquakes, crimes, disease cases) and want a smooth density map instead of a dot plot. xrspatial's KDE tools produce density rasters from raw coordinates, with configurable kernels and bandwidth.",
"metadata": {}
},
{
"cell_type": "markdown",
"id": "ee7bb1d4",
"source": "### What you'll build\n\n1. Generate synthetic earthquake cluster data\n2. Compute a Gaussian density surface with `kde()`\n3. Compare three kernel shapes: Gaussian, Epanechnikov, and quartic\n4. See how bandwidth controls smoothness\n5. Use per-point weights for magnitude-weighted density\n6. Map road density with `line_density()`\n\n![KDE preview](images/kde_preview.png)\n\n[Gaussian KDE](#Gaussian-KDE) · [Kernel comparison](#Kernel-comparison) · [Bandwidth effects](#Bandwidth-effects) · [Weighted KDE](#Weighted-KDE) · [Line density](#Line-density)",
"metadata": {}
},
{
"cell_type": "markdown",
"id": "92a81141",
"source": "Standard imports plus the two KDE functions.",
"metadata": {}
},
{
"cell_type": "code",
"id": "a1764753",
"source": "import numpy as np\nimport xarray as xr\n\nimport matplotlib.pyplot as plt\nfrom matplotlib.patches import Patch\n\nfrom xrspatial.kde import kde, line_density",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "965e4d4d",
"source": "## Synthetic earthquake data\n\nThree clusters of varying tightness and size, plus some background scatter. Each point gets a random magnitude between 2 and 6 that we'll use as weights later.",
"metadata": {}
},
{
"cell_type": "code",
"id": "78a82a28",
"source": "rng = np.random.default_rng(42)\n\n# Three clusters with different spreads\nc1_x, c1_y = rng.normal(2, 0.8, 120), rng.normal(3, 0.6, 120)\nc2_x, c2_y = rng.normal(7, 1.2, 80), rng.normal(7, 1.0, 80)\nc3_x, c3_y = rng.normal(5, 0.5, 50), rng.normal(1, 0.5, 50)\n\n# Background scatter\nbg_x, bg_y = rng.uniform(0, 10, 30), rng.uniform(0, 10, 30)\n\nx = np.concatenate([c1_x, c2_x, c3_x, bg_x])\ny = np.concatenate([c1_y, c2_y, c3_y, bg_y])\n\n# Random magnitudes for weighted KDE later\nmagnitudes = rng.uniform(2, 6, len(x))\n\nprint(f\"{len(x)} earthquake epicenters generated\")",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "3ff48b8a",
"source": "Raw scatter plot. The three clusters are visible but the exact density pattern is hard to read from points alone.",
"metadata": {}
},
{
"cell_type": "code",
"id": "ca17dc8f",
"source": "fig, ax = plt.subplots(figsize=(10, 7.5))\nax.plot(x, y, 'o', color='steelblue', markersize=3, alpha=0.5)\nax.set_xlim(0, 10)\nax.set_ylim(0, 10)\nax.set_aspect('equal')\nax.set_xlabel('x')\nax.set_ylabel('y')\nplt.show()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "498b0083",
"source": "## Gaussian KDE\n\n[Kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation) places a smooth kernel function at each data point and sums the contributions at every output pixel. The Gaussian kernel is the most common choice because it produces infinitely smooth surfaces with no hard edges.\n\nThe plot below shows the density surface with the original points overlaid. Brighter areas have more points per unit area.",
"metadata": {}
},
{
"cell_type": "code",
"id": "ca293129",
"source": "density = kde(x, y, bandwidth=0.6, kernel='gaussian',\n x_range=(0, 10), y_range=(0, 10), width=400, height=400)\n\nfig, ax = plt.subplots(figsize=(10, 7.5))\ndensity.plot.imshow(ax=ax, cmap='inferno', add_colorbar=False)\nax.plot(x, y, 'o', color='white', markersize=1.5, alpha=0.3)\nax.set_title('')\nax.set_xlabel('')\nax.set_ylabel('')\nax.set_xticks([])\nax.set_yticks([])\nax.legend(handles=[Patch(facecolor='yellow', alpha=0.78, label='High density'),\n Patch(facecolor='#1a0a2e', alpha=0.78, label='Low density')],\n loc='lower right', fontsize=11, framealpha=0.9)\nplt.show()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "1612b72d",
"source": "The three clusters show up clearly. The tight cluster in the lower middle produces the sharpest peak, while the spread-out upper-right cluster has a broader, lower dome.",
"metadata": {}
},
{
"cell_type": "markdown",
"id": "d9a7148a",
"source": "## Kernel comparison\n\nThree kernel shapes are available:\n\n- **Gaussian**: bell curve, infinite range, always smooth\n- **Epanechnikov**: parabolic, drops to zero at the bandwidth radius, statistically optimal for minimizing mean integrated squared error\n- **Quartic** (biweight): similar compact support but smoother falloff than Epanechnikov\n\nThe next plot shows all three side by side on the same data and bandwidth.",
"metadata": {}
},
{
"cell_type": "code",
"id": "cff98669",
"source": "kernels = ['gaussian', 'epanechnikov', 'quartic']\nresults = {}\nfor k in kernels:\n results[k] = kde(x, y, bandwidth=0.8, kernel=k,\n x_range=(0, 10), y_range=(0, 10), width=300, height=300)\n\nfig, axes = plt.subplots(1, 3, figsize=(15, 5))\nfor ax, k in zip(axes, kernels):\n results[k].plot.imshow(ax=ax, cmap='inferno', add_colorbar=False)\n ax.set_title(k.capitalize(), fontsize=13)\n ax.set_xticks([])\n ax.set_yticks([])\n ax.set_xlabel('')\n ax.set_ylabel('')\nplt.tight_layout()\nplt.show()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "69773ca7",
"source": "The Gaussian surface bleeds out to the edges because it has infinite range. Epanechnikov and quartic cut off at exactly the bandwidth radius, leaving the corners at zero. For most practical work the differences are small; pick Gaussian for smoothness, Epanechnikov if you need a hard cutoff boundary.",
"metadata": {}
},
{
"cell_type": "markdown",
"id": "d3e9f621",
"source": "## Bandwidth effects\n\nBandwidth controls how far each point's influence reaches. Too narrow and you get spiky noise; too wide and real structure gets washed out. The default `'silverman'` rule works well for unimodal data but can oversmooth multimodal clusters.\n\nBelow: four bandwidths from very narrow (0.2) to very wide (2.0).",
"metadata": {}
},
{
"cell_type": "code",
"id": "4fee443f",
"source": "bandwidths = [0.2, 0.5, 1.0, 2.0]\n\nfig, axes = plt.subplots(1, 4, figsize=(18, 4.5))\nfor ax, bw in zip(axes, bandwidths):\n d = kde(x, y, bandwidth=bw, x_range=(0, 10), y_range=(0, 10),\n width=200, height=200)\n d.plot.imshow(ax=ax, cmap='inferno', add_colorbar=False)\n ax.set_title(f'bw = {bw}', fontsize=12)\n ax.set_xticks([])\n ax.set_yticks([])\n ax.set_xlabel('')\n ax.set_ylabel('')\nplt.tight_layout()\nplt.show()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "9dac2c7a",
"source": "At bw=0.2 you can pick out individual points. At bw=2.0 the three clusters have merged into a single blob. Something around 0.5 to 0.8 captures the cluster structure without too much noise.\n\n<div class=\"alert alert-block alert-warning\">\n<b>Units matter.</b> Bandwidth is in the same units as your coordinates. If x and y are in meters, a bandwidth of 500 means 500 m. If they're in degrees, 0.01 means roughly 1 km at mid-latitudes. Always think about what physical distance makes sense for your data before picking a number.\n</div>",
"metadata": {}
},
{
"cell_type": "markdown",
"id": "2465dd33",
"source": "## Weighted KDE\n\nNot all points are equal. An earthquake of magnitude 5 releases 30x more energy than a magnitude 4. Passing per-point weights lets the density surface reflect importance, not just count.\n\nBelow: unweighted (left) vs magnitude-weighted (right). The weighted surface shifts toward wherever the larger events happened to fall.",
"metadata": {}
},
{
"cell_type": "code",
"id": "8cf1850d",
"source": "unweighted = kde(x, y, bandwidth=0.6,\n x_range=(0, 10), y_range=(0, 10), width=300, height=300)\nweighted = kde(x, y, weights=magnitudes, bandwidth=0.6,\n x_range=(0, 10), y_range=(0, 10), width=300, height=300)\n\nfig, axes = plt.subplots(1, 2, figsize=(14, 6))\nfor ax, data, title in zip(axes,\n [unweighted, weighted],\n ['Unweighted', 'Magnitude-weighted']):\n data.plot.imshow(ax=ax, cmap='inferno', add_colorbar=False)\n ax.set_title(title, fontsize=13)\n ax.set_xticks([])\n ax.set_yticks([])\n ax.set_xlabel('')\n ax.set_ylabel('')\nplt.tight_layout()\nplt.show()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "9fdd26fb",
"source": "## Line density\n\n`line_density()` works the same way but for linear features. Each line segment is uniformly sampled and the samples are convolved with the kernel. The result shows where lines are concentrated, useful for things like road network analysis or fault-line mapping.\n\nInput is four arrays: start x, start y, end x, end y for each segment. The plot below shows a synthetic road grid.",
"metadata": {}
},
{
"cell_type": "code",
"id": "eab23452",
"source": "# Synthetic road segments: a loose grid with some randomness\nroad_x1, road_y1, road_x2, road_y2 = [], [], [], []\n\n# Horizontal roads\nfor yy in np.linspace(1, 9, 6):\n offset = rng.normal(0, 0.2)\n road_x1.append(0.5)\n road_y1.append(yy + offset)\n road_x2.append(9.5)\n road_y2.append(yy + rng.normal(0, 0.3))\n\n# Vertical roads (fewer in the east = lower density there)\nfor xx in np.linspace(1, 5, 6):\n road_x1.append(xx + rng.normal(0, 0.1))\n road_y1.append(0.5)\n road_x2.append(xx + rng.normal(0, 0.1))\n road_y2.append(9.5)\n\nfor xx in np.linspace(6, 9, 2):\n road_x1.append(xx)\n road_y1.append(0.5)\n road_x2.append(xx)\n road_y2.append(9.5)\n\nroad_x1 = np.array(road_x1)\nroad_y1 = np.array(road_y1)\nroad_x2 = np.array(road_x2)\nroad_y2 = np.array(road_y2)\n\nld = line_density(road_x1, road_y1, road_x2, road_y2,\n bandwidth=0.8,\n x_range=(0, 10), y_range=(0, 10),\n width=300, height=300)\n\nfig, ax = plt.subplots(figsize=(10, 7.5))\nld.plot.imshow(ax=ax, cmap='inferno', add_colorbar=False)\n# Draw the actual road segments on top\nfor i in range(len(road_x1)):\n ax.plot([road_x1[i], road_x2[i]], [road_y1[i], road_y2[i]],\n color='white', alpha=0.4, linewidth=0.8)\nax.set_title('')\nax.set_xlabel('')\nax.set_ylabel('')\nax.set_xticks([])\nax.set_yticks([])\nax.legend(handles=[Patch(facecolor='yellow', alpha=0.78, label='High road density'),\n Patch(facecolor='white', alpha=0.4, label='Road segments')],\n loc='lower right', fontsize=11, framealpha=0.9)\nplt.show()",
"metadata": {},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"id": "43046e13",
"source": "The denser grid on the west side shows up clearly as a brighter band. Intersections where horizontal and vertical roads cross produce the brightest hotspots.",
"metadata": {}
},
{
"cell_type": "markdown",
"id": "d1036afc",
"source": "<div class=\"alert alert-block alert-info\">\n<b>Template grids.</b> Both <code>kde()</code> and <code>line_density()</code> accept a <code>template</code> DataArray instead of <code>x_range</code>/<code>y_range</code>/<code>width</code>/<code>height</code>. The output will match the template's shape, extent, and coordinates. If the template is dask-backed, the output is computed lazily in chunks.\n</div>",
"metadata": {}
},
{
"cell_type": "markdown",
"id": "5dd32bc7",
"source": "### References\n\n- [Kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation), Wikipedia\n- [Silverman, B.W. (1986). *Density Estimation for Statistics and Data Analysis*.](https://ned.ipac.caltech.edu/level5/March02/Silverman/paper.pdf) Chapman & Hall.\n- [Epanechnikov, V.A. (1969). Non-parametric estimation of a multivariate probability density.](https://doi.org/10.1137/1114019) *Theory of Probability & Its Applications*, 14(1), 153-158.",
"metadata": {}
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.10.0"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Binary file added examples/user_guide/images/kde_preview.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
2 changes: 2 additions & 0 deletions xrspatial/__init__.py
Original file line number Diff line number Diff line change
Expand Up @@ -35,6 +35,8 @@
from xrspatial.interpolate import idw # noqa
from xrspatial.interpolate import kriging # noqa
from xrspatial.interpolate import spline # noqa
from xrspatial.kde import kde # noqa
from xrspatial.kde import line_density # noqa
from xrspatial.fire import burn_severity_class # noqa
from xrspatial.fire import dnbr # noqa
from xrspatial.fire import fireline_intensity # noqa
Expand Down
Loading
Loading