diff --git a/README.md b/README.md
index 0d0099b4..c04b640f 100644
--- a/README.md
+++ b/README.md
@@ -517,6 +517,15 @@ Same-CRS tiles skip reprojection entirely and are placed by direct coordinate al
--------
+### **Kernel Density Estimation**
+
+| Name | Description | Source | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
+|:-----|:------------|:------:|:------------------:|:-----------------:|:---------------------:|:---------------------:|
+| [KDE](xrspatial/kde.py) | Point-to-raster kernel density estimation (Gaussian, Epanechnikov, quartic) | Silverman 1986 | ✅️ | ✅️ | ✅️ | ✅️ |
+| [Line Density](xrspatial/kde.py) | Line-segment-to-raster density estimation | Standard | ✅️ | | | |
+
+--------
+
### **Multivariate**
| Name | Description | Source | NumPy xr.DataArray | Dask xr.DataArray | CuPy GPU xr.DataArray | Dask GPU xr.DataArray |
diff --git a/docs/source/reference/index.rst b/docs/source/reference/index.rst
index 3b3eef32..a268b5cd 100644
--- a/docs/source/reference/index.rst
+++ b/docs/source/reference/index.rst
@@ -16,6 +16,7 @@ Reference
focal
hydrology
interpolation
+ kde
mcda
morphology
multispectral
diff --git a/docs/source/reference/kde.rst b/docs/source/reference/kde.rst
new file mode 100644
index 00000000..0f832d6f
--- /dev/null
+++ b/docs/source/reference/kde.rst
@@ -0,0 +1,24 @@
+.. _reference.kde:
+
+***
+KDE
+***
+
+.. note::
+
+ Kernel density estimation converts point or line data into
+ continuous density surfaces on a raster grid.
+
+KDE
+===
+.. autosummary::
+ :toctree: _autosummary
+
+ xrspatial.kde.kde
+
+Line Density
+============
+.. autosummary::
+ :toctree: _autosummary
+
+ xrspatial.kde.line_density
diff --git a/examples/user_guide/49_KDE.ipynb b/examples/user_guide/49_KDE.ipynb
new file mode 100644
index 00000000..3266e327
--- /dev/null
+++ b/examples/user_guide/49_KDE.ipynb
@@ -0,0 +1,177 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "id": "bfd6d891",
+ "source": "# Xarray-Spatial KDE: Point density, kernel shapes, and line density\n\nTurning scattered point data into a continuous surface is one of the most common spatial analysis tasks. It comes up whenever you have event locations (earthquakes, crimes, disease cases) and want a smooth density map instead of a dot plot. xrspatial's KDE tools produce density rasters from raw coordinates, with configurable kernels and bandwidth.",
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "id": "ee7bb1d4",
+ "source": "### What you'll build\n\n1. Generate synthetic earthquake cluster data\n2. Compute a Gaussian density surface with `kde()`\n3. Compare three kernel shapes: Gaussian, Epanechnikov, and quartic\n4. See how bandwidth controls smoothness\n5. Use per-point weights for magnitude-weighted density\n6. Map road density with `line_density()`\n\n\n\n[Gaussian KDE](#Gaussian-KDE) · [Kernel comparison](#Kernel-comparison) · [Bandwidth effects](#Bandwidth-effects) · [Weighted KDE](#Weighted-KDE) · [Line density](#Line-density)",
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "id": "92a81141",
+ "source": "Standard imports plus the two KDE functions.",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "id": "a1764753",
+ "source": "import numpy as np\nimport xarray as xr\n\nimport matplotlib.pyplot as plt\nfrom matplotlib.patches import Patch\n\nfrom xrspatial.kde import kde, line_density",
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "965e4d4d",
+ "source": "## Synthetic earthquake data\n\nThree clusters of varying tightness and size, plus some background scatter. Each point gets a random magnitude between 2 and 6 that we'll use as weights later.",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "id": "78a82a28",
+ "source": "rng = np.random.default_rng(42)\n\n# Three clusters with different spreads\nc1_x, c1_y = rng.normal(2, 0.8, 120), rng.normal(3, 0.6, 120)\nc2_x, c2_y = rng.normal(7, 1.2, 80), rng.normal(7, 1.0, 80)\nc3_x, c3_y = rng.normal(5, 0.5, 50), rng.normal(1, 0.5, 50)\n\n# Background scatter\nbg_x, bg_y = rng.uniform(0, 10, 30), rng.uniform(0, 10, 30)\n\nx = np.concatenate([c1_x, c2_x, c3_x, bg_x])\ny = np.concatenate([c1_y, c2_y, c3_y, bg_y])\n\n# Random magnitudes for weighted KDE later\nmagnitudes = rng.uniform(2, 6, len(x))\n\nprint(f\"{len(x)} earthquake epicenters generated\")",
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "3ff48b8a",
+ "source": "Raw scatter plot. The three clusters are visible but the exact density pattern is hard to read from points alone.",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "id": "ca17dc8f",
+ "source": "fig, ax = plt.subplots(figsize=(10, 7.5))\nax.plot(x, y, 'o', color='steelblue', markersize=3, alpha=0.5)\nax.set_xlim(0, 10)\nax.set_ylim(0, 10)\nax.set_aspect('equal')\nax.set_xlabel('x')\nax.set_ylabel('y')\nplt.show()",
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "498b0083",
+ "source": "## Gaussian KDE\n\n[Kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation) places a smooth kernel function at each data point and sums the contributions at every output pixel. The Gaussian kernel is the most common choice because it produces infinitely smooth surfaces with no hard edges.\n\nThe plot below shows the density surface with the original points overlaid. Brighter areas have more points per unit area.",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "id": "ca293129",
+ "source": "density = kde(x, y, bandwidth=0.6, kernel='gaussian',\n x_range=(0, 10), y_range=(0, 10), width=400, height=400)\n\nfig, ax = plt.subplots(figsize=(10, 7.5))\ndensity.plot.imshow(ax=ax, cmap='inferno', add_colorbar=False)\nax.plot(x, y, 'o', color='white', markersize=1.5, alpha=0.3)\nax.set_title('')\nax.set_xlabel('')\nax.set_ylabel('')\nax.set_xticks([])\nax.set_yticks([])\nax.legend(handles=[Patch(facecolor='yellow', alpha=0.78, label='High density'),\n Patch(facecolor='#1a0a2e', alpha=0.78, label='Low density')],\n loc='lower right', fontsize=11, framealpha=0.9)\nplt.show()",
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "1612b72d",
+ "source": "The three clusters show up clearly. The tight cluster in the lower middle produces the sharpest peak, while the spread-out upper-right cluster has a broader, lower dome.",
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d9a7148a",
+ "source": "## Kernel comparison\n\nThree kernel shapes are available:\n\n- **Gaussian**: bell curve, infinite range, always smooth\n- **Epanechnikov**: parabolic, drops to zero at the bandwidth radius, statistically optimal for minimizing mean integrated squared error\n- **Quartic** (biweight): similar compact support but smoother falloff than Epanechnikov\n\nThe next plot shows all three side by side on the same data and bandwidth.",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "id": "cff98669",
+ "source": "kernels = ['gaussian', 'epanechnikov', 'quartic']\nresults = {}\nfor k in kernels:\n results[k] = kde(x, y, bandwidth=0.8, kernel=k,\n x_range=(0, 10), y_range=(0, 10), width=300, height=300)\n\nfig, axes = plt.subplots(1, 3, figsize=(15, 5))\nfor ax, k in zip(axes, kernels):\n results[k].plot.imshow(ax=ax, cmap='inferno', add_colorbar=False)\n ax.set_title(k.capitalize(), fontsize=13)\n ax.set_xticks([])\n ax.set_yticks([])\n ax.set_xlabel('')\n ax.set_ylabel('')\nplt.tight_layout()\nplt.show()",
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "69773ca7",
+ "source": "The Gaussian surface bleeds out to the edges because it has infinite range. Epanechnikov and quartic cut off at exactly the bandwidth radius, leaving the corners at zero. For most practical work the differences are small; pick Gaussian for smoothness, Epanechnikov if you need a hard cutoff boundary.",
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d3e9f621",
+ "source": "## Bandwidth effects\n\nBandwidth controls how far each point's influence reaches. Too narrow and you get spiky noise; too wide and real structure gets washed out. The default `'silverman'` rule works well for unimodal data but can oversmooth multimodal clusters.\n\nBelow: four bandwidths from very narrow (0.2) to very wide (2.0).",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "id": "4fee443f",
+ "source": "bandwidths = [0.2, 0.5, 1.0, 2.0]\n\nfig, axes = plt.subplots(1, 4, figsize=(18, 4.5))\nfor ax, bw in zip(axes, bandwidths):\n d = kde(x, y, bandwidth=bw, x_range=(0, 10), y_range=(0, 10),\n width=200, height=200)\n d.plot.imshow(ax=ax, cmap='inferno', add_colorbar=False)\n ax.set_title(f'bw = {bw}', fontsize=12)\n ax.set_xticks([])\n ax.set_yticks([])\n ax.set_xlabel('')\n ax.set_ylabel('')\nplt.tight_layout()\nplt.show()",
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9dac2c7a",
+ "source": "At bw=0.2 you can pick out individual points. At bw=2.0 the three clusters have merged into a single blob. Something around 0.5 to 0.8 captures the cluster structure without too much noise.\n\n
\nUnits matter. Bandwidth is in the same units as your coordinates. If x and y are in meters, a bandwidth of 500 means 500 m. If they're in degrees, 0.01 means roughly 1 km at mid-latitudes. Always think about what physical distance makes sense for your data before picking a number.\n
",
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "id": "2465dd33",
+ "source": "## Weighted KDE\n\nNot all points are equal. An earthquake of magnitude 5 releases 30x more energy than a magnitude 4. Passing per-point weights lets the density surface reflect importance, not just count.\n\nBelow: unweighted (left) vs magnitude-weighted (right). The weighted surface shifts toward wherever the larger events happened to fall.",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "id": "8cf1850d",
+ "source": "unweighted = kde(x, y, bandwidth=0.6,\n x_range=(0, 10), y_range=(0, 10), width=300, height=300)\nweighted = kde(x, y, weights=magnitudes, bandwidth=0.6,\n x_range=(0, 10), y_range=(0, 10), width=300, height=300)\n\nfig, axes = plt.subplots(1, 2, figsize=(14, 6))\nfor ax, data, title in zip(axes,\n [unweighted, weighted],\n ['Unweighted', 'Magnitude-weighted']):\n data.plot.imshow(ax=ax, cmap='inferno', add_colorbar=False)\n ax.set_title(title, fontsize=13)\n ax.set_xticks([])\n ax.set_yticks([])\n ax.set_xlabel('')\n ax.set_ylabel('')\nplt.tight_layout()\nplt.show()",
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "9fdd26fb",
+ "source": "## Line density\n\n`line_density()` works the same way but for linear features. Each line segment is uniformly sampled and the samples are convolved with the kernel. The result shows where lines are concentrated, useful for things like road network analysis or fault-line mapping.\n\nInput is four arrays: start x, start y, end x, end y for each segment. The plot below shows a synthetic road grid.",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "id": "eab23452",
+ "source": "# Synthetic road segments: a loose grid with some randomness\nroad_x1, road_y1, road_x2, road_y2 = [], [], [], []\n\n# Horizontal roads\nfor yy in np.linspace(1, 9, 6):\n offset = rng.normal(0, 0.2)\n road_x1.append(0.5)\n road_y1.append(yy + offset)\n road_x2.append(9.5)\n road_y2.append(yy + rng.normal(0, 0.3))\n\n# Vertical roads (fewer in the east = lower density there)\nfor xx in np.linspace(1, 5, 6):\n road_x1.append(xx + rng.normal(0, 0.1))\n road_y1.append(0.5)\n road_x2.append(xx + rng.normal(0, 0.1))\n road_y2.append(9.5)\n\nfor xx in np.linspace(6, 9, 2):\n road_x1.append(xx)\n road_y1.append(0.5)\n road_x2.append(xx)\n road_y2.append(9.5)\n\nroad_x1 = np.array(road_x1)\nroad_y1 = np.array(road_y1)\nroad_x2 = np.array(road_x2)\nroad_y2 = np.array(road_y2)\n\nld = line_density(road_x1, road_y1, road_x2, road_y2,\n bandwidth=0.8,\n x_range=(0, 10), y_range=(0, 10),\n width=300, height=300)\n\nfig, ax = plt.subplots(figsize=(10, 7.5))\nld.plot.imshow(ax=ax, cmap='inferno', add_colorbar=False)\n# Draw the actual road segments on top\nfor i in range(len(road_x1)):\n ax.plot([road_x1[i], road_x2[i]], [road_y1[i], road_y2[i]],\n color='white', alpha=0.4, linewidth=0.8)\nax.set_title('')\nax.set_xlabel('')\nax.set_ylabel('')\nax.set_xticks([])\nax.set_yticks([])\nax.legend(handles=[Patch(facecolor='yellow', alpha=0.78, label='High road density'),\n Patch(facecolor='white', alpha=0.4, label='Road segments')],\n loc='lower right', fontsize=11, framealpha=0.9)\nplt.show()",
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "id": "43046e13",
+ "source": "The denser grid on the west side shows up clearly as a brighter band. Intersections where horizontal and vertical roads cross produce the brightest hotspots.",
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "id": "d1036afc",
+ "source": "\nTemplate grids. Both kde() and line_density() accept a template DataArray instead of x_range/y_range/width/height. The output will match the template's shape, extent, and coordinates. If the template is dask-backed, the output is computed lazily in chunks.\n
",
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "id": "5dd32bc7",
+ "source": "### References\n\n- [Kernel density estimation](https://en.wikipedia.org/wiki/Kernel_density_estimation), Wikipedia\n- [Silverman, B.W. (1986). *Density Estimation for Statistics and Data Analysis*.](https://ned.ipac.caltech.edu/level5/March02/Silverman/paper.pdf) Chapman & Hall.\n- [Epanechnikov, V.A. (1969). Non-parametric estimation of a multivariate probability density.](https://doi.org/10.1137/1114019) *Theory of Probability & Its Applications*, 14(1), 153-158.",
+ "metadata": {}
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python",
+ "version": "3.10.0"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
\ No newline at end of file
diff --git a/examples/user_guide/images/kde_preview.png b/examples/user_guide/images/kde_preview.png
new file mode 100644
index 00000000..53bda77c
Binary files /dev/null and b/examples/user_guide/images/kde_preview.png differ
diff --git a/xrspatial/__init__.py b/xrspatial/__init__.py
index 1eff69a3..643f3ac5 100644
--- a/xrspatial/__init__.py
+++ b/xrspatial/__init__.py
@@ -35,6 +35,8 @@
from xrspatial.interpolate import idw # noqa
from xrspatial.interpolate import kriging # noqa
from xrspatial.interpolate import spline # noqa
+from xrspatial.kde import kde # noqa
+from xrspatial.kde import line_density # noqa
from xrspatial.fire import burn_severity_class # noqa
from xrspatial.fire import dnbr # noqa
from xrspatial.fire import fireline_intensity # noqa
diff --git a/xrspatial/kde.py b/xrspatial/kde.py
new file mode 100644
index 00000000..1f8a032b
--- /dev/null
+++ b/xrspatial/kde.py
@@ -0,0 +1,702 @@
+"""Kernel density estimation (KDE) for point-to-raster conversion.
+
+Produces a continuous density surface from point (or line) data. Each
+output pixel accumulates weighted kernel contributions from all nearby
+features, yielding a smooth density field.
+
+Supports numpy, cupy, dask+numpy, and dask+cupy backends.
+"""
+from __future__ import annotations
+
+import math
+from math import pi, sqrt
+from functools import partial
+from typing import Optional, Tuple, Union
+
+import numpy as np
+import xarray as xr
+
+from xrspatial.utils import ngjit, has_cuda_and_cupy, has_dask_array
+
+try:
+ import cupy
+except ImportError:
+ cupy = None
+
+try:
+ import dask
+ import dask.array as da
+except ImportError:
+ da = None
+
+from numba import cuda
+
+
+# ---------------------------------------------------------------------------
+# Kernel constants
+# ---------------------------------------------------------------------------
+_KERNEL_NAMES = ('gaussian', 'epanechnikov', 'quartic')
+
+# Normalisation constants for radially symmetric 2-D kernels, chosen so
+# each kernel integrates to 1 over the plane (bandwidth = 1):
+# Gaussian    : 1/(2*pi)
+# Epanechnikov: 2/pi (integral of (1 - r^2) over the unit disk is pi/2)
+# Quartic     : 3/pi (integral of (1 - r^2)^2 over the unit disk is pi/3)
+_NORM_GAUSSIAN = 1.0 / (2.0 * pi)
+_NORM_EPANECHNIKOV = 2.0 / pi
+_NORM_QUARTIC = 3.0 / pi
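As a standalone sanity check (plain NumPy, independent of this module), the three constants can be verified by numerical integration: with bandwidth 1, each kernel should integrate to roughly 1 over the plane.

```python
import numpy as np

# Numerically integrate each 2-D kernel (bandwidth = 1) on a fine grid;
# the sum of samples times the cell area approximates the plane integral.
g = np.linspace(-5.0, 5.0, 1001)
xx, yy = np.meshgrid(g, g)
u2 = xx**2 + yy**2
cell = (g[1] - g[0]) ** 2

gaussian = (1.0 / (2.0 * np.pi)) * np.exp(-0.5 * u2)
epan = (2.0 / np.pi) * np.where(u2 <= 1.0, 1.0 - u2, 0.0)
quartic = (3.0 / np.pi) * np.where(u2 <= 1.0, (1.0 - u2) ** 2, 0.0)

for k in (gaussian, epan, quartic):
    print(round(float(k.sum() * cell), 3))  # each ≈ 1.0
```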
+
+
+# ---------------------------------------------------------------------------
+# Bandwidth selection
+# ---------------------------------------------------------------------------
+
+def _silverman_bandwidth(x, y):
+ """Silverman's rule of thumb for 2-D data.
+
+ h = n^(-1/6) * mean(sigma_x, sigma_y)
+ where sigma uses the robust scale estimator min(std, IQR/1.34).
+ """
+ n = len(x)
+ if n < 2:
+ return 1.0
+
+ def _robust_scale(v):
+ s = float(np.std(v))
+ q75, q25 = float(np.percentile(v, 75)), float(np.percentile(v, 25))
+ iqr = (q75 - q25) / 1.34
+ return min(s, iqr) if iqr > 0 else s
+
+ sx = _robust_scale(x)
+ sy = _robust_scale(y)
+ sigma = (sx + sy) / 2.0
+ if sigma == 0:
+ sigma = 1.0
+ return sigma * (n ** (-1.0 / 6.0))
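The rule is easy to exercise in isolation. This plain-NumPy restatement (an illustrative sketch, not the module's code path) shows the slow n^(-1/6) shrinkage:

```python
import numpy as np

# Standalone restatement of the rule above: h = (mean robust scale) * n^(-1/6),
# where the robust scale per axis is min(std, IQR/1.34).
def silverman_2d(x, y):
    def robust(v):
        s = float(np.std(v))
        iqr = float(np.percentile(v, 75) - np.percentile(v, 25)) / 1.34
        return min(s, iqr) if iqr > 0 else s
    sigma = (robust(x) + robust(y)) / 2.0
    if sigma == 0:
        sigma = 1.0
    return sigma * len(x) ** (-1.0 / 6.0)

rng = np.random.default_rng(0)
for n in (100, 10_000):
    pts = rng.normal(size=(2, n))
    print(n, silverman_2d(pts[0], pts[1]))
# the bandwidth shrinks only by roughly 100**(1/6) ≈ 2.15 as n grows 100-fold
```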
+
+
+# ---------------------------------------------------------------------------
+# CPU kernels (numba-jitted)
+# ---------------------------------------------------------------------------
+
+@ngjit
+def _kde_cpu(xs, ys, ws, out, x0, y0, dx, dy, bw, kernel_id):
+ """Populate *out* with kernel density values.
+
+ Parameters
+ ----------
+ xs, ys : 1-D float64 arrays of point coordinates.
+ ws : 1-D float64 array of weights (same length as xs).
+ out : 2-D float64 output array (rows, cols), pre-zeroed.
+    x0, y0 : coordinates of the first (lower-left) pixel centre.
+ dx, dy : pixel spacing in x and y.
+ bw : bandwidth (same units as coordinates).
+ kernel_id : 0 = gaussian, 1 = epanechnikov, 2 = quartic.
+ """
+ rows, cols = out.shape
+ n_pts = xs.shape[0]
+ inv_bw = 1.0 / bw
+ inv_bw2 = inv_bw * inv_bw
+
+ # Pre-compute normalisation
+ if kernel_id == 0:
+ norm = inv_bw2 / (2.0 * pi)
+ elif kernel_id == 1:
+ norm = inv_bw2 * 2.0 / pi
+ else:
+ norm = inv_bw2 * 3.0 / pi
+
+    # Cutoff radius in coordinate units: Gaussian contributions are
+    # truncated at 4*bw; compact kernels end exactly at bw.
+ if kernel_id == 0:
+ cutoff = 4.0 * bw
+ else:
+ cutoff = bw
+
+ for p in range(n_pts):
+ px = xs[p]
+ py = ys[p]
+ w = ws[p]
+
+ # Pixel range that falls within the cutoff.
+ col_lo = int(max(0, (px - cutoff - x0) / dx))
+ col_hi = int(min(cols - 1, (px + cutoff - x0) / dx)) + 1
+ row_lo = int(max(0, (py - cutoff - y0) / dy))
+ row_hi = int(min(rows - 1, (py + cutoff - y0) / dy)) + 1
+
+ for r in range(row_lo, row_hi):
+ cy = y0 + r * dy
+ uy = (cy - py) * inv_bw
+ uy2 = uy * uy
+ for c in range(col_lo, col_hi):
+ cx = x0 + c * dx
+ ux = (cx - px) * inv_bw
+ u2 = ux * ux + uy2
+
+ if kernel_id == 0:
+ # Gaussian
+ val = norm * w * np.exp(-0.5 * u2)
+ elif kernel_id == 1:
+ # Epanechnikov
+ if u2 <= 1.0:
+ val = norm * w * (1.0 - u2)
+ else:
+ val = 0.0
+ else:
+ # Quartic
+ if u2 <= 1.0:
+ t = 1.0 - u2
+ val = norm * w * t * t
+ else:
+ val = 0.0
+
+ out[r, c] += val
+
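For tiny grids, the Gaussian branch above can be cross-checked against a vectorized NumPy reference. This is a sketch that skips the 4*bw truncation used in the loop, so values differ slightly far from any point:

```python
import numpy as np

# Vectorized reference for the Gaussian branch of _kde_cpu:
# density[r, c] = sum_p w_p * exp(-0.5 * ||pixel_rc - p||^2 / bw^2) / (2*pi*bw^2)
def kde_reference(xs, ys, ws, x_coords, y_coords, bw):
    cx = x_coords[None, :, None]                      # (1, cols, 1)
    cy = y_coords[:, None, None]                      # (rows, 1, 1)
    u2 = ((cx - xs) ** 2 + (cy - ys) ** 2) / bw ** 2  # (rows, cols, n_pts)
    return (ws * np.exp(-0.5 * u2)).sum(axis=-1) / (2.0 * np.pi * bw ** 2)

xc = yc = np.linspace(0.0, 4.0, 5)
d = kde_reference(np.array([2.0]), np.array([2.0]), np.array([1.0]), xc, yc, 1.0)
print(d.argmax())  # 12: the peak is at pixel (2, 2), nearest the point
```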
+
+@ngjit
+def _line_density_cpu(x1s, y1s, x2s, y2s, ws, out,
+ x0, y0, dx, dy, bw, kernel_id):
+ """Compute line density on *out*.
+
+    Each segment is sampled at intervals of roughly bw/4; each sample
+    acts as a weighted point whose weight is proportional to the
+    sampled step length.
+ """
+ rows, cols = out.shape
+ n_segs = x1s.shape[0]
+ inv_bw = 1.0 / bw
+ inv_bw2 = inv_bw * inv_bw
+
+ if kernel_id == 0:
+ norm = inv_bw2 / (2.0 * pi)
+ elif kernel_id == 1:
+ norm = inv_bw2 * 2.0 / pi
+ else:
+ norm = inv_bw2 * 3.0 / pi
+
+ if kernel_id == 0:
+ cutoff = 4.0 * bw
+ else:
+ cutoff = bw
+
+ step = bw / 4.0
+ if step < 1e-12:
+ return
+
+ for s in range(n_segs):
+ ax = x1s[s]
+ ay = y1s[s]
+ bx = x2s[s]
+ by = y2s[s]
+ seg_len = sqrt((bx - ax) ** 2 + (by - ay) ** 2)
+ if seg_len < 1e-12:
+ continue
+
+ w = ws[s]
+ n_steps = max(1, int(seg_len / step))
+ sub_w = w * (seg_len / n_steps)
+
+ for i in range(n_steps):
+ t = (i + 0.5) / n_steps
+ px = ax + t * (bx - ax)
+ py = ay + t * (by - ay)
+
+ col_lo = int(max(0, (px - cutoff - x0) / dx))
+ col_hi = int(min(cols - 1, (px + cutoff - x0) / dx)) + 1
+ row_lo = int(max(0, (py - cutoff - y0) / dy))
+ row_hi = int(min(rows - 1, (py + cutoff - y0) / dy)) + 1
+
+ for r in range(row_lo, row_hi):
+ cy = y0 + r * dy
+ uy = (cy - py) * inv_bw
+ uy2 = uy * uy
+ for c in range(col_lo, col_hi):
+ cx = x0 + c * dx
+ ux = (cx - px) * inv_bw
+ u2 = ux * ux + uy2
+
+ if kernel_id == 0:
+ val = norm * sub_w * np.exp(-0.5 * u2)
+ elif kernel_id == 1:
+ if u2 <= 1.0:
+ val = norm * sub_w * (1.0 - u2)
+ else:
+ val = 0.0
+ else:
+ if u2 <= 1.0:
+ tt = 1.0 - u2
+ val = norm * sub_w * tt * tt
+ else:
+ val = 0.0
+
+ out[r, c] += val
+
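The sampling scheme is easiest to see in isolation. A plain-NumPy sketch of the midpoint rule used above (`sample_segment` is a hypothetical helper name, not part of the module):

```python
import numpy as np

# Midpoint sampling as in _line_density_cpu: n samples at t = (i + 0.5)/n,
# each carrying sub_w = w * seg_len / n, so the sub-weights always sum to
# w * seg_len regardless of how many samples the step size produces.
def sample_segment(ax, ay, bx, by, w, bw):
    seg_len = float(np.hypot(bx - ax, by - ay))
    n = max(1, int(seg_len / (bw / 4.0)))
    t = (np.arange(n) + 0.5) / n
    return ax + t * (bx - ax), ay + t * (by - ay), np.full(n, w * seg_len / n)

px, py, sw = sample_segment(0.0, 0.0, 3.0, 4.0, w=2.0, bw=1.0)
print(len(px), sw.sum())  # 20 10.0  (total weight = w * seg_len)
```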
+
+# ---------------------------------------------------------------------------
+# GPU kernels (CUDA)
+# ---------------------------------------------------------------------------
+
+@cuda.jit
+def _kde_cuda(xs, ys, ws, out, x0, y0, dx, dy, bw, kernel_id, n_pts):
+ """Each thread computes one output pixel."""
+ r, c = cuda.grid(2)
+ rows = out.shape[0]
+ cols = out.shape[1]
+ if r >= rows or c >= cols:
+ return
+
+ cx = x0[0] + c * dx[0]
+ cy = y0[0] + r * dy[0]
+ inv_bw = 1.0 / bw[0]
+ inv_bw2 = inv_bw * inv_bw
+ kid = kernel_id[0]
+
+ if kid == 0:
+ norm = inv_bw2 / (2.0 * 3.141592653589793)
+ elif kid == 1:
+ norm = inv_bw2 * 2.0 / 3.141592653589793
+ else:
+ norm = inv_bw2 * 3.0 / 3.141592653589793
+
+ total = 0.0
+ for p in range(n_pts[0]):
+ ux = (cx - xs[p]) * inv_bw
+ uy = (cy - ys[p]) * inv_bw
+ u2 = ux * ux + uy * uy
+
+ if kid == 0:
+ # No hard cutoff; exp decays fast enough and each thread
+ # loops independently so the extra iterations are cheap.
+ total += norm * ws[p] * math.exp(-0.5 * u2)
+ elif kid == 1:
+ if u2 <= 1.0:
+ total += norm * ws[p] * (1.0 - u2)
+ else:
+ if u2 <= 1.0:
+ t = 1.0 - u2
+ total += norm * ws[p] * t * t
+
+ out[r, c] = total
+
+
+# ---------------------------------------------------------------------------
+# Backend wrappers
+# ---------------------------------------------------------------------------
+
+def _kernel_id(kernel: str) -> int:
+ if kernel == 'gaussian':
+ return 0
+ elif kernel == 'epanechnikov':
+ return 1
+ elif kernel == 'quartic':
+ return 2
+ raise ValueError(
+ f"kernel must be one of {_KERNEL_NAMES}, got {kernel!r}"
+ )
+
+
+def _run_kde_numpy(xs, ys, ws, shape, x0, y0, dx, dy, bw, kernel_id):
+ out = np.zeros(shape, dtype=np.float64)
+ _kde_cpu(xs, ys, ws, out, x0, y0, dx, dy, bw, kernel_id)
+ return out
+
+
+def _run_kde_cupy(xs, ys, ws, shape, x0, y0, dx, dy, bw, kernel_id):
+ out = cupy.zeros(shape, dtype=cupy.float64)
+ n_pts = cupy.array([len(xs)], dtype=cupy.int64)
+ x0_d = cupy.array([x0], dtype=cupy.float64)
+ y0_d = cupy.array([y0], dtype=cupy.float64)
+ dx_d = cupy.array([dx], dtype=cupy.float64)
+ dy_d = cupy.array([dy], dtype=cupy.float64)
+ bw_d = cupy.array([bw], dtype=cupy.float64)
+ kid_d = cupy.array([kernel_id], dtype=cupy.int64)
+
+ xs_d = cupy.asarray(xs, dtype=cupy.float64)
+ ys_d = cupy.asarray(ys, dtype=cupy.float64)
+ ws_d = cupy.asarray(ws, dtype=cupy.float64)
+
+ tpb = (16, 16)
+ bpg = (
+ (shape[0] + tpb[0] - 1) // tpb[0],
+ (shape[1] + tpb[1] - 1) // tpb[1],
+ )
+ _kde_cuda[bpg, tpb](xs_d, ys_d, ws_d, out,
+ x0_d, y0_d, dx_d, dy_d, bw_d, kid_d, n_pts)
+ return out
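The launch geometry above is plain ceiling division; restated standalone:

```python
# Launch geometry as in _run_kde_cupy: 16x16 threads per block, with enough
# blocks per axis to cover the whole output grid (ceiling division).
def blocks_per_grid(shape, tpb=(16, 16)):
    return tuple((s + t - 1) // t for s, t in zip(shape, tpb))

print(blocks_per_grid((400, 300)))  # (25, 19)
```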
+
+
+def _run_kde_dask_numpy(xs, ys, ws, shape, x0, y0, dx, dy, bw, kernel_id,
+ chunks):
+ """Dask-backed KDE: each chunk computes its own tile independently."""
+ # Determine chunk layout
+ if chunks is None:
+ chunks = (min(256, shape[0]), min(256, shape[1]))
+ row_splits = _split_sizes(shape[0], chunks[0])
+ col_splits = _split_sizes(shape[1], chunks[1])
+
+ blocks = []
+ row_off = 0
+ for rs in row_splits:
+ row_blocks = []
+ col_off = 0
+ for cs in col_splits:
+ tile_y0 = y0 + row_off * dy
+ tile_x0 = x0 + col_off * dx
+ tile_shape = (rs, cs)
+ block = dask.delayed(_run_kde_numpy)(
+ xs, ys, ws, tile_shape,
+ tile_x0, tile_y0, dx, dy, bw, kernel_id,
+ )
+ row_blocks.append(
+ da.from_delayed(block, shape=tile_shape, dtype=np.float64)
+ )
+ col_off += cs
+ blocks.append(row_blocks)
+ row_off += rs
+
+ return da.block(blocks)
+
+
+def _run_kde_dask_cupy(xs, ys, ws, shape, x0, y0, dx, dy, bw, kernel_id,
+ chunks):
+ """Dask+CuPy KDE: each chunk uses the GPU kernel."""
+ if chunks is None:
+ chunks = (min(256, shape[0]), min(256, shape[1]))
+ row_splits = _split_sizes(shape[0], chunks[0])
+ col_splits = _split_sizes(shape[1], chunks[1])
+
+ blocks = []
+ row_off = 0
+ for rs in row_splits:
+ row_blocks = []
+ col_off = 0
+ for cs in col_splits:
+ tile_y0 = y0 + row_off * dy
+ tile_x0 = x0 + col_off * dx
+ tile_shape = (rs, cs)
+ block = dask.delayed(_run_kde_cupy)(
+ xs, ys, ws, tile_shape,
+ tile_x0, tile_y0, dx, dy, bw, kernel_id,
+ )
+ row_blocks.append(
+ da.from_delayed(block, shape=tile_shape,
+ dtype=np.float64,
+ meta=cupy.array((), dtype=cupy.float64))
+ )
+ col_off += cs
+ blocks.append(row_blocks)
+ row_off += rs
+
+ return da.block(blocks)
+
+
+def _split_sizes(total, chunk):
+ """Return a list of chunk sizes that sum to *total*."""
+ full, rem = divmod(total, chunk)
+ sizes = [chunk] * full
+ if rem:
+ sizes.append(rem)
+ return sizes
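Restating the chunk layout standalone shows how the per-tile origins used by the dask wrappers (tile_y0 = y0 + row_off * dy) fall out of the sizes:

```python
import numpy as np

# Chunk layout as produced by _split_sizes, plus the cumulative row
# offsets that shift each tile's origin in the dask wrappers.
def split_sizes(total, chunk):
    full, rem = divmod(total, chunk)
    return [chunk] * full + ([rem] if rem else [])

sizes = split_sizes(600, 256)
offsets = np.cumsum([0] + sizes[:-1])
print(sizes)            # [256, 256, 88]
print(offsets.tolist())  # [0, 256, 512]
```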
+
+
+# ---------------------------------------------------------------------------
+# Public API -- kde
+# ---------------------------------------------------------------------------
+
+def kde(
+ x: Union[np.ndarray, list],
+ y: Union[np.ndarray, list],
+ *,
+ weights: Optional[Union[np.ndarray, list]] = None,
+ bandwidth: Union[float, str] = 'silverman',
+ kernel: str = 'gaussian',
+ template: Optional[xr.DataArray] = None,
+ x_range: Optional[Tuple[float, float]] = None,
+ y_range: Optional[Tuple[float, float]] = None,
+ width: int = 256,
+ height: int = 256,
+ name: str = 'kde',
+) -> xr.DataArray:
+ """Compute 2-D kernel density estimation from point data.
+
+ Each output pixel accumulates weighted kernel contributions from all
+ input points, producing a smooth continuous density surface.
+
+ Parameters
+ ----------
+ x, y : array-like
+ 1-D arrays of point coordinates.
+ weights : array-like, optional
+ Per-point weights. Defaults to uniform weights of 1.
+ bandwidth : float or ``'silverman'``
+ Kernel bandwidth in the same units as *x*/*y*.
+ ``'silverman'`` (default) uses Silverman's rule of thumb.
+ kernel : ``{'gaussian', 'epanechnikov', 'quartic'}``
+ Kernel shape.
+ template : xr.DataArray, optional
+ If provided, the output matches this array's shape, extent, and
+ coordinates. *x_range*, *y_range*, *width*, and *height* are
+ ignored when *template* is given.
+ x_range, y_range : (min, max), optional
+ Spatial extent of the output grid. Defaults to the data extent
+        with 10% padding on each side.
+ width, height : int
+ Number of columns and rows in the output grid. Ignored when
+ *template* is provided.
+ name : str
+ Name of the output DataArray.
+
+ Returns
+ -------
+ xr.DataArray
+ 2-D density surface.
+ """
+ # -- Validate and coerce inputs ----------------------------------------
+ x_arr = np.asarray(x, dtype=np.float64).ravel()
+ y_arr = np.asarray(y, dtype=np.float64).ravel()
+ if x_arr.shape[0] != y_arr.shape[0]:
+ raise ValueError("x and y must have the same length")
+ n = x_arr.shape[0]
+
+ if weights is not None:
+ w_arr = np.asarray(weights, dtype=np.float64).ravel()
+ if w_arr.shape[0] != n:
+ raise ValueError("weights must have the same length as x and y")
+ else:
+ w_arr = np.ones(n, dtype=np.float64)
+
+ kid = _kernel_id(kernel)
+
+ # -- Bandwidth ---------------------------------------------------------
+ if isinstance(bandwidth, str):
+ if bandwidth != 'silverman':
+ raise ValueError(
+ "bandwidth must be a positive number or 'silverman', "
+ f"got {bandwidth!r}"
+ )
+ bw = _silverman_bandwidth(x_arr, y_arr)
+ else:
+ bw = float(bandwidth)
+ if bw <= 0:
+ raise ValueError(f"bandwidth must be positive, got {bw}")
+
+ # -- Output grid -------------------------------------------------------
+ if template is not None:
+ _validate_template(template)
+ y_coords = template.coords[template.dims[0]].values
+ x_coords = template.coords[template.dims[1]].values
+ rows, cols = template.shape
+ # Pixel spacing
+ dy = float(y_coords[1] - y_coords[0]) if rows > 1 else 1.0
+ dx = float(x_coords[1] - x_coords[0]) if cols > 1 else 1.0
+ x0 = float(x_coords[0])
+ y0 = float(y_coords[0])
+ use_dask = has_dask_array() and isinstance(template.data, da.Array)
+ use_cupy = (has_cuda_and_cupy() and cupy is not None
+ and _is_cupy_backed(template))
+ out_chunks = template.data.chunksize if use_dask else None
+ else:
+ if x_range is None:
+ pad = max(bw, (float(x_arr.max()) - float(x_arr.min())) * 0.1)
+ x_range = (float(x_arr.min()) - pad, float(x_arr.max()) + pad)
+ if y_range is None:
+ pad = max(bw, (float(y_arr.max()) - float(y_arr.min())) * 0.1)
+ y_range = (float(y_arr.min()) - pad, float(y_arr.max()) + pad)
+ rows, cols = height, width
+ dx = (x_range[1] - x_range[0]) / max(cols - 1, 1)
+ dy = (y_range[1] - y_range[0]) / max(rows - 1, 1)
+ x0 = x_range[0]
+ y0 = y_range[0]
+ x_coords = np.linspace(x_range[0], x_range[1], cols)
+ y_coords = np.linspace(y_range[0], y_range[1], rows)
+ use_dask = False
+ use_cupy = False
+ out_chunks = None
+
+ shape = (rows, cols)
+
+ # -- Dispatch -----------------------------------------------------------
+ if use_dask and use_cupy:
+ data = _run_kde_dask_cupy(
+ x_arr, y_arr, w_arr, shape, x0, y0, dx, dy, bw, kid, out_chunks,
+ )
+ elif use_dask:
+ data = _run_kde_dask_numpy(
+ x_arr, y_arr, w_arr, shape, x0, y0, dx, dy, bw, kid, out_chunks,
+ )
+ elif use_cupy:
+ data = _run_kde_cupy(
+ x_arr, y_arr, w_arr, shape, x0, y0, dx, dy, bw, kid,
+ )
+ else:
+ data = _run_kde_numpy(
+ x_arr, y_arr, w_arr, shape, x0, y0, dx, dy, bw, kid,
+ )
+
+ # -- Build output DataArray --------------------------------------------
+ if template is not None:
+ return xr.DataArray(
+ data, name=name,
+ coords=template.coords, dims=template.dims,
+ attrs=template.attrs,
+ )
+ return xr.DataArray(
+ data, name=name,
+ dims=['y', 'x'],
+ coords={'y': y_coords, 'x': x_coords},
+ )
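The default-extent rule in the else-branch above can be sketched standalone (`default_axis` is a hypothetical helper name used only for illustration):

```python
import numpy as np

# Default extent when x_range/y_range are omitted: pad each side by
# max(bandwidth, 10% of the data range), then place pixel centres with
# linspace, so dx = (padded extent) / (npix - 1).
def default_axis(vals, bw, npix):
    lo, hi = float(vals.min()), float(vals.max())
    pad = max(bw, (hi - lo) * 0.1)
    return np.linspace(lo - pad, hi + pad, npix)

c = default_axis(np.array([0.0, 10.0]), bw=0.5, npix=101)
print(c[0], c[-1])  # -1.0 11.0  (10% of the range beats bw here)
```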
+
+
+# ---------------------------------------------------------------------------
+# Public API -- line_density
+# ---------------------------------------------------------------------------
+
+def line_density(
+ x1: Union[np.ndarray, list],
+ y1: Union[np.ndarray, list],
+ x2: Union[np.ndarray, list],
+ y2: Union[np.ndarray, list],
+ *,
+ weights: Optional[Union[np.ndarray, list]] = None,
+ bandwidth: Union[float, str] = 'silverman',
+ kernel: str = 'gaussian',
+ template: Optional[xr.DataArray] = None,
+ x_range: Optional[Tuple[float, float]] = None,
+ y_range: Optional[Tuple[float, float]] = None,
+ width: int = 256,
+ height: int = 256,
+ name: str = 'line_density',
+) -> xr.DataArray:
+ """Compute line density from line-segment data.
+
+ Each segment is uniformly sampled and the samples are convolved with
+ the chosen kernel, producing a smooth density surface that represents
+ the concentration of linear features.
+
+ Parameters
+ ----------
+ x1, y1, x2, y2 : array-like
+ Start and end coordinates of each line segment.
+ weights : array-like, optional
+ Per-segment weights. Defaults to uniform weights of 1.
+ bandwidth : float or ``'silverman'``
+ Kernel bandwidth. ``'silverman'`` uses an automatic estimate
+ based on all segment endpoints.
+ kernel : ``{'gaussian', 'epanechnikov', 'quartic'}``
+ Kernel shape.
+ template : xr.DataArray, optional
+ Output grid specification (same as :func:`kde`).
+ x_range, y_range : (min, max), optional
+ Spatial extent.
+ width, height : int
+ Grid dimensions.
+ name : str
+ Name of the output DataArray.
+
+ Returns
+ -------
+ xr.DataArray
+ 2-D line-density surface.
+ """
+ x1a = np.asarray(x1, dtype=np.float64).ravel()
+ y1a = np.asarray(y1, dtype=np.float64).ravel()
+ x2a = np.asarray(x2, dtype=np.float64).ravel()
+ y2a = np.asarray(y2, dtype=np.float64).ravel()
+ n = x1a.shape[0]
+ if not (y1a.shape[0] == n and x2a.shape[0] == n and y2a.shape[0] == n):
+ raise ValueError("x1, y1, x2, y2 must all have the same length")
+
+ if weights is not None:
+ w_arr = np.asarray(weights, dtype=np.float64).ravel()
+ if w_arr.shape[0] != n:
+ raise ValueError(
+ "weights must have the same length as the segment arrays"
+ )
+ else:
+ w_arr = np.ones(n, dtype=np.float64)
+
+ kid = _kernel_id(kernel)
+
+ # Bandwidth from all endpoints
+ all_x = np.concatenate([x1a, x2a])
+ all_y = np.concatenate([y1a, y2a])
+
+ if isinstance(bandwidth, str):
+ if bandwidth != 'silverman':
+ raise ValueError(
+ "bandwidth must be a positive number or 'silverman', "
+ f"got {bandwidth!r}"
+ )
+ bw = _silverman_bandwidth(all_x, all_y)
+ else:
+ bw = float(bandwidth)
+ if bw <= 0:
+ raise ValueError(f"bandwidth must be positive, got {bw}")
+
+ # Grid
+ if template is not None:
+ _validate_template(template)
+ y_coords = template.coords[template.dims[0]].values
+ x_coords = template.coords[template.dims[1]].values
+ rows, cols = template.shape
+ dy = float(y_coords[1] - y_coords[0]) if rows > 1 else 1.0
+ dx = float(x_coords[1] - x_coords[0]) if cols > 1 else 1.0
+ x0 = float(x_coords[0])
+ y0 = float(y_coords[0])
+ else:
+ if x_range is None:
+ pad = max(bw, (float(all_x.max()) - float(all_x.min())) * 0.1)
+ x_range = (float(all_x.min()) - pad, float(all_x.max()) + pad)
+ if y_range is None:
+ pad = max(bw, (float(all_y.max()) - float(all_y.min())) * 0.1)
+ y_range = (float(all_y.min()) - pad, float(all_y.max()) + pad)
+ rows, cols = height, width
+ dx = (x_range[1] - x_range[0]) / max(cols - 1, 1)
+ dy = (y_range[1] - y_range[0]) / max(rows - 1, 1)
+ x0 = x_range[0]
+ y0 = y_range[0]
+ x_coords = np.linspace(x_range[0], x_range[1], cols)
+ y_coords = np.linspace(y_range[0], y_range[1], rows)
+
+ shape = (rows, cols)
+ out = np.zeros(shape, dtype=np.float64)
+ _line_density_cpu(x1a, y1a, x2a, y2a, w_arr, out,
+ x0, y0, dx, dy, bw, kid)
+
+ if template is not None:
+ return xr.DataArray(
+ out, name=name,
+ coords=template.coords, dims=template.dims,
+ attrs=template.attrs,
+ )
+ return xr.DataArray(
+ out, name=name,
+ dims=['y', 'x'],
+ coords={'y': y_coords, 'x': x_coords},
+ )
+
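The docstring above says each segment is uniformly sampled before convolution; `_line_density_cpu` is defined earlier in the module and not shown in this hunk. A minimal numpy sketch of one plausible sampling scheme (the sample spacing and the two-sample minimum for degenerate segments are assumptions, not the library's actual choices):

```python
import numpy as np

def sample_segment(x1, y1, x2, y2, spacing):
    """Uniformly sample a line segment at roughly `spacing` intervals."""
    length = float(np.hypot(x2 - x1, y2 - y1))
    # At least two samples so a zero-length segment still emits its endpoints.
    n = max(int(np.ceil(length / spacing)) + 1, 2)
    t = np.linspace(0.0, 1.0, n)
    return x1 + t * (x2 - x1), y1 + t * (y2 - y1)

# A 3-4-5 segment sampled every 0.5 units -> 11 evenly spaced points.
xs, ys = sample_segment(0.0, 0.0, 3.0, 4.0, spacing=0.5)
```

Each sample would then be splatted onto the grid with the chosen kernel, exactly as a weighted point in `kde`.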
+
+# ---------------------------------------------------------------------------
+# Helpers
+# ---------------------------------------------------------------------------
+
+def _validate_template(template):
+ if not isinstance(template, xr.DataArray):
+ raise TypeError(
+ "template must be an xr.DataArray, "
+ f"got {type(template).__qualname__}"
+ )
+ if template.ndim != 2:
+ raise ValueError(
+ f"template must be 2-D, got {template.ndim}-D"
+ )
+
+
+def _is_cupy_backed(agg):
+ """Check if a DataArray is backed by cupy (plain or via dask)."""
+ try:
+ meta = agg.data._meta
+ return type(meta).__module__.split('.')[0] == 'cupy'
+ except AttributeError:
+ if cupy is not None:
+ return isinstance(agg.data, cupy.ndarray)
+ return False
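`_silverman_bandwidth` is defined earlier in the module and not shown in this hunk. For reference, the standard 2-D Silverman rule of thumb it presumably follows reduces to h = sigma_hat * n**(-1/6), because the dimension factor (4 / (d + 2))**(1 / (d + 4)) equals 1 at d = 2 (Silverman 1986). A self-contained sketch; pooling the variance of both axes is an assumption:

```python
import numpy as np

def silverman_bandwidth_2d(x, y):
    """2-D Silverman rule of thumb: h = sigma_hat * n**(-1/6)."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    n = x.size
    # Pooled standard deviation across both coordinate axes.
    sigma = np.sqrt(0.5 * (x.var() + y.var()))
    return sigma * n ** (-1.0 / 6.0)
```

This is what `test_silverman_auto` below exercises: any non-degenerate point cloud yields a strictly positive bandwidth.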
diff --git a/xrspatial/tests/test_kde.py b/xrspatial/tests/test_kde.py
new file mode 100644
index 00000000..7281a93d
--- /dev/null
+++ b/xrspatial/tests/test_kde.py
@@ -0,0 +1,304 @@
+"""Tests for xrspatial.kde (kernel density estimation)."""
+from __future__ import annotations
+
+import numpy as np
+import pytest
+import xarray as xr
+
+from xrspatial.kde import kde, line_density, _silverman_bandwidth
+
+from xrspatial.tests.general_checks import (
+ dask_array_available,
+ cuda_and_cupy_available,
+)
+
+try:
+ import dask.array as da
+except ImportError:
+ da = None
+
+
+# ---------------------------------------------------------------------------
+# Fixtures
+# ---------------------------------------------------------------------------
+
+@pytest.fixture
+def point_cluster():
+ """Tight cluster of points at the origin -- easy to validate."""
+ rng = np.random.default_rng(1143)
+ x = rng.normal(0, 1, 200)
+ y = rng.normal(0, 1, 200)
+ return x, y
+
+
+@pytest.fixture
+def simple_grid():
+ """Small 16x16 template grid centred on the origin."""
+ return xr.DataArray(
+ np.zeros((16, 16), dtype=np.float64),
+ dims=['y', 'x'],
+ coords={
+ 'y': np.linspace(-4, 4, 16),
+ 'x': np.linspace(-4, 4, 16),
+ },
+ )
+
+
+# ---------------------------------------------------------------------------
+# Basic correctness
+# ---------------------------------------------------------------------------
+
+class TestKDEBasic:
+ """Core KDE functionality on numpy arrays."""
+
+ def test_output_shape_from_width_height(self, point_cluster):
+ x, y = point_cluster
+ result = kde(x, y, bandwidth=1.0, width=20, height=30)
+ assert result.shape == (30, 20)
+ assert result.dims == ('y', 'x')
+
+ def test_output_shape_from_template(self, point_cluster, simple_grid):
+ x, y = point_cluster
+ result = kde(x, y, bandwidth=1.0, template=simple_grid)
+ assert result.shape == simple_grid.shape
+ np.testing.assert_array_equal(result.coords['y'], simple_grid.coords['y'])
+ np.testing.assert_array_equal(result.coords['x'], simple_grid.coords['x'])
+
+ def test_density_positive(self, point_cluster):
+ x, y = point_cluster
+ result = kde(x, y, bandwidth=1.0, width=16, height=16)
+ assert float(result.min()) >= 0.0
+ assert float(result.sum()) > 0.0
+
+ def test_peak_near_data_centre(self, point_cluster, simple_grid):
+ """Density peak should be near (0, 0) for a standard-normal cluster."""
+ x, y = point_cluster
+ result = kde(x, y, bandwidth=1.0, template=simple_grid)
+ peak_row, peak_col = np.unravel_index(result.values.argmax(), result.shape)
+ y_peak = float(result.coords['y'].values[peak_row])
+ x_peak = float(result.coords['x'].values[peak_col])
+ assert abs(x_peak) < 2.0
+ assert abs(y_peak) < 2.0
+
+ def test_name_propagated(self, point_cluster):
+ x, y = point_cluster
+ result = kde(x, y, bandwidth=1.0, name='density', width=8, height=8)
+ assert result.name == 'density'
+
+
+class TestKernelTypes:
+ """Each kernel produces valid, non-zero output."""
+
+ @pytest.mark.parametrize('kernel', ['gaussian', 'epanechnikov', 'quartic'])
+ def test_kernel_produces_output(self, point_cluster, kernel):
+ x, y = point_cluster
+ result = kde(x, y, bandwidth=1.0, kernel=kernel, width=16, height=16)
+ assert float(result.sum()) > 0.0
+
+ def test_compact_kernels_are_zero_outside_bandwidth(self):
+ """A single point at origin, quartic kernel: pixels beyond bw should be 0."""
+ result = kde(
+ [0.0], [0.0], bandwidth=1.0, kernel='quartic',
+ x_range=(-3, 3), y_range=(-3, 3), width=32, height=32,
+ )
+ xs = result.coords['x'].values
+ ys = result.coords['y'].values
+ xx, yy = np.meshgrid(xs, ys)
+ outside = xx ** 2 + yy ** 2 > 1.0
+ assert np.all(result.values[outside] == 0.0)
+
+ def test_gaussian_nonzero_everywhere(self):
+ """Gaussian kernel should have non-zero density everywhere (within cutoff)."""
+ result = kde(
+ [0.0], [0.0], bandwidth=1.0, kernel='gaussian',
+ x_range=(-2, 2), y_range=(-2, 2), width=8, height=8,
+ )
+ assert float(result.min()) > 0.0
+
+
+class TestBandwidth:
+ """Bandwidth selection and validation."""
+
+ def test_silverman_auto(self, point_cluster):
+ x, y = point_cluster
+ bw = _silverman_bandwidth(x, y)
+ assert bw > 0
+
+ def test_silverman_keyword(self, point_cluster):
+ x, y = point_cluster
+ result = kde(x, y, bandwidth='silverman', width=8, height=8)
+ assert float(result.sum()) > 0.0
+
+ def test_wider_bandwidth_smoother(self, point_cluster, simple_grid):
+ """Wider bandwidth should produce a smoother (lower peak) surface."""
+ x, y = point_cluster
+ narrow = kde(x, y, bandwidth=0.5, template=simple_grid)
+ wide = kde(x, y, bandwidth=2.0, template=simple_grid)
+ assert float(narrow.max()) > float(wide.max())
+
+ def test_negative_bandwidth_raises(self, point_cluster):
+ x, y = point_cluster
+ with pytest.raises(ValueError, match='positive'):
+ kde(x, y, bandwidth=-1.0, width=8, height=8)
+
+ def test_invalid_bandwidth_string_raises(self, point_cluster):
+ x, y = point_cluster
+ with pytest.raises(ValueError, match='silverman'):
+ kde(x, y, bandwidth='invalid', width=8, height=8)
+
+
+class TestWeights:
+ """Weighted KDE."""
+
+ def test_double_weights_double_density(self, simple_grid):
+ x = np.array([0.0, 1.0, -1.0])
+ y = np.array([0.0, 1.0, -1.0])
+ r1 = kde(x, y, weights=[1, 1, 1], bandwidth=1.0, template=simple_grid)
+ r2 = kde(x, y, weights=[2, 2, 2], bandwidth=1.0, template=simple_grid)
+ np.testing.assert_allclose(r2.values, r1.values * 2.0, rtol=1e-12)
+
+ def test_weight_length_mismatch_raises(self):
+ with pytest.raises(ValueError, match='same length'):
+ kde([0, 1], [0, 1], weights=[1], bandwidth=1.0, width=4, height=4)
+
+
+# ---------------------------------------------------------------------------
+# Edge cases
+# ---------------------------------------------------------------------------
+
+class TestEdgeCases:
+
+ def test_single_point(self):
+ result = kde([5.0], [5.0], bandwidth=1.0, width=8, height=8)
+ assert result.shape == (8, 8)
+ assert float(result.sum()) > 0.0
+
+ def test_two_identical_points(self):
+ result = kde([0.0, 0.0], [0.0, 0.0], bandwidth=1.0, width=8, height=8)
+ assert float(result.sum()) > 0.0
+
+ def test_invalid_kernel_raises(self):
+ with pytest.raises(ValueError, match='kernel must be'):
+ kde([0], [0], kernel='unknown', bandwidth=1.0, width=4, height=4)
+
+ def test_mismatched_xy_raises(self):
+ with pytest.raises(ValueError, match='same length'):
+ kde([0, 1], [0], bandwidth=1.0, width=4, height=4)
+
+ def test_template_not_dataarray_raises(self):
+ with pytest.raises(TypeError, match='xr.DataArray'):
+ kde([0], [0], bandwidth=1.0, template=np.zeros((4, 4)))
+
+ def test_template_not_2d_raises(self):
+ t = xr.DataArray(np.zeros((4, 4, 4)), dims=['z', 'y', 'x'])
+ with pytest.raises(ValueError, match='2-D'):
+ kde([0], [0], bandwidth=1.0, template=t)
+
+
+# ---------------------------------------------------------------------------
+# Line density
+# ---------------------------------------------------------------------------
+
+class TestLineDensity:
+
+ def test_basic_line(self):
+ result = line_density(
+ [0.0], [0.0], [1.0], [1.0],
+ bandwidth=0.5, width=16, height=16,
+ )
+ assert result.shape == (16, 16)
+ assert float(result.sum()) > 0.0
+
+ def test_name(self):
+ result = line_density(
+ [0], [0], [1], [1],
+ bandwidth=0.5, name='roads', width=8, height=8,
+ )
+ assert result.name == 'roads'
+
+ def test_parallel_lines_higher_density_between(self):
+ """Two parallel horizontal lines: density should be higher between them."""
+ x1 = np.array([0, 0])
+ y1 = np.array([-1, 1])
+ x2 = np.array([2, 2])
+ y2 = np.array([-1, 1])
+ result = line_density(
+ x1, y1, x2, y2,
+ bandwidth=1.0,
+ x_range=(-1, 3), y_range=(-3, 3),
+ width=16, height=32,
+ )
+ # Centre row should have higher density than edge rows
+ mid_row = result.shape[0] // 2
+ centre_density = float(result.values[mid_row, :].mean())
+ edge_density = float(result.values[0, :].mean())
+ assert centre_density > edge_density
+
+ def test_mismatched_arrays_raises(self):
+ with pytest.raises(ValueError, match='same length'):
+ line_density([0, 1], [0], [1], [1], bandwidth=1.0, width=4, height=4)
+
+ def test_weighted_lines(self):
+ r1 = line_density([0], [0], [1], [1], weights=[1],
+ bandwidth=0.5, width=8, height=8)
+ r2 = line_density([0], [0], [1], [1], weights=[3],
+ bandwidth=0.5, width=8, height=8)
+ np.testing.assert_allclose(r2.values, r1.values * 3.0, rtol=1e-10)
+
+
+# ---------------------------------------------------------------------------
+# Cross-backend parity
+# ---------------------------------------------------------------------------
+
+@dask_array_available
+class TestDaskParity:
+ """Dask+numpy results should match pure numpy."""
+
+ def _make_dask_template(self, simple_grid, chunks=(8, 8)):
+ return simple_grid.copy(data=da.from_array(simple_grid.values, chunks=chunks))
+
+ @pytest.mark.parametrize('kernel', ['gaussian', 'epanechnikov', 'quartic'])
+ def test_dask_matches_numpy(self, point_cluster, simple_grid, kernel):
+ x, y = point_cluster
+ np_result = kde(x, y, bandwidth=1.0, kernel=kernel, template=simple_grid)
+ dask_template = self._make_dask_template(simple_grid)
+ dask_result = kde(x, y, bandwidth=1.0, kernel=kernel, template=dask_template)
+ assert isinstance(dask_result.data, da.Array)
+ np.testing.assert_allclose(
+ dask_result.values, np_result.values, rtol=1e-4,
+ )
+
+ def test_compact_kernel_exact_match(self, point_cluster, simple_grid):
+ """Compact kernels should match exactly since there are no cutoff boundary effects."""
+ x, y = point_cluster
+ np_result = kde(x, y, bandwidth=1.0, kernel='quartic', template=simple_grid)
+ dask_template = self._make_dask_template(simple_grid)
+ dask_result = kde(x, y, bandwidth=1.0, kernel='quartic', template=dask_template)
+ np.testing.assert_allclose(
+ dask_result.values, np_result.values, rtol=1e-12,
+ )
+
+
+@cuda_and_cupy_available
+class TestCuPyParity:
+ """CuPy results should match numpy."""
+
+ def _make_cupy_template(self, simple_grid):
+ import cupy
+ return simple_grid.copy(data=cupy.asarray(simple_grid.values))
+
+ @pytest.mark.parametrize('kernel', ['gaussian', 'epanechnikov', 'quartic'])
+ def test_cupy_matches_numpy(self, point_cluster, simple_grid, kernel):
+ x, y = point_cluster
+ np_result = kde(x, y, bandwidth=1.0, kernel=kernel, template=simple_grid)
+ cupy_template = self._make_cupy_template(simple_grid)
+ cupy_result = kde(x, y, bandwidth=1.0, kernel=kernel, template=cupy_template)
+ result_np = cupy_result.data.get()
+ # Gaussian: CPU uses a 4*bw box cutoff, GPU sums all points.
+ # Compact kernels match exactly.
+ tol = 1e-2 if kernel == 'gaussian' else 1e-6
+ np.testing.assert_allclose(result_np, np_result.values, rtol=tol)
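`test_compact_kernels_are_zero_outside_bandwidth` above relies on the quartic kernel vanishing outside its bandwidth, which is also why the parity tests can demand exact agreement for compact kernels. A minimal sketch of an unnormalised quartic (biweight) weight illustrating that property; the library's actual normalisation constant is not shown in this hunk:

```python
import numpy as np

def quartic_weight(dist, bw):
    """Unnormalised quartic kernel: (1 - (d/h)^2)^2 inside the bandwidth, 0 outside."""
    u = dist / bw
    return np.where(np.abs(u) < 1.0, (1.0 - u ** 2) ** 2, 0.0)

quartic_weight(0.5, 1.0)   # inside support: (1 - 0.25)^2 = 0.5625
quartic_weight(1.5, 1.0)   # outside support: exactly 0
```

Because the support is finite, CPU and GPU implementations see the same set of contributing points, so no cutoff-related tolerance is needed.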
diff --git a/xrspatial/viewshed.py b/xrspatial/viewshed.py
index 96b095f6..0e282052 100644
--- a/xrspatial/viewshed.py
+++ b/xrspatial/viewshed.py
@@ -1560,7 +1560,7 @@ def _viewshed_cpu(
num_events = 3 * (n_rows * n_cols - 1)
event_list = np.zeros((num_events, 7), dtype=np.float64)
- raster.data = raster.data.astype(np.float64)
+ raster.data = raster.data.astype(np.float64, copy=False)
_init_event_list(event_list=event_list, raster=raster.data,
vp_row=viewpoint_row, vp_col=viewpoint_col,
@@ -2167,7 +2167,9 @@ def _viewshed_dask(raster, x, y, observer_elev, target_elev):
cupy_backed = is_dask_cupy(raster)
# --- Tier B: full grid fits in memory → compute and run exact algo ---
- r2_bytes = 280 * height * width
+ # Peak memory: event_list sort needs 2x 168*H*W, plus raster 8*H*W and
+ # visibility_grid 8*H*W = 352 bytes/pixel, rounded up to 360 for slack,
+ # plus the computed raster itself.
+ r2_bytes = 360 * height * width + 8 * height * width # working + raster
avail = _available_memory_bytes()
if r2_bytes < 0.5 * avail:
raster_mem = raster.copy()
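To put the revised per-pixel estimate in perspective, a quick back-of-the-envelope for a 10 000 x 10 000 grid using the numbers from the comment above:

```python
# Working memory (event-list sort + visibility grid) at 360 bytes/pixel,
# plus the computed float64 raster at 8 bytes/pixel.
height, width = 10_000, 10_000
working = 360 * height * width
raster = 8 * height * width
total_gib = (working + raster) / 2**30   # ~ 34.3 GiB
```

So at this size the exact algorithm only runs when roughly 69 GiB is available (the `0.5 * avail` guard); otherwise the dask path falls through to its tiled tiers.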
diff --git a/xrspatial/visibility.py b/xrspatial/visibility.py
index ab4de453..65139ba0 100644
--- a/xrspatial/visibility.py
+++ b/xrspatial/visibility.py
@@ -61,7 +61,9 @@ def _extract_transect(raster, cells):
if has_dask_array():
import dask.array as da
if isinstance(data, da.Array):
- data = data.compute()
+ # Only compute the needed cells, not the entire array
+ elevations = data.vindex[rows, cols].compute().astype(np.float64)
+ return elevations, x_coords, y_coords
if has_cuda_and_cupy() and is_cupy_array(data):
data = data.get()
@@ -217,7 +219,16 @@ def cumulative_viewshed(
if not observers:
raise ValueError("observers list must not be empty")
- count = np.zeros(raster.shape, dtype=np.int32)
+ # Detect dask backend to keep accumulation lazy
+ _is_dask = False
+ if has_dask_array():
+ import dask.array as da
+ _is_dask = isinstance(raster.data, da.Array)
+
+ if _is_dask:
+ count = da.zeros(raster.shape, dtype=np.int32, chunks=raster.data.chunks)
+ else:
+ count = np.zeros(raster.shape, dtype=np.int32)
for obs in observers:
ox = obs['x']
@@ -229,11 +240,17 @@ def cumulative_viewshed(
vs = viewshed(raster, x=ox, y=oy, observer_elev=oe,
target_elev=te, max_distance=md)
- vs_np = vs.values
- count += (vs_np != INVISIBLE).astype(np.int32)
-
- return xarray.DataArray(count, coords=raster.coords,
- dims=raster.dims, attrs=raster.attrs)
+ vs_data = vs.data
+ if _is_dask and not isinstance(vs_data, da.Array):
+ vs_data = da.from_array(vs_data, chunks=raster.data.chunks)
+ count = count + (vs_data != INVISIBLE).astype(np.int32)
+
+ result = xarray.DataArray(count, coords=raster.coords,
+ dims=raster.dims, attrs=raster.attrs)
+ if _is_dask:
+ chunk_dict = dict(zip(raster.dims, raster.data.chunks))
+ result = result.chunk(chunk_dict)
+ return result
def visibility_frequency(
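Stripped of the dask plumbing added above, the accumulation in `cumulative_viewshed` reduces to counting, per pixel, how many observers can see it. A minimal eager sketch (`INVISIBLE` is the sentinel from `xrspatial.viewshed`; its value here is illustrative only):

```python
import numpy as np

INVISIBLE = -1  # assumed sentinel value, for illustration

def cumulative_count(viewsheds):
    """Sum per-observer visibility masks into a single int32 count raster."""
    count = np.zeros(viewsheds[0].shape, dtype=np.int32)
    for vs in viewsheds:
        count += (vs != INVISIBLE).astype(np.int32)
    return count

vs_a = np.array([[-1, 0], [5, -1]])
vs_b = np.array([[2, -1], [7, -1]])
cumulative_count([vs_a, vs_b])  # [[1, 1], [2, 0]]
```

The dask branch performs the same arithmetic but keeps `count` lazy, so memory stays bounded by the chunk size rather than the full raster.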