12 changes: 6 additions & 6 deletions docs/_toc.yml
Original file line number Diff line number Diff line change
@@ -14,8 +14,8 @@ chapters:
- file: fundamentals/regularization/rotated_gradients
- file: fundamentals/mesh_design
- file: fundamentals/joint_inversion
- file: fundamentals/depth_of_investigation
- file: fundamentals/optimization
- file: fundamentals/parallelization
- file: tutorials/introduction
sections:
- file: tutorials/background
@@ -34,10 +34,10 @@ chapters:
- file: case_studies/Forrestania/python_code/unconstrained_gravity_inv_training
- file: case_studies/Forrestania/python_code/unconstrained_magnetics_inv_training
- file: case_studies/Forrestania/python_code/joint_grav_mag
- file: plate-simulation/index
- file: fundamentals/depth_of_investigation
- file: plate-simulation/simulation
sections:
- file: plate-simulation/usage
title: Basic Usage
- file: plate-simulation/methodology
title: Methodology
- file: plate-simulation/standalone
- file: plate-simulation/sweep
- file: plate-simulation/match
- file: THIRD_PARTY_SOFTWARE
1 change: 1 addition & 0 deletions docs/conf.py
@@ -42,6 +42,7 @@
exclude_patterns = []
todo_include_todos = True


# -- Options for auto-doc ----------------------------------------------------
# https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#module-sphinx.ext.autodoc

113 changes: 57 additions & 56 deletions docs/fundamentals/depth_of_investigation.ipynb
@@ -7,15 +7,66 @@
"source": [
"# Depth of Investigation\n",
"\n",
"In geophysics, \"depth of investigation\" refers to the maximum depth below the surface from which a geophysical survey can reliably measure. It depends on factors like the survey design and physical properties of the subsurface material. Several strategies have been proposed to assess uncertainties in models recovered from inversion. {cite:t}`nabighian_1989` used a skin depth approach for electromagnetic surveys, assuming a background halfspace resistivity. {cite:t}`li_1999` implemented a cut-off value based on two inverted models obtained with slightly different assumptions. {cite:t}`christiansen_2012` proposed a mask based on the sum-square of sensitivities to estimate a volume of low confidence. In the following, we discuss the algorithm and implementation of the sensitivity cutoff strategy."
"This application allows users to compute a depth of investigation from model values. In geophysics, \"depth of investigation\" refers to the maximum depth below the surface from which a geophysical survey can reliably measure. It depends on factors like the survey design and physical properties of the subsurface material. \n",
"\n",
"![masked_model](./images/masked_model.png)\n",
"\n",
"Several strategies have been proposed to assess uncertainties in models recovered from inversion. {cite:t}`nabighian_1989` used a skin depth approach for electromagnetic surveys, assuming a background halfspace resistivity. {cite:t}`li_1999` implemented a cut-off value based on two inverted models obtained with slightly different assumptions. \n",
"\n",
"This application uses the method proposed by {cite:t}`christiansen_2012`, based on the sum-square of sensitivities to estimate a volume of low confidence."
]
},
{
"cell_type": "markdown",
"id": "83b4ae8f-5509-40b1-98c2-39acefba438c",
"metadata": {
"jp-MarkdownHeadingCollapsed": true
},
"source": [
"## Interface\n",
"\n",
"The depth of investigation method based on sensitivity cutoffs relies on the ui.json interface.\n",
"\n",
"![interface](./images/uijson.png)\n",
"\n",
"### Inputs\n",
"\n",
"- **Mesh**: The mesh used for inversion\n",
"- **Sensitivity**: Model selector for the sensitivities at a given iteration (see [Pre-requisite](pre-requisite))\n",
"- **Cut-off**: Percentage value used to threshold the sensitivities, relative to the **Method** used.\n",
"- **Method**: One of *percentile*, *percent*, or *log percent*\n",
"\n",
"![cutoff_options](./images/cutoff_options.png)\n",
"\n",
"### Output\n",
"\n",
"After running, the application will create a masking attribute saved on the mesh.\n",
"\n",
"![mask](./images/sensitivity_mask.png)\n",
"\n",
"The mask can then be applied to any of the iterations to show only the cells that exceeded the sensitivity threshold.\n",
"\n",
"![apply_mask](./images/apply_mask.png)\n",
"\n",
"\n",
"(pre-requisite)=\n",
"### Pre-requisite\n",
"\n",
"In order to save the sensitivities during a SimPEG inversion, the 'Save sensitivities' option must be selected from the 'Optional parameters' tab of an inversion.\n",
"\n",
"![save_sensitivities](./images/save_sensitivities.png)\n",
"\n",
"This will result in a new model generated and saved into the computational mesh at each iteration.\n",
"\n",
"![sensitivity_models](./images/sensitivity_models.png)"
]
},
{
"cell_type": "markdown",
"id": "c13ffcec-8d95-4ee9-876f-9aa92e2c534a",
"metadata": {},
"source": [
"## Sensitivities\n",
"## Methodology\n",
"\n",
"The sensitivity matrix is calculated as part of the optimization problem solved by SimPEG while inverting geophysical data. The sensitivity matrix represents the degree to which each predicted datum changes with respect to a perturbation in each model cell. It is given in matrix form by\n",
"\n",
@@ -28,20 +79,20 @@
"The depth of investigation mask is a property of the cells of the mesh only, so the rows of the sensitivity matrix (data) are sum-square normalized as follows.\n",
"\n",
"$$\n",
"\\mathbf{J}_{norm} = \\left[\\sum_{n=1}^{N}\\left(\\frac{\\mathbf{J}_{n:}}{w_n}\\right)^{2}\\right]^{(1/2)}\n",
"\\|\\mathbf{j}\\| = \\left[\\sum_{n=1}^{N}\\left(\\frac{\\mathbf{J}_{n:}}{w_n}\\right)^{2}\\right]^{(1/2)}\n",
"$$\n",
"\n",
"where $w_n$ are the data uncertainties associated with the $n^{th}$ datum.\n",
"\n",
"The resulting vector $J_{norm}$ can then be thought of as the degree to which the aggregate data changes due to a small perturbation in each model cell. The depth of investigation mask is then computed by thresholding those sensitivities"
"The resulting vector $\\|\\mathbf{j}\\|$ can then be thought of as the degree to which the aggregate data changes due to a small perturbation in each model cell. The depth of investigation mask is then computed by thresholding those sensitivities."
]
},
{
"cell_type": "markdown",
"id": "d8bb85c5-5e63-4617-ab6b-a7b6536c1d2c",
"metadata": {},
"source": [
"## Thresholding\n",
"### Thresholding\n",
"\n",
"The depth of investigation can be estimated by assigning a threshold on the sum-squared sensitivity vector. This can be done as a percentile, percentage, or log-percentage. In the percentile method, the mask is formed by eliminating all cells whose sensitivity falls below the $n^{th}$ percentile of the values, where $n$ is the chosen cutoff. In the percent method, the data are transformed into a percentage of the largest value\n",
"\n",
@@ -51,56 +102,6 @@
"\n",
"and the mask is formed by eliminating all cells in which the sensitivity falls below $n$% of the maximum value, where $n$ is the chosen cutoff. Finally, the log-percent mask transforms the data into log-space before carrying out the percentage thresholding described above."
]
},
{
"cell_type": "markdown",
"id": "83b4ae8f-5509-40b1-98c2-39acefba438c",
"metadata": {},
"source": [
"## Usage\n",
"\n",
"The depth of investigation methods based on sensitivity cutoffs described above are exposed to Geoscience ANALYST Pro Geophysics users through a ui.json interface. In order to save the sensitivities during a SimPEG inversion, the 'Save sensitivities' option must be selected from the 'Optional parameters' tab of the SimPEG inversion ui.json window.\n",
"\n",
"![save_sensitivities](./images/save_sensitivities.png)\n",
"\n",
"This will result in a new model generated and saved into the computational mesh at each iteration.\n",
"\n",
"![sensitivity_models](./images/sensitivity_models.png)\n",
"\n",
"The ui.json interface allows the user to select a mesh from the **simpeg-drivers** result and any of the generated sensitivity models, a cutoff threshold, method, and optional name for the output.\n",
"\n",
"![interface](./images/uijson.png)\n",
"\n",
"The cutoff methods are selectable in the dropdown\n",
"\n",
"![cutoff_options](./images/cutoff_options.png)\n",
"\n",
"Whatever the options, the application will create a sensitivity cutoff mask saved on the mesh\n",
"\n",
"![mask](./images/sensitivity_mask.png)\n",
"\n",
"which can then be applied to any of the iterations to show only the cells that exceeded the sensitivity threshold.\n",
"\n",
"![apply_mask](./images/apply_mask.png)\n",
"\n",
"![masked_model](./images/masked_model.png)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1079da0d-4ea2-4f00-b957-92199dd39cdb",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "920f0637-4028-40c7-b2c6-5976f5e90afd",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -119,7 +120,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.16"
"version": "3.10.19"
}
},
"nbformat": 4,
1 change: 1 addition & 0 deletions docs/fundamentals/images/distributed_parallelization.svg
59 changes: 59 additions & 0 deletions docs/fundamentals/parallelization.rst
@@ -0,0 +1,59 @@
.. _parallelization:

Parallelization
===============

This section describes the different levels of parallelization used by the inversion routines and how to optimize resources.
For a given inversion routine, the application decomposes the problem into a series of sub-problems, or tiles, each assigned a mesh and a survey. During the inversion process, the application continuously requests predicted data and derivatives from the sub-problems. The application parallelizes these operations within each sub-problem, as well as externally so that sub-problems can be computed concurrently.


.. figure:: ./images/distributed_parallelization.svg
:align: center
:width: 80%

Schematic representation of the computing elements of a tiled inversion. Each tile receives an assigned mesh and a survey (single-frequency if applicable), with array operations parallelized by dask. The dask operations always bookend the direct solver when employed. Optionally, the workload can be distributed across multiple workers to optimize performance. Each worker is responsible for a subset of tiles and uses a limited number of threads. Only 1-dimensional arrays are passed between the main process and the workers.


Dask
----

The `dask <https://www.dask.org/>`_ library handles most operations related to generating arrays. A mixture of ``dask.array`` and ``dask.delayed`` calls parallelizes the computations across multiple threads. If a direct solver is involved, the dask operations bookend the solver to avoid thread-safety issues. The application converts dask arrays to numpy arrays before passing them to the direct solvers and before returning them to the main process. Only 1-dimensional arrays are returned to the main process, while higher-dimensional arrays and solvers remain in the distributed memory of the workers.
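The pattern above can be sketched in a few lines of dask: independent tile computations are declared lazily and evaluated concurrently, with only small numpy arrays returned at the end. This is a minimal illustration, not the actual drivers' code; the ``simulate`` function is a hypothetical stand-in for a tile's forward computation.

```python
import numpy as np
import dask


@dask.delayed
def simulate(tile_id: int) -> np.ndarray:
    # Stand-in for the predicted data of one tile (mesh + survey subset).
    rng = np.random.default_rng(tile_id)
    return rng.standard_normal(5)


# Tiles are evaluated concurrently, then stacked into the full data vector.
tiles = [simulate(i) for i in range(4)]
predicted = np.concatenate(dask.compute(*tiles))
```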

Sensitivity matrices can optionally be stored on disk using the `zarr <https://zarr.readthedocs.io/en/stable/>`_ library, which is optimized for parallel read/write access. In this case, the workers never hold the entire sensitivity matrix in memory, but rather read and write small chunks of the matrix as needed. This allows for efficient handling of large sensitivity matrices that may not fit in memory, at the cost of increased disk I/O. The use of zarr is optional and can be enabled by setting the ``store_sensitivity`` parameter to true in the ui.json file.


Direct Solvers
--------------

A direct solver is used for any method evaluated by partial differential equation (PDE), such as electromagnetic and electrical surveys. The `Pardiso <https://github.com/simpeg/pydiso>`_ and `Mumps <https://gitlab.kwant-project.org/kwant/python-mumps>`_ solvers parallelize operations during the factorization and backward substitution calls. Note that the current implementation of the solvers is not thread-safe and therefore cannot be shared across parallel processes. Any other level of parallelization needs to occur outside the direct solver calls or be encapsulated within a distributed process.

The number of threads used by the solvers can be set by running the command

.. code-block::

set OMP_NUM_THREADS=X

before launching the Python program (``set`` is the Windows syntax; use ``export OMP_NUM_THREADS=X`` in Unix shells). Alternatively, setting ``OMP_NUM_THREADS`` as a system environment variable makes the setting persistent. The default value is the number of threads available on the machine.
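The same effect can be obtained from Python, provided the variable is set before any OpenMP-backed solver library is imported, since the value is read when the library initializes. A minimal sketch:

```python
import os

# Must be set before importing the solver library, which reads it at load time.
os.environ["OMP_NUM_THREADS"] = "8"

# ... import the solver / launch the driver after this point.
```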


.. _parallelization_distributed:

Dask.distributed
----------------

It has been found that parallel processes, both for dask.delayed and the direct solver, tend to saturate at large thread counts: too many small tasks overwhelm the scheduler at the cost of performance. This can be alleviated by using the ``dask.distributed`` library to split the computation of tiles across multiple ``workers``. Each worker is responsible for a subset of the tiles and can be configured with a limited number of threads, allowing better resource utilization on large problems. The number of workers and threads (per worker) can be set with the following parameters added to the ui.json file:

.. code-block::

{
...
"n_workers": i,
"n_threads": n,
"performance_report": true
}

where ``n_workers`` is the number of processes to spawn, and ``n_threads`` is the number of threads used by each process. Setting ``performance_report`` to true will generate an ``html`` performance report at the end of the inversion, which can be used to identify bottlenecks and optimize the parallelization settings.

It is good practice to divide the available threads evenly among the workers to balance the load. Setting too many workers with too few threads increases the overhead of inter-process communication, while too few workers with too many threads saturates the direct solvers and reduces performance. For example, if the machine has 32 threads available, setting 4 workers with 8 threads each will fully use the resources.

It is also recommended to set the number of tiles as a multiple of the number of workers, to ensure that all workers are utilized. For example, if there are 8 tiles, setting 4 workers will allow each worker to process 2 tiles concurrently. If fewer tiles than workers are available, the program will automatically split surveys into smaller chunks, while preserving the mesh, to ensure an even load across the workers. This is less efficient than having a dedicated optimized mesh per tile, but still provides performance benefits.
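One possible allocation policy following the guidance above can be sketched as a small helper: pick the largest worker count that divides both the tile count and the thread budget, then split the threads evenly. This is purely illustrative; simpeg-drivers does not expose such a helper.

```python
def plan_workers(total_threads: int, n_tiles: int) -> tuple[int, int]:
    """Return (n_workers, n_threads_per_worker) for a given machine and tiling."""
    # Largest worker count that evenly divides both the tiles and the threads.
    for n_workers in range(min(n_tiles, total_threads), 0, -1):
        if n_tiles % n_workers == 0 and total_threads % n_workers == 0:
            return n_workers, total_threads // n_workers
    return 1, total_threads


print(plan_workers(32, 8))  # -> (8, 4): 8 workers with 4 threads each
```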
33 changes: 31 additions & 2 deletions docs/intro.md
@@ -1,16 +1,45 @@
# About

This document contains training material for geophysical inversions using [SimPEG](https://simpeg.xyz/) and [Geoscience ANALYST](https://www.mirageoscience.com/mining-industry-software/geoscience-analyst/).
This document contains training material for geophysical forward modeling and inversions using [SimPEG](https://simpeg.xyz/) and [Geoscience ANALYST](https://www.mirageoscience.com/mining-industry-software/geoscience-analyst/).
```{image} ./images/ore_body.png
:width: 500px
```

# Table of content

# Table of contents

```{tableofcontents}
```


# Running the applications

The main entry points to the various modules are the [*.ui.json](https://github.com/MiraGeoscience/simpeg-drivers/blob/develop/simpeg_drivers-assets/uijson)
files. The ``ui.json`` serves a dual purpose:

(1) rendering a user-interface in Geoscience ANALYST

(2) storing the input parameters chosen by the user for the program to run. See the [UIJson documentation](https://mirageoscience-geoh5py.readthedocs-hosted.com/en/latest/content/uijson_format/usage.html)
for more information about the ui.json interface.

The various user-interfaces can be accessed from the Geoscience ANALYST Pro Geophysics menu.

```{image} ./images/analyst_geophysics_menu.png
:width: 500px
```

The application can also be run from the command line if all required fields in
the ui.json are provided. This approach is useful for advanced users who want to
automate the mesh creation process or re-run an existing mesh with different parameters.

To run any of the applications, assuming a valid Conda environment, use the following command:

``conda activate simpeg-drivers``

``python -m simpeg_drivers.driver [YOUR.ui.json]``

where ``[YOUR.ui.json]`` is the path to the input ui.json file on disk.
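Because a ui.json file is plain JSON, its stored parameters can be inspected or batch-edited from a script before launching the driver. A minimal sketch, where the file content and parameter names are hypothetical placeholders rather than actual simpeg-drivers fields:

```python
import json
import tempfile
from pathlib import Path

# Write a minimal, hypothetical ui.json-style file for illustration.
path = Path(tempfile.mkdtemp()) / "example.ui.json"
path.write_text(json.dumps({
    "title": "My inversion",
    "max_iterations": {"label": "Max iterations", "value": 25},
}))

# Entries are either plain values or forms carrying a "value" key.
params = json.loads(path.read_text())
values = {k: (v.get("value") if isinstance(v, dict) else v)
          for k, v in params.items()}
print(values)  # {'title': 'My inversion', 'max_iterations': 25}
```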


# References

Binary file added docs/plate-simulation/images/match_landing.png
Binary file added docs/plate-simulation/images/match_template.png
Binary file added docs/plate-simulation/images/match_uijson.png
Binary file added docs/plate-simulation/images/summary.png
Binary file added docs/plate-simulation/images/sweep_landing.png
Binary file added docs/plate-simulation/images/sweep_uijson.png