Skip to content

alvii147/SHRIMP

Repository files navigation

logo

SHRIMP

SHRIMP (Scalable Heuristic Robust Iterative Matrix Profile) is a heuristic algorithm for computing the Matrix Profile with large scale time series in very short time. The project paper can be read here

Try It Out

Installation

Install depdencies using Poetry.

# install poetry package manager
pip3 install poetry
# install project dependencies
poetry install

Comparing Performance

Performance comparison results can be summarized using command line interface in src/main.py:

cd src/
python3 main.py -n 100000 -m 5000 --scrump --shrimp

This should output the following plots:

This should output the following plots incidating the results of each algorithm, and how it compares to the true matrix profile (as calculated by STUMP), as well as statistics of each algorthim, runtime (in seconds), Mean-Squared-Error, Median-Squared-Error, and Convergence Percentage:

Comparison Plot

Command Line Usage

usage: main.py [-h] [-n N] [-m M] [--shrimp] [--scrump] [--stamp] [--scrimp] [--shrimp_hill] [--shrimp_mass] [--shrimp_diag] [--shrimp_ga]

options:
  -h, --help     show this help message and exit
  -n N           length of time series
  -m M           window size
  --shrimp       run SHRIMP
  --scrump       run SCRUMP
  --stamp        run STAMP
  --scrimp       run SCRIMP
  --shrimp_hill  run SHRIMP_HILL
  --shrimp_mass  run SHRIMP_MASS
  --shrimp_diag  run SHRIMP_DIAG
  --shrimp_ga    run SHRIMP_GA

See Algorithms for more information on how to further tune each of these algorithms and their hyperparameters.

Testing

To run all tests:

pytest -v src/tests.py

Algorithms

In order to configure individual algorithms, edit the algorithms variable in src/main.py based on the configuration information found below for the corresponding algorithm.

algorithms = [
  ...
  {
      'name': 'SHRIMP',
      'callable': SHRIMP,
      'kwargs': {
          'step_size': int(m / 2),
          'max_aspirations': 2 * int(np.sqrt(n - m + 1)),
          'mass_percentage': 1.0,
      },
  },
  ...
]

STUMP

STUMP is the JIT-compiled, parallelized Matrix Profile computation module from library stumpy, implementing the STOMPopt algorithm with Pearson correlations. STUMP is always executed in the src/main.py as a baseline measure for the ground-truth.

SHRIMP

SHRIMP is the JIT-compiled, parallelized Matrix Profile computation, introduced in this project, and evolved from SHRIMP_DIAG.

Configuration

{
    'name': 'SHRIMP',
    'callable': SHRIMP,
    'kwargs': {
        'step_size': int(m / 2),                            # initial step size for first-pass compuation of MASS
        'max_aspirations': 2 * int(np.sqrt(n - m + 1)),     # size of aspriations list, i.e. number of "best distances" to store
        'mass_percentage': 1.0,                             # fraction of vertical column to include in MASS computation
    },
},

SCRUMP

SCRUMP is the JIT-compiled, parallelized Matrix Profile computation module from library stumpy, implementing SCRIMP/SCRIMP++.

Configuration

{
    'name': 'SCRUMP',
    'callable': scrump_callable,
    'kwargs': {
        'iterations': 5, # number of times to update the matrix profile by computing new distances (calls update() method)
    },
},

STAMP

STAMP is the Matrix Profile computation algorithm, defined in this paper.

Configuration

{
    'name': 'STAMP',
    'callable': STAMP,
    'kwargs': {
        'completion': 0.5, # fraction of random columns of the distance matrix to be computed
    },
},

STOMP

STOMP is the Matrix Profile computation algorithm, defined in this paper.

Configuration

{
    'name': 'STOMP',
    'callable': STOMP,
    'kwargs': {}, # STOMP provides no tunable hyperparameters
},

SCRIMP

SCRIMP is the Matrix Profile computation algorithm, defined in this paper.

{
   'name': 'SCRIMP',
   'callable': SCRIMP,
   'kwargs': {
       'completion': 0.075, # fraction of diagonals of the distance_matrix to be computed
   },
},

SHRIMP_MASS

SHRIMP_MASS is most-promising-minimum column search algorithm for computing Matrix Profile introduced in this project.

Configuration

{
    'name': 'SHRIMP_MASS',
    'callable': SHRIMP_MASS,
    'kwargs': {
        'steps': m,                 # initial step size for first-pass compuation of MASS
        'max_iter': 100,            # maximum iterations for computing "current-best" distances
        'search_window': m // 4,    # search window size for computing MASS around "current-best" distance
    },
},

SHRIMP_HILL

SHRIMP_HILL is most-promising-minimum hill climbing search algorithm for computing Matrix Profile introduced in this project.

Configuration

{
   'name': 'SHRIMP_HILL',
   'callable': SHRIMP_HILL,
   'kwargs': {
       'steps': m,          # initial step size for first-pass compuation of MASS
       'max_iter': 175,     # maximum iterations for computing "current-best" distances
       'epsilon': 0.003,    # amount of change in z-normalized-euclidian-distance before the "hill-climbing" deems it insignifiant in improvement
   },
},

SHRIMP_DIAG

SHRIMP_DIAG is most-promising-minimum diagonal hill climbing search algorithm for computing Matrix Profile introduced in this project.

Configuration

{
    'name': 'SHRIMP_DIAG',
    'callable': SHRIMP_DIAG,
    'kwargs': {
        'step_size': m // 2,                                # initial step size for first-pass compuation of MASS
        'max_aspirations': 5 * int(math.sqrt(n - m + 1)),   # size of aspriations list, how many current-best" distances we should compute diagonals for
    },
},

SHRIMP_GA

SHRIMP_DIAG is genetic algorithm for computing Matrix Profile introduced in this project.

Configuration

{
    'name': 'SHRIMP_GA',
    'callable': SHRIMP_GA,
    'kwargs': {
        'pop_size': 500,        # number of agents in the population
        'generations': 200,     # number of generations to simulation
        'mutate': 0.05          # random mutation chance
    },
},

File Descriptions

  • The first half of src/algos.py contains the implementations for the STAMP, STOMP, and SCRIMP++ algorthims as described in their respective papers. The second half of src/algos.py contains the preliminary implementations our SHRIMP algorithm, including methods that compute the matrix profile using MASS based off of the current lowest distance (SHRIMP_MASS), as well as hill climbing (SHRIMP_HILL), and finally a diagonal-based computation method based on hill-climbing (SHRIMP_DIAG, which is the basis for our final algorithm).
  • src/utils.py contains helper functions that are critical to the computation of the matrix profile, including the aforementioned MASS function, and z-normalized-euclidian-distance functions.
  • src/shrimp.py contains the current final version of our SHRIMP algorithm, using JIT, and multi-threading.
  • src/genetic.py contains a preliminary implementation of our SHRIMP algorithm using Genetic Algorithms as a basis for search. It performed below expectations and was dropped from testing.
  • src/preliminary_results.py is a precursor to the src/final_results.py script, acting as our first test-bench during preliminary testing.
  • src/final_results.py is the script used to generate the final results and comparisons.
  • src/hm.py is a spin-off to the main testbench as instead of generating a graph showing best distance vectors, it instead shows a heatmap of all distances between the two time-series, as well as which indicides are computed by the algorithm.

image

Figure 1 on the left shows the distance matrix of all pairs of incidies between the two time-series, and Figure 2 on the right shows which indicies the algorithm has done computations on. Note that only SHRIMP_MASS, SHRIMP_DIAG, and SHRIMP_GA support generating the visited heatmaps.

Credits

About

SHRIMP: Scalable Heuristic Robust Iterative Matrix Profile

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors