Unit-tests for ML Pipelines in Python (sklearn)

Environment

[sudo] pip install virtualenv
virtualenv [-p python3] <env_name>
echo <env_name> >> .gitignore
source <env_name>/bin/activate
pip install -r requirements.txt

# To deactivate your environment
deactivate

Using Miniconda

conda create -n <env_name> [python=<python_version>]
conda activate <env_name>
pip install -r requirements.txt

# To deactivate your environment
conda deactivate

Project Structure

├── resources/
│   ├── data/                 <- Input data folder
│   └── results/              <- Results folder
├── tfdv/                     <- Scripts to compare the system against
│                                TFX and data-linter
├── third_party/              <- Data-linter and facets source code
├── analyzers.py              <- DataFrameAnalyzer
├── error_generation.py       <- Error generation utilities
├── evaluation.py             <- Evaluation utilities, tests
├── hilda.py                  <- HILDA'19 showcase
├── messages.py               <- Text messages placeholder
├── models.py                 <- ML models
├── openml.py                 <- Utilities for using OpenML
├── pipelines.py              <- Pipelines
├── profilers.py              <- DataFrameProfiler, PipelineProfiler
├── selection.py              <- RandomSelector, PairSelector
├── settings.py               <- Helper functionality
├── shift_detection.py        <- Dataset shift detection utilities
├── test_suite.py             <- TestSuite, AutomatedTestSuite
├── transformers.py           <- Custom transformers for sklearn pipeline
└── visualization_utils.py    <- Visualization utilities

Entry Points

hilda.py                      <- Showcase of the automated unit-test functionality
evaluation.py                 <- Checks whether errors in data crash the
                                 serving system or affect performance of the
                                 pipelines, and whether unit tests detect these
                                 errors
shift_detection.py            <- Showcase of dataset shift detection
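
The idea described for evaluation.py (inject errors into the data, then check whether a unit test notices them) can be illustrated with a minimal, self-contained sketch. It uses only the Python standard library and does not use this repository's API: `inject_missing`, `completeness`, and `check_completeness` are hypothetical names introduced here for illustration, and the 95% threshold is an arbitrary assumption.

```python
import random


def inject_missing(rows, column, fraction, seed=0):
    """Return a copy of `rows` with `fraction` of the values in `column` set to None.

    Hypothetical helper for illustration; not part of this repository.
    """
    rng = random.Random(seed)
    corrupted = [dict(row) for row in rows]  # copy so the clean data stays intact
    n_errors = int(len(corrupted) * fraction)
    for idx in rng.sample(range(len(corrupted)), n_errors):
        corrupted[idx][column] = None
    return corrupted


def completeness(rows, column):
    """Fraction of rows whose value in `column` is not missing."""
    return sum(row[column] is not None for row in rows) / len(rows)


def check_completeness(rows, column, threshold=0.95):
    """A data unit test: passes when `column` is at least `threshold` complete."""
    return completeness(rows, column) >= threshold


# Toy data set standing in for real input data.
clean = [{"age": 20 + i, "income": 1000.0 * i} for i in range(100)]
dirty = inject_missing(clean, "age", fraction=0.2)

print("clean data passes:", check_completeness(clean, "age"))
print("dirty data passes:", check_completeness(dirty, "age"))
```

With 20% of the `age` values removed, the completeness check fails on the corrupted copy while still passing on the clean data, which is the kind of detection the evaluation is meant to measure.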