While I was working on #78, the test suite caused me a lot of trouble. There were two main problems:
- Running the tests takes a really long time. I ended up writing some scripts to run the tests on my institution's cluster, but even then ≈50 of the tests (individual test functions, not test files) run up against a 4h time limit. The whole suite takes >34h to run, not counting any tests that were killed for exceeding the time limit. There are also a handful of tests that require >32 GB of RAM!
- There are ≈30 tests that are either broken or flaky. This makes it a bit scary to develop new features, because you're not sure which test failures you're actually responsible for. And of course, even if you realize that you're not responsible for a particular failure, you have no way of knowing if you broke something relating to these tests. I do think this relates back to the first problem, because if it were practical to run the whole test suite for each commit, there would surely be fewer broken tests.
Migrating to pytest wouldn't solve the underlying issue, but it has a few features that I think would help a lot. The biggest is the ability to "mark" tests. There are two marks I'd want to use extensively. The first is `xfail`, which indicates that a test is currently broken and expected to fail. The failures are still reported, so you know they're happening, but they don't affect whether the suite as a whole passes. This means you could require new contributions not to break any currently working tests, without having to fix all the broken/flaky tests up front. The second mark is `slow`, which would indicate that a test takes more than a few seconds to complete. With this information, GitHub could be configured to automatically run all the "not slow" tests for each PR, which would help protect against breakages. The slow tests could either be gradually made fast enough to include in the automated suite, or just be run manually when there's some reason to.
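To make that concrete, here's a minimal sketch of how the two marks might look. The `xfail` reason is taken from the report below; the `slow` mark is a custom name I'm suggesting, and the second test is purely hypothetical:

```python
import pytest

# A test from the report that's known to be broken right now: marking it
# xfail means the failure is still reported, but it no longer fails the suite.
@pytest.mark.xfail(reason="Induction from general representations is not supported yet")
def test_induced_norm_pooling():
    ...

# `slow` would be a custom mark, so it needs to be registered once,
# e.g. in pytest.ini:
#
#   [pytest]
#   markers =
#       slow: takes more than a few seconds to run
@pytest.mark.slow
def test_something_expensive():  # hypothetical example
    ...
```

The per-PR check would then invoke `pytest -m "not slow"`, while a full run (slow tests and all) could still be launched manually with plain `pytest`.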
Pytest also supports running tests in parallel across multiple processes, via the `pytest-xdist` plugin. This alone wouldn't be enough to make the tests run in a reasonable amount of time, but it would help. Finally, pytest makes it easier to share code between tests (because it doesn't require every test to belong to a class) and has features like parametrization that can cut down on boilerplate. These last two things wouldn't make the tests run any faster, but they would make them easier to write.
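As a sketch of the parametrization point (the function name and parameter values here are hypothetical, loosely modeled on the upsampling tests in the report below):

```python
import pytest

# One parametrized function replaces a grid of copy-pasted test methods.
# pytest collects each (group, mode) combination as a separate test, so
# pytest-xdist can spread them across processes with `pytest -n auto`.
@pytest.mark.parametrize("mode", ["nearest", "bilinear"])
@pytest.mark.parametrize(
    "group",
    ["cyclic_even", "cyclic_odd", "dihedral_even", "dihedral_odd", "so2", "o2"],
)
def test_r2upsampling_equivariance(group, mode):
    ...  # build the upsampling module for `group` and check equivariance
```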
Is this something you'd be in favor of? I'm not asking you to do anything yourself; I'm asking for permission to do something like this the next time I want to make a non-trivial change to the code. (With no guarantees that I'll ever actually get around to it.)
In case it's of any interest, here's the test report I generated for the most recent commit (`94380ef`):
```
RUNTIME: 1 day, 10:31:15
TOTAL:   864
PASS:    771
FAIL:    39
KILLED:  54

FAILURES:

AssertionError: ((24, 24, 10, 10, 10), 9)
  nn.test_convolution.TestConvolution.test_padding_modes_r3conv

AssertionError: False is not true...
  group.test_combine_subgroups.TestSubgroups.test_o3
  group.test_subgroups.TestSubgroups.test_restrict_so3_o2

AssertionError: Induction from general representations is not supported yet
  nn.test_pooling.TestPooling.test_induced_norm_pooling

AssertionError: The error found during equivariance check with element...
  nn.test_r3upsampling.TestUpsampling.test_cyclic_even_nearest
  nn.test_r2upsampling.TestUpsampling.test_cyclic_even_nearest
  nn.test_r3upsampling.TestUpsampling.test_o3_nearest
  nn.test_r2upsampling.TestUpsampling.test_cyclic_odd_nearest
  nn.test_r3upsampling.TestUpsampling.test_dihedral_even_nearest
  nn.test_r3upsampling.TestUpsampling.test_ico_nearest
  nn.test_r3upsampling.TestUpsampling.test_cone_nearest
  nn.test_r2upsampling.TestUpsampling.test_dihedral_odd_bilinear
  nn.test_r3upsampling.TestUpsampling.test_dihedral_nearest
  nn.test_tensorproduct.TestTensor.test_with_spatial_dimns
  nn.test_r3upsampling.TestUpsampling.test_so2_nearest
  nn.test_r3upsampling.TestUpsampling.test_so3_trilinear
  nn.test_r2upsampling.TestUpsampling.test_cyclic_odd_bilinear
  nn.test_r3upsampling.TestUpsampling.test_dihedral_odd_nearest
  nn.test_r3upsampling.TestUpsampling.test_inv_nearest
  nn.test_r3upsampling.TestUpsampling.test_so3_nearest
  nn.test_r3upsampling.TestUpsampling.test_cube_nearest
  nn.test_r2upsampling.TestUpsampling.test_so2_nearest
  nn.test_r2upsampling.TestUpsampling.test_dihedral_even_nearest
  nn.test_r2upsampling.TestUpsampling.test_dihedral_odd_nearest
  nn.test_r2upsampling.TestUpsampling.test_o2_nearest
  nn.test_convolution.TestConvolution.test_ico_sparse

NotImplementedError
  group.test_subgroups.TestSubgroups.test_restrict_ico_dn
  group.test_subgroups.TestSubgroups.test_restrict_ico_tetra
  group.test_combine_subgroups.TestSubgroups.test_ico
  group.test_subgroups.TestSubgroups.test_restrict_octa_dn
  group.test_subgroups.TestSubgroups.test_restrict_octa_tetra

RuntimeError: Found no NVIDIA driver on your system...
  nn.test_amp.TestMixedPrecision.test_r2conv_uniform
  nn.test_amp.TestMixedPrecision.test_r3pointconv_non_uniform
  nn.test_amp.TestMixedPrecision.test_r2conv_non_uniform
  nn.test_amp.TestMixedPrecision.test_gnorm
  nn.test_amp.TestMixedPrecision.test_r3conv_non_uniform
  nn.test_amp.TestMixedPrecision.test_fourier
  nn.test_amp.TestMixedPrecision.test_fieldnorm

RuntimeError: could not construct a memory descriptor using a format tag
  nn.test_export.TestExport.test_R2ConvTransposed
```
The upsampling equivariance tests are the ones that are flaky. The mixed precision tests probably aren't real failures; I just wasn't running on nodes with GPUs.
To end on a more positive note, I should say that I really appreciate how extensive the test suite is. There are so many academic projects out there with not enough tests, but this is the only one I know of with too many tests! It's a good problem to have, I think.