# Unit test guide and catch example #138
Draft: KipHamiltons wants to merge 16 commits into `main` from `hamiltons/unit-test-guide`.
Commits:

- `e10211b` First iteration, 2/3 done.
- `e01247c` Adds a couple of glossary definitions
- `dc4cb91` Moves most of the guide into a general testing guide.
- `f844658` Finishes the general guide (pre-review)
- `9fb9fcd` Formatting
- `140486b` Cooks up a basic catch example
- `a42afbe` Adds stuff about Error to the glossary
- `266f0af` Merge branch 'master' into hamiltons/unit-test-guide
- `37a30b1` Apply suggestions from code review
- `1d0efdb` Merge branch 'master' into hamiltons/unit-test-guide
- `6b9ef0f` Strip down add function example to be super simple
- `40c52fd` Updates the add function in the catch guide too.
- `214836b` Chooses "expected output" over "ground truth"
- `7028b3f` Adds more suggestions from review
- `dedb84d` Adds section about generating data
- `1e968eb` Adds BDD/TDD section
@@ -0,0 +1,65 @@

---
section: Guides
chapter: Tools
title: Catch Getting Started
description: How to write your first test with Catch.
slug: /guides/tools/catch-guide
---
[Catch](https://github.com/catchorg/Catch2) is the C++ testing framework we use for our unit tests. In this guide we will cover the basics of Catch by working through a concrete example. It assumes familiarity with C++ and with testing concepts; to brush up on testing, see the [Guide for Writing Tests](/guides/general/writing-tests).

Note that the [Catch docs](https://github.com/catchorg/Catch2/tree/devel/docs) should be the first place to look if you're wondering about a specific Catch-ism.
## A Basic Example
Catch test cases have a few components. The most important is the `TEST_CASE(...)` macro, which wraps a group of associated tests. Inside each `TEST_CASE` scope, you should follow the **AAA** structure: _Arrange_ the data first - both inputs and expected outputs - then _Act_ by calling the function you're testing, then _Assert_ that the results match the expected outputs. We'll build an example around the following toy utility function:
```cpp
int add(int a, int b) {
    return a + b;
}
```
To test it, we'll need pairs of inputs and their expected outputs. Usually, if there's a lot of data, we'll want to keep it separate from the test logic, but for this example we'll keep it local. Also note that for utilities like this one, we would want a much more comprehensive set of test cases.
```cpp
#include <array>
#include <utility>

// Include your Catch header here (the path depends on how Catch is set up).

using utility::math::add;

TEST_CASE("Testing integer add utility", "[utility][math][add]") {
    // Arrange
    static constexpr int NUM_TESTS = 5;
    // Note the extra set of braces: aggregate-initialising a std::array of
    // pairs needs them, because each element is itself brace-initialised.
    std::array<std::pair<int, int>, NUM_TESTS> inputs = {
        {{0, 0}, {1, 1}, {-1, -1}, {123000, 456}, {-1000, 1000}}};
    std::array<int, NUM_TESTS> expected_outputs = {0, 2, -2, 123456, 0};
    std::array<int, NUM_TESTS> actual_outputs{};
    // Act
    for (int i = 0; i < NUM_TESTS; ++i) {
        actual_outputs[i] = add(inputs[i].first, inputs[i].second);
    }
    // Assert
    for (int i = 0; i < NUM_TESTS; ++i) {
        INFO("In test case number " << i);
        INFO("Inputs are (" << inputs[i].first << ", " << inputs[i].second << ")");
        INFO("Expected output is " << expected_outputs[i]);
        INFO("Actual output is " << actual_outputs[i]);
        REQUIRE(actual_outputs[i] == expected_outputs[i]);
    }
}
```
### Dissecting the Example
As this example shows, everything happens inside the scope of the `TEST_CASE`. We use `INFO` macros to narrate exactly what is happening for each assertion. You shouldn't worry too much about creating huge, unwieldy logs with `INFO` macros, because by default Catch only prints the `INFO` output for test cases which fail.
The first argument to `TEST_CASE` is a string naming the test. You can use this name to run that specific test, so names should be descriptive and unique. The second argument is a set of tags, which you can use to divide the tests into groups easily. We usually use each sub-namespace the function is located in as the tags. The Catch `TEST_CASE` has much more functionality than demonstrated here; you can find the detailed documentation [here](https://github.com/catchorg/Catch2/blob/devel/docs/test-cases-and-sections.md).
The rest of the example is just going through the **AAA** process (the comments are for illustrative purposes).
## Floating Point Considerations
Floating point arithmetic is imprecise by nature. Because of this imprecision, equality comparisons between distinct non-zero floating point numbers are assumed to be false. Catch has features to deal with the error from floating point operations, and if you're testing functions which compute floating point numbers, we recommend that you [read about those features](https://github.com/catchorg/Catch2/blob/devel/docs/assertions.md#floating-point-comparisons).
Basically, you'll need to define a margin of error - either relative or absolute - that you can tolerate, and use that to define your `Approx` for each floating point `REQUIRE` assertion.
## Conclusions
Catch is concise and powerful. It has many more features than the basic example presented here - this is just enough to make you dangerous. Now go and write some tests!
@@ -0,0 +1,78 @@

---
section: Guides
chapter: General
title: How to Write Tests
description: Information about writing tests
slug: /guides/general/writing-tests
---
This guide presents general information about unit testing and about writing good tests.
## What is a Unit Test?
A unit test is a test of a small piece of a codebase - a unit. Unit tests should test a single piece of code to validate its correctness. This contrasts with integration tests, which test how pieces of code interact and behave together. For the rest of this guide, we'll work with the following toy utility function:
```cpp
int add(int a, int b) {
    return a + b;
}
```
## Anatomy of a Unit Test
The basic pieces you will need to create a unit test for a function are the following:
1. A set of inputs for the function. These should match the input parameters. In our case, we'll need a set of pairs of integers.
2. The expected outputs which you want for each input. These are often referred to as the "ground truth", because they are the true values your function should output. For our `add` example with the pair of inputs (2, 2), we want the output 4, because `add(2, 2) == 4`.
The process of running the tests is as simple as calling the function with your inputs and verifying that the function's output was as expected. In our example, one of the test cases could be running `add(2, 2)` and checking that the result is 4.
### Obtaining Expected Outputs
For our C++ code, we can divide the things we want to test into two categories. The first is the set of functions which perform a mathematical transformation on the data and return the result. The second is everything else. Functions which just do maths - which includes things like filters - often require large sets of generated data or randomised inputs across their domain. When we generate data like this, we must document how we generated it and how we verified its correctness. To verify correctness, you can use an implementation from a different platform or language that you're sure is correct. For example, if we were testing our `add` function, we could randomly generate a list of inputs, making sure that each corner case is covered, then use `numpy.add` on each input to produce the expected outputs. We can be quite sure that the `numpy` implementation is correct, so all we would have to do is document that this was the process we used.
Functions which aren't just maths will have more "categorical" corner cases and input domains. For such functions, data generation or randomised inputs are rarely appropriate. We should identify and cover each of those cases with tests manually. Note that the simple cases should have tests too - don't just test edge cases.
In general, for data generation we want to use languages and libraries that we already use regularly. Python is most often preferred for this, but MATLAB is acceptable if it has an implementation of something Python lacks. Other languages should be avoided where possible.
## Testing Approach and Philosophy
There are many guidelines you can find online to help you write good tests. Here are a few:
- Tests should be simple and readable enough to be correct on inspection. You don't want to have to think about whether a test is correct. Ideally you'll be able to read it and know that it's legitimate.
- Make test cases independent. The outcome of one test case shouldn't affect the outcome of others.
- Demonstrate how a piece of code should be used with its tests. We can't google for examples of people using our software, so create examples with your tests.
- Tests should be deterministic - [seed your randomness](https://en.wikipedia.org/wiki/Random_seed). If you're testing something particularly reliant on randomness, or which generates randomness, compensate by using a variety of seeds with many cases each.
- Follow the **AAA** structure: Arrange, Act, Assert. Each test should first set up its input variables (Arrange), call the unit with those variables (Act), then check that the outputs match what they should be (Assert).
### General Approach
Write the easy tests first, then think about edge cases and code coverage. For `add`, you might write the `(2, 2) == 4` case first, then `(123000, 456) == 123456`. After you have some simple cases, you could consider throwing in some zeros and negative numbers - cases where the observed behaviour is somehow different.
Later, you might consider what should happen on integer overflow/wraparound, making sure that errors are handled correctly. At that stage you could also try to make a test case covering every possible branch of your code. The proportion of the code executed by a set of tests is referred to as the "code coverage" of those tests.
### Regression Tests
When we find and fix bugs in the codebase, we should add a test case which makes sure that the code doesn't _regress_ into the buggy behaviour. This is called a regression test. Regression tests should be labelled with comments indicating the behaviour they're watching for. If there is a GitHub issue related to the bug, the comment should reference it. The test should be written such that it would fail before the fix and pass after it.
For a concrete example, imagine that a bug was found in `add` where, if both inputs were negative, it always returned 0. Good practice would be to add a test case or small set of test cases where both inputs are negative - such as `(-1, -1) == -2` and `(-123, -456) == -579` - labelling them as regression tests for that bug. These would clearly fail if the bug came back (although this example is quite contrived, because there should already have been tests with both inputs negative).
### Black Box vs White Box Testing
When we write tests, we can make them completely ignorant of the internals of the code. Such a test worries only about the inputs and outputs, treating everything in the middle as a black box we can't see inside - this is black box testing.
White box testing looks at the internals and writes tests which depend on them. This means that any significant change to the implementation of a function which doesn't change its interface is likely to break white box tests that depended on the old implementation. This fragility in the face of change is the chief reason black box tests are preferred.
Grey box testing sits somewhere between black box and white box testing. It isn't completely ignorant of the implementation, but grey box tests should be easily adaptable if the implementation changes. Some insight into the implementation of the unit is usually necessary to achieve full code coverage, but a rule of thumb is: the blacker the box, the better.
## TDD and BDD
Ideally, you're able to write tests as you design your software interfaces, writing the code afterwards such that it fulfils the needs of the tests. This is the basis of [**Test-Driven Development**](https://en.wikipedia.org/wiki/Test-driven_development) (TDD), which is a powerful means of creating high quality software. Tests written as part of a TDD process should inherently be black box tests, because the implementations they're testing don't exist yet.
Another conception of TDD prescribes writing the tests in parallel with the code. With this process, you write a basic test when you start a new module or feature, and as you write the code, you increase the number and depth of the tests, developing them in tandem. This incremental style is more in keeping with the [_agile_ philosophy](https://www.atlassian.com/agile): the code works at each step of development, and the tests can prove it. We aren't agile die-hards though, so we don't have a particular preference between these two realisations of TDD.
[Behaviour-driven development](https://en.wikipedia.org/wiki/Behavior-driven_development) (BDD) is a more modern take on TDD. BDD tests can be thought of as self-documenting scripts which should be understandable by all stakeholders. It formally recasts the _Arrange, Act, Assert_ structure as _Given, When, Then_: each BDD test should be of the form _Given_ some situation, _When_ a specific thing happens, _Then_ the system responds in the correct way. Following this language structure predisposes the developer to writing tests which everyone can understand. The Catch docs have [a great example](https://github.com/catchorg/Catch2/blob/devel/docs/test-cases-and-sections.md#bdd-style-test-cases) demonstrating the structure of a BDD test.
In terms of the ideal development cycle and testing style, it doesn't matter too much whether you use TDD, "agile style" TDD (not a real name), or BDD. A mix of BDD style tests - which are by nature designed for everyone to understand - and more fine-grained, standard unit tests is good. The most important thing is writing readable, verifiable tests. Everything else is secondary.
So what are you waiting for? Go write some tests!