Skip to content

Improve applyCompMatr accuracy with compensated summation #598

@TysonRayJones

Description

@TysonRayJones

Note

For all development, please work out of the devel branch

Summary

Update applyCompMatr (specifically subroutine cpu_statevec_anyCtrlAnyTargDenseMatr_sub()) to use compensated summation and measure the effect on (single CPU) runtime and accuracy for few and many target qubits.

Context

The logic of many QuEST simulation functions involve linearly combining a fixed-number of amplitudes:

# parallel
for each pair of amplitudes:
    amp1 -> coeff1 * amp1 + coeff2 * amp2
    amp2 -> coeff1 * amp2 + coeff2 * amp1

In contrast, functions (like here) which accept any-sized matrices (applyCompMatr, mixSuperOp and mixKrausMap) have a logic resembling:

batchsize = powerOf2(length(targets))

# parallel
for each batch:
    cache = copy of each amplitude in batch

    for amplitude in batch:
        amplitude = 0

        for other_amplitude in cache:
            amplitude += somecoeff * other_amplitude

In such functions, each output amplitude is the linear combination of a dynamic number of input amplitudes; typically the matrix dimension (2 to the power of the number of target qubits). While the outer loop is parallelised, the inner loops are serially evaluated by a single thread.

In this issue, we consider the numerical precision of performing

        for other_amplitude in cache:
            amplitude += somecoeff * other_amplitude

A realistic maximum size of the input CompMatr is 16 qubits, requiring 16 GiB and containing 2^32 = 4,294,967,296 ~ 4B elements. In this end-regime, this exponentially-large loop is performing a reduction of billions of floating-point complex numbers which may vary enormously in their relative size. As such, the current implementation is numerically unstable and is liable to catastrophic cancellation, so that the modification to the state is inaccurate for many qubits. This is likely to break state normalisation and corrupt other state properties.

To remedy this, we could replace that naive sum in statevec_anyCtrlAnyTargDenseMatr_sub() with compensated summation, like Kahan summation, to improve the numerical accuracy of the final amplitudes. An example of Kahan summation of qcomp (QuEST's complex primitive type) already exists in the unit test utilities here.

However, switching to compensated summation might not be worthwhile. Kahan summation involves more total operations than naive summation, slowing things down and potentially jeopardising optimisations like auto-vectorisation. This might be evident for small Qureg before simulation becomes memory bandwidth bound. The benefit to accuracy might also be negligible until the number of target qubits is large, gratuitously slowing few-target simulation. It might be better to use compensation only above some threshold, or perhaps the benefit is never worth the increased code complexity.

Details

Modify cpu_statevec_anyCtrlAnyTargDenseMatr_sub()) to use Kahan summation and measure the runtime and accuracy effect (compared to the current implementation without compensation) at different system sizes. Visualise the results to show the costs and benefits of Kahan summation, elucidating when (if ever) it is worthwhile. Repeat this for the three possible qcomp precisions (see here).

It is sufficient to test only in single-CPU settings.

Testing

A quick and dirty way to see the effect on the accuracy is to apply a large CompMatr upon a fixed state (with and without compensation) and compare a resulting amplitude. For example:

#include "quest.h"

int main() {
    initQuESTEnv();

    // choose the matrix size, with arbitrary target qubits
    int numTargets = 13;
    int targets[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};

    // create a matrix (elements are unimportant; we'll disable unitarity validation)
    CompMatr matrix = createCompMatr(numTargets);
    for (qindex i=0; i<matrix.numRows; i++)
        for (qindex j=0; j<matrix.numRows; j++)
            matrix.cpuElems[i][j] = 1;
    syncCompMatr(matrix);

    // create a random Qureg (fixed between executions) which need not be
    // any bigger than the operator matrix in order to test accuracy
    unsigned seed = 123456789u;
    setSeeds(&seed, 1);
    Qureg qureg = createQureg(numTargets);
    initRandomPureState(qureg);

    // apply the matrix and obtain the effect on the first amplitude
    setValidationEpsilon(0);
    applyCompMatr(qureg, targets, numTargets, matrix);
    qcomp amp = getQuregAmp(qureg, 0);

    // report the amplitude at a large precision
    setMaxNumReportedSigFigs(20);
    reportScalar("amp", amp);

    destroyCompMatr(matrix);
    destroyQureg(qureg);
    finalizeQuESTEnv();
    return 0;
}

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions