You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
For all development, please work out of the devel branch
Summary
Update applyCompMatr (specifically subroutine cpu_statevec_anyCtrlAnyTargDenseMatr_sub()) to use compensated summation and measure the effect on (single CPU) runtime and accuracy for few and many target qubits.
Context
The logic of many QuEST simulation functions involve linearly combining a fixed-number of amplitudes:
In such functions, each output amplitude is the linear combination of a dynamic number of input amplitudes; typically the matrix dimension (2 to the power of the number of target qubits). While the outer loop is parallelised, the inner loops are serially evaluated by a single thread.
In this issue, we consider the numerical precision of performing
A realistic maximum size of the input CompMatr is 16 qubits, requiring 16 GiB and containing 2^32 = 4,294,967,296 ~ 4B elements. In this end-regime, this exponentially-large loop is performing a reduction of billions of floating-point complex numbers which may vary enormously in their relative size. As such, the current implementation is numerically unstable and is liable to catastrophic cancellation, so that the modification to the state is inaccurate for many qubits. This is likely to break state normalisation and corrupt other state properties.
To remedy this, we could replace that naive sum in statevec_anyCtrlAnyTargDenseMatr_sub() with compensated summation, like Kahan summation, to improve the numerical accuracy of the final amplitudes. An example of Kahan summation of qcomp (QuEST's complex primitive type) already exists in the unit test utilities here.
However, switching to compensated summation might not be worthwhile. Kahan summation involves more total operations than naive summation, slowing things down and potentially jeopardising optimisations like auto-vectorisation. This might be evident for small Quregbefore simulation becomes memory bandwidth bound. The benefit to accuracy might also be negligible until the number of target qubits is large, gratuitously slowing few-target simulation. It might be better to use compensation only above some threshold, or perhaps the benefit is never worth the increased code complexity.
Details
Modify cpu_statevec_anyCtrlAnyTargDenseMatr_sub()) to use Kahan summation and measure the runtime and accuracy effect (compared to the current implementation without compensation) at different system sizes. Visualise the results to show the costs and benefits of Kahan summation, elucidating when (if ever) it is worthwhile. Repeat this for the three possible qcomp precisions (see here).
It is sufficient to test only in single-CPU settings.
Testing
A quick and dirty way to see the effect on the accuracy is to apply a large CompMatr upon a fixed state (with and without compensation) and compare a resulting amplitude. For example:
#include"quest.h"intmain() {
initQuESTEnv();
// choose the matrix size, with arbitrary target qubitsint numTargets = 13;
int targets[] = {0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20};
// create a matrix (elements are unimportant; we'll disable unitarity validation)
CompMatr matrix = createCompMatr(numTargets);
for (qindex i=0; i<matrix.numRows; i++)
for (qindex j=0; j<matrix.numRows; j++)
matrix.cpuElems[i][j] = 1;
syncCompMatr(matrix);
// create a random Qureg (fixed between executions) which need not be// any bigger than the operator matrix in order to test accuracyunsigned seed = 123456789u;
setSeeds(&seed, 1);
Qureg qureg = createQureg(numTargets);
initRandomPureState(qureg);
// apply the matrix and obtain the effect on the first amplitudesetValidationEpsilon(0);
applyCompMatr(qureg, targets, numTargets, matrix);
qcomp amp = getQuregAmp(qureg, 0);
// report the amplitude at a large precisionsetMaxNumReportedSigFigs(20);
reportScalar("amp", amp);
destroyCompMatr(matrix);
destroyQureg(qureg);
finalizeQuESTEnv();
return0;
}
Note
For all development, please work out of the
develbranchSummary
Update
applyCompMatr(specifically subroutinecpu_statevec_anyCtrlAnyTargDenseMatr_sub()) to use compensated summation and measure the effect on (single CPU) runtime and accuracy for few and many target qubits.Context
The logic of many QuEST simulation functions involve linearly combining a fixed-number of amplitudes:
In contrast, functions (like here) which accept any-sized matrices (
applyCompMatr,mixSuperOpandmixKrausMap) have a logic resembling:In such functions, each output amplitude is the linear combination of a dynamic number of input amplitudes; typically the matrix dimension (
2to the power of the number of target qubits). While the outer loop is parallelised, the inner loops are serially evaluated by a single thread.In this issue, we consider the numerical precision of performing
A realistic maximum size of the input
CompMatris16qubits, requiring16 GiBand containing2^32 = 4,294,967,296 ~ 4Belements. In this end-regime, this exponentially-large loop is performing a reduction of billions of floating-point complex numbers which may vary enormously in their relative size. As such, the current implementation is numerically unstable and is liable to catastrophic cancellation, so that the modification to the state is inaccurate for many qubits. This is likely to break state normalisation and corrupt other state properties.To remedy this, we could replace that naive sum in
statevec_anyCtrlAnyTargDenseMatr_sub()with compensated summation, like Kahan summation, to improve the numerical accuracy of the final amplitudes. An example of Kahan summation ofqcomp(QuEST's complex primitive type) already exists in the unit test utilities here.However, switching to compensated summation might not be worthwhile. Kahan summation involves more total operations than naive summation, slowing things down and potentially jeopardising optimisations like auto-vectorisation. This might be evident for small
Quregbefore simulation becomes memory bandwidth bound. The benefit to accuracy might also be negligible until the number of target qubits is large, gratuitously slowing few-target simulation. It might be better to use compensation only above some threshold, or perhaps the benefit is never worth the increased code complexity.Details
Modify
cpu_statevec_anyCtrlAnyTargDenseMatr_sub()) to use Kahan summation and measure the runtime and accuracy effect (compared to the current implementation without compensation) at different system sizes. Visualise the results to show the costs and benefits of Kahan summation, elucidating when (if ever) it is worthwhile. Repeat this for the three possibleqcompprecisions (see here).Testing
A quick and dirty way to see the effect on the accuracy is to apply a large
CompMatrupon a fixed state (with and without compensation) and compare a resulting amplitude. For example: