This repository contains the programs and data associated with the article 'DivKC: A Divide-and-Conquer Approach to Knowledge Compilation'.
- The `cppddnnf` folder contains the implementation of the algorithms presented in the paper that operate on the compiled form. The implemented algorithms allow for approximate model counting and random sampling.
- The `D4` folder contains a slightly modified version of the D4 compiler. The only modification is that the unconstrained variables are also included in the generated .nnf file, which simplifies further usage of the d-DNNF.
- The current implementation in `cppddnnf` requires this modification to function. Therefore, to replace D4 with a different compiler, either the compiler or the implementation in `cppddnnf` has to be modified.
- The `projection` folder contains the code necessary to project a CNF formula onto a subset of its variables.
- The `results` folder contains the raw results presented in the article as well as more detailed versions of the tables available in the article.
- The `splitter` folder contains the implementation of the splitting procedure presented in the paper.
To use our approach, the `cppddnnf`, `splitter`, `projection`, and `D4` projects need to be compiled
by following the instructions in their respective folders.
The dependencies on Debian-based systems can be installed as follows:
apt install g++ make zlib1g-dev libboost-dev libgmp-dev libgmpxx4ldbl \
ninja-build libboost-program-options-dev libboost-random-dev
The following commands build the individual projects; detailed instructions are given in each subdirectory.
cd cppddnnf
g++ gen.cpp -o gen
./gen
ninja clean
ninja
cd ..
cd splitter
g++ gen.cpp -o gen
./gen
ninja clean
ninja
cd ..
cd projection
g++ gen.cpp -o gen
./gen
ninja clean
ninja
cd ..
cd D4/d4
make clean
make -j
cd ../..
Given that compilation can require large amounts of memory, an Apptainer definition file
is provided in the D4 directory. The container limits D4 to 16 GB of memory and
one hour of computation.
apptainer build --fakeroot d4.sif d4.def
The resulting container (d4.sif) can be used just like the native executable.
Instructions on how to install apptainer are available on apptainer.org.
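Outside the container, comparable limits can be approximated with the shell's ulimit builtin. The sketch below is ours, not part of the repository: the helper name run_limited is hypothetical, and it caps virtual memory and CPU time in a subshell, mirroring the 16 GB / one hour budget the container enforces.

```shell
# Hypothetical helper (not part of the repository): run a command with a
# virtual-memory cap (in KB) and a CPU-time cap (in seconds) via ulimit.
run_limited() {
  local mem_kb=$1 cpu_s=$2
  shift 2
  # ulimit only affects the subshell, so the caller's limits are untouched
  ( ulimit -v "$mem_kb" -t "$cpu_s"; "$@" )
}

# Example: a 16 GB / 3600 s budget, demonstrated on a trivial command.
run_limited $((16 * 1024 * 1024)) 3600 echo ok
```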
The cppddnnf, projection and splitter folders contain Doxyfiles.
The documentation of the code can be generated with Doxygen by running
doxygen Doxyfile
This generates html and LaTeX documentation in the documentation/html and documentation/latex
folders respectively.
The main page for the html documentation is located at documentation/html/index.html.
The generated LaTeX folder will contain a makefile. Running the makefile
will generate a file called refman.pdf.
- The `splitter` folder contains the splitting heuristic used. A singularity script is also provided in the folder.
- The `projection` folder contains the program used for CNF projection. A singularity script is also provided in the folder.
- The `cppddnnf` folder contains the C++ utilities necessary to work with our compiled form.
First, we need to compute a projection set for the formula t.cnf by using the splitter.
./splitter/build/splitter --cnf t.cnf --nb 4 > t.cnf.log
cat t.cnf.log t.cnf > t.cnf.proj
We then project the formula by using the projection tool
which generates the CNF representation of the projected formula and the upper bound
as described in the paper:
./projection/build/projection --cnf t.cnf.proj
We then use D4 to compile the projected formula and the upper bound to d-DNNF.
./D4/d4/d4 -dDNNF "t.cnf.proj.p" -out="t.cnf.pnnf"
./D4/d4/d4 -dDNNF "t.cnf.proj.pup" -out="t.cnf.unnf"
We now have a d-DNNF representation as described in the paper.
Note that several of the files generated during this process are required later.
The approximate counting and sampling executables expect as input the path to the
original CNF formula t.cnf and as a consequence they expect to find
t.cnf.proj.log as generated by projection and the formulae
t.cnf.pnnf and t.cnf.unnf as generated above.
The remaining files were useful during the process but are no longer used
afterwards and can be deleted.
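As a sketch (our reading of the walkthrough above, using the file names it introduces), the cleanup step amounts to:

```shell
# Hypothetical cleanup sketch: keep t.cnf.proj.log, t.cnf.pnnf and t.cnf.unnf,
# which the counting and sampling executables expect, and delete the
# intermediates produced along the way.
rm -f t.cnf.log t.cnf.proj t.cnf.proj.p t.cnf.proj.pup
```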
For your convenience, a script called compile_formula.sh is provided to perform the above
calls and remove the unused files (however, the script uses the default parameters).
Usage:
bash compile_formula.sh t.cnf
Once the formula has been compiled, you can perform approximate model counting as follows:
./cppddnnf/build/appmc --cnf t.cnf
This will output the estimates in CSV format with the header N,Y,Yl,Yh,
where N is the number of solutions used, Y is the estimate, and Yl and Yh are
the estimated lower and upper bounds computed with the central limit theorem.
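As an illustration of this format, the estimate and the half-width of the confidence interval can be extracted with standard tools. The CSV line below is taken from the example run shown later in this document, not freshly generated output:

```shell
# Illustrative only: parse one appmc output line (header N,Y,Yl,Yh) and
# report the estimate and the half-width of the confidence interval.
echo "10000, 25989, 25414, 26564.1" | awk -F', *' \
  '{ printf "Y=%s halfwidth=%.2f\n", $2, ($4 - $3) / 2 }'
```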
Once the formula has been compiled, you can also perform random sampling as follows:
./cppddnnf/build/sampler --cnf t.cnf --k 50 --nb 10
This will use a buffer size of 50 and generate 10 samples.
The program will start by printing either
c true uniformity or c heuristic based uniformity to indicate whether the
algorithm used is exact or heuristic based. This depends on the formula and on the buffer size.
The algorithm will then print the solutions.
Each solution is a list of signed integers representing the literals and separated
by spaces. Each solution is terminated by 0.
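This output format can be checked mechanically. The snippet below uses made-up three-variable solutions (not real sampler output) to show how to skip the c comment line and verify the 0 terminator:

```shell
# Illustrative data only: a comment line and two solutions over three variables.
printf '%s\n' 'c true uniformity' '1 -2 3 0' '-1 2 -3 0' | awk '
  /^c/ { next }                                      # skip comment lines
  $NF != 0 { print "missing terminator"; exit 1 }    # each solution ends in 0
  { print NF - 1, "literals" }                       # literal count per solution
'
```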
./cppddnnf/build/rsampler can be used instead of ./cppddnnf/build/sampler.
More details can be found in cppddnnf/README.md.
A Dockerfile is also provided. The image can be built with
docker build -t divkc .
The executable files are then available in /divkc inside the container.
To compile a formula t.cnf by using the docker container you can either use the provided
docker_compile_formula.sh bash script or you can run the individual commands
manually to retain more control over the program options.
The docker_compile_formula.sh script is used as follows:
bash docker_compile_formula.sh t.cnf
A prebuilt image is available on Zenodo. The image can be loaded with
docker load -i divkc_docker_image.tar.gz
The individual commands to run are as follows:
docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
/divkc/splitter --cnf "t.cnf" > "t.cnf.log"
cat "t.cnf.log" "t.cnf" > "t.cnf.proj"
docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
/divkc/projection --cnf "t.cnf.proj"
docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
/divkc/wrap 16000 3600 \
/divkc/d4 -dDNNF "t.cnf.proj.p" -out="t.cnf.pnnf"
docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
/divkc/wrap 16000 3600 \
/divkc/d4 -dDNNF "t.cnf.proj.pup" -out="t.cnf.unnf"
In this example we use /divkc/wrap to limit D4.
With the parameters in the example, wrap will kill the call to D4 if it
exceeds 16000 MB of memory or if the execution takes longer than 3600 seconds.
Moreover, the wrap command prints a summary of the resources used on exit.
The summary is printed in CSV format with the following columns:
the command run, the exit status (done, err, mem, mem err, timeout or timeout err),
the memory used in KB, and the execution time in seconds.
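Such a summary is easy to filter. The lines below are made-up examples of the format described above, not real wrap output:

```shell
# Illustrative only: flag any wrap summary line whose status is not "done"
# (columns: command, status, memory in KB, time in seconds).
printf '%s\n' 'd4 -dDNNF t.cnf.proj.p, done, 51265, 0.02' \
              'd4 -dDNNF big.cnf.proj.p, timeout, 16384000, 3600' |
  awk -F', *' '$2 != "done" { print "failed:", $1 }'
```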
Once the compilation has been done, we can perform approximate model counting as follows:
docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
/divkc/appmc --cnf t.cnf
Similarly, we can perform random sampling with /divkc/sampler:
docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
/divkc/sampler --cnf t.cnf --nb 10 --k 50
Alternatively with /divkc/rsampler:
docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
/divkc/rsampler --cnf t.cnf --nb 10 --k 50000
Running the command bash docker_compile_formula.sh t.cnf
produced the following output:
Exact lower and upper bounds to the true model count:
file,low,high
cnf.bound.csv: t.cnf, 2, 54000
The number of variables and the number of clauses in the formula
file,#v,#c
cnf.cls.csv: t.cnf, 45, 104
The time (s) and memory (KB) usage of the d-DNNF compilation for G_P
file, state, mem, time
pnnf.csv: t.cnf, done, 51265, 0.0246568
The time (s) and memory (KB) usage of the d-DNNF compilation for G_U
file, state, mem, time
unnf.csv: t.cnf.unnf, done, 584142, 0.0247177
The time (s) and memory (KB) usage of the splitting procedure
file, state, mem, time
split.csv: t.cnf, done, 7249, 0.00236592
The time (s) and memory (KB) usage of the projection procedure
file, state, mem, time
proj.csv: t.cnf.proj, done, 6926, 0.00233837
We can see that the splitting procedure terminated successfully because we find
split.csv, ..., done, .... Similarly for the projection procedure
(proj.csv, ..., done, ...), the compilation of G_U (unnf.csv) and the compilation
of G_P (pnnf.csv). This compilation showed us that the true model count
lies between 2 and 54000 as is shown by the line cnf.bound.csv: t.cnf, 2, 54000.
We may then run the approximate model counting procedure as follows:
docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
/divkc/appmc --cnf t.cnf
And we get:
N,Y,Yl,Yh
10000, 25989, 25414, 26564.1
We see that the algorithm used the full 10000 samples because early stopping
is disabled by default. The estimate (Y) is 25989 and the confidence interval
([Yl, Yh]) is [25414, 26564.1].
If we use D4 to compute the true model count, we find that the formula has
exactly 26256 solutions.
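Using the numbers reported above, one can check that this exact count indeed falls inside the reported confidence interval:

```shell
# The interval [25414, 26564.1] reported by appmc versus the exact model
# count 26256 computed with D4 (all numbers taken from the text above).
awk 'BEGIN { y = 26256; lo = 25414; hi = 26564.1
             if (lo <= y && y <= hi) print "inside"; else print "outside" }'
```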
If we run the sampling procedure:
docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
/divkc/sampler --cnf t.cnf --nb 10 --k 50
We get:
c true uniformity
1 2 3 -4 -5 -6 -7 8 9 -10 11 12 -13 14 15 -16 17 -18 -19 20 21 -22 23 -24 25 -26 -27 28 -29 -30 31 -32 -33 -34 -35 -36 -37 38 -39 40 41 42 43 44 -45 0
1 2 3 -4 -5 -6 -7 8 9 -10 11 12 13 14 15 16 17 -18 -19 20 21 22 -23 -24 25 -26 -27 28 -29 -30 -31 32 -33 -34 35 36 -37 -38 -39 40 41 42 43 -44 45 0
1 2 3 -4 -5 -6 -7 8 9 -10 11 12 -13 14 -15 -16 -17 -18 19 20 21 -22 23 -24 25 -26 -27 28 -29 -30 31 -32 -33 -34 35 36 -37 -38 -39 40 41 42 43 -44 45 0
1 -2 -3 -4 -5 -6 -7 8 -9 10 11 12 -13 14 15 -16 -17 18 -19 20 21 -22 23 -24 25 -26 -27 28 -29 -30 31 -32 -33 -34 35 -36 -37 -38 39 40 41 42 43 -44 45 0
1 2 -3 4 -5 -6 -7 8 9 -10 11 12 13 14 15 16 17 -18 -19 20 21 22 -23 -24 25 -26 -27 28 29 -30 -31 -32 -33 34 -35 -36 -37 -38 39 40 41 42 43 -44 45 0
1 2 3 -4 -5 -6 -7 8 9 -10 11 12 -13 14 -15 16 17 18 -19 20 21 -22 -23 24 25 -26 -27 28 29 -30 -31 -32 -33 34 35 -36 37 -38 -39 40 41 42 43 -44 45 0
1 2 3 -4 -5 -6 -7 8 -9 10 11 12 -13 14 -15 16 17 -18 -19 20 21 -22 -23 24 25 -26 -27 28 -29 30 -31 -32 -33 -34 35 36 -37 -38 -39 40 41 42 43 -44 45 0
1 2 -3 4 -5 -6 -7 8 9 -10 11 12 13 14 15 16 -17 18 -19 20 21 -22 -23 24 25 -26 -27 28 29 -30 -31 -32 33 -34 35 -36 -37 -38 39 40 41 42 43 -44 45 0
1 2 3 -4 -5 -6 -7 8 -9 10 11 12 13 14 15 -16 17 -18 -19 20 21 22 -23 -24 25 -26 -27 28 -29 -30 -31 32 -33 -34 35 -36 -37 -38 39 40 41 42 43 -44 45 0
1 2 -3 4 -5 -6 -7 8 -9 10 11 12 -13 14 -15 -16 17 -18 19 20 21 22 -23 -24 25 -26 -27 28 -29 -30 -31 32 -33 -34 -35 -36 -37 38 -39 40 41 42 43 44 -45 0
The c true uniformity indicates that the path count of G_P is low enough for the
procedure to use the exact uniform random sampling procedure. Therefore, the heuristic-based
algorithm is not used here.
If we run (notice the value for --k 1):
docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
/divkc/sampler --cnf t.cnf --nb 10 --k 1
We get:
c heuristic based uniformity
1 -2 -3 -4 -5 -6 -7 8 -9 10 11 12 -13 14 15 -16 -17 -18 -19 20 -21 -22 -23 -24 -25 -26 27 -28 -29 -30 -31 -32 -33 -34 -35 -36 -37 -38 39 40 41 42 43 44 -45 0
1 2 3 -4 -5 -6 -7 8 9 -10 11 12 13 14 15 -16 -17 -18 -19 20 21 -22 -23 24 25 -26 -27 28 -29 30 -31 -32 -33 -34 -35 36 -37 -38 -39 40 41 42 43 -44 45 0
1 2 -3 4 -5 -6 -7 8 -9 10 11 12 -13 14 15 16 -17 18 19 20 21 -22 23 -24 25 -26 -27 28 -29 -30 31 -32 -33 -34 35 36 -37 -38 -39 40 41 42 43 -44 45 0
1 -2 -3 -4 5 6 -7 8 9 -10 11 12 -13 14 -15 -16 -17 18 -19 20 -21 -22 -23 -24 -25 26 -27 -28 -29 -30 -31 -32 -33 -34 -35 -36 -37 38 -39 -40 41 42 -43 -44 -45 0
1 2 3 -4 -5 -6 -7 8 9 -10 11 12 13 14 -15 16 17 18 -19 20 21 -22 -23 24 25 -26 -27 28 -29 30 -31 -32 -33 -34 -35 -36 -37 -38 39 40 41 42 43 -44 45 0
1 -2 -3 -4 -5 -6 -7 8 9 -10 11 12 -13 14 -15 16 17 -18 19 20 -21 -22 -23 -24 -25 -26 27 -28 -29 -30 -31 -32 -33 -34 -35 -36 -37 -38 39 40 41 42 43 44 -45 0
1 -2 -3 -4 -5 -6 -7 8 -9 10 11 12 -13 14 15 16 -17 18 19 20 21 -22 23 -24 25 -26 -27 28 29 -30 -31 -32 -33 34 -35 -36 -37 -38 39 40 41 42 43 -44 45 0
1 -2 -3 -4 -5 -6 -7 8 9 -10 11 12 13 14 15 16 17 -18 19 20 21 22 -23 -24 25 -26 -27 28 29 -30 -31 -32 33 -34 35 -36 -37 -38 39 40 41 42 43 -44 45 0
1 -2 -3 -4 -5 -6 -7 8 9 -10 11 12 -13 14 -15 16 -17 -18 -19 20 21 22 -23 -24 25 -26 -27 28 29 -30 -31 -32 -33 34 35 -36 -37 -38 39 40 41 42 43 44 -45 0
1 -2 -3 -4 5 6 -7 8 -9 10 11 12 -13 14 -15 -16 -17 -18 19 20 -21 -22 -23 -24 -25 26 -27 -28 -29 -30 -31 -32 -33 -34 -35 -36 -37 -38 39 -40 41 42 -43 -44 -45 0
Notice the c heuristic based uniformity line indicating that the heuristic-based algorithm is
used.