serval-uni-lu/divkc

DivKC: A Divide-and-Conquer Approach to Knowledge Compilation

This repository contains the programs and data associated with the article 'DivKC: A Divide-and-Conquer Approach to Knowledge Compilation'.

Contents

  • The cppddnnf folder contains the implementation of the algorithms presented in the paper that operate on the compiled form. The implemented algorithms allow for approximate model counting and random sampling.

    The folder D4 contains a slightly modified version of the D4 compiler. The only modification is that unconstrained variables are also included in the generated .nnf file, which simplifies further use of the d-DNNF.

  • The current implementation in cppddnnf requires this modification to function. To replace D4 with a different compiler, either that compiler or the implementation in cppddnnf must be modified.

  • The projection folder contains the code necessary to project a CNF formula on a subset of its variables.

  • The results folder contains the raw results that are presented in the article as well as more detailed versions of the tables available in the article.

  • The splitter folder contains the implementation of the splitting procedure presented in the paper.

Usage

To use our approach, the cppddnnf, splitter, projection, and D4 projects will need to be compiled by following the instructions in the respective folders.

Dependencies

The dependencies on Debian-based systems can be installed as follows:

apt install g++ make zlib1g-dev libboost-dev libgmp-dev libgmpxx4ldbl \
    ninja-build libboost-program-options-dev libboost-random-dev

Building

These commands should build the individual projects. Detailed instructions are given in each subdirectory.

cd cppddnnf
g++ gen.cpp -o gen
./gen
ninja clean
ninja
cd ..

cd splitter
g++ gen.cpp -o gen
./gen
ninja clean
ninja
cd ..

cd projection
g++ gen.cpp -o gen
./gen
ninja clean
ninja
cd ..

cd D4/d4
make clean
make -j
cd ../..

Since compilation can require large amounts of memory, an Apptainer definition file is provided in the D4 directory. The resulting container limits D4 to 16 GB of memory and one hour of computation.

apptainer build --fakeroot d4.sif d4.def

The resulting container (d4.sif) can be used just like the native executable. Instructions on how to install Apptainer are available at apptainer.org.

Doxygen

The cppddnnf, projection, and splitter folders contain Doxyfiles. The code documentation can be generated with Doxygen by running

doxygen Doxyfile

This generates HTML and LaTeX documentation in the documentation/html and documentation/latex folders, respectively. The main page of the HTML documentation is documentation/html/index.html. The generated LaTeX folder contains a Makefile; running it produces a file called refman.pdf.

Basic components of DivKC

  • The folder splitter contains the splitting heuristic used in the paper. A Singularity script is also provided in the folder.

  • The folder projection contains the program used for CNF projection. A Singularity script is also provided in the folder.

  • The folder cppddnnf contains the C++ utilities necessary to work with our compiled form.

Example Usage

Compiling a Formula

First, we need to compute a projection set for the formula t.cnf by using the splitter.

./splitter/build/splitter --cnf t.cnf --nb 4 > t.cnf.log
cat t.cnf.log t.cnf > t.cnf.proj

We then project the formula by using the projection tool which generates the CNF representation of the projected formula and the upper bound as described in the paper:

./projection/build/projection --cnf t.cnf.proj

We then use D4 to compile the projected formula and the upper bound to d-DNNF.

./D4/d4/d4 -dDNNF "t.cnf.proj.p" -out="t.cnf.pnnf"
./D4/d4/d4 -dDNNF "t.cnf.proj.pup" -out="t.cnf.unnf"

We now have a d-DNNF representation as described in the paper. Note which files this process generates: the approximate counting and sampling executables take the path to the original CNF formula t.cnf as input, and therefore expect to find t.cnf.proj.log as generated by projection, as well as the formulae t.cnf.pnnf and t.cnf.unnf generated above. The remaining files were only needed during the process and can be deleted afterwards.
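
A quick way to verify that the files the downstream tools need are in place is a small shell loop (an illustrative sketch; it assumes the formula is t.cnf in the current directory):

```shell
# The counting and sampling tools expect these files next to the original CNF.
f=t.cnf
for p in "$f.proj.log" "$f.pnnf" "$f.unnf"; do
    [ -f "$p" ] || echo "missing $p"
done
```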

For your convenience, a script called compile_formula.sh is provided to perform the above calls and remove the unused files (note that the script uses the default parameters). Usage:

bash compile_formula.sh t.cnf

Approximate Model Counting

Once the formula has been compiled, you can perform approximate model counting as follows:

./cppddnnf/build/appmc --cnf t.cnf

This will output the estimates in CSV format with the header N,Y,Yl,Yh, where N is the number of samples used, Y is the estimate, and Yl and Yh are the estimated lower and upper bounds computed with the central limit theorem.
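
That CSV is easy to post-process. As an illustration, the following awk one-liner extracts the estimate and the interval width (the file appmc.csv and its row are stand-ins written for the example, not real output):

```shell
# Illustrative only: appmc.csv mimics the documented N,Y,Yl,Yh output format.
printf 'N,Y,Yl,Yh\n10000, 25989, 25414, 26564.1\n' > appmc.csv
# Split on a comma plus optional spaces, then report estimate and bounds.
awk -F', *' 'NR == 2 { printf "Y=%s in [%s, %s], width %.1f\n", $2, $3, $4, $4 - $3 }' appmc.csv
```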

Approximate Uniform Random Sampling

Once the formula has been compiled, you can perform approximate uniform random sampling as follows:

./cppddnnf/build/sampler --cnf t.cnf --k 50 --nb 10

This will use a buffer size of 50 and generate 10 samples.

The program will start by printing either c true uniformity or c heuristic based uniformity to indicate whether the algorithm used is exact or heuristic-based. Which one is used depends on the formula and on the buffer size.

The algorithm will then print the solutions. Each solution is a space-separated list of signed integers representing the literals, terminated by 0.
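
This format can be sanity-checked with a few lines of awk. In the sketch below, samples.txt and its two toy solutions are placeholders for real sampler output:

```shell
# Placeholder input mimicking the sampler output format.
printf 'c true uniformity\n1 -2 3 0\n-1 2 -3 0\n' > samples.txt
# Skip "c" comment lines; require the 0 terminator; count literals per solution.
awk '/^c/ { next }
     $NF != 0 { print "missing terminator on line " NR; exit 1 }
     { print NF - 1, "literals" }' samples.txt
```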

./cppddnnf/build/rsampler can be used instead of ./cppddnnf/build/sampler. More details can be found in cppddnnf/README.md.

Docker

A Dockerfile is also provided. The image can be built with

docker build -t divkc .

The executable files are then available in /divkc inside the container. To compile a formula t.cnf using the Docker container, you can either use the provided docker_compile_formula.sh bash script or run the individual commands manually to retain more control over the program options.

The docker_compile_formula.sh script is used as follows:

bash docker_compile_formula.sh t.cnf

A prebuilt image is available on Zenodo. The image can be loaded with

docker load -i divkc_docker_image.tar.gz

The individual commands to run are as follows:

docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
    /divkc/splitter --cnf "t.cnf" > "t.cnf.log"

cat "t.cnf.log" "t.cnf" > "t.cnf.proj"

docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
    /divkc/projection --cnf "t.cnf.proj"

docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
    /divkc/wrap 16000 3600 \
    /divkc/d4 -dDNNF "t.cnf.proj.p" -out="t.cnf.pnnf"

docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
    /divkc/wrap 16000 3600 \
    /divkc/d4 -dDNNF "t.cnf.proj.pup" -out="t.cnf.unnf"

In this example we use /divkc/wrap to limit D4. With the parameters in the example, wrap kills the call to D4 if it exceeds 16000 MB of memory or if the execution takes longer than 3600 seconds. Moreover, wrap prints a summary of the resources used on exit. The summary is printed in CSV format with the following columns: the command that was run, the exit status (done, err, mem, mem err, timeout, or timeout err), the memory used in KB, and the execution time in seconds.
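
Such summary lines are convenient to post-process. The sketch below, with a fabricated wrap.csv row, flags any run whose state is not done:

```shell
# Fabricated wrap summary row: command, state, memory (KB), time (s).
printf 'd4 -dDNNF t.cnf.proj.p, done, 51265, 0.0246568\n' > wrap.csv
awk -F', *' '$2 != "done" { print "FAILED:", $1; next }
             { printf "%s: %d KB, %s s\n", $1, $3, $4 }' wrap.csv
```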

Once the compilation is done, we can perform approximate model counting as follows:

docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
    /divkc/appmc --cnf t.cnf

Similarly, we can perform random sampling with /divkc/sampler:

docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
    /divkc/sampler --cnf t.cnf --nb 10 --k 50

Alternatively with /divkc/rsampler:

docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
    /divkc/rsampler --cnf t.cnf --nb 10 --k 50000

Example with outputs

Running bash docker_compile_formula.sh t.cnf produced the following output:

Exact lower and upper bounds to the true model count:
               file,low,high
cnf.bound.csv: t.cnf, 2, 54000

The number of variables and the number of clauses in the formula
             file,#v,#c
cnf.cls.csv: t.cnf, 45, 104

The time (s) and memory (KB) usage of the d-DNNF compilation for G_P
          file, state, mem, time
pnnf.csv: t.cnf, done, 51265, 0.0246568

The time (s) and memory (KB) usage of the d-DNNF compilation for G_U
          file, state, mem, time
unnf.csv: t.cnf.unnf, done, 584142, 0.0247177

The time (s) and memory (KB) usage of the splitting procedure
           file, state, mem, time
split.csv: t.cnf, done, 7249, 0.00236592

The time (s) and memory (KB) usage of the projection procedure
          file, state, mem, time
proj.csv: t.cnf.proj, done, 6926, 0.00233837

We can see that the splitting procedure terminated successfully because we find split.csv, ..., done, .... Similarly for the projection procedure (proj.csv, ..., done, ...), the compilation of G_U (unnf.csv), and the compilation of G_P (pnnf.csv). The compilation also showed that the true model count lies between 2 and 54000, as reported by the line cnf.bound.csv: t.cnf, 2, 54000.

We may then run the approximate model counting procedure as follows:

docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
    /divkc/appmc --cnf t.cnf

And we get:

N,Y,Yl,Yh
10000, 25989, 25414, 26564.1

We see that the algorithm used the full 10000 samples because early stopping is disabled by default. The estimate (Y) is 25989 and the confidence interval ([Yl, Yh]) is [25414, 26564.1]. If we use D4 to compute the true model count, we find that the formula has exactly 26256 solutions.
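
As a quick arithmetic check on these numbers, the true count does fall inside the reported interval, and the relative error of the estimate is about 1%:

```shell
awk 'BEGIN {
    lo = 25414; hi = 26564.1        # reported confidence interval [Yl, Yh]
    y = 25989; truth = 26256        # estimate Y and true model count
    msg = (lo <= truth && truth <= hi) ? "within interval" : "outside interval"
    printf "%s, relative error %.1f%%\n", msg, 100 * (truth - y) / truth
}'
```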

If we run the sampling procedure:

docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
    /divkc/sampler --cnf t.cnf --nb 10 --k 50

We get:

c true uniformity
1 2 3 -4 -5 -6 -7 8 9 -10 11 12 -13 14 15 -16 17 -18 -19 20 21 -22 23 -24 25 -26 -27 28 -29 -30 31 -32 -33 -34 -35 -36 -37 38 -39 40 41 42 43 44 -45 0
1 2 3 -4 -5 -6 -7 8 9 -10 11 12 13 14 15 16 17 -18 -19 20 21 22 -23 -24 25 -26 -27 28 -29 -30 -31 32 -33 -34 35 36 -37 -38 -39 40 41 42 43 -44 45 0
1 2 3 -4 -5 -6 -7 8 9 -10 11 12 -13 14 -15 -16 -17 -18 19 20 21 -22 23 -24 25 -26 -27 28 -29 -30 31 -32 -33 -34 35 36 -37 -38 -39 40 41 42 43 -44 45 0
1 -2 -3 -4 -5 -6 -7 8 -9 10 11 12 -13 14 15 -16 -17 18 -19 20 21 -22 23 -24 25 -26 -27 28 -29 -30 31 -32 -33 -34 35 -36 -37 -38 39 40 41 42 43 -44 45 0
1 2 -3 4 -5 -6 -7 8 9 -10 11 12 13 14 15 16 17 -18 -19 20 21 22 -23 -24 25 -26 -27 28 29 -30 -31 -32 -33 34 -35 -36 -37 -38 39 40 41 42 43 -44 45 0
1 2 3 -4 -5 -6 -7 8 9 -10 11 12 -13 14 -15 16 17 18 -19 20 21 -22 -23 24 25 -26 -27 28 29 -30 -31 -32 -33 34 35 -36 37 -38 -39 40 41 42 43 -44 45 0
1 2 3 -4 -5 -6 -7 8 -9 10 11 12 -13 14 -15 16 17 -18 -19 20 21 -22 -23 24 25 -26 -27 28 -29 30 -31 -32 -33 -34 35 36 -37 -38 -39 40 41 42 43 -44 45 0
1 2 -3 4 -5 -6 -7 8 9 -10 11 12 13 14 15 16 -17 18 -19 20 21 -22 -23 24 25 -26 -27 28 29 -30 -31 -32 33 -34 35 -36 -37 -38 39 40 41 42 43 -44 45 0
1 2 3 -4 -5 -6 -7 8 -9 10 11 12 13 14 15 -16 17 -18 -19 20 21 22 -23 -24 25 -26 -27 28 -29 -30 -31 32 -33 -34 35 -36 -37 -38 39 40 41 42 43 -44 45 0
1 2 -3 4 -5 -6 -7 8 -9 10 11 12 -13 14 -15 -16 17 -18 19 20 21 22 -23 -24 25 -26 -27 28 -29 -30 -31 32 -33 -34 -35 -36 -37 38 -39 40 41 42 43 44 -45 0

The line c true uniformity indicates that the path count of G_P is low enough for the procedure to use the exact uniform random sampling algorithm; the heuristic-based algorithm is therefore not used here.

If we run (notice the value for --k 1):

docker run --rm -v "$(pwd):/work:Z" -w "/work" divkc \
    /divkc/sampler --cnf t.cnf --nb 10 --k 1

We get:

c heuristic based uniformity
1 -2 -3 -4 -5 -6 -7 8 -9 10 11 12 -13 14 15 -16 -17 -18 -19 20 -21 -22 -23 -24 -25 -26 27 -28 -29 -30 -31 -32 -33 -34 -35 -36 -37 -38 39 40 41 42 43 44 -45 0
1 2 3 -4 -5 -6 -7 8 9 -10 11 12 13 14 15 -16 -17 -18 -19 20 21 -22 -23 24 25 -26 -27 28 -29 30 -31 -32 -33 -34 -35 36 -37 -38 -39 40 41 42 43 -44 45 0
1 2 -3 4 -5 -6 -7 8 -9 10 11 12 -13 14 15 16 -17 18 19 20 21 -22 23 -24 25 -26 -27 28 -29 -30 31 -32 -33 -34 35 36 -37 -38 -39 40 41 42 43 -44 45 0
1 -2 -3 -4 5 6 -7 8 9 -10 11 12 -13 14 -15 -16 -17 18 -19 20 -21 -22 -23 -24 -25 26 -27 -28 -29 -30 -31 -32 -33 -34 -35 -36 -37 38 -39 -40 41 42 -43 -44 -45 0
1 2 3 -4 -5 -6 -7 8 9 -10 11 12 13 14 -15 16 17 18 -19 20 21 -22 -23 24 25 -26 -27 28 -29 30 -31 -32 -33 -34 -35 -36 -37 -38 39 40 41 42 43 -44 45 0
1 -2 -3 -4 -5 -6 -7 8 9 -10 11 12 -13 14 -15 16 17 -18 19 20 -21 -22 -23 -24 -25 -26 27 -28 -29 -30 -31 -32 -33 -34 -35 -36 -37 -38 39 40 41 42 43 44 -45 0
1 -2 -3 -4 -5 -6 -7 8 -9 10 11 12 -13 14 15 16 -17 18 19 20 21 -22 23 -24 25 -26 -27 28 29 -30 -31 -32 -33 34 -35 -36 -37 -38 39 40 41 42 43 -44 45 0
1 -2 -3 -4 -5 -6 -7 8 9 -10 11 12 13 14 15 16 17 -18 19 20 21 22 -23 -24 25 -26 -27 28 29 -30 -31 -32 33 -34 35 -36 -37 -38 39 40 41 42 43 -44 45 0
1 -2 -3 -4 -5 -6 -7 8 9 -10 11 12 -13 14 -15 16 -17 -18 -19 20 21 22 -23 -24 25 -26 -27 28 29 -30 -31 -32 -33 34 35 -36 -37 -38 39 40 41 42 43 44 -45 0
1 -2 -3 -4 5 6 -7 8 -9 10 11 12 -13 14 -15 -16 -17 -18 19 20 -21 -22 -23 -24 -25 26 -27 -28 -29 -30 -31 -32 -33 -34 -35 -36 -37 -38 39 -40 41 42 -43 -44 -45 0

Notice the c heuristic based uniformity line, indicating that the heuristic-based algorithm is used.
