By **Pavel Surynek**

Charles University in Prague, Faculty of Mathematics and Physics

**Abstract.**

A novel eager encoding of the ALLDIFFERENT constraint over bit-vectors is presented in this short paper. It is based on 1‑to‑1 mapping of the input bit-vectors to a linearly ordered set of auxiliary bit-vectors. Experiments with four SAT solvers showed that the new encoding could be solved order of magnitudes faster than the standard encoding in a hard unsatisfiable case.

### 1 INTRODUCTION AND MOTIVATION

Models of many real-life problems require a subset of modeling variables to be pair-wise distinct. This requirement is known as an ALLDIFFERENT constraint [4] in the constraint programming context. As the SAT solving technology [1], [3], [5] is becoming a tool of choice in many practical applications, efficient manipulation with the ALLDIFFERENT constraint in SAT solvers is of interest. Unlike other works on translating the ALLDIFFERENT constraint into SAT that use direct encoding of variable’s domains [4], we study how to encode the constraint over the set of bit-vectors, which essentially use binary encoding. We present a new eager encoding that maps the given set of bit-vectors to a linearly ordered set of auxiliary bit-vectors. We show that the new encoding is more efficient for hard unsatisfiable cases of the constraint on which SAT solvers struggle with the existent encoding for bit-vectors [2].

### 2 BACKGROUND – STANDARD MODEL

Suppose to have a set of bit-vectors {B^{1}, … , B^{n} } each of length *l*. Bit-vectors are interpreted as non-negative integers. The ALLDIFFERENT constraint over B^{1}, … , B^{n} – denoted as ALLDIFFERENT(B^{1}, … , B^{n} ) – requires that numbers represented by the bit-vectors are all distinct.

The standard encoding [2] basically follows the scheme where pair-wise differences are encoded:

Trivially, it is possible to encode the individual inequalities as follows. Let the **i**-th bit of the **k**-th bit-vector with i in {1,…,l} and k in {1,…,n} be denoted as B^{k}_{i}

However, if unfolded into the CNF representation though the distributive rule it results into too many clauses which is impractical. Therefore encoding using auxiliary propositional variables is used. It follows the standard technique of Tseitin’s hierarchical encoding. A fresh propositional variable is introduced for each inequality between individual bits of the involved bit-vectors. That is, there is a new variable a_{i}^{i,j} for every i,j in {1,…,n} with i<j and I in {1,…,*l*} *(remark by Editors: boldface i and normal i refer to different counters, the difference is clear in the original, word, submission-see the PDF in case of doubt)*. The auxiliary variable indicates if the corresponding bits in the inequality between bit-vectors differ or not. Thus, the following clauses are included to express this interpretation:

Bit-vectors B^{i} and B^{j} differ if they differ in at least one position; that is, the following clauses should be included:

Notice that auxiliary variables are linked to the original bits only in one direction. If a_{i}^{i,j }is set to **TRUE** then B_{i}^{i} and B_{i}^{j} are forced to differ. However, if a_{i}^{i,j }is set to FALSE then B_{i}^{i} and B_{i}^{j} are left unconstrained.

**Proposition ****1**** (STANDARD ENCODING SIZE).** The standard encoding of the ALLDIFFERENT constraint requires *l n* propositonal variables to represent the bit-vectors and *l n (n+1) / 2* auxiliary propositional variables; that is O(l n^{2}) variables altogether. The number of clauses is *1 + l n (n+1)*, ; that is, that is O(*l n ^{2}*). ■

**Figure 1:** Illustration of the standard and the bijection ALLDIFFERENT encodings. In the bijection encoding, a 1-to-1 mapping of the bit-vectors is found first. Then the values of bit-vectors are forced to be linearly ordered according to their position in the mapping.

# 3 ALTERNATIVE BIJECTION ENCODING

We observed that a SAT solver struggles over the standard encoding especially in the unsatisfiable case according to our preliminary experiments. Therefore we developed an alternative encoding that is more suitable for this case. It maps the original bit-vectors to a linearly ordered set of auxiliary bit-vectors. First, a 1-to-1 mapping (*bijection*) between sets of bit-vectors needs to be modeled to enable this encoding style (see Figure 1 for illustration).

Let the new linearly ordered bit-vectors be denoted as A^{1}, … , A^{n} . Additionally bit‑vectors α^{1},..,α^{n} and β^{1},…,β^{n} of size ⌈lg_{2} n⌉ are introduced to model the bijection. The bit-vector α^{k} indicates what A^{i }with i in {1,…,m} the original B^{k} will be mapped to. Bit-vectors β^{j} are used to enforce that at most one original bit-vector is mapped to a single ordered bit-vector. The following integer constraints are to establish the bijection:

It is crucial that domains of bit-vectors α^{k }and β^{j} consists of exactly* n* values to ensure that the bijection is modeled correctly (extra values are forbidden). The individual integer implication is encoded with a single auxiliary propositional variable e^{k}_{j} as follows:

where in the last line integer constraints enforcing the ordering are introduced.

The individual inequality is encoded as a strict lexicographic ordering over the two bit-vectors. Now, *l* fresh propositional variables are introduced to indicate the first bit where A^{i} is less than A^{j}. The ordering itself then just means that there exists such a first bit where bit-vectors differ:

**Proposition 2 **(BIJECTION ENCODING SIZE). The bijection encoding requires *2l n* propositional variables to represent the bit-vectors, *2 n ⌈lg _{2} n⌉* variables to represent the bijection, and

*n*auxiliary propositional variables; that is propositional variables altogether. The number of clauses is

^{2}+ l*n*; that is, O(n

^{2}( 1 + l + ⌈lg_{2}n⌉) + (n-1)l(l+1)^{2}max{⌈lg

_{2}n⌉,l) + n l

^{2}). ■

**Table ****1****.** Comparison of sizes of the standard and the bijection encoding.

#bit-vectors (16-bits) |
Standard | Bijection | ||

#Variables | #Clauses | #Variables | #Clauses | |

64 | 67584 | 133056 | 9968 | 176943 |

128 | 266240 | 536448 | 28400 | 690031 |

256 | 1056768 | 2154240 | 90096 | 2756591 |

# 4 EXPERIMENTAL EVALUATION

As shown in Table 1, the bijection encoding has fewer variables while the number of clauses is slightly higher than in the standard encoding. Nevertheless, we also need runtime comparison. A setup where a *transition-phase* behavior was observed is presented. We used 32 bit-vectors consisting of 6 bits. Additionally, there was a lower bound and an upper bound per each bit‑vector.

If *d ≤ 34* is a given domain size, then the lower bound b^{k}_{L }and the upper bound b^{k}_{U } for the bit-vector B^{k }were generated randomly as follows: b^{k}_{L} was selected uniformly from [0..34-d] and b_{U }was set to b^{k}_{U+d}. Thus, b^{k}_{L } ≤ B^{k }≤^{k }b^{k}_{U }is enforced for each *k*. Finally, a single ALLDIFFERENT over 32 bit-vectors was added.

Four SAT solvers were used in the evaluation: MINISAT [3], GLUCOSE [1], and CRYPTOMINISAT [5]. The runtime was measured for different domain sizes ranging from to – Figure 2. For small unsatisfiability could be checked easily; for large the same could be done for satisfiability. The most interesting behavior occurred around *d=13* which represent difficult cases.

None of the tested SAT solvers was able to solve all the instances over the standard encoding in the time limit of 1 hour (wall clock limit per instance). The best performing over the standard encoding was GLUCOSE, which solved 29 instances out of 33 and was also the fastest. Over the bijection encoding, MINISAT and CRYPTOMINISAT solved all the instances and very importantly, the runtime of CRYPTOMINISAT was always below 2 seconds. GLUCOSE also performed relatively well compared to the standard encoding with 30 solved instances.

Generally, the standard encoding can be solved faster in the satisfiable case. However, the bijection encoding is significantly better in the hard unsatisfiable case. This is because it can be checked more easily for this encoding if there are enough values in domains of bit-vectors to establish the required pair-wise difference (at least by some SAT solvers). A single linearly ordered set of bit-vectors is matched into the domains while in case of the standard encoding all the orderings (permutations) of the original bit-vectors may be checked.

**Figure ****2****.** Instances are sorted according to the increasing runtime.

# 5 CONCLUSION

A new encoding for the ALLDIFFERENT constraint over bit-vectors based on 1-to-1 mapping has been proposed. It has fewer variables and it is more efficient in difficult unsatisfiable cases than the existent encoding [2] that uses pair-wise differences. In the future work, it would be also interesting to investigate how the presented eager encodings performs with respect to the strong ALLDIFFERENT propagators [4] integrated with the solver lazily via the SMT framework and also how it performs in applications.

### REFERENCES

[1] G. Audemard, L. Simon, ‘Predicting Learnt Clauses Quality in Modern SAT Solver’, *Proceedings of IJCAI 2009*, (2009).

[2] A. Biere, R. Brummayer, ‘Consistency Checking of All Different Constraints over Bit-Vectors within a SAT Solver’, *Proceedings of FMCAD 2008*, 1-4, (2008).

[3] N. Eén, N. Sörensson, ‘An Extensible SAT-solver’, *Proceedings of SAT 2003*, 502-518, (2003).

[4] P. Nightingale, I. Gent, ‘A New Encoding of AllDifferent into SAT’, *CP 2004 Workshop on Modelling and Reformulating CSPs*, (2004).

[5] J.-C. Régin, ‘A Filtering Algorithm for Constraints of Difference in CSPs’, *Proceedings of AAAI 1994*, 362-367, (1994).

[6] M. Soos, K. Nohl, C. Castelluccia, ‘Extending SAT Solvers to Cryptographic Problems’, *Proceedings of SAT 2009*, 244-257, (2009).