
Alternation and Redundancy Analysis

of the Intersection Problem

JÉRÉMY BARBAY (*)

University of Waterloo

and

CLAIRE KENYON (**)

Brown University

The intersection of sorted arrays problem has applications in search engines such as Google.

Previous works propose and compare deterministic algorithms for this problem, in an adaptive

analysis based on the encoding size of a certiﬁcate of the result (cost analysis). We deﬁne the

alternation analysis, based on the non-deterministic complexity of an instance. In this analysis we

prove that there is a deterministic algorithm asymptotically performing as well as any randomized

algorithm in the comparison model. We deﬁne the redundancy analysis, based on a measure of

the internal redundancy of the instance. In this analysis we prove that any algorithm optimal

in the redundancy analysis is optimal in the alternation analysis, but that there is a randomized

algorithm which performs strictly better than any deterministic algorithm in the comparison

model. Finally, we describe how those results can be extended beyond the comparison model.

Keywords: randomized algorithm, intersection of sorted arrays, alternation and redundancy

adaptive analysis.

Categories and Subject Descriptors: F.2.2 [Analysis of Algorithms and Problem Complexity]:

Nonnumerical Algorithms and Problems—Sorting and searching; H.3.3 [Information Storage

and Retrieval]: Information Search and Retrieval—Search process

General Terms: Algorithms, Theory

Additional Key Words and Phrases: Adaptive Analysis, Alternation Analysis, Intersection, Redundancy

Analysis

1. INTRODUCTION

We consider search engines where queries are composed of several keywords, each

one being associated with a sorted array of references to entries in some database [Witten

et al. 1994, p. 136]. The answer to a conjunctive query is the intersection of the

sorted arrays corresponding to each keyword. Most search engines implement these

queries. The algorithms are in the comparison model, where comparisons are the

only operations permitted on references.

There is an extensive literature on the merging [Hwang and Lin 1971; 1972;

(*) School of Computer Science, University of Waterloo, Waterloo, Ontario N2L 3G1 Canada

(**) Computer Science Department, Brown University, Providence, RI 02912, United States

Permission to make digital/hard copy of all or part of this material without fee for personal

or classroom use provided that the copies are not made or distributed for proﬁt or commercial

advantage, the ACM copyright/server notice, the title of the publication, and its date appear, and

notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish,

to post on servers, or to redistribute to lists requires prior speciﬁc permission and/or a fee.

 2006 ACM 0000-0000/2006/0000-0001 $5.00

ACM Journal Name, Vol. V, No. N, October 2006, Pages 1–18.


Christen 1978; Manacher 1979; de la Vega et al. 1993; de la Vega et al. 1998]

or intersection [Baeza-Yates 2004] of two sorted arrays. The two problems are

similar, as both require the algorithm to place each element in the context of the other elements. In relational databases, the intersection of more than two arrays is computed by intersecting the arrays two by two. The only optimization available in this context consists in choosing the order in which those sets are intersected, and the literature explores how to use statistics precomputed on the content of the database to choose the best order [Chaudhuri 1998, and its references].

Demaine et al. [2001] showed that a holistic algorithm, which considers the query as a whole rather than decomposing it into smaller two-by-two intersection queries, is more efficient, both in theory and in practice.

In this paper we present another theoretical analysis, called the alternation analysis [Barbay and Kenyon 2002], based on the non-deterministic complexity of the instance, and prove tight bounds on the randomized computational complexity of the intersection. One intriguing fact of this analysis is that the lower bound applies to randomized algorithms, whereas a deterministic algorithm is optimal. Does it mean that no randomized algorithm can perform better than a deterministic one on the intersection problem? To answer this question, we extend the alternation analysis to the redundancy analysis [Barbay 2003], based on a measure of the internal redundancy of the instance. This analysis makes it possible to prove that for the intersection problem, randomized algorithms perform better than deterministic algorithms in terms of the number of comparisons.

The redundancy analysis also makes more natural assumptions on the instances: the worst case in the alternation analysis is such that an element considered by the algorithm is matched by almost all of the keywords, while in the redundancy analysis the maximum number of keywords matching such an element is parameterized by the measure of difficulty.

We define formally the intersection problem in Section 2, and sketch the alternation analysis and its results in Section 3. We define the redundancy analysis and study it in Section 4: we give and analyze a randomized algorithm in Section 4.1, and we prove that this algorithm is optimal in Section 4.2.

We answer the question of the usefulness of randomized algorithms for the intersection problem in Section 5: no deterministic algorithm can be optimal in the redundancy analysis, hence the superiority of randomized algorithms. We list in Section 6 several perspectives of this work.

2. DEFINITIONS

We consider queries composed of several keywords, each associated to a sorted array

of references. The references can be for instance addresses of web pages, the only requirement being a total order on them, i.e. that all unequal pairs of references can be ordered. To study the intersection problem, we consider any set of two arrays or more, of elements from a totally ordered space, to form an instance. To perform any complexity analysis on such instances, we need to define a measure representing the size of the instance. We define for this the signature of an instance.

Definition 2.1. We consider U to be a totally ordered space. An instance is composed of k sorted arrays $A_1, \ldots, A_k$ of positive sizes $n_1, \ldots, n_k$ and composed


A = 9
B = 1 2 9 11
C = 3 9 12 13
D = 9 14 15 16
E = 4 10 17 18
F = 5 6 7 10
G = 8 10 19 20

A :                           9
B :   1  2                    9    11
C :         3                 9       12 13
D :                           9             14 15 16
E :            4                10                   17 18
F :               5  6  7       10
G :                        8    10                         19 20

Fig. 1. An instance of the intersection problem: first the array representation of the instance, then a representation which better expresses the structure of the instance, where the x-coordinate of each element is equal to its value.

of elements from U. The signature of such an instance is $(k, n_1, \ldots, n_k)$. An instance is “of signature at most” $(k, n_1, \ldots, n_k)$ if it can be completed by adding arrays and elements to form an instance of signature exactly $(k, n_1, \ldots, n_k)$.

Example 2.2. Consider the instance of Figure 1, where the ordered space is the set of positive integers: it has signature (7, 1, 4, 4, 4, 4, 4, 4).

Definition 2.3. The Intersection of an instance is the set $A_1 \cap \ldots \cap A_k$ composed of the elements that are present in k distinct arrays.

Example 2.4. The intersection A ∩ B ∩ . . . ∩ G of the instance of Figure 1 is

empty, as no element is present in more than 4 arrays.
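Examples 2.2 and 2.4 can be checked mechanically. The following is a minimal sketch, assuming the arrays of Figure 1 are stored as Python lists without duplicates; the names `FIG1`, `signature`, `naive_intersection` and `max_multiplicity` are illustrative and do not come from the paper.

```python
from collections import Counter

# Instance of Figure 1 (arrays A to G), as sorted lists of integers.
FIG1 = [
    [9],
    [1, 2, 9, 11],
    [3, 9, 12, 13],
    [9, 14, 15, 16],
    [4, 10, 17, 18],
    [5, 6, 7, 10],
    [8, 10, 19, 20],
]


def signature(arrays):
    """Return the signature (k, n_1, ..., n_k) of an instance."""
    return (len(arrays), *(len(a) for a in arrays))


def naive_intersection(arrays):
    """Return the elements present in all k arrays (arrays assumed duplicate-free)."""
    counts = Counter(x for a in arrays for x in a)
    return sorted(x for x, c in counts.items() if c == len(arrays))


def max_multiplicity(arrays):
    """Return the largest number of arrays sharing a single element."""
    counts = Counter(x for a in arrays for x in a)
    return max(counts.values())


print(signature(FIG1))           # signature of the instance
print(naive_intersection(FIG1))  # the intersection itself
print(max_multiplicity(FIG1))    # how many arrays the most frequent element hits
```

This brute-force count reads every element, which is exactly what the adaptive algorithms studied in the paper try to avoid; it only serves as a reference point.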

Any algorithm (even a non-deterministic one) computing the intersection must

prove the correctness of the output: ﬁrst, it must certify that all the elements of

the output are indeed elements of the k arrays; second, it must certify that no

element of the intersection has been omitted, by exhibiting some certiﬁcate that

there can be no other elements in the intersection than those output. We deﬁne

the partition-certiﬁcate as such a proof.

Definition 2.5. A partition-certificate is a partition $(I_j)_{j \le \delta}$ of U into intervals such that any singleton {x} corresponds to an element x of $\cap_i A_i$, and each other interval I has an empty intersection $I \cap A_i$ with at least one array $A_i$.

3. ALTERNATION ANALYSIS

Imagine a function which indicates for each element x ∈ U the name of an array not containing x if x is not in the intersection, and “all” if x is in the intersection. The minimal number of times such a function alternates names, for x scanning U in increasing order, is just one less than the minimal size of a partition-certificate of the instance, which is called the alternation of the instance.

Definition 3.1. The alternation δ of an instance $(A_1, \ldots, A_k)$ is the minimal number of intervals forming a partition-certificate of this instance.

Example 3.2. The alternation of the instance in Figure 1 is δ=3: the partition (−∞, 9), [9, 10), [10, +∞) is a partition-certificate of size 3, as can be seen on the representation of the instance by value, and none can be smaller.
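The alternation of a small instance can be computed directly. The sketch below is an illustration under stated assumptions, not the paper's method: it assumes integer elements (so that the interval following a singleton {x} starts at x+1) and uses a greedy sweep that always takes the longest interval avoided by some array, which is expected to yield a certificate of minimal size. The names `FIG1` and `alternation` are hypothetical.

```python
import math
from bisect import bisect_left

# Instance of Figure 1 (arrays A to G).
FIG1 = [
    [9],
    [1, 2, 9, 11],
    [3, 9, 12, 13],
    [9, 14, 15, 16],
    [4, 10, 17, 18],
    [5, 6, 7, 10],
    [8, 10, 19, 20],
]


def alternation(arrays):
    """Greedy count of the intervals of a partition-certificate (integer elements)."""
    lo = -math.inf
    count = 0
    while lo != math.inf:
        # For each array, its first element >= lo (or +inf): the array has
        # no element in the half-open interval [lo, that element).
        nxt = []
        for a in arrays:
            i = bisect_left(a, lo)
            nxt.append(a[i] if i < len(a) else math.inf)
        hi = max(nxt)
        if hi == lo:
            # Every array contains lo: the certificate needs the singleton {lo}.
            lo = lo + 1
        else:
            # Some array is empty on [lo, hi), and none is empty any further.
            lo = hi
        count += 1
    return count


print(alternation(FIG1))
```

On the instance of Figure 1 the sweep produces the three intervals of Example 3.2.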

The alternation of an instance I is also the complexity of the best non-deterministic

algorithm on I (plus 1), i.e. the non-deterministic complexity. This non-deterministic


complexity forms a weak lower bound on the complexity of any randomized or

deterministic algorithm solving I, and hence a natural measure of the diﬃculty of

the instance.

Indeed, among instances of same signature and alternation, it is possible to

prove a tight bound on the randomized complexity of the intersection problem:

by providing a difficult distribution of instances and using the minimax principle,

we prove a lower bound on the complexity of any randomized algorithm solving the

problem [Barbay and Kenyon 2002].

Theorem 3.3 Alternation Lower Bound [Barbay and Kenyon 2002]. For any $k \ge 2$, $0 < n_1 \le \ldots \le n_k$ and $\delta \in \{4, \ldots, 4n_1\}$, and for any randomized algorithm A for the intersection problem, there is an instance of signature at most $(k, n_1, \ldots, n_k)$ and alternation at most δ, such that A performs $\Omega(\delta \sum_{i=1}^{k} \log(n_i/\delta))$ comparisons on average on it.

Proof. This is a simple application of Lemma 4.9 (stated and proved in Section 4.2) and of the Yao-von Neumann principle [Neumann and Morgenstern 1944; Sion 1958; Yao 1977]:

—Lemma 4.9 gives a distribution for $\delta \in \{4, \ldots, 4n_1\}$ on instances of alternation at most δ,

—Then the Yao-von Neumann principle allows us to deduce from this distribution a lower bound on the worst case complexity of randomized algorithms.

On the other hand, a simple deterministic algorithm reaches this lower bound.

As the class of deterministic algorithms is contained in the class of randomized

algorithms, this proves that the bound is tight for randomized algorithms.

Theorem 3.4 Alternation Upper Bound [Barbay and Kenyon 2002]. There is a deterministic algorithm which performs $O(\delta \sum_{i=1}^{k} \log(n_i/\delta))$ comparisons on any instance of signature $(k, n_1, \ldots, n_k)$ and alternation δ.

Proof. The deterministic version of Algorithm Rand Intersection (see Section 4.1), where the choice of a random array is replaced by the choice of the next array in a fixed order, performs $O(\delta \sum_{i=1}^{k} \log(n_i/\delta))$ comparisons on an instance of signature $(k, n_1, \ldots, n_k)$ and of alternation δ. Its analysis is very similar to the one of the randomized version given in the proof of Theorem 4.7.

Note that this algorithm is distinct from the algorithm presented previously [Barbay and Kenyon 2002], where the algorithm was performing unbounded searches in parallel in the arrays. Here the algorithm performs one unbounded search at a time, which saves some comparisons in many cases, for any arbitrary signature $(k, n_1, \ldots, n_k)$ (but not in the worst case).

The lower bound applies to any randomized algorithm, whereas a mere deterministic algorithm is optimal. Does it mean that no randomized algorithm can do better than a deterministic one on the intersection problem? We refine the analysis to answer this question.

4. REDUNDANCY ANALYSIS

By definition of the partition-certificate:

—for each singleton {x} of the partition, any algorithm must find the position of x in all arrays $A_i$, which takes k searches;

—for each interval $I_j$ of the partition, any algorithm must find an array, or a set of arrays, such that the intersection of $I_j$ with this array, or with the intersection of those arrays, is empty.

The cost for finding such a set of arrays can vary, and depends on the choices performed by the algorithm. In general, it requires fewer searches if there are many possible answers. To take this into account, for each interval $I_j$ of the partition-certificate we will count the number $r_j$ of arrays whose intersection with $I_j$ is empty. The smaller $r_j$ is, the harder the instance: $1/r_j$ measures the contribution of this interval to the difficulty of the instance.

Example 4.1. Consider for instance the interval $I_j = [10, 11)$ in the instance of Figure 1: $r_j = 4$ arrays have an empty intersection with it. A randomized algorithm, choosing an array uniformly at random, has probability $r_j/k$ to find an array which does not intersect $I_j$, and will do so after at most $\lceil k/r_j \rceil$ trials on average, even if it tries several times in the same array because it doesn't memorize which array it tried before. As the number of arrays k is fixed, the value $1/r_j$ measures the difficulty of proving that no element of $I_j$ is in the intersection of the instance.
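The count $r_j$ of Example 4.1 can be reproduced with two binary searches per array. This is a small sketch, assuming half-open integer intervals [lo, hi); `FIG1` and `empty_arrays` are illustrative names. The expected number of uniform draws with replacement is the mean k/r of a geometric law, which stays below the bound ⌈k/r⌉ quoted in the example.

```python
from bisect import bisect_left
from fractions import Fraction

# Instance of Figure 1 (arrays A to G).
FIG1 = [
    [9],
    [1, 2, 9, 11],
    [3, 9, 12, 13],
    [9, 14, 15, 16],
    [4, 10, 17, 18],
    [5, 6, 7, 10],
    [8, 10, 19, 20],
]


def empty_arrays(arrays, lo, hi):
    """Indices of the arrays with no element in the half-open interval [lo, hi)."""
    return [i for i, a in enumerate(arrays)
            if bisect_left(a, lo) == bisect_left(a, hi)]


k = len(FIG1)
r = len(empty_arrays(FIG1, 10, 11))  # arrays A, B, C and D avoid [10, 11)
expected_trials = Fraction(k, r)     # mean of a geometric law of parameter r/k
print(r, expected_trials)
```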

We name the sum of those contributions the redundancy of the instance, and it

forms our new measure of diﬃculty:

Definition 4.2. Let $A_1, \ldots, A_k$ be k sorted arrays, and let $(I_j)_{j \le \delta}$ be a partition-certificate for this instance.

—The redundancy $\rho(I)$ of an interval or singleton I is defined as equal to 1 if I is a singleton, and equal to $1/\#\{i, A_i \cap I = \emptyset\}$ otherwise.

—The redundancy $\rho((I_j)_{j \le \delta})$ of a partition-certificate $(I_j)_{j \le \delta}$ is the sum $\sum_{j} \rho(I_j)$ of the redundancies of the intervals composing it.

—The redundancy $\rho((A_i)_{i \le k})$ of an instance of the intersection problem is the minimal redundancy $\min\{\rho((I_j)_{j \le \delta}), \forall (I_j)_{j \le \delta}\}$ of a partition-certificate of the instance.

Note that the redundancy is always well defined and finite: if I is not a singleton then by definition there is at least one array $A_i$ whose intersection with I is empty, hence $\#\{i, A_i \cap I = \emptyset\} > 0$.

Example 4.3. The partition-certificate {(−∞, 9), [9, 10), [10, 11), [11, +∞)} has redundancy at most 1/2+1/3+1/4+1/2 = 7/6 for the instance given in Figure 1, and no other partition-certificate has a smaller redundancy, hence the instance has redundancy 7/6.

The main idea is that the redundancy analysis measures the difficulty of the instance in a finer way than the alternation analysis: for fixed $k, n_1, \ldots, n_k$ and δ, several instances of signature $(k, n_1, \ldots, n_k)$ and alternation δ may present various levels of difficulty, and the redundancy helps to distinguish between those.


A = 9
B = 1 2 9 11
C = 3 9 12 13
D = 9 14 15 16
E = 4 10 17 18
F = 5 6 7 9
G = 8 9 19 20

A :                           9
B :   1  2                    9    11
C :         3                 9       12 13
D :                           9             14 15 16
E :            4                10                   17 18
F :               5  6  7     9
G :                        8  9                            19 20

Fig. 2. A much more difficult variant of the instance of Figure 1: only two elements changed, F[4] and G[2], which were equal to 10 and are now equal to 9, but the redundancy is now ρ = 1/2+1+1/6+1/2 = 2+1/6.

Example 4.4. In the instance from Figure 1, the only way to prove the emptiness of the intersection is to compute the intersection of one of the arrays chosen from {A, B, C, D} with one of the arrays chosen from {E, F, G}, because 9 ∈ A∩B∩C∩D and 10 ∈ E ∩ F ∩ G. For simplicity, and without loss of generality, suppose that the algorithm searches to intersect A with another array in {B, C, D, E, F, G}, and consider the number of unbounded searches performed, instead of the number of comparisons. The randomized algorithm looking for the element of A in a random array from {B, C, D, E, F, G} performs on average only 2 searches, as the probability to find an array whose intersection is empty with A is then 1/2.

On the other hand, consider the instance of Figure 2, a variant of the instance of Figure 1, where element 9 is present in all the arrays but E. As the two instances have the same signature and alternation, the alternation analysis yields the same lower bound for both instances. But the randomized algorithm described above now performs on average k/2 searches, as opposed to 2 searches on the original instance. This difference in difficulty, between those very similar instances, is not expressed by a difference of alternation, but it is expressed by a difference of redundancy: the new instance has a redundancy of 1/2+1+1/6+1/2 = 2+1/6, which is larger by one than the redundancy 7/6 of the original instance. This difference of one corresponds to k more doubling searches for this simple instance. This difference is used in Section 5 to create instances where a deterministic algorithm performs O(k) times more searches and comparisons than a randomized algorithm.
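The redundancy of a given partition-certificate is a direct sum of the terms 1/r of Definition 4.2. The sketch below evaluates it for the certificate {(−∞, 9), [9, 10), [10, 11), [11, +∞)} on the instance of Figure 2. It is an illustration under assumptions: intervals are half-open and described by their cut points, singletons are not handled (this certificate has none), and the names `FIG2`, `interval_redundancy` and `certificate_redundancy` are hypothetical. It evaluates one certificate only and does not search for the minimal one.

```python
import math
from bisect import bisect_left
from fractions import Fraction

# Instance of Figure 2 (arrays A to G).
FIG2 = [
    [9],
    [1, 2, 9, 11],
    [3, 9, 12, 13],
    [9, 14, 15, 16],
    [4, 10, 17, 18],
    [5, 6, 7, 9],
    [8, 9, 19, 20],
]


def interval_redundancy(arrays, lo, hi):
    """Return 1/r, with r the number of arrays having no element in [lo, hi)."""
    r = sum(1 for a in arrays if bisect_left(a, lo) == bisect_left(a, hi))
    if r == 0:
        raise ValueError("every array meets the interval: not a valid certificate")
    return Fraction(1, r)


def certificate_redundancy(arrays, cuts):
    """Redundancy of the certificate (-inf, c_1), [c_1, c_2), ..., [c_m, +inf)."""
    bounds = [-math.inf, *cuts, math.inf]
    return sum(interval_redundancy(arrays, lo, hi)
               for lo, hi in zip(bounds, bounds[1:]))


print(certificate_redundancy(FIG2, [9, 10, 11]))  # 1/2 + 1 + 1/6 + 1/2
```

Exact fractions are used so that the sum can be compared with the value 2+1/6 stated in the caption of Figure 2 without rounding.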

4.1 Randomized algorithm

For simplicity, we assume that all arrays contain the element −∞ at position 0 and the element +∞ at position $n_i+1$. Given this convention, the intersection algorithm can ignore the sizes of the sets. This is the case in particular in pipe-lined computations, where the sets are not completely computed when the intersection starts, for instance in parallel applications.

An unbounded search looks for an element x in a sorted array A of unknown size, starting at position init. It returns a value p such that A[p−1]<x≤A[p], called the insertion rank of x in A. It can be performed combining the doubling search and binary search algorithms [Barbay and Kenyon 2002; Demaine et al. 2000; 2001], and is then of complexity $2\lceil \log_2(p - \mathit{init}) \rceil$, or in a more complicated way [Bentley and Yao 1976] to improve the complexity by a constant factor of less than 2.

Using unbounded search rather than binary search is crucial to the complexity


Algorithm Rand Intersection (A_1, . . . , A_k)
  for all i do p_i ← 1 end for
  Result ← ∅; s ← 1
  repeat
    m ← A_s[p_s]
    #NO ← 0; #YES ← 1;
    while #YES < k and #NO = 0
      Let A_s be a random array s.t. A_s[p_s] ≠ m.
      p_s ← Unbounded Search(m, A_s, p_s)
      if A_s[p_s] ≠ m then #NO ← 1 else #YES ← #YES + 1 end if
    endwhile
    if #YES = k then Result ← Result ∪ {m} end if
    for all i such that A_i[p_i] = m do p_i ← p_i + 1 end for
  until m = +∞
  return Result

Fig. 3. The algorithm Rand Intersection: Given k non-empty sorted sets $A_1, \ldots, A_k$ of sizes $n_1, \ldots, n_k$, the algorithm computes in variable Result the intersection $A_1 \cap \ldots \cap A_k$. Note that the only random instruction is the choice of the array in the inner loop.

of the intersection algorithm. Consider the task of searching d elements $x_1 \le x_2 \le \ldots \le x_d$ in a sorted array of size n. It requires $d \log n$ comparisons using binary search, but less than $2d \log(n/d)$ comparisons using unbounded search. To see that, define $p_j$ such that $p_0 = 0$ and $A[p_j] = x_j$ $\forall j \in \{1, \ldots, d\}$: the jth doubling search performs no more than $2 \log(p_j - p_{j-1})$ comparisons. By concavity of the log, the sum $\sum_{j \le d} 2 \log(p_j - p_{j-1})$ is no larger than $2d \log(\sum_{j \le d} (p_j - p_{j-1})/d)$. The sum $\sum_{j \le d} (p_j - p_{j-1})$ is equal to $p_d - p_0$, which is smaller than the size n of the array. Hence the d doubling searches perform less than $2d \log(n/d)$ comparisons.

Theorem 4.5. Algorithm Rand Intersection (see Fig. 3) computes the intersection

of the arrays given as input.

Proof. Given k non-empty sorted sets $A_1, \ldots, A_k$ of sizes $n_1, \ldots, n_k$, the Rand Intersection algorithm (Fig. 3) computes the intersection $A_1 \cap \ldots \cap A_k$. The algorithm is composed of two nested loops. The outer loop iterates through potential elements of the intersection in variable m and in increasing order, and the inner loop checks for each value of m if it is in the intersection.

In each pass of the inner loop, the algorithm searches for m in one array $A_s$ which potentially contains it. The invariant of the inner loop is that, at the start of each pass and for each array $A_i$, $p_i$ denotes the first potential position for m in $A_i$: $A_i[p_i - 1] < m$. The variables #YES and #NO count how many arrays are known to contain m and not to contain m respectively, and are updated depending on the result of each search.

A new value for m is chosen every time we enter the outer loop, at which time the current subproblem is to compute the intersection on the sub-arrays $A_i[p_i, \ldots, n_i]$ for all values of i. Any first element $A_i[p_i]$ of a sub-array could be a candidate, but a better candidate is one which is larger than the last value of m: the algorithm chooses $A_s[p_s]$, which is by definition larger than m. Then only one array $A_s$ is known to contain m, hence #YES ← 1, and no array is known not to contain it, hence #NO ← 0. The algorithm terminates when all the values of the current array have been considered, and m has taken the last value +∞.
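The two nested loops described in this proof can be sketched in Python. This is a hedged illustration rather than a transcription of Fig. 3: arrays are 0-indexed lists to which a +∞ sentinel is appended, the random array is drawn among those not yet known to contain m (tracked in a set instead of the counters #YES and #NO), and `gallop` is a compact doubling search written for this example.

```python
import math
import random
from bisect import bisect_left


def gallop(a, x, init):
    """Unbounded search: smallest p >= init with a[p] >= x (a ends with +inf)."""
    step, lo, hi = 1, init, init
    while a[hi] < x:
        lo = hi + 1
        hi = min(hi + step, len(a) - 1)
        step *= 2
    return bisect_left(a, x, lo, hi)


def rand_intersection(arrays, rng=random):
    """Intersection of k non-empty sorted duplicate-free arrays of finite values."""
    k = len(arrays)
    arrs = [list(a) + [math.inf] for a in arrays]  # +inf sentinel at the end
    p = [0] * k   # p[i]: first position of arrs[i] that may still hold m
    result = []
    s = 0
    while True:
        m = arrs[s][p[s]]              # next candidate, larger than the last one
        if m == math.inf:
            break
        known = {s}                    # arrays known to contain m (#YES)
        found_no = False               # some array is known not to contain m (#NO)
        while len(known) < k and not found_no:
            s = rng.choice([i for i in range(k) if i not in known])
            p[s] = gallop(arrs[s], m, p[s])
            if arrs[s][p[s]] != m:
                found_no = True
            else:
                known.add(s)
        if len(known) == k:
            result.append(m)
        for i in range(k):             # skip m in every array positioned on it
            if arrs[i][p[i]] == m:
                p[i] += 1
    return result


print(rand_intersection([[1, 4, 7, 9], [2, 4, 9, 12], [4, 5, 9]]))
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, which is convenient when counting searches on a fixed instance.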


We now analyze the complexity of Algorithm Rand Intersection (Fig. 3) as a

function of the redundancy ρ of the instance. To understand the intuition behind

the analysis, consider the following example:

Example 4.6. For a ﬁxed interval I

, suppose that the algorithm receives six

arrays such that A

, A

and A

contain many elements from I

but have none

in common, and such that A

and A

contain no elements from I

. Ignore all steps

of the algorithm where m takes values out of the interval I

: the interval deﬁnes

a phase of the algorithm. Suppose that m takes a value in I

at some point, for

instance from A

. At each iteration of the external loop, the algorithm ignores

the array from which the current value of m was taken, chooses one between the

four remaining arrays, searches in the chosen one, and updates the value of m

accordingly.

—With probability 3/5 the algorithm chooses the set A

, A

or A

(depending

of which set the current value of m comes from) and potentially fails to terminate

the phase.

—With probability 2/5 the algorithm chooses A

or A

, performs a search in it

(there might be elements left from intervals I

∪ . . . ∪ I

j−1

), and updates m to a

value from I

j+1

, which terminates the current phase.

We are interested in the number C

of searches performed in each array A

during

this phase. As m takes a value outside of I

after a search in A

or A

, C

and C

are random boolean variables, which depend only on the last choice of the algorithm

before changing phase: the expectation of C

(resp. C

) is exactly the probability

that A

(resp. A

) is picked knowing that one of those is picked, i.e. 1/2.

The algorithm can perform many searches in A

, A

and A

, so the variables

, C

and C

are random integer variables, which depend on all the choices

of the algorithm but the last. The probability that A

is chosen is null if m comes

from A

. Otherwise it is less than the probability that A

is chosen knowing that

m doesn't come from A

: Pr[A

is chosen ] = Pr[A

is chosen and m does not come

from A

] ≤ Pr[A

is chosen |m does not come from A

]. Hence the probability that

is chosen is less than 1/4.

is increased each time A

is chosen (probability a ≤ 1/5), is finalized as soon

as A

or A

is chosen (probability b = 2/5), and stays the same each time another

array is chosen (probability c ≥ 2/5). Ignore all the steps where C

, C

or C

are

increased: knowing that C

, C

or C

are not increased, the probability that C

increased is a/(a+b) ≤ 1/3, and the probability that it is ﬁnalized is b/(a+b) ≥ 2/3.

Such a system will iterate at most 3/2 times on average, and increment C

each

time but the last, i.e. 3/2 − 1 = 1/2 times on average. The same reasoning holds

for A

, A

and A

. Hence in this example E(C

) = 1/2 for each set A

, where 2 is

the number of arrays which contain no elements from I

The proof of Theorem 4.7 argues similarly in the more general case.

Theorem 4.7 Redundancy Upper Bound [Barbay 2003]. Algorithm Rand Intersection (Fig. 3) performs on average $O(\rho \sum_{i=1}^{k} \log(n_i/\rho))$ comparisons on any instance of signature $(k, n_1, \ldots, n_k)$ and of redundancy ρ.


Proof. Let $(I_j)_{j \le \delta}$ be a partition-certificate of minimal redundancy ρ. Each comparison performed by the algorithm is said to be performed in phase j if $m \in I_j$ for some interval $I_j$ of the partition. Let $C_i^j$ be the number of searches performed by the algorithm during phase j in array $A_i$, let $C_i = \sum_j C_i^j$ be the number of searches performed by the algorithm in array $A_i$ over the whole execution, and let $(r_j)_{j \le \delta}$ be such that $r_j$ is equal to 1 if $I_j$ is a singleton, and to $\#\{i, A_i \cap I_j = \emptyset\}$ otherwise.

Let us consider a fixed phase $j \in \{1, \ldots, \delta\}$, and compute the average number of searches $E(C_i^j)$ performed in each array $A_i$ during phase j. At each iteration of the internal loop, the algorithm chooses an array in which m is not known to be. As m always comes from one array, there are at most k − 1 of those arrays, hence each array is chosen with probability at least 1/(k − 1). If the element m currently considered is in the intersection, then each array $A_i$ will be searched and $C_i^j$ is equal to 1. In this case $1/r_j$ is also equal to 1, so that $C_i^j = 1/r_j = E(C_i^j)$.

Suppose that m is not in the intersection, and that $A_i \cap I_j$ is empty. Either $A_i$ is never chosen, and $C_i^j = 0$; or $A_i$ is chosen, and $C_i^j = 1$, because the algorithm will terminate the phase after searching in $A_i$. The probability that $A_i$ is chosen is at most the probability that it is chosen knowing that this is the last search of the phase:

Pr[$A_i$ is chosen] = Pr[$A_i$ is chosen and last search] ≤ Pr[$A_i$ is chosen | last search].

As the arrays are chosen uniformly, this probability is $\Pr\{C_i^j = 1\} \le 1/r_j$, and the average number of searches is at most $E(C_i^j) = 1 * \Pr\{C_i^j = 1\} \le 1/r_j$.

The interesting case is when m is not in the intersection but $A_i \cap I_j \ne \emptyset$. At each new search, either

(1) $C_i^j$ is incremented by one because the search occurred in $A_i$, which occurs with probability less than 1/(k − 1);

(2) or $C_i^j$ is fixed in a final way because an array was found whose intersection with $I_j$ is empty, which occurs with probability $r_j/(k - 1)$;

(3) or $C_i^j$ is neither incremented nor fixed, if another array was searched but its intersection with $I_j$ is not empty.

The combined probability of the first and second case is at most $1/(k - 1) + r_j/(k - 1)$. Ignoring the third case where $C_i^j$ never changes, the conditional probability of the first case is at most $\frac{1/(k-1)}{1/(k-1) + r_j/(k-1)} = 1/(1 + r_j)$. Hence this system is equivalent to a system where $C_i^j$ is incremented by one with probability at most $1/(1 + r_j)$, and fixed with the remaining probability, at least $r_j/(1 + r_j)$. Such a system iterates at most $(1 + r_j)/r_j$ times on average, and increments $C_i^j$ at each iteration but the last: the final value of $C_i^j$ is on average at most $(1 + r_j)/r_j - 1 = 1/r_j$.
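The last step of this argument is the mean of a geometric law. As a small check with exact fractions, take the extreme case where the counter is incremented with probability exactly 1/(1+r) and fixed with probability r/(1+r); the function name `expected_increments` is illustrative.

```python
from fractions import Fraction


def expected_increments(r):
    """Mean number of increments before the counter is fixed, when each step
    increments with probability 1/(1+r) and stops with probability r/(1+r)."""
    stop = Fraction(r, 1 + r)
    expected_iterations = 1 / stop  # mean of a geometric law: (1+r)/r
    return expected_iterations - 1  # every iteration but the last increments


for r in range(1, 7):
    print(r, expected_increments(r))  # equals 1/r for each r
```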

Hence the average number of searches performed in each array $A_i$ during phase j is $E(C_i^j) \le 1/r_j$. Summing over all phases, it implies that the algorithm performs on average $E(C_i) \le \sum_j 1/r_j = \rho$ searches in each array $A_i$.

Let $g^{\ell}_{i,j}$ be the increment of $p_i$ due to the ℓth unbounded search in array $A_i$ during phase j. Notice that $\sum_{j,\ell} g^{\ell}_{i,j} \le n_i$. The algorithm performs at most $2 \log(g^{\ell}_{i,j} + 1)$ comparisons during the ℓth search of phase j in array $A_i$. So it performs at most $2 \sum_{j,\ell} \log(g^{\ell}_{i,j} + 1)$ comparisons between m and an element of array $A_i$ during the whole execution. Because of the concavity of the function $\log(x + 1)$, this is smaller than $2 C_i \log(\sum_{j,\ell} g^{\ell}_{i,j}/C_i + 1)$, and because of the preceding remark $\sum_{j,\ell} g^{\ell}_{i,j} \le n_i$, this is smaller than $2 C_i \log(n_i/C_i + 1)$.

The functions $f_i(x) = 2x \log(n_i/x + 1)$ are concave for $x \le n_i$, so $E(f_i(C_i)) \le f_i(E(C_i))$. As the average complexity of the algorithm in array $A_i$ is at most $E(f_i(C_i))$, as $E(C_i) \le \rho$, and as $f_i$ is increasing, on average the algorithm performs less than $2\rho \log(n_i/\rho + 1)$ comparisons between m and an element in array $A_i$. Summing over i we get the final result, which is $O(\rho \sum_i \log(n_i/\rho))$.

4.2 Randomized Complexity Lower Bound

We prove now that no randomized algorithm can do asymptotically better in $(k, n_1, \ldots, n_k)$. The proof is quite similar to the lower bound of the alternation analysis [Barbay and Kenyon 2002], and differs mostly in Lemma 4.8, which must be adapted to the redundancy.

Lemmas 4.8 and 4.9 are used to prove the alternation lower bound in Theorem 3.3 and to prove the redundancy lower bound in Theorem 4.10.

In Lemma 4.8 we prove a lower bound on average on a distribution of instances of alternation and redundancy at most ρ = 4 and of intersection size at most 1. We use this result in Lemma 4.9 to define a distribution on instances of alternation and redundancy at most $\rho \in \{4, \ldots, 4n_1\}$ by combining $p = \theta(\rho)$ sub-instances. Applying the Yao-von Neumann principle [Neumann and Morgenstern 1944; Sion 1958; Yao 1977] in Theorem 4.10 gives us a lower bound of $\Omega(\rho \sum_{i=2}^{k} \log(n_i/\rho))$ on the complexity of any randomized algorithm for the intersection problem.

Finally in Lemma 4.11, we prove that any instance of signature $(k, n_1, \ldots, n_k)$ has redundancy ρ at most $2n_1 + 1$, so that the redundancy analysis of Theorem 4.10 covers all instances for a given signature $(k, n_1, \ldots, n_k)$.

Lemma 4.8. For any $k \ge 2$, $0 < n_1 \le \ldots \le n_k$, there is a distribution on instances of the intersection problem with signature at most $(k, n_1, \ldots, n_k)$, alternation and redundancy at most 4, such that any deterministic algorithm performs at least $(1/4) \sum_{i=2}^{k} \log(2n_i + 1) + \sum_{i=2}^{k} 1/(2n_i + 1) - k + 2$ comparisons on average.

Proof. Let C be the total number of comparisons performed by the algorithm, and for each array $A_i$ note $F_i = \log_2(2n_i + 1)$, and $F = \sum_{i=2}^{k} F_i$.

Let us draw an index $w \in \{2, \ldots, k\}$ equal to i with probability $F_i/F$, and (k − 1) positions $(p_i)_{i \in \{2, \ldots, k\}}$ such that ∀i each $p_i$ is chosen uniformly at random in $\{1, \ldots, n_i\}$. Let P and N be two instances such that in both P and N, for any $1 < i < j \le k$, $a \in A_1$, $b, c \in A_i$ and $d, e \in A_j$ then $b < A_i[p_i] < c$ and $d < A_j[p_j] < e$ imply $b < d < a < c < e$ (see Figure 4); in P, $A_w[p_w] = A_1[1]$; in N, $A_w[p_w] > A_1[1]$; and such that the elements at position $p_i$ in all other arrays than $A_w$ are equal to $A_1[1]$.

Let $x = A_1[1]$ be the first element of the first array. Define x-comparisons to be the comparisons between any element and x. Because of the special relative positions of the elements, a comparison between two elements b and d in any arrays does not yield more information than the two comparisons between x and b and between x and d: the positions of elements b and d relative to x allow us to deduce their order. Hence any algorithm performing C comparisons between


Fig. 4. Distribution on (P, N): each element of value v is represented by a dot of x-coordinate v, and large dots correspond to the element at position $p_i$ in each array $A_i$.

arbitrary elements can be expressed as an algorithm performing no more than 2C x-comparisons, and any lower bound L on the complexity of algorithms using only x-comparisons is an L/2 lower bound on the complexity of algorithms using comparisons between arbitrary elements.

The alternation of such instances is at most 4, and the redundancy of such instances is no more than $3 + 1/(k-1)$, which is at most 4:

—the interval $(-\infty, A_1[1])$ is sufficient to certify that no element smaller than $x$ is in the intersection, and stands for a redundancy of at most 1;

—the interval $(A_w[p_w], +\infty)$ is sufficient to certify that no element larger than $A_w[p_w]$ is in the intersection, and stands for a redundancy of at most 1;

—the interval $[A_1[1], A_w[p_w]]$ is sufficient in $N$ to complete the partition-certificate, and stands for a redundancy of at most 1;

—the singleton $\{x\}$ and the interval $(A_1[1], A_w[p_w]]$ are sufficient in $P$ to complete the partition-certificate, and stand for a redundancy of at most $1 + 1/(k-1)$.

The only difference between instances $P$ and $N$ is the relative position of the element $A_w[p_w]$ to the other elements composing the instance, as described in Figure 4. Any algorithm computing the intersection of $P$ has to find the $(k-1)$ positions $\{p_2, \ldots, p_k\}$. Any algorithm computing the intersection of $N$ has to find $w$ and the associated position $p_w$. Any algorithm distinguishing between $P$ and $N$ has to find $p_w$: we will prove that it needs on average almost $F/2 = (1/2)\sum_{i=2}^{k} \log_2(2n_i+1)$ x-comparisons to do so on a distribution corresponding to the uniform choice between an instance $N$ and an instance $P$.

Consider a deterministic algorithm using only x-comparisons to compute the intersection. As the algorithm has not distinguished between $P$ and $N$ till it found $w$, let $X_i$ denote the number of x-comparisons performed in array $A_i$ for both $P$ or $N$. Let $Y_i$ denote the number of x-comparisons performed in array $A_i$ for $N$; and let $\xi_i$ be the indicator variable which equals 1 exactly if $p_i$ has been determined on

instance $P$. The number of comparisons performed is $C = \sum_{i=2}^{k} X_i$. Restricting ourselves to arrays in which the position $p_i$ has been determined, we can write $C \ge \sum_{i=2}^{k} Y_i$.


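Let us consider $E(Y_i)$: the expectation can be decomposed as a sum of probabilities $E(Y_i) = \sum_{h \ge 1} \Pr\{Y_i \ge h\}$, and in particular $E(Y_i) \ge \sum_{h=1}^{F_i} \Pr\{Y_i \ge h\}$. Those terms can be decomposed using the property $\Pr\{a \vee b\} \le \Pr\{a\} + \Pr\{b\}$:

$$\begin{aligned}
\Pr\{Y_i \ge h\} &= \Pr\{Y_i \ge h \wedge \xi_i = 1\} \\
&= 1 - \Pr\{Y_i < h \vee \xi_i = 0\} \\
&\ge 1 - \Pr\{Y_i < h\} - \Pr\{\xi_i = 0\} \\
&= \Pr\{\xi_i = 1\} - \Pr\{Y_i < h\} \qquad (1)
\end{aligned}$$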

The probability $\Pr\{Y_i < h\}$ is bounded by the usual decision tree lower bound: if we consider the binary x-comparisons performed in set $A_i$, there are at most $2^h$ leaves at depth less than $h$. Since the insertion rank of $x$ in $A_i$ is uniformly chosen, these leaves have the same probability and have total probability at most $\Pr\{Y_i < h\} \le 2^h/(2n_i+1) = 2^{h-F_i}$. Those terms for $h \in \{1, \ldots, F_i\}$ form a geometric sequence whose sum is equal to $2(1-2^{-F_i})$, so $E(Y_i) \ge F_i \Pr\{\xi_i = 1\} - 2(1-2^{-F_i})$.
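The closed form of this geometric sum can be checked numerically. The following Python sketch uses exact rational arithmetic; the function name `geometric_tail_sum` is hypothetical, and the sketch assumes an integer value of $F_i$, which need not hold for $\log_2(2n_i+1)$ in general.

```python
from fractions import Fraction


def geometric_tail_sum(F):
    """Return sum_{h=1}^{F} 2^(h - F) exactly, for an integer F >= 1."""
    return sum((Fraction(2) ** (h - F) for h in range(1, F + 1)), Fraction(0))


if __name__ == "__main__":
    for F in (1, 2, 5, 10):
        closed_form = 2 * (1 - Fraction(1, 2 ** F))
        # Both columns agree: the sum equals 2(1 - 2^(-F)).
        print(F, geometric_tail_sum(F), closed_form)
```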

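Then

$$E(C) \ge \sum_{i=2}^{k} E(Y_i) \ge \sum_{i=2}^{k} F_i \Pr\{\xi_i = 1\} - \sum_{i=2}^{k} 2(1 - 2^{-F_i}) \ge \sum_{i=2}^{k} F_i \Pr\{\xi_i = 1\} + 2\sum_{i=2}^{k} 2^{-F_i} - 2(k-2). \qquad (2)$$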

Let us fix $p = (p_2, \ldots, p_k)$. There are only $k-1$ possible choices for $w$. The algorithm can only differentiate between $P$ and $N$ when it finds $w$. Let $\sigma$ denote the order in which these instances are dealt with for $p$ fixed. Then $\xi_i = 1$ if and only if $\sigma_i \le \sigma_w$, and so $\Pr\{\xi_i = 1 \mid p\} = \sum_{j:\sigma_j \ge \sigma_i} F_j/F$.

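Summing over $p$, and then over $i$, we get an expression of the first term in Equation (2):

$$\Pr\{\xi_i = 1\} = \sum_{p} \Pr\{\xi_i = 1 \mid p\} \Pr\{p\} = \sum_{p} \sum_{j:\sigma_j \ge \sigma_i} \frac{F_j}{F} \Pr\{p\}$$

$$\sum_{i=2}^{k} F_i \Pr\{\xi_i = 1\} = \sum_{i=2}^{k} F_i \sum_{p} \sum_{j:\sigma_j \ge \sigma_i} \frac{F_j}{F} \Pr\{p\} = \sum_{p} \Pr\{p\} \sum_{i=2}^{k} \sum_{j:\sigma_j \ge \sigma_i} \frac{F_i F_j}{F}$$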

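In the sum, each term “$F_i F_j$” appears exactly once, and $\left(\sum_{i=2}^{k} F_i\right)^2 = 2\sum_{i \le j} F_i F_j - \sum_{i=2}^{k} F_i^2$, hence

$$\sum_{i=2}^{k} \sum_{j:\sigma_j \ge \sigma_i} \frac{F_i F_j}{F} \ge \frac{1}{2F}\left(\sum_{i=2}^{k} F_i\right)^2 = \frac{1}{2}\sum_{i=2}^{k} F_i,$$

a bound which is independent of $p$. Then we can conclude:

$$\sum_{i=2}^{k} F_i \Pr\{\xi_i = 1\} \ge \sum_{p} \left(\frac{1}{2}\sum_{i=2}^{k} F_i\right) \Pr\{p\} = \frac{1}{2}\sum_{i=2}^{k} F_i.$$

Fig. 5. $p$ elementary instances unified to form a single large instance.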
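The independence from the processing order $\sigma$ can be verified on a small example. The Python sketch below is an illustration only: the function name `weighted_success_sum` and the sample weights are hypothetical. It evaluates $\sum_i F_i \Pr\{\xi_i = 1 \mid p\}$ for every possible order and compares it with $(F^2 + \sum_i F_i^2)/(2F)$, which is at least $F/2$.

```python
from fractions import Fraction
from itertools import permutations


def weighted_success_sum(F_values, order):
    """Return sum_i F_i * Pr{xi_i = 1 | p} for a given processing order.

    F_values maps an array index i to its weight F_i; order lists the same
    indices in the order sigma in which the algorithm deals with them.
    """
    F = sum(F_values.values())
    rank = {i: r for r, i in enumerate(order)}
    total = Fraction(0)
    for i in F_values:
        # xi_i = 1 exactly when w is dealt with no earlier than i.
        later = sum((Fraction(F_values[j]) for j in F_values
                     if rank[j] >= rank[i]), Fraction(0))
        total += F_values[i] * later / F
    return total


if __name__ == "__main__":
    weights = {2: 3, 3: 5, 4: 7}  # hypothetical values of F_2, F_3, F_4
    values = {weighted_success_sum(weights, o) for o in permutations(weights)}
    print(values)  # a single value, whatever the order
```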

Plugging this into Equation (2), we obtain a lower bound on the average number of x-comparisons $E(C)$ performed by any deterministic algorithm which performs only x-comparisons, of $(1/2)\sum_{i=2}^{k} F_i + 2\sum_{i=2}^{k} 2^{-F_i} - 2(k-2)$, which is equal to $(1/2)\sum_{i=2}^{k} \log_2(2n_i+1) + 2\sum_{i=2}^{k} 1/(2n_i+1) - 2(k-2)$. This implies a lower bound of $(1/4)\sum_{i=2}^{k} \log_2(2n_i+1) + \sum_{i=2}^{k} 1/(2n_i+1) - (k-2)$ on the average number of comparisons performed by any deterministic algorithm, hence the result.
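To get a feeling for the magnitude of these expressions, the two bounds can be evaluated for a given signature. The Python sketch below is a direct transcription of the formulas as stated; the function names and the list `sizes` (holding $n_1, \ldots, n_k$) are hypothetical.

```python
import math


def x_comparison_bound(sizes):
    """Evaluate (1/2) sum log2(2n_i+1) + 2 sum 1/(2n_i+1) - 2(k-2), i = 2..k."""
    k = len(sizes)
    tail = sizes[1:]  # arrays A_2, ..., A_k
    return (0.5 * sum(math.log2(2 * n + 1) for n in tail)
            + 2 * sum(1 / (2 * n + 1) for n in tail)
            - 2 * (k - 2))


def comparison_bound(sizes):
    """Halve the x-comparison bound, as one comparison costs at most two x-comparisons."""
    return x_comparison_bound(sizes) / 2


if __name__ == "__main__":
    print(comparison_bound([1000, 10**6, 10**6, 10**6]))
```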

Lemma 4.9. For any $k \ge 2$, $0 < n_1 \le \ldots \le n_k$ and $\rho \in \{4, \ldots, 4n_1\}$, there is a distribution on instances of the intersection problem of signature at most $(k, n_1, \ldots, n_k)$, of alternation and redundancy at most $\rho$, such that any deterministic algorithm performs on average $\Omega(\rho \sum_{i=1}^{k} \log(n_i/\rho))$ comparisons.

Proof. Let us draw $p = \lfloor \rho/4 \rfloor$ pairs $(P_j, N_j)_{j \in \{1,\ldots,p\}}$ of sub-instances of signature $(k, \lfloor n_1/p \rfloor, \ldots, \lfloor n_k/p \rfloor)$ from the distribution of Lemma 4.8. As $\rho \le 4n_1$, $p \le n_1$ and $\lfloor n_1/p \rfloor > 0$, the sizes of all the arrays are positive. Let us choose uniformly at random each sub-instance $I_j$ between the sub-instance $P_j$, whose intersection is a singleton, and the sub-instance $N_j$, whose intersection is empty, and form a larger instance $I$ by unifying the arrays of same index from each sub-instance, such that the elements from two different sub-instances never interleave, as in Figure 5.

This defines a distribution on instances of alternation and redundancy at most $\rho$ (as $4p = 4\lfloor \rho/4 \rfloor \le \rho$), and of signature at most $(k, n_1, \ldots, n_k)$. Solving this instance requires solving all the $p$ sub-instances. Lemma 4.8 gives a lower bound of $(1/4)\sum_{i=2}^{k} \log(2n_i/p + 1) + \sum_{i=2}^{k} 1/(2n_i/p + 1) - k + 2$ comparisons on average for each of the $p$ sub-problems, hence a lower bound of

$$(p/4)\sum_{i=2}^{k} \log(2n_i/p + 1) + p\sum_{i=2}^{k} 1/(2n_i/p + 1) - p(k-2)$$

which is $\Omega(\rho \sum_{i=1}^{k} \log(n_i/\rho))$.
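The unification step can be sketched in Python. This is a minimal illustration under stated assumptions, not the construction used to generate the distribution: the function name `unify` is hypothetical, elements are integers, every array is non-empty, and each sub-instance is simply shifted past the previous one so that elements of distinct sub-instances never interleave.

```python
def unify(sub_instances):
    """Merge sub-instances (each a list of k sorted integer lists) into one instance."""
    k = len(sub_instances[0])
    merged = [[] for _ in range(k)]
    offset = 0  # smallest value still unused in the large instance
    for instance in sub_instances:
        low = min(array[0] for array in instance)
        high = max(array[-1] for array in instance)
        shift = offset - low
        for i, array in enumerate(instance):
            # Arrays of the same index are concatenated after shifting.
            merged[i].extend(value + shift for value in array)
        offset += high - low + 1
    return merged


if __name__ == "__main__":
    P = [[5], [1, 5, 9], [2, 5, 8]]  # intersection is the singleton {5}
    N = [[5], [1, 6, 9], [2, 5, 8]]  # intersection is empty
    print(unify([P, N]))
```

In this toy example the large instance has a single element in its intersection, contributed by the sub-instance $P$.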


Theorem 4.10 Redundancy Lower Bound [Barbay 2003]. For any $k \ge 2$, $0 < n_1 \le \ldots \le n_k$ and $\rho \in \{4, \ldots, 4n_1\}$, and for any randomized algorithm $A$ for the intersection problem, there is an instance of signature at most $(k, n_1, \ldots, n_k)$ and redundancy at most $\rho$, such that $A$ performs $\Omega(\rho \sum_{i=1}^{k} \log(n_i/\rho))$ comparisons on average on it.

Proof. The proof is identical to the proof of Theorem 3.3, as the instances generated by the proof are of alternation equal to their redundancy. This is a simple application of Lemma 4.9 and of the Yao-von Neumann principle [Neumann and Morgenstern 1944; Sion 1958; Yao 1977]:

—Lemma 4.9 gives a distribution for $\rho \in \{4, \ldots, 4n_1\}$ on instances of redundancy at most $\rho$,

—Then the Yao-von Neumann principle permits to deduce from this distribution a lower bound on the worst case complexity of randomized algorithms.

This analysis is more precise than the lower bound previously presented [Barbay and Kenyon 2002], where the additive term in $-k$ was ignored, although it makes the lower bound trivially negative for large values of the difficulty $\delta$. Here the additive

term is suppressed for min

≥ 128, and the multiplicative factor between the

lower bound and the upper bound is reduced to 16 instead of 64. This technique

can be applied to the alternation analysis of the intersection with the same result.

Note also that a multiplicative factor of 2 in the gap comes from the unbounded

searches in the algorithm, and can be reduced using a more complicated algorithm for the unbounded search [Bentley and Yao 1976].
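For illustration, an unbounded search can be implemented by doubling the step size and then performing a binary search in the last interval. The Python sketch below shows this simple variant, which uses roughly twice the logarithm of the distance in comparisons; it is not the almost optimal algorithm of Bentley and Yao. The function name `doubling_search` and its parameters are hypothetical.

```python
from bisect import bisect_left


def doubling_search(array, x, start=0):
    """Return the smallest index j >= start with array[j] >= x, or len(array).

    The number of comparisons is logarithmic in the distance j - start.
    """
    n = len(array)
    if start >= n or array[start] >= x:
        return min(start, n)
    low = start  # invariant: array[low] < x
    step = 1
    while low + step < n and array[low + step] < x:
        low += step
        step *= 2
    high = min(low + step, n)  # array[high] >= x, or high == n
    # Binary search restricted to the last interval (low, high].
    return bisect_left(array, x, low + 1, high)


if __name__ == "__main__":
    data = [1, 3, 5, 7, 9, 11]
    print(doubling_search(data, 8), doubling_search(data, 8, start=5))
```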

One could wonder how the lower bound evolves for redundancy values larger than

. The following result shows that no instance with such redundancy can exist.

Lemma 4.11. For any $k \ge 2$ and $0 < n_1 \le \ldots \le n_k$, any instance of signature $(k, n_1, \ldots, n_k)$ has redundancy $\rho$ at most $2n_1+1$.

Proof. First observe that there is always a partition-certificate of size $2n_1 + 1$. Then observe that the redundancy of any partition-certificate is by definition at most the size of the partition. Hence the result.

Note that this does not contradict the result from Lemma 4.9, which deﬁnes a

distribution of instances of redundancy at most $4n_1$

5. COMPARISONS BETWEEN THE ANALYSES

The redundancy analysis is strictly finer than the alternation analysis: some algorithms, optimal for the alternation analysis, are not optimal anymore in the redundancy analysis (Theorem 5.1), and any algorithm optimal in the redundancy analysis is optimal in the alternation analysis (Theorem 5.2). So the Rand Intersection algorithm is theoretically better than its deterministic variant in the comparison model, and the redundancy analysis permits a better analysis than the alternation analysis.

Theorem 5.1. For any $k \ge 2$, $0 < n_1 \le \ldots \le n_k$ and $\rho \in \{4, \ldots, 4n_1\}$, and for any deterministic algorithm for the intersection problem, there is an instance of signature at most $(k, n_1, \ldots, n_k)$, and redundancy at most $\rho$, such that this algorithm performs $\Omega(k\rho \sum_{i} \log(n_i/k\rho))$ comparisons on it.

Fig. 6. Element $x$ is present in half of the arrays of the sub-instance.

Fig. 7. The adversary performs several strategies in parallel, one for each sub-instance.

Proof. The proof uses the same decomposition as the proof of Theorem 4.10, but uses an adversary argument to obtain a deterministic lower bound. Build $\delta = k\rho/3$ sub-instances of signature $(k, \lfloor n_1/\delta \rfloor, \ldots, \lfloor n_k/\delta \rfloor)$, redundancy at most 3, such that $x = A_1[1]$ is present in roughly half of the other arrays, as in Figure 6.

On each sub-instance an adversary can force any deterministic algorithm to perform a search in each of the arrays containing $x$, and in a single array which does not contain $x$. Then the deterministic algorithm performs $(1/2)\sum_{i=2}^{k} \log(n_i/\delta)$ comparisons for each sub-instance. In total over all sub-instances, the adversary can force any deterministic algorithm to perform $(\delta/2)\sum_{i=2}^{k} \log(n_i/\delta)$ comparisons, i.e. $(k\rho/4)\sum_{i=2}^{k} \log(n_i/k\rho)$, which is $\Omega(k\rho \sum_{i=2}^{k} \log(n_i/k\rho))$.

As $x \log(n/x)$ is a function increasing with $x$, $k\rho \sum_{i} \log(n_i/k\rho)$ is several times larger than the lower bound $\rho \sum_{i} \log(n_i/\rho)$, hence no deterministic algorithm can be optimal in the redundancy analysis.
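The gap between the two expressions can be observed numerically. The Python sketch below evaluates both for a sample signature; it is only indicative, since it ignores the constants hidden by the $\Omega$ notation, sums over all the arrays (an assumption), and uses hypothetical function names and sample values chosen so that $k\rho$ is small compared with every $n_i$.

```python
import math


def redundancy_bound(sizes, rho):
    """Evaluate rho * sum_i log2(n_i / rho)."""
    return rho * sum(math.log2(n / rho) for n in sizes)


def deterministic_bound(sizes, rho):
    """Evaluate k*rho * sum_i log2(n_i / (k*rho))."""
    k = len(sizes)
    return k * rho * sum(math.log2(n / (k * rho)) for n in sizes)


if __name__ == "__main__":
    sizes, rho = [10**6] * 4, 8  # hypothetical signature and redundancy
    print(redundancy_bound(sizes, rho), deterministic_bound(sizes, rho))
```

On this sample, the deterministic expression is more than three times the randomized one.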

Theorem 5.2. Any algorithm optimal in the redundancy analysis is optimal in

the alternation analysis.

Proof. By definition of the redundancy $\rho$ and of the alternation $\delta$ of an instance, $\rho \le \delta$. So if an algorithm performs $O(\rho \sum_{i} \log n_i/\rho)$ comparisons, it also performs $O(\delta \sum_{i} \log n_i/\delta)$ comparisons. Hence the result, as this is the lower bound in the alternation analysis.

This proves also that the measure of difficulty of Demaine et al. [2000] is not comparable with the measure of redundancy, as it is not comparable with the measure of alternation [Barbay and Kenyon 2003, Section 2.3]. This means that the two measures are complementary, without being redundant in any way, as it was for the alternation. All those measures describe the difficulty of the instance:

—the alternation [Barbay and Kenyon 2003, Section 2.3] describes the number of key blocks of consecutive elements in the instance;

—the gap cost [Demaine et al. 2000] describes the repartition of the size of those blocks;

—the redundancy [Barbay 2003] describes the difficulty to find each block.

But only the gap cost and the redundancy matter, because the alternation analysis is reduced to the redundancy analysis.


6. PERSPECTIVES

The t-threshold set and opt-threshold set problems [Barbay and Kenyon 2003] are natural generalizations of the intersection problem, which could be useful in indexed search engines. The redundancy seems to be important in the complexity of these problems as well, but a proper measure is harder to define in this context. As similar techniques are applied to solve queries on semi-structured documents [Barbay 2004], the redundancy could be useful in this domain too, but the definition of the proper measure of difficulty is even more elusive in this context.

Demaine et al. [2001] performed experimental measurements of the performance of various deterministic algorithms for the intersection on their own data using some queries provided by Google. We performed similar measurements for the deterministic and randomized version of our algorithm, using the same queries and a larger set of data, also provided by Google. The results are quite disappointing, as the randomized version of the algorithm does not perform better than the deterministic one in terms of the number of comparisons or searches, and much worse in terms of runtime. The fact that the number of comparisons and the number of searches are roughly the same indicates that most instances of this data set either have a redundancy close to the alternation, because the elements searched are in many of the arrays, or are so easy that both algorithms perform equally well on them. The fact that the runtime is worse is probably linked to the performance of prediction heuristics in the hardware: a deterministic algorithm is easier to predict than a randomized one. It would be interesting to see if those negative results still hold for queries with more keywords and on some data sets such as those from relational databases, which can exhibit more correlation between keywords.

While we restricted our definition of the intersection problem to sets of arrays and analyzed it in the comparison model, it makes sense to consider other structures for sorted sets, especially in the context of cached or swapped memory, or succinct encodings of dictionaries. The hierarchical memory [Frigo et al. 1999] seems promising for this kind of application, and Bender et al. [2002] proposed a data structure and a cache oblivious algorithm to perform unbounded searches (implemented as finger searches). Our algorithm can easily be adapted to this model, to perform $O(\rho \sum_{i} (\log_B(n_i/\rho) + \log^*(n_i/\rho)))$ I/O transfers at the level of cache size $B$.

In most of the intersection algorithms, the interactions with each set are limited to accessing an element given its rank (select operator) and searching for the insertion rank of an element in it (rank operator): those algorithms can be used with any set implementation which provides those operators. For instance, using sorted arrays such as in this paper, the select operator takes constant time while the rank operator takes logarithmic time in the size of the set. While the results of this paper are optimal in the comparison model, they are not necessarily optimal in more general models: the computational complexity of the search operators is a trade-off with the size of the encoding of the set. For instance, consider a set of $n$ elements from a universe of size $m$: Raman et al. [2002] propose a succinct encoding of Fully Indexable Dictionaries using $\log \binom{m}{n} + o(m)$ bits to provide select and rank operators in constant time. On the other side of the time/space trade-off, Beame and Fich [2002] proposed a more compact encoding, using $O(n)$ words of $\log m$ bits to provide select and rank operators in time $O(\sqrt{\log n / \log \log n})$. Encoding


the sets using any of those schemes would tremendously improve the computational complexity of the intersection, at a small cost in space, which could result in much faster search engines.
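The separation between the intersection algorithm and the set implementation can be sketched in Python. The code below is an illustration only: the class `SortedArraySet` and the function `intersect` are hypothetical names, ranks are 0-based, and the round-robin strategy is a simple deterministic one, not the Rand Intersection algorithm. The point is that `intersect` touches the sets only through `select` and `rank`, so any other implementation of those two operators could be substituted.

```python
from bisect import bisect_left


class SortedArraySet:
    """A set stored as a sorted array: constant-time select, logarithmic rank."""

    def __init__(self, items):
        self._array = sorted(set(items))

    def __len__(self):
        return len(self._array)

    def select(self, r):
        """Return the element of rank r (0-based)."""
        return self._array[r]

    def rank(self, x):
        """Return the insertion rank of x, by binary search."""
        return bisect_left(self._array, x)


def intersect(sets):
    """Intersect the given sets using only their select and rank operators."""
    result = []
    if not sets or any(len(s) == 0 for s in sets):
        return result
    k = len(sets)
    candidate = sets[0].select(0)
    matches = 1  # number of consecutive sets known to contain the candidate
    i = 1 % k
    while True:
        current = sets[i]
        r = current.rank(candidate)
        if r == len(current):
            return result  # no element >= candidate remains in this set
        found = current.select(r)
        if found == candidate:
            matches += 1
            if matches >= k:
                result.append(candidate)
                if r + 1 == len(current):
                    return result
                candidate, matches = current.select(r + 1), 1
        else:
            candidate, matches = found, 1  # found is the new candidate
        i = (i + 1) % k


if __name__ == "__main__":
    arrays = [[1, 3, 5, 7, 9], [3, 4, 5, 9, 10], [0, 3, 9]]
    print(intersect([SortedArraySet(a) for a in arrays]))
```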

ACKNOWLEDGMENTS

This paper covers work performed in many places including the University of Paris-Sud, Orsay, France; the University of British Columbia, Vancouver, Canada; and the University of Waterloo, Waterloo, Canada. The authors wish to thank all the institutions involved, as well as Joel Friedman and the reviewers who provided many useful comments.

REFERENCES

Baeza-Yates, R. A. 2004. A fast set intersection algorithm for sorted sequences. In CPM, S. C. Sahinalp, S. Muthukrishnan, and U. Dogrusöz, Eds. Lecture Notes in Computer Science, vol. 3109. Springer, 400–408.

Barbay, J. 2003. Optimality of randomized algorithms for the intersection problem. In Proceedings of the Symposium on Stochastic Algorithms, Foundations and Applications (SAGA 2003), A. Albrecht, Ed. Lecture Notes in Computer Science, vol. 2827. Springer-Verlag, Heidelberg, 26–38. ISBN 3-540-20103-3.

Barbay, J. 2004. Index-trees for descendant tree queries in the comparison model. Tech. Rep. TR-2004-11, University of British Columbia. July.

Barbay, J. and Kenyon, C. 2002. Adaptive intersection and t-threshold problems. In Proceedings of the thirteenth ACM-SIAM Symposium On Discrete Algorithms (SODA). ACM-SIAM, ACM, 390–399.

Barbay, J. and Kenyon, C. 2003. Deterministic algorithm for the t-threshold set problem. In Lecture Notes in Computer Science, H. O. Toshihide Ibaraki, Noki Katoh, Ed. Springer-Verlag, 575–584. Proceedings of the 14th Annual International Symposium on Algorithms And Computation (ISAAC).

Beame, P. and Fich, F. E. 2002. Optimal bounds for the predecessor problem and related problems. J. Comput. Syst. Sci. 65, 1, 38–72.

Bender, M. A., Cole, R., and Raman, R. 2002. Exponential structures for efficient cache-oblivious algorithms. In Proceedings of the 29th International Colloquium on Automata, Languages and Programming. Springer-Verlag, 195–207.

Bentley, J. L. and Yao, A. C.-C. 1976. An almost optimal algorithm for unbounded searching. Information Processing Letters 5, 3, 82–87.

Chaudhuri, S. 1998. An overview of query optimization in relational systems. 34–43.

Christen, C. 1978. Improving the bound on optimal merging. In Proceedings of 19th FOCS. 259–266.

de la Vega, W. F., Frieze, A. M., and Santha, M. 1998. Average-case analysis of the merging algorithm of Hwang and Lin. Algorithmica 22, 4, 483–489.

de la Vega, W. F., Kannan, S., and Santha, M. 1993. Two probabilistic results on merging. SIAM J. Comput. 22, 2, 261–271.

Demaine, E. D., López-Ortiz, A., and Munro, J. I. 2000. Adaptive set intersections, unions, and differences. In Proceedings of the 11th ACM-SIAM Symposium on Discrete Algorithms (SODA). 743–752.

Demaine, E. D., López-Ortiz, A., and Munro, J. I. 2001. Experiments on adaptive set intersections for text retrieval systems. In Proceedings of the 3rd Workshop on Algorithm Engineering and Experiments, Lecture Notes in Computer Science. Washington DC, 5–6.

Frigo, M., Leiserson, C. E., Prokop, H., and Ramachandran, S. 1999. Cache-oblivious algorithms. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science. IEEE Computer Society, 285.


Hwang, F. K. and Lin, S. 1971. Optimal merging of 2 elements with n elements. Acta Informatica, 145–158.

Hwang, F. K. and Lin, S. 1972. A simple algorithm for merging two disjoint linearly ordered sets. SIAM Journal of Computing 1, 1, 31–39.

Manacher, G. K. 1979. Significant improvements to the Hwang-Lin merging algorithm. JACM 26, 3, 434–440.

Neumann, J. V. and Morgenstern, O. 1944. Theory of games and economic behavior. 1st ed. Princeton University Press.

Raman, R., Raman, V., and Rao, S. S. 2002. Succinct indexable dictionaries with applications to encoding k-ary trees and multisets. In SODA '02: Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 233–242.

Sion, M. 1958. On general minimax theorems. Pacific Journal of Mathematics, 171–176.

Witten, I. H., Moffat, A., and Bell, T. C. 1994. Managing Gigabytes. Van Nostrand Reinhold, 115 Fifth Avenue, New York, NY 10003.

Yao, A. C. 1977. Probabilistic computations: Toward a unified measure of complexity. In Proc. 18th IEEE Symposium on Foundations of Computer Science (FOCS). 222–227.
