Transcription of Closest Pair Problem
1 Subhash SuriUC Santa BarbaraClosest Pair Problem Givennpoints ind-dimensions, find twowhose mutual distance is smallest. Fundamental Problem in manyapplications as well as a key step in A naive algorithm takesO(dn2)time. Element uniqueness reduces to ClosestPair, so (nlogn)lower bound. We will develop a divide-and-conquerbasedO(nlogn)algorithm ; dimensiondassumed SuriUC Santa Barbara1-Dimension Problem 1D Problem can be solved inO(nlogn)viasorting. Sorting, however, does not generalize tohigher dimensions. So, let s develop adivide-and-conquer for 1D.
2 Divide the pointsSinto two setsS1, S2bysomex-coordinate so thatp < qfor allp S1andq S2. Recursively compute Closest pair(p1, p2)inS1and(q1, q2) m Let be the smallest separation found sofar: = min(|p2 p1|,|q2 q1|)Subhash SuriUC Santa Barbara1D Divide & Conquerp1p2p3q3q1q2S1S2median m The Closest pair is{p1, p2}, or{q1, q2}, orsome{p3, q3}wherep3 S1andq3 S2. Key Observation:Ifmis the dividingcoordinate, thenp3, q3must be within ofm. In 1D,p3must be the rightmost point ofS1andq3the leftmost point ofS2, butthese notions do not generalize to higherdimensions.
3 How many points ofS1can lie in theinterval(m , m]? By definition of , at most one. Sameholds SuriUC Santa Barbara1D Divide & Conquerp1p2p3q3q1q2S1S2median m Closest -Pair(S). If|S|= 1, output = .If|S|= 2, output =|p2 p1|.Otherwise, do the following (S). , 1= Closest -Pair(S1).4. 2= Closest -Pair(S2).5. 12is minimum distance across the = min( 1, 2, 12). Recurrence isT(n) = 2T(n/2) +O(n), whichsolves toT(n) =O(nlogn).Subhash SuriUC Santa Barbara2-D Closest Pair We partitionSintoS1, S2by vertical line`defined by medianx-coordinate inS. Recursively compute Closest pair distances 1and 2.)
4 Set = min( 1, 2). Now compute the Closest pair with onepoint each inS1andS2. 2 1p1p2q1q2S1S2 P1P2 In each candidate pair(p, q), wherep S1andq S2, the pointsp, qmust both liewithin of`.Subhash SuriUC Santa Barbara2-D Closest Pair At this point, complications arise, whichweren t present in 1D. It s entirelypossible that alln/2points ofS1(andS2)lie within of`. 2 1p1p2q1q2S1S2 P1P2 Naively, this would requiren2/4calculations. We show that points inP1, P2( striparound`) have a special structure, andsolve the conquer step SuriUC Santa BarbaraConquer Step Consider a pointp S1.
5 All points ofS2within distance ofpmust lie in a 2 p RR How many points can be insideRif eachpair is at least apart? In 2D, this number is at most 6! So, we only need to perform6 n/2distance comparisons! We don t have anO(nlogn)time algorithmyet. Why?Subhash SuriUC Santa BarbaraConquer Step Pairs In order to determine at most 6 potentialmates ofp, projectpand all points ofP2onto line`.P2P1 p RR Pick out points whose projection is within ofp; at most six. We can do this for allp, by walking sortedlists ofP1andP2, in totalO(n)time. The sorted lists forP1, P2can be obtainedfrom pre-sorting ofS1, S2.
6 Final recurrence isT(n) = 2T(n/2) +O(n),which solves toT(n) =O(nlogn).Subhash SuriUC Santa Barbarad-Dimensional Closest Pair Two key features of the divide andconquer strategy are step where subproblems arecombined takes place in one subproblems in the combine stepsatisfy a sparsity Condition:Any cube with sidelength2 containsO(1)points that the original Problem does notnecessarily have this HSubhash SuriUC Santa BarbaraThe Sparse Problem Givennpoints with -sparsity condition,find all pairs within distance . Divide the set intoS1, S2by a medianplaceH.
7 Recursively solve the Problem intwo halves. Project all points lying within thick slabaroundHontoH. Call this setS . S inherits the -sparsity condition. Why?. Recursively solve the Problem forS ind 1space. The algorithms satisfies the recurrenceU(n, d) = 2U(n/2, d) +U(n, d 1) +O(n).which solves toU(n, d) =O(n(logn)d 1).Subhash SuriUC Santa BarbaraGetting Sparsity Recall that divide and conquer algorithmsolves the left and right half problemsrecursively. The sparsity holds for the merge Problem ,which concerns points within thick H Set P2 IfSis a set where inter-point distance isat least , then the -cube centered atpcontains at most a constant number ofpoints ofS, depending SuriUC Santa BarbaraProof of Sparsity LetCbe the -cube centered atp.
8 LetLbe the set of points inC. Imagine placing a ball of radius /2around each point ofL. No two balls can intersect. Why? The volume of cubeCis(2 )d. The volume of each ball is1cd( /2)d, for aconstantcd. Thus, the maximum number of balls, orpoints, is at mostcd4d, which isO(1).ab /2ballballSubhash SuriUC Santa BarbaraClosest Pair Algorithm Divide the inputSintoS1, S2by themedian hyperplane normal to some axis. Recursively compute 1, 2forS1, S2. Set = min( 1, 2). LetS be the set of points that are within ofH,projected ontoH. Use the -sparsity condition to recursivelyexamine all pairs inS there are onlyO(n)pairs.
9 The recurrence for the final algorithm is:T(n, d) = 2T(n/2, d) +U(n, d 1) +O(n)= 2T(n/2, d) +O(n(logn)d 2) +O(n)=O(n(logn)d 1).Subhash SuriUC Santa BarbaraImproving the Algorithm If we could show that the Problem size inthe conquer step ism n/(logn)d 2, thenU(m, d 1) =O(m(logm)d 2) =O(n). This would give final recurrenceT(n, d) = 2T(n/2, d) +O(n) +O(n), whichsolves toO(nlogn). Theorem:Given a setSwith -sparsity,there exists a hyperplaneHnormal tosome axis such that1.|S1|,|S2| of points within ofHisO(n(logn)d 2). be found inO(n) SuriUC Santa BarbaraSparse Hyperplane We prove the theorem for 2D.
10 Show thereis a line with npoints within of it, forsome constant . For contradiction, assume no such lineexists. Partition the plane by placing verticallines at distance2 from each other, wheren/8points to the left of leftmost line, andright of rightmost line.> n/2 points2 2 k linesl linesn/8 pointsn/8 pointsn/8 pointsn/8 pointsSubhash SuriUC Santa BarbaraSparse Hyperplane If there arekslabs, we havek n 3n/4,which givesk 34 n.> n/2 points2 2 k linesl linesn/8 pointsn/8 pointsn/8 pointsn/8 points Similarly, if there is no horizontal linewith desired properties, we getl 34 n.