Example: confidence

11. Sorting and Computer Science ... - Princeton University

Computer Science SEDGEWICK/WAYNE PART II: ALGORITHMS, THEORY, AND O B E R T S E D G E W I C K KEVIN WAYNEC omputer ScienceComputerScienceAn Interdisciplinary Approach11. Sorting and SearchingSection Searching and Sorting A typical client Binary search Insertion sort Mergesort Longest repeated substringCOMPUTER Science SEDGEWICK/WAYNE PART II: ALGORITHMS, THEORY, AND typical client: Whitelist filter3 Whitelist filter Read a list of strings from a whitelist file. Read strings from StdIn and write to StdOut only those in the blacklist is a list of entities to be rejected for : Overdrawn account SpammersA whitelist is a list of entities to be accepted for : Account in good standing Friends and relativesalice@home bob@office carl@beach dave@boat whitelistbob@office carl@beach marvin@spam bob@office bob@office mallory@spam dave@boat eve@airport alice@home ..StdInbob@office carl@beach bob@office bob@office dave@boat alice@home.

A typical client: Whitelist filter 3 Whitelist filter •Read a list of strings from a whitelist file. •Read strings from StdIn and write to StdOut only those in the whitelist. A blacklist is a list of entities to be rejected for service. Examples: Overdrawn account

Tags:

  Computer, Sciences, Princeton, Computer science

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of 11. Sorting and Computer Science ... - Princeton University

1 Computer Science SEDGEWICK/WAYNE PART II: ALGORITHMS, THEORY, AND O B E R T S E D G E W I C K KEVIN WAYNEC omputer ScienceComputerScienceAn Interdisciplinary Approach11. Sorting and SearchingSection Searching and Sorting A typical client Binary search Insertion sort Mergesort Longest repeated substringCOMPUTER Science SEDGEWICK/WAYNE PART II: ALGORITHMS, THEORY, AND typical client: Whitelist filter3 Whitelist filter Read a list of strings from a whitelist file. Read strings from StdIn and write to StdOut only those in the blacklist is a list of entities to be rejected for : Overdrawn account SpammersA whitelist is a list of entities to be accepted for : Account in good standing Friends and relativesalice@home bob@office carl@beach dave@boat whitelistbob@office carl@beach marvin@spam bob@office bob@office mallory@spam dave@boat eve@airport alice@home ..StdInbob@office carl@beach bob@office bob@office dave@boat alice@home.

2 StdOutExample. Email spam filter (message contents omitted) Search client: Whitelist filterpublic class WhiteFilter { public static int search(String key, String[] a) // Search method (stay tuned). public static void main(String[] args) { In in = new In(args[0]); String[] words = (); while (! ()) { String key = (); if (search(key, words) != -1) (key); } } }4% more alice@home bob@office carl@beach dave@boat % more bob@office carl@beach marvin@spam bob@office bob@office mallory@spam dave@boat eve@airport alice@home % java WhiteFilter < bob@office carl@beach bob@office bob@office dave@boat alice@home Alice and Bob5 Hey, Alice. I think I'm going to start an Internet luck! BTW, you're going to need a whitelist too. I'm thinking about having 1 thousand customers next month and 1 million next 'm going to take a few CS courses 're hoping to grow even faster than , I know.

3 I'm going to a hackathon to knock it implementation: Sequential search (first try)public static int search(String key, String[] a) { for (int i = 0; i < ; i++) if (a[i] == key) return i; return -1; }6ia[i]0alice1bob2carlos3carol4craig5dav e6erin7eve8frank9mallory10oscar11peggy12 trent13walter14wendyoscar?Sequential search Check each array entry 0, 1, 2, 3, .. for match with search string. If match found, return index of matching string. If not, return references, not strings! @#$%$#@@%#!!Strawman implementation: Sequential searchpublic static int search(String key, String[] a) { for (int i = 0; i < ; i++) if (a[i].compareTo(key) == 0) return i; return -1; }7ia[i]0alice1bob2carlos3carol4craig5dav e6erin7eve8frank9mallory10oscar11peggy12 trent13walter14wendyoscar?Sequential search Check each array entry 0, 1, 2, 3, .. for match with search string. If match found, return index of matching string. If not, return found.

4 Return 10 Still, this was even easier than I thought!Mathematical analysis of whitelist filter using sequential search8xwnzblnuqvlnuqvczpwxczpwxdqwakidh lddobqidobqitsirvdqwakdobqiidhlddqwakdob qilnuqvxwnzbidhldbshlaxwnzbModel N strings on the whitelist. cN transactions for constant c. String length not A random search hit checks about half of the N strings on the whitelist, on average. A random search miss checks all of the N strings on the whitelist, on average. Expected order of growth of running time: representative inputs for searching and sorting9public class Generator { public static String randomString(int L, String alpha) { char[] a = new char[L]; for (int i = 0; i < L; i++) { int t = ( ()); a[i] = (t); } return new String(a); } public static void main(String[] args) { int N = (args[0]); int L = (args[1]); String alpha = args[2]; for (int i = 0; i < N; i++) (randomString(L, alpha)).}}

5 } }Generate N random strings of length L from a given alphabet% java Generator 10 3 abc bab bab bbb cac aba abb bab ccb cbc bab% java Generator 15 8 0123456789 62855405 83179069 79061047 27258805 54441080 76592141 95956542 19442316 75032539 10528640 42496398 34226197 10320073 80072566 87979201good chance of duplicatesnot much chance of duplicates% java Generator 1 60 actg tctatagggtcgtttgcgaagcctacacaaaagtagttgt tggacaacgattgacaaacaTest client for sequential search10public class TestSS { public static int search(String key, String[] a) { for (int i = 0; i < ; i++) if ( a[i].compareTo(key) == 0 ) return i; return -1; } public static void main(String[] args) { String[] words = (); int N = ; double start = () ; for (int i = 0; i < 10*N; i++) { String key = words[ (N)]; if (search(key, words) == -1) (key); } double now = () ; ( (now-start) + " seconds"); } }Print time required for 10N searches in a whitelist of length N % java Generator 10000 10 a-z | java TestSS 3 secondsgenerate 10,000 ten-letter words (lowercase)a-z = abcdefghijklmnopqrstuvwxyzprint time for 100,000 searchesrandom successful search (no output)Empirical tests of sequential search11 Whitelist filter scenario Whitelist of size N.

6 10N hypothesis that order of growth is N2 .Does NOT java Generator 10000 .. 3 seconds % java Generator 20000 .. 9 seconds % java Generator 40000 .. 35 seconds % java Generator 80000 .. 149 = 10 a-z | java TestSSNTN (seconds)TN/TN/2transactions per second10,00033,33320,00092,22240, ,14380, million38, million transactions at a rate of 34 per second and droppingHmmm. That doesn't seem too than hoursCOMPUTER Science SEDGEWICK/WAYNE PART I: PROGRAMMING IN sources 11. Sorting and Searching A typical client Binary search Insertion sort Mergesort Longest repeated substringCOMPUTER Science SEDGEWICK/WAYNE PART II: ALGORITHMS, THEORY, AND [i]0alice1bob2carlos3carol4craig5dave6er in7eve8frank9mallory10oscar11peggy12tren t13walter14wendyBinary searchpublic static int search(String key, String[] a) { for (int i = 0; i < ; i++) if ( a[i].compareTo(key) == 0 ) return i; return -1; }14 Binary search Keep the array in sorted order (stay tuned).

7 Examine the middle key. If it matches, return its index. If it is larger, search the half with lower indices. If it is smaller, search the half with upper found. Return 10 Binary search arithmetic15lohiSearch in a[lo,hi)midlohiLower half: a[lo,mid)midlohimid = lo + (hi-lo)/2midlohiUpper half: a[mid+1,hi)mid+1 Notation. a[lo,hi) means a[lo], a[lo+1] .. a[hi-1] (does not include a[hi]).Tricky! Needs search: Java implementationpublic static int search(String key, String[] a) { return search(key, a, 0, ); } public static int search(String key, String[] a, int lo, int hi) { if (hi <= lo) return -1; int mid = lo + (hi - lo) / 2; int cmp = a[mid].compareTo(key); if (cmp > 0) return search(key, a, lo, mid); else if (cmp < 0) return search(key, a, mid+1, hi); else return mid; }16himidloStill, this was easier than I thought!search("oscar") return search(.. 0, 15);Recursion trace for binary search17public static int search(String key, String[] a) { return search(key, a, 0, ); } public static int search(String key, String[] a, int lo, int hi) { if (hi <= lo) return -1; int mid = lo + (hi - lo) / 2; int cmp = a[mid].]]]]}

8 CompareTo(key); if (cmp > 0) return search(key, a, lo, mid); else if (cmp < 0) return search(key, a, mid+1, hi); else return mid; }search("oscar", a, 0, 15) mid = 7; > "eve" return search(.. 8, 15);search("oscar", a, 8, 15) mid = 11; < "peggy" return search(.. 8, 10);search("oscar", a, 8, 11) mid = 9; > "mallory" return search(.. 10, 11);search("oscar", a, 10, 11) mid = 10; == "oscar" return 10;ia[i]0alice1bob2carlos3carol4craig5da ve6erin7eve8frank9mallory10oscar11peggy1 2trent13walter14wendy10101010 Exact analysis for search miss for N = 2n 1 Note that n = lg(N+1) ~ lgN. Subarray size for 1st call is 2n 1. Subarray size for 2nd call is 2n 1 1. Subarray size for 3rd call is 2n 2 1.. Subarray size for nth call is 1. Total # compares (one per call): n ~ analysis of binary search18 Proposition. Binary search uses ~lg N compares for a search An (easy) exercise in discrete Binary search uses ~lg N compares for a random search A slightly more difficult exercise in discrete in details?

9 Take a course in search miss is a top-to-bottom path in this !Empirical tests of binary search19 Whitelist filter scenario Whitelist of size N. 10N hypothesis that order of growth is java Generator 100000 .. 1 seconds % java Generator 200000 .. 3 seconds % java Generator 400000 .. 6 seconds % java Generator 800000 .. 14 seconds % java Generator 1600000 .. 33 = 10 a-z | java TestBSa-z = abcdefghijklmnopqrstuvwxyzNTN (seconds)TN/TN/2transactions per second100,0001200,0003400,0006267,000800 , ,0001,600, , million264248,000nearly 50,000 transactions per second, and holdingGreat! But how do I get the list into sorted order at the beginning? Computer Science SEDGEWICK/WAYNE PART I: PROGRAMMING IN Sorting and Searching A typical client Binary search Insertion sort Mergesort Longest repeated substringCOMPUTER Science SEDGEWICK/WAYNE PART II: ALGORITHMS, THEORY, AND : Rearrange N items to put them in ascending orderApplications Binary search Statistics Databases Data compression Bioinformatics Computer graphics Scientific computing.

10 [Too numerous to list] 220wendy1alice2dave3walter4carlos5carol6 erin7oscar8peggy9trudy10eve11trent12bob1 3craig14frank15victor0alice1bob2carlos3c arol4craig5dave6erin7eve8frank9oscar10pe ggy11trent12trudy13victor14walter15wendy Pop quiz 0 on sortingQ. What s the most efficient way to sort 1 million 32-bit integers?23 Insertion sort algorithm240wendy1alice2dave3walter4carl os5carol6erin7oscar8peggy9trudy10eve11tr ent12bob13craig14frank15victoralicedavew alterwaltercarloswendycarolInsertion sort Move down through the array. Each item bubbles up above the larger ones above it. Everything above the current item is in order. Everything below the current item is bubble sort, but not bubble sort. We don't teach bubble sort any more because this is simpler and sort trace250wendy1alice2dave3walter4carlos5c arol6erin7oscar8peggy9trudy10eve11trent1 2bob13craig14frank15victoralicewendydave waltercarloscarolerinoscarpeggytrudyevet rentbobcraigfrankvictoralicedavewendywal tercarloscarolerinoscarpeggytrudyevetren tbobcraigfrankvictoralicedavewalterwendy carloscarolerinoscarpeggytrudyevetrentbo bcraigfrankvictoralicecarlosdavewalterwe ndycarolerinoscarpeggytrudyevetrentbobcr aigfrankvictoralicecarloscaroldavewalter wendyerinoscarpeggytrudyevetrentbobcraig frankvictoralicecarloscaroldaveerinwalte rwendyoscarpeggytrudyevetrentbobcraigfra nkvictoralicecarloscaroldaveerinoscarwal terwendypeggytrudyevetrentbobcraigfrankv ictoralicecarloscaroldaveerinoscarpeggyw alterwendytrudyevetrentbobcraigfrankvict oralicecarloscaroldaveerinoscarpeggytrud ywalterwendyevetrentbobcraigfrankvictora licecarloscaroldaveerineveoscarpeggytrud ywalterwendytrentbobcraigfrankvictoralic ecarloscaroldaveerineveoscarpeggytrenttr udywalterwendybobcraigfrankvictoralicebo bcarloscaroldaveerineveoscarpeggytrenttr udywalterwendycraigfrankvictoralicebobca rloscarolcraigdaveerineveoscarpeggytrent trudywalterwendyfrankvictoralicebobcarlo scarolcraigdaveerinevefrankoscarpeggytre nttrudywalterwendyvictoralicebobcarlosca rolcraigdaveerinevefrankoscarpeggytrentt rudyvictorwalterwendyInsertion sort: Java implementationpublic class Insertion { public static void sort(String[] a) { int N = ; for (int i = 1; i < N; i++) for (int j = i; j > 0; j--) if (a[j-1].)}}


Related search queries