Transcription of Open Data Structures
1 Open data Structures (in Java)Edition MorinContentsAcknowledgmentsixWhy This Book?xi1 The Need for Efficiency .. Interfaces .. ,Stack, andDequeInterfaces .. : Linear Sequences .. : Unordered Sets .. : Sorted Sets .. Mathematical Background .. and Logarithms .. Notation .. and Probability .. The Model of Computation .. Correctness, Time Complexity, and Space Complexity .. Code Samples .. List of data Structures .. Discussion and Exercises .. 262 Array-Based : Fast Stack Operations Using an Array .. Basics .. and Shrinking .. : An Optimized ArrayStack .. : An Array-Based Queue .. : Fast Deque Operations Using an Array .. : Building a Deque from Two Stacks .. : A Space-Efficient Array Stack .. of Growing and Shrinking .. Usage.
2 Square Roots .. Discussion and Exercises .. 593 Linked : A Singly-Linked List .. Operations .. : A Doubly-Linked List .. and Removing .. : A Space-Efficient Linked List .. Requirements .. Elements .. an Element .. an Element .. Analysis of Spreading and Gathering .. Discussion and Exercises .. 824 The Basic Structure .. : An EfficientSSet.. : An Efficient Random-AccessList.. Analysis of Skiplists .. Discussion and Exercises .. 1025 Hash : Hashing with Chaining .. Hashing .. : Linear Probing .. of Linear Probing .. Hashing .. Hash Codes .. Codes for Primitive data Types .. Codes for Compound Objects .. Codes for Arrays and Strings .. Discussion and Exercises .. 1286 Binary : A Basic Binary Tree.
3 Algorithms .. Binary Trees .. : An Unbalanced Binary Search Tree .. Discussion and Exercises .. 1477 Random Binary Search Random Binary Search Trees .. of Lemma .. : A Randomized Binary Search Tree .. Discussion and Exercises .. 168vContents8 Scapegoat : A Binary Search Tree with Partial Rebuild-ing .. of Correctness and Running-Time .. Discussion and Exercises .. 1819 Red-Black 2-4 Trees .. a Leaf .. a Leaf .. : A Simulated 2-4 Tree .. Trees and 2-4 Trees .. Red-Black Trees .. Summary .. Discussion and Exercises .. 20610 : An Implicit Binary Tree .. Summary .. : A Randomized Meldable Heap .. Analysis ofmerge(h1,h2) .. Summary .. Discussion and Exercises .. 22211 Sorting Comparison-Based Sorting .. Merge-Sort .. Quicksort.
4 Heap-sort .. A Lower-Bound for Comparison-Based Sorting .. Counting Sort and Radix Sort .. Counting Sort .. Radix-Sort .. Discussion and Exercises .. 24312 : Representing a Graph by a Matrix .. : A Graph as a Collection of Lists .. Graph Traversal .. Breadth-First Search .. Depth-First Search .. Discussion and Exercises .. 26113 data Structures for : A digital search tree .. : Searching in Doubly-Logarithmic Time .. : A Doubly-Logarithmic TimeSSet.. Discussion and Exercises .. 28014 External Memory The Block Store .. B-Trees .. Searching .. Addition .. Removal .. Amortized Analysis ofB-Trees .. Discussion and Exercises .. 304 Bibliography309 Index317viiAcknowledgmentsI am grateful to Nima Hoda, who spent a summer tirelessly proofread-ing many of the chapters in this book; to the students in the Fall 2011offering of COMP2402/2002, who put up with the first draft of this bookand spotted many typographic, grammatical, and factual errors; and toMorgan Tunzelmann at Athabasca University Press, for patiently editingseveral near-final This Book?
5 There are plenty of books that teach introductory data Structures . Someof them are very good. Most of them cost money, and the vast majorityof computer science undergraduate students will shell out at least somecash on a data Structures free data Structures books are available online. Some are verygood, but most of them are getting old. The majority of these books be-came free when their authors and/or publishers decided to stop updat-ing them. Updating these books is usually not possible, for two reasons:(1) The copyright belongs to the author and/or publisher, either of whommay not allow it. (2) Thesource codefor these books is often not avail-able. That is, the Word, WordPerfect, FrameMaker, or LATEX source forthe book is not available, and even the version of the software that han-dles this source may not be goal of this project is to free undergraduate computer science stu-dents from having to pay for an introductory data Structures book.
6 I havedecided to implement this goal by treating this book like an Open Sourcesoftware project. The LATEX source, Java source, and build scripts for thebook are available to download from the author s website1and also, moreimportantly, on a reliable source code management source code available there is released under a Creative CommonsAttribution license, meaning that anyone is free toshare: to copy, dis-tribute and transmit the work; and toremix: to adapt the work, includingthe right to make commercial use of the work. The only condition onthese rights isattribution: you must acknowledge that the derived workcontains code and/or text This Book?Anyone can contribute corrections/fixes using thegitsource-codemanagement system. Anyone can also fork the book s sources to developa separate version (for example, in another programming language). Myhope is that, by doing things this way, this book will continue to be a use-ful textbook long after my interest in the project, or my pulse, (whichevercomes first) has 1 IntroductionEvery computer science curriculum in the world includes a course on datastructures and algorithms.
7 data Structures arethatimportant; they im-prove our quality of life and even save lives on a regular basis. Manymulti-million and several multi-billion dollar companies have been builtaround data can this be? If we stop to think about it, we realize that we inter-act with data Structures constantly. Open a file: File system data Structures are used to locate the partsof that file on disk so they can be retrieved. This isn t easy; diskscontain hundreds of millions of blocks. The contents of your filecould be stored on any one of them. Look up a contact on your phone: A data structure is used to lookup a phone number in your contact list based on partial informationeven before you finish dialing/typing. This isn t easy; your phonemay contain information about a lot of people everyone you haveever contacted via phone or email and your phone doesn t have avery fast processor or a lot of memory.
8 Log in to your favourite social network: The network servers useyour login information to look up your account information. Thisisn t easy; the most popular social networks have hundreds of mil-lions of active users. Do a web search: The search engine uses data Structures to find theweb pages containing your search terms. This isn t easy; there are1 billion web pages on the Internet and each page contains alot of potential search terms. Phone emergency services (9-1-1): The emergency services networklooks up your phone number in a data structure that maps phonenumbers to addresses so that police cars, ambulances, or fire truckscan be sent there without delay. This is important; the person mak-ing the call may not be able to provide the exact address they arecalling from and a delay can mean the difference between life The Need for EfficiencyIn the next section, we look at the operations supported by the most com-monly used data Structures .
9 Anyone with a bit of programming experi-ence will see that these operations are not hard to implement can store the data in an array or a linked list and each operation canbe implemented by iterating over all the elements of the array or list andpossibly adding or removing an kind of implementation is easy, but not very efficient. Does thisreally matter? Computers are becoming faster and faster. Maybe the ob-vious implementation is good enough. Let s do some rough calculationsto find of operations:Imagine an application with a moderately-sizeddata set, say of one million (106), items. It is reasonable, in most appli-cations, to assume that the application will want to look up each itemat least once. This means we can expect to do at least one million (106)searches in this data . If each of these 106searches inspects each of the106items, this gives a total of 106 106= 1012(one thousand billion) speeds:At the time of writing, even a very fast desktop com-puter can not do more than one billion (109) operations per speeds are at most a few gigahertz (billions of cycles per second), and eachoperation typically takes a few Need for Efficiency that this application will take at least 1012/109= 1000 seconds, orroughly 16 minutes and 40 seconds.
10 Sixteen minutes is an eon in com-puter time, but a person might be willing to put up with it (if he or shewere headed out for a coffee break).Bigger data sets:Now consider a company like Google, that indexesover billion web pages. By our calculations, doing any kind of queryover this data would take at least seconds. We already know that thisisn t the case; web searches complete in much less than seconds, andthey do much more complicated queries than just asking if a particularpage is in their list of indexed pages. At the time of writing, Google re-ceives approximately 4,500 queries per second, meaning that they wouldrequire at least 4,500 = 38,250 very fast servers just to keep solution:These examples tell us that the obvious implementationsof data Structures do not scale well when the number of items,n, in thedata structure and the number of operations,m, performed on the datastructure are both large.