

A Scalable, Commodity Data Center Network Architecture

Mohammad Al-Fares (malfares@cs.ucsd.edu), Alexander Loukissas (aloukiss@cs.ucsd.edu), Amin Vahdat
Department of Computer Science and Engineering, University of California, San Diego, La Jolla, CA 92093-0404



ABSTRACT

Today's data centers may contain tens of thousands of computers with significant aggregate bandwidth requirements. The network architecture typically consists of a tree of routing and switching elements with progressively more specialized and expensive equipment moving up the network hierarchy. Unfortunately, even when deploying the highest-end IP switches/routers, resulting topologies may only support 50% of the aggregate bandwidth available at the edge of the network, while still incurring tremendous cost.

Non-uniform bandwidth among data center nodes complicates application design and limits overall system performance. In this paper, we show how to leverage largely commodity Ethernet switches to support the full aggregate bandwidth of clusters consisting of tens of thousands of elements. Similar to how clusters of commodity computers have largely replaced more specialized SMPs and MPPs, we argue that appropriately architected and interconnected commodity switches may deliver more performance at less cost than available from today's higher-end solutions. Our approach requires no modifications to the end host network interface, operating system, or applications; critically, it is fully backward compatible with Ethernet, IP, and TCP.

Categories and Subject Descriptors: [Network Architecture and Design]: Network topology; [Network Protocols]: Routing protocols

General Terms: Design, Performance, Management, Reliability

Keywords: Data center topology, equal-cost routing

1. INTRODUCTION

Growing expertise with clusters of commodity PCs has enabled a number of institutions to harness petaflops of computation power and petabytes of storage in a cost-efficient manner. Clusters consisting of tens of thousands of PCs are not unheard of in the largest institutions, and thousand-node clusters are increasingly common in universities, research labs, and companies. Important application classes include scientific computing, financial analysis, data analysis and warehousing, and large-scale network services.

Today, the principal bottleneck in large-scale clusters is often inter-node communication bandwidth. Many applications must exchange information with remote nodes to proceed with their local computation. For example, MapReduce [12] must perform significant data shuffling to transport the output of its map phase before proceeding with its reduce phase. Applications running on cluster-based file systems [18, 28, 13, 26] often require remote-node access before proceeding with their I/O operations.
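To make the shuffle traffic concrete, here is a minimal, self-contained toy sketch, not the MapReduce implementation cited above; all function names and data are illustrative. It shows how every mapper's output is partitioned by key and handed to a reducer, which in a real cluster means moving essentially all map output across the network between the two phases:

```python
from collections import defaultdict

def map_phase(documents):
    """Toy map: emit (word, 1) pairs from each input document."""
    for doc in documents:
        for word in doc.split():
            yield (word, 1)

def shuffle(pairs, num_reducers):
    """Partition map output by key: each key is hashed to one reducer.
    On a real cluster, this step ships every mapper's output across the
    network to the responsible reducer node (all-to-all traffic)."""
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[hash(key) % num_reducers][key].append(value)
    return partitions

def reduce_phase(partitions):
    """Toy reduce: sum the counts for each word within its partition."""
    counts = {}
    for partition in partitions:
        for word, values in partition.items():
            counts[word] = sum(values)
    return counts

docs = ["the quick brown fox", "the lazy dog", "the fox"]
print(reduce_phase(shuffle(map_phase(docs), num_reducers=2)))
```

With many mappers and reducers on distinct machines, this shuffle stage is exactly the kind of cluster-wide communication whose bandwidth the rest of this paper is concerned with.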

A query to a web search engine often requires parallel communication with every node in the cluster hosting the inverted index to return the most relevant results [7]. Even between logically distinct clusters, there are often significant communication requirements, e.g., when updating the inverted index for individual clusters performing search from the site responsible for building the index. Internet services increasingly employ service-oriented architectures [13], where the retrieval of a single web page can require coordination and communication with literally hundreds of individual sub-services running on remote nodes.

Finally, the significant communication requirements of parallel scientific applications are well known [27, 8].

There are two high-level choices for building the communication fabric for large-scale clusters. One option leverages specialized hardware and communication protocols, such as InfiniBand [2] or Myrinet [6]. While these solutions can scale to clusters of thousands of nodes with high bandwidth, they do not leverage commodity parts (and are hence more expensive) and are not natively compatible with TCP/IP applications. The second choice leverages commodity Ethernet switches and routers to interconnect cluster machines.

This approach supports a familiar management infrastructure along with unmodified applications, operating systems, and hardware. Unfortunately, aggregate cluster bandwidth scales poorly with cluster size, and achieving the highest levels of bandwidth incurs non-linear cost increases with cluster size.

For compatibility and cost reasons, most cluster communication systems follow the second approach. However, communication bandwidth in large clusters may become oversubscribed by a significant factor depending on the communication patterns. That is, two nodes connected to the same physical switch may be able to communicate at full bandwidth (e.g., 1 Gbps), but moving between switches, potentially across multiple levels in a hierarchy, may limit available bandwidth severely.
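As a back-of-the-envelope illustration of oversubscription (the switch and link figures below are hypothetical, not taken from the paper): if 48 hosts with 1 Gbps interfaces share an edge switch that has only four 10 Gbps uplinks, hosts communicating off-switch can sustain only a fraction of their interface bandwidth.

```python
def oversubscription(host_count, host_gbps, uplink_count, uplink_gbps):
    """Ratio of worst-case aggregate host demand at a switch to the
    bandwidth actually available on its uplinks."""
    demand = host_count * host_gbps        # every host sending off-switch
    capacity = uplink_count * uplink_gbps  # total uplink bandwidth
    return demand / capacity

# Hypothetical edge switch: 48 x 1 Gbps host ports, 4 x 10 Gbps uplinks.
ratio = oversubscription(48, 1, 4, 10)
print(f"oversubscription {ratio:.1f}:1 -> "
      f"{1000 / ratio:.0f} Mbps effective per host")  # ~833 Mbps
```

Oversubscription compounds at each level of a tree topology, which is why cross-hierarchy bandwidth in conventional designs can fall far below the edge bandwidth.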

Addressing these bottlenecks requires non-commodity solutions, e.g., large 10 Gbps switches and routers. Further, typical single-path routing along trees of interconnected switches means that overall cluster bandwidth is limited by the bandwidth available at the root of the communication hierarchy. Even as we are at a transition point where 10 Gbps technology is becoming cost-competitive, the largest 10 Gbps switches still incur significant cost and still limit overall available bandwidth for the largest clusters.

In this context, the goal of this paper is to design a data center communication architecture that meets the following goals:

- Scalable interconnection bandwidth: it should be possible for an arbitrary host in the data center to communicate with any other host in the network at the full bandwidth of its local network interface.

- Economies of scale: just as commodity personal computers became the basis for large-scale computing environments, we hope to leverage the same economies of scale to make cheap off-the-shelf Ethernet switches the basis for large-scale data center networks.

- Backward compatibility: the entire system should be backward compatible with hosts running Ethernet and IP. That is, existing data centers, which almost universally leverage commodity Ethernet and run IP, should be able to take advantage of the new interconnect architecture with no modifications.

We show that by interconnecting commodity switches in a fat-tree architecture, we can achieve the full bisection bandwidth of clusters consisting of tens of thousands of nodes.

Specifically, one instance of our architecture employs 48-port Ethernet switches capable of providing full bandwidth to up to 27,648 hosts. By leveraging strictly commodity switches, we achieve lower cost than existing solutions while simultaneously delivering more bandwidth. Our solution requires no changes to end hosts, is fully TCP/IP compatible, and imposes only moderate modifications to the forwarding functions of the switches themselves. We also expect that our approach will be the only way to deliver full bandwidth for large clusters once 10 GigE switches become commodity at the edge, given the current lack of any higher-speed Ethernet alternatives (at any cost).
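A short sketch of the capacity arithmetic behind this figure, assuming the standard k-ary fat-tree construction built from identical k-port switches: k pods, each with k/2 edge and k/2 aggregation switches, plus (k/2)^2 core switches, support k^3/4 hosts at full bisection bandwidth, so k = 48 yields 27,648 hosts.

```python
def fat_tree_capacity(k):
    """Element counts for a k-ary fat tree built from k-port switches
    (k even): k pods, each with k/2 edge and k/2 aggregation switches,
    plus (k/2)^2 core switches; each edge switch serves k/2 hosts."""
    assert k % 2 == 0, "k-ary fat trees require an even port count"
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,
        "aggregation_switches": k * half,
        "core_switches": half * half,
        "hosts": (k ** 3) // 4,  # k pods * (k/2) edge switches * (k/2) hosts
    }

print(fat_tree_capacity(48))  # hosts: 27648, the instance described above
```

Because every host's uplink capacity is matched at each layer, such a topology can in principle deliver full bisection bandwidth using only commodity k-port switches.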

