
The Google File System


The master maintains all file system metadata. This includes the namespace, access control information, the mapping from files to chunks, and the current locations of chunks. It also controls system-wide activities such as chunk lease management, garbage collection of orphaned chunks, and chunk migration between chunkservers. The master periodically communicates with each chunkserver in HeartBeat messages to give it instructions and collect its state.
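The paper does not publish the master's concrete data structures; as a rough, hypothetical sketch of the kind of in-memory state this paragraph implies (all type and field names below are invented for illustration):

```go
// Hypothetical sketch of the master's in-memory metadata, loosely based on
// the description above; GFS's actual structures are not published in this form.
package main

import "fmt"

type ChunkHandle uint64

// ChunkInfo tracks where a chunk's replicas currently live and which
// chunkserver (if any) holds the mutation lease.
type ChunkInfo struct {
	Replicas    []string // chunkserver addresses holding a replica
	LeaseHolder string   // primary replica for mutations; empty if no lease granted
}

// FileMeta maps a file to the ordered list of chunks that make up its contents.
type FileMeta struct {
	Chunks []ChunkHandle
	Owner  string // stand-in for access control information
}

// Master holds all file system metadata: the namespace, the file-to-chunk
// mapping, and the current locations of each chunk's replicas.
type Master struct {
	Namespace map[string]*FileMeta       // pathname -> file metadata
	Chunks    map[ChunkHandle]*ChunkInfo // chunk handle -> replica locations
}

func main() {
	m := &Master{
		Namespace: map[string]*FileMeta{
			"/logs/web-00001": {Chunks: []ChunkHandle{42}, Owner: "crawler"},
		},
		Chunks: map[ChunkHandle]*ChunkInfo{
			42: {Replicas: []string{"cs1:7000", "cs2:7000", "cs3:7000"}, LeaseHolder: "cs1:7000"},
		},
	}
	meta := m.Namespace["/logs/web-00001"]
	fmt.Println("chunk replicas:", m.Chunks[meta.Chunks[0]].Replicas)
}
```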


Transcription of The Google File System

The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung
Google

ABSTRACT
We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients. While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points. The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets.

The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients. In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.

Categories and Subject Descriptors: D [4]: 3—Distributed file systems
General Terms: Design, reliability, performance, measurement
Keywords: Fault tolerance, scalability, data storage, clustered storage

The authors can be reached at the following addresses: [...]

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.

To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SOSP'03, October 19-22, 2003, Bolton Landing, New York, USA.
Copyright 2003 ACM 1-58113-757-5/03/0010 ...$5.00.

1. INTRODUCTION
We have designed and implemented the Google File System (GFS) to meet the rapidly growing demands of Google's data processing needs. GFS shares many of the same goals as previous distributed file systems such as performance, scalability, reliability, and availability. However, its design has been driven by key observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system design assumptions.

We have reexamined traditional choices and explored radically different points in the design space.

First, component failures are the norm rather than the exception. The file system consists of hundreds or even thousands of storage machines built from inexpensive commodity parts and is accessed by a comparable number of client machines. The quantity and quality of the components virtually guarantee that some are not functional at any given time and some will not recover from their current failures. We have seen problems caused by application bugs, operating system bugs, human errors, and the failures of disks, memory, connectors, networking, and power supplies. Therefore, constant monitoring, error detection, fault tolerance, and automatic recovery must be integral to the system.
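GFS's client and server code are not public, but the recovery-as-normal-case attitude described above can be illustrated with a small, hypothetical sketch: a read that treats a failed replica as routine and simply moves on to the next one. The addresses, function names, and simulated failure are all assumptions for the example.

```go
// Illustrative only: a client-side read that treats replica failure as the
// normal case and falls back to another replica. GFS's real client library
// is not published; names and signatures here are hypothetical.
package main

import (
	"errors"
	"fmt"
)

// readFromReplica stands in for an RPC to one chunkserver.
func readFromReplica(addr string, offset, length int64) ([]byte, error) {
	if addr == "cs1:7000" {
		return nil, errors.New("connection refused") // simulate a failed node
	}
	return []byte("chunk data"), nil
}

// readWithFailover tries each known replica in turn instead of treating the
// first failure as fatal.
func readWithFailover(replicas []string, offset, length int64) ([]byte, error) {
	var lastErr error
	for _, addr := range replicas {
		data, err := readFromReplica(addr, offset, length)
		if err == nil {
			return data, nil
		}
		lastErr = err // record the error and move on to the next replica
	}
	return nil, fmt.Errorf("all replicas failed: %w", lastErr)
}

func main() {
	data, err := readWithFailover([]string{"cs1:7000", "cs2:7000", "cs3:7000"}, 0, 1024)
	fmt.Println(string(data), err)
}
```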

Second, files are huge by traditional standards. Multi-GB files are common. Each file typically contains many application objects such as web documents. When we are regularly working with fast growing data sets of many TBs comprising billions of objects, it is unwieldy to manage billions of approximately KB-sized files even when the file system could support it. As a result, design assumptions and parameters such as I/O operation and block sizes have to be revisited.

Third, most files are mutated by appending new data rather than overwriting existing data. Random writes within a file are practically non-existent. Once written, the files are only read, and often only sequentially. A variety of data share these characteristics. Some may constitute large repositories that data analysis programs scan through.

Some may be data streams continuously generated by running applications. Some may be archival data. Some may be intermediate results produced on one machine and processed on another, whether simultaneously or later in time. Given this access pattern on huge files, appending becomes the focus of performance optimization and atomicity guarantees, while caching data blocks in the client loses its appeal.

Fourth, co-designing the applications and the file system API benefits the overall system by increasing our flexibility. For example, we have relaxed GFS's consistency model to vastly simplify the file system without imposing an onerous burden on the applications. We have also introduced an atomic append operation so that multiple clients can append concurrently to a file without extra synchronization between them.
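The record append operation itself is specified later in the paper; purely as an illustration of why it removes client-side synchronization, here is a hypothetical sketch in which several producers append concurrently without coordinating with each other. The fakeGFS type is an in-process stand-in, not GFS's actual client library.

```go
// A sketch of how concurrent producers might use an atomic record append,
// assuming a hypothetical client call RecordAppend(path, data) that returns
// the offset at which the record landed. The point is that the producers
// need no locking among themselves; the file system serializes the appends
// and guarantees each record is written atomically.
package main

import (
	"fmt"
	"sync"
)

// fakeGFS is an in-process stand-in used only so the example runs; its
// internal mutex plays the role of the chunk primary that serializes
// concurrent appends in the real system.
type fakeGFS struct {
	mu   sync.Mutex
	file []byte
}

func (g *fakeGFS) RecordAppend(path string, data []byte) (offset int64, err error) {
	g.mu.Lock()
	defer g.mu.Unlock()
	offset = int64(len(g.file))
	g.file = append(g.file, data...)
	return offset, nil
}

func main() {
	gfs := &fakeGFS{}
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ { // five producers, e.g. one per machine
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			rec := []byte(fmt.Sprintf("record-from-producer-%d\n", id))
			off, _ := gfs.RecordAppend("/queues/results", rec)
			fmt.Printf("producer %d appended at offset %d\n", id, off)
		}(i)
	}
	wg.Wait()
}
```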

These will be discussed in more details later in the paper. Multiple GFS clusters are currently deployed for different purposes. The largest ones have over 1000 storage nodes, over 300 TB of disk storage, and are heavily accessed by hundreds of clients on distinct machines on a continuous basis.

2. DESIGN OVERVIEW

2.1 Assumptions
In designing a file system for our needs, we have been guided by assumptions that offer both challenges and opportunities. We alluded to some key observations earlier and now lay out our assumptions in more details.

- The system is built from many inexpensive commodity components that often fail. It must constantly monitor itself and detect, tolerate, and recover promptly from component failures on a routine basis.

- The system stores a modest number of large files. We expect a few million files, each typically 100 MB or larger in size. Multi-GB files are the common case and should be managed efficiently. Small files must be supported, but we need not optimize for them.

- The workloads primarily consist of two kinds of reads: large streaming reads and small random reads. In large streaming reads, individual operations typically read hundreds of KBs, more commonly 1 MB or more. Successive operations from the same client often read through a contiguous region of a file. A small random read typically reads a few KBs at some arbitrary offset. Performance-conscious applications often batch and sort their small reads to advance steadily through the file rather than go back and forth (see the sketch after this list).

- The workloads also have many large, sequential writes that append data to files. Typical operation sizes are similar to those for reads. Once written, files are seldom modified again. Small writes at arbitrary positions in a file are supported but do not have to be efficient.

- The system must efficiently implement well-defined semantics for multiple clients that concurrently append to the same file. Our files are often used as producer-consumer queues or for many-way merging. Hundreds of producers, running one per machine, will concurrently append to a file. Atomicity with minimal synchronization overhead is essential. The file may be read later, or a consumer may be reading through the file simultaneously.

- High sustained bandwidth is more important than low latency. Most of our target applications place a premium on processing data in bulk at a high rate, while few have stringent response time requirements for an individual read or write.
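The batch-and-sort pattern mentioned in the reads assumption above can be sketched as follows; the read callback is a stand-in for whatever client call an application would actually issue, not a real GFS API.

```go
// Illustrative sketch of the batching technique described in the assumptions
// above: sort pending small reads by offset so the application advances
// through the file steadily instead of seeking back and forth.
package main

import (
	"fmt"
	"sort"
)

type readReq struct {
	Offset int64
	Length int
}

// batchReads sorts the requests by offset and issues them in that order.
func batchReads(read func(off int64, n int) []byte, reqs []readReq) [][]byte {
	sort.Slice(reqs, func(i, j int) bool { return reqs[i].Offset < reqs[j].Offset })
	out := make([][]byte, 0, len(reqs))
	for _, r := range reqs {
		out = append(out, read(r.Offset, r.Length))
	}
	return out
}

func main() {
	file := []byte("abcdefghijklmnopqrstuvwxyz")
	read := func(off int64, n int) []byte { return file[off : off+int64(n)] }

	reqs := []readReq{{Offset: 20, Length: 3}, {Offset: 2, Length: 4}, {Offset: 10, Length: 2}}
	for _, chunk := range batchReads(read, reqs) {
		fmt.Println(string(chunk))
	}
}
```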

2.2 Interface
GFS provides a familiar file system interface, though it does not implement a standard API such as POSIX. Files are organized hierarchically in directories and identified by pathnames. We support the usual operations to create, delete, open, close, read, and write files.

Moreover, GFS has snapshot and record append operations. Snapshot creates a copy of a file or a directory tree at low cost. Record append allows multiple clients to append data to the same file concurrently while guaranteeing the atomicity of each individual client's append. It is useful for implementing multi-way merge results and producer-consumer queues that many clients can simultaneously append to without additional locking. We have found these types of files to be invaluable in building large distributed applications. Snapshot and record append are discussed further in Sections 3.4 and 3.3, respectively.

2.3 Architecture
A GFS cluster consists of a single master and multiple chunkservers and is accessed by multiple clients, as shown in Figure 1.
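To summarize the interface described in Section 2.2, the client-visible operations can be rendered as a hypothetical Go interface. This is an illustration of the paper's description, not Google's actual client library, and all names and signatures are assumptions.

```go
// Hypothetical rendering of the client-visible operations listed in
// Section 2.2. GFS does not expose a POSIX API; this is illustration only.
package main

import "fmt"

type Handle int64

// Client captures the usual operations plus the two GFS-specific ones:
// Snapshot (cheap copy of a file or directory tree) and RecordAppend
// (atomic concurrent appends).
type Client interface {
	Create(path string) error
	Delete(path string) error
	Open(path string) (Handle, error)
	Close(h Handle) error
	Read(h Handle, offset int64, buf []byte) (int, error)
	Write(h Handle, offset int64, data []byte) (int, error)
	Snapshot(srcPath, dstPath string) error
	RecordAppend(h Handle, data []byte) (offset int64, err error)
}

func main() {
	var c Client // a real program would bind this to a cluster connection
	fmt.Println("client bound:", c != nil)
}
```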

