Transcription of MapReduce: Simplified Data Processing on Large Clusters
MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper. Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system. Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines.
Jeffrey Dean and Sanjay Ghemawat (jeff@google.com, sanjay@google.com), Google, Inc.
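The division of labor between the map and reduce functions described in the abstract can be illustrated with a small word-count sketch. This is a single-process, in-memory approximation and not the implementation from the paper: the names `map_fn`, `reduce_fn`, and `run_mapreduce` are hypothetical, and the partitioning, scheduling, and failure handling that the run-time system provides are deliberately left out.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_fn(doc_name: str, text: str) -> Iterator[Tuple[str, int]]:
    """Map: turn one input key/value pair into intermediate (word, 1) pairs."""
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(word: str, counts: Iterable[int]) -> int:
    """Reduce: merge all intermediate values that share the same key."""
    return sum(counts)


def run_mapreduce(
    inputs: Iterable[Tuple[str, str]],
    mapper: Callable[[str, str], Iterator[Tuple[str, int]]],
    reducer: Callable[[str, Iterable[int]], int],
) -> Dict[str, int]:
    """Hypothetical sequential driver (an assumption, not the paper's run-time).

    It applies the mapper to every input pair, groups the intermediate
    values by key, then applies the reducer once per distinct key.
    """
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in inputs:
        for intermediate_key, intermediate_value in mapper(key, value):
            groups[intermediate_key].append(intermediate_value)
    return {k: reducer(k, values) for k, values in groups.items()}


# Illustrative input data, invented for this sketch.
documents = [
    ("doc1", "the quick brown fox"),
    ("doc2", "the lazy dog and the fox"),
]
word_counts = run_mapreduce(documents, map_fn, reduce_fn)
print(word_counts)
```

Because the user code is confined to two side-effect-free functions, a real run-time is free to execute the map calls on different machines and to re-run them after a failure; the sequential driver here only mimics the grouping step between the two phases.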