Example: confidence

Introduction to Multicore Programming - Computer Science

Introduction to Multicore ProgrammingMarc Moreno MazaUniversity of Western Ontario, London, Ontario (Canada)CS 4435 - CS 9624(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96241 / 60 Plan1 Multi-core ArchitectureMulti-core processorCPU CacheCPU Coherence2 Concurrency PlatformsPThreadsTBBOpen MPCilk ++Race Conditions and CilkscreenMMM in Cilk++(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96242 / 60 Multi-core ArchitecturePlan1 Multi-core ArchitectureMulti-core processorCPU CacheCPU Coherence2 Concurrency PlatformsPThreadsTBBOpen MPCilk ++Race Conditions and CilkscreenMMM in Cilk++(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96243 / 60 Multi-core ArchitectureMulti-core processor(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96244 / 60 Multi-core ArchitectureMulti-core processor(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96245 / 60 Multi-core ArchitectureMulti-core processor(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96246 / 60 Multi-core ArchitectureMulti-core $$$PPPChip Multiprocessor (CMP)(Moreno Maza)Introductio

(Moreno Maza) Introduction to Multicore Programming CS 433 - CS 9624 14 / 60 Multi-core Architecture CPU Cache Cache Performance for SPEC CPU2000 by J.F. Cantin and M.D. Hill.

Tags:

  Introduction, Programming, Computer, Multicore, Introduction to multicore programming

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Advertisement

Transcription of Introduction to Multicore Programming - Computer Science

1 Introduction to Multicore ProgrammingMarc Moreno MazaUniversity of Western Ontario, London, Ontario (Canada)CS 4435 - CS 9624(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96241 / 60 Plan1 Multi-core ArchitectureMulti-core processorCPU CacheCPU Coherence2 Concurrency PlatformsPThreadsTBBOpen MPCilk ++Race Conditions and CilkscreenMMM in Cilk++(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96242 / 60 Multi-core ArchitecturePlan1 Multi-core ArchitectureMulti-core processorCPU CacheCPU Coherence2 Concurrency PlatformsPThreadsTBBOpen MPCilk ++Race Conditions and CilkscreenMMM in Cilk++(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96243 / 60 Multi-core ArchitectureMulti-core processor(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96244 / 60 Multi-core ArchitectureMulti-core processor(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96245 / 60 Multi-core ArchitectureMulti-core processor(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96246 / 60 Multi-core ArchitectureMulti-core $$$PPPChip Multiprocessor (CMP)(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96247 / 60 Multi-core ArchitectureMulti-core processorMulti-core processorA multi-core processor is an integrated circuit to which two or moreindividual processors (called cores in this sense)

2 Have been a many-core processor the number of cores is large enough thattraditional multi-processor techniques are no longer on a multi-core device can be coupled tightly or loosely:may share or may not share a cache,implement inter-core communications methods or message on a multi-core implement the same architecture features assingle-core systems such as instruction pipeline parallelism (ILP),vector-processing, SIMD or applications do not realize yet large speedup factors:parallelizing algorithms and software is a major on-going research area.(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96248 / 60 Multi-core ArchitectureCPU CacheCPU Cache (1/7)A CPU cache is an auxiliary memory which is smaller, faster memorythan the main memory and which stores copies of of the mainmemory locations that are expectedly frequently modern desktop and server CPUs have at least threeindependent caches: the data cache, the instruction cache and thetranslation look-aside buffer.

3 (Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 96249 / 60 Multi-core ArchitectureCPU CacheCPU Cache (2/7)Each location in each memory (main or cache) hasa datum (cache line) which ranges between 8 and 512 bytes in size,while a datum requested by a CPU instruction ranges between 1 unique index (called address in the case of the main memory)In the cache, each location has also a tag (storing the address of thecorresponding cached datum).(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962410 / 60 Multi-core ArchitectureCPU CacheCPU Cache (3/7)When the CPU needs to read or write a location, it checks the cache:if it finds it there, we have a cache hitif not, we have a cache miss and (in most cases) the processor needs tocreate a new entry in the room for a new entry requires a replacement policy: the LeastRecently Used (LRU) discards the least recently used items first; thisrequires to use age bits.

4 (Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962411 / 60 Multi-core ArchitectureCPU CacheCPU Cache (4/7)Read latency (time to read a datum from the main memory) requiresto keep the CPU busy with something else:out-of-order execution: attempt to execute independent instructionsarising after the instruction that is waiting due to thecache misshyper-threading (HT): allows an alternate thread to use the CPU(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962412 / 60 Multi-core ArchitectureCPU CacheCPU Cache (5/7)Modifying data in the cache requires a write policy for updating themain memory- write-through cache: writes are immediately mirrored to mainmemory- write-back cache: the main memory is mirrored when that data isevicted from the cacheThe cache copy may become out-of-date or stale, if other processorsmodify the original entry in the main memory.

5 (Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962413 / 60 Multi-core ArchitectureCPU CacheCPU Cache (6/7)The replacement policy decides where in the cache a copy of aparticular entry of main memory will go:- fully associative: any entry in the cache can hold it- direct mapped: only one possible entry in the cache can hold it-N-way set associative:Npossible entries can hold it(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962414 / 60 Multi-core ArchitectureCPU CacheCache Performance for SPEC CPU2000by Cantin and SPEC CPU2000 suite is a collection of 26 compute-intensive, non-trivialprograms used to evaluate the performance of a Computer s CPU, memorysystem, and compilers ( ).(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962415 / 60 Multi-core ArchitectureCPU CoherenceCache Coherence (1/6)x= xx=3 PPPF igure: ProcessorP1readsx=3first from the backing store (higher-level memory)(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962416 / 60 Multi-core ArchitectureCPU CoherenceCache Coherence (2/6)x= xx=3x=3 PPPF igure: Next, ProcessorP2loadsx=3from the same memory(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962417 / 60 Multi-core ArchitectureCPU CoherenceCache Coherence (3/6)x= xx=3x=3x=3 PPPF igure.

6 ProcessorP4loadsx=3from the same memory(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962418 / 60 Multi-core ArchitectureCPU CoherenceCache Coherence (4/6)x= x=5x=3x=3x=3 PPPF igure: ProcessorP2issues a writex=5(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962419 / 60 Multi-core ArchitectureCPU CoherenceCache Coherence (5/6)x= x=5x=3x=5x=3 PPPF igure: ProcessorP2writesx=5in his local cache(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962420 / 60 Multi-core ArchitectureCPU CoherenceCache Coherence (6/6)x= xx=3x=5x=3 PPPF igure: ProcessorP1issues a readx, which is now invalid in its cache(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962421 / 60 Multi-core ArchitectureCPU CoherenceMSI ProtocolIn this cache coherence protocol each block contained inside a cachecan have one of three possible states:- M:the cache line has beenmodifiedand the corresponding data isinconsistent with the backing store; the cache has the responsibility towrite the block to the backing store when it is S:this block is unmodified and isshared, that is, exists in at least onecache.

7 The cache can evict the data without writing it to the I:this block isinvalid, and must be fetched from memory or anothercache if the block is to be stored in this coherency states are maintained through communicationbetween the caches and the backing caches have different responsibilities when blocks are read orwritten, or when they learn of other caches issuing reads or writes fora block.(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962422 / 60 Multi-core ArchitectureCPU CoherenceTrue Sharing and False SharingTrue sharing:True sharing cache misses occur whenever two processors access thesame data wordTrue sharing requires the processors involved to explicitly synchronizewith each other to ensure program computation is said to havetemporal localityif it re-uses much ofthe data it has been with high temporal locality tend to have less true sharing.

8 False sharing results when different processors use different data thathappen to be co-located on the same cache lineA computation is said to havespatial localityif it uses multiple wordsin a cache line before the line is displaced from the cacheEnhancing spatial locality often minimizes false sharingSeeData and Computation Transformations for Anderson, Amarasinghe and (Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962423 / 60 Multi-core ArchitectureCPU CoherenceMulti-core processor (cntd)Advantages:Cache coherency circuitry operate at higher rate than power consumption for a dual core vs two coupled single-coreprocessors (better quality communication signals, cache can be shared)Challenges:Adjustments to existing software (including OS) are required tomaximize performanceProduction yields down (an Intel quad-core is in fact a doubledual-core)Two processing cores sharing the same bus and memory bandwidthmay limit performancesHigh levels of false or true sharing and synchronization can easilyoverwhelm the advantage of parallelism(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962424 / 60 Concurrency PlatformsPlan1 Multi-core ArchitectureMulti-core processorCPU CacheCPU Coherence2 Concurrency PlatformsPThreadsTBBOpen MPCilk ++Race Conditions and CilkscreenMMM in Cilk++(Moreno Maza)

9 Introduction to Multicore ProgrammingCS 433 - CS 962425 / 60 Concurrency PlatformsConcurrency PlatformsProgramming directly on processor cores is painful and platformsabstract processor cores, handles synchronization, communicationprotocols(optionally) perform load balancingExamples of concurrency platforms:PthreadsThreading Building Blocks (TBB)OpenMPCilk++We use an implementation of the Fibonacci sequenceFn+2=Fn+1+Fnto compare these four concurrency platforms.(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962426 / 60 Concurrency PlatformsFibonacci Executionfib(4)fib(3)fib(2)fib(3)fib(2)f ib(1)fib(2)fib(1)fib(0)fib(1)fib(0)int fib(int n)Key idea for parallelizationThl l tiffib(1){if (n < 2) return n;else {intx = fib(n-1);The calculations of fib(n-1)and fib(n-2)can be executed simultaneously intx = fib(n-1);int y = fib(n-2);return x + y;}without mutual interference.}

10 }(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962427 / 60 Concurrency PlatformsPThreadsPThreadsPthreads is a POSIX standard for threads, communicating thoughshared defines a set of C Programming language types, functionsand is implemented with and a thread can use Pthreads to create, manipulate and particular, programmers can synchronize between threads usingmutexes, condition variables and is a Do-it-yourself concurrency platform: programmers have tomap threads onto the Computer resources (static scheduling).(Moreno Maza) Introduction to Multicore ProgrammingCS 433 - CS 962428 / 60 Concurrency PlatformsPThreadsKey PThread Functionint pthread_create(pthread_t *thread,//returned identifier for the new threadconst pthread_attr_t *attr,//object to set thread attributes (NULL for default)void *(*func)(void *),//routine executed after creationvoid *arg//a single argument passed to func) //returns error statusint pthread_join (pthread_t thread,//identifier of thread to wait forvoid **status//terminating thread s status (NULL to ignore)) //returns error status*WinAPI threads provide similar functionality.


Related search queries