
Cache Performance and Set Associative Cache


Jun 30, 2014 · Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 36
How Much Associativity?
Increased associativity decreases the miss rate, but with diminishing returns. Simulation of a system with a 64 KB D-cache, 16-word blocks, SPEC2000:
1-way: 10.3%
2-way: 8.6%
4-way: 8.3%
8-way: 8.1%


Transcription of Cache Performance and Set Associative Cache

Cache Performance and Set Associative Cache
Lecture 12, CDA 3103, 06-30-2014

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 2
Principle of Locality
Programs access a small proportion of their address space at any time.
Temporal locality: items accessed recently are likely to be accessed again soon (e.g., instructions in a loop, induction variables).
Spatial locality: items near those accessed recently are likely to be accessed soon (e.g., sequential instruction access, array data).

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 3
Memory Hierarchy Levels
Block (aka line): the unit of copying; may be multiple words.
If the accessed data is present in the upper level, it is a hit: the access is satisfied by the upper level. Hit ratio = hits / accesses.
If the accessed data is absent, it is a miss: the block is copied from the lower level, and the accessed data is then supplied from the upper level. The time taken is the miss penalty. Miss ratio = misses / accesses = 1 - hit ratio.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 4
Memory Technology
Static RAM (SRAM): $2000-$5000 per GB.
Dynamic RAM (DRAM): 50ns-70ns, $20-$75 per GB.
Magnetic disk: 5ms-20ms, up to $2 per GB.
Ideal memory: the access time of SRAM with the capacity and cost/GB of disk.

Chapter 6 — Storage and Other I/O Topics — 5
Disk Storage
Nonvolatile, rotating magnetic storage.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 6
Address Subdivision

How many bits are in a cache? Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 7
Total bits = 2^n x (block size + tag size + valid field size), where the cache has 2^n blocks and the block size is 2^m words (2^(m+2) bytes).
The tag field is 32 - (n + m + 2) bits.
Therefore the total is 2^n x (2^m x 32 + 32 - (n + m + 2) + 1) = 2^n x (2^m x 32 + 31 - n - m).
Question?
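As a sanity check, the formula can be evaluated directly from the field sizes; this is a small sketch, and the function name `cache_total_bits` is mine rather than from the slides:

```python
def cache_total_bits(n: int, m: int, addr_bits: int = 32) -> int:
    """Total bits in a direct-mapped cache with 2^n blocks of 2^m words.

    Per block: 2^m words of data (32 bits each), a tag of
    addr_bits - (n + m + 2) bits, and one valid bit.
    """
    tag_bits = addr_bits - (n + m + 2)
    bits_per_block = (2 ** m) * 32 + tag_bits + 1
    return (2 ** n) * bits_per_block

# 16 KiB of data in 4-word blocks -> n = 10, m = 2 (the worked example below).
total = cache_total_bits(n=10, m=2)
print(total)          # 1024 x 147 = 150528 bits
print(total // 1024)  # 147 Kibibits
```

The closed form 2^n x (2^m x 32 + 31 - n - m) gives the same result, since 32 - (n + m + 2) + 1 = 31 - n - m.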

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 8
Question: how many total bits are required for a direct-mapped cache with 16 KiB of data and 4-word blocks, assuming a 32-bit address?
2^n x (2^m x 32 + 31 - n - m)

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 9
Answer: 16 KiB = 4096 words (2^12 words). With a block size of 4 words (2^2), there are 1024 (2^10) blocks. Each block has 4 x 32 = 128 bits of data, plus a tag of 32 - 10 - 2 - 2 bits, plus a valid bit. Thus the total cache size is 2^10 x (4 x 32 + (32 - 10 - 2 - 2) + 1) = 2^10 x 147 = 147 Kibibits.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 10
Example: Larger Block Size
64 blocks, 16 bytes/block. To what block number does address 1200 map?

Block address = 1200 / 16 = 75. Block number = 75 modulo 64 = 11.
Address fields: offset = bits 0-3 (4 bits), index = bits 4-9 (6 bits), tag = bits 10-31 (22 bits).

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 11
Block Size Considerations
Larger blocks should reduce the miss rate, due to spatial locality. But in a fixed-size cache, larger blocks mean fewer of them, so more competition and an increased miss rate; larger blocks can also mean pollution. Larger blocks carry a larger miss penalty, which can override the benefit of the reduced miss rate. Early restart and critical-word-first can help.

Block Size Tradeoff (Dr. Dan Garcia)
Benefits of a larger block size:
Spatial locality: if we access a given word, we're likely to access other nearby words soon.
Very applicable with the stored-program concept: if we execute a given instruction, it's likely that we'll execute the next few as well. Works nicely in sequential array accesses too.
Drawbacks of a larger block size:
A larger block size means a larger miss penalty: on a miss, it takes longer to load a new block from the next level.
If the block size is too big relative to the cache size, then there are too few blocks. Result: the miss rate goes up.
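The address-1200 mapping can be reproduced with a short sketch; `split_address` is a hypothetical helper assuming the slide's geometry of 64 blocks of 16 bytes:

```python
def split_address(addr: int, block_bytes: int = 16, num_blocks: int = 64):
    """Split a byte address into (tag, index, offset) for a direct-mapped
    cache with 64 blocks of 16 bytes each."""
    offset = addr % block_bytes           # byte within the block (4 bits)
    block_address = addr // block_bytes   # which block of memory
    index = block_address % num_blocks    # cache row (6 bits)
    tag = block_address // num_blocks     # remaining high bits (22 bits)
    return tag, index, offset

tag, index, offset = split_address(1200)
print(1200 // 16)  # block address: 75
print(index)       # block number: 75 mod 64 = 11
```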

Extreme Example: One Big Block (Dr. Dan Garcia)
Cache size = 4 bytes, block size = 4 bytes: only ONE entry (row) in the cache! The cache line holds a valid bit, a tag, and data bytes B3 B2 B1 B0.
If an item is accessed, it is likely to be accessed again soon, but it is unlikely to be accessed again immediately! The next access will therefore likely be a miss again: we continually load data into the cache but discard it (force it out) before using it again. A nightmare for the cache designer: the Ping-Pong Effect.

Block Size Tradeoff Conclusions (Dr. Dan Garcia)
Miss penalty grows with block size: a bigger block takes longer to fetch.
Miss rate first falls with block size (exploiting spatial locality), then rises once there are too few blocks (compromising temporal locality).
Average access time therefore has a sweet spot: the increased miss penalty and miss rate push it up at large block sizes.
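The ping-pong effect is easy to see in a toy simulation (a sketch; the trace and function name are mine). With a single-entry cache, alternating between two blocks misses on every access:

```python
def one_block_cache_misses(block_ids):
    """Count misses for a cache whose size equals its block size:
    a single entry, so every access to a different block evicts the
    resident one."""
    resident = None          # the block currently in the one cache entry
    misses = 0
    for b in block_ids:
        if b != resident:    # not resident -> miss; load it (evicting)
            misses += 1
            resident = b
    return misses

# Alternating between two blocks: every access misses (ping-pong).
print(one_block_cache_misses([0, 1, 0, 1, 0, 1]))  # 6 misses in 6 accesses

# Re-touching the same block: only the first access misses.
print(one_block_cache_misses([0, 0, 0, 0]))        # 1
```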

What to do on a write hit? (Dr. Dan Garcia)
Write-through: update the word in the cache block and the corresponding word in memory.
Write-back: update the word in the cache block and allow the memory word to become stale; add a dirty bit to each block, indicating that memory needs to be updated when the block is replaced.
Performance trade-offs?

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 16
Write-Through
On a data-write hit, we could just update the block in the cache, but then the cache and memory would be inconsistent. Write-through also updates memory, but this makes writes take longer. For example, if the base CPI is 1, 10% of instructions are stores, and a write to memory takes 100 cycles, then the effective CPI = 1 + 0.1 x 100 = 11.
Solution: a write buffer. It holds data waiting to be written to memory, so the CPU continues immediately and stalls on a write only if the write buffer is already full.
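The write-through CPI arithmetic can be checked directly; this is a minimal sketch, and the function name is mine:

```python
def effective_cpi(base_cpi: float, store_fraction: float,
                  write_cycles: int) -> float:
    """Effective CPI when every store stalls for a full memory write
    (write-through with no write buffer)."""
    return base_cpi + store_fraction * write_cycles

# Base CPI 1, 10% of instructions are stores, 100-cycle memory writes.
print(effective_cpi(1.0, 0.10, 100))  # 11.0
```

Note that the stall cost is weighted by the store fraction; charging the full 100 cycles to every instruction would be wrong.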

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 17
Write-Back
Alternative: on a data-write hit, just update the block in the cache and keep track of whether each block is dirty. When a dirty block is replaced, write it back to memory. A write buffer can be used so that the replacing block can be read first.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 18
Write Allocation
What should happen on a write miss? Alternatives for write-through: allocate on miss (fetch the block), or write around (don't fetch the block, since programs often write a whole block before reading it, e.g., on initialization). For write-back, we usually fetch the block.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 19
Example: Intrinsity FastMATH
Embedded MIPS processor with a 12-stage pipeline; instruction and data access on each cycle. Split cache: separate I-cache and D-cache, each 16 KB: 256 blocks of 16 words/block. The D-cache can be write-through or write-back. SPEC2000 miss rates were measured for the I-cache and the D-cache.
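The dirty-bit bookkeeping can be sketched as a toy direct-mapped write-back cache; the class name and the 4-line geometry are mine, but the `dirty` flag plays the same role as the dirty bit on the slide:

```python
class WriteBackCache:
    """Toy direct-mapped write-back cache: a dirty line is written back
    to memory only when it is replaced."""

    def __init__(self, num_lines: int = 4):
        self.lines = [None] * num_lines   # each line: [tag, dirty] or None
        self.writebacks = 0               # blocks written back to memory

    def write(self, block_number: int) -> None:
        index = block_number % len(self.lines)
        tag = block_number // len(self.lines)
        line = self.lines[index]
        if line is not None and line[0] != tag and line[1]:
            self.writebacks += 1          # evicting a dirty block
        self.lines[index] = [tag, True]   # install block, mark it dirty

cache = WriteBackCache()
cache.write(0)           # miss: line 0 now holds block 0, dirty
cache.write(0)           # hit: no memory traffic at all
cache.write(4)           # maps to line 0, evicts dirty block 0
print(cache.writebacks)  # 1
```

Contrast with write-through, where all three writes would have gone to memory.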

Weighted average of the I-cache and D-cache miss rates.

Chapter 5 — Large and Fast: Exploiting Memory Hierarchy — 20
Example: Intrinsity FastMATH (continued)

Types of Cache Misses (1/2) (Dr. Dan Garcia)
The "Three Cs" model of misses.
1st C: compulsory misses. These occur when a program is first started: the cache does not yet contain any of that program's data, so misses are bound to occur. They can't be avoided easily, so we won't focus on them in this course. Pandora uses cache warm-up. When should cache performance be measured?

Types of Cache Misses (2/2) (Dr. Dan Garcia)
2nd C: conflict misses. A conflict miss occurs because two distinct memory addresses map to the same cache location: two blocks that happen to map to the same location can keep overwriting each other. This is a big problem in direct-mapped caches. How do we lessen the effect of these?

Dealing with Conflict Misses (Dr. Dan Garcia)
Solution 1: make the cache bigger. This fails at some point.
Solution 2: let multiple distinct blocks fit in the same cache index.

Fully Associative Cache (1/3) (Dr. Dan Garcia)
Memory address fields: tag (same as before), offset (same as before), index (non-existent). What does this mean? There are no "rows": any block can go anywhere in the cache, so we must compare against all tags in the entire cache to see if the data is there.

Fully Associative Cache (2/3) (Dr. Dan Garcia)
A fully associative cache (e.g., with 32 B blocks) compares tags in parallel: each entry holds a valid bit, a 27-bit cache tag, and data bytes B0 through B31, with the byte offset selecting within the block.

Fully Associative Cache (3/3) (Dr. Dan Garcia)
Benefit of a fully associative cache: no conflict misses (since data can go anywhere).
Drawback: we need a hardware comparator for every single entry. If we have 64 KB of data in the cache with 4 B entries, we need 16 K comparators: infeasible.
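A fully associative lookup can be sketched as a scan over every entry; hardware compares all the tags in parallel, but the logic is the same (names here are mine):

```python
def fully_associative_lookup(entries, tag):
    """Look up a tag in a fully associative cache: no index field, so
    the tag is compared against every valid entry."""
    for valid, entry_tag, data in entries:
        if valid and entry_tag == tag:
            return data        # hit
    return None                # miss: no entry matched

entries = [
    (True, 0x1A, "block A"),
    (True, 0x2B, "block B"),
    (False, 0x00, None),       # invalid entry: never matched
]
print(fully_associative_lookup(entries, 0x2B))  # block B (hit)
print(fully_associative_lookup(entries, 0x3C))  # None (miss)
```

The scan over all entries is exactly why the slide calls one comparator per entry infeasible at scale: the hardware cost grows with the number of entries.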

Final Type of Cache Miss (Dr. Dan Garcia)
3rd C: capacity misses. A capacity miss occurs because the cache has a limited size: it is a miss that would not occur if we increased the size of the cache. This is a sketchy definition, so just get the general idea.

N-Way Set Associative Cache (1/3) (Dr. Dan Garcia)
Memory address fields: tag (same as before), offset (same as before), index (points us to the correct "row", called a set in this case). So what's the difference? Each set contains multiple blocks; once we've found the correct set, we must compare against all tags in that set to find our data. Is temporal or spatial locality exploited here?

Associative Cache Example (Dr. Dan Garcia)
Here's a simple 2-way set-associative cache: memory addresses 0 through F map to cache index 0 or 1, so each of the two sets holds two blocks.
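A set-associative lookup combines the two ideas above: the index picks a set, then tags are compared only within that set. This is a sketch with hypothetical names, using two sets of two ways:

```python
def set_assoc_lookup(sets, block_number):
    """Look up a block in a set-associative cache: the index selects a
    set, then the tag is compared against every block in that set."""
    index = block_number % len(sets)       # which set
    tag = block_number // len(sets)        # remaining high bits
    for entry_tag, data in sets[index]:    # compare tags within the set
        if entry_tag == tag:
            return data                    # hit
    return None                            # miss

# Two sets, two ways each: blocks 0 and 2 both map to set 0 yet coexist,
# avoiding the conflict miss a direct-mapped cache would suffer.
sets = [
    [(0, "block 0"), (1, "block 2")],      # set 0
    [(0, "block 1")],                      # set 1
]
print(set_assoc_lookup(sets, 0))   # block 0
print(set_assoc_lookup(sets, 2))   # block 2 (no conflict with block 0)
print(set_assoc_lookup(sets, 4))   # None (miss)
```

With one set this degenerates to a fully associative cache; with one way per set it degenerates to a direct-mapped cache.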

