Example: marketing

CACHE MISSING FOR FUN AND PROFIT - …

| putsimply, theshar-ingof theexecutionresourcesof a superscalarprocessorbetweenmultipleexecu tionthreads| hasrecentlybecomewidespreadviaitsintrodu ction(underthename\Hyper-Threading")into IntelPentium4 thisimplementation,forreasonsof ef- ciencyandeconomy of processorarea,thesharingof processorresourcesbetweenthreadsextendsb eyondtheexecutionunits;ofparticularconce rnis thatthethreadsshareaccessto demonstratethatthissharedaccessto memorycachespro-videsnotonlyaneasilyused highbandwidthcovertchannelbe-tweenthread s,butalsopermitsa maliciousthread(operating,intheory, withlimitedprivileges)to monitortheexecutionof anotherthread,allowingin many casesfortheftof , we providesomesuggestionsto processordesigners,op-eratingsystemvendo rs,andtheauthorsof cryptographicsoftware,of how thisattack couldbe mitigatedor improved,provid-ingnotonlyfastertransist orsbutsmallertransistors,processordesign -ershave beenm

CACHE MISSING FOR FUN AND PROFIT 3 If the two processes do not share any reference le, this approach will not work, but instead an opposite approach may be …

Tags:

  Missing, Cache, Cache missing

Information

Domain:

Source:

Link to this page:

Please notify us if you found a problem with this document:

Other abuse

Transcription of CACHE MISSING FOR FUN AND PROFIT - …

1 | putsimply, theshar-ingof theexecutionresourcesof a superscalarprocessorbetweenmultipleexecu tionthreads| hasrecentlybecomewidespreadviaitsintrodu ction(underthename\Hyper-Threading")into IntelPentium4 thisimplementation,forreasonsof ef- ciencyandeconomy of processorarea,thesharingof processorresourcesbetweenthreadsextendsb eyondtheexecutionunits;ofparticularconce rnis thatthethreadsshareaccessto demonstratethatthissharedaccessto memorycachespro-videsnotonlyaneasilyused highbandwidthcovertchannelbe-tweenthread s,butalsopermitsa maliciousthread(operating,intheory, withlimitedprivileges)to monitortheexecutionof anotherthread,allowingin many casesfortheftof , we providesomesuggestionsto processordesigners,op-eratingsystemvendo rs,andtheauthorsof cryptographicsoftware,of how thisattack couldbe mitigatedor improved,provid-ingnotonlyfastertransist orsbutsmallertransistors,processordesign -ershave beenmetwithtwo ,memorylatencieshave increaseddramaticallyin relative terms.

2 Andsecond,whileit iseasyto spendextratransistorsonbuildingadditiona lexecutionunits,many programshave fairlylimitedinstruction-level parallelism,whichlimitstheextent to which additionalexecutionresourcescanbe partialsolutiontothe rstproblem,whileout-of-orderexecutionpro videsa partialsolutionto 1995,simultaneousmultithreadingwas revived1in orderto com-batthesetwo di culties[12].Whereout-of-orderexecutional lowsinstructionsto be reordered(subjectto maintainingarchitecturalse-mantics)withi na narrow window of perhapsa hundredinstructions,Keywords ,simultaneousmultithreading, least1974in theory[10], evenif it hadnotyet beenshownto be be reorderedacrossthreads.

3 Thatis, ratherthanhavingtheoperatingsystemperfor mcon-textswitchesbetweentwo threads,it canscheduleboththreadssimul-taneouslyont hesameprocessor,andinstructionswillbe interleaved,dramaticallyincreasingtheuti lizationof withHyper-Threadingprocessor,withwhich theremainderof thispaper is concerned2, thetwo threadsbeingexecutedoneach processorsharemorethanmerelytheexecu-tio nunits;of particularconcernto us,theyshareaccessto thememorycaches[8]. Cacheshave alreadybeendemonstratedto be cryptograph-icallydangerous:Many implementationsof AES[9] aresubjectto tim-ingattacks arisingfromthenon-constancyof S-box lookuptimings[1].

4 However,havingcachessharedbetweenthreads providesa vastlymoredangerousavenueof communicationvia pagingTo seehow sharedcachescancreatea cryptographicside-channel,we rststepback fora moment to a simplerproblem| covertchannels[7]|andoneof theclassicexamplesof such a processes,knownas theTrojanprocessandtheSpyprocess,operati ngatdi erent privilegelevelsona multilevel securesystem,butbothwithaccessto somelargereference le(naturally, ona multilevel securesystemthisaccesswouldnecessarilybe read-only).TheTrojanprocessnow readsa subsetof pagesin thisreference le,resultingin pagefaultswhich loadtheselectedpagesfromdiskintomemory.

5 Oncethisis complete(oreven in themiddleof thisoperation)theSpy processreadseverypageof thereference leandmeasuresthetimetakenforeach beenpreviouslyreadby theTrojanprocesswillcompleteveryquickly, whilethosepageswhich have notalreadybeenreadwillincurthe(easilymea surable)costof a thismanner,theTrojanprocesscanrepeatedly communicateonebitof informationto theSpyprocessin thetimeit takes fora pagetobe loadedfromdiskintomemory, upto a totalnumber of bitsequalto thesize(inpages)of thesharedreference withHyper-Threadingprocessorforreasonsof availability, butexpectthattheresultsin thispaper FUNANDPROFIT3If thetwo processesdonotshareany reference le,thisapproachwillnotwork,butinsteadano ppositeapproach may be taken:Insteadof faultingpagesintomemory, theTrojanprocesscanfaultpagesoutof memory.

6 AssumethattheTrojanandSpy processeseach have anaddressspaceof morethanhalfof theavailablesystemmemoryandtheoperatings ystemusesa transmita \one"bit,theTrojanprocessreadsitsentirea ddressspace;to transmita \zero"bit,theTrojanprocessspinsforthesam eamount of timewhileonlyaccessinga singlepageof memory. TheSpyprocessnow repeatedlymeasurestheamount of timeneededto theTrojanprocesswassendinga \one"bit,thentheoperatingsystemwillhave evictedpagesownedby theSpy processfrommemory, andthenecessarydiskactivity whenthosepagesareaccessedwillprovideanea silymeasurabletimedi bandwidththanthepreviouschannel| it operatesat a fractionof a bitper second,comparedto afewhundredbitsper second| it demonstrateshow a sharedcache canbe usedas a covertchannel,evenif thetwo communicatingprocessesdonothave sharedaccessto any CACHE missingTheL1datacachein thePentium4 consistsof 128cachelinesof64byteseach.

7 Organizedinto 324-way associative executionthreads;as such, each ofthe32cachesetsbehaves in thesamemanneras thepagingsystemdiscussedin theprevioussection:Thethreadscannotcommu nicatebyloadingdataintothecache,sincenod atais sharedbetweenthetwothreads3, buttheycancommunicateviaa timingchannelby forcingeach other'sdataoutof covertchannelcanthereforebe constructedas follows:TheTrojanprocessallocatesanarray of 2048bytes,andforeach 32-bitworditwishesto transmit,it accessesbyte64iof thearray i bitiof processallocatesanarray of 8192bytes,andrepeatedlymeasurestheamount of timeneededtoreadbytes64i, 64i+ 2048,64i+ 4096,and64i+ 6144foreach 0 i < memoryaccessperformedby theTrojanwillevicta cachelineownedby theSpy,3 Bydefault, CACHE linesaretaggedaccordingto which thread\owns"themandcannotbe accessedby theotherthread.

8 Thisbehaviourmay be modi edby theoperatingsystem,butonlyto theextent of allowingcachelinesharingbetweenthreadswi ththesamepagingtables,andsuch ecx,start_of_buffersub length_of_buffer,0x2000rdtscmov esi,eaxxor edi,ediloop:prefetcht2[ecx+ edi + 0x2800]add cx, [ecx+ edi+ 0x0000]imulecx,1add cx, [ecx+ edi+ 0x0800]imulecx,1add cx, [ecx+ edi+ 0x1000]imulecx,1add cx, [ecx+ edi+ 0x1800]imulecx,1rdtscsub eax,esimov [ecx+ edi],axadd esi,eaximulecx,1add edi,0x40testedi,0x7C0jnz loopsub edi,0x7 FEtestedi,0x3 Ejnz loopadd edi,0x7C0sub length_of_buffer,0x800jge Spy FUNANDPROFIT5resultingin linesbeingreloadedfromtheL2cache4, which addsanadditionallatencyof approximately30cyclesif measurable,thankstothelonglatencyof theRDTSC(readtimestampcounter)

9 Instruction,butthisproblemis resolved by addingsomehigh-latencyinstructions{ forexample,integermultiplications{ into Figure1we show anexampleof how theSpy processcouldmeasureandrecordtheamount of timerequiredto accessallthecachelinesof each ,32bitscanbe reliablytransmittedfromtheTro-janto theSpy in roughly5000cycleswitha biterrorrateof under25%;usinganappropriateerrorcorrecti ngcode,thisprovidesa covertchannelof 400kilobytesper secondona CACHE missingThesamegeneralapproach is e ective in respectof theL2cache,witha L2cache (onthepar-ticularmodelwhich we areexamining)consistsof 4096cachelinesof128byteseach, organizedinto 5128-way associative ,thedataTLBholdsonly64 entries| onlyenoughto provideaddressmappingsforhalfof thecacheddata5.}}

10 Asa result,a Spy processop-eratingin thesamemanneras describedin theprevioussectionwillincurthecostof TLBmissesonat leastsomeof avoidallowingthisto addnoiseto themeasurements,we canresortto ensuringthateverymemoryaccessincurstheco stof a TLBmiss,by accessingeach of the128pages(512kBdividedby 4 kBper page)beforereturningto the rstpageandaccessingthesecondcache lineitcontains.(Anotheroptionwouldusea bu erof 16MB,placingeachpotentiallycachedlineint o a separatepage,butaccessingthelinesina suitableorderis justase ective.)SincetheTLBentrieshave tobe repeatedlyreloaded,however,we alsoexperiencesomeadditionalcachemisses, as thememoryholdingthepagingtableswillbe repeat-edlyreloadedinto , thiswillonlya ecta smallnumber of cachelines,leavingthevastmajority of thecachein introducedby thedesignof thePentium4as a \AdvancedTransferCache"includesacapabili ty forhardwareprefetching:If a seriesof cachemissesoccur,in arithmeticprogression,withina singlepage,thenthecachewill4In fact,thanksto thepseudo-LRU algorithmforcachelinereplacement, onememoryaccessby theTrojanprocesscausesfourcachemissesby MBpagesareused.


Related search queries