Transcription of Understanding The Linux Virtual Memory Manager
Understanding The Linux Virtual Memory Manager
Mel Gorman
July 9, 2007

Preface

Linux is develop[...], it is common to request co[...] (VM) [...], Linux does follow the traditional development cycle of design to implementation but it is more common for changes to be made in reaction to how the system behaved in the "real-world" and intuitive decisions by develop[...] performs well in practice but there is very little VM specific documentation available except for a few incomplete overviews in a small number of websites, except the website containing an earlier draft of this book of course! This has led to the situation where the VM is fully understood only by a small number of core develop[...]ers lo[...]de and study the [...]ook, [...]duction of what to exp[...], the theory it is based on will also be intro[...]e a memory management theory bo[...], the appendix includes a detailed code commentary on a significant p[...]er or researcher needs to invest in understanding what is happ[...]de patterns even between ma[...], [...]e decipherable in a numb[...], a core kernel subsystem, works will find answers to many of their questions in this bo[...], more than any other subsystem, affects the overall performance of the op[...] poorly understood and badly documented subsystem in Linux, partially because there is, quite literally, [...]cult to isolate and understand individual parts of the code without first having a strong conceptual model of the whole VM.
So this book intends to give a detailed description of what to expect without [...] be of prime interest to new develop[...]enefit other subsystem developers who want to get the most from the VM when they interact with it and operating systems researchers looking for details on how memory management is implemented in a modern op[...], who are just curious to learn more about a subsystem that is the focus of so much discussion, they will find an easy to read description of the VM functionality that covers all the details without the need to plough through source co[...], it is assumed that the reader has read at least one general operating system book or one general Linux kernel orientated book and has a general knowledge of C before tackling this bo[...] effort is made to make the material approachable, some prior knowledge of general op[...]

Book Overview

In chapter 1, we go into detail on how the source code may b[...]ols will be introduced that are used for the analysis, easy browsing and management of co[...]ols are the Linux Cross Referencing (LXR) tool which allows source code to be browsed as a web page and CodeViz for generating call graphs which was developed while researching this bo[...]ol, [...]e time consuming and the use of version control software such as CVS ([...]) or BitKeeper ([...])
[...]ol, a simple specification file determines what source to use, what patches to apply and what kernel con[...], each part of the Linux VM implementation will be discussed in detail, such as how memory is described in an architecture independent manner, how processes manage their memory, how the specific allo[...]ers that describe closest the behaviour of Linux as well as covering in depth the implementation, the functions used and their call graphs so the reader will have a clear view of how the co[...], there will be a "What's New" section which introduces what to exp[...]endices are a code commentary of a significant p[...]e reasonably consistent, even between ma[...] "any month now" which means Decemb[...], in most ways, [...]e pity to ignore them so to address this, hence the "What's [...]", these sections presume you have read the rest of the book so only glance at them during the [...]de, the basic description of what to expect from the "What's New" [...]cantly b[...]ject to change though, you should still treat the "What's New" sections as guidelines rather than de[...]ook which is intended to b[...];

root@joshua:/$ mount /dev/cdrom /cdrom -o [...]

([...]) has been built and configured to run but it requires the CD be mounted on /cdrom/.
To start it, run the script /[...], the output should look like:

mel@joshua:~$ /cdrom/start_server
Starting CodeViz Server: done
Starting Apache Server: done
The URL to access is http://localhost:10080/

If the server starts successfully, point your browser to http://[...]:[...]

A web server started is available which is started by /[...], the URL to access is http://[...]ody;
The whole book is included in HTML, PDF and plain text formats from /[...]es not have a commentary, the browser will be automatically redirected to LXR;
Generate call graphs with an online version of the CodeViz tool.

The VMRegress, CodeViz and patchset packages which are discussed in Chapter 1 are available in /[...], run the script /cdrom/stop_server and the CD may then b[...]

Typographic Conventions

The conventions used in this do[...], [...] field names, compile time de[...]out a field in a structure, both the structure and field name will be included like page[...] files have angle brackets around them like <[...]> and may b[...]

[...] book was researched and developed in the open and it would be remiss of me not to mention some of the people who help[...], I ap[...], I would like to thank John O'Gorman who tragically passed away while the material for this book was b[...]erience and guidance that largely inspired the format and quality of this bo[...], [...]ortunity to publish this bo[...]eing a rewarding experience and it made trawling through all the co[...], on the publishers front, I would like to thank Bruce Perens for allowing me to publish under the Bruce Perens' Open Book Series ([...]).
With the technical research, a number of p[...], [...] offered to help me if I felt I ever got lost in the twisty maze of kernel co[...]er of systems from non-contiguous memory allocation, to page replacement p[...], the chief behind Kernel Traffic, [...], as part of the Equinox Project, [...], Patrick Healy was crucial to ensuring that this book was consistent and approachable to people who are familiar, but not experts, [...]er of people helped with smaller technical issues and general inconsistencies where material was not covered in su[...], Parag Sharma, Matthew Dobson, Roger Luethi, [...]ernet parts of the document which ensured to[...]er of queries and corrections to every aspect of the bo[...]ove and beyond being helpful sending over 90 separate corrections and queries on various asp[...] finished, Aris Sotiropoulos sent a large numb[...]erson, whose name I cannot remember but is an editor for a magazine sent me over 140 corrections against an early version to the do[...], [...]eople sent a few corrections, though small, [...], Amit Shah, Adrian Stanciu, Andy Isaacson, Jean Francois Martinez, Glen Kaukola, Wolfgang Oertl, Michael Babcock, Kirk True, [...], there were nine people who help[...]oth sent me a number of bug reports and helped ensure it worked with a variety of di[...] White, from the OSDL labs ensured that VMRegress would have a wider application than my own test b[...], also associated with the OSDL labs was responsible for up[...]ert Cahalan sent all the information I needed to make it function against later pro[...], Andrew Morton, Rik van Riel and Scott Kaplan all provided insight on what direction the tool should be developed to be b[...], Paul Rolland, Mohamed Ghouse, Samuel Chessman, Ersin Er, Mark Hoy, Michael Martin, Martin Gallwey, Ravi Parimi, Daniel Codt, Adnan Sha[...], Xiong Quanren, Dave Airlie, Der Herr Hofrat, Ida Hallgren, Manu Anand, Eugene Teo, [...], [...], I would like to thank a few people without whom, [...]een earning enough money to supp[...], who patiently listened to rants, tech babble, angsting over the book and made sure I was the person with the b[...]eriodically and kept me relatively sane, including Daren who is co[...], I would like to thank the thousands of hackers that have contributed to GNU, the Linux kernel and other Free Software projects over the years who without I would not have an excellent system to write ab[...] first started programming on my own PC 6 years ago after finally [...]

Contents

[...] Translation Lookaside Buffer (TLB) [...] Boot Memory Allocator [...] (GFP) [...] (kswapd) [...] Commentary

List of Figures

[...]: setup_memory()
[...]: free_area_init()
[...]: paging_init()
[...]: sys_mmap2()
[...]: get_unmapped_area()
[...]: insert_vm_struct()
[...]: sys_mremap()
[...]: move_vma()
[...]: move_page_tables()
[...]: sys_mlock()
[...]: do_munmap()
[...]: do_page_fault()
[...]: [...]()
[...]: handle_mm_fault()
[...]: do_no_page()
[...]: do_swap_page()
[...]: do_wp_page()
[...]: alloc_bootmem()
[...]: mem_init()
[...]: alloc_pages()
[...]: __free_pages()
[...]: vmalloc()
[...] between vmalloc(), alloc_page() [...]
[...]: vfree()
[...]: kmem_cache_create()
[...]: kmem_cache_reap()
[...]: kmem_cache_shrink()
[...]: __kmem_cache_shrink()
[...]: kmem_cache_destroy()
[...]: kmem_cache_grow()
[...]: kmem_slab_destroy()
[...]: kmem_cache_alloc()
[...]: kmem_cache_free()
[...]: kmalloc()
[...]: kfree()
[...]: kmap()
[...]: kunmap()
[...]: create_bounce()
[...]: bounce_end_io_read/write()
[...]: generic_file_read()
[...]: add_to_page_cache()
[...]: shrink_caches()
[...]: swap_out()
[...]: kswapd()
[...]: get_swap_page()
[...]: add_to_swap_cache()
[...]: read_swap_cache_async()
[...]: sys_writepage()
[...]: init_tmpfs()
[...]: shmem_create()
[...]: shmem_nopage()
[...]: shmem_zero_setup()
[...]: sys_shmget()
[...]: out_of_memory()
[...]: mmput()
[...]: free_bootmem()
[...]: enable_all_cpucaches()

List of Tables

[...], Setting and Clearing page[...] Translation Lookaside Buffer Flush API (cont) [...] Boot Memory Allo[...]

1 Introduction

Linux is a relatively new operating system that has begun to enjoy a lot of attention from the business, [...]erating system matures, its feature set, capabilities and performance grow but so, out of necessity do[...]de in bytes and lines of co[...]es not include the machine dependent code or any of the buffer management code and does not even pretend to b[...], [...], [...]er 16th, [...]

[...]: Kernel size as an indicator of complexity

As is the habit of open source developers in general, new developers asking questions are sometimes told to refer directly to the source with the "polite" acronym RTFS[1] or else are referred to the kernel newbies mailing list ([...]).
With the Linux Virtual Memory (VM) Manager, this used to be a suitable response as the time required to understand the VM could be measured in weeks and the books available devoted enough time to the memory management chapters to make the relatively small amount of co[...]ooks that describe the operating system such as Understanding the Linux Kernel [BC00] [BC03], tend to cover the entire kernel rather than one topic with the notable exception of device drivers [RC01]. These books, [...], provide invaluable insight into kernel internals but they miss the details which are speci[...], it is detailed in this book why ZONE_NORMAL is exactly 896 MiB and exactly how p[...]ects of the VM, such as the boot memory allocator and the virtual memory filesystem which are not of general kernel interest are also covered by this bo[...], to get a comprehensive view on how the kernel functions, one is required to read through the source co[...]ook tackles the VM specifically so that this investment of time to understand it will b[...]ook will be caught by the co[...], there will be an informal introduction to the basics of acquiring information on an open source project and some methods for managing, browsing and comprehending the co[...]e reading the actual source, [...]de is deciding where to start and how to easily manage, browse and get an overview of the overall co[...], people will provide some suggestions on how to proceed but a comprehensive methodology is rarely offered.

[1] [...]esn't really stand for Flaming but there could b[...]
Some useful rules of thumb for open source code comprehension will be introduced and specifically on how they may b[...]

Configuration and Building

With any open source project, the first step is to download the source and read the installation do[...], the source will have a README or INSTALL file at the top-level of the source tree [FF02]. In fact, some automated build tools such as automake require the install [...] files will contain instructions for configuring and installing the package or will give a reference to where more information may b[...]es how the kernel may be con[...], the requirement for many pro[...]conf[2] to automate testing of the build environment and automake[3] to simplify the creation of Makefiles so building is often as simple as:

mel@joshua: project $ ./configure && make

[...]jects, such as the Linux kernel, use their own configuration tools and some large projects such as the Apache webserver have numerous configuration options but usually the configure script is the starting p[...], the configuration is handled by the Makefiles and supporting to[...]guration is to:

mel@joshua: $ make config

This asks a long series of questions on what type of kernel should b[...]een answered, compiling the kernel is simply:

mel@joshua: $ make bzImage && make modules

A comprehensive guide on configuring and compiling a kernel is available with the Kernel HOWTO[4] and will not be covered in detail with this bo[...], we will presume you have one fully built kernel and it is time to begin [...]

[...] Open Source projects will usually have a home page, especially since free pro[...]cumentation and instructions on how to join the mailing list, [...]cumentation will always exist, even if it is as minimal as a simple README file, [...]ject is old and reasonably large, the web site will probably feature a Frequently Asked Questions (FAQ). Next, join the development mailing list and lurk, which means to subscribe to a mailing list and read it without p[...]er communication followed by, to a lesser extent, Internet Relay Chat (IRC) and online newsgroups, [...], it is important to read at least the previous month's archives to get a feel for the develop[...]e the first place to search if you have a question or query on the implementation that is not covered by available do[...]ers, take time to research the questions and ask it the "Right Way" [RM01].
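
The first steps described in this section (look for a README or INSTALL file at the top level of the source tree, then check whether a configure script is the starting point) can be sketched as a small shell helper. This is only an illustrative sketch and not something the book provides: the function names list_build_docs and suggest_build_steps are hypothetical, and the fallback of running plain make when only a Makefile exists is an assumption that will not suit every project (the kernel, for example, uses make config followed by make bzImage && make modules).

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not part of the
# book, the kernel tree, or any build tool.

# Print the names of top-level README/INSTALL files in a source tree.
# Usage: list_build_docs [source_dir]
list_build_docs() {
    src_dir="${1:-.}"
    for name in README INSTALL; do
        # Check the bare name and suffixed variants such as README.md.
        # An unmatched glob stays literal and simply fails the -f test.
        for path in "$src_dir/$name" "$src_dir/$name".*; do
            [ -f "$path" ] && printf '%s\n' "${path##*/}"
        done
    done
    return 0
}

# Print a likely first build command for a source tree.
# Assumption: an executable ./configure means "./configure && make";
# a bare Makefile means "make"; otherwise the documentation comes first.
suggest_build_steps() {
    src_dir="${1:-.}"
    if [ -x "$src_dir/configure" ]; then
        echo "./configure && make"
    elif [ -f "$src_dir/Makefile" ]; then
        echo "make"
    else
        echo "read the README or INSTALL file first"
    fi
}
```

After sourcing the file, list_build_docs followed by suggest_build_steps, each given the path to an unpacked source tree, prints the documentation to read and a candidate first command. The output is a hint, not a substitute for the project's own instructions.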