
3. The microarchitecture of Intel, AMD, and VIA CPUs

3. The microarchitecture of Intel, AMD and VIA CPUs: An optimization guide for assembly programmers and compiler makers. By Agner Fog, Technical University of Denmark. Copyright 1996 - 2018. Last updated 2018-08-08.



Transcription of 3. The microarchitecture of Intel, AMD, and VIA CPUs

Contents

1 Introduction
    About this manual
    Microprocessor versions covered by this manual
2 Out-of-order execution (All processors except P1, PMMX)
    Instructions are split into µops
    Register renaming
3 Branch prediction (all processors)
    Prediction methods for conditional jumps
    Branch prediction in P1
    Branch prediction in PMMX, PPro, P2, and P3
    Branch prediction in P4 and P4E
    Branch prediction in PM and Core2
    Branch prediction in Intel Nehalem
    Branch prediction in Intel Sandy Bridge and Ivy Bridge
    Branch prediction in Intel Haswell, Broadwell and Skylake
    Branch prediction in Intel Atom, Silvermont, Goldmont and Knights Landing
    Branch prediction in VIA Nano
    Branch prediction in AMD K8 and K10
    Branch prediction in AMD Bulldozer, Piledriver, Steamroller and [...]
    Branch prediction in AMD Ryzen
    Branch prediction in AMD Bobcat and Jaguar
    Indirect jumps on older processors
    Returns (all processors except P1)
    Static prediction
    Close jumps
4 Pentium 1 and Pentium MMX pipeline
    Pairing integer instructions
    Address generation interlock
    Splitting complex instructions into simpler ones
    Prefixes
    Scheduling floating point code
5 Pentium 4 (NetBurst) pipeline
    Data cache
    Trace cache
    Instruction decoding
    Execution units
    Do the floating point and MMX units run at half speed?
    Transfer of data between execution units
    Retirement
    Partial registers and partial flags
    Store forwarding stalls
    Memory intermediates in dependency chains
    Breaking dependency chains
    Choosing the optimal instructions
    Bottlenecks in P4 and P4E
6 Pentium Pro, II and III [...]
    The pipeline in PPro, P2 and P3
    Instruction fetch
    Instruction decoding
    Register renaming
    ROB read
    Out of order execution
    Retirement
    Partial register stalls
    Store forwarding stalls
    Bottlenecks in PPro, P2, P3
7 Pentium M pipeline
    The pipeline in PM
    The pipeline in Core Solo and Duo
    Instruction fetch
    Instruction decoding
    Loop buffer
    Micro-op fusion
    Stack engine
    Register renaming
    Register read stalls
    Execution units
    Execution units that are connected to both port 0 and 1
    Retirement
    Partial register access
    Store forwarding stalls
    Bottlenecks in PM
8 Core 2 and Nehalem pipeline
    Pipeline
    Instruction fetch and predecoding
    Instruction decoding
    Micro-op fusion
    Macro-op fusion
    Stack engine
    Register renaming
    Register read stalls
    Execution units
    Retirement
    Partial register access
    Store forwarding stalls
    Cache and memory access
    Breaking dependency chains
    Multithreading in Nehalem
    Bottlenecks in Core2 and Nehalem
9 Sandy Bridge and Ivy Bridge pipeline
    Pipeline
    Instruction fetch and decoding
    µop cache
    Loopback buffer
    Micro-op fusion
    Macro-op fusion
    Stack engine
    Register allocation and renaming
    Register read stalls
    Execution units
    Partial register access
    Transitions between VEX and non-VEX modes
    Cache and memory access
    Store forwarding stalls
    Multithreading
    Bottlenecks in Sandy Bridge and Ivy Bridge
10 Haswell and Broadwell pipeline
    Pipeline
    Instruction fetch and decoding
    µop cache
    Loopback buffer
    Micro-op fusion
    Macro-op fusion
    Stack engine
    Register allocation and renaming
    Execution units
    Partial register access
    Cache and memory access
    Store forwarding stalls
    Multithreading
    Bottlenecks in Haswell and Broadwell
11 Skylake pipeline
    Pipeline
    Instruction fetch and decoding
    µop cache
    Loopback buffer
    Micro-op fusion
    Macro-op fusion
    Stack engine
    Register allocation and renaming
    Execution units
    Transitions between VEX and non-VEX modes
    Partial register access
    Cache and memory access
    Store forwarding stalls
    Multithreading
    Bottlenecks in Skylake
12 Intel Atom pipeline
    Instruction fetch
    Instruction decoding
    Execution units
    Instruction [...]
    X87 floating point instructions
    Instruction latencies
    Memory access
    Branches and loops
    Multithreading
    Bottlenecks in Atom
13 Intel Silvermont pipeline
    Pipeline
    Instruction fetch and decoding
    Loop buffer
    Macro-op fusion
    Register allocation and out of order execution
    Special cases of [...]
    Execution units
    Partial register access
    Cache and memory access
    Store [...]
    Multithreading
    Bottlenecks in Silvermont
14 Intel Goldmont pipeline

