Transcription of Lecture 13: SRAM
Introduction to CMOS VLSI Design
Lecture 13: SRAM
David Harris
Harvey Mudd College
Spring 2004

13: SRAM, CMOS VLSI Design

Slide 2: Outline
- Memory Arrays
- SRAM Architecture
  - SRAM Cell
  - Decoders
  - Column Circuitry
  - Multiple Ports
- Serial Access Memories

Slide 3: Memory Arrays
Memory Arrays
  Random Access Memory
    Read/Write Memory (RAM) (Volatile)
      Static RAM (SRAM)
      Dynamic RAM (DRAM)
    Read Only Memory (ROM) (Nonvolatile)
      Mask ROM
      Programmable ROM (PROM)
      Erasable Programmable ROM (EPROM)
      Electrically Erasable Programmable ROM (EEPROM)
      Flash ROM
  Serial Access Memory
    Shift Registers
      Serial In Parallel Out (SIPO)
      Parallel In Serial Out (PISO)
    Queues
      First In First Out (FIFO)
      Last In First Out (LIFO)
  Content Addressable Memory (CAM)

Slide 4: Array Architecture
- 2^n words of 2^m bits each
- If n >> m, fold by 2^k into fewer rows of more columns
- Good regularity: easy to design
- Very high density if good cells are used
[Figure: array block diagram with row decoder (n-k address bits), column decoder (k address bits), bitline conditioning, memory cells (2^(n-k) rows x 2^(m+k) columns), wordlines, bitlines, column circuitry, 2^m bits out]

Slide 5: 12T SRAM Cell
- Basic building block: SRAM Cell
  - Holds one bit of information, like a latch
  - Must be read and written
- 12-transistor (12T) SRAM cell
  - Use a simple latch connected to bitline
[Figure: 12T cell with signals bit, write, write_b, read, read_b]
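The folding rule on the Array Architecture slide can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `array_shape` and the returned field names are hypothetical, and the sample values n = 11, m = 4, k = 3 are chosen to match the 2 kword x 16 example that appears later in the lecture.

```python
def array_shape(n: int, m: int, k: int) -> dict:
    """Geometry of a 2^n-word by 2^m-bit memory folded by 2^k.

    Folding moves k address bits from the row decoder to the column
    decoder, so the array gets shorter and wider while the total
    number of bits stays the same.
    """
    if not 0 <= k <= n:
        raise ValueError("k must be between 0 and n")
    return {
        "rows": 2 ** (n - k),        # one wordline per row
        "columns": 2 ** (m + k),     # one bitline pair per column
        "row_address_bits": n - k,   # inputs to the row decoder
        "column_address_bits": k,    # inputs to the column decoder
        "total_bits": 2 ** (n + m),  # unchanged by folding
    }


if __name__ == "__main__":
    # 2^11 words of 2^4 bits, folded by 2^3
    print(array_shape(n=11, m=4, k=3))
```

With k = 0 the same function describes the unfolded array (2048 rows x 16 columns), which shows why folding is used to get a more reasonable aspect ratio.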
Slide 6: 6T SRAM Cell
- Cell size accounts for most of array size
  - Reduce cell size at expense of complexity
- 6T SRAM Cell
  - Used in most commercial chips
  - Data stored in cross-coupled inverters
- Read:
  - Precharge bit, bit_b
  - Raise wordline
- Write:
  - Drive data onto bit, bit_b
  - Raise wordline
[Figure: 6T cell with bit, bit_b, word]

Slides 7-8: SRAM Read
- Precharge both bitlines high
- Then turn on wordline
- One of the two bitlines will be pulled down by the cell
- Ex: A = 0, A_b = 1
  - bit discharges, bit_b stays high
  - But A bumps up slightly
- Read stability
  - A must not flip
  - N1 >>
[Figure: read waveforms of word, bit, A, A_b, bit_b versus time (ps)]

Slides 9-10: SRAM Write
- Drive one bitline high, the other low
- Then turn on wordline
- Bitlines overpower cell with new value
- Ex: A = 0, A_b = 1, bit = 1, bit_b = 0
  - Force A_b low, then A rises high
- Writability
  - Must overpower feedback inverter
  - N2 >> P1
[Figure: write waveforms versus time (ps)]

Slide 11: SRAM Sizing
- High bitlines must not overpower inverters during reads
- But low bitlines must write new value into cell
[Figure: 6T cell with bit, bit_b, word, A, A_b; transistor strengths labeled weak, med, strong]

Slide 12: SRAM Column Example
[Figure: a read column and a write column. Read column: bitline conditioning, more cells, SRAM cell, signals word_q1, bit_v1f, bit_b_v1f, out_v1r, out_b_v1r. Write column: bitline conditioning, more cells, SRAM cell, signals word_q1, bit_v1f, bit_b_v1f, data_s1, write_q1.]

Slide 13: SRAM Layout
- Cell size is critical
- Tile cells sharing VDD, GND, bitline contacts
[Figure: cell layout with VDD, GND, BIT, BIT_B, WORD, cell boundary]

Slide 14: Decoders
- n:2^n decoder consists of 2^n n-input AND gates
  - One needed for each row of memory
  - Build AND from NAND or NOR gates
[Figure: static CMOS and pseudo-nMOS decoder schematics with inputs A0, A1, outputs word0 to word3, and transistor widths]

Slide 15: Decoder Layout
- Decoders must be pitch-matched to SRAM cell
  - Requires very skinny gates
[Figure: decoder layout with GND, VDD, NAND gate, buffer inverter, word, inputs A0 to A3]

Slide 16: Large Decoders
- For n > 4, NAND gates become slow
  - Break large gates into multiple smaller gates
[Figure: decoder with inputs A0 to A3 and outputs word0 to word15]
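The decoder description above (2^n AND gates, each with n inputs) can be modeled behaviorally. This is a minimal sketch, not the circuit itself: the function name `decode` is hypothetical, and each AND gate is represented by Python's `all()` over the n address literals (true or complemented) that the gate for that row would receive.

```python
from typing import List


def decode(address: int, n: int) -> List[int]:
    """Behavioral model of an n:2^n decoder.

    Wordline `row` is the AND of n literals: address bit j is used
    directly where bit j of `row` is 1, and complemented where it is 0.
    Exactly one wordline is high for any valid address.
    """
    if not 0 <= address < 2 ** n:
        raise ValueError("address out of range")
    bits = [(address >> j) & 1 for j in range(n)]  # A0 is the LSB
    wordlines = []
    for row in range(2 ** n):
        literals = [
            bits[j] if (row >> j) & 1 else 1 - bits[j]
            for j in range(n)
        ]
        wordlines.append(int(all(literals)))  # the n-input AND gate
    return wordlines


if __name__ == "__main__":
    # 2:4 decoder with A1 A0 = 1 0 raises word2 only
    print(decode(0b10, 2))
```

The model makes the cost visible: the inner list has n entries for each of the 2^n rows, which is why wide gates become a concern as n grows.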
Slide 17: Predecoding
- Many of these gates are redundant
  - Factor out common gates into predecoder
  - Saves area
  - Same path effort
[Figure: predecoders on A0 to A3 driving 1-of-4 hot predecoded lines, which feed the gates for word0 to word15]

Slide 18: Column Circuitry
- Some circuitry is required for each column
  - Bitline conditioning
  - Sense amplifiers
  - Column multiplexing

Slide 19: Bitline Conditioning
- Precharge bitlines high before reads
- Equalize bitlines to minimize voltage difference when using sense amplifiers
[Figure: precharge and equalization circuits on bit, bit_b]

Slide 20: Sense Amplifiers
- Bitlines have many cells attached
  - Ex: 32-kbit SRAM has 256 rows x 128 cols
  - 128 cells on each bitline
- t_pd ∝ (C/I) ΔV
  - Even with shared diffusion contacts, 64C of diffusion capacitance (big C)
  - Discharged slowly through small transistors (small I)
- Sense amplifiers are triggered on small voltage swing (reduce ΔV)

Slide 21: Differential Pair Amp
- Differential pair requires no clock
- But always dissipates static power
[Figure: differential pair with bit, bit_b, sense, sense_b, transistors N1, N2, N3, P1, P2]

Slide 22: Clocked Sense Amp
- Clocked sense amp saves power
- Requires sense_clk after enough bitline swing
- Isolation transistors cut off large bitline capacitance
[Figure: clocked sense amp with bit, bit_b, sense, sense_b, sense_clk, isolation transistors, regenerative feedback]
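The proportionality on the Sense Amplifiers slide, delay growing with (C/I) times the voltage swing, can be made concrete with a small calculation. The sketch below assumes a constant discharge current, so that t = C * dV / I exactly; that is a simplification. The capacitance, current, and voltage values are hypothetical round numbers picked only for illustration and do not come from the lecture.

```python
def bitline_delay(c_farads: float, i_amps: float, delta_v: float) -> float:
    """Time for a constant current I to change the voltage on C by dV."""
    return c_farads * delta_v / i_amps


# Hypothetical values, for illustration only
C_BITLINE = 200e-15   # 200 fF of bitline capacitance (big C)
I_CELL = 50e-6        # 50 uA of cell read current (small I)
FULL_SWING = 1.8      # wait for a full-rail swing, in volts
SENSE_SWING = 0.18    # sense amp fires after a small swing, in volts

full_swing = bitline_delay(C_BITLINE, I_CELL, FULL_SWING)
small_swing = bitline_delay(C_BITLINE, I_CELL, SENSE_SWING)

if __name__ == "__main__":
    print(f"full swing:  {full_swing * 1e9:.2f} ns")
    print(f"small swing: {small_swing * 1e9:.2f} ns")
```

Under this model the delay scales linearly with the swing, so triggering on one tenth of the swing takes one tenth of the time. C and I are fixed by the array and cell size, which leaves the swing as the term a sense amplifier can reduce.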
Slide 23: Twisted Bitlines
- Sense amplifiers also amplify noise
  - Coupling noise is severe in modern processes
  - Try to couple equally onto bit and bit_b
  - Done by twisting bitlines
[Figure: twisted bitline pairs b0, b0_b, b1, b1_b, b2, b2_b, b3, b3_b]

Slide 24: Column Multiplexing
- Recall that array may be folded for good aspect ratio
- Ex: 2 kword x 16 folded into 256 rows x 128 columns
  - Must select 16 output bits from the 128 columns
  - Requires 16 8:1 column multiplexers

Slide 25: Tree Decoder Mux
- Column mux can use pass transistors
  - Use nMOS only, precharge outputs
- One design is to use k series transistors for 2^k:1 mux
  - No external decoder logic needed
[Figure: tree mux with inputs B0 to B7, select signals A0, A1, A2 and their complements, output Y to sense amps and write circuits]

Slide 26: Single Pass-Gate Mux
- Or eliminate series transistors with separate decoder
[Figure: decoder on A0, A1 selecting among B0 to B3, output Y]

Slide 27: Ex: 2-way Muxed SRAM
[Figure: two columns of cells (more cells above) sharing column circuitry; signals word_q1, write0_q1, write1_q1, A0 and its complement, data_v1]

Slide 28: Multiple Ports
- We have considered single-ported SRAM
  - One read or one write on each cycle
- Multiported SRAMs are needed for register files
- Examples:
  - Multicycle MIPS must read two sources or write a result on some cycles
  - Pipelined MIPS must read two sources and write a third result each cycle
  - Superscalar MIPS must read and write many sources and results each cycle

Slide 29: Dual-Ported SRAM
- Simple dual-ported SRAM
  - Two independent single-ended reads
  - Or one differential write
- Do two reads and one write by time multiplexing
  - Read during ph1, write during ph2
[Figure: cell with bit, bit_b, wordA, wordB]

Slide 30: Multi-Ported SRAM
- Adding more access transistors hurts read stability
- Multiported SRAM isolates reads from state node
- Single-ended design minimizes number of bitlines
[Figure: cell with wordlines wordA to wordG, bitlines bA to bG, write circuits, read circuits]

Slide 31: Serial Access Memories
- Serial access memories do not use an address
  - Shift Registers
  - Tapped Delay Lines
  - Serial In Parallel Out (SIPO)
  - Parallel In Serial Out (PISO)
  - Queues (FIFO, LIFO)

Slide 32: Shift Register
- Shift registers store and delay data
- Simple design: cascade of registers
  - Watch your hold times!
[Figure: shift register with signals Clk, Din, Dout, labeled 8]

Slide 33: Denser Shift Registers
- Flip-flops aren't very area-efficient
- For large shift registers, keep data in SRAM instead
- Move read/write pointers to RAM rather than data
  - Initialize read address to first entry, write to last
  - Increment address on each cycle

Slide 34: Tapped Delay Line
- A tapped delay line is a shift register with a programmable number of stages
- Set number of stages with delay controls to mux
  - Ex: 0 to 63 stages of delay
[Figure: cascade from Din to Dout of shift registers SR32, SR16, SR8, SR4, SR2, SR1, each followed by a mux controlled by delay5, delay4, delay3, delay2, delay1, delay0; common clk]

Slide 35: Serial In Parallel Out
- 1-bit shift register reads in serial data
  - After N steps, presents N-bit parallel output
[Figure: shift register with clk, Sin, parallel outputs P0 to P3]

Slide 36: Parallel In Serial Out
- Load all N bits in parallel when shift = 0
  - Then shift one bit out per cycle
[Figure: shift register with clk, shift/load, parallel inputs P0 to P3, Sout]

Slide 37: Queues
- Queues allow data to be read and written at different rates
- Read and write each use their own clock, data
- Queue indicates whether it is full or empty
- Build with SRAM and read/write counters (pointers)
[Figure: Queue block with WriteClk, WriteData, FULL on the write side and ReadClk, ReadData, EMPTY on the read side]
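The tapped delay line described above (stages SR32 down to SR1, each included or bypassed by one delay bit) can be modeled in a few lines. This is a behavioral sketch under stated assumptions: the class name `TappedDelayLine` and its `clock` method are hypothetical, every register is assumed to start at 0, and a bypassed stage is assumed to pass data through with no added delay.

```python
from collections import deque

# Stage lengths for SR32 ... SR1, controlled by delay5 ... delay0.
# Each length equals the weight of its control bit.
STAGE_LENGTHS = (32, 16, 8, 4, 2, 1)


class TappedDelayLine:
    """Shift register whose length is set by a 6-bit delay value (0 to 63)."""

    def __init__(self, delay: int):
        if not 0 <= delay <= 63:
            raise ValueError("delay must be in 0..63")
        # Keep only the stages whose control bit is 1; the rest are bypassed.
        self.stages = [
            deque([0] * length, maxlen=length)
            for length in STAGE_LENGTHS
            if delay & length
        ]

    def clock(self, din: int) -> int:
        """Advance one clock cycle and return Dout for this cycle."""
        value = din
        for stage in self.stages:
            oldest = stage[0]    # value leaving this stage
            stage.append(value)  # maxlen discards the oldest entry
            value = oldest
        return value


if __name__ == "__main__":
    line = TappedDelayLine(delay=5)  # SR4 and SR1 in the path
    print([line.clock(x) for x in range(1, 11)])
```

With delay = 5 the input sequence 1, 2, 3, ... reappears at the output five cycles later, after five cycles of the initial zeros.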
Slide 38: FIFO, LIFO Queues
- First In First Out (FIFO)
  - Initialize read and write pointers to first element
  - Queue is EMPTY
  - On write, increment write pointer
  - If write almost catches read, Queue is FULL
  - On read, increment read pointer
- Last In First Out (LIFO)
  - Also called a stack
  - Use a single stack pointer for read and write
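The FIFO pointer rules listed above translate directly into a small software model. The sketch below makes several assumptions that are not in the lecture: the class name `Fifo` and its methods are hypothetical, a Python list stands in for the SRAM array, and both sides run in a single thread rather than on separate read and write clocks. "Write almost catches read" is interpreted as the write pointer being one position behind the read pointer, so a queue of depth N holds at most N - 1 entries.

```python
class Fifo:
    """Circular-buffer FIFO with separate read and write pointers."""

    def __init__(self, depth: int):
        if depth < 2:
            raise ValueError("depth must be at least 2")
        self.depth = depth
        self.mem = [None] * depth  # stands in for the SRAM array
        self.rd = 0                # read pointer starts at first element
        self.wr = 0                # write pointer starts at first element

    @property
    def empty(self) -> bool:
        # Pointers are equal: nothing has been written that was not read
        return self.rd == self.wr

    @property
    def full(self) -> bool:
        # Write pointer has almost caught the read pointer
        return (self.wr + 1) % self.depth == self.rd

    def write(self, data) -> None:
        if self.full:
            raise OverflowError("write to full FIFO")
        self.mem[self.wr] = data
        self.wr = (self.wr + 1) % self.depth  # increment write pointer

    def read(self):
        if self.empty:
            raise IndexError("read from empty FIFO")
        data = self.mem[self.rd]
        self.rd = (self.rd + 1) % self.depth  # increment read pointer
        return data


if __name__ == "__main__":
    q = Fifo(depth=4)
    for item in ("a", "b", "c"):
        q.write(item)
    print(q.full, q.read(), q.read(), q.read(), q.empty)
```

Leaving one slot unused is one common way to tell FULL from EMPTY using only the two pointers. A LIFO would replace the two pointers with a single stack pointer that moves up on write and down on read.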