
Wire-Streaming Processors on 2-D Nanowire Fabrics

Teng Wang, Mahmoud Ben-Naser, Yao Guo, Csaba Andras Moritz
Electrical and Computer Engineering Department
University of Massachusetts Amherst, MA, USA
{twang, mbennase, yaoguo, …}

[…] of the research in the field of nanoelectronics has been focused on nanodevices and fabrication aspects and, as a result, a variety of nanodevice technologies have been demonstrated. By contrast, very little work has been reported on the design and evaluation of circuits and computational architectures using nanodevices. There is similarly not much work on the impact of device and fabric (the 2-D Nanowire array) properties on computing. In this paper, we focus on computing architectures based on silicon nanowires. We explore a simple stream processor developed on 2-D Nanowire fabrics and compare its density to a 30nm CMOS implementation. We also identify techniques to work around fabric-specific constraints. Our initial evaluation shows that this stream processor has a great density advantage compared to CMOS.

Keywords: nanoscale circuits, architecture, NASIC, silicon nanowires, carbon nanotubes

1 INTRODUCTION
Nanotechnology is one of the most promising replacements for CMOS technology.

Perhaps the most exciting nanodevices today for nanoscale integrated circuits are semiconductor nanowires (NWs) and arrays of crossed carbon nanotubes (CNTs). Researchers have already built FETs and diodes out of NWs [4] [6] and CNTs [8]. While many practical challenges remain, it seems that it will soon be possible to build regular nanoarrays from uniform-length CNTs or NWs [7]. By contrast, we have seen very little work on the design and evaluation of circuits and computational architectures using nanodevices. Similarly, little work on the impact of device and fabric properties on computing has been done so far.

Our previous work [1] [3] has addressed some of the challenges and technical constraints when building computing systems on 2-D Nanowire fabrics. These challenges include:

• The 2-D regular NW array, where doping is fixed in each direction, significantly impacts the density of circuits due to the diagonal problem: when the logic is cascaded, only the diagonal portion of the nanotile is utilized [1].

• Having 2-level logic instead of multi-level logic on 2-D nanoarrays reduces the density further.
• Control circuits and bypass networks are difficult to implement, because building feedback paths on a 2-D nanoarray is challenging.
• Similarly, it is hard to design high-density sequential circuits using traditional MOS-like approaches, because they require feedback.

In this paper, we present a complete design of a simple stream processor developed on 2-D Nanowire fabrics, called WISP-0. WISP-0 is the first version of our Wire-Streaming Processors (WISP). In WISP, in order to preserve the density advantages of nanodevices, data are streamed through the fabric with minimal control/feedback paths. Intermediate values produced during processing are often stored on the nanowires without requiring explicit latching.

We also compare the density of a fully implemented CMOS design and the NW-based WISP-0. Our evaluation shows that it is possible to preserve the density advantages of nanodevices in WISP despite the […].

OVERVIEW OF NASICS
The WISP architecture proposed here is a key part of our effort to build NASICs: Nanoscale Application-Specific Integrated Circuits [1] [3].

NASICs are based on extensive research on understanding emerging device and fabrication constraints. A NASIC design has a hierarchical and tiled architecture and is optimized to deal with various manufacturing, fabric and device constraints. Next, we briefly describe the key components of NASICs. More details can be found in our previous papers [1] [3].

Nanotiles
Nanotiles are the basic building blocks of NASICs. Figure 1 shows a typical nanotile. The orthogonally crossed nanowires form a nanoarray. The junctions can be programmed as FETs or detached [7]. Nanoarrays are surrounded by microwires (MWs). Each signal is expressed in both its original and complementary forms. MWs provide signals and power supply. Pull-up/down arrays act as the interface between MWs and NWs.

Figure 1: A nanotile for a 1-bit full adder. The thicker wires are MWs and the thinner wires are NWs. NWs in different directions have different doping. (Figure labels: Pullup Array, Pulldown Array, p-nanowire, n-nanowire, Vdd, Gnd; signals a, b, c, s.)

To program each NW junction, the number of MWs must be at least logarithmic in the number of NWs [7].

Dynamic Nanotiles and Pipeline
One of the most common components in any processor design is the datapath. Registers or latches are required to pipeline the data flow. Due to topological and doping constraints, however, latch circuits are very difficult to implement on nanoarrays. In NASICs, we use a new dynamic circuit style to achieve temporary storage instead of using explicit flip-flops. A pipelined NASIC circuit can be built by cascading dynamic nanotiles without explicit latching of signals. Using latch circuits would have reduced density considerably because of the diagonal problem [1] [3].

Interconnect and Multi-tile Designs
A nanoscale processor may have thousands of nanotiles. Achieving efficient communication between nanotiles is a critical design aspect. In NASICs, local communication between adjacent nanotiles is provided by NWs for area efficiency.

MWs are used for global communications.

WISP-0
WISP-0 is an initial version of our wire-streaming processors. It exercises many of the design strategies and optimizations in NASICs. WISP-0 contains a 2-bit datapath and a 3-bit opcode. Figure 2 shows the overall layout of WISP-0. In this figure, each box surrounded by dashed lines represents a nanotile. All adjacent nanotiles are connected by a set of NWs. These nanotiles are all designed in dynamic style and are cascaded together on the wire to form a 5-stage pipeline: fetch, decode, register file, execute and write back. No explicit latches are used. Figure 3 is the schematic of WISP-0.

Figure 2: The floorplan of WISP-0. (Figure labels: PC, ROM, DEC, RF, ALU; addr, ins, opcode, oprands, result; p-nanowires, n-nanowires, nanoarray.)

Figure 3: The schematic of WISP-0. (Figure labels: Program Counter, Ins Rom, Ins Decoder, Register File, ALU.)

PC implements the program counter. It generates a 4-bit address for each […]. ROM is the instruction ROM. It fetches one instruction according to the address from PC.

The instruction goes to DEC (decoder) and is decoded into opcode and operands. Next, the opcode and operands enter the RF (register file) stage, where the values of the operands are read from the registers. The instruction is executed in ALU and the result is written back to the register file.

Control logics and bypass networks are difficult to implement on 2-D fabrics. A carefully selected ISA minimizes these circuits. Currently WISP-0 supports the following instructions: nop, mov, movi, […]. Fields in these instructions have fixed lengths (like RISC) in order to simplify the design of the decoder and […].

Due to limited space, we only include three key blocks: PC, RF and ALU. MWs are not shown in the following figures to improve the readability. For simplicity, the pull-up/down networks are also omitted.

Figure 4 is the layout and schematic of the program counter. This block implements a 4-bit accumulator in a nanotile. It has two components: a 4-bit incrementer (bottom half in the layout) and a 4-bit latch (top half in the layout). In each cycle, the output of the incrementer is delayed and fed back to the input. The address is therefore increased by one in each cycle.

Figure 4: The layout and schematic of the program counter in WISP-0. Black dots represent p-FET and white dots represent n-FET. (Figure labels: addr3, addr2, addr1, addr0, reg.)

The design of the register file is shown in Figure 5, and Figure 6 is its schematic. In this block, data are stored on the 16 horizontal NWs at the bottom. They are selected by the 2-bit 4-to-1 multiplexer (2-bit MUX41 in the figure). The data (operanda) read out by operand1 is sent directly to ALU. The data read out by operand2 is sent to another multiplexer (2-bit MUX21). This data and operand2 are selected by opcode to […]. The reason is that some instructions (e.g., movi) will use the immediate data provided by the instruction instead of the values from registers. At the same time, opcode and dest (destination register address) are pipelined to ALU. If ALU needs to write results back to the register file, the data and control signals will enter from the top right corner of the tile and update the values on the bottom 16 horizontal NWs.

Figure 7 shows the layout of ALU in WISP-0.
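The program counter and the register-file read path described above can be sketched behaviorally. The following Python model is only an illustration, not the authors' circuit: the class names, the wrap-around of the 4-bit address, and the `use_immediate` flag (standing in for the opcode decoding, whose encoding the text does not give) are all assumptions.

```python
from dataclasses import dataclass, field

ADDR_BITS = 4  # PC address width stated in the text
DATA_BITS = 2  # datapath width stated in the text
NUM_REGS = 4   # registers addressable by a 2-bit operand field


@dataclass
class ProgramCounter:
    """4-bit incrementer whose delayed output is fed back to its input."""
    addr: int = 0

    def tick(self) -> int:
        current = self.addr
        # Assumption: the address wraps after 15; the text only says
        # that it increases by one in each cycle.
        self.addr = (self.addr + 1) % (1 << ADDR_BITS)
        return current


@dataclass
class RegisterFile:
    """Behavioral stand-in for the values held on the horizontal NWs."""
    regs: list = field(default_factory=lambda: [0] * NUM_REGS)

    def read(self, operand1: int, operand2: int, use_immediate: bool):
        """Return (operanda, operandb) as they would be sent to the ALU."""
        operanda = self.regs[operand1]   # MUX41 selects a register
        from_reg = self.regs[operand2]   # second register read
        # MUX21: pick the operand2 field itself (immediate, as for movi)
        # or the register value it addresses. `use_immediate` is a
        # hypothetical decoding of opcode.
        operandb = operand2 if use_immediate else from_reg
        return operanda, operandb

    def write(self, dest: int, result: int) -> None:
        # Keep only DATA_BITS bits, matching the 2-bit datapath.
        self.regs[dest] = result & ((1 << DATA_BITS) - 1)
```

For instance, after `rf.write(1, 3)`, calling `rf.read(1, 2, use_immediate=True)` yields the register value for the first operand and the literal field `2` for the second, which mirrors how an immediate instruction bypasses the second register read.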

The ALU block executes the instructions and generates the result (result in the figure) and control signals (rf3~0). The top part (2-4 decoder) is a 2-4 decoder. It selects the register to be written back according to the destination address (dest). The bottom part (adder/multiplier) represents an arithmetic unit. It calculates the summation or product (decided by opcode) […].

Figure 5: Register file in WISP-0. It has four […]. (Figure labels: opcode, dest, rf3~0, 2-bit regfile, 2-bit MUX41, 2-bit MUX21, operand1, operand2, operanda, operandb, result.)

Figure 6: The schematic of the register file. (Figure labels: result, operand1, reg0, reg1, reg2, reg3, operanda, operandb, opcode.)

Figure 7: The layout and schematic of the ALU. (Figure labels: 2-4 decoder, adder/multiplier, opcode, operanda, operandb, rf3~0, result, dest.)

DENSITY EVALUATION
The key advantage of nanoscale devices is their density. However, we have found that without new circuit and architecture approaches, this density advantage could be lost due to manufacturing, fabric and device constraints when building nanoscale systems [1] [3]. In this section, we make an initial density evaluation.

Let us analyze a typical nanotile in NASIC. Figure 8 shows the area breakdown. Assuming that the size of a nanoarray is $X \times Y$, we need at least $2\log_2 Y$ vertical and $2\log_2 X$ horizontal NWs as pull-up/down networks, plus $2\log_2 X + 4$ vertical and $2\log_2 Y + 4$ horizontal MWs (4 MWs as power supply and ground). The total area becomes:

$$[X + 2\log_2 Y + s \times (\log_2 X + 4)] \times [Y + 2\log_2 X + s \times (\log_2 Y + 4)]$$

In this expression, $s$ is the pitch ratio of MWs to NWs. We assume that the pitch of NWs is 10 nm and $s = 9$. This is a relatively conservative assumption for the rapidly improving nanodevice technology. We use this expression to calculate all blocks in WISP-0. The total area is […]m². Among this area, nanoarrays take […]m² while the overhead of MWs dominates. The area efficiency ($Area_{nanoarray}/Area_{total}$) is only […]. For larger nanotiles, however, this efficiency can be improved significantly, as will be shown […].

Figure 8: Area breakdown of a typical nanotile. This figure shows the minimum size of pull-up/down networks and MWs. (Figure labels: Microwire, Pull up/down, pre/eva, nanowire.)

To compare with CMOS technology, we implemented a CMOS prototype in Verilog.
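The area expression above is easy to evaluate numerically. The sketch below is a hedged illustration in Python: the function names and the example array sizes are hypothetical and are not the WISP-0 block sizes, which are not recoverable from the text. Note that the prose counts $2\log_2 X + 4$ MWs per side while the printed expression multiplies $s$ by $(\log_2 X + 4)$; the sketch follows the printed expression by default and exposes the coefficient as an assumed `mw_log_coeff` parameter so that either reading can be tried.

```python
import math


def nanotile_area_nm2(x, y, nw_pitch_nm=10.0, s=9.0, mw_log_coeff=1):
    """Total nanotile area in nm^2 for an x-by-y nanoarray.

    Follows the expression in the text, measured in NW pitches and then
    scaled by the NW pitch. mw_log_coeff=1 matches the printed
    expression; mw_log_coeff=2 matches the MW counts given in the prose.
    """
    log_x = math.log2(x)
    log_y = math.log2(y)
    width = x + 2 * log_y + s * (mw_log_coeff * log_x + 4)
    height = y + 2 * log_x + s * (mw_log_coeff * log_y + 4)
    return width * height * nw_pitch_nm ** 2


def area_efficiency(x, y, nw_pitch_nm=10.0, s=9.0, mw_log_coeff=1):
    """Ratio of nanoarray area to total nanotile area."""
    array_area = x * y * nw_pitch_nm ** 2
    total = nanotile_area_nm2(x, y, nw_pitch_nm, s, mw_log_coeff)
    return array_area / total


if __name__ == "__main__":
    # Hypothetical square arrays, chosen only to show the trend.
    for n in (16, 64, 256):
        print(n, nanotile_area_nm2(n, n), area_efficiency(n, n))
```

With the default parameters, a hypothetical 16 × 16 array has an efficiency of 1/36 (about 2.8%), while a 256 × 256 array reaches about 45%, which is consistent with the remark that MW overhead dominates small tiles and that larger nanotiles are more area-efficient.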
