## SELF-TIMED SIMULTANEOUS BIDIRECTIONAL SIGNALLING FOR IC SYSTEMS

G. Y. Yacoub and W. H. Ku

Department of ECE, Mail Code R-007
University of California, San Diego
La Jolla, CA 92093

Abstract - This paper presents a self-timed architectural interface design for VLSI CMOS integrated circuit systems. The interface follows a 2-cycle self-timed bundled data protocol and nearly doubles the I/O bandwidth for wide-bus VLSI interprocessor communication by simultaneously sending and receiving data over the same wires. Compared with the synchronous approach in [1], the interface incurs a constant overhead of 4 control wires and an area penalty that decays as the bus width increases. This type of self-timed bundled interface can prove attractive for applications where global synchronization is difficult to achieve. The interface has been simulated for a 16-bit wide bus using a quasi-analog mixed-signal Verilog approach.

## 1. INTRODUCTION

VLSI self-timed interprocessor communication is becoming more important as isochronous regions shrink with feature size miniaturization [2]. Several interesting applications can be found in extendable array processing [3], while thorough presentations of the theory and concepts of self-timed techniques can be found in [4] [5] [6].

The type of self-timed I/O interface analyzed in this paper assumes the same signal levels as those presented for the synchronous interface by Lam, Dennison, and Dally [1]. Specifically, each communication path is treated as a properly terminated transmission line between two chips and carries superposed digital data encoded into three voltage levels, namely 2 V, 2.5 V, and 3 V. The transmitter at each end of the tri-valued line retains a local copy of the sent signal for use by the local receiver in reconstructing the signal transmitted from the far end.
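The reconstruction idea can be illustrated numerically. The following is a minimal sketch, not the circuit of [1]: it assumes an idealized line on which the two transmitted bits superpose linearly onto the 2 V, 2.5 V, and 3 V levels, and it takes the 2.25 V and 2.75 V levels that the paper gives for the local copy as the reference. All function and constant names are hypothetical.

```python
# Idealized model of simultaneous bidirectional signalling on one wire.
# Assumption: the line voltage is a linear superposition of both bits.
LINE_BASE_V = 2.0   # both ends send 0
LINE_STEP_V = 0.5   # each transmitted 1 raises the line by 0.5 V
REF_LOW_V = 2.25    # local copy level when the local bit is 0
REF_HIGH_V = 2.75   # local copy level when the local bit is 1


def line_voltage(west_bit, east_bit):
    """Tri-valued line voltage: 2.0 V, 2.5 V, or 3.0 V."""
    return LINE_BASE_V + LINE_STEP_V * (west_bit + east_bit)


def local_reference(sent_bit):
    """Bi-valued local copy of the bit this end is sending."""
    return REF_HIGH_V if sent_bit else REF_LOW_V


def recover_far_bit(line_v, sent_bit):
    """Subtract the local copy from the line; the sign gives the far-end bit."""
    return 1 if line_v - local_reference(sent_bit) > 0 else 0


def check_all_cases():
    """Return (west, east, line_v, bit_seen_at_west, bit_seen_at_east) per case."""
    rows = []
    for west in (0, 1):
        for east in (0, 1):
            v = line_voltage(west, east)
            rows.append((west, east, v,
                         recover_far_bit(v, west),
                         recover_far_bit(v, east)))
    return rows


if __name__ == "__main__":
    for row in check_all_cases():
        print(row)
```

In every one of the four cases the difference between the line and the local copy is +0.25 V or -0.25 V, so each end recovers the bit sent by the other end even though both share the wire.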
This local copy of the digital signal is impedance matched to achieve a low of 2.25 V and a high of 2.75 V and goes to a local sense amplifier, where it is subtracted from the tri-valued signal received from the far end, thus reconstructing the received digital signal.

An event-driven scheme which generates a targeted set of sense amplifier control clocks will be described. This scheme can be tailored to generate a variety of different control clocks. The specific sense amplifier control pulses to be described are edge bundled to each other (the rising edge of each pulse triggers the next pulse). These control pulses are initiated from the request signals of the 2-cycle bundled data convention which the event-driven registers follow. A bank of these event-driven registers resides within each of the transmitter and receiver blocks of the interface shown in Figure 1. The event-driven registers are connected in a ring configuration with Muller C-elements to form two-stage FIFO queues similar to those in [7].

## 2. ARCHITECTURE

Figure 2 depicts the self-timed 2-cycle bundled data protocol utilized by two processors communicating simultaneously via a common data bus, where each wire carries superposed digital waveforms traveling in opposite directions. For each transaction the sender transmits new data followed by a transition on the Request line. Each transaction is completed by a transition on the Acknowledge line, and data stability is assured while the Request and Acknowledge controls are in opposite phases.

Figure 1: Processors 1 and 2 communicating via interconnect paths modeled by transmission lines with characteristic impedances of $Z_o$ and propagation delays of $\tau_0$ and $\tau_h$.

Figure 2: 2-cycle bundled data convention occurring in a cyclic sequence. During ① the sender is actively toggling the data bus to its new value.
Then, at ②, the sender transmits a Request transition to indicate that the new data is valid; the data remains stable until ③ an acknowledge transition is received from the receiver, after which the transaction cycle repeats.

### 2.1 Overall Behavioral Design

**FIFO Interface Net** – Inets [8] are a restricted class of Petri nets useful in specifying the precise behavior of asynchronous interfaces. Their graphical nature allows precise specification of sequences of events. Inets must be bounded, safe, and live. An Inet is safe, and therefore bounded, if the number of tokens at any place is always less than or equal to 1. It is live if there is always at least one enabled transition. Figure 3a shows the Inet for Sutherland's two-stage FIFO, which is the basis upon which we construct the self-timed interface. The regular expression which compactly describes the trace set for the Inet of Figure 3a is

$$\left(R_{in}, R_{out}, A_{in}\right)^* \tag{1}$$

where transition $A_{out}$ is concurrent with $A_{in}$.

**Sense Amplifier Control Inet** – The required sense amplifier control clocks must be initiated when both Request signals from the two FIFOs transition to the same phase, indicating that the data bus is holding valid and stable signals. At this time the sense amplifiers must sample the data. This sequence is repeated indefinitely and is described by the Inet in Figure 3b, whose regular expression is

$$\left(M, I\right)^* \tag{2}$$

Figure 3: (a) Inet describing the sequence of control events for the FIFO queue which follows the 2-cycle bundled data protocol. (b) Inet describing the sense amplifier control sequence of events.

**Self-Timed Simultaneous Bidirectional Inet** – The control Inet of the complete self-timed simultaneous bidirectional interface is depicted in Figure 4. This Inet was designed to capture the sequencing of events for the interface and to give additional insight into the bundling constraints between the data bus and the request signal.
The regular expression for the west-to-east FIFO, which depicts the execution of the west-side sense amplifier control, can be written as

$$\left(R_{iwe}, R_{ew}, M_w, I_w\right)^* \tag{3}$$

while the regular expression for the east-to-west FIFO, which depicts the execution of the east-side sense amplifier control, can be written as

$$\left(R_{iew}, R_{we}, M_e, I_e\right)^* \tag{4}$$

### 2.2 Block Architecture

The two main building blocks are the receiver and the transmitter. Within the receiver resides the sense amplifier control circuitry which will be described. The biasing and reference circuitry necessary for impedance matching has been studied in [1] and is applicable to the circuit-level implementation of this self-timed architecture.

**Transmitter Block** – This block consists of an n-bit wide event-driven register, controlled by a Muller C-element similar to those in [7] and [8], whose outputs go to current drivers where the impedance matching occurs. Figure 5 depicts all other components in the block. Data is accepted from the left side, which can be a synchronous interface having a stretchable clock as described in [4]. Whenever $H_L$ is high, the synchronous logic will hold the data stable. However, the synchronous logic must supply a completion signal after each new data packet is sent along $D_{T1}$. The current drivers and active termination devices serve to match the transmission lines labeled a, b, and r. If the delay times are $\tau_0$ and $\tau_h$ for the data and request signals respectively, then the condition $\tau_0 \leq \tau_h$ must hold (bundling constraint).

**Receiver Block** – The receiver block contains n sense amplifiers which decode the n-bit signal received from the other end, as shown in Figure 6. The digital outputs of these latched amplifiers, $\nu_n(t-\tau_0)$, are stored in the event-driven registers.

Figure 4: Inet for the complete self-timed simultaneous bidirectional interface.
The top FIFO sends data from processor 2 to processor 1 (i.e., east-to-west). In doing so, place 10 receives a token, but transition $R_{ew}$ is enabled only if place 8 receives a token. Transition $R_{ew}$ fires after place 8 receives its token. This causes the sense amplifier control on the west side to execute its sequence of control pulses, which allow the proper operation of the sense amplifier. In a similar manner, the bottom FIFO will cause the sense amplifier control circuitry located on the east side to execute its sequence of control pulses.

Figure 5: Transmitter block showing the internal components. The truth table for the Muller C-element with one input inverted is given. Digital data is accepted from the left side as described in the pseudo-code, then encoded into two copies with different voltage levels. One copy is sent to another processor while the other is retained and sent to a local sense amplifier residing within the receiver block.

Figure 6: Receiver block showing its internal components. The sense amplifiers reconstruct the tri-valued superposed signals available on bus 'a' to obtain the final digital data transmitted from the processor at the far end. This data is then stored in the n-bit event-driven register and is available for access from the left side according to the described pseudo-code.

**Sense Amplifier Control** – A feedback control circuit consisting of an exclusive OR gate, two pulse generators, and a toggle circuit generates the required sense amplifier pulses. The sense amplifier circuit type can vary widely in performance specifications to meet specific trade-offs between power, area, noise margins, speed, and process variation effects. Figure 7 depicts the control architecture, where the control signals along the M, R, and L lines feed through the sense amplifiers to supply the required bundled pulses for data sampling.
The constraint relationship for the feedback loop can be expressed as

$$t_M \le t_{REQ} \tag{5}$$

where

$$t_M = t_{\Delta 1} + t_{\Delta 2} + t_{\varepsilon} \quad \text{and} \quad t_{\varepsilon} = t_{xor} + t_{toggle} \tag{6}$$

Figure 7: Sense amplifier control generator depicted along with the sequence of bundled pulses necessary to sample the data inputs.

### 2.3 Area Estimates

A first order area estimate of the described self-timed simultaneous bidirectional interface was performed for a 16-bit wide I/O bus. The building block components used for the sense amplifier control and FIFOs are similar to those in [8], while the other components are based on [1]. Table 1 lists the transistor count for a bidirectional bit for each of the synchronous and self-timed interfaces.

| Cell name      | Synchronous | Self-Timed |
|----------------|-------------|------------|
| icelement      |             | 64         |
| merge          |             | 52         |
| flipflop       | 80          | 128        |
| comparator     | 56          | 56         |
| analog buffers | 24          | 24         |
| reference      | 70          | 70         |
| toggle         |             | 75         |
| pulse Gen      |             | 64         |

Table 1: Number of transistors per bidirectional bit.

It can be observed that the self-timed control area overhead decays as the number of bits, n, increases. Table 2 formulates a relationship for the Area Penalty, AP, as a function of the number of interface I/O bits, n.

| Quantity                          | Transistor count        |
|-----------------------------------|-------------------------|
| Constant control overhead:        | 64 + 52 + 75 + 64 = 255 |
| Common transistors:               | 80 + 56 + 24 + 70 = 230 |
| Incremental increase in flipflop: | 128 - 80 = 48           |
| Area (self-timed):                | (230 + 48)n + 255       |
| Area (synchronous):               | 230n                    |

Table 2: Comparison of transistor counts for the synchronous and self-timed interfaces, built from the single-bit counts of Table 1. 'n' is the number of bits.
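The bookkeeping in Tables 1 and 2 can be reproduced with a short script. This is a sketch for checking the arithmetic only: the cell counts are copied from Table 1, the area penalty AP is taken to be the ratio of the self-timed to the synchronous transistor count from Table 2, and all Python names are hypothetical.

```python
# Transistor counts per cell from Table 1: (synchronous, self_timed).
# None marks a cell that the synchronous interface does not use.
TABLE_1 = {
    "icelement": (None, 64),
    "merge": (None, 52),
    "flipflop": (80, 128),
    "comparator": (56, 56),
    "analog buffers": (24, 24),
    "reference": (70, 70),
    "toggle": (None, 75),
    "pulse Gen": (None, 64),
}

# Control cells exist only in the self-timed design and are shared by all bits.
CONTROL_OVERHEAD = sum(st for sync, st in TABLE_1.values() if sync is None)

# Per-bit counts for the cells that both designs replicate for every bit.
COMMON_PER_BIT = sum(sync for sync, _ in TABLE_1.values() if sync is not None)
SELF_TIMED_PER_BIT = sum(st for sync, st in TABLE_1.values() if sync is not None)


def area_self_timed(n):
    """Transistor count of the self-timed interface for an n-bit bus."""
    return SELF_TIMED_PER_BIT * n + CONTROL_OVERHEAD


def area_synchronous(n):
    """Transistor count of the synchronous interface for an n-bit bus."""
    return COMMON_PER_BIT * n


def area_penalty(n):
    """Ratio of self-timed to synchronous transistor count."""
    return area_self_timed(n) / area_synchronous(n)


# Value approached as n grows: the per-bit ratio alone.
AP_BOUND = SELF_TIMED_PER_BIT / COMMON_PER_BIT

if __name__ == "__main__":
    for n in (1, 8, 16, 32, 64):
        print(n, round(area_penalty(n), 3))
    print("bound", round(AP_BOUND, 2))
```

For the 16-bit bus considered in the paper this gives 4703 transistors against 3680, a penalty of about 1.28, and the ratio keeps falling toward roughly 1.21 as n increases because the 255-transistor control overhead is amortized over more bits.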
The area penalty incurred by the self-timed interface can be expressed as

$$AP = \frac{278n + 255}{230n} \tag{7}$$

and, in the limit of large n, approaches a lower bound given by

$$AP_{bound} = \lim_{n \to \infty} \frac{278n + 255}{230n} = \frac{278}{230} \approx 1.21 \tag{8}$$

Relationship (7) is plotted in Figure 8.

Figure 8: Area penalty versus the number of I/O bits, n.

## 3. SIMULATION

The described interface was simulated using a quasi-analog mixed-signal Verilog approach for a 16-bit wide I/O bus. An approach similar to [9] was followed, where all the impedance matched lines were treated as digital vectors (vector-voltages) representing analog voltage values. Figure 9 summarizes the selected vector-voltage characteristics. The MSB is the sign bit and is 1 for positive voltage values. The LSB was chosen to achieve a resolution better than 8 mV when performing addition, subtraction, or comparison of the analog vector-voltages.

**Figure 9:** A vector-voltage of 9 bits was selected to represent the tri-valued and bi-valued signals carried by the impedance matched lines.

Figure 10 shows the four component types which processed the 9-bit vector-voltage. These components, from top to bottom, are the tri-valued transmission line (2.0 V, 2.5 V, 3.0 V), the sense amplifier (inputs a = 2.0 V, 2.5 V, 3.0 V and b = 2.25 V, 2.75 V), analog buffer type A (o = 2.0 V, 2.5 V, 3.0 V), and analog buffer type B (o = 2.25 V, 2.75 V). An assignment of delays was made to each of the analog and digital components in order to study the behavior of the interface. The expected sequencing of events was fully validated for a 16-bit wide self-timed interface.

Figure 10: The component types in Verilog which processed the 9-bit vector-voltage.

## 4. CONCLUSION

An architectural study of a self-timed simultaneous bidirectional interface which follows the 2-cycle bundled data protocol has been completed. The interface's sequence of events was validated using quasi-analog mixed-signal Verilog simulations.
A first order area penalty estimate, relative to a synchronous interface, was presented and shown to decay with increasing I/O bus width. Future work consists of circuit-level design and an investigation into the application of such an interface to multiprocessors.

## ACKNOWLEDGEMENTS

We would like to thank Richard North for interesting discussions on high-speed digital design and Kay-Cheng Chew for discussions on asynchronous interfacing.

## REFERENCES

- [1] K. Lam, L. R. Dennison, and W. J. Dally, "Simultaneous Bidirectional Signalling for IC Systems," in *Proc. IEEE ICCD*, pp. 430-433, October 1990.
- [2] T.-H. Meng, *Synchronization Design for Digital Systems*, Kluwer Academic Publishers, 1991.
- [3] S. Y. Kung, *VLSI Array Processors*, Prentice Hall, New Jersey, 1988.
- [4] D. M. Chapiro, "Globally-Asynchronous, Locally-Synchronous Systems," Ph.D. Dissertation, Computer Science Department, STAN-CS-1026, Stanford University, October 1984.
- [5] G. M. Jacobs, "Self-Timed Integrated Circuits for Digital Signal Processing," Ph.D. Dissertation, UCB/ERL Memorandum No. M89/128, University of California, Berkeley, November 1989.
- [6] J. M. Johnson, "Theory and Application of Self-Timed Integrated Systems Using Ternary Logic Elements," Ph.D. Dissertation, Electrical and Computer Engineering, University of California, Santa Barbara, December 1988.
- [7] I. E. Sutherland, "Micropipelines," in *Communications of the ACM*, Vol. 32, No. 6, pp. 720-738, June 1989.
- [8] I. E. Sutherland, R. F. Sproull, and I. Jones, "Standard Asynchronous Modules," *Technical Memo 4662*, Sutherland, Sproull and Associates, 1986.
- [9] M. K. Mayes, "Analog-Verilog Quasi Mixed-Signal Simulation Model for A/D Converters," in *Proc. of Mixed-Signal and Analog Conference*, pp. 113-122, November 1991.