Abstract— The most recent approaches proposed for high-speed dynamic pipelines are applicable only to linear datapaths. However, real systems have non-linear datapaths, i.e., stages may have multiple inputs (“joins”) or multiple outputs (“forks”). This paper presents several new pipeline templates that extend existing high-speed approaches for linear dynamic logic pipelines by providing efficient control structures that can accommodate forks and joins. In addition, constructs for conditional computation are introduced. Timing analysis and SPICE simulations show that the performance overhead of these extensions is fairly low (5% to 20%).
1. INTRODUCTION
High-speed asynchronous design is increasingly becoming an attractive alternative to full-custom synchronous design because of its freedom from clock distribution and clock skew problems, and because it naturally provides robust interfaces to slower components. Existing ultra-high-speed designs, however, rely on very aggressive timing assumptions that introduce stringent transistor sizing requirements and place high demands on post-layout verification. In recent work, Singh and Nowick have proposed several high-speed dynamic logic pipeline templates, as well as high-speed static logic pipeline templates, that achieve comparable performance with much less stringent timing assumptions. In addition, an initial approach to handling slow or stalled environments for the limited case of linear pipelines has been proposed. However, the synchronization problems that arise when using arbitrary forks and joins are much more complex and challenging, and these approaches do not address them. This paper attempts to fill this void and builds upon one representative each of single-rail (LPSR2/2) and dual-rail (LP3/1) lookahead pipelines, and also upon the single-rail high-capacity pipeline (HC). The ideas presented here, however, can be easily adapted to the remaining styles.
The remainder of this paper is organized as follows.
Section 2 gives background on single-rail and dual-rail datapaths and reviews some of the basic linear pipelines on which this work builds. Section 3 gives an overview of some of the challenges involved in the design of dynamic pipelines.
Sections 4-6 present the new dynamic designs in detail, including their protocols, implementation, and timing analysis. Extensions to handle conditional computation are proposed in Section 7 and, finally, experimental results and conclusions are given in Sections 8 and 9.
2. BACKGROUND
This section first gives background on commonly used start-stop data representation schemes. It then reviews three start-stop pipelining styles: (i) LPSR2/2, a single-rail lookahead pipeline; (ii) LP3/1, a dual-rail lookahead pipeline; and (iii) HC, the high-capacity pipeline.
2.1 Bundled Data vs. Dual-Rail Encoding
One common paradigm of start-stop system design is to decompose the system into functional units that communicate data via channels, as shown in Fig 2.1(a). In these channels, data can be encoded in many ways. In the single-rail encoding scheme, one wire per bit is used to transmit data, and an associated request line is used to indicate data validity, as shown in Fig 2.1(b). The associated channel is called a bundled-data channel. Alternatively, in dual-rail encoding, the data is sent using two wires for each bit of information, as shown in Fig 2.1(c). Extensions to 1-of-N and M-of-N encoding also exist. Both single-rail and dual-rail encoding schemes are commonly used, and there are tradeoffs between them. Dual-rail encoding allows data validity to be indicated by the data itself. Single-rail, in contrast, requires the associated request line that is driven by a matched delay line