VLSI Circuits and Systems III

Front Matter: Volume 6590

This PDF file contains the front matter associated with SPIE Proceedings Volume 6590, including the Title Page, Copyright information, Table of Contents, Introduction (if any), and the Conference Committee listing.

High-level power estimation for digital system

Yaseer A. Durrani, Ana Abril, Teresa Riesgo

In this paper, we present a high-level power macromodeling technique at the register transfer level (RTL). The proposed methodology estimates the power dissipation of digital systems composed of intellectual property (IP) macro-blocks by using statistical knowledge of their primary inputs. During the power estimation procedure, an input stream sequence is generated from the input metrics. Then, a Monte Carlo zero-delay simulation is performed and a power dissipation macromodel function is built from the power dissipation results. From then on, this macromodel function can be used to estimate the power dissipation of the system using only the statistics of the IPs' primary inputs. In our experiments with the test IP system, the average error is 29.63%.
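The flow described in this abstract (generate input streams from input statistics, run a zero-delay simulation, fit a macromodel) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the IP block is a toy 8-bit adder, the only input metric is the per-bit transition density, the power proxy is the average number of output bits toggling per cycle, and the macromodel is a straight line fitted by least squares.

```python
import random

def simulate_toggles(density, cycles=2000, width=8, seed=0):
    """Zero-delay simulation of a toy IP block (hypothetical: a width-bit adder).

    Returns the average number of output bits that toggle per cycle,
    used here only as a stand-in for dynamic power.
    """
    rng = random.Random(seed)
    mask = (1 << width) - 1
    a = rng.getrandbits(width)
    b = rng.getrandbits(width)
    prev = (a + b) & mask
    toggles = 0
    for _ in range(cycles):
        # Flip each primary-input bit independently with probability `density`.
        for i in range(width):
            if rng.random() < density:
                a ^= 1 << i
            if rng.random() < density:
                b ^= 1 << i
        out = (a + b) & mask
        toggles += bin(out ^ prev).count("1")
        prev = out
    return toggles / cycles

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Characterization: Monte Carlo runs at several input transition densities.
densities = [0.1, 0.2, 0.3, 0.4, 0.5]
powers = [simulate_toggles(d, seed=k) for k, d in enumerate(densities)]
intercept, slope = fit_line(densities, powers)

def macromodel(density):
    """Estimate the power proxy from input statistics alone, without simulation."""
    return intercept + slope * density
```

Once fitted, `macromodel(0.25)` gives an estimate for an unseen input statistic without another simulation run; a real macromodel would typically use more than one input metric.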

Crosscoupling power optimal wire spacing in quasilinear runtime

Paul Zuber, Thomas Ilnseher, Walter Stechele

A new quasilinear algorithm for solving the crosscoupling power optimal wire spacing problem is developed. In contrast to state-of-the-art solutions, the proposed method not only guarantees optimality of the solution, but also achieves improvements of more than five orders of magnitude in runtime. In addition, the algorithm is modified to river-route the wire endings to their initial positions, allowing it to optimize the wire topology of entire detail-routed standard cell circuits. Extensive replicable experiments assess the effectiveness of the methods for a wide range of real-world circuit examples, for which the wire switching power is reduced locally by up to 50% and chip-wide by up to 8.3%.

Partitioning and characterization of high speed adder structures in deep-submicron technologies

Adrián Estrada, Gashaw Sassaw, Carlos J. Jiménez, et al.

The availability of higher performance (in area, time and power consumption) and greater precision binary adders is a constant requirement in digital systems. Consequently, the design and characterization of adders and, most of all, their adaptation to the requisites of present-day deep-submicron technologies, are today still issues of concern. The binary adder structures in deep-submicron technologies must be revised to achieve the best balance between the number of bits in the adder and its delay, area and power consumption. It is therefore very important to make an effort to carefully optimize adder structures, thus obtaining improvements in digital systems. This communication presents the optimization of adder structures for implementations in deep-submicron technologies through their partitioning into blocks. This partitioning consists of dividing the number of input bits to the adder into several subsets of bits that will constitute the inputs to several adder structures of the same or of different types. The structures used to accomplish this study range from the more traditional types, such as the carry look ahead adder, the ripple carry adder or the carry select adder, to more innovative kinds, like the parallel prefix adders of the type proposed by Brent-Kung, Han-Carlson, Kogge-Stone or Ladner-Fischer. The analyses carried out allow the characterization of structures implemented in deep-submicron technologies for area, delay and power consumption parameters.
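The partitioning idea in this abstract, splitting the input bits into subsets that feed separate adder blocks, can be sketched in Python at the bit level. This is only a functional illustration under stated assumptions: every block is modelled as a ripple carry adder, the function names and the `block_widths` parameter are hypothetical, and no area, delay or power figures are modelled.

```python
def ripple_block(a, b, carry_in, width):
    """Add two width-bit operands bit by bit, as a ripple carry adder would."""
    total = 0
    carry = carry_in
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total, carry

def partitioned_add(a, b, block_widths):
    """Add a and b with one adder block per entry of block_widths (LSB block first)."""
    result = 0
    carry = 0
    shift = 0
    for width in block_widths:
        mask = (1 << width) - 1
        block_sum, carry = ripple_block((a >> shift) & mask,
                                        (b >> shift) & mask,
                                        carry, width)
        result |= block_sum << shift
        shift += width
    # The carry out of the last block becomes the top result bit.
    return result | (carry << shift)
```

For example, `partitioned_add(200, 100, [4, 4])` splits an 8-bit addition into two 4-bit blocks. In a real design each block could be a different adder type, which is where the trade-offs studied in the paper would appear.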

Power-driven FPGA to ASIC conversion

WenHai Fang, Lambert Spaanenburg

Gate arrays are often presented as a convenient means for ASIC prototyping. Obviously, both can perform the same function and can therefore be built from the same behavioral description. Design development implies a process of successive parameter bindings, leaving steadily less freedom for the remaining implementation choices. On the other hand, the ASIC offers more place & route freedom than the gate array. Hence it is commonly suggested that an optimal prototype will always have an acceptable ASIC realization. But this does not make the gate array an easy stepping-stone in ASIC development. Differences in platform technology induce a different structural sugaring to achieve a reasonable implementation. This cannot easily be ported unless the implementation is developed while keeping the restrictions of the other technology in mind. This implies that a number of scaling rules must form the foundation of the design transformation process. This paper looks into the platform commonalities of Field-Programmable Gate Arrays and standard-cell ASICs from fundamental physical principles. These basic considerations are then related to show how the area and speed restrictions in logic synthesis can be applied to carry power-efficient designs efficiently from prototype to realization. This is illustrated in the design of the SNOW-2 encryption core, where a consistent 38% power reduction is achieved.

Automatic logic synthesis for parallel alternating latches clocking schemes

D. Guerrero, M. Bellido, J. Juan, et al.

This paper proposes a VHDL coding technique that allows for the automatic synthesis of digital circuits using the so called Parallel Alternating Latches Clocking Schemes (PALACS). The proposed method greatly improves the applicability of PALACS and its benefits. This technique is verified through design examples in three different CMOS processes and using logic level simulation, with successful results in all the cases.

Effects of buffer insertion on the average/peak power ratio in CMOS VLSI digital circuits

Antonio J. Acosta, José M. Mora, Javier Castro, et al.

Buffer insertion has been widely used to increase the performance of advanced VLSI digital circuits and systems. The driver or repeater used for this purpose affects the timing characteristics of the signal on the wire, such as propagation delay, signal integrity and transition time, among others. The power concerns related to buffering have also received much attention because of the low-power requirements of modern integrated systems. In the same way, buffer insertion has a strong impact on the reliability of synchronous systems, since proper clock distribution requires reduced or controlled clock skew, which makes buffer and wire sizing a crucial aspect. Buffer insertion has also been used to reduce noise generation, especially in heavily loaded nets, since the inclusion of buffers helps to desynchronize signal transitions. However, the inclusion of buffers or inverters to improve one or more of these characteristics often has a negative effect on other parameters, as happens with the average and peak supply current. In particular, including a buffer to reduce noise (peak power) by desynchronizing transitions could introduce more dynamic consumption while reducing the short-circuit current because of the increased signal slope. Thus, average/peak current optimization can be considered a design trade-off. In this paper, the mechanisms for obtaining an average/peak power optimization procedure are presented. Selected examples show the feasibility of minimizing switching noise with negligible impact on average power consumption.

HEAPAN: a high-level computer architecture analysis tool

Dionisio D. Peñalosa, Carlos J. Jiménez, Manolo Valencia

The non-stop advance of computer architectures and their wide variety, not only in types (pipeline, super pipeline, etc.) but also in application fields, as well as their high cost from conception to the implementation, make it necessary to have tools that, on the one hand, help to design and to evaluate processors comfortably, and, on the other, link up with commercial design flows of microelectronics circuits (in FPGA technologies or VLSI Deep-submicron). This work presents HEAPAN, a tool for the high level design and evaluation of processors. The main idea of this tool is to be able to describe a processor at high level easily and cheaply in order to compare different architectural options. Behavioural verifications are done at RT level, with descriptions automatically generated by HEAPAN. For the development of HEAPAN, a study of the most important distinctive features of major recent commercial processors has been carried out, and the most relevant blocks that make up these architectures have been extracted. These blocks have been implemented as functional units of the tool. In this way the construction of a processor with HEAPAN basically consists of selecting and interconnecting those blocks. Finally the validity of the developed tool has been tested through the design of a simple processor, verifying its behaviour and implementing it in a deep-submicron technology.

MPEG-4 ASP SoC receiver with novel image enhancement techniques for DAB networks

D. Barreto, A. Quintana, L. García, et al.

This paper presents a system for real-time video reception in low-power mobile devices using Digital Audio Broadcast (DAB) technology for transmission. A demo receiver terminal is implemented on an FPGA platform using the Advanced Simple Profile (ASP) MPEG-4 standard for video decoding. In order to meet the demanding DAB requirements, the bandwidth of the encoded sequence must be drastically reduced. To this end, a pre-processing stage is performed prior to the MPEG-4 coding stage. It consists first of a segmentation phase according to motion and texture, based on the Principal Component Analysis (PCA) of the input video sequence, and second of a down-sampling phase, which depends on the segmentation results. The segmentation task yields a set of texture and motion maps. These motion and texture maps are also included in the bit-stream as user data side-information and are therefore known to the receiver. For all bit-rates, the whole encoder/decoder system proposed in this paper exhibits higher visual image quality than the alternative encoding/decoding method, assuming equal image sizes. A complete analysis of both techniques has also been performed to provide the optimum motion and texture maps for the global system, which has finally been validated for a variety of video sequences. Additionally, an optimal HW/SW partition for the MPEG-4 decoder has been studied and implemented on a Programmable Logic Device with an embedded ARM9 processor. Simulation results show that a throughput of 15 QCIF frames per second can be achieved with a low-area and low-power implementation.

Toward the implementation of a baseline H.264/AVC decoder onto a reconfigurable architecture

S. López, A. Kanstein, J. F. López, et al.

The decoding of an H.264/AVC bitstream is a complex and time-consuming task. For this reason, efficient implementations in terms of performance and flexibility are mandatory for real-time applications. This paper presents the mapping of the motion compensation and deblocking filtering stages onto a coarse-grained reconfigurable architecture named ADRES (Architecture for Dynamically Reconfigurable Embedded Systems). The results obtained show a considerable reduction in the number of cycles and memory accesses needed to perform the motion compensation, as well as an increase in the degree of parallelism, when compared with an implementation on a dedicated Very Long Instruction Word (VLIW) processor.

Accelerating a MPEG-4 video decoder through custom software/hardware co-design

Jorge L. Díaz, Dacil Barreto, Luz García, et al.

In this paper we present a novel methodology to accelerate an MPEG-4 video decoder using software/hardware co-design for wireless DAB/DMB networks. Software support includes the services provided by the embedded kernel μC/OS-II and the application tasks mapped to software. Hardware support includes several custom co-processors and a communication architecture with bridges to the main system bus and with a dual-port SRAM. Synchronization among tasks is achieved at two levels: by a hardware protocol and by kernel-level scheduling services. Our reference application is an MPEG-4 video decoder composed of several software functions and written using a special C++ library named CASSE. Profiling and space exploration techniques were previously applied to the Advanced Simple Profile (ASP) MPEG-4 decoder to determine the best HW/SW partition, which is developed here. This research is part of the ARTEMI project. Its main goal is to establish methodologies for the design of complex real-time digital systems, using Programmable Logic Devices with embedded microprocessors as the target technology and the design of multimedia systems for broadcasting networks as the reference application.

Optimizing coarse-grain reconfigurable hardware utilization through multiprocessing: an H.264/AVC decoder example

Andreas Kanstein, Sebastian López Suárez, Bjorn De Sutter

Coarse-grained reconfigurable architectures offer high execution acceleration for code with high instruction-level parallelism (ILP), typically for large kernels in DSP applications. However, for applications with a larger share of control code and many smaller kernels, as found in modern video compression algorithms, the achievable acceleration through ILP is significantly reduced. We introduce a multi-processing extension to the coarse-grained reconfigurable architecture ADRES (Architecture for Dynamically Reconfigurable Embedded Systems) to deal with this kind of application by enabling it to exploit thread-level parallelism (TLP). This extension consists of partitioning an ADRES array into non-overlapping parts, where every partition can execute a processing thread independently, or a processing thread can be assigned to hierarchically combined partitions that provide a larger number of resources. Because the combination of partitions can be changed dynamically, this extension provides more flexibility than a multi-core approach. This paper discusses the architecture and an exploration of how to partition a given array for executing an H.264/AVC baseline decoder.

Low-voltage low-power reference circuits for an autonomous robot: I-SWARM

J. Colomer, A. Saiz-Vela, P. Miribel-Català, et al.

This paper presents the design of the power supply system for an autonomous robot of a few mm³ called I-SWARM (Intelligent Small World Autonomous Robots for Micro-manipulation). The system is based on a low-dropout regulator (LDO) and a bandgap reference circuit (BG) that has been designed for the LDO. The paper presents the design, stability issues and full Monte Carlo studies of the performance of the BG circuit and the LDO regulator for different temperature and supply conditions. The regulator has been developed to supply the required voltage for the electronics of the robot, which will be tested in the near future. The architecture of the BG is based on a peaking current mirror circuit with MOSFET transistors working in the sub-threshold region. This architecture is attractive because it offers a good trade-off between performance, area and power dissipation. These circuits have been designed in a 0.13 μm technology from ST Microelectronics through the CMP-TIMA service.

Low-voltage CMOS variable preamplifier for fiber-based gigabit ethernet

J. M. García del Pozo, S. Celma, C. Aldea, et al.

In this paper we present a low-voltage preamplifier destined for optical-fiber communication front-ends in the standard Gigabit Ethernet. Designed in a low-cost 0.35 μm CMOS technology, the circuit can work with a single 1.8 V supply voltage, consumes only 6.2 mW and exhibits a tunable transimpedance from 50 to 65 dBΩ with bit rates up to 1.5 Gb/s.

Design of clock recovery circuits for optical clocking in DSM CMOS

Charles Thangaraj, Kevin Stephenson, Tom Chen, et al.

CMOS technology scaling, especially in the sub-100 nm regime, has made signaling in long global interconnects a challenge, resulting in a need for an improved interconnect technology. Optical signalling is a promising alternative to existing global interconnects and alleviates the interconnect bottleneck. This paper presents the design of a CMOS trans-impedance amplifier (TIA) intended for a truly CMOS-compatible on-chip optical clock distribution system. This TIA employs a replica biasing technique to achieve stability while maximizing its bandwidth and gain. The design was implemented in a 0.35 μm CMOS process and is currently under probe testing. The simulation results show that the design achieved a bandwidth of 1 GHz and a gain of 128 dB-Ω. Extensive Monte Carlo simulations indicate superior stability under a variety of process and environmental variations.

A study of mismatch in adaptive programmable CMOS sensor compensation circuits

G. Zatorre, N. Medrano, M. T. Sanz, et al.

This paper presents a study of mismatch effects in a digitally programmable analogue processor designed for small embedded applications. Circuit programmability allows for its adaptation to deviations in circuit operation or environmental effects. Starting from circuit simulation data, the system-level operation is modeled, showing its robustness to circuit mismatch. Simulation results of the proposed processor applied to compensate the response of a sinusoidal sensor and its robustness to mismatch are presented.

Ultra low power switched current finite impulse response filter banks realized in CMOS 0.18 um technology

Rafał Długosz

Ultra low power circuits are in high demand in many applications, especially in wireless sensor networks (WSN), where energy is scavenged from the environment. WSN systems contain different blocks, such as sensors, filters, analog-to-digital converters, very often a simple processor, and the RF front-end block. This paper concerns ultra low power finite impulse response (FIR) filters and filter banks implemented in a switched current (SI) technique. New SI FIR filter structures and filter banks are proposed. These circuits operate in the current mode and do not use operational amplifiers, which enables very low power dissipation on the level of several μW. The proposed filters incorporate transistors working below the threshold level for a supply voltage in the range 0.5 - 0.7 V. The simulated attenuation in the stopband of the frequency response is limited to about 45 dB, which is due to various nonidealities, but such a value is usually sufficient in WSN applications. The SI technique features many interesting mechanisms that simplify the realization of analog filter banks. In SI filters, the signal samples stored in the delay line are copied to the filter coefficients using current mirrors. As a result, many sets of filter coefficients can be connected to a single delay line without a speed limitation. Ultra low power operation of the proposed filters is also possible due to a special structure of the clock generator, which consists only of switches and NOT gates.
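The idea of connecting several coefficient sets to one delay line can be shown with a small discrete-time Python model. This is a behavioural sketch only: the class name and its methods are hypothetical, the arithmetic is ideal floating point, and none of the switched-current circuit effects (mirror mismatch, limited stopband attenuation) are modelled.

```python
class SharedDelayLineBank:
    """FIR filter bank in which every filter reads the same delay line."""

    def __init__(self, coefficient_sets):
        self.sets = [list(c) for c in coefficient_sets]
        # One delay line, long enough for the longest filter.
        self.delay = [0.0] * max(len(c) for c in self.sets)
        self.head = 0  # index of the newest sample

    def step(self, sample):
        """Store one new sample and return one output per coefficient set."""
        n = len(self.delay)
        self.head = (self.head - 1) % n
        self.delay[self.head] = sample  # single write per input sample
        outputs = []
        for coeffs in self.sets:
            acc = 0.0
            for k, c in enumerate(coeffs):
                # Tap k reads the sample that arrived k steps ago.
                acc += c * self.delay[(self.head + k) % n]
            outputs.append(acc)
        return outputs
```

Feeding an impulse, for example `[bank.step(x) for x in [1, 0, 0, 0]]` with `bank = SharedDelayLineBank([[1, 2, 3], [0.5, 0.5]])`, returns each filter's coefficients in turn, which is a quick way to check that all filters see the same stored samples.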

IP-based design reuse for analog systems

Timothée Levi, Jean Tomas, Noëlle Lewis, et al.

The design flow of Analog and Mixed Signal systems has to be improved. For a specific application, we propose a definition of the IP content and the structure of an IP-based library. The case study consists of the neuron-level integration of a complete system that emulates spiking neural networks. As is often the case, the development of the analog part of the system requires the largest amount of time, due to the lack of formalism and automation in that domain. One solution to accelerate the analog design cycle is to reuse already designed blocks and accumulated design knowledge, which can be captured by the IP (Intellectual Property) concept. Indeed, about ten years of experience and 19 designed ASICs now give an accurate idea of the system hierarchy and the recurrent analog blocks, which is the basis of IP-based design. We describe the IP-based library that has been developed for this specific application domain and show how it can be used to accelerate the design cycle of the next ASIC generation.

A fully integrated folded mixer in CMOS 0.35 µm technology for 802.11a WIFI applications

J. del Pino, R. Díaz, M. Afonso, et al.

In recent years, the wireless market has shown incredible growth, exceeding expectations. This paper presents a fully integrated folded mixer in a BiCMOS 0.35 μm technology for the 5 GHz band, according to the IEEE 802.11a WIFI standard. To enable a comparison, two designs are presented: a folded mixer and a classical Gilbert cell. In both designs all passive devices are on chip, including integrated inductors which have been designed by electromagnetic simulations. This work demonstrates the improvement in gain and linearity of a folded mixer compared to a classical Gilbert topology, at the expense of a small increase in power consumption. This implies that, unlike the Gilbert mixer, the folded topology would still perform well in a low-voltage application.

Flexible and low power binary-tree current mode min/max nonlinear filters realized in CMOS technology

R. Długosz, T. Talaśka

In this paper we present current mode, programmable, binary tree MIN/MAX filters designed for nonlinear data processing. The proposed circuits can be used in image filtering to realize operations such as erosion or dilation, which are useful in noise reduction or correction of objects in the images. Two kinds of filters are proposed. The first one has been designed for 1-dimensional (1-D) signal processing. Samples of the input signal are stored in a circular analog delay line. Each sample remains in its fixed position in the delay line until it is overwritten by a new sample after a number of clock phases equal to the filter order N. As a result, only one analog delay element is updated with every new signal sample. This minimizes both the power dissipation and the errors that in other types of filter structures are associated with data rewriting. The 2-D filters proposed in this paper are the natural extension of the 1-D filters. These filters have been realized as universal 2-D structures, which can be easily reprogrammed to perform various nonlinear operations. The experimental 2-D image processor with 64 inputs (8x8 cluster) has been designed in CMOS 0.18 μm technology and successfully tested in HSPICE simulations. The designed circuit enables parallel calculation of 64 pixels at a rate of 500 thousand image frames per second, dissipating about 20 μW of power. The resultant data rate is therefore 32 MSamples/s, and the energy consumed per calculated pixel is about 1 pJ.
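The 1-D structure described in this abstract, a circular delay line in which only one cell is overwritten per sample, followed by a binary tree of MIN or MAX elements, can be modelled behaviourally in Python. The function names are hypothetical, the delay line is pre-filled with the first sample (an assumption about start-up behaviour), and a non-empty input is assumed.

```python
def tree_reduce(values, op):
    """Reduce values pairwise, level by level, like a binary tree of MIN/MAX cells."""
    level = list(values)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes through to the next level
        level = nxt
    return level[0]

def min_max_filter(signal, order, op=min):
    """Running MIN (erosion) or MAX (dilation) over the last `order` samples."""
    delay = [signal[0]] * order  # assumed start-up state: filled with the first sample
    write = 0
    out = []
    for sample in signal:
        delay[write] = sample          # only one delay cell is updated per sample
        write = (write + 1) % order    # circular write pointer
        out.append(tree_reduce(delay, op))
    return out
```

Passing `op=min` gives a 1-D erosion and `op=max` a dilation; the other stored samples never move, which mirrors the fixed-position storage the abstract describes.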

Architectural design for a low cost FPGA-based traffic signal detection system in vehicles

Ignacio López, Rubén Salvador, Jaime Alarcón, et al.

In this paper we propose an architecture for an embedded traffic signal detection system. Development of Advanced Driver Assistance Systems (ADAS) is one of the major current trends in automotive research. Examples of past and ongoing projects in the field are CHAMELEON ("Pre-Crash Application all around the vehicle" IST 1999-10108), PREVENT (Preventive and Active Safety Applications, FP6-507075, http://www.prevent-ip.org/) and AVRT in the US (Advanced Vision-Radar Threat Detection (AVRT): A Pre-Crash Detection and Active Safety System). There is major interest in systems for real-time analysis of complex driving scenarios, evaluating risk and anticipating collisions. The system will use a low-cost CCD camera on the dashboard facing the road. The images will be processed by an Altera Cyclone family FPGA. The board performs median and Sobel filtering of the incoming frames at PAL rate and analyzes them for several categories of signals. The result is conveyed to the driver. The scarce resources provided by the hardware require an architecture developed for optimal use. The system will use a combination of neural networks and an adapted blackboard architecture. Several neural networks will be used in sequence for image analysis by reconfiguring a single, generic hardware neural network in the FPGA. This generic network is optimized for speed in order to allow several executions within the frame period. The sequence will follow the execution cycle of the blackboard architecture. The global blackboard architecture being developed and the hardware architecture for the generic, reconfigurable FPGA perceptron are explained in this paper. The project is still at an early stage. However, some hardware implementation results are already available and are presented in the paper.

Hand veins feature extraction using DT-CNNS

Suleyman Malki, Lambert Spaanenburg

As the identification process is based on the unique patterns of the users, biometric technologies are expected to provide highly secure authentication systems. The existing systems using fingerprints or retina patterns are, however, very vulnerable. One's fingerprints are accessible as soon as the person touches a surface, while a high-resolution camera easily captures the retina pattern. Thus, both patterns can easily be "stolen" and forged. Besides, technical considerations decrease the usability of these methods. Due to the direct contact with the finger, the sensor gets dirty, which decreases the authentication success ratio. Aligning the eye with a camera to capture the retina pattern is uncomfortable. On the other hand, vein patterns of either the palm of the hand or a single finger offer stable, unique and repeatable biometric features. A fingerprint-based identification system using Cellular Neural Networks has already been proposed by Gao. His system covers all stages of a typical fingerprint verification procedure, from Image Preprocessing to Feature Matching. This paper performs a critical review of the individual algorithmic steps. Notably, the operation of False Feature Elimination is applied only once instead of 3 times. Furthermore, the number of iterations is limited to 1 for all templates used. Hence, the computational need of the feedback contribution is removed. Consequently, the computational effort is drastically reduced without a notable change in quality. This allows a full integration of the detection mechanism. The system is prototyped on a Xilinx Virtex II Pro P30 FPGA.

Real-time lane detector hardware system

Pedro Cobos Arribas, Felipe Jiménez Alonso

This paper presents a design adapting the Kalman Filter to the vehicle system domain and Field Programmable Logic technology. The system is intended for the detection of road lines from visual information provided by a low-cost monochrome camera, with real-time response requirements and good results in real scenarios (secondary roads, rain, damaged or occluded road lines, etc.). The paper describes how the original algorithm is mapped to a real-time hardware vision system for vehicle applications that includes a low-cost FPGA processing system and a camera. It also illustrates how the required tasks have been implemented on the FPGA under the restrictions of the logic architecture, and mentions ways in which overall performance will be increased.

FPGA realization of a split radix FFT processor

Jesús García, Juan A. Michell, Gustavo Ruiz, et al.

Applications based on the Fast Fourier Transform (FFT), such as signal and image processing, require high computational power, plus the ability to choose the algorithm and the architecture that implements it. This paper explains the realization of a Split Radix FFT (SRFFT) processor based on a pipeline architecture previously reported by the same authors. The basic building blocks of this architecture are a Complex Butterfly and a Delay Commutator. The main advantages of this architecture are: (1) it combines the higher parallelism of the 4r-FFTs with the possibility of processing sequences whose length is any power of two; (2) the simultaneous operation of multipliers and adder-subtracters implicit in the SRFFT leads to faster operation at the same degree of pipelining. The implementation has been made on a Field Programmable Gate Array (FPGA) as a way of obtaining high performance at an economical price and with a short realization time. The Delay Commutator has been designed to be customized for even and odd SRFFT computation levels. It can be used with segmented arithmetic of any level of pipelining in order to increase the operating frequency. The processor has been simulated up to 350 MHz, with an Altera Stratix II EP2S15F672C3 as the target device, for a transform length of 256 complex points.
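For readers unfamiliar with the split-radix decomposition named in this abstract, the following Python sketch shows the textbook recursive form: one half-length transform of the even-indexed samples and two quarter-length transforms of the samples at indices 1 mod 4 and 3 mod 4. It is a software illustration only and does not model the pipelined Complex Butterfly and Delay Commutator hardware of the paper; the function name is hypothetical, and the input length is assumed to be a power of two.

```python
import cmath

def split_radix_fft(x):
    """Recursive split-radix FFT (decimation in time); len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n == 2:
        return [complex(x[0] + x[1]), complex(x[0] - x[1])]
    even = split_radix_fft(x[0::2])   # length n/2, radix-2 style branch
    odd1 = split_radix_fft(x[1::4])   # length n/4, samples at 4m + 1
    odd3 = split_radix_fft(x[3::4])   # length n/4, samples at 4m + 3
    out = [0j] * n
    quarter = n // 4
    for k in range(quarter):
        # Twiddle factors W^k and W^(3k) with W = exp(-2*pi*j/n).
        t1 = cmath.exp(-2j * cmath.pi * k / n) * odd1[k]
        t3 = cmath.exp(-2j * cmath.pi * 3 * k / n) * odd3[k]
        s = t1 + t3
        d = -1j * (t1 - t3)
        out[k] = even[k] + s
        out[k + 2 * quarter] = even[k] - s
        out[k + quarter] = even[k + quarter] + d
        out[k + 3 * quarter] = even[k + quarter] - d
    return out
```

The result can be compared against a direct DFT sum for small lengths to confirm the butterfly signs; the sum `t1 + t3` and the difference `t1 - t3` are computed once and reused for two outputs each, which is where the savings over plain radix-2 come from.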

Exploring system interconnection architectures with VIPACES: from direct connections to NoCs

Armando Sánchez-Peña, Pedro P. Carballo, Antonio Núñez

This paper presents a simple environment for the verification of AMBA 3 AXI systems in Verification IP (VIP) production called VIPACES (Verification Interface Primitives for the development of AXI Compliant Elements and Systems). These primitives are provided as an uncompiled library written in SystemC, with interfaces at the core of the library. Defining interfaces instead of generic modules lets users construct custom modules, reducing the resources spent during the verification phase and making it easy to adapt their modules to the AMBA 3 AXI protocol. This topic is the main point of discussion regarding the VIPACES library. The paper focuses on comparing and contrasting the main interconnection schemes for AMBA 3 AXI as modeled by VIPACES. To assess these results we propose a validation scenario with a particular architecture from the domain of MPEG4 video decoding, which is composed of an AXI bus connecting an IDCT and other processing resources.

Automatic synthesis of zero-aliasing space compactors with application to testing of embedded IP cores

José M. Solana, Javier Frechoso

This paper presents a set of software tools for the synthesis of structure-independent single-output space compactors with application to combinational or scan-based digital circuits. The synthesized compactor compresses the test responses of a circuit under test (CUT) into a periodic single-output data stream with guaranteed zero-aliasing. The compactor is designed using knowledge of the expected fault-free responses of the circuit, which makes it particularly suitable for intellectual property (IP) cores, whose internal structure is frequently unknown. The space compactor compares the actual response of the circuit at all of its functional outputs with the expected hardware-generated responses. When the circuit is fault-free, the successive responses produce an alternating sequence of high and low levels at the single output of the compactor. This periodicity of the response is broken in the presence of a fault. Using this compactor, only one output is required to check the response of the combinational logic of the circuit. Moreover, the characteristics of the output make storing test responses unnecessary, thus reducing the amount of test data. The sole input required by the set of tools developed is the set of test patterns generated for the circuit and the expected fault-free responses. When the internal structure of the circuit is known, only the patterns must be provided. The tools generate as output a high-level synthesizable VHDL description of the complete space compactor. External tools such as the well-known espresso or sis have been used to minimize the amount of logic or the number of logic levels of the compactor.
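The checking principle described in this abstract, a single output that alternates while the responses match the expected ones and loses its periodicity when a fault appears, can be mimicked with a small behavioural model in Python. This is an assumption-laden sketch: the function names are hypothetical, responses are modelled as tuples of bits, and the comparison is an explicit equality check, whereas the paper's tools synthesize minimized logic whose structure is not given in the abstract.

```python
def compact(actual_stream, expected_stream):
    """Return the single-output bit stream of a behavioural compactor model.

    For each test cycle the output is the alternating bit (cycle % 2),
    inverted whenever any CUT output differs from its expected value.
    """
    outputs = []
    for cycle, (actual, expected) in enumerate(zip(actual_stream, expected_stream)):
        mismatch = int(any(a != e for a, e in zip(actual, expected)))
        outputs.append((cycle % 2) ^ mismatch)
    return outputs

def is_alternating(outputs):
    """True when the output stream keeps the fault-free 0, 1, 0, 1, ... pattern."""
    return all(bit == cycle % 2 for cycle, bit in enumerate(outputs))
```

In this model any mismatching response inverts the output bit of its cycle, so a faulty response can never reproduce the fault-free stream; the tester only needs to watch one pin for a break in the alternation.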

Design automation techniques for high-resolution current folding and interpolating CMOS A/D converters

D. Gevaert

Show abstract

The design and testing of a 12-bit Analog-to-Digital (A/D) converter, in current mode, arranged in an 8-bit LSB and a 4-bit MSB architecture, together with the integration of specialized test building blocks on chip, allows the setup of a design automation technique for current folding and interpolating CMOS A/D converter architectures. The presented design methodology focuses on the automation of CMOS A/D building blocks in a flexible target current folding and interpolating architecture for a downscaling technology and for different quality specifications. With a comprehensive understanding of all sources of mismatch in the crucial building blocks and the use of physically based mismatch modeling to predict mismatch errors, a more adequate and realistic sizing of all transistors results in an overall area reduction of the A/D converter. In this design the folding degree is 16, the number of folders is 64 and the interpolation level is 4. The number of folders is reduced by creating intermediate folding signals with a 4-level interpolator based on current division techniques. Current comparators detect the zero-crossing between the differential folder output currents. The outputs of the comparators deliver a cyclic thermometer code. The digital synthesis part for the decoding and error correction building blocks is a standardized digital standard cell design. The basic building blocks in the target architecture were designed in 0.35 μm CMOS technology; they are suitable for topological reuse and are automatically downscaled into a 0.18 μm CMOS technology.

Toward systematic design of multi-standard converters

V. J. Rivas, R. Castro-López, A. Morgado, et al.

Show abstract

In the last few years, we have witnessed the convergence of more and more communication capabilities into a single terminal. A basic component of these communication transceivers is the multi-standard Analog-to-Digital Converter (ADC). Many systematic, partially automated approaches for the design of ADCs dealing with a single communication standard have been reported. However, most multi-standard converters reported in the literature follow an ad-hoc approach, which guarantees neither an efficient occupation of silicon area nor power efficiency in the different standards. This paper aims at the core of this problem by formulating a systematic design approach based on the following key elements: (1) Definition of a set of metrics for reconfigurability: impact on area and power consumption, design complexity and performance; (2) Definition of the reconfiguration capabilities of the component blocks at different hierarchical levels, with assessment of the associated metrics; (3) Exploration of candidate architectures by using a combination of simulated annealing and evolutionary algorithms; (4) Improved top-down synthesis with bottom-up generated low-level design information. The systematic design methodology is illustrated via the design of a multi-standard ΣΔ modulator meeting the specifications of three wireless communication standards.

A methodology for switching noise estimation at gate level

Javier Castro, Pilar Parra, Antonio J. Acosta

Show abstract

This paper provides a simple methodology, based on available CAD tools, capable of extracting valuable information on supply current curves. Such information otherwise depends on the availability of the layout, which makes it impractical to obtain for present high-density circuits. The approach starts from an HDL description, which is automatically synthesized to gate level; the peak power (one peak per clock cycle) is measured at this level, giving an idea of the switching noise generated. Although it is an indirect method, it provides a quantitative noise value that is valid for comparison between different proposals. To assess the methodology, two different tools are used: PrimePower and NanoSim, both from Synopsys, which generate an average power and a peak power value. We will see that NanoSim is good for noise estimation, whereas PrimePower is not.

Synchronous and asynchronous multiplexer circuits for medical imaging realized in CMOS 0.18um technology

R. Długosz, K. Iniewski

Show abstract

Multiplexers are one of the most important elements in readout front-end ASICs for multi-element detectors in medical imaging. The purpose of these ASICs is to detect signals appearing randomly in many channels and to collect the detected data in an ordered fashion (de-randomization) in order to send it to an external ADC. ASIC output stage functionality can be divided into two parts: pulse detection and multiplexing. The pulse detection block is responsible for detecting maximum values of signals arriving from the shaper, sending a flag signal indicating that the peak signal has been detected, and storing the pulse in an analog memory until it is read by the ADC. The multiplexer in turn is responsible for searching for active flags, controlling the channel that has detected the peak signal and performing reset functions after readout. Several types of multiplexers are proposed in this paper, which can be divided into three classes: synchronous, synchronized and asynchronous. Synchronous circuits require a multiphase clock generator, which increases the power dissipation, but they also provide a very convenient mechanism that enables an unambiguous choice of the active channel. This characteristic leads to 100% effectiveness in data processing and no data loss. Asynchronous multiplexers do not require clock generators and therefore have a simpler structure and are faster and more power efficient, especially when data samples rarely arrive at the ASIC's inputs. The main problem of the asynchronous solution arises when data on two or more inputs arrive almost simultaneously, within an interval shorter than the multiplexer's reaction time. In this situation some data can be lost. In many applications a loss of the order of 1% of the data is acceptable, which makes the use of asynchronous multiplexers possible. For applications where a lower loss is desirable, a new hierarchy mechanism has been introduced. One of the proposed solutions is a synchronized binary tree structure that uses many simple asynchronous clock generators. This circuit combines the advantages of synchronous and asynchronous solutions, resulting in low power dissipation, high speed of operation and 100% effectiveness.

Resizing methodology for CMOS analog circuits

Timothée Levi, Jean Tomas, Noëlle Lewis, et al.

Show abstract

This paper proposes a CMOS resizing methodology for analog circuits during a technology migration. The scaling rules aim to be easy to apply and are based on the simplest MOS transistor model. The principle is to transpose one circuit topology from one technology to another, while keeping the main figures of merit, and the issue is to quickly calculate the new transistor dimensions. Furthermore, when the target technology has smaller minimum length, we expect to obtain a decrease of area. This methodology is applied to both linear and non-linear examples: an OTA and a ring oscillator. The results are compared on three CMOS processes whose minimum length is 0.8 μm, 0.35 μm, 0.25 μm.

Low-cost VLSI architecture design for forward quantization of H.264/AVC

G. A. Ruiz, J. A. Michell

Show abstract

The H.264/AVC (Advanced Video Codec) is the latest standard for video coding. It assumes a scalar forward quantizer performed at the encoder which can be implemented directly in integer arithmetic. An efficient architecture for the computation of forward quantization of H.264/AVC is presented in this paper. It uses a modification of the quantization operation, which reduces the arithmetic operations, and a truncated Booth multiplier based on an adaptive statistical approach, which reduces the hardware. The JM reference software's C code has been rewritten to analyze the effect of the new algorithm and of the truncated Booth multiplier. Simulations over popular test sequences used in video standardization show the validity of this approach. These results demonstrate that, at low QP, the PSNR is improved by between a maximum of +0.81 dB and a minimum of 0.31 dB, with a slight increase in the bit rate of around 0.8%. Finally, a suitable architecture for VLSI implementation is presented, which reduces the area by 26%, the power by 32% and the critical path delay by 21% in comparison with the classical implementation. Moreover, it also reduces the area and increases the speed in comparison with architectures presented in the references.
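For context, the baseline scalar forward quantization that such an architecture starts from can be sketched as below. This is the commonly documented integer form (multiply by a position-dependent factor, add a rounding offset, shift right), not the modified operation or the truncated Booth multiplier proposed in the paper. The function and table names are illustrative assumptions.

```python
# Multiplication factors MF indexed by QP % 6 and by coefficient position class.
# Class 0: positions (0,0), (0,2), (2,0), (2,2)
# Class 1: positions (1,1), (1,3), (3,1), (3,3)
# Class 2: all remaining positions of the 4x4 block
MF = [
    (13107, 5243, 8066),
    (11916, 4660, 7490),
    (10082, 4194, 6554),
    (9362, 3647, 5825),
    (8192, 3355, 5243),
    (7282, 2893, 4559),
]


def position_class(i, j):
    """Return the MF column for coefficient position (i, j) in a 4x4 block."""
    if i % 2 == 0 and j % 2 == 0:
        return 0
    if i % 2 == 1 and j % 2 == 1:
        return 1
    return 2


def quantize_coeff(w, i, j, qp, intra=True):
    """Quantize one transformed coefficient w using integer arithmetic only."""
    qbits = 15 + qp // 6
    # Rounding offset: 2^qbits / 3 for intra blocks, 2^qbits / 6 for inter blocks.
    f = (1 << qbits) // (3 if intra else 6)
    mf = MF[qp % 6][position_class(i, j)]
    z = (abs(w) * mf + f) >> qbits
    return -z if w < 0 else z
```

The single multiplication per coefficient is the operation that a truncated multiplier would target in hardware.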

Multiformat decoder for a DSP-based IP set-top box

F. Pescador, M. J. Garrido, C. Sanz, et al.

Show abstract

Internet Protocol Set-Top Boxes (IP STBs) based on single-processor architectures have been recently introduced in the market. In this paper, the implementation of an MPEG-4 SP/ASP video decoder for a multi-format IP STB based on a TMS320DM641 DSP is presented. An initial decoder for PC platform was fully tested and ported to the DSP. Using this code an optimization process was started achieving a 90% speedup. This process allows real-time MPEG-4 SP/ASP decoding. The MPEG-4 decoder has been integrated in an IP STB and tested in a real environment using DVD movies and TV channels with excellent results.

High parallel-pipeline integer-pel and fractional-pel motion estimation VLSI architectures for H.264/AVC

Armando Mora-Campos, Francisco J. Ballester-Merelo, Marcos A. Martínez-Peiró, et al.

Show abstract

This paper presents efficient integer-pel and fractional-pel motion estimation VLSI architectures for the luma video component in H.264/AVC. The proposed architectures were designed as hardware accelerators for 32-bit processors to reduce computation cost and processing time. Both accelerators use the full-search block-matching algorithm to fulfil the standard requirements with maximum quality. The integer motion estimator is composed of a systolic array of 16x16 processing elements with optimal memory management and an effective data-path. The array was designed to adjust the search window size and shape at macroblock level without a high control overhead. Simulation results show reductions in computation and time of 21.5% to 60.7% when using a non-square search window shape, with a maximum PSNR degradation of 0.014 dB. The fractional motion estimation architecture improves on the operating time of previous designs by means of two parallel-pipeline stages, an effective block flow and faster interpolation modules. The design can process the 41 macroblock partitions and sub-partitions in quarter-pel resolution in 606 clock cycles. Operating at a 100-MHz clock frequency, the architecture supports the 720p HD video format at 30 fps for one reference frame. Implementation results based on FPGA devices using VHDL are included.
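The throughput claim can be checked with a short calculation. The sketch below assumes that 720p means 1280x720 luma samples, that macroblocks are 16x16, and that the 606 clock cycles apply to each macroblock; these readings are assumptions, and the function names are illustrative.

```python
def macroblocks_per_second(width, height, fps, mb_size=16):
    """Number of macroblocks that must be processed each second."""
    mbs_per_frame = (width // mb_size) * (height // mb_size)
    return mbs_per_frame * fps


def required_clock_hz(width, height, fps, cycles_per_mb):
    """Minimum clock rate if every macroblock costs cycles_per_mb cycles."""
    return macroblocks_per_second(width, height, fps) * cycles_per_mb


# 80 x 45 = 3600 macroblocks per frame, 108000 per second at 30 fps.
needed = required_clock_hz(1280, 720, 30, 606)
print(needed / 1e6, "MHz needed of the 100 MHz available")
```

Under these assumptions the fractional-pel stage needs about 65.4 MHz, which leaves margin within the stated 100-MHz clock.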

H.264 video stream statistical analysis for post-compression improvements

J. Hugo Pérez Casanova, Francisco J. Ballester Merelo, Marcos A. Martínez Peiró, et al.

Show abstract

As today's video applications are requested on many portable end-user devices, and these are far from capable of holding and processing large amounts of video data, there is a need for bit rate improvement in compression algorithms. The objective of this paper is to propose a hardware-based post-compression enhancer situated between the Video Coding Layer and the Network Abstraction Layer of H.264. Our research analyzes the bit streams produced by the emerging H.264 standard. The goal is to enhance compression rates by proposing simple post-compression techniques based on symbol statistics. The CABAC and CAVLC entropy coders used in H.264 work optimally for 1-bit symbols, and the statistical distribution among them is almost the best. Our studies reveal that the bit streams present similar results for 8-bit symbols, and thus post-compression using well-known byte-based mechanisms will not yield better results; furthermore, our studies show that such mechanisms even degrade the original compression rate. Nevertheless, a non-uniform distribution of 6-bit symbols in 2046-bit discrete data packets is found, which can be exploited to boost compression. On average, the symbol probability ranges from 5.4% for the most probable symbol to 0.98% for the least probable symbol. Simply coding a few of the most probable symbols will therefore result in a bit rate reduction. A 1-bit flag indicating whether the compression enhancement was used must be introduced for each discrete packet, increasing its size by 0.049%.
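The kind of symbol statistics described above can be gathered with a few lines of code. This is only a sketch of the measurement, assuming a packet is given as a string of '0' and '1' characters; it is not the authors' tool, and all names are hypothetical. A 2046-bit packet holds 341 six-bit symbols, a uniform distribution would give each of the 64 symbols a probability of about 1.56%, and one flag bit per packet adds 1/2046, or about 0.049%.

```python
from collections import Counter

PACKET_BITS = 2046   # packet size quoted in the abstract
SYMBOL_BITS = 6      # symbol width quoted in the abstract


def split_symbols(bits, symbol_bits=SYMBOL_BITS):
    """Cut a '0'/'1' string into integer symbols, dropping any incomplete tail."""
    usable = len(bits) - len(bits) % symbol_bits
    return [int(bits[k:k + symbol_bits], 2) for k in range(0, usable, symbol_bits)]


def symbol_frequencies(bits, symbol_bits=SYMBOL_BITS):
    """Relative frequency of each symbol value that occurs in the packet."""
    symbols = split_symbols(bits, symbol_bits)
    counts = Counter(symbols)
    return {value: count / len(symbols) for value, count in counts.items()}


def flag_overhead_percent(packet_bits=PACKET_BITS, flag_bits=1):
    """Size increase, in percent, caused by adding flag bits to a packet."""
    return 100.0 * flag_bits / packet_bits
```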

Variable length packet scheduler algorithm with QoS support

R. Arteaga, F. Tobajas, R. Esper-Chaín, et al.

Show abstract

A novel variable length packet scheduling algorithm focused on real output queue reference architecture is presented in this paper. The main features of this packet scheduler development are the Quality of Service (QoS) and variable length packet support. The packet scheduler supports up to eight traffic classes which can be assigned up to two different priorities. The bandwidth assigned to any traffic class is configurable. The packet scheduler has been described and simulated in C++ language under uniform and bursty traffic conditions.

Integrated hardware interfaces for modular sensor networks

J. Portilla, A. de Castro, A. Abril, et al.

Show abstract

Sensor networks have gained great relevance in recent years. The idea is to use a large number of nodes measuring different physical parameters in several environments, which poses different research challenges (low power consumption, communication protocols, platform hardware design, etc.). There is a tendency to use modular hardware nodes in order to ease rapid prototyping, speed up redesign and reuse some of the hardware modules. One of the main obstacles to rapid prototyping is that sensors present heterogeneous interfaces. In this paper, a VHDL library of sensor/actuator interfaces is proposed. The purpose is to have a set of sensor interfaces that covers the most common ones in the sensor/actuator world, enabling rapid connection to a new sensor/actuator. Moreover, the concept presented here may be used to develop new interfaces easily. The VHDL implementation is independent of the final platform (any FPGA or ASIC) in order to minimize redesign effort and ease rapid prototyping. The interfaces are installed in a UPM platform for sensor networks.

Design of a 0.13-um CMOS cascade expandable ΣΔ modulator for multi-standard RF telecom systems

Alonso Morgado, Rocío del Río, José M. de la Rosa

Show abstract

This paper reports a 130-nm CMOS programmable cascade ΣΔ modulator for multi-standard wireless terminals, capable of operating on three standards: GSM, Bluetooth and UMTS. The modulator is reconfigured at both architecture and circuit level in order to adapt its performance to the specifications of the different standards with optimized power consumption. The design of the building blocks is based upon a top-down CAD methodology that combines simulation and statistical optimization at different levels of the system hierarchy. Transistor-level simulations show correct operation for all standards, featuring 13-bit, 11.3-bit and 9-bit effective resolution within 200-kHz, 1-MHz and 4-MHz bandwidth, respectively.
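Effective resolution figures like these are conventionally related to the signal-to-noise-and-distortion ratio through the ideal-quantizer relation SNDR = 6.02 N + 1.76 dB. The sketch below applies that standard relation to the three quoted resolutions; the abstract itself does not state dB figures, so the printed values are derived, not reported results.

```python
def sndr_db_from_enob(bits):
    """Ideal SNDR in dB corresponding to a given effective number of bits."""
    return 6.02 * bits + 1.76


def enob_from_sndr_db(sndr_db):
    """Effective number of bits corresponding to a given SNDR in dB."""
    return (sndr_db - 1.76) / 6.02


# Effective resolutions quoted for the 200-kHz, 1-MHz and 4-MHz bandwidths.
for bandwidth, bits in (("200 kHz", 13), ("1 MHz", 11.3), ("4 MHz", 9)):
    print(bandwidth, round(sndr_db_from_enob(bits), 2), "dB")
```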

A design tool for high-resolution high-frequency cascade continuous-time ΣΔ modulators

R. Tortosa, R. Castro-López, J. M. de la Rosa, et al.

Show abstract

This paper introduces a CAD methodology to assist the designer in the implementation of continuous-time (CT) cascade ΣΔ modulators. The salient features of this methodology are: (a) flexible behavioral modeling for optimum accuracy-efficiency trade-offs at different stages of the top-down synthesis process; (b) direct synthesis in the continuous-time domain for minimum circuit complexity and sensitivity; and (c) mixed knowledge-based and optimization-based architectural exploration and specification transmission for enhanced circuit performance. The applicability of this methodology will be illustrated via the design of a 12 bit 20 MHz CT ΣΔ modulator in a 1.2V 130nm CMOS technology.

A highly linear fast-settling envelope detector

Juan Pablo Alegre, Santiago Celma, Jose María García del Pozo, et al.

Show abstract

A novel high-performance envelope detector structure is proposed in this work. This circuit avoids the traditional compromise between holding and tracking required in such circuits by means of a scheme in which the signal peaks are held in two periods and combined to obtain the envelope of the signal. At the same time, it solves some drawbacks caused by the switches used when this technique is employed, such as charge injection in the switches, which reduces the linearity of these circuits. The superior performance of this circuit is demonstrated: for a 10 MHz signal it achieves small ripple (<1%) and very fast settling (0.4 μs), and it uses a smaller capacitive area (-60%) than conventional peak detectors. Furthermore, this envelope detector has a dynamic range above 40 dB for nonlinearities below 1 dB.

Behavioral modeling and simulation of multi-standard RF receivers using MATLAB/SIMULINK

Alonso Morgado, Rocío del Río, José M. de la Rosa

Show abstract

This paper presents a SIMULINK block set for the behavioral modeling and high-level simulation of RF receiver front-ends. The toolbox includes a library with the main RF circuit models that are needed to implement wireless transceivers, namely: low noise amplifiers, mixers, oscillators, filters and programmable gain amplifiers. There is also a library including other blocks like the antenna, duplexer filter and switches, required to implement reconfigurable architectures. Behavioral models of building blocks include the main ideal functionality as well as the following non-idealities: thermal noise characterized by the Noise Figure (NF) and the Signal-to-Noise Ratio (SNR), and nonlinearity expressed by the input-referred 2nd- and 3rd-order intercept points, IIP₂ and IIP₃, respectively. In addition to these general parameters, some block-specific errors have also been included, like oscillator phase noise and mixer offset. These models have been incorporated into the SIMULINK environment making extensive use of C-coded S-functions and reducing the number of library block elements. This approach reduces the simulation time while keeping high accuracy, which makes the proposed toolbox well suited to being combined with an optimizer for the automated high-level synthesis of radio receivers. As an application of the capabilities of the presented toolbox, a multi-standard Direct-Conversion Receiver (DCR) intended for 4G telecom systems is modeled and simulated considering the building-block requirements for the different standards.

Low power considerations and design for CMOS VCOs applied for direct conversion receivers at 5GHz

Iñigo Adin, Carlos Quemada, Hector Solar, et al.

Show abstract

Low power design often requires direct conversion architectures, such as low-IF or zero-IF. Either of these two options needs a low power, low phase noise voltage-controlled oscillator (VCO) in the frequency synthesizer. This work focuses on low power considerations applied to the practical, modern design of this device. Fulfilling the standard specifications (output power, phase noise, frequency range) should be complemented with this further step. Moreover, a careful design improves on the results obtained with classical considerations. Increasing the quality factor of the passive elements is one of the key points, followed by an accurate design of the architecture scheme. Furthermore, lower current consumption provides higher oscillation frequencies and facilitates higher frequency ranges, which follow the trends of modern wireless and wideband communication standards. In order to validate the aforementioned assumptions, a CMOS VCO has been implemented in UMC 0.18 μm 1P6M technology, with power consumption down to 3.4 mW.

Test measures evaluation for VCO and charge pump blocks in RF PLLs

Anna Asquini, Jean-Louis Carbonero, Salvador Mir

Show abstract

This work deals with the development of test techniques for RF (Radio Frequency) components. The optimization of production tests for RF PLLs (Phase Locked Loops) is targeted in particular. With devices of ever increasing speed, it is no longer possible to measure some of the classical circuit performances even with dedicated RF testers. This problem has been tackled in recent years by using BIST (Built-In Self Test) techniques for PLLs able to perform on-chip high resolution measurements such as picosecond jitter. However, this too risks becoming impossible at very high frequencies. This paper presents some preliminary work towards the optimization of production tests for RF PLLs with the aim of avoiding traditional test measurements such as phase noise. Attention is focused on the individual blocks of the RF PLL that have the greatest impact on phase noise and other critical performances. The VCO (Voltage Controlled Oscillator) block is studied first, since it makes the greatest contribution to phase noise. Our work then considers the possibility of detecting mismatches and leakages in CP (Charge Pump) currents that cause spurious tones in the output spectrum. Simulation results in this paper consider only catastrophic faults in circuit components. The fault coverage of performances and simple test measurements that can be implemented on-chip for the VCO is thus evaluated.

A low-voltage fully balanced CMFF transconductor with improved linearity

B. Calvo D.D.S., S. Celma, J. P. Alegre, et al.

Show abstract

This paper presents a new low-voltage pseudo-differential continuous-time CMOS transconductor for wideband applications. The proposed cell is based on a feedforward cancellation of the input common-mode signal and keeps the input common mode voltage constant, while the transconductance is easily tunable through a continuous bias voltage. Linearity is preserved during the tuning process for a moderate range of transconductance values. Simulation results for a 0.35 μm CMOS design show a 1:2 G_m tuning range with an almost constant bandwidth over 600 MHz. Total harmonic distortion figures are below -60 dB over the whole range at 10 MHz up to a 200 μA_p-p differential output. The proposed cell consumes less than 1.2 mW from a single 2.0 V supply.

A study of stacked and miniature 3-D inductor performance for radio frequency integrated circuit design

A. Goñi Iturri, F. J. del Pino, S. L. Khemchandani, et al.

Show abstract

The performance of stacked and miniature three-dimensional spiral inductors is analyzed and compared to standard planar coils. For this purpose, nine of these new structures have been fabricated in a 0.35-μm four-metal SiGe process. According to the measurement results, some of the proposed stacked inductors occupy only 48% of the area of a conventional planar inductor with the same inductance value and working frequency. The area reduction is even more significant with the miniature 3-D structures, which occupy only 22% in some cases and shift the inductor self-resonance frequency to higher values than the conventional stacked inductors. Despite this area reduction, these new structures employ metal levels close to the substrate, which significantly degrades the quality factor. Standard planar coils therefore remain the best choice if the designer requires high-quality inductors. However, stacked and 3-D miniature structures could be a better solution if area saving is the circuit's major priority.

A fully integrated VCO with a wide tuning range for DVB-H

S. L. Khemchandani, G. Betancort, Javier del Pino Suarez, et al.

Show abstract

European standard DVB-T (Digital Video Broadcasting - Terrestrial) has already proven its exceptional features, including the possibility to receive broadcast services also with portable devices and even in receivers with a limited mobility such as cars. This paper presents a fully integrated LC voltage controlled oscillator (VCO) in a low cost 0.35 μm SiGe technology for DVB-H standard. To obtain VCO specifications system simulations have been done. The designed VCO is suitable to operate with ZERO and LOW IF receiver architectures. To integrate all the VCO components, it oscillates at double of the frequency band, from 940 to 1724 MHz. In order to sweep the whole frequency range, the tank is composed of an array of switched capacitors together with the varactors. The integrated inductors have been designed by electromagnetic simulations using Momentum(C). Techniques like using a capacitor divider, biasing the transistor for minimum noise and emitter degeneration have been utilized to improve phase noise requirements. The obtained phase noise is -108 dBc/Hz at 100 kHz offset and the power consumption, including the output buffers, is 28 mW.
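The size of the tuning problem follows from the standard LC tank relation f = 1/(2π√(LC)): for a fixed inductance, covering 940 to 1724 MHz requires the total tank capacitance to change by the square of the frequency ratio. The sketch below works this out; it uses only the textbook relation and the two band edges from the abstract, and the function names are illustrative.

```python
import math


def resonant_frequency_hz(inductance_h, capacitance_f):
    """Resonant frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


def capacitance_ratio(f_min_hz, f_max_hz):
    """C_max / C_min needed to tune from f_min to f_max with a fixed inductor."""
    return (f_max_hz / f_min_hz) ** 2


ratio = capacitance_ratio(940e6, 1724e6)
print(round(ratio, 2))  # required C_max / C_min for the full band
```

A ratio above 3 is more than varactors alone usually provide, which is consistent with the use of an array of switched capacitors together with the varactors.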

Influence of the diffusion geometry on PN integrated varactors

J. García, B. González, M. Marrero-Martin, et al.

Show abstract

In this work, four different structures based on the PN junction are studied. These structures are based on changing the geometry of the p⁺ diffusion. The designed and fabricated devices will be used as integrated varactors in radiofrequency applications. Measurements have been made at frequencies from 500 MHz to 10 GHz, and the influence of the diffusion geometry on the capacitance (C), the quality factor (Q) and the tuning range (TR) has been studied. The PN varactors have been simulated with Taurus Device and fabricated in a standard 0.35 μm SiGe process. In order to improve varactor performance, the p⁺ and n⁺ diffusion geometries have been modified. In this way, novel structures called crosses, fingers, donuts, and bars have been designed and fabricated. The tuning ranges obtained exceed 40%.

A 3-10 GHz ultra-wideband SiGe LNA with wideband LC matching network

J. del Pino, S. L. Khemchandani, H. García, et al.

Show abstract

A fully integrated SiGe wideband amplifier implemented in a standard low-cost 0.35 μm process, with up to 12 dB of gain and a bandwidth of 3-10 GHz, is presented. The circuit is divided into 3 stages. The first is the input matching stage, where the use of an inductively degenerated amplifier is extended by embedding the input network of the amplifying device in a multisection reactive network so that the overall input reactance is resonated over a wider bandwidth. The second stage is a cascode transistor stage that provides high power gain and high isolation between the input and output ports. In addition, by adjusting the area and the multiplicity of these transistors, we can reduce the noise figure of the circuit. Finally, at the output a new technique is used to increase the bandwidth. This technique is based on replacing the load resistor with a shunt-peaking load composed of an inductor and a resistor. The addition of an inductance gives an output impedance that remains roughly constant over a broader frequency range. Chip dimensions are 0.665 × 0.665 mm² and power dissipation is 39 mW, drawn from a 3.3 V supply. The noise figure ranges from 3.5 to 7.5 in the band between 2 GHz and 8.5 GHz. Finally, the circuit core draws 5.3 mA from a 3.3 V supply. All these results were measured on a probe station.
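The bandwidth effect of shunt peaking can be illustrated numerically with the textbook model of a series R-L branch in parallel with a load capacitance C. The sketch below uses normalized values (R = 1, C = 1) and an inductance of L = 0.4 R²C chosen only for illustration; none of these component values come from the paper.

```python
import math


def load_magnitude(omega, r, l, c):
    """|Z| of a series R-L branch in parallel with a capacitor C."""
    z_series = complex(r, omega * l)
    y_total = 1.0 / z_series + complex(0.0, omega * c)
    return abs(1.0 / y_total)


def bandwidth_3db(r, l, c, steps=100000):
    """First angular frequency at which |Z| falls below R / sqrt(2)."""
    limit = 10.0 / (r * c)            # scan up to ten times the RC corner
    target = r / math.sqrt(2.0)
    for k in range(1, steps + 1):
        omega = limit * k / steps
        if load_magnitude(omega, r, l, c) < target:
            return omega
    return limit


plain = bandwidth_3db(1.0, 0.0, 1.0)     # resistive load only
peaked = bandwidth_3db(1.0, 0.4, 1.0)    # with the series inductor
print(round(peaked / plain, 2))          # bandwidth extension factor
```

In this normalized model the inductor extends the -3 dB bandwidth by roughly 70% without changing the low-frequency impedance, which is the sense in which the output impedance stays roughly constant over a broader range.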

Powerline LonTalk protocol performance analysis in SystemC

Salvatore Isaia, Massimo Conti, Giovanni B. Vece, et al.

Show abstract

This paper presents an implementation in SystemC of the LonTalk protocol starting from the reference code for the MC68360 microcontroller. The SystemC code of the LonTalk protocol has been written at transaction level with the aim of reusability and high simulation speed. The efficiency of the LonTalk protocol has been verified in a powerline network with different numbers of nodes in the network, different types of traffic and in the presence of noise in the channel. We verified that SystemC can easily be used as an executable language to define protocols, ensuring reusability and reduced design time.

Mixed signal SystemC modelling of a SoC architecture with Dynamic Voltage Scaling

G. Leoce, R. D'Aparo, G. B. Vece, et al.

Show abstract

Dynamic Voltage Scaling is a technique that reduces supply voltage and clock frequency, depending on system workload, with the aim of reducing power dissipation. This work is devoted to the modelling and integration, in the same system-level simulation environment, of the analog DC-DC converter for Dynamic Voltage Scaling, the Dynamic Power Management and a test System on Chip with three Masters and two Slaves connected to the AMBA AHB bus. The DC-DC converter is described in enough detail to verify the effect of the transient during a change of supply voltage on the performance of the DVS algorithm. SystemC and its extension SystemC-WMS have been used as description languages in which a system-level description of the dynamic supply management coexists with the analog switching power converter and its control.
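The motivation for scaling voltage together with frequency comes from the usual first-order model of CMOS dynamic power, P = α·C·V²·f. The sketch below evaluates that model; it is a generic illustration and not the SystemC model described in the abstract, and the parameter values in the example call are arbitrary.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order CMOS dynamic power: activity * C * Vdd^2 * f, in watts."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


def scaled_power_ratio(v_scale, f_scale):
    """Power after scaling Vdd by v_scale and f by f_scale, relative to before."""
    return v_scale ** 2 * f_scale


# Arbitrary example: 10% activity, 1 nF switched capacitance, 1.2 V, 100 MHz.
print(dynamic_power(0.1, 1e-9, 1.2, 100e6), "W")
# Halving frequency alone halves power; halving voltage as well gives one eighth.
print(scaled_power_ratio(1.0, 0.5), scaled_power_ratio(0.5, 0.5))
```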

Efficient hardware implementation of 3X for radix-8 encoding

G. A. Ruiz, Mercedes Granda

Show abstract

Several commercial processors have selected the radix-8 multiplier architecture to increase their speed by reducing the number of partial products. Radix-8 encoding reduces the digit number length in a signed digit representation. Its performance bottleneck is the generation of the term 3X, also referred to as the hard multiple. This term is usually computed by an add-and-shift operation, 3X=2X+X, in a high-speed adder. In a 2X+X addition, neighbouring full adders share the same input signal. This property permits simplified algebraic expressions for the 3X operation compared with those of a conventional addition. This paper shows that the 3X operation can be expressed in terms of two signals, H_i and K_i, functionally equivalent to two carries. H_i and K_i are computed in parallel using architectures which lead to an area- and speed-efficient implementation. For the purposes of comparison, standard-cell implementations of conventional adders have been compared with the proposed circuits based on these H_i and K_i signals. As a result, the delay of the proposed serial scheme is reduced by roughly 67% without additional cost in area, the delay and area of the carry look-ahead scheme are reduced by 20% and 17%, and those of the parallel prefix scheme are reduced by 26% and 46%, respectively.
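To see where the hard multiple arises, the following sketch recodes a two's-complement multiplier into radix-8 Booth digits in the range -4 to 4 and forms the product from the multiples 0, X, 2X, 3X and 4X. Only 3X needs an addition (2X + X); the others are shifts. This is a behavioral illustration of standard radix-8 Booth recoding, not the H_i/K_i circuit proposed in the paper, and the function names are illustrative.

```python
def booth_radix8_digits(y, n_bits):
    """Recode a two's-complement integer y (fitting in n_bits) into radix-8 digits.

    Digit i is -4*y[3i+2] + 2*y[3i+1] + y[3i] + y[3i-1], with y[-1] = 0.
    Python's arithmetic right shift supplies the sign extension.
    """
    groups = -(-n_bits // 3)          # ceiling division
    digits = []
    prev = 0
    for i in range(groups):
        b0 = (y >> (3 * i)) & 1
        b1 = (y >> (3 * i + 1)) & 1
        b2 = (y >> (3 * i + 2)) & 1
        digits.append(-4 * b2 + 2 * b1 + b0 + prev)
        prev = b2
    return digits


def multiply_radix8(x, y, n_bits):
    """Multiply x by y using radix-8 Booth partial products."""
    three_x = (x << 1) + x            # the hard multiple: 3X = 2X + X
    multiples = {0: 0, 1: x, 2: x << 1, 3: three_x, 4: x << 2}
    product = 0
    for i, digit in enumerate(booth_radix8_digits(y, n_bits)):
        partial = multiples[abs(digit)]
        if digit < 0:
            partial = -partial
        product += partial * (1 << (3 * i))   # weight 8^i
    return product
```

For example, an 8-bit multiplier produces three radix-8 digits, so only three partial products are summed instead of eight.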

Dynamic power management of a system on chip based on AMBA AHB bus

Simone Marinelli, Massimo Conti

Show abstract

This paper presents new dynamic voltage scaling (DVS) and dynamic power management (DPM) architectures for a System on Chip with an AMBA AHB bus. The Power State Machine describing the status of the core follows the recommendations of the ACPI standard. The algorithm controls the power states of each block on the basis of battery status, chip temperature and workload conditions. The proposed DVS and DPM architectures have been described at system level in SystemC. In particular, we investigated the possibility of changing the clock frequency and supply voltage of each master, slave and bus independently when no transfer is required. A system-level analysis has been performed to evaluate the effect of different DVS and DPM algorithms, topologies and architectures on power dissipation and system performance.

Implementation of a parametrizable router architecture for networks-on-chip (NoC) with quality of service (QoS) support

R. Regidor, F. Tobajas, V. De Armas, et al.

Show abstract

Managing the complexity of designing Systems-on-Chip (SoC) containing billions of transistors requires decoupling computation from communication. Networks-on-Chip (NoC) have been proposed as a solution for managing this problem as they meet the reusability, scalability and parallelism requirements of these systems, while coping with power constraints and clock distribution. In this paper, the implementation of a router's architecture for NoC with both guaranteed and best-effort services support is described, and some synthesis results are presented. The proposed router architecture is parameterized on the number of virtual channels, the size of virtual channels, the number of virtual channels for guaranteed traffic, the relative priority of the guaranteed traffic, and the switching technique.

A SoC for studying multi-agent software/algorithms on a real swarm of mm³-sized microrobots

R. Casanova, A. Diéguez, A. Arbat, et al.

This paper presents a System on Chip (SoC) designed specifically to control a mm³-sized microrobot called I-SWARM. The robot is intended to be part of a colony of 1000 I-SWARM robots for studying swarm behavior in real time and in a real swarm. The SoC offers a well-suited hardware platform to run multi-agent systems software. It is composed of an 8051 microcontroller with 2 kB of data memory and 8 kB of program memory. The processor is provided with specific hardware modules for controlling the locomotion unit, the communications and the vibrating contact sensor of the robot. These modules perform basic tasks such as movements or communications, so the 8051 can focus on processing data and making decisions. With these capabilities, the robot is able to avoid collisions with other members of the swarm, perform cooperative tasks, share information and execute specialized tasks. The SoC has been fabricated in a 0.13 µm ultra-low-power CMOS process of STMicroelectronics and consumes less than 1 mW.

New FPSoC-based architecture for efficient FSBM motion estimation processing in video standards

J. A. Canals, M. A. Martínez, F. J. Ballester, et al.

Due to the timing constraints of real-time video encoding, hardware accelerator cores are used for video compression. System on Chip (SoC) design tools offer methodologies for designing complex microprocessor systems with easy Intellectual Property (IP) core integration. This paper presents a PowerPC-based SoC with a motion-estimation accelerator core attached to the system bus. Motion-estimation (ME) algorithms are the most critical part of video compression due to the huge amount of data transfers and processing time. The main goal of our proposed architecture is to minimize the number of memory accesses, thus exploiting the bandwidth of a direct memory connection. This architecture has been developed using Xilinx XPS, a SoC platform design tool. The results show that our system is able to process the integer-pixel full search block matching (FSBM) motion estimation and interframe mode decision of a QCIF frame (176×144 pixels), using a 48×48 pixel search window, with an embedded PPC in a Xilinx Virtex-4 FPGA running at 100 MHz, in 1.5 ms, i.e. 4.5% of the total processing time at 30 fps.
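As a rough software reference for what full search block matching computes, the sketch below evaluates the sum of absolute differences (SAD) at every displacement inside a search range and keeps the minimum. It is not the hardware architecture of the paper: the SAD cost, the function names `sad`, `full_search` and `frame_share`, and the block size and search range parameters are all assumptions chosen for illustration.

```python
def sad(cur, ref, bx, by, rx, ry, n):
    """Sum of absolute differences between an n x n block of `cur` at (bx, by)
    and an n x n block of `ref` at (rx, ry). Frames are lists of rows."""
    return sum(
        abs(cur[by + j][bx + i] - ref[ry + j][rx + i])
        for j in range(n)
        for i in range(n)
    )


def full_search(cur, ref, bx, by, n, search_range):
    """Exhaustively test every displacement (dx, dy) within +/- search_range.

    Returns (cost, dx, dy) for the first displacement with the lowest SAD,
    or None if no candidate block lies inside the reference frame.
    """
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            rx, ry = bx + dx, by + dy
            # Skip candidates that fall outside the reference frame.
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, bx, by, rx, ry, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best


def frame_share(processing_ms, fps):
    """Fraction of one frame period taken by `processing_ms` milliseconds."""
    return processing_ms / (1000.0 / fps)
```

With the figures quoted in the abstract, `frame_share(1.5, 30)` evaluates to 0.045, which matches the stated 4.5%. The nested displacement loops also show why memory traffic dominates: every candidate re-reads an n×n block of the reference frame.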

The electrical origin of the 1/f electrical noise in solid-state devices and integrated circuits

José-Ignacio Izpura

Contrary to current theories based on hypothetical traps that charge carriers can move into, this paper explains 1/f electrical noise in solid-state devices on the basis of well-known electrical effects taking place in these devices. A parasitic capacitor and the backgating effect of its thermal noise, both overlooked over the years, are the basis of this explanation. The effect produces a resistance noise with a Lorentzian spectrum in any unbiased resistor. As soon as the resistor is biased, this spectrum is scattered into a continuous set of Lorentzian noise terms that synthesize 1/f noise over a frequency band that is an exponential function of the bias voltage V_DS expressed in thermal units V_T. This is due to the exponential dependence of the dynamical resistance in most semiconductor junctions. A V_DS=180mV is thus enough to give 1/f noise over three decades at room temperature. This unexpected and non-linear feature, where the spectrum of the noise results from the very bias used to measure it, has kept 1/f noise puzzling and enigmatic for more than eighty years. The theory, born in the solid-state field, can also be generalized to other devices where two orthogonal forces or energy gradients appear while electrical noise is being measured.
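The three-decade figure can be checked with a short calculation. The sketch assumes that the ratio between the upper and lower corner of the 1/f band is simply exp(V_DS/V_T), with V_T = kT/q and room temperature taken as 300 K; the abstract states only that the band is an exponential function of V_DS/V_T, so the exact prefactor and the function names are assumptions.

```python
import math

K_B = 1.380649e-23        # Boltzmann constant, J/K
Q_E = 1.602176634e-19     # elementary charge, C


def thermal_voltage(temperature_k=300.0):
    """Thermal voltage V_T = kT/q in volts."""
    return K_B * temperature_k / Q_E


def decades_of_one_over_f(v_ds, temperature_k=300.0):
    """Decades spanned by the band, assuming a band ratio of exp(V_DS / V_T)."""
    return (v_ds / thermal_voltage(temperature_k)) / math.log(10)
```

Under these assumptions, `decades_of_one_over_f(0.18)` comes out slightly above 3, in line with the "three decades" quoted for V_DS=180mV.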

A stochastic model of digital switching noise

Giorgio Boselli, Gabriella Trucco, Valentino Liberali

In fully CMOS digital integrated systems, the switching activity of logic gates is the source of the so-called "digital noise". Together with interconnection parasitics, digital switching noise is known to cause "bouncing" effects, i.e. oscillations of on-chip supply and bias voltages, which can significantly degrade overall system performance. Digital switching is a completely deterministic process, depending on both circuit parameters and input signals. However, the huge number of logic blocks in a digital integrated system makes digital switching a cognitively stochastic process. Therefore, logic transition activity can be analyzed using a stochastic approach. In this paper, we model the digital switching current as a stationary shot noise process, and we derive both its amplitude distribution and its power spectral density.

Temperature impact on multiple-input CMOS gates delay

C. de Benito, S. Bota, J. L. Rosselló, et al.

CMOS IC scaling has passed the 100 nm barrier and is now at the 65 nm node, with a rapid migration to the 35 nm generation. In achieving the primary goals of technology scaling, such as increased performance and density at a reduced cost per transistor, new side effects must be addressed, which pose further challenges to progress along the predicted roadmap. One of these challenges is the management of thermal effects such as hot spots and the overall increase in junction temperature, as they may have a significant impact on performance, power containment, circuit reliability, and even functionality. The adoption of adequate thermal management solutions requires a detailed analysis of the fundamental relationships governing the device and interconnect subsystem. Although much attention has been given to such analysis at the device and logic inverter levels, less is known about these dependences in complex gates with transistor stacks. In this work we study the fundamental mechanisms underlying the temperature dependence of transistor stacks, showing the key role of the stack dynamic threshold in the overall delay-temperature behavior at the gate level.

Enhanced instrumentation system to characterize the electric behavior of AFLC displays

José M. S. Pena, José I. Santos, Juan C. Torres, et al.

Liquid crystals (LCs) have attracted great attention from the industrial and scientific communities in recent decades. The property of ferroelectricity in liquid crystals was first claimed in 1975. Five years later, so-called surface-stabilized ferroelectric liquid crystals were described, which caused a surge in industrial interest because of their promising electro-optical applications. Antiferroelectricity in liquid crystals (AFLCs) was then identified in 1989. This kind of device shows interesting electro-optical properties such as tri-state switching, fast response, intrinsic analogue gray scale and wide viewing angle, among others, which are appropriate for high-end video display applications. The performance of AFLC displays is determined by their electrical and optical behavior. In order to measure electrical characteristics such as the switching currents and electrical loops (polarization-voltage), an A/D instrumentation system has been specially designed and implemented. A first version of this system was reported elsewhere. New components have since been introduced and functional blocks of that version modified in order to improve the S/N ratio. It is well known that measuring electric currents in the pA-nA range requires specific and usually expensive equipment. This work presents an enhanced A/D instrumentation system which is able to measure small switching-current amplitudes in AFLC displays with reasonable precision. Moreover, the system can also carry out the temporal integration of the switching current, allowing the electrical hysteresis of these devices to be obtained.
