Proceedings Volume 8067

VLSI Circuits and Systems V

Teresa Riesgo, Eduardo de la Torre-Arnanz
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 3 May 2011
Contents: 8 Sessions, 33 Papers, 0 Presentations
Conference: SPIE Microtechnologies 2011
Volume Number: 8067

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 8067
  • Bio-inspired and Reconfigurable Systems
  • Wireless Communication Systems
  • Off-chip & On-chip Communications
  • Multimedia and High Performance Architectures
  • IC Design
  • Measuring, Detecting and Obscuring Defects and Effects
  • Poster Session
Front Matter: Volume 8067
This PDF file contains the front matter associated with SPIE Proceedings Volume 8067, including the Title Page, Copyright information, Table of Contents, and the Conference Committee listing.
Bio-inspired and Reconfigurable Systems
Evolutionary hardware design
Lukas Sekanina
Since the early 1990s, researchers have applied evolutionary algorithms to the synthesis of electronic circuits. It is now evident that the evolutionary design approach can automatically create efficient electronic circuits in many domains. This paper surveys the fundamental concepts of evolutionary hardware design and introduces relevant search algorithms such as Cartesian genetic programming (CGP). Several case studies are presented that demonstrate the strengths and weaknesses of the method. The target domains are combinational circuit synthesis, where the goal is to minimize the number of gates; image filter design for field programmable gate arrays (FPGAs), where the goal is to match the filtering quality of conventional methods at a significantly lower on-chip cost; and the evolution of benchmark circuits for evaluating testability analysis methods. Evolved circuits are compared with the best-known conventional designs. FPGAs are presented as accelerators for evolutionary circuit design and circuit adaptation.
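As a concrete orientation to the CGP representation the survey introduces, here is a minimal sketch that evolves a two-input XOR from a feed-forward array of gates. The gate set, genotype layout and (1+4) evolution strategy are illustrative assumptions, not details taken from the paper.

```python
# Minimal Cartesian genetic programming (CGP) sketch for combinational
# synthesis: a fixed-length genotype of (gate, src_a, src_b) nodes is
# mutated under a (1+4) strategy until it matches a target truth table.
import random

GATES = {0: lambda a, b: a & b, 1: lambda a, b: a | b,
         2: lambda a, b: a ^ b, 3: lambda a, b: 1 - a}  # NOT ignores b
N_IN, N_NODES = 2, 6

def random_node(i):
    max_src = N_IN + i                 # feed-forward constraint
    return (random.randrange(len(GATES)),
            random.randrange(max_src), random.randrange(max_src))

def evaluate(genotype, inputs):
    vals = list(inputs)
    for gate, a, b in genotype:        # last node is the circuit output
        vals.append(GATES[gate](vals[a], vals[b]))
    return vals[-1]

def fitness(genotype, tt):
    return sum(evaluate(genotype, ins) == out for ins, out in tt)

TT = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]  # target: XOR
parent = [random_node(i) for i in range(N_NODES)]
for _ in range(2000):
    children = []
    for _ in range(4):                 # four mutants per generation
        child = list(parent)
        i = random.randrange(N_NODES)
        child[i] = random_node(i)
        children.append(child)
    parent = max(children + [parent], key=lambda g: fitness(g, TT))
    if fitness(parent, TT) == len(TT):
        break
print("fitness:", fitness(parent, TT))
```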
3D-design exploration of CNN algorithms
Lambert Spaanenburg, Suleyman Malki
Multi-dimensional algorithms are hard to implement on classical platforms. Pipelining may exploit instruction-level parallelism, but not in the presence of simultaneous data; threads optimize only within the given restrictions. Tiled architectures do add a dimension to the solution space: with a large local register store, data parallelism is handled, but only up to a point. 3-D technologies are meant to add a dimension in the realization. Applied at the device level, they make each computational node smaller; the interconnections become shorter and hence the network becomes more condensed. Such advantages are easily lost at higher implementation levels unless 3-D technologies such as multi-cores or chip stacking are also introduced. 3-D technologies scale in space, whereas (partial) reconfiguration scales in time. The optimal selection across the various implementation levels is algorithm dependent. The paper discusses these principles as applied to the scaling of cellular neural networks (CNNs). It illustrates how stacking of reconfigurable chips supports many algorithmic requirements in a defect-insensitive manner. Furthermore, the paper explores the potential of chip stacking for multi-modal implementations in a reconfigurable approach to heterogeneous architectures for algorithm domains.
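For orientation on the workload being scaled, the following is a small discrete-time rendering of a cellular neural network update over a 3x3 neighbourhood, applied here as a toy edge detector. The templates, grid size and Euler step are generic textbook choices, not parameters from the paper.

```python
# One Euler step of the classical CNN cell dynamics
# x' = -x + A*y + B*u + z on a small grid, with y = sat(x).
def sat(x):                     # standard CNN piecewise-linear output
    return max(-1.0, min(1.0, x))

A = [[0, 0, 0], [0, 2, 0], [0, 0, 0]]            # feedback template
B = [[-1, -1, -1], [-1, 8, -1], [-1, -1, -1]]    # control template
Z = -0.5                                          # bias

def step(x, u, dt=0.1):
    n, m = len(x), len(x[0])
    y = [[sat(v) for v in row] for row in x]
    nxt = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = -x[i][j] + Z
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < n and 0 <= jj < m:
                        acc += A[di + 1][dj + 1] * y[ii][jj]
                        acc += B[di + 1][dj + 1] * u[ii][jj]
            nxt[i][j] = x[i][j] + dt * acc
    return nxt

# Input: a vertical stripe; the output marks its left and right edges.
u = [[1.0 if 2 <= j <= 4 else -1.0 for j in range(8)] for _ in range(8)]
x = [[0.0] * 8 for _ in range(8)]
for _ in range(50):
    x = step(x, u)
print([round(sat(v), 1) for v in x[4]])
```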
Bio-inspired FPGA architecture for self-calibration of an image compression core based on wavelet transforms in embedded systems
Rubén Salvador, Alberto Vidal, Félix Moreno, et al.
A generic bio-inspired adaptive architecture for image compression, suitable for implementation in embedded systems, is presented. The architecture allows the system to be tuned during its calibration phase; an evolutionary algorithm is responsible for making the system evolve towards the required performance. A prototype has been implemented in a Xilinx Virtex-5 FPGA featuring an adaptive wavelet transform core aimed at improving image compression for specific types of images. An Evolution Strategy has been chosen as the search algorithm, and its typical genetic operators have been adapted to allow for a hardware-friendly implementation. HW/SW partitioning issues are also considered after profiling a high-level description of the algorithm, which validates the proposed resource allocation in the device fabric. To check the robustness of the system and its adaptation capabilities, different types of images have been selected as validation patterns. A direct application of such a system is its deployment in an environment unknown at design time, letting the calibration phase adjust the system parameters so that it performs efficient image compression. This prototype implementation may also serve as an accelerator for the automatic design of evolved transform coefficients, which are later synthesized and implemented in a non-adaptive system on the final implementation device, whether a HW- or SW-based computing device. The architecture has been built in a modular way so that it can easily be extended to adapt other types of image processing cores. Details on this pluggable-component point of view are also given in the paper.
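To illustrate the kind of search loop an Evolution Strategy runs during such a calibration phase, this sketch evolves four real-valued filter taps towards a known wavelet filter. The (mu+lambda) sizes, Gaussian mutation and distance-based fitness are invented stand-ins for the paper's hardware-friendly operators and compression-quality fitness.

```python
# (mu + lambda) Evolution Strategy over real-valued filter coefficients.
import random

TARGET = [0.4830, 0.8365, 0.2241, -0.1294]    # Daubechies-4 low-pass taps

def fitness(coeffs):
    # Proxy for compression quality: distance to a known good filter.
    return -sum((c - t) ** 2 for c, t in zip(coeffs, TARGET))

def mutate(coeffs, sigma=0.05):
    return [c + random.gauss(0.0, sigma) for c in coeffs]

mu, lam = 4, 16
population = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(mu)]
for _ in range(300):
    offspring = [mutate(random.choice(population)) for _ in range(lam)]
    population = sorted(population + offspring, key=fitness, reverse=True)[:mu]
print("best coefficients:", [round(c, 4) for c in population[0]])
```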
Partial reconfiguration of a peripheral in an FPGA-based SoC to analyse performance-area behaviour
Andres Cardona, Yi Guo, Carles Ferrer
Systems-on-Chip (SoCs) are present in a wide range of applications. This diversity, together with the number of critical variables involved in the design process, makes SoC design a challenging topic. FPGAs have become a preferred device for developing and prototyping SoCs, and consequently Partial Reconfiguration (PR) has gained importance in this approach. Through PR it is possible to have one section of the FPGA operating while another section is disabled and partially reconfigured to provide new functionality. In this way hardware resources can be time-multiplexed, making it possible to reduce size, cost and power. In this work we focus on the implementation of a SoC, on an FPGA-based board, with one of its peripherals being a reconfigurable partition (RP). Inside this RP, different hardware modules defined as reconfigurable modules (RMs) can be configured. The system can thus adopt different hardware configurations depending on the application needs and FPGA limitations, while the rest of the system continues working. To this end, a MicroBlaze soft-core processor is used in the system design and a Virtex-5 FPGA board is used for its implementation. A remote sensing application is used to explore the capabilities of this approach. By identifying the section(s) of the application suitable for time-sharing, it is possible to define the RMs to place inside the RP. Different configurations were carried out and area measurements were taken. Preliminary results on performance-area utilisation are presented to validate the improvement in flexibility and resource usage.
Cost and energy efficient reconfigurable embedded platform using Spartan-6 FPGAs
A. Otero, M. Llinás, M. L. Lombardo, et al.
Modern FPGAs with run-time reconfiguration allow the implementation of complex systems offering the flexibility of software-based solutions combined with the performance of hardware. This combination of characteristics, together with the development of new specific methodologies, makes it feasible to reach new points in the system design space, and embedded systems built on these platforms are acquiring more and more importance. However, the practical exploitation of this technique in fields that have traditionally relied on resource-restricted embedded systems is mainly limited by strict power consumption requirements, by cost, and by the high dependence of dynamic and partial reconfiguration (DPR) techniques on the specific features of the underlying device technology. In this work, we tackle the previously reported problems by designing a reconfigurable platform based on the low-cost, low-power Spartan-6 FPGA family. The full process of developing the platform from scratch is detailed in the paper. In addition, the implementation of the reconfiguration mechanism, including two profiles, is reported. The first profile is a low-area, low-speed reconfiguration engine based mainly on software functions running on the embedded processor, while the other is a hardware version of the same engine, implemented in the FPGA logic. This reconfiguration hardware block was originally designed for the Virtex-5 family, and its porting process is also described in this work, addressing the interoperability problem among different families.
Wireless Communication Systems
Performance analysis of 802.15.4 wireless standard
Emanuele Losavio, Simone Orcioni, Massimo Conti
In recent years, short-range wireless connectivity has experienced exponential growth. Fast design and verification of wireless network performance is becoming a necessity for the electronics industry to meet increasingly demanding market requirements. A system-level model of the network is indispensable to ensure fast and flexible design and verification. In this work a SystemC model of the IEEE 802.15.4 standard is presented. The model has been used to verify the performance of the 802.15.4 standard in terms of efficiency and channel throughput as a function of the number of nodes in the network, the payload size, and the frequency with which the nodes attempt to transmit.
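For a rough feel for the dependencies the model explores, the following back-of-the-envelope sketch estimates channel throughput as a function of node count and payload size under a simple slotted random-access abstraction. The collision model and the per-frame overhead figure are deliberate simplifications, not the paper's SystemC model.

```python
# Slotted random-access abstraction: throughput vs. nodes and payload.
def throughput(n_nodes, p_tx, payload_bytes, overhead_bytes=31):
    # Probability that exactly one node transmits in a slot (no collision).
    p_success = n_nodes * p_tx * (1 - p_tx) ** (n_nodes - 1)
    # Fraction of frame airtime carrying useful payload.
    efficiency = payload_bytes / (payload_bytes + overhead_bytes)
    return p_success * efficiency

for n in (2, 5, 10, 20):
    print(n, "nodes:", round(throughput(n, p_tx=1.0 / n, payload_bytes=100), 3))
```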
RFID-based wake-up system for wireless sensor networks
A. Sanchez, J. Aguilar, S. Blanc, et al.
A critical issue in Wireless Sensor Network (WSN) circuits is energy management. This work presents a radio-triggered wake-up solution designed and developed for WSN-based systems. The proposed circuit manages, in a simple and efficient way, node switching between sleep mode and the receiving and transmitting active modes. It uses a HW hearing circuit, which lowers power consumption and avoids extra processing on the main microcontroller. The wake-up is selective, based on predefined recognition patterns, and requires no microcontroller intervention. Furthermore, it is tiny in size, and the whole circuit is suitable for single-chip CMOS integration. The circuit has been tested to demonstrate the worthiness of the wake-up proposal. With only 8.7 microwatts of power consumption (at 3.0 Vdc), the system successfully wakes up nodes up to 15 meters away from the transmission source. This performance improves on solutions presented in previous research works.
Wireless videosurveillance over 802.15.4
Massimo Conti, Manuele Telari, Simone Orcioni
This paper presents a performance analysis of wireless image sensor networks for video surveillance using the IEEE 802.15.4 wireless standard. The dependence of image quality and network throughput on JPEG image compression parameters and wireless protocol parameters has been investigated. The objective of the work is to give useful guidelines for the design of wireless video surveillance networks over the low-cost, low-power, low-rate IEEE 802.15.4 wireless protocol.
Simulation of impulse response for indoor wireless optical channels using 3D CAD models
S. Rodríguez, B. R. Mendoza, G. Miranda, et al.
In this paper, a tool for simulating the impulse response of indoor wireless optical channels using 3D computer-aided design (CAD) models is presented. The tool uses a simulation algorithm that relies on ray tracing techniques and the Monte Carlo method, and improves on all previous methods from a computational standpoint. The 3D scene, or simulation environment, can be defined using any CAD software in which the user specifies, in addition to the setting geometry, the reflection characteristics of the surface materials as well as the structures of the emitters and receivers involved in the simulation. In an effort to improve computational efficiency, two optimizations are presented. The first consists of dividing the setting into cubic regions of equal size; these sub-regions allow the program to consider only those object faces and/or surfaces that are in the ray propagation path. This first optimization provides a calculation improvement of approximately 50%. The second involves the parallelization of the simulation algorithm: the proposed method distributes the rays equally and statically for computation by different processors. This optimization results in a calculation speed-up that is essentially proportional to the number of processors used.
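A toy rendering of the underlying Monte Carlo idea, with rays attenuated by a surface reflectivity and received power binned by propagation delay, is shown below. The geometry handling is a crude uniform surrogate for the real ray-surface intersection tests, and all parameters are invented.

```python
# Bounce rays around a room, accumulate received power per delay bin.
import math, random

C = 3e8                      # speed of light, m/s
RHO = 0.8                    # surface reflection coefficient
BIN = 1e-9                   # 1 ns histogram bins

def simulate(n_rays=100_000, room=5.0, max_bounces=3):
    h = {}                                       # delay bin -> power
    rx = (room / 2, room / 2, 0.8)               # receiver position
    for _ in range(n_rays):
        power, path = 1.0 / n_rays, 0.0
        p = (room / 2, room / 2, 3.0)            # emitter position
        for _ in range(max_bounces):
            # Jump to a random point in the room (uniform surrogate
            # for a proper ray-surface intersection test).
            nxt = tuple(random.uniform(0, room) for _ in range(3))
            path += math.dist(p, nxt)
            power *= RHO
            p = nxt
            # Contribution from the bounce point to the receiver.
            d = path + math.dist(p, rx)
            key = int(d / C / BIN)
            h[key] = h.get(key, 0.0) + power
    return h

hist = simulate()
print("first bins (ns: power):",
      {k: round(v, 6) for k, v in sorted(hist.items())[:5]})
```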
A 55 μW programmable gain amplifier with constant bandwidth for a direct conversion receiver
A fully differential programmable gain amplifier (PGA) with a constant transfer characteristic and very low power consumption is proposed and implemented in a 130 nm CMOS technology. The PGA features a gain range of 4 dB to 55 dB with a step size of 6 dB and a constant bandwidth of 10-550 kHz. It employs two stages of variable amplification with an intermediate second-order low-pass channel filter. The first stage is a capacitive-feedback OTA using current reuse, achieving a low input noise density of 16.7 nV/√Hz. This stage sets the overall high-pass cutoff frequency to approximately 10 kHz. For all gain settings the high-pass cutoff frequency variation is within ±5%. The low-pass channel filter is merged with a second amplifying stage forming a Sallen-Key structure. In order to maintain a constant transfer characteristic versus gain, the Sallen-Key feedback is taken from different taps of the load resistance. Using this new approach, the low-pass cutoff frequency stays between 440 kHz and 590 kHz for all gain settings (±14%). Finally, an offset cancelation loop reduces the output offset of the PGA to less than 5 mV (3σ). The PGA occupies an area of approximately 0.06 mm² and achieves a post-layout power consumption of 55 μW from a 1 V supply. For the maximum gain setting the integrated input-referred noise is 14.4 μVRMS while the total harmonic distortion is 0.7% for a differential output amplitude of 0.5 V.
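As a worked numeric aside, the standard unity-gain Sallen-Key low-pass cutoff formula, f_c = 1/(2π√(R1R2C1C2)), lands near the reported ~500 kHz corner for plausible component values. The values below are hypothetical, not taken from the design.

```python
# Standard Sallen-Key low-pass cutoff frequency.
import math

def sallen_key_fc(r1, r2, c1, c2):
    return 1.0 / (2 * math.pi * math.sqrt(r1 * r2 * c1 * c2))

# Hypothetical values: 10 kOhm / 10 kOhm, 45 pF / 22.5 pF -> ~500 kHz.
print(f"{sallen_key_fc(10e3, 10e3, 45e-12, 22.5e-12) / 1e3:.0f} kHz")
```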
Multi-user THSS system for indoor wireless optical communications with angle-diversity detection
S. Rodríguez, B. R. Mendoza, J. R. Álvarez, et al.
In this paper, an infrared wireless communications system based on THSS techniques employing angle-diversity detection is studied via simulation. Although the system is designed to operate at infrared wavelengths, it can also be used for Visible Light Communications (VLC). Time-hopping coding is based on splitting the symbol period into several short slots; to specify which slots are used to transmit and which are not, the use of maximum-length sequences is considered. The remaining time slots can be used by other users, providing the system with multiple-access capabilities. In this paper, a 2-PPM modulation scheme is selected because it yields good results in infrared systems as well as in VLC. Furthermore, the THSS system allows the number of pulses transmitted per symbol to be selected, and makes use of an optimum maximum-likelihood receiver for AWGN channels with the ability to choose between hard and soft decision decoding. The system allows performance to be compared by computing the bit error rate (BER) as a function of the pulse energy to noise power spectral density ratio, for different configurations in single-user and multi-user environments. The results show a significant enhancement when angle-diversity receivers are used, compared to receivers using a single-element detector with a wide field of view (FOV). Two angle-diversity structures are compared: conventional and sectored receivers. Although the sectored receiver exhibits a better BER than the conventional receiver, its implementation is more complex.
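A minimal sketch of the described signalling follows: the symbol period is split into chips, a hopping pattern picks which chips a user occupies, and the data bit shifts each pulse by half a chip (2-PPM). The slot count and pattern are illustrative; a real system would derive the pattern from a maximum-length sequence as the abstract states.

```python
# Time-hopping 2-PPM: pulse positions in half-chip units per symbol.
N_SLOTS = 8
HOP = [3, 0, 6, 2]            # per-user hopping pattern (stand-in for an
                              # actual maximum-length sequence)

def th_ppm_positions(bits, pulses_per_symbol=2):
    frames = []
    for bit in bits:
        positions = []
        for k in range(pulses_per_symbol):
            chip = HOP[k % len(HOP)]
            positions.append(2 * chip + bit)   # 2-PPM: bit shifts by one
        frames.append(positions)
    return frames

print(th_ppm_positions([0, 1, 1]))
```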
Off-chip & On-chip Communications
Heterogeneous transmission and parallel computing platform (HTPCP) for remote sensing applications
Yi Guo, Antonio Rius, Serni Ribò, et al.
The increasing number of GNSS-R campaigns has put great pressure on high-performance post-processing design for space-level instrumentation. Due to the large scale of information acquisition and the intensive computation of the cross-correlation waveform (CC-WAV), the trade-off between processing time and the amount of data stored prior to downlink has led us to a real-time, on-board parallel processing design. In this paper, we focus on the interaction between the chip-level multiprocessing architecture and the applications, showing that the unbalanced workload of transmission and processing can be compensated for by the novel architecture, the Heterogeneous Transmission and Parallel Computing Platform (HTPCP). HTPCP is intended to solve bus congestion and memory allocation issues. The pros and cons of SMP and HTPCP are discussed, and the simulation results show that HTPCP can greatly improve the throughput of the GOLD-RTR system.
Low-power, high-speed FFT processor for MB-OFDM UWB application
Guixuan Liang, Danping He, Eduardo de la Torre, et al.
This paper presents a low-power, high-speed, 4-datapath, 128-point mixed-radix (radix-2 and radix-2²) FFT processor for MB-OFDM Ultra-WideBand (UWB) systems. The processor employs a single-path delay feedback (SDF) pipelined structure for the proposed algorithm, and uses substructure-sharing multiplication units and a shift-add structure rather than traditional complex multipliers. Furthermore, the word lengths are properly chosen, so the hardware cost and power consumption of the proposed FFT processor are efficiently reduced. The proposed FFT processor has been verified and synthesized using a 0.13 μm CMOS technology with a supply voltage of 1.32 V. The implementation results indicate that the proposed 128-point mixed-radix FFT architecture supports a throughput rate of 1 Gsample/s with lower power consumption than existing 128-point FFT architectures.
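For orientation, a plain recursive radix-2 decimation-in-time FFT is shown below; the paper's pipelined SDF mixed-radix datapath computes the same transform, but with hardware-friendly shift-add multiplications instead of general complex multipliers.

```python
# Textbook recursive radix-2 decimation-in-time FFT.
import cmath

def fft(x):
    n = len(x)
    if n == 1:
        return x
    even, odd = fft(x[0::2]), fft(x[1::2])
    tw = [cmath.exp(-2j * cmath.pi * k / n) * odd[k] for k in range(n // 2)]
    return [even[k] + tw[k] for k in range(n // 2)] + \
           [even[k] - tw[k] for k in range(n // 2)]

# Two impulses four samples apart -> magnitudes 2, 0, 2, 0, ...
print([round(abs(v), 3) for v in fft([1, 0, 0, 0, 1, 0, 0, 0])])
```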
SystemC modelling of wireless communication channel
Massimo Conti, Simone Orcioni
This paper presents the definition in SystemC of wireless channels at different levels of abstraction. The different levels of description of the wireless channel can easily be interchanged, allowing the reuse of the application and baseband layers both in a high-level analysis of the network and in a detailed analysis of the communication between the wireless devices.
Hardware-software co-design of parallel and distributed systems using a behavioural programming and multi-process model with high-level synthesis
A new design methodology for parallel and distributed embedded systems is presented, using the behavioural hardware compiler ConPro, which provides an imperative programming model based on concurrently communicating sequential processes (CSP) with an extensive set of interprocess-communication primitives and guarded atomic actions. The programming language and the compiler-based synthesis process enable the design of constrained power- and resource-aware embedded systems with pure Register-Transfer-Logic (RTL) efficiently mapped to FPGA and ASIC technologies. Concurrency is modelled explicitly at the control- and datapath level; additionally, concurrency at the datapath level can be automatically explored and optimized by different schedulers. The CSP programming model can be synthesized to hardware (SoC) and software (C, ML) models and targets. A common source with identical functional behaviour is used for both the hardware and software implementations. Processes and objects of the entire design can be distributed over different hardware and software platforms, for example several FPGA components and software executed on several microprocessors, providing a parallel and distributed system. Intersystem, interprocess, and object communication is automatically implemented with serial links, not visible at the programming level. The presented design methodology has the benefit of high modularity and freedom of choice of target technologies and system architecture. Algorithms can be well matched to, and distributed over, different suitable execution platforms and implementation technologies, using a unique programming model that provides a balance of concurrency and resource complexity. An extended case study of a communication protocol used in high-density sensor-actuator networks demonstrates and compares the design of hardware and software targets. The communication protocol is suited for high-density intra- and inter-chip networks.
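A rough software analogue of the CSP model described, two sequential processes communicating over a blocking channel, is sketched below in Python; ConPro itself compiles such models to RTL or C/ML, so this rendering with a capacity-1 queue is only a behavioural illustration.

```python
# Two sequential processes with rendezvous-style channel communication.
import threading, queue

chan = queue.Queue(maxsize=1)          # capacity-1 queue approximating
                                       # a CSP rendezvous channel

def producer():
    for value in range(5):
        chan.put(value)                # blocks until the consumer is ready
    chan.put(None)                     # end-of-stream marker

def consumer():
    while (v := chan.get()) is not None:
        print("received", v)

t1 = threading.Thread(target=producer)
t2 = threading.Thread(target=consumer)
t1.start(); t2.start(); t1.join(); t2.join()
```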
Dynamically reconfigurable router for NoC congestion reduction
Juan E. Rosales, Félix Tobajas, Valentín de Armas, et al.
Multiprocessor Systems-on-Chip (MPSoCs) are emerging as one of the technologies supporting the growing design complexity of embedded systems that include several types of cores. The interconnection among the cores of an MPSoC is proposed to be provided by a Network-on-Chip (NoC). In real applications it is usual to find different interconnection needs among cores, so distinct bandwidth is needed at each node of a NoC. Since larger FIFOs in NoC routers provide larger throughput and smaller latencies, depths are usually sized for the worst case, compromising not only the routing area but also power consumption. In this paper, a reconfigurable router with a dynamic buffer-sharing mechanism at the input channels is proposed to reduce congestion in the network. In this scheme, a channel may dynamically lend or borrow unused buffer units to or from neighboring channels, in accordance with the connection rates. The proposed reconfigurable router architecture was embedded in the Hermes NoC. The main advantages of Hermes are its small size and modular design; this, as well as its open-source approach, led to the selection of this NoC. The basic element of Hermes is a router with five bidirectional ports employing an XY routing algorithm. FIFO buffering is present only at the input channel, with all channels having the same buffer depth defined at design time. The proposed reconfigurable router has been coded in VHDL at RTL level by adapting the Hermes router to fit the proposed scheme. Results obtained from simulating the router under scenarios with different traffic characteristics and percentages of shared buffer show that mean latency can be reduced by up to 30% compared to the original router.
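An abstract-level sketch of the lend/borrow idea follows: a channel whose FIFO occupancy stays low donates spare buffer units to a congested neighbour. Thresholds and unit counts are invented here; the paper implements the mechanism in RTL inside the Hermes router.

```python
# Dynamic buffer-unit sharing between router input channels.
class Channel:
    def __init__(self, name, units=4):
        self.name, self.units, self.occupancy = name, units, 0

    def spare(self):
        return max(0, self.units - self.occupancy - 1)  # keep one in reserve

def rebalance(channels):
    # Move one unit from the least to the most loaded channel.
    lender = max(channels, key=Channel.spare)
    borrower = max(channels, key=lambda c: c.occupancy / c.units)
    if lender is not borrower and lender.spare() > 0:
        lender.units -= 1
        borrower.units += 1

chans = [Channel("N"), Channel("S"), Channel("E"), Channel("W")]
chans[0].occupancy = 4                     # north port congested
rebalance(chans)
print([(c.name, c.units) for c in chans])  # north borrows a unit
```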
NoC emulation framework based on Arteris NoC solution for multiprocessor system-on-chip
José A. Mori, Félix Tobajas, Valentín de Armas, et al.
The growth in complexity and the requirements of on-chip technologies create the need for new architectures that represent a compromise between complexity, power consumption, and the Quality of Service (QoS) of the communications between the cores of a System-on-Chip (SoC). The Network-on-Chip (NoC) arises as a solution for implementing efficient interconnections in SoCs. This technology, due to its complexity, creates a need for specialized engineers who can design the intricate circuits that a NoC requires. It is possible to reduce this specialization need by using CAD tools. In this paper, one of these tools, called Arteris NoC Solution, is used to develop the proposed framework for NoC emulation. This software includes three different tools: NoCexplorer, for high-level simulation of an abstract model of the NoC; NoCcompiler, in which the NoC is defined and generated in an HDL language; and NoCverifier, which performs simulations of the HDL code. Furthermore, a validation and characterization infrastructure was developed for the created NoC, which can be completely emulated in an FPGA. This environment is composed of OCP traffic generators and receivers, which can also perform measurements on the created traffic, and a storage and communication module, which is responsible for storing the results obtained from the emulation of the entire system in the FPGA and sending them to a PC. Once the data are stored on the PC, statistical analyses are performed, including a comparison of the mean latency from high-level simulations, RTL simulations and FPGA emulations. The analysis of the results is obtained from three scenarios with different NoC topologies for the same SoC design.
Multimedia and High Performance Architectures
Performance analysis of the scalable video coding (SVC) extension of H.264/AVC for constrained scenarios
N. Suarez, Gustavo M. Callico, Sebastian Lopez, et al.
Scalable Video Coding (SVC) is the extension of the H.264/AVC standard proposed by the Joint Video Team (JVT) to provide flexibility and adaptability in video transmission. SVC exploits the use of layers, which makes it possible to obtain a bit stream from which specific parts can be removed to produce an output video with a lower (temporal or spatial) resolution and/or lower quality/fidelity. This paper provides a performance analysis of the SVC extension of H.264/AVC for constrained scenarios. To this end, the open-source decoder called "Open SVC Decoder" was adapted to obtain a version suitable for implementation on reconfigurable architectures. For each scenario, a set of different sequences was decoded to analyze the performance of each functional block inside the decoder. From this analysis we conclude that reconfigurable architectures are a suitable solution for an SVC decoder in a constrained device or for a specific range of scalability levels. Our proposal consists of an SVC decoder architecture that admits different options depending on device requirements, in which certain blocks are customizable to improve the decoder's performance in hardware resource usage and execution time.
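A toy illustration of the layering idea follows: each packet carries spatial, temporal and quality layer identifiers (mimicking SVC's DID/TID/QID fields), and a lower-rate sub-stream is extracted by dropping packets above a target operating point. The stream contents are invented.

```python
# Extracting an SVC sub-stream by layer identifiers.
stream = [
    {"did": 0, "tid": 0, "qid": 0, "data": "base"},
    {"did": 0, "tid": 1, "qid": 0, "data": "temporal enh"},
    {"did": 1, "tid": 0, "qid": 0, "data": "spatial enh"},
    {"did": 1, "tid": 1, "qid": 1, "data": "quality enh"},
]

def extract(stream, max_did, max_tid, max_qid):
    return [p for p in stream
            if p["did"] <= max_did and p["tid"] <= max_tid
            and p["qid"] <= max_qid]

# Base spatial layer, full frame rate, base quality:
print([p["data"] for p in extract(stream, max_did=0, max_tid=1, max_qid=0)])
```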
Scalable 2D architecture for H.264 SVC deblocking filter with reconfiguration capabilities for on-demand adaptation
T. Cervero, A. Otero, E. de la Torre, et al.
One of the most computationally intensive tasks in recent video encoders and decoders is the deblocking filter. Its computational complexity is considerable, and it may take more than 30% of the total computational cost of decoder execution. Nowadays, its limiting factors for reaching real-time capability are mainly related to memory and speed. To deal with these factors, this paper proposes a novel deblocking filter architecture that supports all filtering modes available in both the H.264/AVC and Scalable Video Coding (SVC) standards. It has been implemented as a scalable hardware architecture, which benefits from the parallelism and adaptability of the algorithm and which can be adapted dynamically in FPGAs. Regarding parallelism, the architecture mapping is capable of respecting data dependencies among macroblocks (MBs) while several functional units (FUs) filter data in parallel. Regarding scalability, the architecture is flexible enough to adapt its performance to diverse environmental demands; this is achieved by increasing or decreasing the number of FUs, as in a systolic array. Finally, the paper presents a comparison between the proposed FU and state-of-the-art work.
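The following sketch shows the kind of wavefront schedule that lets several functional units run concurrently while respecting left/top macroblock dependencies; the exact dependency set and scheduling used by the proposed architecture are assumed here, not quoted from the paper.

```python
# Wavefront scheduling of macroblocks: MB (x, y) can start once
# (x-1, y) and (x, y-1) are done, so all MBs on one anti-diagonal
# (equal x + y) are mutually independent and can run in parallel.
def wavefront(cols, rows):
    for d in range(cols + rows - 1):
        yield [(x, d - x) for x in range(cols) if 0 <= d - x < rows]

for step, batch in enumerate(wavefront(4, 3)):
    print(f"cycle {step}: {batch}")
```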
Closing the gap between software and hardware super-resolution image reconstruction: provision of high-quality output
The additional detail extraction offered by super-resolution image reconstruction (SRIR) algorithms greatly improves the results of spatial image augmentation, leading, where possible, to significant objective image quality enhancement expressed as an increase in peak signal-to-noise ratio (PSNR). Nevertheless, providing hardware implementations of fusion SRIR algorithms capable of producing satisfactory output quality at real-time performance is still a challenge. To make a hardware implementation feasible, a number of trade-offs that compromise the output quality are needed. In this work we tackle the problem of high resource requirements by using a non-iterative algorithm that facilitates hardware implementation. The algorithm execution flow is presented and described. The algorithm's output quality is measured and compared with competitive solutions, including interpolation and iterative SRIR implementations. The tested iterative algorithms use frame-level motion estimation (ME), whereas the proposed algorithm relies on block-matching ME, which performs better. The comparison shows that the proposed non-iterative algorithm offers superior output quality for all tested sequences, while promising an efficient hardware implementation able to match, at least, the software implementations in terms of output quality.
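For reference, a compact exhaustive block-matching motion estimation kernel, minimising the sum of absolute differences (SAD) over a small search window, is shown below; frame contents, block size and search range are synthetic.

```python
# Exhaustive block-matching motion estimation with a SAD cost.
def sad(ref, cur, bx, by, dx, dy, bs=8):
    return sum(abs(cur[by + j][bx + i] - ref[by + dy + j][bx + dx + i])
               for j in range(bs) for i in range(bs))

def best_vector(ref, cur, bx, by, search=3, bs=8):
    h, w = len(ref), len(ref[0])
    best = (0, 0, float("inf"))
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            if (0 <= by + dy and by + dy + bs <= h
                    and 0 <= bx + dx and bx + dx + bs <= w):
                cost = sad(ref, cur, bx, by, dx, dy, bs)
                if cost < best[2]:
                    best = (dx, dy, cost)
    return best

# Synthetic test: the current frame is the reference shifted right by 2,
# so the expected motion vector for an interior block is (-2, 0).
ref = [[(7 * x + 13 * y) % 31 for x in range(24)] for y in range(24)]
cur = [[ref[y][max(0, x - 2)] for x in range(24)] for y in range(24)]
print(best_vector(ref, cur, bx=8, by=8))   # -> (-2, 0, 0)
```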
Area-delay trade-offs of texture decompressors for a graphics processing unit
Emilio Novoa Súñer, Pablo Ituero, Marisa López-Vallejo
Graphics Processing Units have become a booster for the microelectronics industry. However, due to intellectual property issues, there is a serious lack of information on the implementation details of the hardware architecture behind GPUs. For instance, the way texture is handled and decompressed in a GPU to reduce bandwidth usage has never been dealt with in depth from a hardware point of view. This work presents a comparative study of the hardware implementation of different texture decompression algorithms for both conventional (PCs and video game consoles) and mobile platforms. Circuit synthesis is performed targeting both a reconfigurable hardware platform and a 90 nm standard cell library. Area-delay trade-offs have been extensively analyzed, which allows us to compare the complexity of the decompressors and thus determine the suitability of the algorithms for systems with limited hardware resources.
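To give a flavour of what such a decompressor computes, the sketch below decodes one 4x4 block of DXT1/S3TC, a widely deployed desktop texture compression format of the kind such a comparison would plausibly include (the abstract does not list the specific algorithms studied); the example block bytes are made up.

```python
# Decode one 4x4 DXT1/S3TC block: two RGB565 endpoints + 2-bit indices.
import struct

def rgb565(v):
    return ((v >> 11) << 3, ((v >> 5) & 0x3F) << 2, (v & 0x1F) << 3)

def decode_dxt1_block(block8):
    c0, c1, bits = struct.unpack("<HHI", block8)
    p0, p1 = rgb565(c0), rgb565(c1)
    if c0 > c1:   # 4-colour mode: two interpolated colours
        pal = [p0, p1,
               tuple((2 * a + b) // 3 for a, b in zip(p0, p1)),
               tuple((a + 2 * b) // 3 for a, b in zip(p0, p1))]
    else:         # 3-colour + transparent-black mode
        pal = [p0, p1,
               tuple((a + b) // 2 for a, b in zip(p0, p1)), (0, 0, 0)]
    return [[pal[(bits >> (2 * (4 * y + x))) & 3] for x in range(4)]
            for y in range(4)]

# White and black endpoints, all indices pointing at interpolated colour 2.
print(decode_dxt1_block(bytes([0xFF, 0xFF, 0x00, 0x00,
                               0xAA, 0xAA, 0xAA, 0xAA]))[0])
```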
Evaluation of elementary functions without range reduction
Filipe A. Meireles, António J. Araújo
The evaluation of elementary functions can be performed by approximation with minimax polynomials requiring simple hardware resources. The general method for calculating an elementary function consists of three steps: range reduction, computation of the polynomial on the reduced argument, and range reconstruction. This approach allows a low-degree polynomial approximation, but range reduction and reconstruction introduce a computation overhead. This work proposes an evaluation methodology without the range reduction and range reconstruction steps. Applications that need to compute elementary functions may benefit from avoiding these steps if the argument belongs to a sub-domain of the function; particularly in the context of embedded systems, applications related to digital signal processing usually require function evaluation only within a specific interval. As a consequence of not performing range reduction, the degree of the approximating polynomials increases in order to maintain the required precision. Interval segmentation is an effective way to overcome this issue, because the approximations are computed over smaller intervals. The proposed methodology uses non-uniform segmentation to mitigate the problem arising from not carrying out range reduction. The benefits of applying interval segmentation to the general evaluation technique are limited by the range reduction and reconstruction steps, because the segmentation only applies to the approximation step; when used in the proposed methodology, however, it proves more effective. Several elementary functions were implemented in an FPGA device using the proposed methodology. The metrics used to characterize the proposed technique are area occupation and the corresponding latency. The results of each implementation without range reduction were compared with the corresponding ones of the general method using range reduction. The results show that latency can be significantly reduced while the area remains approximately the same.
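A small sketch of segmented evaluation follows: the supported sub-domain is split non-uniformly, each segment gets its own low-degree polynomial, and evaluation is a segment lookup plus Horner's rule. The quadratic-through-three-points fit is a cheap stand-in for a true minimax approximation, and the segment boundaries are invented.

```python
# Non-uniform segmented polynomial evaluation of exp(x) on [0, 1.5].
import math

SEGMENTS = [(0.0, 0.25), (0.25, 0.75), (0.75, 1.5)]   # non-uniform split

def quad_coeffs(f, lo, hi):
    # Quadratic through the segment endpoints and midpoint.
    x0, x1, x2 = lo, (lo + hi) / 2, hi
    y0, y1, y2 = f(x0), f(x1), f(x2)
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2**2 * (y0 - y1) + x1**2 * (y2 - y0) + x0**2 * (y1 - y2)) / denom
    c = (x1 * x2 * (x1 - x2) * y0 + x2 * x0 * (x2 - x0) * y1
         + x0 * x1 * (x0 - x1) * y2) / denom
    return a, b, c

TABLE = [quad_coeffs(math.exp, lo, hi) for lo, hi in SEGMENTS]

def eval_exp(x):
    for (lo, hi), (a, b, c) in zip(SEGMENTS, TABLE):
        if lo <= x <= hi:
            return (a * x + b) * x + c        # Horner evaluation
    raise ValueError("argument outside supported sub-domain")

print(eval_exp(0.5), math.exp(0.5))
```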
IC Design
Challenges facing academic research in commercializing event-detector implantable devices for an in-vivo biomedical subcutaneous device for biomedical analysis
E. Juanola-Feliu, J. Colomer-Farrarons, P. Miribel-Català, et al.
It is widely recognized that the welfare of the most advanced economies is at risk, and that the only way to tackle this situation is by strengthening their knowledge economies. To achieve this ambitious goal, we need to improve the performance of each dimension of the "knowledge triangle": education, research and innovation. Indeed, recent findings point to the importance of value-adding and marketing strategies during R&D processes, so as to bridge the gap between the laboratory and the market and thus ensure the successful commercialization of new technology-based products. Moreover, in a global economy in which conventional manufacturing is dominated by developing economies, the future of industry in the most advanced economies must rely on its ability to innovate in those high-tech activities that can offer a differential added value, rather than on improving existing technologies and products. It seems quite clear, therefore, that the combination of health (medicine) and nanotechnology in a new biomedical device is well placed to meet these requisites. This work proposes a generic CMOS front-end self-powered in-vivo implantable biomedical device, based on a three-electrode amperometric biosensor approach, capable of detecting threshold values for targeted concentrations of pathogens, ions, oxygen, etc. Since diabetes is the fastest-growing disease in the world, the nano-enabled implantable device for in-vivo biomedical analysis needs to be introduced into the global diabetes care devices market. In the case of glucose monitoring, detecting a threshold decrease in the glucose level is mandatory to avoid critical situations such as hypoglycemia. Although the case study reported in this paper is complex, because it involves multiple organizations and sources of data, it contributes to extending experience with best practices and models for nanotechnology applications and commercialization.
Discrete to full custom ASIC solutions for bioelectronic applications
J. Punter-Villagrasa, J. Colomer-Farrarons, P. Miribel-Català, et al.
This paper presents a first approach to a multi-pathogen detection system for portable point-of-care applications in the field of discrete electronics. The main interest is focused on the development of custom-built electronic solutions for bioelectronic applications, from discrete devices to ASIC solutions.
VLSI design of low-leakage single-ended 6T SRAM cell
S. Solanki, F. Frustaci, P. Corsonello
Aggressive CMOS scaling results in a significant increase of the leakage current in MOS transistors manufactured in the deep-submicron regime. Consequently, low-power SRAM design becomes an important criterion in the design of VLSI circuits. In this work, a new six-transistor (6T) SRAM cell based on dual threshold voltage and dual power supply techniques is proposed for low-leakage SRAM design. The proposed cell has been compared to the conventional 6T SRAM cell using a 65 nm technology. Compared to the conventional 6T SRAM cell, the new cell reduces leakage power consumption by 72.6%. Furthermore, the proposed SRAM cell shows no area overhead and comparable read/write speed with respect to the conventional 6T SRAM cell.
Effect of separation and depth of N+ diffusions in the quality factor and tuning range of PN varactors
M. Marrero-Martín, T. Szydzik, J. García, et al.
Variable capacitors (varactors) are key components in many types of radiofrequency circuits, and high-quality varactors are thus essential to achieve high quality factors in these devices. This work presents the results of a study on the variation of the tuning range and quality factor when varying the depth and separation of the N+ diffusions in a PN-junction varactor with a fixed number of cells. For testing purposes, four types of cells varying the geometry of the N+ and P+ diffusions were designed. The varactors were formed by horizontally and vertically overlapping cells. Based on their implementation structure, the varactors were divided into two groups, each comprising four varactors. The varactors belonging to the first group have all N+ diffusions connected to the buried layer; varactors in the second group use floating N+ diffusions and a buried N+ diffusion to separate pairs formed by two adjacent cells. Post-implementation measurements show that the areas of the varactors in the first and second groups are 1795.74 μm² (51.9 × 34.6 μm) and 1288.92 μm² (46.7 × 27.6 μm), respectively. The varactors from the first group have a high tuning range, whereas those from the second group have high quality factors and require less area.
Measuring, Detecting and Obscuring Defects and Effects
Automatic vector generation guided by a functional metric
I. Ugarte, P. Sanchez
Verification is still the bottleneck of the complex digital system design process. Formal techniques have advanced in their capacity to handle more complex descriptions, but they still suffer from memory or time explosion. Simulation-based techniques handle descriptions of any size or complexity, but their efficiency is reduced as system complexity increases, because of the exponential increase in the number of simulation tests necessary to maintain coverage. Semi-formal techniques combine the advantages of simulation and formal techniques, as they increase the efficiency of simulation-based verification. In this area, several research works have introduced techniques that automate the generation of vectors driven by traditional coverage metrics. However, these techniques do not ensure the detection of 100% of faults. This paper presents a novel technique for vector generation. A major benefit of the technique is that it generates test benches more efficiently than techniques based on structural metrics. The technique is more efficient because it relies on a novel coverage metric that is more directly correlated with functional faults than structural coverage metrics (line, branch, etc.). The proposed coverage metric is based on an abstraction of the system as a set of polynomials, where all system behaviours are described by a set of coefficients. By assuming a finite precision of the coefficients and a maximum degree of the polynomials, all system behaviours, both correct and incorrect, can be modeled. The technique applies mathematical theories (computer algebra and number theory) to calculate the coverage and to generate vectors that maximize it. Moreover, a tool implementing the technique has been developed: it takes a C-based system description and provides the coverage and the generated vectors as output.
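A heavily simplified illustration of why a bounded-degree polynomial abstraction supports exhaustive functional coverage: two distinct polynomials of degree at most d can agree on at most d inputs, so d+1 distinct test vectors expose any faulty coefficient. The golden and faulty behaviours below are invented stand-ins; the paper's actual metric and vector generation are far more elaborate.

```python
# d+1 distinct inputs distinguish any two degree-<=d polynomials.
D_MAX = 3                                  # assumed maximum degree

def golden(x):                             # reference behaviour
    return 2 * x**3 - x + 5

def faulty(x):                             # DUT with one wrong coefficient
    return 2 * x**3 - x + 4

vectors = list(range(D_MAX + 1))           # d+1 distinct test inputs
detected = any(golden(x) != faulty(x) for x in vectors)
print("fault detected:", detected)
```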
Energy consumption estimation of an OMAP-based Android operating system
Gabriel González, Eduardo Juárez, Juan José Castro, et al.
System-level energy optimization of battery-powered multimedia embedded systems has recently become a design goal. The poor operational time of multimedia terminals makes computationally demanding applications impractical in real scenarios; for instance, the so-called smartphones are currently unable to remain in operation longer than several hours. The OMAP3530 processor basically consists of two processing cores, a General Purpose Processor (GPP) and a Digital Signal Processor (DSP). The former, an ARM Cortex-A8 processor, is aimed at running a generic Operating System (OS), while the latter, a DSP core based on the C64x+, has an architecture optimized for video processing. The BeagleBoard, a commercial prototyping board based on the OMAP processor, has been used to test the Android operating system and measure its performance. The board has 128 MB of external SDRAM, 256 MB of external Flash memory and several interfaces; the clock frequencies of the ARM and DSP OMAP cores are 600 MHz and 430 MHz, respectively. This paper describes the energy consumption estimation of the processes and multimedia applications of an Android v1.6 (Donut) OS on the OMAP3530-based BeagleBoard. In addition, tools for communication between the two processing cores have been employed, and a test bench to profile OS resource usage has been developed. As far as the energy estimates are concerned, the OMAP processor energy consumption model provided by the manufacturer has been used. The model is basically divided into two energy components: the baseline core energy, which describes the energy consumption that is independent of any chip activity, and the module active energy, which describes the energy consumed by the active modules depending on resource usage.
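A worked form of the two-component model follows: total energy is the activity-independent baseline plus per-module active energy scaled by measured utilisation and elapsed time. All coefficients below are placeholders; the real ones come from the manufacturer's model.

```python
# Two-component energy estimate: baseline + utilisation-weighted modules.
BASELINE_POWER_W = 0.010            # hypothetical activity-independent power
MODULE_ACTIVE_POWER_W = {"ARM": 0.300, "DSP": 0.180, "SDRAM": 0.090}

def energy_joules(duration_s, utilisation):
    active = sum(MODULE_ACTIVE_POWER_W[m] * u for m, u in utilisation.items())
    return (BASELINE_POWER_W + active) * duration_s

# Example: a 10 s video decode with profiled module utilisations.
print(energy_joules(10.0, {"ARM": 0.35, "DSP": 0.80, "SDRAM": 0.50}))
```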
SCA security verification on wireless sensor network node
Wei He, Carlos Pizarro, Eduardo de la Torre, et al.
Side Channel Attacks (SCAs) differ from traditional mathematical attacks: they bypass exhaustive mathematical calculation and instead target specific points in the cryptographic algorithm to reveal confidential information from the running crypto-device. Since the introduction of SCA by Paul Kocher et al. [1], it has been considered one of the most critical threats to resource-restricted but security-demanding applications, such as wireless sensor networks. In this paper, we focus on SCA-oriented security verification of wireless sensor networks (WSNs). A detailed setup of the platform and an analysis of the results of DPA (differential power analysis) and EMA (electromagnetic analysis) attacks are presented. The setup follows a low-cost approach to mounting effective SCAs. The weaknesses of WSNs in resisting SCAs, especially EM attacks, are also surveyed. Finally, SCA-prevention suggestions based on a Differential Security Strategy for FPGA hardware implementations in WSNs are given, helping to achieve an improved compromise between security and cost.
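A minimal difference-of-means DPA sketch in the spirit of such an analysis is shown below: synthetic traces are partitioned by a predicted intermediate bit for each key guess, and the guess with the largest mean difference wins. The S-box, leakage model and noise level are synthetic, not measurements from the platform.

```python
# Difference-of-means DPA against a toy S-box with Hamming-weight leakage.
import random

SBOX = list(range(256)); random.seed(1); random.shuffle(SBOX)  # toy S-box
SECRET = 0x3C

def trace(plaintext):
    # Synthetic leakage: Hamming weight of the S-box output plus noise.
    v = SBOX[plaintext ^ SECRET]
    return bin(v).count("1") + random.gauss(0, 0.5)

data = [(p, trace(p)) for p in [random.randrange(256) for _ in range(3000)]]

def dom(key_guess, bit=0):
    # Partition traces by the predicted value of one intermediate bit.
    ones = [t for p, t in data if (SBOX[p ^ key_guess] >> bit) & 1]
    zeros = [t for p, t in data if not (SBOX[p ^ key_guess] >> bit) & 1]
    return abs(sum(ones) / len(ones) - sum(zeros) / len(zeros))

best = max(range(256), key=dom)
print(f"recovered key byte: 0x{best:02X} (secret 0x{SECRET:02X})")
```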
Self-repairing SRAM architecture to mitigate the inter-die process variations at 65nm technology
Sumit Kansal, Marco Lanuzza, Pasquale Corsonello
With aggressive scaling, one of the major barriers that CMOS technology faces is increasing process variation. Variations in process parameters not only affect the performance of devices but also degrade the parametric yield of circuits. Adaptive repairing techniques, such as adaptive body bias, have proved effective in mitigating variations in process parameters. In this paper, we evaluate the use of zone-based self-repairing techniques to mitigate the impact of process variations on SRAM cells. Two different techniques were implemented and analyzed through extensive Monte Carlo simulations using a commercial 65 nm technology. The results demonstrate that improvements of up to 35.7% in the variability factor for leakage power and up to 22.3% in the design margin for leakage power can be achieved using the suggested approach.
Analytical modeling of glitch propagation in nanometer ICs
Xavier Gili, Salvador Barceló, Sebastià A. Bota, et al.
We present a glitch propagation model that can be used to categorize the propagation likelihood of a given noise waveform through a logic gate. This analysis is key to predicting whether a single event transient (SET) induced within a combinational block is capable of causing a single event upset (SEU). The model predicts the output glitch characteristics given the input noise waveform for each gate in a 65 nm technology library. These noise transfer curves are fitted to known functions to obtain a simple analytical equation and compute the propagation. Comparison between simulations and the model shows good agreement.
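A hedged sketch of how a fitted transfer curve can be iterated to decide propagation follows: each gate maps an input glitch (amplitude, width) to an output glitch, and the transient dies once its amplitude falls below the switching threshold. The logistic-style curve is a placeholder, not the fitted functions of the paper.

```python
# Iterating a per-gate noise transfer curve to classify SET propagation.
import math

VDD, VTH = 1.2, 0.6          # supply and switching threshold, volts

def transfer(amp, width, t_gate=20e-12):
    # Wider-than-gate-delay glitches pass nearly unattenuated; narrow
    # ones are attenuated sharply (placeholder analytical form).
    gain = 1.0 / (1.0 + math.exp(-(width - t_gate) / (t_gate / 4)))
    return gain * amp, width

def propagates(amp, width, depth=5):
    for _ in range(depth):               # chain of identical gates
        amp, width = transfer(amp, width)
        if amp < VTH:
            return False
    return True

for w in (10e-12, 20e-12, 40e-12):
    print(f"width {w * 1e12:.0f} ps -> latched SEU: {propagates(VDD, w)}")
```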
Poster Session
Evaluation of MOBILE-based gate-level pipelining augmenting CMOS with RTDs
Juan Nuñez, María J. Avedillo, José M. Quintana
The incorporation of Resonant Tunnel Diodes (RTDs) into III/V transistor technologies has shown improved circuit performance: higher circuit speed, reduced component count, and/or lowered power consumption. Currently, the incorporation of these devices into CMOS technologies (RTD-CMOS) is an area of active research. Although some works have focused on evaluating the advantages of this incorporation, additional work in this direction is required. We compare RTD-CMOS and pure CMOS realizations of a network of logic gates that can be operated in a gate-level pipeline. Significantly lower average power is obtained for the RTD-CMOS implementations.