Proceedings Volume 5205

Advanced Signal Processing Algorithms, Architectures, and Implementations XIII

cover
Proceedings Volume 5205

Advanced Signal Processing Algorithms, Architectures, and Implementations XIII

View the digital version of this volume at SPIE Digital Libarary.

Volume Details

Date Published: 24 December 2003
Contents: 11 Sessions, 59 Papers, 0 Presentations
Conference: Optical Science and Technology, SPIE's 48th Annual Meeting 2003
Volume Number: 5205

Table of Contents

icon_mobile_dropdown

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library. external link icon
View Session icon_mobile_dropdown
  • Section
  • Time-Frequency and Time-Scale Analysis I
  • Time-Frequency and Time-Scale Analysis II
  • Adaptive Sensor Network
  • Wireless Communication
  • Image Processing
  • Exploitation of Structures in Imaging and Signal Processing
  • Matrix Algorithms
  • Signal Processing Applications
  • Computer Arithmetic
  • Arithmetic and Architectures for Real-Time Applications
Section
icon_mobile_dropdown
Scale and translation invariant shape and signal classification and detection
Highly sophisticated methods for detection and classification of signals and images are available. However, most of these methods are not robust to nonstationary variations such as imposed by Doppler effects or other forms of warping. Fourier methods handle time-shift or frequency shift variations in signals or spatial shifts in images. A number of methods have been developed to overcome these problems. In this paper we discuss some specific approaches that have been motivated by time-frequency analysis. Methodologies developed for images can often be profitably used for time-frequency analysis as well, since these representations are essentially images. The scale transform introduced by Cohen can join Fourier transforms in providing robust representations. Scale changes are common in many signal and image scenarios. We call the representation which results from appropriate transformations of the object of interest the Scale and Translation Invariant Representation or STIR. The STIR method is summarized and results from machine diagnosis, radar, marine mammal sounds, TMJ sounds, speech and word spotting are discussed. Some of the limitations and variations of the method are discussed to provide a rationale for selection of particular elements of the method.
Time-Frequency and Time-Scale Analysis I
icon_mobile_dropdown
Wigner distribution approximation applied to differential equations
Galleani and Cohen recently developed a Wigner-distribution based approach for the study of linear differential equations in general, and the gliding tone problem in particular. In this research, we extend these results by considering an exponential chirp and also a set of arbitrarily selected forcing functions. These forcing functions are taken from a class of smoothed and monotonically increasing phase functions. By examining a number of arbitrary selected forcing functions from this set, insight is gained into the nature of the solution and the associated dynamics of the system.
Time-frequency characterization of random systems
We have previously presented a method to write equations for the Wigner distribution corresponding to the solution of a linear differential equation. We extend this method to arbitrary time-frequency distributions, the Short Time Fourier Transform, and the wavelet transform. Also, we apply the method to stochastic signals and address a number of applications including Brownian motion. We obtain the Wigner distribution for Brownian motion by solving the Langevin equation and also by using our method. We obtain exact solutions.
Evaluation of the load impedance in coaxial cable via time-frequency domain reflectometry
YongJune Shin, Edward J. Powers, TokSon Choe, et al.
A new impedance measurement methodology based on time-frequency domain reflectometry (TFDR) is proposed. For the evaluation of the reflection coefficient in time-frequency domain reflectometry, the distortion of the reflected wave by the frequency-dependent attenuation is compensated which otherwise results in inaccurate impedance measurement. Also, the phase difference between the incident and reflected waveforms caused by the state of the load impedance is evaluated by the cross time-frequency distribution which provides time-frequency localized phase difference information. The proposed methodology is verified by a set of numerical electromagnectic simulation experiments and the results are compared with classical time domain reflectometry (TDR). Impedance measurement via time-frequency domain reflectometry is more accurate over a wider range of impedances than TDR.
Construction of signal-dependent Cohen's-class time-frequency distributions using iterative blind deconvolution
Andrew E Yagle, Jose E. Torres-Fernandez
The problem of kernel design for Cohen time-frequency distributions is formulated as a blind deconvolution problem. It is shown that the iterative blind deconvolution method (IBDM) used in image restoration problems can be successfully applied to solve the kernel design problem. We obtain the following results: (1) the rate of convergence depends on which domains the constraints are imposed (2) certain constraints are needed for algorithm convergence (3) the more constrained the kernel design is, the faster the rate of convergence (4) there are tradeoffs between constraints, e.g., compact support vs. satisfaction of marginals; (5) time-frequency distributions which are more amenable to visual interpretation can be obtained using this algorithm.
The marginals and time-frequency distributions
We address the question of the importance of satisfying the marginal conditions in a time-frequency distribution and discuss in detail the fundamental and practical issues involved. We also examine the marginals of the spectrogram. The spectrogram often gives reasonable results even though both marginals can never be satisfied exactly, but sometimes it can give very unreasonable results. We show by examples that the spectrogram gives the clearest characterization of the time-frequency properties of a signal when the combined error in both of its marginals is as small as possible. In contrast, when the error in either one of the marginals of the spectrogram is large, the time-frequency characterization degrades. Some different error measures are described. We discuss the issues involved both theoretically and by way of numerical simulations. We also address the issue of marginals for the random case.
Spatial polarimetric time-frequency distributions and applications to direction-of-arrival estimation
Time-frequency distributions (TFDs) have evolved to be a powerful technique for nonstationary signal analysis and synthesis. With the use of a multi-sensor array, spatial time-frequency distributions (STFDs) have been developed and successfully applied to high-resolution direction-of-arrival (DOA) estimation and blind recovery of the source waveforms. In this paper, the polarimetric dimension is introduced to the STFDs resulting in the spatial polarimetric time-frequency distributions (SPTFDs) as a platform for the processing of non-stationary polarized signals. In the SPTFD platform, polarized signals are decomposed (projected) into two orthogonal polarization components, such as horizontal and vertical, and later processed in a manner where their polarization characteristics are exploited. This empowers the STFDs with additional degrees of freedom and improves the robustness of the signal and noise subspaces, and therefore, serving to enhance DOA estimation, signal recovery, and source separation performance. To demonstrate the advantages of the SPTFDs, the polarimetric time-frequency MUSIC (PTF-MUSIC) method for DOA estimation is proposed based on the SPTFD platform and is shown to outperform the time-frequency, polarimetric, and conventional MUSIC methods.
Information theoretic signal detection on the time-frequency plane
A comprehensive theory for time-frequency based signal detection has been developed during the past decade. The time-frequency detectors proposed in literature are linear structures operating on the time-frequency representation of the signals and are equivalent to quadratic receivers that are defined in the time domain. In this paper, an information theoretic approach for signal detection on the time-frequency plane is introduced. In recent years, Renyi entropy has been proposed as an effective measure for quantifying signal complexity on the time-frequency plane and some important properties of this measure have been proven. In this paper, a new approach that uses the entropy functional as the test statistic for signal detection is developed. The minimum error detection algorithm is derived and the performance of this new signal detection method is demonstrated through examples.
Spectral correlation based on the cross-spectrum
We present new signal processing methods which may be used to estimate a high resolution spectrum. These methods are idally suited to the analysis of non-stationary signal components, such as speech formants or pitch. In addition, we present a new spectral correlation method, which may be used to provide accurate estimates of the frequency differences of the formants of linguistically similar utterances spoken by different speakers. These algorithms are accurate and simple and may be easily automated. These algorithms are based on a deterministic speech model and a cross-spectral representation computed from a short time Fourier transform.
Time-Frequency and Time-Scale Analysis II
icon_mobile_dropdown
Broadband interference mitigator for global positioning system using time-frequency domain excision filtering
Mark D. Silvius, Alan R. Lindsey, David H. Hughes
This paper explores methods to excise broadband interference signals from a Global Positioning System (GPS) spread spectrum signal using bilinear transformations. The Wigner-Ville Distribution allows for signal and jammer time-frequency characterization. The jammer signals, identified in the Time-Frequency (T-F) domain, are excised by zeroing peak amplitudes above a statistically determined detection threshold. The GPS signal is synthesized using inverse Wigner transform involving least squares amplitude and phase matching. Pre-processing increases Pseudo-Random Noise (PRN) correlator performance due to significant reduction in effective Jammer-to-Signal ratio (JSR). The proposed technique improves receiver robustness for large classes of broadband jammers, not limited to instantaneously narrowband jammers with a constant modulus or well defined instantaneous frequencies, while improving bit-error-performance and GPS Coarse Acquisition (C/A) and tracking in hostile interference environments.
Applications of non-uniform multi-Gabor expansions to time-frequency analysis
Michael D Hoffman, Shidong Li
We present a non-uniform multi-Gabor expansion (MGE) algorithm in which multiple windows and their corresponding translations and modulations are used for signal analysis. Analysis windows of varying supports, each corresponding to a chosen scale of detail, decompose a signal using different translation and modulation parameters for each scale. Structural analysis of dual synthesis waveforms is provided. We show that synthesis waveforms can still be formed from translations and modulations of a set of basic dual windows at each scale. The use of different translation and modulation parameters at each scale allows for a more refined and concise time-frequency representation of the signal, while retaining the versatility of uniform MGEs in representing signal dynamics at many different scales. Examples of implementation and practical application will be presented.
Signal arrival times
A femtosecond electromagnetic pulse propagating in a linear dispersive medium is analyzed in phase spaces of (space, wave number) and (time, frequency). First moment densities are computed as eigenvalues of differential operators acting on Hilbert ray representations of the pulse in relevant domains. These moment densities are used as metrics in the relevant phase spaces to measure the local mean values and their deviations in position and wavenumber, and in time and frequency. For example, arrival times of the pulse over its spectral content are computed as the mean arrival time at a particular frequency to within one half the local deviation at that frequency. These metrics on the pulse are compared to the relevant phase space Wigner distribution.
Large dynamic range time-frequency signal analysis with application to helicopter Doppler radar data
Despite the enhanced time-frequency analysis (TFA) detailing capability of quadratic TFAs like the Wigner and Cohen representations, their performance with signals of large dynamic range (DNR in excess of 40 dB) is not acceptable due to the inability to totally suppress the cross-term artifacts which typically are much stronger than the weakest signal components that they obscure. AMTI and GMTI radar targets exhibit such high dynamic range when microDoppler is present, with the aspects of interest being the weakest components. This paper presents one of two modifications of linear TFA to provide the enhanced detailing behavior of quadratic TFAs without introducing cross terms, making it possible to see the time-frequency detail of extremely weak signal components. The technique described here is based on subspace-enhanced linear predictive extrapolation of the data within each analysis window to create a longer data sequence for conventional STFT TFA. The other technique, based on formation of a special two-dimensional transformed data matrix analyzed by high-definition two-dimensional spectral analysis methods such as 2-D AR or 2-D minimum variance, is compared to the new technique using actual AMTI and GMTI radar data.
Adaptive Sensor Network
icon_mobile_dropdown
Transformation: growing role of sensor networks in defense applications
Karl J Gunzelman, Kwan S. Kwok, Eric P. Krotkov
The Department of Defense (DoD) is undergoing a transformation. What began as theoretical thinking, under the notion of a Revolution in Military Affairs (RMA) is now beginning to manifest itself in a “Transformation.” The overall goal of the transformation described in Joint Vision 2020 is the creation of a force that is dominant across the full spectrum of military operations. The warfighting concept that will allow us to achieve Joint Vision 2020 operational capabilities is Network Centric Warfare (NCW). NCW is no less than the embodiment of an Information Age transformation of the DoD. It involves a new way of thinking about how we accomplish our missions, how we organize and interrelate, and how we acquire, field and use the systems that support us. It will involve ways of operating that have yet to be conceived, and it will employ technologies yet to be invented. NCW has the potential to increase warfighting capabilities by orders of magnitude, and it will do so by leveraging information superiority. A major condition to success is an infostructure that is robustly networked to support information collection, sharing and collaboration; which will require increased emphasis on sensor research, development and implementation. DARPA is taking steps today to research, develop and implement those sensor capabilities. The Multi-Body Control program is a step in that direction.
Determination of vehicle behavior based on distributed sensor network data
Vehicle tracking can be done automatically based on data from a distributed sensor network. The determination of vehicle behavior must currently be done by humans. Behaviors of interest include searching, attacking and retreating. The purpose of this paper is to show an approach for the automatic interpretation of vehicle behaviors based on data from distributed sensor networks. The continuous dynamics of the sensor network are converted into symbolic dynamics by dividing its phase space into hypercubes and associating a symbol with each region. When the phase-space trajectory enters a region, its corresponding symbol is emitted into a symbol stream. Substrings from the stream are interpreted as a formal language defining the behavior of the vehicle. The formal language from the sensor network is compared to the languages associated with known behaviors of interest. Techniques for performing quantitative comparisons between formal languages are presented. The abstraction process is shown to be powerful enough to distinguish two simple behaviors of a robot based on data from a pressure sensitive floor.
Numerical implemention of the AML algorithm for wideband DOA estimation
Len Yip, Joe C Chen, Ralph E. Hudson, et al.
In this work, three algorithms are proposed to reduce the computational complexity of the Approximated Maximum Likelihood (AML) for wideband Direction of Arrival (DOA) estimation. The first two methods, conjugate gradient and Gauss-Newton, are iterative methods that are based on gradient information of the log-likelihood function. The third method, Alienor method, is based on function approximation theory which transform a multi-variable function into a one-variable function. Therefore, a multi-dimension search is reduced to a one-dimension search. Complexity as well as computational time of these methods are compared by simulations. Effectiveness of the AML algorithm is also demonstrated by experimental data.
Automated generation of discrete event controllers for dynamic reconfiguration of autonomous sensor networks
Sarah Damiani, Christopher Griffin, Shashi Phoha
Autonomous Sensor Networks have the potential for broad applicability to national security, intelligent transportation, industrial production and environmental and hazardous process control. Distributed sensors may be used for detecting bio-terrorist attacks, for contraband interdiction, border patrol, monitoring building safety and security, battlefield surveillance, or may be embedded in complex dynamic systems for enabling fault tolerant operations. In this paper we present algorithms and automation tools for constructing discrete event controllers for complex networked systems that restrict the dynamic behavior of the system according to given specifications. In our previous work we have modeled dynamic system as a discrete event automation whose open loop behavior is represented as a language L of strings generated with the alphabet 'Elipson' of all possible atomic events that cause state transitions in the network. The controlled behavior is represented by a sublanguage K, contained in L, that restricts the behavior of the system according to the specifications of the controller. We have developed the algebraic structure of controllable sublanguages as perfect right partial ideals that satisfy a precontrollability condition. In this paper we develop an iterative algorithm to take an ad hoc specification described using a natural language, and to formulate a complete specification that results in a controllable sublanguage. A supervisory controller modeled as an automaton that runs synchronously with the open loop system in the sense of Ramadge and Wonham is automatically generated to restrict the behavior of the open loop system to the controllable sublanguage. A battlefield surveillance scenario illustrates the iterative evolution of ad hoc specifications for controlling an autonomous sensor network and the generation of a controller that reconfigures the sensor network to dynamically adapt to environmental perturbations.
Locomotion-based dynamic power management in embedded real-time systems
Lara D Oliver, Krishnendu Chakrabarty, Richard R. Brooks
Most contemporary real-time distributed systems minimize computation energy consumption to reduce total system energy. However, wireless sensor networks expend a significant portion of total energy consumption on communication energy costs. Wireless transmission energy depends directly on the desired transmission distance, so the energy required for communication between neighboring nodes is less than that for distant ones. Mobile nodes can therefore reduce transmission energy costs by approaching one another before communicating. The penalty for energy reduction through locomotion is an increase in time consumed, thus care must be taken to meet system deadlines. We combine locomotion as a communication energy reduction strategy with well-known computation energy reduction schemes and demonstrate the resultant energy savings for representative systems.
Adaptive routing using emergent protocols in wireless ad hoc sensor networks
Richard R. Brooks, Matthew Pirretti, Mengxia Zu, et al.
This paper presents distributed adaptation techniques for use in wireless sensor networks. As an example application we consider data routing by a sensor network in an urban terrain. The adaptation methods are based on ideas from physics, biology, and chemistry. All approaches are emergent behaviors in that they: (i) perform global adaptation using only locally available information, (ii) have strong stochastic components, and (iii) use both positive and negative feedback to steer themselves. We analyze the approaches’ ability to adapt, robustness to internal errors, and power consumption. Comparisons to standard wireless communications techniques are given.
Recursive training methods for robust classification: a sequential analytic centering approach to the support vector machine
Katherine Comanor, Lieven Vandenberghe
The support vector machine (SVM) is a supervised learning algorithm used in a variety of applications, including robust target classification. The SVM training problem can be formulated as dense quadratic programming problem (QP). In practice, this QP is solved in batch mode, using general-purpose interior-point solvers. Although quite efficient, these implementations are not well suited in situations where the training vectors are made available sequentially. In this paper we discuss a recursive algorithm for SVM training. The algorithm is based on efficient updates of approximate solutions on the dual central path of the QP and can be analyzed using the convergence theory recently developed for interior-point methods. The idea is related to cutting-plane methods for large-scale optimization and sequential analytic centering techniques used successfully in set-membership estimation methods in signal processing.
Wireless Communication
icon_mobile_dropdown
Performance of OFDM in high-mobility environments
This paper focuses on communications within a high-mobility context with OFDM as the modulation of choice. We first show that the Air-to-Ground (AtG) channel is the most difficult to adapt using standard channel equalization methods. We then derive a new channel model (AtGMM) for the AtG case and use it as the basis for analysis and for simulating the effects within the AtG environment. Next, we derive the optimal channel-estimation-based equalizer for the AtG channel; showing performance equivalent to a comparable time-invariant channel. We then analyze the performance of conventional OFDM-based systems using the AtGMM channel model and provide guidelines to maximizing the throughput within differing mobility contexts.
A BER analysis of a space-time signal processing scheme that combines transmitter diversity with beamforming
Ilhan Kim, Joohwan Chun
We introduce a new space-time signal processing scheme that uses both transmitter diversity technique and beamforming technique for code-division multiple access (CDMA) systems. The introduced scheme achieves the diversity gain over Rayleigh fading channel through the transmitter diversity technique, and the signal-to-noise ratio (SNR) gain by the beamforming technique. Bit error rate (BER) analyses are given each of the three cases in which the transmitter diversity, beamforming and introduced combined scheme are used, in the frequency-flat Rayleigh fading channel. The analytic results are shown to coincide with the Monte-Carlo simulation results. We show that the combined scheme is more robust to the channel correlation and the speed of mobile than the diversity and beamforming scheme. Furthermore, the combined scheme gives a reduced multiple-access-interference because of its beamforming capability.
Design and implementation of a preamble-based burst mode CPM modem over Rayleigh fading channels
Gang Xiong, Oscar Y. Takeshita, Michael P. Fitz, et al.
This paper discusses the design and implementation of a burst mode Continuous Phase Modulation (CPM) modem over flat, Rayleigh fading channels in the 220MHz frequency band. Our design focused on bandwidth efficiency while maintaining good synchronization performance and low complexity. The designed preamble is based on the combination of Minimum Shift Keying (MSK) signals and 3RC (Raised Cosine) signals. The use of MSK allows a closed form result for an initial Maximum Likelihood (ML) timing estimate. The 3RC signal, which has a better timing characteristic than MSK, is used to refine the timing and frequency estimates. The payload uses 3RC signals. Pilot symbol assisted modulation (PSAM) is used to assist channel estimation. The designed packet structure meets the Federal Communications Commission (FCC) frequency emission mask for 220MHz frequency band. A CPM modem using the designed packet structure has been implemented on the testbed and the simulation, bench test, and field test results are presented in this paper.
Space-time adaptive processing and subband array implementations
Inter-symbol interference (ISI) and co-channel interference (CCI) are two major sources to signal impairment in mobile communications. To suppress both ISI and CCI, space-time adaptive processing (STAP) systems are shown to be effective, leading to increased capacity and improved quality of service. The high complexity and slow convergence, however, are often the hurdles in practical implementation of the STAP systems. Several subband array implementations have been proposed for STAP over the past few years. These methods are to provide optimal or sub-optimal steady state performance with reduced implementation complexity and improved convergence performance. The purpose of this paper is to investigate the steady state performance of subband arrays with different decimation rates and to derive analytical expressions of the minimum mean square error (MMSE). The discrete Fourier transform (DFT) based subband arrays and both unconstrained and constrained weight adaptations are considered.
Performance of space-frequency codes for MIMO OFDM systems
Concatenation of space-time (ST) coding with orthogonal frequency-division multiplexing (OFDM) has gained much interest recently. In this work, we derive the exact pairwise error probability (PEP) of space-frequency (SF) codes for MIMO OFDM Systems. Based on the exact PEP, we derive the tighter upper and lower bounds for the PEP. For asymptotically high SNRs, the design criteria for SF codes differ significantly from those for ST codes over flat fading channels. In this paper, by drawing an analogy between SF and ST codes, we show that when the number of receive antennas is large, the minimum Euclidean distance among code words dominates the performance of SF codes. Therefore, SF codes can be optimized by using the Euclidean-distance criterion valid for AWGN channels. Simulation results are given to show that the results valid for a number of receive antennas tending to infinity still provide correct indications when the number of antennas is small.
Performance evaluation of space-time coded modulation in Rayleigh fading with diversity and channel estimation
Prasad Shamain, Laurence B. Milstein
We obtain the exact pairwise error probability for an MPSK based space-time code employing receive diversity and noisy channel estimates and experiencing fast Rayleigh fading. The exact expression for the pairwise error probability lends itself to the application of classical transfer bounding techniques. We also obtain a union bound by applying the transfer function bound in conjunction with the pairwise error probability, and show that the union bound is within a few dBs for fast Rayleigh fading. In the case of MQAM based space-time codes and/or quasi-static Rayleigh fading channels, the same technique provides an upper bound on the pairwise error probability, which, in conjunction with the transfer function bound, is used to obtain a union bound on the bit error rate.
Interleaver design for high speed turbo decoders
Recently some efficient parallel architectures for turbo decoder have been proposed. In these architectures the bottleneck for the parallelization of the decoder is the interleaver. On the other hand, turbo codes achieve a very impressive near Shannon-capacity performance in which the interleaver plays a crucial role. Therefore, it is of great interest to design interleavers that are good from both performance and parallelization point of views. In this paper we have proposed an interleaver structure that is suitable for parallelization of turbo decoders. It is shown that such an interleaver can be designed to have good BER performance as well. By this structure not only fast decoders with very low latency can be built, but also the regularity of the decoder and the simplicity of the interleaver structure make them promising for VLSI implementation. We also present a fast algorithm to design such an interleaver, which can be used to design S-random interleavers as well. Such interleavers have been designed and the performances are compared to that of S-random interleavers by simulations.
Image Processing
icon_mobile_dropdown
Parallel deconvolution methods for three dimensional image restoration
Restoration by deconvolution of three-dimensional images that have been contaminated by noise and spatially invariant blur is computationally demanding. We describe efficient parallel implementations of iterative methods for image deconvolution on a distributed memory computing cluster.
Blind superresolution from undersampled blurred measurements
Superresolution is the problem of reconstructing a single high-resolution image from several blurred and downsampled low-resolution versions of it. We solve this problem for the case of unknown blurring functions. The image and functions must have finite support, and the number of low-resolution images must equal or exceed the number of pixels in each blurring function. Using a 2-D polyphase decomposition of the image, we show that the obvious reformulation as an MIMO blind deconvolution problem fails unless the grid of downsampling is chosen carefully, in which case 2X2 downsampling can be achieved. We also show that irregular sampling allows reconstruction of an MXM high-resolution image from L2 low-resolution images blurred with an LXL blurring function can be achieved with as few as L2 + (M/L)2 pixels in each low-resolution image. Illustrative examples illustrate the points with explicit numbers.
Exploitation of Structures in Imaging and Signal Processing
icon_mobile_dropdown
Restoring chopped and nodded images by tight frames
Raymond Hon-fu Chan, Lixin Shen, Zuowei Shen
In infrared astronomy, the observed chopped and nodded image g can be viewed as the image obtained by passing the true image f through a highpass filter. Here we propose an iterative restoration algorithm by building up a tight frame wavelet system from a multiresolution analysis that has the highpass filter as one of the wavelet filters. To recover f, the low frequency information of f hidden in g is unfolded by a wavelet decomposition and reconstruction algorithm and combined with the given high frequency information in g. The main advantage of using our method to restore chopped and nodded images is that there are fewer artifacts as compared to the well-known projected Landweber method. Also the noise in the restored image is significantly reduced. Simulated and real images are tested to illustrate the efficiency of our method. Here, we briefly describe the main ideas of our recent paper and the details can be found there.
Fast Poisson solvers on nonequispaced grids: multigrid and Fourier methods compared
Gisela Poeplau, Daniel Potts
The solution of partial differential equations on adaptively generated grids play an important role in scienti£c computation. In this paper we compare two Poisson solvers for data on nonequispaced mesh points. A new meshless Fourier method based on NFFT is constructed in R3. This algorithm is compared to the well-established multigrid method working on nonequidistant meshes. Our investigations are motivated especially by simulations of the behaviour of charged particles in accelerators.
Super-resolution images from blurred observations
Andy C. Yau, Michael Kwok-po Ng
In this paper, we present a technique for generating a high- resolution image from a blurred image sequence. The image sequence consists of decimated, blurred and noisy versions of the high- resolution image. The high-resolution image is modeled as a Markov random field, and a maximum a posteriori estimation technique is used for image restoration. A fast algorithm based on Fast Fourier Transforms (FFTs) is derived to solve the resulting linear system. Numerical examples are given to illustrate the effectiveness of the method.
Approach to regularization preconditioners for image processing
Preconditioning techniques for linear systems are widely used in order to speed up the convergence of iterative methods. Unfortunately, linear systems arising in image processing are highly ill-conditioned and preconditioners often give bad results, since the noise components on the data are strongly amplified already at the early iterations. In this work, we propose filtering strategies which allow to obtain preconditioners with rgularization features for Toeplitz systems of image deblurring. Regularization preconditioners are able to speed up the convergence in the space less sensitive to the noise and, simultaneously, they slow down the restoration from components mainly corrupted by noise. A 2-d numerical simulation concerning astronomical image deblurring confirms the effectiveness of the arguments.
Integrated optical-digital approaches for enhancing image restoration and focus invariance
A novel and successful optical-digital approach for removing certain aberrations in imaging systems involves placing an optical mask between an image-recording device and an object to encode the wavefront phase before the image is recorded, followed by digital image deconvolution to decode the phase. We have observed that when appropriately engineered, such an optical mask can also act as a form of preconditioner for certain deconvolution algorithms. It can boost information in the signal before it is recorded well above the noise level, leveraging digital restorations of very high quality. In this paper, we 1) examine the influence that a phase mask has on the incoming signal and how it subsequently affects the performance of restoration algorithms, and 2) explore the design of optical masks, a difficult nonlinear optimization problem with multiple design parameters, for removing certain aberrations and for maximizing restorability and information in recorded images.
A Multigrid method for restoration and regularization of images with Dirichlet boundary conditions
In many 2D image restoration problems, such as image deblurring with Dirichlet boundary conditions, we deal with two-level linear systems whose coefficient matrix is a banded block Toeplitz matrix with banded Toeplitz blocks (BTTB). Usually, these matrices are ill-conditioned since they are associated to generating functions which vanish at (π, π) or in neighborhood of it. In this work, we solve such BTTB systems by applying an Algebraic Multi-Grid method (AMG). The technique we propose has an optimality property, i.e., its cost is of O(n1n2) arithmetic operations, where n1 x n2 is the size of the images. Unfortunately, in the case of images affected by noise, our AMG method does not provide satisfactory regularization effects. Therefore, we propose two Tikhonov-like techniques, based on a regularization parameter, which can be applied to AMG method in order to reduce the noise effects.
Kronecker products in image restoration
A flexible preconditioning approach based on Kronecker product and singular value decomposition (SVD) approximations is presented. The approach can be used with a variety of boundary conditions, depending on what is most appropriate for the specific deblurring application. It is shown that, regardless of the imposed boundary condition, SVD approximations can be used effectively in filtering methods, such as the truncated SVD, as well as in designing preconditioners for iterative methods.
Anti-reflective boundary conditions and fast 2D deblurring models
Serra-Capizzano recently introduced anti-reflecting boundary conditions (AR-BC) for blurring models: the idea seems promising both from the computational and approximation viewpoint. The key point is that, under certain symmetry conditions, the AR-BC matrices can be essentially simultaneously diagonalized by the (fast) sine transform DST I and, moreover, a C1 continuity at the border is guaranteed in the 1D case. Here we give more details for the 2D case and we perform extensive numerical simulations which illustrate that the AR-BC can be superior to Dirichlet, periodic and reflective BCs in certain applications.
Matrix Algorithms
icon_mobile_dropdown
Fast non-iterative single-blur 2-D blind deconvolution of separable and low-rank point-spread functions from finite-support images
Andrew E Yagle, Faisal M. Al-Salem
The problem of 2-D blind deconvolution is to reconstruct an unknown image from its 2-D convolution with an unknown blur function. Motivated by the superior restoration quality achieved by the recently proposed nullspace-based multichannel image restoration methods, we propose a single-blur restoration approach that avoids the restrictive assumption of multichannel blurring and has the advantage of lower complexity. The assumption made about the image and the blur function is that they both have a finite spatial extent with that of the image being known. Also, the blur is assumed to be either separable or low-rank. If the blur is separable the image can be restored perfectly under noiseless conditions. When the blur is low-rank, favorable results can be achieved if the blur function has large spatial extent relative to the image. This requirement makes the proposed solution suitable for the cases where the degraded images are severely blurred.
A comrade-matrix-based derivation of the different versions of fast cosine and sine transforms
The paper provides a fully self-contained derivation of fast algorithms to compute discrete Cosine and Sine transforms I - II based on the concept of the comrade matrix. The comrade matrices associated with different versions of the transforms differ in only a few boundary elements; hence, in each case algorithms can be derived in a unified manner.
Optimal inversions of uncertain matrices - an estimation and control perspective
Many system and signal related problems involve matrix inversion of some kind. For example, in estimation and signal recovery applications, inversion of the channel response matrix is often required in order to estimate the source signals. In the control of multivariable systems, inverting a process gain matrix may be called for in order to deliver appropriate control actions. There are situations where these matrices should be considered as uncertain (or random): for example, when the process/channel environments vary randomly, or when significant uncertainties are involved in estimating these matrices. Based on a unified approach, this paper considers both the right inversion (for control) and the left inversion (for estimation) of random matrices. In both cases, minimizing a statistical error function leads to the determination of optimal or linear optimal inversion. Connections with related techniques, such as the total least squares (TLS), the ridge regression, the Levenberg-Marquardt algorithm and the regularization theory are discussed. A variant Kalman filtering problem with randomly varying measurement gain matrix is among the applications addressed. Monte Carlo simulation results show substantial benefits by taking process/model uncertainty into consideration.
Signal Processing Applications
icon_mobile_dropdown
Generalized causal moving average (GCMA) smoothing filter for real-time applications
Moving average filters are commonly used in industries for real-time processing of noisy data. Though they perform well in filtering out the noise, they introduce significant lag in the signal. The resulting peak value of the filtered signal at the operating point is likely to be lower due to averaging of higher and lower peak signals in the averaging interval. The generalized moving average smoothing filter by Golay-Savitzky preserves the higher moments and does not suffer from the limitations imposed by the conventional moving average filter. The smoothing strategy is derived from least squares fitting of a lower order polynomial to a number of consecutive points. Due to polynomial curve fitting as opposed to a line fitting in the case of conventional moving average filter, this filter preserves the higher frequency components of the signal and their line width. This paper presents a generalized casual moving average filter deduced using the concepts in Golay-Savitzky smoothing filter for real-time applications. Golay-Savitzky filter is non-casual, relies on the future data that is not available, hence not suitable for real-time applications. Further, the designed casual filter makes use of the filtered data as opposed to the original data in the case of Golay-Savitzky. This approach allows us to conduct frequency response studies to evaluate the quality and the applicability of the filter for various signals in the aircraft engines and other engineering applications. Frequency response studies cannot carried out using the Golay-Savitzky filter. This paper also investigates the performance of various polynomial orders in reproducing the signal from a noisy data. Some of the performance measures used are bandwidth, overshoots, and lags introduced by the filter. The mathematical technique to extract the signal and deduce the coefficients in off-line is also presented.
Clustering of color map pixels: an interactive approach
Yiu Sang Moon, Franklin T. Luk, K. N. Yuen, et al.
The demand for digital maps continues to arise as mobile electronic devices become more popular nowadays. Instead of creating the entire map from void, we may convert a scanned paper map into a digital one. Color clustering is the very first step of the conversion process. Currently, most of the existing clustering algorithms are fully automatic. They are fast and efficient but may not work well in map conversion because of the numerous ambiguous issues associated with printed maps. Here we introduce two interactive approaches for color clustering on the map: color clustering with pre-calculated index colors (PCIC) and color clustering with pre-calculated color ranges (PCCR). We also introduce a memory model that could enhance and integrate different image processing techniques for fine-tuning the clustering results. Problems and examples of the algorithms are discussed in the paper.
Web-based distributed processing tools for crop classification using CGDI databases
George A. Lampropoulos, J. Chan, Adina Bardas, et al.
This paper is devoted to change detection for precision agriculture. It presents a new change detection methodology and an overview of an online image processing tool (www.signalfusion.com) with direct applicability for crop classification. The change detection algorithms discussed in the paper will be implemented on the web. They preserve the fine details of the pixel changes and furthermore may be used for crop classification. Image registration, calibration/normalization, pixel level transformations, clutter suppression and pixel level change detection are discussed and new technologies are presented. The effectiveness of the proposed technology is being tested on real SAR data with partial truthing.
Computer Arithmetic
icon_mobile_dropdown
CR-LIBM: a correctly rounded elementary function library
Catherine Daramy, David Defour, Florent de Dinechin, et al.
We present a new elementary function library, called CR-LIBM. This library implements the various functions defined by the Ansi99 C standard. It provides correctly rounded functions: the returned result is always the floating-point number that is closest to the exact result. When writing this library, our primarily goal was to certify correct rounding, and make it reasonably fast, and with a low utilisation of memory. Hence, our library can be used without any problem on real-scale problems.
Relaxing the reciprocal error needed to achieve a fixed quotient error bound
High-performance arithmetic algorithms are often based on functional iteration and these algorithms do not directly produce a remainder. Without the remainder, rounding often requires additional computation or increased quotient precision. Often multiplicative divide algorithms compute the quotient as the product of the dividend and the reciprocal of the divisor, Q =a x (1/b). Typical rounding techniques require that the quotient error be less than a maximum bound such as 1/2 unit in the last place (ulp). When using normalized floating point numbers the quotient error may be approximately twice as large as the reciprocal error since amax ≈ 2 and Eq ≈ 2 x Er. If the rounding algorithm requires |Eq| < 1/2 ulp, then the reciprocal error bound must be |Er| < 1/4 ulp. This work proposes a quantitative method to relax the reciprocal error bound for normalized floating point numbers to achieve a fixed quotient error bound. The proposed error bound of Er < 1/(2 x b) guarantees the quotient error, Eq < 1/2 ulp and the reciprocal error is in the range of 1/4 to 1/2 ulp. Using the relaxed error bound, the reciprocal error may be greater in the region where it is hardest to compute without increasing the quotient error bound.
Rounding in redundant digit floating point systems
Hossam A. H. Fahmy, Michael J. Flynn
Redundant representations are used to increase the performance of arithmetic units. If redundancy is eliminated, the bits needed to represent a number may increase or decrease depending on the type of redundancy used. In such a redundant representation, finding the exact location and correct decision for rounding without eliminating the redundancy or loosing its performance gains is difficult. This paper discusses the different issues related to rounding in redundant systems. It also presents a solution that was used to maintain the gains of redundancy in a floating point unit while correctly implementing the IEEE rounding modes.
Use of multiple number representation in automatic arithmetic data-path design
Raphael Daouphars, Laurent-Stephane Didier
We present in this paper a study of a framework for a FIR filter with fixed coefficients targeting CMOS 0.35µ for binary standard and redundant representations. Booth recoded and Lefevre constant multipliers are used. The internal data coding being both borrow-save and carry-save, the framework manages the generation of adapted adders.
Comparison of modular multipliers on FPGAs
Jean-Luc Beuchat, Laurent Imbert, Arnaud Tisserand
The choice of modular multiplication algorithms for hardware implementation is not a straightforward problem. In this paper, we analyze and compare FPGA implementations of several state-of-the-art dedicated modular multipliers. For a given constant modulus M, there are several possible methods for generating an optimized modular multiplier, i.e. the dedicated (X x Y) mod M operator. Those modular multipliers can be generated using two kinds of algorithms: those that work for all values of M and those that only work for specific values of the modulo such as 2n ± 1. Several algorithms will be compared for both kind of algorithms. We also deal with two FPGA families, Virtex E and Virtex-II from Xilinx, to measure the impact of new specific built-in resources such as small embedded multipliers. The synthesizable VHDL files of the generated modular multipliers will be available on a web page.
Two-dimensional signal gating for low power in high-performance multipliers
We propose two-dimensional signal gating for high-performance multipliers including tree multipliers and array multipliers with an upper/lower left-to-right leapfrog (ULLRLF) structure. In ULLRLF array multipliers, the G-Y gating line follows the boundary of existing upper/lower partitioning. The G-X gating line goes through the upper and lower LRLF arrays. In tree multipliers, the G-Y gating line follows the existing partitioning of tree branches. The G-X line goes through all carry-save adders for partial product reduction. Because of the irregularity of the tree reduction structure, signal gating in tree multipliers is more complex than that in array multipliers. Experimental results indicate that two-dimensional gating is quite efficient in high-performance multipliers, with 65% power reduction under test data with large dynamic range.
Efficient sign extension for multiple addition
Robert T. Grisamore, Earl E. Swartzlander Jr.
A technique for reducing the sign extension overhead in adder trees is presented. A generalized version of the technique is shown to reduce the number of redundant sign extension computations required for reducing parallel adder trees from N terms to two terms. Additionally, the technique eliminates the fan-out latency that traditional sign extension places on late arriving sign bits. Twos complement number growth is also managed in carry-save form without the need for carry propagation. The application of the technique to 2N term adder trees is demonstrated. The implementation requires no computational overhead and needs minimal hardware. This design not only reduces hardware complexity, but also reduces computation delay. Finally, a simple circuit transformation to the traditional 4-2 compressor allows simple construction of circuits utilizing the technique.
Flexible arithmetic and logic unit for multimedia processing
Novel arithmetic units are needed to achieve the cost, performance, power, and functionality requirements of emerging multimedia systems. This paper presents the design and implementation of a 64-bit arithmetic and logic unit (ALU) for multimedia processing. The 64-bit ALU supports subword-parallel processing by allowing one 64-bit, two 32-bit, four 16-bit, or eight 8-bit operations to be performed in parallel. In addition to conventional ALU operations, the ALU also supports several operations for enhanced multimedia processing including parallel compare, parallel average, parallel minimum, parallel maximum, and parallel shift and add. To efficiently implement a variety of multimedia applications, the ALU supports saturating and wrap-around arithmetic operations on unsigned and two's complement operands. This paper compares the area and delay of the 64-bit multimedia ALU to those of a more conventional 64-bit ALU.
Arithmetic and Architectures for Real-Time Applications
icon_mobile_dropdown
A radix-4 on-line division design and its application to networks of on-line modules
On-line division is one of the slowest operations among the basic arithmetic operations and naturally becomes a bottleneck in networks of on-line modules that use it. A higher radix divider has a good potential to attain higher throughput than radix-2 dividers and therefore improve the overall throughput of networks where division is needed. The improvement in throughput when using radix 4 is not straightforward since several components of the divider become more complex than in the radix-2 case. Previously proposed radix-4 designs were based on operand pre-scaling to simplify the selection function and reduce the critical path delay, at the cost of more complexity in the algorithm conditions and operations, plus a variable on-line delay, which is a very unattractive feature when small precision values are used (usually the case for DSP). These designs include several phases for pre-scaling and actual division. This paper proposes a design approach based on overlapped replication that results in a radix-4 on-line division module with low algorithm complexity, single division phase, less restrictions to the input values, and a small and fixed on-line delay.
High-performance finite field multiplier for cryptographic applications
In FIPS 186-2, NIST recommends several finite fields to be used in the elliptic curve digital signature algorithm (ECDSA). Of the ten recommended finite fields, five are binary extension fields with degrees ranging from 163 to 571. The performance of the underlying field operations (i.e. addition and multiplication) directly affect the performance of the ECDSA algorithm. In this paper we discuss a high performance look-up table-based VLSI architecture which performs multiplication over a given finite field. First we present the architecture in a general form which can be implemented for any finite field and corresponding reduction polynomial. Following, we discuss a prototype implementation of the multiplier for the binary extension field with degree 163. The prototype is capable of performing a finite field multiplication in .06 microseconds when implemented on a Xilinx XCV2000E FPGA.
A comparison of Dadda and Wallace multiplier delays
The two well-known fast multipliers are those presented by Wallace and Dadda. Both consist of three stages. In the first stage, the partial product matrix is formed. In the second stage, this partial product matrix is reduced to a height of two. In the final stage, these two rows are combined using a carry propagating adder. In the Wallace method, the partial products are reduced as soon as possible. In contrast, Dadda's method does the minimum reduction necessary at each level to perform the reduction in the same number of levels as required by a Wallace multiplier. It is generally assumed that, for a given size, the Wallace multiplier and the Dadda multiplier exhibit similar delay. This is because each uses the same number of pseudo adder levels to perform the partial product reduction. Although the Wallace multiplier uses a slightly smaller carry propagating adder, usually this provides no significant speed advantage. A closer examination of the delays within these two multipliers reveals this assumption to be incorrect. This paper presents a detailed analysis for several sizes of Wallace and Dadda multipliers. These results indicate that despite the presence of the larger carry propagating adder, Dadda's design yields a slightly faster multiplier.
Interconnect effects in deep submicron implementation of high performance arithmetic architectures
The conventional trend in algorithm implementation has been the reliance on advancements in process technology in order to satisfy the ever-increasing demand for high-speed processors, and computational systems. As current device technology approaches sub-100nm minimum device size, not only does the device geometry decrease, but switching times, and operating voltages also scale down. These gains come at the expense of increased layout complexity, and a greater susceptibility to parasitic effects in the interconnections. In this paper we will briefly overview the challenges that digital designers will have to face in the imminent future, and will provide suggestions on algorithmic measures which may be taken in order to overcome some of these obstacles. To illustrate our point, we will present an analysis of a digital multiplication algorithm, which is predicted to outperform current schemes, for future technologies.
Using truncated multipliers in DCT and IDCT hardware accelerators
Truncated multipliers offer significant improvements in area, delay, and power. However, little research has been done on their use in actual applications, probably due to concerns about the computational errors they introduce. This paper describes a software tool used for simulating the use of truncated multipliers in DCT and IDCT hardware accelerators. Images that have been compressed and decompressed by DCT and IDCT accelerators using truncated multipliers are presented. In accelerators based on Chen's algorithm (256 multiplies per 8 x 8 block for DCT, 224 multiplies per block for IDCT), there is no visible difference between images reconstructed using truncated multipliers with 55% of the multiplication matrix eliminated and images reconstructed using standard multipliers with the same operand lengths and intermediate precision.
Design technologies for DSP algorithm implementation on heterogeneous architectures
John McAllister, Ying Yi, Roger F. Woods, et al.
Computationally intensive digital signal processing (DSP) systems sometimes have real time requirements beyond that which programmable processor platform solutions, consisting of RISC and DSP processors, can achieve. The addition of Field Programmable Gate Array (FPGA) components to these platforms provides a configurable hardware resource where increased parallelism levels allow very large computational rates. Techniques to implement circuit architectures from signal flow graph (SFG) algorithm expression can produce highly efficient processor implementations. Applying folding transformations produces implementations where hardware resource usage is reduced at the possible expense of throughput. In this paper a new development methodology is presented which analyses the SFG algorithm to suggest appropriate folding techniques. By characterizing different folding techniques, a template circuit architecture can be created early in the design process which does not alter throughout the remainder of the implementation process. Retiming techniques applied to the algorithm SFG produces the properly timed implementation from the template. By applying this methodology, architectural exploration can be quickly and efficiently performed to generate a set of implementations (an 'implementation space’) to best meet the constraints of the system. When applied to a Normalised Lattice Filter design example, the results demonstrate high savings on FPGA resource usage, with little reduction in real time performance, demonstrating the implementation advantage of employing this methodology.
Multiplication-free architecture for Daubechies wavelet transforms using algebraic integers
Khan Wahid, Vassil Dimitrov, Graham A. Jullien
The 2-Dimensional Wavelet Transform has been proven to be a highly effective tool for image analysis and used in JPEG2000 standard. There are many publications which demonstrate that using wavelet transform in time and space, combined with a multiresolution approach, leads to an efficient and effective method of compression. In particular, the four and six coefficient Daubechies filters have excellent spatial and spectral locality, properties which make them useful in image compression. In this paper, we propose a multiplication-free and parallel VLSI architecture for Daubechies wavelets where the computations are free from round-off errors until the final reconstruction step. In our algorithm, error-free calculations are achieved by the use of Algebraic Integer encoding of the wavelet coefficients. Compared to other DWT algorithms such as: embedded zero-tree, recursive or semi-recursive and conventional fixed-point binary architecture, our technique has lower hardware cost, lower computational power and optimized data-bus utilization.
Performance of the European logarithmic microprocessor
J. Nick Coleman, C. I. Softley, Jiri Kadlec, et al.
We have developed a new microprocessor. In contrast to existing devices, which perform real arithmetic using the floating-point system, the European Logarithmic Microprocessor uses the logarithmic number system for this purpose. This paper describes the ELM device, and compares its architecture with that of a conventional floating-point digital signal processor. We then illustrate the operation of both devices using an example from a class of recursive-least-squares algorithms. The results suggest that logarithmic arithmetic may be of particular benefit in applications with less regular processing patterns, e.g. in scalar or short vector code or triangular matrix processing, or where there is a preponderance of multiplications or significant use of division or square-root operations. These criteria appear to point to the more advanced digital adaptive filtering algorithms, and also to graphics applications. Results indicate that the logarithmic number system also offers an improvement in accuracy of around one bit.