Proceedings Volume 0341

Real-Time Signal Processing V

Joel Trimble
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 28 December 1982
Contents: 1 Session, 43 Papers, 0 Presentations
Conference: 1982 Technical Symposium East
Volume Number: 0341

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
All Papers
Progress On A Systolic Processor Implementation
J. J. Symanski
Parallel algorithms using systolic and wavefront processors have been proposed for a number of matrix operations important for signal processing; namely, matrix-vector multiplication, matrix multiplication/addition, linear equation solution, least squares solution via orthogonal triangular factorization, and singular value decomposition. In principle, such systolic and wavefront processors should greatly facilitate the application of VLSI/VHSIC technology to real-time signal processing by providing modular parallelism and regularity of design while requiring only local interconnects and simple timing. In order to validate proposed architectures and algorithms, a two-dimensional systolic array testbed has been designed and fabricated. The array has programmable processing elements, is dynamically reconfigurable, and will perform 16-bit and 32-bit integer and 32-bit floating point computations. The array will be used to test and evaluate algorithms and data paths for future implementation in VLSI/VHSIC technology. This paper gives a brief system overview, a description of the array hardware, and an explanation of control and data paths in the array. The software system and a matrix multiplication operation are also presented.
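The lock-step data flow described above can be sketched in a short simulation. This is an illustrative model only, not the testbed's actual hardware or programming system; the function name and the skewed indexing scheme are assumptions. Each cell of a linear array holds one matrix row, and the input vector pulses through one cell per clock tick:

```python
def systolic_matvec(A, x):
    """Simulate a linear systolic array computing y = A @ x.

    Cell i holds row i of A; the input vector moves one cell per
    clock tick, so cell i sees x[j] at tick i + j and adds
    A[i][j] * x[j] to its local accumulator.
    """
    n = len(A)
    y = [0] * n
    # Run enough ticks for the last input to reach the last cell.
    for tick in range(2 * n - 1):
        for i in range(n):      # every cell fires in lock step
            j = tick - i        # index of the x element now at cell i
            if 0 <= j < n:
                y[i] += A[i][j] * x[j]
    return y

print(systolic_matvec([[1, 2], [3, 4]], [5, 6]))  # [17, 39]
```

The staggered arrival of inputs (the `tick - i` skew) is what lets every cell do useful work on every tick using only nearest-neighbor communication.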
Configurable, Highly Parallel (CHiP) Approach For Signal Processing Applications
Lawrence Snyder
Between the conception of a real time signal processor and its functional, VLSI realization there is an enormous amount of effort devoted to designing, revising, optimizing and testing. Since the process is cumulative -- later work builds on previous work -- and since the activity becomes progressively more detailed, more constrained and more exacting, it follows that the global design parameters should be fully explored. Global design decisions, when correct, can have a greater effect on performance than many local optimizations. When the decisions are wrong, they can cause continual difficulty. Accordingly, we propose a design methodology based on the Configurable, Highly Parallel (CHiP) architecture family [1] that focuses on exploring global design parameters and is especially well suited to the VLSI implementation of signal processing systems.
Integrating High-Performance Special Purpose Devices Into A System
H. T. Kung, S. Q. Yu
An emerging belief among many researchers is that a significant portion of the next generation of high performance computers will be based on architectures capable of exploiting very large scale integration (VLSI) modules. In particular, it is desirable to have a compact system that can be plugged in with interchangeable high performance modules to fit various application requirements. The system can be an efficient signal processor when special purpose signal processing modules are used; it can also be an efficient database machine when the modules are replaced with data processing modules. This paper discusses some of the issues in the design of such a system, and describes the framework of a system that is being developed at CMU.
Numerical Considerations In Integrated Functions
Noble R. Powell
The harmonic analysis of binary approximations to trigonometric reference functions employed in digital signal processing functions is described. An analytic representation of the decomposition of sine or cosine functions into elementary rational binary functions is suggested which permits a direct solution to the problem of calculating the harmonics of the error of approximation. For applications, such as ROM based frequency synthesizers, angle encoders, and discrete Fourier transformations, the approach illustrated provides a convenient method of relating the precision of functional implementation of microelectronically integrated functions to the harmonic content of the functional error.
Systolic Arrays For Eigenvalue Computation
Robert Schreiber
A machine architecture for computing the eigenvalues and eigenvectors of an Hermitian matrix is presented. Two systolic arrays are used, one for reducing full matrices to band matrices, the second for performing QR iteration on band matrices. A one-parameter family of systems, parameterized by the bandwidth of the reduced matrix, is available. This allows a tradeoff of processors for execution time.
Systolic Array Computation Of The Singular Value Decomposition
Alan M. Finn, Franklin T. Luk, Christopher Pottle
Linear time computation of the singular value decomposition (SVD) would be useful in many real time signal processing applications. Two algorithms for the SVD have been developed for implementation on a quadratic array of processors. A specific architecture is proposed and we demonstrate the mapping of the algorithms to the architecture. The algorithms and architecture together have been verified by functional level and register transfer level simulation.
Synchronizing Large Systolic Arrays
Allan L. Fisher, H. T. Kung
Parallel computing structures consist of many processors operating simultaneously. If a concurrent structure is regular, as in the case of a systolic array, it may be convenient to think of all processors as operating in lock step. This synchronized view, for example, often makes the definition of the structure and its correctness relatively easy to follow. However, large, totally synchronized systems controlled by central clocks are difficult to implement because of the inevitable problem of clock skews and delays. An alternative means of enforcing necessary synchronization is the use of self-timed, asynchronous schemes, at the cost of increased design complexity and hardware cost. Realizing that different circumstances call for different synchronization methods, this paper provides a spectrum of synchronization models; based on the assumptions made for each model, theoretical lower bounds on clock skew are derived, and appropriate or best-possible synchronization schemes for systolic arrays are proposed. In general, this paper represents a first step towards a systematic study of synchronization problems for large systolic arrays. One set of models is based on assumptions that allow the use of a pipelined clocking scheme, where more than one clock event is propagated at a time. In this case, it is shown that even assuming that physical variations along clock lines can produce skews between wires of the same length, any one-dimensional systolic array can be correctly synchronized by a global pipelined clock while enjoying desirable properties such as modularity, expandability and robustness in the synchronization scheme. This result cannot be extended to two-dimensional arrays, however--the paper shows that under this assumption, it is impossible to run a clock such that the maximum clock skew between two communicating cells will be bounded by a constant as systems grow.
For such cases or where pipelined clocking is unworkable, a synchronization scheme incorporating both clocked and "asynchronous" elements is proposed.
Synchronous Versus Asynchronous Computation In Very Large Scale Integrated (VLSI) Array Processors
S. Y. Kung, R. J. Gal-Ezer
This paper compares timing and other aspects of a synchronous and asynchronous square array of processing elements, fabricated by means of VLSI technology. Timing models are developed for interprocessor communications and data transfer for both cases. The synchronous timing model emphasizes the clock skew phenomenon, and enables derivation of the dependence of the global clock period on the size of the array. This O(N³) dependence, along with the limited flexibility with regard to programmability and extendability, calls for a serious consideration of the asynchronous configuration. A self-timed (asynchronous) model, based on the concept of wavefront oriented propagation of computation, is presented as an attractive alternative to the synchronous scheme. Some potential hazards, unique to the asynchronous model presented, and their solutions are also noted.
Novel Multibit Convolver/Correlator Chip Design Based On Systolic Array Principles
J. G. McWhirter, J. V. McCanny, K. W. Wood
A novel multi-bit convolver/correlator circuit is described. The circuit has been designed to operate as a systolic array of simple one bit processor and memory cells and, as a result, it can operate at relatively high data rates by making efficient use of silicon area. Since the design is extremely regular in nature and requires very little control it should be easy to implement in VLSI technology. The size of circuit which can be fabricated and the data rate which can be achieved will of course depend on the specific technology which is chosen.
Incoherent Electro-Optical Processing: A Tutorial
Dennis D. McCrady
A tutorial review of incoherent electro-optical processing (EOP), a relatively new approach to compact, high-speed signal processing, is presented. The tutorial begins with a brief explanation of the differences between coherent and incoherent optical processors. Different EOP architectures will be covered next with emphasis placed on temporal scanning processors which have the capability of performing real-time, linear transformations on time varying inputs. Systems of this type are directly applicable to many signal processing problems in radar or sonar. The presentation is concluded with a discussion of the state-of-the-art temporal scanning EOP which includes a light emitting diode and imaging charge coupled device as its main components.
New Acousto-Optic Devices For Fourier Transformation
John N. Lee, Shih-Chun Lin, A. B. Tveten
Acousto-optical computation of continuous and discrete Fourier transforms has been performed using a time-integrating architecture. Time-integration in conjunction with diode lasers and bulk optics can be used to produce inherently compact optical systems, and several compact processors have been demonstrated. Performance parameters and tradeoffs have been analyzed for these processors, and present device limitations identified. Additional new concepts for miniaturization, including application of integrated optics, and multichannel operation are discussed.
Robust Adaptive Thresholder For Document Scanning Applications
To R. Hsing
In document scanning applications, thresholding is used to obtain binary data from a scanner. However, due to: (1) a wide range of different color backgrounds; (2) density variations of printed text information; and (3) the shading effect caused by the optical systems, the use of adaptive thresholding to enhance the useful information is highly desired. This paper describes a new robust adaptive thresholder for obtaining valid binary images. It is basically a memory type algorithm which can dynamically update the black and white reference levels to optimize a local adaptive threshold function. High image quality can be obtained by this algorithm for different types of simulated test patterns. The software algorithm is described and experimental results are presented to illustrate the procedures. Results also show that the techniques described here can be used for real-time signal processing in varied applications.
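The abstract does not give the paper's update rule; the sketch below shows one plausible memory-type scheme in which running black and white reference levels decay exponentially toward recent pixel values and the local threshold is their midpoint. The memory factor `alpha` and the midpoint rule are assumptions for illustration, not the paper's algorithm:

```python
def adaptive_binarize(scanline, alpha=0.9):
    """Memory-type adaptive thresholding (illustrative sketch).

    Running black/white reference levels are updated with an
    exponential memory; the local threshold is their midpoint.
    A larger alpha means a longer memory.
    """
    black, white = min(scanline), max(scanline)  # initial references
    bits = []
    for v in scanline:
        thresh = (black + white) / 2
        if v > thresh:
            bits.append(1)                        # white (background) pixel
            white = alpha * white + (1 - alpha) * v
        else:
            bits.append(0)                        # black (ink) pixel
            black = alpha * black + (1 - alpha) * v
    return bits

print(adaptive_binarize([200, 190, 30, 210, 40]))  # [1, 1, 0, 1, 0]
```

Because the references track the data, the threshold drifts with slow shading changes across the page while still separating text from background locally.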
Pyramidal Normalization Filter: Visual Model With Applications To Image Understanding
P. S. Schenker, D. R. Unangst, T. F. Knaak, et al.
This paper introduces a new nonlinear filter model which has applications in low-level machine vision. We show that this model, which we designate the normalization filter, is the basis for non-directional, multiple spatial frequency channel resolved detection of image edge structure. We show that the results obtained in this procedure are in close correspondence to the zero-crossing sets of the Marr-Hildreth edge detector [6]. By comparison to their model, ours has the additional feature of constant-contrast thresholding, viz., it is spatially brightness adaptive. We describe a highly efficient and flexible realization of the normalization filter based on Burt's algorithm for pyramidal filtering [18]. We present illustrative experimental results that we have obtained with a computer implementation of this filter design.
Differential Based Gradient Contour Segmentation Algorithm
John F. Gilmore
A new technique of image segmentation using pixel differences is discussed. Current IR imaging seekers tend to be noisy and lead to noise-generated clutter. Due to its design, the magnitude contrast segmenter reduces the amount of clutter generated during segmentation while consistently generating objects of interest. The basic steps in the algorithm are magnitude difference, contrast evaluation and edge degapping. The edges generated form a closed boundary without using the iterative processing required by other segmenters. The algorithm also segments the high and low intensity areas of an object into one region and identifies the internal structure separating each. Intermediate results are presented in order to document each step in the algorithm. The final result is a clutter reduced, segmented image of well defined regions. A diverse set of images is presented to demonstrate the effectiveness of this algorithm in handling contrastingly different images.
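Of the three steps named above, the first, magnitude differencing, can be sketched simply. This is an illustrative reading, not the paper's algorithm: here a pixel is marked as an edge when its absolute difference with the right or lower neighbor exceeds a contrast threshold (the neighbor choice and threshold rule are assumptions):

```python
def difference_edges(img, t):
    """Mark a pixel as an edge if the absolute difference with its
    right or lower neighbour exceeds contrast threshold t.
    Sketch of the magnitude-difference step only; contrast
    evaluation and edge degapping would follow."""
    H, W = len(img), len(img[0])
    edges = [[0] * W for _ in range(H)]
    for y in range(H):
        for x in range(W):
            right = abs(img[y][x] - img[y][x + 1]) if x + 1 < W else 0
            down = abs(img[y][x] - img[y + 1][x]) if y + 1 < H else 0
            if max(right, down) > t:
                edges[y][x] = 1
    return edges
```

Thresholding on pixel differences rather than raw intensity is what gives the method its resistance to clutter from slowly varying IR backgrounds.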
Partitioning And Tearing Applied To Cellular Array Processing
James Fawcett
Cellular arrays are regular structures of computing elements with fixed and simple modes of communication and control. They exhibit both parallel computation and pipelined data flow to achieve high performance for the execution of regular algebraic operations, as in matrix multiplication and solution of simultaneous linear equations. This paper is concerned with the use of partitioning and tearing algorithms to deal with problems which are not matched to the array size or have certain irregularities in structure. Lack of regularity may arise from a sparse model formulation or from irregularity in data flow, caused by pivoting failure during elimination. We provide specific algorithms for stable solution of partitioned linear equations, without conventional pivoting, and briefly discuss their use in efficiently handling sparse equation models.
Cytocomputer Signal Processing Concepts For Industrial Technology
Peter W. VanAtta, Patrick F. Leonard, David L. McCubbrey
A set of logical and algebraic image operations is introduced. These image operators include operators for binary, silhouette, and grey-scale imagery. Simple illustrations of these image operators are given. Examples of industrial inspection and robotic visual feedback problems are considered. Results of object identification and defect locations obtained using these image operations are presented.
High Speed Programmable Filtering Using Fast Fourier Transform (FFT) Processing
Norman F. Krasner
Fast Fourier transform methods are attractive techniques for implementing high speed programmable filters, especially when flexibility, accuracy, and sharp filter transition regions are important considerations. This paper presents some design considerations in the implementation of such filters. A new high capacity, high speed FFT architecture is presented which is being incorporated in a nearly completed development model for a flexible series of FFT processors. It is shown how this processor may be efficiently embedded in an overall programmable filtering structure.
Adaptive Filtering With Correlation Cancellation Loops
Joanne F. Rhodes, Douglas E. Brown
Two optical adaptive filter designs using correlation cancellation loops are described. The experimental results and performance evaluation of a time-domain implementation are presented. They are encouraging, although limited by equipment. A frequency-domain architecture is described, and its projected performance is compared to that of the time-domain implementation.
Real Time Space Integrating Optical Ambiguity Processor
Jonathan D. Cohen
An optical ambiguity processor is described which allows real-time processing of wideband signals. One-dimensional acousto-optic cells are used as input transducers and no light-to-light modulators are required. Performance is predicted and measured. The material in this paper is distilled from the author's master's thesis of spring 1980 [1].
Spread Spectrum Product Code Processing Using The Triple Product Processor
David Casasent, Giora Silbershatz
A new application of the triple product processor in the self-synchronization of long spread spectrum codes with high processing gain and a large range delay search is described. The technique presented is useful for fine ranging, message decoding and C3I applications.
High Speed Charge-Coupled Device (CCD) Two-Dimensional Correlator
B. E. Burke, A. M. Chiang, W. H. McGonagle, et al.
A CCD-based two-dimensional correlator system is described which correlates a 256 x 256 image with a 32 x 32 reference in less than 1 second. The high computation rate (more than 100 million operations per second) is achieved using two high speed CCDs: a 32-stage programmable transversal filter (PTF) which correlates an analog signal with a set of 32 6-bit tap weights at a 5 MHz rate, and an accumulating memory which sums successive correlation records from the PTF. This system uses a technique which performs a series of one-dimensional correlations in the PTF and sums them in the accumulator to form the two-dimensional correlation. This approach is capable of considerable flexibility and can be extended to correlations of much larger image and reference sizes even with a transversal filter of limited length, and also to correlations of data of more than two dimensions.
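The row-decomposition technique described above, one-dimensional correlations formed in the transversal filter and summed in the accumulating memory, can be sketched numerically. This is a pure-Python illustration of the arithmetic only (the CCD hardware operates on analog samples), and the function names are invented for the sketch:

```python
def corr1d(signal, taps):
    """Valid-mode 1-D correlation, as a transversal filter computes it."""
    n, k = len(signal), len(taps)
    return [sum(signal[i + j] * taps[j] for j in range(k))
            for i in range(n - k + 1)]

def corr2d_by_rows(image, ref):
    """2-D correlation assembled from row-wise 1-D correlations.

    For each vertical offset, correlate each image row with the
    matching reference row (the PTF step) and sum the partial
    results (the accumulating-memory step).
    """
    H, W = len(image), len(image[0])
    h, w = len(ref), len(ref[0])
    out = []
    for dy in range(H - h + 1):
        acc = [0] * (W - w + 1)              # accumulating memory
        for r in range(h):
            partial = corr1d(image[dy + r], ref[r])
            acc = [a + p for a, p in zip(acc, partial)]
        out.append(acc)
    return out
```

Because the 2-D result is just a sum of 1-D passes, a transversal filter of fixed length can handle arbitrarily tall references; only the number of accumulation passes grows.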
Frequency Division Multiplexing Optical Processors
Jonathan D. Cohen
This paper describes a method by which optical time- and space-integrating processors may be generalized and expanded by accommodating vector inputs. Many architectures using this frequency division multiplexing approach are presented, demonstrating new operations and generalizations of old ones.
Linear Acousto-Optic Filters For Programmable And Adaptive Filtering
Jerry L. Erickson
A class of broadband, linear acousto-optic filters whose frequency response can be programmed or adapted directly in the frequency domain is described. A practical, compact architecture for these broadband linear filters is presented. Experimental measurements of the dynamic range and filter response functions are presented and compared with theory. Applications for interference excision, wideband recording, wideband digitizing, fast frequency synthesis, and fast tracking superhet receivers are outlined.
Adaptive Phased Array Radar Processing Using An Optical Matrix-Vector Processor
David Casasent, Mark Carlotto
An iterative optical matrix-vector processor is described and its use in computing the adaptive weights for a multi-dimensional phased array radar is detailed. Experimental data is provided and the accuracy of the processor is discussed. A systolic version of the processor requiring only a 1-D acousto-optic transducer is also described.
Two-Dimensional Magneto-Optic Spatial Light Modulator For Signal Processing
William E. Ross, Demetri Psaltis, Robert H . Anderson
A 2D Magneto-optic device with high speed non-volatile random access capability is described in this paper. Drive requirements and structure are compatible with LSI technology. Pixel switching is electromagnetic, non-thermal, random access addressed by current pulses in crossed conductors deposited on the garnet. The perfection of the solid state crystal structure provides the potential for very high quality, high resolution, optical characteristics. The device has been named LIGHT-MOD™. This stands for Litton Iron Garnet H Triggered Magneto Optic Device.
Problems In Two Dimensions
Peter S. Guilfoyle
This paper describes the problems encountered when one attempts to architect a two dimensional acousto-optical signal processing system using conventional Bragg cell devices. Specifically addressed will be the solution to the problem by utilizing degenerate and/or tangential mode birefringent A.O. devices. This novel configuration will demonstrate a highly efficient hybrid time/space integrating two-dimensional processor for range/doppler ambiguity function calculation.
Evaluation Of Spatial Filters For Background Suppression In Infrared Mosaic Sensor Systems
T. L. Bergen, P. K. Mazaika
Spaceborne infrared mosaic sensors have been proposed for future surveillance systems. Because these systems will generate a large volume of data, background suppression will require algorithms which use innovative architectures and minimal storage. This paper analyzes the implementation and performance of candidate temporal and spatial filters. Spatial filters are attractive because they require far less memory, can effectively exploit a parallel, pipelined architecture, and are relatively insensitive to target speed. However, the performance of spatial filtering is substantially worse than that of temporal filtering when the sensor has good line-of-sight stability.
Pattern Nonuniformities Of Imaging Charge Injection Device Arrays Due To Distributed Resistance Capacitance Effects
C.H. Chen, D. N. Pocock
A potential source of response nonuniformities of imaging CID's is attributed to the transient behavior of the distributed resistance capacitance (DRC) network associated with the row and column electrodes. The DRC network is evidenced and characterized by the measurement of frequency-dependent admittance. Simulated response nonuniformities are compared favorably to experimentally observed patterns of an InSb CID array. The characteristics and their imposed limitations on the array size, pixel density, and readout rate are discussed.
Evaluation Method For Infrared Focal Plane Arrays With Metal Insulator Semiconductor (MIS) Structure
Yoshihiro Miyamoto, Tohru Maekawa, Toshiro Yamamoto, et al.
This paper describes a new method of surface potential measurement for MIS infrared focal plane arrays. The key feature of this method is a charge sensitive amplifier which detects the surface potential directly. The surface potential is subject to photo-generated charge carriers stored in a potential well as well as the gate voltage. Therefore, this measurement can be used for both electronic and optical characterization of an MIS infrared imager such as an infrared charge coupled device (IRCCD) or an infrared charge injection device (IRCID). Mercury cadmium telluride (HgCdTe) IRCIDs with 3 x 5 pixels were evaluated using this technique. The measurement was controlled by an HP System 35 and proved more accurate, informative, and speedy than the conventional capacitance-voltage (C-V) measurement.
Very High Speed Integrated Circuit (VHSIC) Chip Sets And Brassboards
Donald W. Burlage
The chip sets and brassboards being developed in Phase I of the VHSIC Program are described.
Relative Performance Of Very High Speed Integrated Circuit (VHSIC) Chip Sets For Selected Signal Processing Functions
James D. Marr
The Very High Speed Integrated Circuit (VHSIC) program was started in 1978 to produce high performance signal processing chips for military systems. Contracts have been let for Phase I for six chip sets. Modules using these chips are also being developed under this phase. Six of the funded modules and two hypothetical modules are examined for nine signal processing tasks.
Concurrent Very Large Scale Integration (VLSI) Structure For Digital Signal Processing
Ralph K. Cavin III, Noel R. Strader II
A highly concurrent architecture for a high-speed digital signal processing engine is described. Initially, a repetitive structure of identical processing elements which realizes a one-dimensional state variable filter is defined. This structure uses primarily local communication with topologically simple interconnections among the processing elements. It is then shown that these relatively simple processing engines can be combined, again in a regular manner, to solve two-dimensional filter problems. Application of this basic filter structure to provide high-speed solutions for other standard signal processing transforms such as the Discrete Fourier Transform, the Chirp Z Transform, and the Running Discrete Fourier Transform is also presented.
Signal Processor Architecture Performance Assessment For Very High Speed Integrated Circuits (VHSIC)
R. W. Priester
This paper discusses the problem of digital signal processor architecture performance evaluation. This aspect of signal processor technology, while not totally ignored in the past, has not received the explicit consideration which it deserves. If effective use of available resources is to be achieved, future implementation of complex VHSIC/VLSI-based systems probably will require increased consideration of the architecture performance assessment problem. Three broad approaches to this problem are discussed and a brief example of each is presented. Of the approaches considered, it appears at present that techniques based upon computer-aided simulation and/or analyses of signal processor models represent the most promising approach to this problem. Quantitative figures of merit for use in evaluating/comparing signal processor performance are presented and briefly discussed. These are typically defined in terms of a limited number of selected system parameters which might be of concern in a given application. Several techniques that are of interest to the architecture performance assessment problem are briefly reviewed and discussed.
Performance Analysis Of Systolic Array Architectures
J. A. Bannister, J. B. Clary
In this paper we briefly describe the systolic array architecture. We discuss performance issues that arise in the evaluation of systolic array architectures. We review the fundamental concepts of Petri nets and consider their suitability as a tool for the modeling and analysis of systolic array architectures. We review known results concerning the use of timed decision-free Petri nets for performance evaluation of computing systems. We propose a new class of Petri nets (called coherent safety nets) that appear to be useful for performance evaluation of pipelined signal processing architectures. These techniques are applied to systolic array architectures.
Computer Networking In The Context Of Very Large Scale Integration (VLSI)
Earl E. Swartzlander Jr.
As requirements for computing increase in magnitude, multiple processor networks have become increasingly important. The demand is a direct outgrowth of the success of VLSI in providing high levels of computation at a modest price. Future levels of performance may well be most effectively met with computer networks where the elemental processes are single VLSI circuits. This paper surveys several common computer networking approaches and presents a novel concept, the Gatlinburg Rings. It is shown to be attractive for large networks.
Real Time Computer Network For War Games
B. Ayres, J. Cotten, A. Hafen
The U.S. Army operates a field laboratory where realistic combat simulations between jet aircraft, helicopters, tanks, and infantry can be closely observed. Lasers are used to simulate the weapons carried by as many as 200 players. Laser firings, hits, and player location are monitored by a telemetry and range measurement system controlled by a computer network. Player combat engagements are evaluated in real-time by the computer network and the results returned to the player. The computer network primarily consists of 12 PDP-11/45s and a DEC-1060. The PDP-11/45s operate under RSX-11M and RSX-11S. Each PDP-11/45 processor communicates with the other processors through a 32K shared memory. Application software includes telemetry polling and control, player position calculation, real-time casualty assessment, and various monitors and displays. The major focus of this paper is the development of a successful high speed, general purpose interprocessor/intertask data communication system, operating within the shared memories, which facilitates concurrent processing of data with minimal overhead.
Fault Tolerant Distributive Processing
Harris Quesnell
A fault tolerant design used to enhance the survivability of a distributive processing system is described. Based on physical limitations, mission duration and maintenance support, the approach has emphasized functional redundancy in place of the traditional hardware or software level redundancy. A top down architecture within the system's hierarchy allows sharing of common resources. Various techniques used to enhance the survivability of the hardware at the equipment, module and component level were analyzed. The intent of the ongoing work is to demonstrate the ability of a distributive processing system to maintain itself for a long period of time.
Image Processing Computer Using Three-Dimensional Cellular Logic Architecture
Allen Klinger, Kendall Preston
Specialized logic to accomplish image processing has been available since the early 1960's. Systems like CELLSCAN, GLOPR, and diff3 have the capability to perform, (respectively) at least 10³, 10⁵ and 10⁷ picture point or pixel operations per second using a local organization of inputs to each gate. This kind of image processing system has been known as cellular logic. The term goes back to the early days of computers through the work of von Neumann [1,2] and Moore [3] on automata; a recent survey paper co-authored by one of us [4] discusses cellular logic and applications in medical image processing. Neighborhood processing is a similar term used to describe a system with pipelining added to conserve the number of gates needed; see [5].
Distributed Data Flow Signal Processors
Jay A. Eggert
Near term advances in technology such as VHSIC promise revolutionary progress in programmable signal processor capabilities. However, meeting projected signal processing requirements for radar, sonar and other high throughput systems requires effective multi-processor networks. This paper describes a distributed signal processor architecture currently in development at Texas Instruments that is designed to meet these high-throughput, multi-mode system requirements. The approach supports multiple, functionally specialized, autonomous nodes (processors) interconnected via a flexible, high speed communication network. A common task scheduling mechanism based upon "data flow" concepts provides an efficient high level programming and simulation mechanism. The Ada syntax compatible task level programming and simulation software support tools are also described.
Advanced Flexible Processor: A Multiprocessor Computing System
Bruce Colton
There is a realization that the future of high speed computing will see many great changes, both in the hardware systems to which people are presently accustomed, as well as in the software and algorithmic approaches currently employed. The 1980's will present the utmost challenges in meeting the burgeoning computational demands of high speed computer users. Recent advancements made in the field of multiprocessing offer strong promise in meeting these expanding requirements. The Advanced Flexible Processor represents a quite mature implementation of one of those "new" multiprocessor systems finding major application to signal processing and data handling problems.
S-1 Multiprocessor System
J. M. Broughton, P. M. Farmwald, T. M. McWilliams
This paper describes the S-1 multiprocessor system. It is composed of 16 supercomputer class uniprocessors with local caches, an extremely large, medium latency shared memory, and a low latency synchronization bus for passing short messages. The system is applicable to a wide variety of applications, including large-scale physical simulation, real-time command and control, and program development in a time-sharing environment. The hardware organization, its implications, and software supporting the efficient utilization of the multiprocessor are discussed.
Controllable Multiple-Instruction, Multiple-Data Stream (MIMD) Architecture
Stephen F. Lundstrom
A MIMD architecture suitable for the Flow Model Processor (FMP) of the Numerical Aerodynamic Simulator (NAS) Processing System (NPS) has been described to NASA, a result of extensive studies and evaluations [1,2]. The FMP architecture is targeted to support a throughput in excess of 1000 Mflop/sec over a range of applications. This paper summarizes the architecture and describes the strategies adopted for making this many-processor multiprocessor controllable and efficient. The key language components which allow efficient control of the system by the user, while at the same time supporting straightforward application definition, will be described.
Software Allocation For Distributed Signal Processors
K. W. Doty, P. L. McEntire, J. G. O'Reilly, et al.
This paper presents a mathematical technique for the allocation of software tasks to the elements of a distributed signal processor. The method exploits the regularity and periodicity of signal processing software to yield optimal task-to-processor allocations. Although the method permits the use of various performance criteria, emphasis is placed upon minimizing the time of completion, or "cycle time," of the signal processing software. The problem formulation includes inter-task communication, task memory requirements, and time constraints on task execution. Finally, the paper presents and analyzes a realistic software allocation problem.
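The paper derives optimal allocations; purely as an illustration of the cycle-time objective, the sketch below uses a greedy longest-processing-time heuristic (not the paper's method, and it ignores inter-task communication and memory constraints; all names are invented):

```python
import heapq

def lpt_allocate(task_times, n_proc):
    """Greedy longest-processing-time list scheduling.

    Assign each task (longest first) to the currently least-loaded
    processor; the resulting cycle time is the maximum processor load.
    Illustrative only -- the greedy result is generally suboptimal.
    """
    heap = [(0, p) for p in range(n_proc)]        # (load, processor)
    heapq.heapify(heap)
    assignment = {}
    for task, t in sorted(task_times.items(), key=lambda kv: -kv[1]):
        load, p = heapq.heappop(heap)             # least-loaded processor
        assignment[task] = p
        heapq.heappush(heap, (load + t, p))
    cycle_time = max(load for load, _ in heap)
    return assignment, cycle_time

alloc, cycle = lpt_allocate({'a': 3, 'b': 3, 'c': 2, 'd': 2, 'e': 2}, 2)
print(cycle)  # 7
```

For these five tasks the greedy heuristic yields a cycle time of 7, whereas the optimal split (3+3 versus 2+2+2) achieves 6, which is exactly the gap that motivates the paper's optimal formulation.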