Proceedings Volume 8662

Intelligent Robots and Computer Vision XXX: Algorithms and Techniques

Juha Röning, David Casasent
View the digital version of this volume at the SPIE Digital Library.

Volume Details

Date Published: 22 January 2013
Contents: 8 Sessions, 26 Papers, 0 Presentations
Conference: IS&T/SPIE Electronic Imaging 2013
Volume Number: 8662

Table of Contents


All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 8662
  • Invited Papers on Intelligent Robotics
  • Navigation and Visual Path Planning
  • Computer Vision Algorithms and Tracking
  • Image Understanding and Scene Analysis
  • Industrial Robots and Applications
  • Outdoor Ground Robotics
  • Interactive Paper Session
Front Matter: Volume 8662
Front Matter: Volume 8662
This PDF file contains the front matter associated with SPIE Proceedings Volume 8662, including the Title Page, Copyright Information, Table of Contents, and the Conference Committee listing.
Invited Papers on Intelligent Robotics
Control issues and recent solutions for voltage controlled piezoelectric elements utilizing artificial neural networks
Performing actuation in nanomanipulation at the necessary accuracy is largely possible thanks to many new piezoelectric actuation systems. Although piezoelectric actuators can provide the means to perform nearly infinitesimal displacements at extremely high resolutions, the output motion of the actuator can be quite nonlinear, especially under voltage-based control modulation. In this work, we cover some of the control issues related especially to piezoelectric actuation in nanomanipulation tasks. We also look at some of the recent improvements made possible by methods that utilize artificial neural networks to improve the generalization capability and accuracy of piezoelectric hysteresis models used in inverse modelling and control of solid-state, voltage-controlled piezoelectric actuators. We also briefly discuss the problem areas on which piezoelectric control research should especially focus, as well as some of the weaknesses of the existing methods. In addition, some common issues related to testing and result representation are discussed.
The 20th annual intelligent ground vehicle competition: building a generation of robotists
Bernard L. Theisen, Andrew Kosinski
The Intelligent Ground Vehicle Competition (IGVC) is one of four unmanned systems student competitions founded by the Association for Unmanned Vehicle Systems International (AUVSI). The IGVC is a multidisciplinary exercise in product realization that challenges college engineering student teams to integrate advanced control theory, machine vision, vehicular electronics and mobile platform fundamentals to design and build an unmanned system. Teams from around the world focus on developing a suite of dual-use technologies to equip ground vehicles of the future with intelligent driving capabilities. Over the past 20 years, the competition has challenged undergraduate, graduate and Ph.D. students with real-world applications in intelligent transportation systems, the military and manufacturing automation. To date, teams from over 80 universities and colleges have participated. This paper describes some of the applications of the technologies required by this competition and discusses the educational benefits. The primary goal of the IGVC is to advance engineering education in intelligent vehicles and related technologies. The employment and professional networking opportunities created for students and industrial sponsors through a series of technical events over the four-day competition are highlighted. Finally, an assessment of the competition based on participation is presented.
Navigation and Visual Path Planning
Visual homing with a pan-tilt based stereo camera
Visual homing is a navigation method that compares a stored image of the goal location with the current view to determine how to navigate back to the goal. It is theorized that insects, such as ants and bees, employ visual homing to return to their nests. Visual homing has been applied to autonomous robot platforms using two main approaches, holistic and feature-based, both of which aim to determine distance and direction to the goal location. Navigational algorithms using the Scale Invariant Feature Transform (SIFT) have gained great popularity in recent years due to the robustness of the feature operator. Churchill and Vardy developed a visual homing method, Homing in Scale Space (HiSS), that uses SIFT scale-change information to determine the distance between the robot and the goal location. Since the scale component is discrete with a small range of values, the result is a rough measurement with limited accuracy. We have developed a method that uses stereo data, resulting in better homing performance. Our approach uses a pan-tilt based stereo camera to build composite wide-field images. We combine these wide-field images with the stereo data obtained from the camera to extend the keypoint vector with a new parameter, depth (z). Using this information, our algorithm determines the distance and orientation from the robot to the goal location. We compare our method with HiSS in a set of indoor trials using a Pioneer 3-AT robot equipped with a BumbleBee2 stereo camera. We evaluate the performance of both methods using a set of performance measures described in this paper.
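A minimal sketch of how a homing vector could be derived once matched keypoints carry a depth component, as the abstract describes; the keypoint layout, function names, and the pure-translation assumption are illustrative, not the authors' implementation.
```python
import numpy as np

def homing_vector(goal_kps, current_kps):
    """goal_kps, current_kps: (N, 3) arrays of matched keypoints, each row
    holding (bearing_rad, elevation_rad, depth_m) in the robot frame."""
    def to_xyz(kps):
        b, e, d = kps[:, 0], kps[:, 1], kps[:, 2]
        return np.stack([d * np.cos(e) * np.cos(b),
                         d * np.cos(e) * np.sin(b),
                         d * np.sin(e)], axis=1)

    goal_pts, cur_pts = to_xyz(goal_kps), to_xyz(current_kps)
    # Ignoring rotation, the mean landmark displacement approximates the
    # translation the robot must make to reach the goal location.
    move = (cur_pts - goal_pts).mean(axis=0)
    distance = np.linalg.norm(move[:2])
    heading = np.arctan2(move[1], move[0])
    return distance, heading
```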
Panoramic stereo sphere vision
Weijia Feng, Baofeng Zhang, Juha Röning, et al.
Conventional stereo vision systems have a small field of view (FOV), which limits their usefulness for certain applications. While panoramic vision is able to “see” in all directions of the observation space, scene depth information is lost because of the mapping from 3D reference coordinates to the 2D panoramic image. In this paper, we present an innovative vision system, built from a special combined fish-eye lens module, that is capable of producing 3D coordinate information for the whole observation space while simultaneously acquiring a 360°×360° panoramic image with no blind area, using a single piece of vision equipment and one static shot. It is called Panoramic Stereo Sphere Vision (PSSV). We propose its geometric model, mathematical model and parameter calibration method in this paper. Video surveillance, robotic autonomous navigation, virtual reality, driving assistance, multiple maneuvering-target tracking, automatic mapping of environments and attitude estimation are some of the applications that will benefit from PSSV.
Loop closure detection using local Zernike moment patterns
Evangelos Sariyanidi, Onur Sencan, Hakan Temeltas
This paper introduces a novel image description technique aimed at appearance-based loop closure detection for mobile robotics applications. The technique relies on the local evaluation of Zernike moments. Binary patterns, referred to as Local Zernike Moment (LZM) patterns, are extracted from images, and these binary patterns are coded using histograms. Each image is represented with a set of histograms, and loop closure is achieved by simply comparing the most recent image with the images in the past trajectory. The technique has been tested on the New College dataset, and as far as we know, it outperforms the other methods in terms of computational efficiency and loop closure precision.
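The final matching step lends itself to a short illustration: compare the per-image histogram sets with histogram intersection and flag the best-scoring past image as a loop-closure candidate. The metric, threshold, and function names below are assumptions for illustration, not the paper's exact procedure.
```python
import numpy as np

def histogram_similarity(h1, h2):
    # Histogram intersection; the paper's exact comparison metric may differ.
    return np.minimum(h1, h2).sum()

def detect_loop_closure(current_hists, past_hists_list, threshold):
    """current_hists: (K, B) array of histograms for the current image.
    past_hists_list: list of (K, B) arrays, one per past image."""
    if not past_hists_list:
        return None
    scores = [np.mean([histogram_similarity(c, p)
                       for c, p in zip(current_hists, past)])
              for past in past_hists_list]
    best = int(np.argmax(scores))
    return (best, scores[best]) if scores[best] > threshold else None
```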
Stabilization and control of quad-rotor helicopter using a smartphone device
Alok Desai, Dah-Jye Lee, Jason Moore, et al.
In recent years, autonomous micro unmanned aerial vehicles (micro-UAVs), or more specifically hovering micro-UAVs, have proven suitable for many promising applications such as unknown-environment exploration and search-and-rescue operations. Early UAVs had no on-board control capabilities and were difficult to control manually from a ground station. Many UAVs are now equipped with on-board control systems that reduce the amount of control required from the ground-station operator. However, the limitations on payload, power consumption and control without human interference remain the biggest challenges. This paper proposes to use a smartphone as the sole computational device to stabilize and control a quad-rotor. The goal is to use the readily available sensors in a smartphone, such as the GPS, the accelerometer, the rate gyros, and the camera, to support vision-related tasks such as flight stabilization, estimation of the height above ground, target tracking, obstacle detection, and surveillance. We use a quad-rotor platform built in the Robotic Vision Lab at Brigham Young University for our development and experiments. An Android smartphone is connected through the USB port to external hardware that has a microprocessor and circuitry to generate pulse-width modulation signals to control the brushless servomotors on the quad-rotor. The high-resolution camera on the smartphone is used to detect and track features to maintain a desired altitude level. The vision algorithms implemented include template matching, the Harris feature detector, RANSAC similarity-constrained homography, and color segmentation. Other sensors are used to control yaw, pitch, and roll of the quad-rotor. This smartphone-based system is able to stabilize and control micro-UAVs and is ideal for micro-UAVs that have size, weight, and power limitations.
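As a rough illustration of the vision pipeline named in the abstract (Harris features plus a RANSAC-estimated similarity transform between frames), the sketch below uses standard OpenCV calls; it is a generic reconstruction under those assumptions, not the authors' implementation.
```python
import cv2
import numpy as np

def frame_motion(prev_gray, cur_gray):
    """Estimate inter-frame 2D similarity motion from tracked Harris corners."""
    # Detect Harris corners in the previous frame.
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=300, qualityLevel=0.01,
                                  minDistance=7, useHarrisDetector=True)
    if pts is None:
        return None
    # Track the corners into the current frame with pyramidal Lucas-Kanade flow.
    nxt, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, cur_gray, pts, None)
    good = status.ravel() == 1
    if good.sum() < 4:
        return None
    # Robustly fit a similarity transform (rotation + scale + translation).
    M, inliers = cv2.estimateAffinePartial2D(pts[good], nxt[good],
                                             method=cv2.RANSAC)
    return M  # 2x3 matrix; its translation/scale terms can feed a stabilizer
```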
Computer Vision Algorithms and Tracking
Optimizing feature selection strategy for adaptive object identification in noisy environment
Sagar Pandya, Thomas Lu, Tien-Hsin Chao
We present the development of a multi-stage automatic target recognition (MS-ATR) system for computer vision in robotics. This paper discusses our work in optimizing the feature selection strategies of the MS-ATR system. Past implementations have utilized Optimum Trade-off Maximum Average Correlation Height (OT-MACH) filtering as an initial feature selection method, and principal component analysis (PCA) as a feature extraction strategy before the classification stage. Recent work has implemented a modified saliency algorithm as a feature selection method. Saliency is typically implemented as a “bottom-up” search process using visual sensory information such as color, intensity, and orientation to detect salient points in the imagery. It is a general saliency mapping algorithm that receives no input from the user on what is considered salient. We discuss here a modified saliency algorithm that accepts the guidance of target features in locating regions of interest (ROI). By introducing target-related input parameters, saliency becomes more focused and task oriented. It is used as an initial stage for fast ROI detection, and the ROIs are passed to later stages for feature extraction and target identification.
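A toy sketch of the target-guided weighting idea described above: bottom-up conspicuity maps are combined with weights reflecting how well each channel matches the known target. The channel names and weighting scheme are assumptions for illustration only.
```python
import numpy as np

def guided_saliency(feature_maps, target_weights):
    """feature_maps: dict of channel name -> (H, W) conspicuity map
    (e.g. 'color', 'intensity', 'orientation').
    target_weights: dict with the same keys; larger values for channels
    that match the known target, making the map task-oriented."""
    sal = sum(target_weights[k] * feature_maps[k] for k in feature_maps)
    sal = (sal - sal.min()) / (sal.max() - sal.min() + 1e-9)  # normalize to [0, 1]
    return sal

def top_rois(saliency, threshold=0.8):
    """Return pixel coordinates above threshold as candidate regions of interest."""
    return np.argwhere(saliency >= threshold)
```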
GPU-based real-time trinocular stereo vision
Yuanbin Yao, R. J. Linton, Taskin Padir
Most stereovision applications are binocular, using information from a two-camera array to perform stereo matching and compute the depth image. Trinocular stereovision with a three-camera array has been shown to provide higher accuracy in stereo matching, which benefits applications such as distance finding, object recognition, and detection. This paper presents a real-time stereovision algorithm implemented on a GPGPU (general-purpose graphics processing unit) using a trinocular stereovision camera array. The algorithm employs a winner-take-all method to fuse disparities computed in different directions, following various image processing steps, to obtain the depth information. The goal of the algorithm is to achieve real-time processing speed with the help of a GPGPU, using the Open Source Computer Vision Library (OpenCV) in C++ and NVIDIA's CUDA. The results are compared in accuracy and speed to verify the improvement.
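For reference, winner-take-all disparity selection over fused cost volumes can be written in a few lines; this CPU/NumPy sketch only illustrates the selection rule named in the abstract and stands in for the paper's GPU implementation.
```python
import numpy as np

def winner_take_all(cost_volumes):
    """cost_volumes: list of (H, W, D) matching-cost volumes, one per
    camera-pair direction in the trinocular rig (hypothetical layout).
    Returns the per-pixel disparity with the lowest aggregated cost."""
    fused = np.sum(cost_volumes, axis=0)   # aggregate costs across directions
    return np.argmin(fused, axis=2)        # winner-take-all over disparity axis
```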
Remotely controlling of mobile robots using gesture captured by the Kinect and recognized by machine learning method
Roy Chaoming Hsu, Jhih-Wei Jian, Chih-Chuan Lin, et al.
The main purpose of this paper is to use a machine learning method together with the Kinect and its body-sensing technology to design a simple, convenient, yet effective robot remote-control system. In this study, a Kinect sensor is used to capture the human body skeleton with depth information, and a gesture training and identification method is designed using a back-propagation neural network to remotely command a mobile robot via Bluetooth. The experimental results show that the designed remote-control system achieves, on average, more than 96% accuracy in identifying 7 types of gestures and can effectively control a real e-puck robot with the designed commands.
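To make the classification step concrete, here is a minimal back-propagation network of the kind the abstract describes, trained on skeleton-derived feature vectors; the architecture, feature layout, and hyperparameters are illustrative assumptions, not the authors' configuration.
```python
import numpy as np

def train_gesture_mlp(X, y, hidden=16, epochs=500, lr=0.01, seed=0):
    """X: (N, D) skeleton-joint feature vectors; y: (N,) integer gesture labels.
    One hidden layer, softmax output, plain gradient descent."""
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    W1 = rng.normal(0, 0.1, (X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.1, (hidden, n_classes)); b2 = np.zeros(n_classes)
    Y = np.eye(n_classes)[y]                      # one-hot targets
    for _ in range(epochs):
        h = np.tanh(X @ W1 + b1)                  # hidden activations
        logits = h @ W2 + b2
        p = np.exp(logits - logits.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)         # softmax probabilities
        g = (p - Y) / len(X)                      # output-layer error
        dW2, db2 = h.T @ g, g.sum(axis=0)
        gh = (g @ W2.T) * (1 - h ** 2)            # back-propagated error
        dW1, db1 = X.T @ gh, gh.sum(axis=0)
        W1 -= lr * dW1; b1 -= lr * db1
        W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

def predict_gesture(params, X):
    W1, b1, W2, b2 = params
    return np.argmax(np.tanh(X @ W1 + b1) @ W2 + b2, axis=1)
```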
Image Understanding and Scene Analysis
Relating vanishing points to catadioptric camera calibration
Wenting Duan, Hui Zhang, Nigel M. Allinson
This paper presents the analysis and derivation of the geometric relation between vanishing points and the camera parameters of central catadioptric camera systems. These vanishing points correspond to the three mutually orthogonal directions of the 3D real-world coordinate system (i.e. the X, Y and Z axes). Compared to vanishing points (VPs) in perspective projection, VPs under central catadioptric projection have the advantages that there are normally two vanishing points for each set of parallel lines, since lines are projected to conics in the catadioptric image plane, and that the vanishing points are usually located inside the image frame. We show that knowledge of the VPs corresponding to the XYZ axes from a single image leads to a simple derivation of both intrinsic and extrinsic parameters of the central catadioptric system. The derived theory is demonstrated and tested on both synthetic and real data with respect to noise sensitivity.
Natural image understanding using algorithm selection and high-level feedback
Martin Lukac, Michitaka Kameyama, Kosuke Hiura
Natural image processing and understanding encompasses hundreds or even thousands of different algorithms. Each algorithm has a certain peak performance for a particular set of input features and configurations of the objects/regions of the input image (environment). To obtain the best possible result of processing, we propose an algorithm selection approach that always uses the most appropriate algorithm for the given input image. This is achieved by first selecting an algorithm based on low-level features such as color intensity, histograms, and spectral coefficients. The resulting high-level image description is then analyzed for logical inconsistencies (contradictions), which are used to refine the selection of processing elements. The feedback created from the contradiction information is carried out by a Bayesian network that integrates both the low-level features and the higher-level information in the selection process. The selection stops when all high-level inconsistencies are resolved or no alternative algorithms can be selected.
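An illustrative skeleton of such a select-run-check loop is given below; the algorithm objects, scoring function, and consistency check are placeholders standing in for the paper's Bayesian-network machinery, not its actual interfaces.
```python
def select_and_refine(image, algorithms, low_level_features, check_consistency):
    """algorithms: objects exposing .name, .score(features) and .run(image)
    (hypothetical interface). check_consistency returns the list of detected
    contradictions in a high-level description."""
    tried = set()
    while True:
        candidates = [a for a in algorithms if a.name not in tried]
        if not candidates:
            return None                      # no consistent interpretation found
        best = max(candidates, key=lambda a: a.score(low_level_features))
        description = best.run(image)        # produce a high-level description
        if not check_consistency(description):
            return description               # no contradictions left
        tried.add(best.name)                 # contradiction found: re-select
```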
Improving shape context using geodesic information and reflection invariance
In this paper, we identify some of the existing problems in shape context matching. We first identify the need for reflection invariance in shape context matching algorithms and propose a method to achieve it. Using these reflection-invariance techniques, we bring all the objects in a database to their canonical form, which halves the time required to match two shapes using their contexts. We then show how we can build better shape descriptors by using geodesic information from the shapes, and hence improve upon the well-known Inner Distance Shape Context (IDSC). The IDSC is used by many pre- and post-processing algorithms as the baseline shape-matching algorithm, and our improvements to IDSC remain compatible with those algorithms. Finally, we introduce new comparison metrics that can be used for the comparison of two or more algorithms. We have tested our proposals on the MPEG-7 database and show that our methods significantly outperform the IDSC.
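One simple way to realize a reflection-canonical form, sketched below for illustration, is to reflect the contour whenever a chosen asymmetry statistic is negative, so that a shape and its mirror image map to the same representative; the statistic used here (third moment along x) is an assumption, not necessarily the paper's criterion.
```python
import numpy as np

def canonical_form(points):
    """points: (N, 2) sampled contour coordinates.
    Returns a centred contour whose x-skewness is non-negative, so a shape
    and its mirror image share one canonical representative."""
    centered = points - points.mean(axis=0)
    if np.mean(centered[:, 0] ** 3) < 0:     # asymmetry points the "wrong" way
        centered[:, 0] = -centered[:, 0]     # reflect about the vertical axis
    return centered
```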
A proposed super-fast scheme for instant-detect-instant-kill of a ground-to-air missile
When we apply the newly developed LPED (local polar edge detection) image processing method to a binary IR image that contains the special meteorite-like streak produced by an enemy SAM, the image processing speed can be enhanced even further if another novel preprocessing scheme is used. This preprocessing scheme takes advantage of the characteristic geometry of the meteorite-like target: we consider only the clustered high-temperature image points that form the shape of a slender cylinder ending in a broom-like exhaust plume. We can then spatially filter, or pre-extract, the cylinder by its geometrical properties before applying the LPED method. This results in super-fast detection, tracking and targeting of the CM (center of mass) point of the cylinder, which is just the “heart” of the flying missile. Incorporating this targeting system with a high-power laser gun through the use of a Wollaston prism, an airborne instant-detect-instant-kill SAM-killer system may then be constructed.
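The targeting point itself, the centre of mass of the thresholded high-temperature pixels, is straightforward to compute; the sketch below is a generic illustration of that single step, not of the LPED method.
```python
import numpy as np

def center_of_mass(binary_ir):
    """binary_ir: 2D boolean/0-1 array of high-temperature pixels.
    Returns the (x, y) centre of mass, or None if no pixels are set."""
    ys, xs = np.nonzero(binary_ir)
    if xs.size == 0:
        return None
    return float(xs.mean()), float(ys.mean())
```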
Finger tracking for hand-held device interface using profile-matching stereo vision
Yung-Ping Chang, Dah-Jye Lee, Jason Moore, et al.
Hundreds of millions of people use hand-held devices frequently and control them by touching the screen with their fingers. If this method of operation is used while driving, the probability of deaths and accidents substantially increases. With a non-contact control interface, people do not need to touch the screen; as a result, they need not pay as much attention to their phones and can drive more safely than they would otherwise. This interface can be achieved with real-time stereovision. A novel Intensity Profile Shape-Matching Algorithm is able to obtain 3-D information from a pair of stereo images in real time. While this algorithm does involve a trade-off between accuracy and processing speed, the results show that its accuracy is sufficient for the practical use of recognizing human poses and tracking finger movement. By choosing an interval of disparity, an object at a certain distance range can be segmented; in other words, we detect the object by its distance to the cameras. The advantage of this profile shape-matching algorithm is that the detection of correspondences relies on the shape of the profile and not on intensity values, which are subject to lighting variations. Based on the resulting 3-D information, the movement of fingers in space at a specific distance can be determined. Finger location and movement can then be analyzed for non-contact control of hand-held devices.
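The distance-band segmentation described above reduces to a mask over a disparity interval; a minimal sketch follows, with an added centroid-style helper as a crude stand-in for locating the finger (both names and the fingertip heuristic are illustrative assumptions).
```python
import numpy as np

def segment_by_disparity(disparity, d_min, d_max):
    """Keep only pixels whose disparity lies in [d_min, d_max], i.e. objects
    within a chosen distance band in front of the cameras."""
    return (disparity >= d_min) & (disparity <= d_max)

def fingertip_candidate(mask):
    """Illustrative: take the topmost point of the segmented region as a
    crude fingertip candidate."""
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    i = int(np.argmin(ys))
    return int(xs[i]), int(ys[i])
```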
Industrial Robots and Applications
Training industrial robots with gesture recognition techniques
In this paper we propose to use gesture recognition approaches to track a human hand in 3D space and, without the use of special clothing or markers, accurately generate code for training an industrial robot to perform the same motion. The proposed hand tracking component includes three methods to detect the human hand: a color-thresholding model, naïve Bayes analysis and a Support Vector Machine (SVM). Next, it performs stereo matching on the region where the hand was detected to find relative 3D coordinates. The list of coordinates returned is expectedly noisy because the human hand can alter its apparent shape while moving, human motion is inconsistent, and detection can fail in a cluttered environment. Therefore, the system analyzes the list of coordinates to determine a path for the robot to move along, smoothing the data to reduce noise and looking for significant points that determine the path the robot will ultimately take. The proposed system was applied to pairs of videos recording the motion of a human hand in a 'real' environment to move the end-effector of a SCARA robot along the same path as the hand of the person in the video. The correctness of the robot motion was judged by observers, who indicated that the motion of the robot appeared to match the motion in the video.
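The smoothing and significant-point selection could look roughly like the sketch below (moving-average smoothing plus a minimum-displacement filter); the window size, threshold, and selection criterion are illustrative assumptions rather than the paper's method.
```python
import numpy as np

def smooth_and_simplify(path, window=5, min_step=0.02):
    """path: (N, 3) noisy hand positions in metres (N >= window).
    Moving-average smoothing, then keep only points that move at least
    min_step from the previously kept waypoint."""
    kernel = np.ones(window) / window
    smoothed = np.stack([np.convolve(path[:, i], kernel, mode='valid')
                         for i in range(3)], axis=1)
    waypoints = [smoothed[0]]
    for p in smoothed[1:]:
        if np.linalg.norm(p - waypoints[-1]) >= min_step:
            waypoints.append(p)
    return np.array(waypoints)
```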
A restrained-torque-based motion instructor: forearm flexion/extension-driving exoskeleton
Takuya Nishimura, Yoshihiko Nomura, Ryota Sakamoto
When learning complicated movements by ourselves, we encounter problems such as self-rightness, which results in a lack of detail and objectivity and may cause us to miss the essential points of a motion or even distort them. Thus, we sometimes fall into the habit of performing inappropriate motions. To solve these problems, or at least to alleviate them as much as possible, we have been developing mechanical man-machine interfaces to help us learn motions such as cultural gestures and sports forms. One of the promising interfaces is a wearable exoskeleton mechanical system. As a first attempt, we have made a prototype of a 2-link, 1-DOF rotational elbow-joint interface for teaching forearm extension-flexion operations, and have found that it has potential for teaching the initiation and continuation of elbow flexion motions.
3D recovery of human gaze in natural environments
Lucas Paletta, Katrin Santner, Gerald Fritz, et al.
The estimation of human attention has recently been addressed in the context of human-robot interaction. Today, joint work spaces already exist and challenge cooperating systems to jointly focus on common objects, scenes and work niches. With the advent of Google Glass and increasingly affordable wearable eye-tracking, monitoring of human attention will soon become ubiquitous. The presented work describes for the first time a method for the estimation of human fixations in 3D environments that does not require any artificial landmarks in the field of view and enables attention mapping in 3D models. It enables full 3D recovery of the human view frustum and the gaze pointer in a previously acquired 3D model of the environment in real time. The study on the precision of this method reports a mean projection error of ≈1.1 cm and a mean angle error of ≈0.6° within the chosen 3D model; the precision does not fall below that of the technical instrument (≈1°). This innovative methodology will open new opportunities for joint attention studies as well as for bringing new potential into automated processing for human factors technologies.
Outdoor Ground Robotics
CANINE: a robotic mine dog
Brian A. Stancil, Jeffrey Hyams, Jordan Shelley, et al.
Neya Systems, LLC competed in the CANINE program sponsored by the U.S. Army Tank Automotive Research Development and Engineering Center (TARDEC) which culminated in a competition held at Fort Benning as part of the 2012 Robotics Rodeo. As part of this program, we developed a robot with the capability to learn and recognize the appearance of target objects, conduct an area search amid distractor objects and obstacles, and relocate the target object in the same way that Mine dogs and Sentry dogs are used within military contexts for exploration and threat detection. Neya teamed with the Robotics Institute at Carnegie Mellon University to develop vision-based solutions for probabilistic target learning and recognition. In addition, we used a Mission Planning and Management System (MPMS) to orchestrate complex search and retrieval tasks using a general set of modular autonomous services relating to robot mobility, perception and grasping.
Development of dog-like retrieving capability in a ground robot
Douglas C. MacKenzie, Rahul Ashok, James M. Rehg, et al.
This paper presents the Mobile Intelligence Team's approach to the CANINE outdoor ground robot competition. The competition required developing a robot that provided retrieving capabilities similar to those of a dog, while operating fully autonomously in unstructured environments. The vision team consisted of Mobile Intelligence, the Georgia Institute of Technology, and Wayne State University. Important computer vision aspects of the project were quickly learning the distinguishing characteristics of novel objects, searching images for the object as the robot drove a search pattern, identifying people near the robot for safe operation, correctly identifying the object among distractors, and localizing the object for retrieval. The classifier used to identify the objects is discussed, including an analysis of its performance, and an overview of the entire system architecture is presented. A discussion of the robot's performance in the competition demonstrates the system's successes in real-world testing.
Multidisciplinary unmanned technology teammate (MUTT)
Nenad Uzunovic, Anne Schneider, Alberto Lacaze, et al.
The U.S. Army Tank Automotive Research, Development and Engineering Center (TARDEC) held an autonomous robot competition called CANINE in June 2012. The goal of the competition was to develop innovative and natural control methods for robots. This paper describes the winning technology, including the vision system, the operator interaction, and the autonomous mobility. The rules stated that only gestures or voice commands could be used for control. The robots would learn a new object at the start of each phase, find the object after it was thrown into a field, and return the object to the operator. Each of the six phases became more difficult, including clutter of the same color or shape as the object, moving and stationary obstacles, and finding the operator, who had moved from the starting location to a new location. The Robotic Research Team integrated techniques in computer vision, speech recognition, object manipulation, and autonomous navigation. A multi-filter computer vision solution reliably detected the objects while rejecting objects of similar color or shape, even while the robot was in motion. A speech-based interface with short commands provided close-to-natural communication of complicated commands from the operator to the robot. An innovative gripper design allowed for efficient object pickup. A robust autonomous mobility and navigation solution for ground robotic platforms provided fast and reliable obstacle avoidance and course navigation. The research approach focused on winning the competition while remaining cognizant of and relevant to real-world applications.
R-MASTIF: robotic mobile autonomous system for threat interrogation and object fetch
Aveek Das, Dinesh Thakur, James Keller, et al.
Autonomous robotic “fetch” operation, where a robot is shown a novel object and then asked to locate it in the field, retrieve it and bring it back to the human operator, is a challenging problem that is of interest to the military. The CANINE competition presented a forum for several research teams to tackle this challenge using state of the art in robotics technology. The SRI-UPenn team fielded a modified Segway RMP 200 robot with multiple cameras and lidars. We implemented a unique computer vision based approach for textureless colored object training and detection to robustly locate previously unseen objects out to 15 meters on moderately flat terrain. We integrated SRI’s state of the art Visual Odometry for GPS-denied localization on our robot platform. We also designed a unique scooping mechanism which allowed retrieval of up to basketball sized objects with a reciprocating four-bar linkage mechanism. Further, all software, including a novel target localization and exploration algorithm was developed using ROS (Robot Operating System) which is open source and well adopted by the robotics community. We present a description of the system, our key technical contributions and experimental results.
LABRADOR: a learning autonomous behavior-based robot for adaptive detection and object retrieval
Brian Yamauchi, Mark Moseley, Jonathan Brookshire
As part of the TARDEC-funded CANINE (Cooperative Autonomous Navigation in a Networked Environment) Program, iRobot developed LABRADOR (Learning Autonomous Behavior-based Robot for Adaptive Detection and Object Retrieval). LABRADOR was based on the rugged, man-portable, iRobot PackBot unmanned ground vehicle (UGV) equipped with an explosives ordnance disposal (EOD) manipulator arm and a custom gripper. For LABRADOR, we developed a vision-based object learning and recognition system that combined a TLD (track-learn-detect) filter based on object shape features with a color-histogram-based object detector. Our vision system was able to learn in real-time to recognize objects presented to the robot. We also implemented a waypoint navigation system based on fused GPS, IMU (inertial measurement unit), and odometry data. We used this navigation capability to implement autonomous behaviors capable of searching a specified area using a variety of robust coverage strategies – including outward spiral, random bounce, random waypoint, and perimeter following behaviors. While the full system was not integrated in time to compete in the CANINE competition event, we developed useful perception, navigation, and behavior capabilities that may be applied to future autonomous robot systems.
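As an illustration of one of the coverage strategies listed above (the outward spiral), waypoints along an Archimedean spiral can be generated as in the sketch below; the spacing and sampling parameters are arbitrary illustrative choices, not values from the LABRADOR system.
```python
import math

def outward_spiral(center_x, center_y, spacing, turns, points_per_turn=36):
    """Waypoints on an Archimedean spiral: successive rings lie `spacing`
    metres apart, which suits sensor-footprint-based area coverage."""
    waypoints = []
    for i in range(turns * points_per_turn + 1):
        theta = 2.0 * math.pi * i / points_per_turn
        r = spacing * theta / (2.0 * math.pi)
        waypoints.append((center_x + r * math.cos(theta),
                          center_y + r * math.sin(theta)))
    return waypoints
```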
Interactive Paper Session
Method and application of active visual tracking based on illumination invariants
Jie Su, Gui-sheng Yin, Lei Wang, et al.
With the growing application of visual tracking technology, the performance of visual tracking algorithms has become important. Because of many kinds of noise, the robustness of tracking algorithms is often poor. To improve the identification and tracking rates for quickly moving targets, to expand the tracking range, and to lower the sensitivity to varying illumination, an active visual tracking system based on illumination invariants is proposed. A camera motion pre-control method based on particle-filter pre-location improves the agility and accuracy of tracking quickly moving targets by forecasting the target position and controlling the camera's pan, tilt and zoom joints. The pre-location method uses a particle filter driven by the illumination invariants of the target, which reduces the effect of varying illumination while tracking a moving target and improves the robustness of the algorithm. Experiments in an intelligent space show that robustness to illumination variation is improved and that accuracy is improved by actively adjusting the PTZ parameters.
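A minimal sketch of the pre-location idea: propagate a constant-velocity particle set and use its weighted mean as the forecast position for pre-pointing the PTZ camera. The motion model, noise level, and function names are illustrative assumptions.
```python
import numpy as np

def predict_target(particles, velocities, weights, dt, noise_std=0.05):
    """particles, velocities: (N, 2) arrays of image-plane positions and
    velocities; weights: (N,) normalized particle weights.
    Returns the propagated particles and the forecast target position."""
    particles = particles + velocities * dt
    particles = particles + np.random.normal(0.0, noise_std, particles.shape)
    forecast = np.average(particles, axis=0, weights=weights)
    return particles, forecast
```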
Supervised linear dimensionality reduction with robust margins for object recognition
F. Dornaika, A. Assoum
Linear Dimensionality Reduction (LDR) techniques have become increasingly important in computer vision and pattern recognition since they permit a relatively simple mapping of data onto a lower-dimensional subspace, leading to simple and computationally efficient classification strategies. Recently, many linear discriminant methods have been developed in order to reduce the dimensionality of visual data and to enhance the discrimination between different groups or classes. Many existing linear embedding techniques rely on the use of local margins in order to obtain good discrimination performance. However, dealing with outliers and within-class diversity has not been addressed by margin-based embedding methods. In this paper, we explore the use of different margin-based linear embedding methods. More precisely, we propose to use the concepts of Median miss and Median hit for building robust margin-based criteria. Based on such margins, we seek the projection directions (linear embedding) such that the sum of local margins is maximized. Our proposed approach has been applied to the problem of appearance-based face recognition. Experiments performed on four public face databases show that the proposed approach can give better generalization performance than the classic Average Neighborhood Margin Maximization (ANMM). Moreover, thanks to the use of robust margins, the proposed method degrades gracefully when label outliers contaminate the training data set. In particular, we show that the concept of Median hit was crucial for obtaining robust performance in the presence of outliers.
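A small sketch of how a robust local margin built from median distances could be computed for one sample, after the idea described above; the distance metric and sign convention are illustrative assumptions rather than the paper's exact criterion.
```python
import numpy as np

def median_margin(X, y, i):
    """X: (N, D) samples, y: (N,) labels. For sample i, compute the median
    distance to same-class samples (Median hit) and to other-class samples
    (Median miss); their difference serves as a robust local margin."""
    d = np.linalg.norm(X - X[i], axis=1)
    idx = np.arange(len(y))
    median_hit = np.median(d[(y == y[i]) & (idx != i)])
    median_miss = np.median(d[y != y[i]])
    return median_miss - median_hit
```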
Using a multi-port architecture of neural-net associative memory based on the equivalency paradigm for parallel cluster image analysis and self-learning
Vladimir G. Krasilenko, Alexander A. Lazarev, Sveta K. Grabovlyak, et al.
We consider equivalency models, including matrix-matrix and matrix-tensor and with the dual adaptive-weighted correlation, multi-port neural-net auto-associative and hetero-associative memory (MP NN AAM and HAP), which are equivalency paradigm and the theoretical basis of our work. We make a brief overview of the possible implementations of the MP NN AAM and of their architectures proposed and investigated earlier by us. The main base unit of such architectures is a matrix-matrix or matrix-tensor equivalentor. We show that the MP NN AAM based on the equivalency paradigm and optoelectronic architectures with space-time integration and parallel-serial 2D images processing have advantages such as increased memory capacity (more than ten times of the number of neurons!), high performance in different modes (1010 – 1012 connections per second!) And the ability to process, store and associatively recognize highly correlated images. Next, we show that with minor modifications, such MP NN AAM can be successfully used for highperformance parallel clustering processing of images. We show simulation results of using these modifications for clustering and learning models and algorithms for cluster analysis of specific images and divide them into categories of the array. Show example of a cluster division of 32 images (40x32 pixels) letters and graphics for 12 clusters with simultaneous formation of the output-weighted space allocated images for each cluster. We discuss algorithms for learning and self-learning in such structures and their comparative evaluations based on Mathcad simulations are made. It is shown that, unlike the traditional Kohonen self-organizing maps, time of learning in the proposed structures of multi-port neuronet classifier/clusterizer (MP NN C) on the basis of equivalency paradigm, due to their multi-port, decreases by orders and can be, in some cases, just a few epochs. Estimates show that in the test clustering of 32 1280- element images into 12 groups, the formation of neural connections of the matrix with dimension of 128x120 elements occurs to tens of iterative steps (some epochs), and for a set of learning patterns consisting of 32 such images, and at time of processing of 1-10 microseconds, the total learning time does not exceed a few milliseconds. We offer criteria for the quality evaluation of patterns clustering with such MP NN AAM.