Proceedings Volume 10993

Mobile Multimedia/Image Processing, Security, and Applications 2019

Volume Details

Date Published: 22 August 2019
Contents: 6 Sessions, 23 Papers, 10 Presentations
Conference: SPIE Defense + Commercial Sensing 2019
Volume Number: 10993

Table of Contents

All links to SPIE Proceedings will open in the SPIE Digital Library.
  • Front Matter: Volume 10993
  • Innovative Image Processing Techniques
  • Image Analysis Techniques
  • Multimedia Algorithms and Systems
  • Image Security, Authentication and Digital Forensics
  • Poster Session
Front Matter: Volume 10993
Front Matter: Volume 10993
This PDF file contains the front matter associated with SPIE Proceedings Volume 10993, including the title page, copyright information, table of contents, and author and conference committee lists.
Innovative Image Processing Techniques
Non-linear contrast stretching with optimizations
The primary objective of enhancement is to improve the contrast of an image so that the resulting image is more appropriate than the original for the given application. One of the simplest, most computationally effective, and most widely used classes of empirical algorithms for improving overall contrast is the class of stretching methods (linear and non-linear stretching). However, linear and non-linear stretching suffer from several issues, for instance a low-contrast effect caused by reorganizing intensities or an over-brightness effect caused by superimposing intensities. The goal of this paper is to present new techniques for image contrast enhancement: (1) a bi-non-linear contrast-stretching algorithm, (2) an optimized combination of linear and non-linear contrast stretching algorithms, and (3) an optimized combination of linear contrast stretching, non-linear contrast stretching, and local histogram equalization. Computer simulations on the publicly available Thermal Focus Image Database and the Tufts Face Database show that the proposed methods increase the dynamic range of the image and demonstrate significantly improved global and local contrast while preserving fine details and edges. In addition, the simulation results show that the proposed methods correlate well with subjective evaluations of image quality. The presented concept is useful in guiding the future design of cutting-edge image enhancement methods.
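The stretching classes discussed above can be illustrated with a minimal sketch. The percentile limits, gamma value, and blend weight below are illustrative assumptions, not the paper's optimized parameters:

```python
import numpy as np

def linear_stretch(img, lo_pct=2, hi_pct=98):
    """Map the [lo_pct, hi_pct] percentile range of intensities to [0, 255]."""
    a, b = np.percentile(img, [lo_pct, hi_pct])
    out = (img.astype(np.float64) - a) / max(b - a, 1e-8) * 255.0
    return np.clip(out, 0, 255)

def nonlinear_stretch(img, gamma=0.6):
    """Gamma curve as a simple non-linear stretch (gamma < 1 brightens)."""
    return 255.0 * (img.astype(np.float64) / 255.0) ** gamma

def combined_stretch(img, w=0.5, gamma=0.6):
    """Weighted combination of the linear and non-linear results."""
    return w * linear_stretch(img) + (1.0 - w) * nonlinear_stretch(img, gamma)
```

In the paper the combination weights are found by optimization; here `w` is simply fixed for illustration.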
Quaternion-based local and global color image enhancement algorithm
V. Voronin, E. Semenishchev, A. Zelensky, et al.
Many images, such as medical images, satellite images, and real-life photographs, may suffer from poor contrast. Image enhancement is the process of improving image quality so that the results are more suitable for display or further image analysis. In this paper, we present a Hamiltonian quaternion framework-based model for color image enhancement. The basic idea is to apply an α-rooting image enhancement approach to different image blocks. For this purpose, we split the image into disjoint blocks using moving windows. The resulting image is a weighted mean of all processed blocks. The weights for each locally and globally enhanced image are derived through optimization using a measure of enhancement (EME) as a cost function. We demonstrate our color image enhancement scheme and compare it with other state-of-the-art methods. Extensive computer simulations indicate that the proposed method outperforms commonly used methods in both visual image quality and statistical information.
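A minimal sketch of the α-rooting step applied to a single grayscale block may help. The α value below is an illustrative assumption, and the quaternion machinery and EME-driven weight optimization of the paper are omitted:

```python
import numpy as np

def alpha_rooting(block, alpha=0.92):
    """Enhance one grayscale block: keep the Fourier phase, raise magnitude to alpha."""
    F = np.fft.fft2(block.astype(np.float64))
    mag = np.abs(F)
    # F * |F|^(alpha-1) has magnitude |F|^alpha and the original phase;
    # guard against zero-magnitude coefficients
    F_enh = np.where(mag > 0, F * mag ** (alpha - 1.0), 0.0)
    out = np.real(np.fft.ifft2(F_enh))
    return np.clip(out, 0, 255)
```

With alpha slightly below 1, high-frequency content is amplified relative to the DC term, which raises local contrast; alpha = 1 leaves the block unchanged.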
Image Analysis Techniques
Spectrally-shaped correlation with application to image registration
Stephen DelMarco, Helen Webb, Victor Tom
Image correlation has proven useful for image filtering, matching, pattern recognition, and image registration over many decades. The two classical correlation forms, amplitude and phase correlation, display different properties. Amplitude correlation often provides a low, broad peak in the correlation domain. The broadness of the peak provides robustness to matching imagery exhibiting non-translational geometric offsets, such as rotation or scale differences. By contrast, phase correlation tends to provide a high, narrow peak. The high peak signifies high matching confidence while the narrow peak width provides accurate shift localization. However, the phase correlation peak degrades rapidly when matching against images with non-translational geometric offset. To provide tradeoffs between properties of these traditional correlation forms, in this paper we present a general, flexible form of correlation called Spectrally-Shaped Correlation (SSC). SSC provides control over the Fourier domain normalization of the correlation components. We apply SSC to the problem of image registration. We show how SSC contains the classical amplitude, phase, and phase-only correlation forms as special cases. First, we present the general theory of Fourier transforms for multi-channel imagery, modeled as hypercomplex-valued imagery. We present mathematical details of the transform techniques and develop the SSC approach. We then present numerical results demonstrating registration of real image data, acquired from a UAV operating in an urban environment, to reference imagery. We demonstrate a performance improvement of the SSC over the classical forms of correlation.
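The unified correlation family can be sketched with a single spectral-normalization exponent. The parameter name `beta` and its use below are assumptions for illustration (β = 0 gives amplitude correlation, β = 1 gives phase correlation), not the paper's exact formulation:

```python
import numpy as np

def ssc_register(ref, probe, beta=1.0):
    """Estimate the cyclic shift of `probe` relative to `ref`.

    The cross-power spectrum is normalized by its magnitude raised to `beta`,
    interpolating between amplitude correlation (beta=0) and phase
    correlation (beta=1).
    """
    F = np.fft.fft2(ref)
    G = np.fft.fft2(probe)
    R = np.conj(F) * G
    R = R / np.maximum(np.abs(R), 1e-12) ** beta
    surface = np.real(np.fft.ifft2(R))
    return np.unravel_index(np.argmax(surface), surface.shape)
```

For a purely translated probe and β = 1, the correlation surface is a sharp delta at the shift, matching the narrow-peak behavior described above.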
Automatic spatial accuracy estimation for correlation-based image registration
Stephen DelMarco, Helen Webb, Victor Tom
Accurate and successful image registration is a key enabling technology in applications such as image fusion, matching and pattern recognition. Knowledge of registration solution quality and accuracy can help prevent an inaccurate registration from degrading or corrupting performance of downstream image processing applications. However, estimating the spatial accuracy of image registration solutions can be difficult in the absence of ground-truth information on feature content or fiducial marker correspondences. This paper presents an automated spatial registration accuracy measurement for estimating and quantifying the spatial accuracy of correlation-based image registration in the absence of ground-truth information. For correlation surfaces exhibiting a single dominant peak, the approach consists of fitting an appropriate region of the correlation surface, about the peak coefficient, with a two-dimensional Gaussian. It then uses the covariance of the Gaussian to model the registration spatial error covariance. Use of a fitted Gaussian provides an intuitive probabilistic interpretation to the registration solution; the Gaussian function value at a spatial offset from the Gaussian peak gives the likelihood of that offset value. For more complicated regions containing multiple correlation local peak values, we extend the approach to fit a Gaussian mixture model to the region and use the mixture model covariance for the spatial accuracy metric. We describe an energy-based method for choosing the model region of the correlation surface. We discuss implementation subtleties and provide perturbation methods for handling numerically ill-conditioned matrices. We present numerical spatial error estimation results generated from registration of real video imagery acquired from a UAV platform.
TERNet: A deep learning approach for thermal face emotion recognition
Facial emotion recognition technology finds numerous real-life applications in areas of virtual learning, cognitive psychology analysis, avatar animation, neuromarketing, human-machine interaction, and entertainment systems. Most state-of-the-art techniques focus primarily on visible spectrum information for emotion recognition. This becomes very arduous, as the emotions of individuals vary significantly. Moreover, visible images are susceptible to variations in illumination. Low lighting, variations in pose, aging, and disguise have a substantial impact on the appearance of images and on textural information. Even though great advances have been made in the field, facial emotion recognition using existing techniques is often not satisfactory when compared to human performance. To overcome these shortcomings, thermal images are preferred over visible images. Thermal images a) are less sensitive to lighting conditions, b) have consistent thermal signatures, and c) capture a temperature distribution formed by the facial vein branches. This paper proposes a robust emotion recognition system using thermal images: TERNet. To accomplish this, a customized convolutional neural network (CNN) is employed, which possesses excellent generalization capabilities. The architecture adopts features obtained via transfer learning from the VGG-Face CNN model, which is further fine-tuned with thermal expression face data from the Tufts Face Database. Computer simulations demonstrate an accuracy of 96.2% when compared to state-of-the-art models.
Multimedia Algorithms and Systems
Deep learning on mobile devices: a review
Recent breakthroughs in deep learning and artificial intelligence technologies have enabled numerous mobile applications. While traditional computation paradigms rely on mobile sensing and cloud computing, deep learning implemented on mobile devices provides several advantages: low communication bandwidth, small cloud computing resource cost, quick response time, and improved data privacy. Research and development of deep learning on mobile and embedded devices has recently attracted much attention. This paper provides a timely review of this fast-paced field to give researchers, engineers, practitioners, and graduate students a quick grasp of the recent advancements in deep learning on mobile devices. In this paper, we discuss hardware architectures for mobile deep learning, including Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), and recent mobile Graphics Processing Units (GPUs). We present Size, Weight, Area and Power (SWAP) considerations and their relation to algorithm optimizations, such as quantization, pruning, compression, and approximations that simplify computation while retaining performance accuracy. We cover existing systems and give a state-of-the-industry review of the TensorFlow, MXNet, Mobile AI Compute Engine (MACE), and Paddle-mobile deep learning platforms. We discuss resources for mobile deep learning practitioners, including tools, libraries, models, and performance benchmarks. We present applications of various mobile sensing modalities to industries ranging from robotics, healthcare, multimedia, and biometrics to autonomous driving and defense. We address the key deep learning challenges to overcome, including low-quality data and small training/adaptation data sets. In addition, the review provides numerous citations and links to existing code bases implementing various technologies. These resources lower the user's barrier to entry into the field of mobile deep learning.
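As one concrete example of the optimizations mentioned, symmetric linear (int8) weight quantization can be sketched as follows; this is a generic illustration, not tied to any particular framework covered in the review:

```python
import numpy as np

def quantize_int8(weights):
    """Symmetric linear quantization of float weights to int8 plus a scale."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale
```

The int8 tensor is four times smaller than float32, and the per-weight reconstruction error is bounded by half the quantization step, which is why accuracy is largely retained.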
ARNature: augmented reality style colorization for enhancing tourism experience
Qianwen Wan, Huilin Tong, Aleksandra Kaszowska, et al.
Augmented Reality (AR) can seamlessly combine a real scene viewed by a user with a virtual component generated by a computer. This work introduces a system architecture integrating augmented reality technology with state-of-the-art computer vision techniques such as image semantic segmentation and style colorization. The proposed production system, ARNature, is able to superimpose a virtual scene, audio, and other enhancements in real time over a real-world environment to enhance the tourism experience. Tourists often have limited money and time, and cannot experience a tourist site during different seasons or weather conditions. With ARNature, visitors can go on an augmented reality journey using an AR device, such as a HoloLens, tablet, or cellphone, and interact with real objects in a natural scene. Different enhancements of a tourism site are digitally overlaid onto visitors' direct field of vision in real time. In addition, a voice module can be used to play music and provide additional information. Related algorithms, the system design, and simulation results for a prototype ARNature system are presented. Furthermore, a no-reference image quality measure, the Naturalness Image Quality Evaluator (NIQE), was utilized to evaluate the immersiveness and naturalness of ARNature. The results demonstrate that ARNature has the ability to enhance the tourist experience in a truly immersive manner.
On privacy-protected outsourced processing for mobile multimedia
Mobile devices are usually too computationally restricted to carry out complicated processing algorithms on their multimedia documents. While cloud computing provides resources for customers, a major concern is the privacy and security of customer data. The information-theoretic secure (ITS) approach has some advantages over the homomorphic encryption approach: it is faster and requires much shorter encryption. In this paper, we propose a framework based on the ITS paradigm, using a linear secret sharing scheme, for image and audio processing tasks on mobile devices. Our experiments were carried out on the Amazon EC2 commercial cloud platform and show better performance than existing approaches.
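A minimal additive secret-sharing sketch over bytes conveys the flavor of a linear scheme. The share count and modulus are illustrative assumptions; the paper's actual scheme may differ in detail:

```python
import numpy as np

def make_shares(pixels, n_shares=3, modulus=256, seed=0):
    """Split a pixel array into n additive shares; any n-1 shares reveal nothing."""
    rng = np.random.default_rng(seed)
    shares = [rng.integers(0, modulus, size=pixels.shape, dtype=np.int64)
              for _ in range(n_shares - 1)]
    # the last share makes the sum congruent to the secret
    last = (pixels.astype(np.int64) - sum(shares)) % modulus
    return shares + [last]

def reconstruct(shares, modulus=256):
    """Recombine shares; linear operations applied per-share carry over."""
    return sum(shares) % modulus
```

Linearity is what enables outsourced processing: for example, adding a constant brightness offset to any single share brightens the reconstructed image by the same amount.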
Image Security, Authentication and Digital Forensics
Data security and privacy in the cloud
Relying on the cloud for storing data and performing computations has become a popular solution in today’s society, which demands large data collections and/or analysis over them to be readily available, for example, to make knowledge-based decisions. While bringing undeniable benefits to both data owners and end users accessing the outsourced data, moving to the cloud raises a number of issues, ranging from choosing the most suitable cloud provider for outsourcing to effectively protecting data and computation results. In this paper, we discuss the main issues related to data protection arising when data and/or computations over them are moved to the cloud. We also illustrate possible solutions and approaches for addressing such issues.
HeartID-based authentication for autonomous vehicles using deep learning and random number generators
HeartID biometric authentication technology is integrated into the multi-faceted steering wheel and car seat, allowing only authorized personnel to operate the vehicle, with access to the vehicle's connected devices and computers. This application of HeartID can be used for law enforcement and ride-sharing services, where a person can access the car using keyless entry technology. In this study, we investigate the possibility of incorporating the human heart signal, the electrocardiogram (ECG), into autonomous cars. Our platform can facilitate secure authentication for end users, using their heart signal to enable entry to the car. In this paper, we present ECG-based biometric authentication for connected autonomous vehicles that can act as an interface between humans and sensors for authentication purposes. In this study, we turn ECG noise into a useful feature: the noise is used for random number generators with high entropy. To evaluate HeartID, the NIST test suite is applied to assess the randomness of the true random number generator (TRNG).
A trust-based, multi-factor system for assessing the veracity of video files in the era of ‘deep fakes’ (Conference Presentation)
Technology has been developed for creating video files that purport to show an individual speaking words he or she never necessarily said. This technology, commonly known as 'deep fakes,' uses machine learning technologies to train a system with images of the prospective subject of a video and reconstruct a video based upon words spoken by another individual. As these videos are convincing and can have negative effects ranging from embarrassing a subject to interfering with elections to impacting national security, it is critical to identify ways to determine whether a prospective video is genuine or not. This paper proposes a system to evaluate a presented video file, based on multiple characteristics, and make a recommendation as to the confidence in the veracity of the file. From a technical perspective, it combines assessments of multiple details related to the audio and video stored in the file, as well as other file characteristics. Additionally, other inputs related to assessment of the content of the video, its impact, and its timing can be added to this technical confidence metric to yield a combined single metric that characterizes trust in the video and supports decision making related to it. Several examples are presented and assessed. The paper concludes with a discussion of the problems of nefarious fake videos, the impact of fakeness detection, and future work in this area. In particular, the impact of identifying a video as fake in countering the impact of the nefarious video is considered.
Analysis of the efficacy of the use of subject face color analysis to detect fake videos (Conference Presentation)
Videos called ‘deep fakes’ are created by an algorithm, based on deep learning techniques, that matches one individual’s (the target) facial patterns to another’s (the source). These videos are compelling. In many cases they are visually indistinguishable from real recordings. Being able to identify fake videos is critical to refuting the misinformation that they may be used to promulgate. While some approaches have been proposed, they are largely based on exploiting temporary gaps in the video production algorithm (such as not introducing significant blinking) that may be fixed in the future. This paper proposes a prospective systematic way of evaluating the veracity of possible deep fake videos. The proposed approach is based on assessing the color patterns and variations of the target’s face. Videos produced by the construction algorithm are compared to a similar actual recording. Also, videos made using the construction algorithm based on using multiple sources are compared to each other and to the actual recording of the target and key differences between the different videos are discussed. The paper then proceeds to analyze whether the identified differences and patterns are suitable for differentiating fake videos from actual recordings, across multiple application areas. The paper concludes with a discussion of the impact of so-called ‘deep fakes’ technology on public trust in video recordings and the media. The impact of a technique, like the proposed, on multiple application areas is discussed. Its efficacy for multiple critical applications is considered and topics for next steps are discussed.
Poster Session
Using embedded operating system as a modular provisioning platform for IP telephony
Filip Rezac, Jan Rozhon, Jakub Safarik, et al.
This article presents a system for modular provisioning of IP telephony devices. The theoretical part addresses issues in mass configuration and data synchronization, followed by practical implementations of the provisioning solutions. The article also describes the design and implementation of the whole platform, including subsequent functionality testing and possible designs for further improvements.
Asymmetric and symmetric gradient operators with application in face recognition in Renaissance portrait art
Automatic face recognition research includes a wide range of commercial and law enforcement applications. However, only a few works focus on face recognition in Renaissance portrait artworks, which is essential for characterizing individual artists. The primary challenge of portrait recognition is the limited availability of portraits. To cope with these issues, we develop a new class of gradient operators for face recognition in Renaissance portrait art. In mathematics, the gradient is an extension of the derivative. Gradient operators have been extensively used in many image processing and computer vision applications; the simplest examples are the Roberts, Prewitt, and Sobel operators, the circular operator, the rotationally symmetric operator, and the isotropic operator. In this paper, we propose a class of asymmetric and symmetric gradient operators of small size (3×3 and 5×5). Examples of generated 5×5 gradient operators in different directions are described. Extensive computer simulation is conducted on 270 Renaissance portraits, including portraits by Raphael, Michelangelo, and Leonardo da Vinci. The experimental results show that the fusion of local binary patterns (LBP) with asymmetric and symmetric operators performs better than traditional LBP features for face recognition, including on Renaissance portraits.
Thermal image stitching for examination industrial buildings
The problem of image stitching (mosaicing), or the formation of a panorama from a set of overlapping images, is one of the most widely deployed techniques in computer vision and computer graphics applications. The goal of image stitching is to create natural-looking mosaics free of artifacts that may occur due to relative camera motion, illumination changes, and optical aberrations. The currently common methods cannot be directly applied to stitching thermal images.
A pixel-based color transfer system to recolor nighttime imagery
Most state-of-the-art techniques focus primarily on daytime surveillance, and limited research has been performed on nighttime monitoring. Surveillance is often more important in darker environments, since many activities of interest occur at night. However, nighttime imagery presents its own challenges: it is mostly monochrome and very noisy. Night vision systems are also affected by the presence of a bright light or glare off a shiny surface. Color imagery has several benefits over grayscale imagery. The human eye can discriminate a broad spectrum of colors but is limited to about 100 shades of gray. Color also drives visual attention and aids in better understanding of the scene. Moreover, contextual information about the scene affects the way humans distinguish and recognize things. An essential step of a coloring process is the choice of an appropriate color image model and color mapping scheme. To enhance relevant information in nighttime images, a color mapping or color transfer technique is employed. The paper proposes a robust pixel-based color transfer architecture that maps the color characteristics of daytime images to nighttime images. The architecture is also capable of compensating for image registration issues encountered during acquisition. A visual analysis of the results demonstrates that the proposed method performs better than state-of-the-art methods and is robust to different imaging sensors.
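The color-mapping step can be sketched in its simplest per-channel statistics-matching form (in the spirit of Reinhard-style transfer). Working directly in RGB and the small stabilizing epsilon are simplifying assumptions, not the paper's full architecture:

```python
import numpy as np

def transfer_colors(day_img, night_img):
    """Match each channel of the nighttime image to the daytime statistics."""
    out = np.empty_like(night_img, dtype=np.float64)
    for c in range(3):
        day = day_img[..., c].astype(np.float64)
        night = night_img[..., c].astype(np.float64)
        # standardize the night channel, then rescale to the day mean/std
        out[..., c] = (night - night.mean()) / (night.std() + 1e-8) * day.std() + day.mean()
    return np.clip(out, 0, 255)
```

After the transfer, each channel of the night image shares the first- and second-order statistics of the corresponding daytime channel, which is what gives the recolored result its daytime palette.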
Gradient based histogram equalization in grayscale image enhancement
This paper presents a new method of histogram equalization for grayscale images, called gradient-based histogram equalization. The histogram equalization is performed on the image filtered by means of gradient operators. The proposed method is simple and fast, and preliminary experimental examples with different images show that the method is effective for image enhancement. While preserving the range and mean intensity of the image, the new method reduces the standard deviation and significantly straightens the graph of the histogram, compared with the traditional (global) histogram equalization.
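A minimal sketch of the pipeline (Sobel gradient magnitude followed by classic histogram equalization) is given below. The choice of the Sobel operator and the 8-bit rescaling are illustrative assumptions, not the paper's exact operators:

```python
import numpy as np

def sobel_magnitude(img):
    """Gradient magnitude via 3x3 Sobel operators (borders left at zero)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
    ky = kx.T
    img = img.astype(np.float64)
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    for i in range(1, img.shape[0] - 1):
        for j in range(1, img.shape[1] - 1):
            patch = img[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(patch * kx)
            gy[i, j] = np.sum(patch * ky)
    return np.hypot(gx, gy)

def equalize(img_u8):
    """Classic histogram equalization of an 8-bit image via the CDF."""
    hist = np.bincount(img_u8.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    span = max(cdf.max() - cdf.min(), 1.0)
    lut = (cdf - cdf.min()) / span * 255.0
    return lut[img_u8].astype(np.uint8)

def gradient_equalize(img_u8):
    """Equalize the gradient-filtered image, rescaled to 8 bits."""
    mag = sobel_magnitude(img_u8)
    mag_u8 = np.clip(mag / max(mag.max(), 1e-8) * 255.0, 0, 255).astype(np.uint8)
    return equalize(mag_u8)
```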
Augmented reality-based vision-aid indoor navigation system in GPS denied environment
Srijith Rajeev, Qianwen Wan, Kenny Yau, et al.
High-accuracy localization and user position tracking is critical to improving the quality of augmented reality environments. The biggest challenge facing developers is localizing the user based on the visible surroundings. Current solutions rely on the Global Positioning System (GPS) for tracking and orientation. However, GPS receivers have an accuracy of about 10 to 30 meters, which is not accurate enough for augmented reality, which needs precision measured in millimeters or smaller. This paper describes the development and demonstration of a head-worn augmented reality (AR) based vision-aid indoor navigation system, which localizes the user without relying on a GPS signal. Commercially available augmented reality headsets allow individuals to capture the field of vision using the front-facing camera in real time. Utilizing captured image features as navigation-related landmarks allows localizing the user in the absence of a GPS signal. The proposed method involves three steps: detailed front-scene camera data are collected and generated for landmark recognition; an individual's current position is detected and located using feature matching; and arrows are displayed to indicate areas that require more data collection if needed. Computer simulations indicate that the proposed augmented reality-based vision-aid indoor navigation system can provide precise simultaneous localization and mapping in a GPS-denied environment.
Vision based pointing error estimation for mobile eye-tracking system
Translating environmental knowledge from a bird's-eye-view perspective, such as a map, to a first-person egocentric perspective is notoriously challenging, but critical for effective navigation and environment learning. Pointing error, or the angular difference between the perceived location and the actual location, is an important measure for estimating how well the environment has been learned. Traditionally, errors in pointing estimates were computed by manually noting the angular difference. With the advent of commercial low-cost mobile eye trackers, it becomes possible to couple the advantages of automated image-processing-based techniques with these spatial learning studies. This paper presents a vision-based analytic approach for calculating pointing error measures in real-world navigation studies relying only on data from mobile eye-tracking devices. The proposed method involves three steps: panorama generation, probe image localization using feature matching, and navigation pointing error estimation. This first-of-its-kind application has game-changing potential in the field of cognitive research using eye-tracking technology for understanding human navigation and environment learning, and it has been successfully adopted by cognitive psychologists.
Video quality assessment using generative adversarial network
V. Voronin, V. Franc, A. Zelensky, et al.
Video-related technology has grown rapidly due to the progress of digital devices such as virtual reality, 3D cameras, 3D films, 3D displays, and the Internet. During video acquisition and processing (compression, transmission, and reproduction), videos may suffer several types of distortion that degrade quality and directly affect subjective perception by human eyes. Moreover, subjective evaluation is tedious and time-consuming, and specialists for this kind of work are not always available, so it is necessary to evaluate the quality of videos by computer; video quality evaluation/assessment has therefore become vital. The goal of video quality assessment is to predict perceptual quality in order to improve the performance of practical video application systems; in other words, when the user's experience worsens, we need a metric that measures the distortions. The commonly used video quality assessment methods: a) consider the video as a sequence of two-dimensional images and compute the video quality score by weighted averaging of per-frame (2D image) scores, which conflicts with the fact that a video signal is a 3D volume and ignores motion features, or b) are designed for specific distortions (for example, blockiness and blurriness). In this paper, we present a novel deep learning architecture for no-reference video quality assessment. It is based on a 3D convolutional neural network and a generative adversarial network (GAN). We evaluate the proposed approach on the LIVE, ECVQ, TID2013, and EVVQ databases. Computer simulations show that the proposed video quality assessment: a) converges on a small amount of data, b) is more "universal," in that it can be used for different video quality degradations, including denoising, deblocking, and deconvolution, and c) outperforms existing no-reference video quality assessment methods. In addition, we demonstrate how our predicted no-reference quality metric correlates with qualitative opinion in a human observer study.
Quaternion alpha-rooting image enhancement of grayscale images
The proposed method is a new approach for enhancing grayscale images, in which the images are mapped to quaternion space and a quaternion-based enhancement technique, namely quaternion alpha-rooting, is then used to enhance the generated "quaternion" image. Currently, there are only very limited techniques for converting a grayscale image to a color image, and in this article we propose a novel conversion technique that easily converts a grayscale image to a color, or quaternion, image. In addition, we describe the quaternion alpha-rooting method of quaternion image enhancement. The quaternion approach to enhancement allows the multi-channel image to be processed as a single unit. The fast algorithm for the quaternion discrete Fourier transform makes the implementation of the enhancement method practical and effective. The results of image enhancement by the proposed method, and a comparison with traditional alpha-rooting of grayscale images, are described. The metrics used to assess the quality of enhancement show good values for the results of the proposed method. One of the enhancement metrics is the contrast-based metric referred to as the enhancement measure estimation (EME); other metrics used to assess the quality of the enhanced images are the signal-to-noise ratio (SNR) and the mean-square-root error (MSRE).
Human stress detection from the speech in danger situation
Pavol Partila, Jaromir Tovarek, Jan Rozhon, et al.
Besides facial expressions and gestures, human speech is still the main channel of communication in ordinary human life. In addition to speech content, this signal also contains additional information about the source/speaker's state. Gender, age, and also the emotional state of a person can be extracted from spoken speech. This research focuses on the classification of the emotional state of a person, stress in particular. Accordingly, we have created a speech database of emergency phone calls. The database contains recordings from the 112 emergency line of the Integrated Rescue System (IRS) of the Czech Republic. It was designed for detecting stress from the human voice. For detecting stress relative to a neutral (resting) state, the database was divided into neutral speech and human speech under stress. The neutral subgroup consists of voice recordings of the IRS operator. The stress subgroup is made up of people in danger; we deliberately selected events with strong stressful stimuli, such as car accidents, domestic violence, and situations close to death. The speech signal is then pre-processed and analyzed for feature extraction. The feature vectors represent the classifier input data. Classical classification methods, such as Support Vector Machine (SVM) and k-Nearest Neighbors (k-NN) classifiers, and newer artificial intelligence methods, such as Convolutional Neural Networks (CNNs), are used to detect and recognize human stress. The applications of the achieved results are broad: from phone services through Smart Health to the analysis of security components.
Skin detection in image and video founded in clustering and region growing
Region growing is defined as a procedure for finding regions containing user-defined objects of interest. It is a vital phase for various image processing applications and has been very challenging, as it is the basis for further image analysis, interpretation, and classification. Region growing varies with the purpose of the application, and the identified regions are widely used in various domains: skin detection, object detection in images, hand gesture detection, etc. In this paper, the main focus is on defining a region of interest in an image based on skin detection; a clustering method is used. Skin detection can be used as a preprocessing step for several applications, including but not limited to various Human Computer Interaction (HCI) tasks. However, skin detection is a challenging problem due to the wide variation in human skin tones. Skin tone can be confused with background color or attire color, and it varies with ethnicity and individual characteristics such as age, sex, body part, makeup, and hair color, as well as with the presence of non-human objects and camera calibration. Besides that, lighting conditions also play a vital role. Researchers have been working tirelessly on efficient skin detection methods, but these are not without limitations. Various approaches have been proposed, including pixel-wise thresholds in various color spaces, segmentation, and face- and hand-detection-based approaches, but the field still lacks a method that can be applied to all types of skin detection. In this paper, a novel skin detection method is proposed that is free of manual threshold values and automatically defines the number of clusters.
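A toy sketch of the clustering step (plain k-means on pixel colors, with the skin cluster picked by proximity to a reference tone) illustrates the threshold-free idea. The reference tone and the fixed k = 2 below are illustrative assumptions, since the paper determines the number of clusters automatically:

```python
import numpy as np

def kmeans(points, k=2, iters=25, seed=0):
    """Plain k-means on an (n, d) array of pixel colors."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(np.float64)
    for _ in range(iters):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels, centers

def skin_mask(pixels_rgb, reference_skin=(200, 150, 120)):
    """Label pixels; the cluster whose center is nearest the reference tone is 'skin'."""
    labels, centers = kmeans(pixels_rgb.astype(np.float64))
    ref = np.asarray(reference_skin, dtype=float)
    skin_cluster = np.argmin(np.linalg.norm(centers - ref, axis=1))
    return labels == skin_cluster
```

The hypothetical `reference_skin` value stands in for whatever prior the full method uses to identify the skin cluster among those found.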
Cardiovascular PPG biometric key generation for IoT in healthcare domain
The wearable IoT monitor device becomes increasingly popular in a market where heart-rate monitors, pulse oximeters sensor are integrated into a device and already play an important role in everyday life. With these considerations in mind, it is important to maintain the security and privacy of users. Biometric authentication offers several benefits such as improved facilitation, enablement, and automation. However, traditional biometric modalities such as fingerprint, face, and iris require specific hardware or sensors to capture the biometric. In this paper, we introduce next-generation biometric called photoplethysmograph (PPG) that are internal to the body, offering a number of advantages. First of all, they are harder to clone, to harvest and to potentially hack, by the nature of the fact they are internal. Other benefits include liveness detection, and interoperability, which traditional modalities don't necessarily have. In this study, we developed the PPG biometric-based key generation that can be extracted by our adaptive quantization approach. The experimental result is shown that 175 key bits with 99.9% average reliability and 0.89 min-entropy can be achieved.