Wednesday, 27 November 2019

Journal of Real-Time Image Processing: sixth issue of volume 16

Video similarity detection using fixed-length Statistical Dominant Colour Profile (SDCP) signatures

Abstract

This paper presents a fast and effective technique for detecting and measuring visual similarity between videos using compact fixed-length signatures. The proposed technique facilitates building real-time, scalable video matching/retrieval systems by generating a representative signature for a given video shot. The generated signature (Statistical Dominant Colour Profile, SDCP) effectively encodes the spatio-temporal patterns of colours in a given shot, enabling robust real-time matching. Furthermore, the SDCP signature is engineered to better address the visual similarity problem through its relaxed representation of shot contents. The compact fixed-length aspect of the proposed signature is the key to its high matching speed (>1000 fps) compared to current techniques that rely on exhaustive processing, such as dense trajectories. The SDCP signature encodes a given video shot with only 294 values, regardless of the shot length, which facilitates speedy signature extraction and matching. To maximize the benefit of the proposed technique, compressed-domain videos are utilized as a case study, given their wide availability. However, the proposed technique avoids full video decompression and operates on tiny frames, rather than full-size decompressed frames, by using the tiny DC-image sequence of the MPEG compressed stream. Experiments on various standard and challenging datasets (e.g. UCF101 with its 13k videos) show the technique's robust performance in terms of both retrieval ability and computational cost.
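
To illustrate the fixed-length idea behind such a signature, the following Python sketch builds a colour-statistics vector of constant size from a shot's tiny DC-frame sequence and compares two such vectors. The histogram binning, vector size, and distance measure are illustrative assumptions, not the authors' exact 294-value SDCP layout or matching function.

import numpy as np

def colour_signature(dc_frames, bins=8):
    # Toy fixed-length colour signature for a shot.
    # dc_frames: iterable of small RGB frames (H x W x 3, uint8), e.g. the
    # DC-image sequence of an MPEG shot.  Returns a vector whose length does
    # not depend on the number of frames.
    per_frame = []
    for f in dc_frames:
        hist, _ = np.histogramdd(f.reshape(-1, 3),
                                 bins=(bins, bins, bins),
                                 range=((0, 256),) * 3)
        per_frame.append(hist.ravel() / hist.sum())
    per_frame = np.asarray(per_frame)
    # Temporal statistics collapse any number of frames to a fixed size.
    return np.concatenate([per_frame.mean(axis=0), per_frame.std(axis=0)])

def signature_distance(sig_a, sig_b):
    # Plain L1 distance between two signatures; the paper's matching
    # measure may differ.
    return np.abs(sig_a - sig_b).sum()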

Accelerating block-matching and 3D filtering method for image denoising on GPUs

Abstract

Denoising photographs and video recordings is an important task in the domain of image processing. In this paper, we focus on the block-matching and 3D filtering (BM3D) algorithm, which uses the self-similarity of image blocks to improve the noise-filtering process. Even though this method has achieved quite impressive results in terms of denoising quality, it is not widely used. One of the reasons is the fact that the method is extremely computationally demanding. In this paper, we present a CUDA-accelerated implementation which increases the image processing speed significantly and brings the BM3D method much closer to real applications. The GPU implementation of the BM3D algorithm is not as straightforward as the implementation of simpler image processing methods, and we believe that some parts (especially the block matching) can be utilized separately or provide guidelines for similar algorithms.
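
As a rough illustration of the block-matching stage that dominates the cost (the part the paper maps to CUDA), the CPU-side Python sketch below gathers the most similar blocks around a reference block. Block size, search radius, and group size are illustrative values, and the collaborative 3D filtering and aggregation stages of BM3D are not shown.

import numpy as np

def match_blocks(img, ref_y, ref_x, block=8, search=16, max_matches=16):
    # Find the blocks most similar to the reference block within a local
    # search window, by sum of squared differences.
    ref = img[ref_y:ref_y + block, ref_x:ref_x + block].astype(np.float32)
    y0, y1 = max(0, ref_y - search), min(img.shape[0] - block, ref_y + search)
    x0, x1 = max(0, ref_x - search), min(img.shape[1] - block, ref_x + search)
    candidates = []
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            cand = img[y:y + block, x:x + block]
            dist = np.sum((ref - cand) ** 2)
            candidates.append((dist, y, x))
    candidates.sort(key=lambda t: t[0])
    # The selected blocks form the 3D group used for collaborative filtering.
    return candidates[:max_matches]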

Fast computation of 2D and 3D Legendre moments using multi-core CPUs and GPU parallel architectures

Abstract

Legendre moments and their invariants for 2D and 3D images/objects are widely used in image processing, computer vision, and pattern recognition applications. Reconstruction of digital images inherently requires higher-order moments to obtain high-quality reconstructed images, and applications such as the classification of bacterial contamination images utilize high-order moments in the feature extraction phase. For large images and 3D objects, Legendre moment computation is very time-consuming and compute-intensive. This problem limits the use of Legendre moments and makes them impractical for real-time applications. Multi-core CPUs and GPUs are powerful parallel processing architectures. In this paper, new parallel algorithms are proposed to speed up the exact computation of Legendre moments for 2D and 3D images/objects. These algorithms utilize multi-core CPU and GPU parallel architectures, where each pixel/voxel of the input digital image/object can be handled independently. A detailed profile analysis is presented in which the weight of each part of the entire computational process is evaluated. In addition, we contribute to parallel 2D/3D Legendre moments by: (1) modifying the traditional exact Legendre moment algorithm to better fit parallel architectures, (2) presenting the first parallel CPU implementation of Legendre moments, and (3) presenting the first parallel CPU and GPU acceleration of the reconstruction phase of the Legendre moments. A set of numerical experiments with different gray-level images is performed. The obtained results clearly show parallel gains very close to optimal. The extreme reduction in execution times, especially for 8-core CPUs and GPUs, makes the parallel exact 2D/3D Legendre moments suitable for real-time applications.
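
The per-pixel independence the abstract exploits can be seen in the basic 2D Legendre moment sum, sketched below in Python. This simple Riemann-sum form, with an assumed mapping of pixel centres to [-1, 1], is only an approximation; it is not the exact-moment scheme or the parallel kernels of the paper.

import numpy as np
from numpy.polynomial.legendre import Legendre

def legendre_moments(img, max_order):
    # Approximate 2D Legendre moments of a grey-level image.  Each pixel's
    # contribution is independent of the others, which is what allows the
    # work to be spread across CPU cores or GPU threads.
    N, M = img.shape
    x = -1.0 + (2.0 * np.arange(N) + 1.0) / N   # pixel-centre abscissae in [-1, 1]
    y = -1.0 + (2.0 * np.arange(M) + 1.0) / M
    Px = np.array([Legendre.basis(p)(x) for p in range(max_order + 1)])  # (P+1, N)
    Py = np.array([Legendre.basis(q)(y) for q in range(max_order + 1)])  # (Q+1, M)
    norm = np.outer(2 * np.arange(max_order + 1) + 1,
                    2 * np.arange(max_order + 1) + 1) / 4.0
    dxdy = (2.0 / N) * (2.0 / M)
    # lambda_pq = norm_pq * sum_ij P_p(x_i) P_q(y_j) f(i, j) dx dy
    return norm * (Px @ img @ Py.T) * dxdy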

Real-time scene reconstruction and triangle mesh generation using multiple RGB-D cameras

Abstract

We present a novel 3D reconstruction system that can generate a stable triangle mesh using data from multiple RGB-D sensors in real time for dynamic scenes. The first part of the system uses moving least squares (MLS) point set surfaces to smooth and filter the point clouds acquired from the RGB-D sensors. The second part of the system generates triangle meshes from the point clouds. The whole pipeline is executed on the GPU and is tailored to scale linearly with the size of the input data. Our contributions include changes to the MLS method that improve meshing, a fast triangle mesh generation method, and GPU implementations of all parts of the pipeline.
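
For readers unfamiliar with MLS point-set smoothing, the sketch below shows a single weighted-plane projection pass over a point cloud. The neighbourhood radius, Gaussian weighting, and plane (rather than polynomial) fit are illustrative assumptions; the paper's GPU pipeline and meshing stage are not reproduced here.

import numpy as np
from scipy.spatial import cKDTree

def mls_smooth(points, radius=0.02):
    # One MLS-style smoothing pass: project each point onto a plane fitted
    # to its neighbours with Gaussian weights.
    points = np.asarray(points, dtype=float)
    tree = cKDTree(points)
    out = points.copy()
    for i, p in enumerate(points):
        nb = points[tree.query_ball_point(p, radius)]
        w = np.exp(-np.sum((nb - p) ** 2, axis=1) / (radius ** 2))
        c = (w[:, None] * nb).sum(axis=0) / w.sum()            # weighted centroid
        cov = (w[:, None, None] *
               np.einsum('ni,nj->nij', nb - c, nb - c)).sum(axis=0)
        normal = np.linalg.eigh(cov)[1][:, 0]                  # smallest-eigenvalue direction
        out[i] = p - np.dot(p - c, normal) * normal            # project onto the plane
    return out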

Architecture for parallel marker-free variable length streams decoding

Abstract

Due to throughput requirements above 1 gigapixel/s for the real-time compression of modern image and video data streams, parallelism in encoding and decoding is inevitable. To achieve parallel decoding, a well-established technique is to insert markers into the variable length code (VLC) stream. By locating the markers, it is then possible to extract the sub-streams, which are in turn decoded in parallel. The use of markers adversely affects compression, especially when a high degree of parallelism is required. In this paper, we propose an architecture for marker-free parallel decoding of VLC streams. Instead of multiple local entropy decoders, the proposed architecture is based on a single parallel entropy decoder in conjunction with a novel format for constructing the VLC stream. The approach runs at high clock rates and supports parallelism across a high number of decoders. A synthesized clock frequency well above 110 MHz is achieved for up to 20 decoders on a medium-sized FPGA.
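
To make the trade-off concrete, the toy Python sketch below shows the conventional marker-based splitting that the proposed marker-free format avoids: every marker inserted to delimit a sub-stream costs bits, and the number of markers grows with the number of parallel decoders. The marker byte value is an arbitrary assumption, not taken from any particular codec.

def split_by_markers(stream: bytes, marker: bytes = b"\xff\xd7"):
    # Extract independently decodable sub-streams from a marker-delimited
    # VLC stream; each sub-stream can be handed to its own entropy decoder.
    subs = stream.split(marker)
    overhead_bits = (len(subs) - 1) * len(marker) * 8
    return subs, overhead_bits

# e.g. feeding 20 decoders needs 19 markers, whose bits are pure overhead.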

A GPU-based elastic shape registration approach in implicit spaces

Abstract

In this paper, we present a GPU-based implementation of an elastic shape registration approach in implicit spaces. Shapes are represented using signed distance functions, while deformations are modeled by cubic B-splines. In a variational framework, an incremental free-form deformation strategy is adopted to handle smooth deformations through a control lattice grid of adaptive size. The grid control points are estimated by a closed-form solution, which avoids gradient descent iterations. However, even this solution is very far from real time. We show in detail that such an algorithm is computationally expensive, with a time complexity of \({\mathbf O}(NCP_x\,NCP_y^2\,X^2Y^2)\), where \(NCP_x\) and \(NCP_y\) are the grid lattice resolution parameters in the shape domain of size \(X\times Y\). Moreover, the problem becomes even more time-consuming as the number of control points increases, because this requires executing the incremental algorithm several times. The closed-form solution was implemented using eight different GPU techniques. Our experimental results demonstrate speedups of more than \(150{\times}\) compared to the C implementation on a CPU.
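
As a small illustration of the implicit representation the approach builds on, the Python sketch below converts a binary shape mask into a signed distance function, here with the common convention of negative values inside and positive values outside. The B-spline deformation model and the closed-form control-point solution themselves are not shown.

import numpy as np
from scipy.ndimage import distance_transform_edt

def signed_distance(mask):
    # Signed distance representation of a binary shape: distance to the
    # boundary, negative inside the shape and positive outside.
    mask = np.asarray(mask, dtype=bool)
    inside = distance_transform_edt(mask)     # distance to background, inside only
    outside = distance_transform_edt(~mask)   # distance to the shape, outside only
    return outside - inside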

HD number plate localization and character segmentation on the Zynq heterogeneous SoC

Abstract

Automatic number plate recognition (ANPR) systems are widely used in safety, security, and commercial applications. A typical ANPR system consists of three main stages: number plate localization (NPL), character segmentation (CS), and optical character recognition (OCR). In recent years, high-definition (HD) cameras have started to be used to improve the recognition rate. However, most known techniques for standard definition (SD) images are not suitable for real-time HD image processing due to the computational cost of processing several times more image pixels, particularly in the NPL stage. In this paper, algorithms suitable for hardware implementation of the NPL and CS stages of an HD ANPR system are presented. Software implementation of the algorithms was carried out as a proof of concept, followed by hardware implementation on a heterogeneous system-on-chip (SoC) device that contains an ARM processor and a field-programmable gate array (FPGA). Heterogeneous implementation of these stages has shown that the HD NPL algorithm can localize a number plate in 16.17 ms, with a success rate of 98.0%. The CS algorithm can then segment the detected plate in 0.59 ms, with a success rate of 99.05%. Both stages utilize only 21% of the available on-chip configurable logic blocks.
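
For orientation, a generic software-only NPL step is sketched below in Python/OpenCV: plates are found as regions dense in vertical strokes, closed into bands, and filtered by aspect ratio. The kernel size, thresholds, and aspect-ratio limits are illustrative assumptions; this is not the hardware-oriented algorithm implemented on the Zynq SoC.

import cv2

def candidate_plate_regions(gray, min_aspect=2.0, max_aspect=6.0):
    # Plate candidates as bounding boxes of regions rich in vertical edges.
    edges = cv2.Sobel(gray, cv2.CV_8U, 1, 0, ksize=3)            # vertical strokes
    _, binary = cv2.threshold(edges, 0, 255,
                              cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (17, 3))
    closed = cv2.morphologyEx(binary, cv2.MORPH_CLOSE, kernel)   # merge strokes into bands
    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    boxes = []
    for c in contours:
        x, y, w, h = cv2.boundingRect(c)
        if h > 0 and min_aspect <= w / h <= max_aspect:
            boxes.append((x, y, w, h))
    return boxes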

Efficient velocity estimation for MAVs by fusing motion from two frontally parallel cameras

Abstract

Efficient velocity estimation is crucial for the robust operation of the navigation control loops of micro aerial vehicles (MAVs). Motivated by research on how animals exploit their visual topographies to perform rapid locomotion, we propose a bio-inspired method that applies a quasi-parallax technique to estimate the velocity of an MAV equipped with a forward-looking stereo camera, without GPS. Unlike available optical flow-based methods, our method achieves efficient metric velocity estimation without using any depth information, whether from additional distance sensors or from stereopsis. In particular, the quasi-parallax technique, which extracts maximal benefit from the configuration of two frontally parallel cameras, leverages pairs of parallel visual rays to eliminate rotational flow for translational velocity estimation, followed by iterative and alternating refinement of the rotational and translational velocity estimates. Our method fuses the motion information from the two frontally parallel cameras without performing correspondence matching, achieving enhanced robustness and efficiency. Extensive experiments on synthetic and real scenes demonstrate the effectiveness and efficiency of our method.
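
The key property exploited here, that the rotational part of image motion is depth-free while the translational part is not, can be checked with the classical rigid-motion flow model; the small Python example below differences the motion of two points seen along parallel visual rays so that rotation cancels. It uses a single unit-focal-length pinhole model and made-up numbers, so it only illustrates the geometric idea, not the paper's two-camera fusion or iterative refinement.

import numpy as np

def image_motion(point, t, omega):
    # Instantaneous image motion of a 3D point for camera translation t and
    # rotation omega (Longuet-Higgins/Prazdny model, unit focal length).
    X, Y, Z = point
    x, y = X / Z, Y / Z
    u = (t[2] * x - t[0]) / Z + (omega[0] * x * y - omega[1] * (1 + x ** 2) + omega[2] * y)
    v = (t[2] * y - t[1]) / Z + (omega[0] * (1 + y ** 2) - omega[1] * x * y - omega[2] * x)
    return np.array([u, v])

t = np.array([0.3, 0.0, 1.0])            # translational velocity
omega = np.array([0.02, -0.01, 0.05])    # rotational velocity
p_near = (0.2, 0.1, 2.0)                 # both points lie on parallel visual rays:
p_far = (0.4, 0.2, 4.0)                  # (x, y) = (0.1, 0.05), different depths
diff = image_motion(p_near, t, omega) - image_motion(p_far, t, omega)
# diff depends only on t and the two depths: the rotational terms have cancelled.
print(diff)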

Techniques of medical image processing and analysis accelerated by high-performance computing: a systematic literature review

Abstract

Techniques of medical image processing and analysis play a crucial role in many clinical scenarios, including diagnosis and treatment planning. However, the immense quantities of data involved and the high complexity of the algorithms often used make these techniques computationally demanding. As a result, there now exists a wide range of techniques of medical image processing and analysis that require high-performance computing solutions in order to reduce the required runtime. The main purpose of this review is to provide a comprehensive reference source of techniques of medical image processing and analysis that have been accelerated by high-performance computing solutions. With this in mind, the articles available in the Scopus and Web of Science electronic repositories were searched. Subsequently, the most relevant articles found were individually analyzed in order to identify: (a) the metrics used to evaluate computing performance, (b) the high-performance computing solution used, (c) the parallel design adopted, and (d) the task of medical image processing and analysis involved. The techniques found were then identified, reviewed, and discussed, particularly in terms of computational performance. Consequently, the techniques reviewed herein illustrate the progress made so far in reducing computational runtimes, as well as the difficulties and challenges that remain to be overcome.
