Abstract
Object retrieval and classification in point cloud data are challenged by noise, irregular sampling density, and occlusion. To address these challenges, we propose a point pair descriptor that is robust to noise and occlusion and achieves high retrieval accuracy. We further show how the proposed descriptor can be used in a 4D convolutional neural network for the task of object classification. We propose a novel 4D convolutional layer that is able to learn class-specific clusters in the descriptor histograms. Finally, we provide experimental validation on three benchmark datasets, confirming the superiority of the proposed approach.
Paper preview
For the full text of the paper, see the IEEE version. A preprint is available on arXiv.
The first version of the code can be found in the code folder.
Main contributions
- We present a novel 4D convolutional neural network architecture that takes a 4D descriptor as input and outperforms existing deep learning approaches on realistic point cloud datasets.
- We design a handcrafted 4D descriptor based on point pair functions that is highly robust on realistic, noisy point cloud data (an illustrative sketch follows below).
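As a rough illustration of the point pair idea, the sketch below computes the four classic surflet-pair-relation values of Wahl et al. for random oriented point pairs and accumulates them into a normalized 4D histogram. This is only a minimal sketch of the classic variant: the exact point pair functions, sampling strategy, and binning of the proposed EPPF descriptor differ and are detailed in the paper.

```python
import numpy as np

def point_pair_features(p1, n1, p2, n2):
    # Classic surflet-pair values: three angle cosines plus the pair distance.
    d = p2 - p1
    dist = np.linalg.norm(d) + 1e-12
    d = d / dist
    return np.array([np.dot(n1, d),   # angle between n1 and the pair direction
                     np.dot(n2, d),   # angle between n2 and the pair direction
                     np.dot(n1, n2),  # angle between the two normals
                     dist])

def pair_histogram(points, normals, num_pairs=10000, bins=(5, 5, 5, 5)):
    # Accumulate random point pairs into a normalized 4D histogram,
    # i.e. the kind of 4D input a descriptor-based network consumes.
    rng = np.random.default_rng(0)
    i, j = rng.integers(len(points), size=(2, num_pairs))
    keep = i != j
    feats = np.array([point_pair_features(points[a], normals[a],
                                          points[b], normals[b])
                      for a, b in zip(i[keep], j[keep])])
    ranges = [(-1, 1), (-1, 1), (-1, 1), (0, feats[:, 3].max())]
    hist, _ = np.histogramdd(feats, bins=bins, range=ranges)
    return hist / hist.sum()
```

Calling `pair_histogram(points, normals)` on an `(N, 3)` point array with unit normals yields a 4D histogram of the kind sketched above; the actual EPPF descriptor used in the paper is defined in the full text.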
Overview of the pipeline
Fig. Overview of the proposed object classification pipeline, which combines a novel handcrafted descriptor with a 4D convolutional neural network (CNN). Here, FC denotes a fully connected layer.
Fig. Architecture of the proposed 4D neural network.
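Common deep learning frameworks ship no native 4D convolution, so a 4D layer has to be assembled from lower-dimensional primitives. The following PyTorch sketch shows one possible realization (an assumption for illustration, not the authors' implementation; see the code folder for that): it composes 3D convolutions along the fourth histogram axis.

```python
import torch
import torch.nn as nn

class Conv4d(nn.Module):
    # Valid (no padding) 4D convolution assembled from one 3D convolution
    # per position of the kernel along the fourth spatial axis.
    # Input: (N, C_in, D1, D2, D3, D4) -> output: (N, C_out, D1-k+1, ..., D4-k+1).
    def __init__(self, in_channels, out_channels, k):
        super().__init__()
        self.k = k
        self.convs = nn.ModuleList(
            nn.Conv3d(in_channels, out_channels, k, bias=(i == 0))
            for i in range(k))  # only one bias term overall

    def forward(self, x):
        d4 = x.shape[-1]
        slices = []
        for j in range(d4 - self.k + 1):
            # Sum the k 3D convolutions covering positions j .. j+k-1
            # of the fourth axis.
            slices.append(sum(self.convs[i](x[..., j + i])
                              for i in range(self.k)))
        return torch.stack(slices, dim=-1)
```

For example, `Conv4d(1, 8, 3)(torch.randn(1, 1, 10, 10, 10, 10))` yields a tensor of shape `(1, 8, 8, 8, 8, 8)`.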
Results
TABLE I. Retrieval performance of the handcrafted descriptors. Mean values are shown; the corresponding standard deviations are reported in brackets in the paper. Best performance is shown in bold.
| Dataset | Metric | OUR-CVFH | ESF | Wahl | EPPF Short | EPPF |
|---|---|---|---|---|---|---|
| Stanford | Total accuracy (%) | 62.79 | 71.34 | 75.13 | 77.26 | **80.18** |
| | Mean accuracy (%) | 42.91 | 54.54 | 57.00 | 60.53 | **64.01** |
| | Mean recall (%) | 49.90 | 52.28 | 57.45 | 60.16 | **64.58** |
| | F1-score | 0.437 | 0.530 | 0.567 | 0.601 | **0.640** |
| ScanNet | Total accuracy (%) | 56.23 | 53.41 | 63.72 | 63.49 | **65.29** |
| | Mean accuracy (%) | 39.83 | 33.69 | **45.40** | 42.02 | 44.95 |
| | Mean recall (%) | 38.21 | 32.72 | 45.94 | 45.17 | **47.54** |
| | F1-score | 0.382 | 0.327 | 0.444 | 0.430 | **0.457** |
| M40 | Total accuracy (%) | 53.22 | 65.87 | **74.41** | 73.00 | 73.68 |
| | Mean accuracy (%) | 46.43 | 58.91 | **67.50** | 65.79 | 66.43 |
| | Mean recall (%) | 49.26 | 59.96 | **70.33** | 69.12 | 69.79 |
| | F1-score | 0.465 | 0.588 | **0.680** | 0.666 | 0.671 |
Table II. Classification performance of deep learning approaches using 2D, 3D and 4D convolutional layers.
| Dataset | Metric | PointNet | EPPF 2D | EPPF 3D | EPPF 4D |
|---|---|---|---|---|---|
| Stanford | Total accuracy (%) | 64.30 | 82.01 | 81.94 | 83.22 |
| | Mean accuracy (%) | 42.48 | 64.26 | 66.37 | 65.11 |
| | Mean recall (%) | 40.47 | 70.88 | 60.94 | 72.13 |
| | F1-score | 0.395 | 0.652 | 0.665 | 0.672 |
| ScanNet | Total accuracy (%) | 63.04 | 70.39 | 70.57 | 72.10 |
| | Mean accuracy (%) | 37.50 | 38.98 | 44.35 | 45.70 |
| | Mean recall (%) | 19.53 | 63.52 | 54.53 | 56.58 |
| | F1-score | 0.209 | 0.433 | 0.472 | 0.488 |
| M40 | Total accuracy (%) | 87.01 | 81.64 | 81.15 | 82.13 |
| | Mean accuracy (%) | 82.08 | 76.37 | 75.87 | 77.05 |
| | Mean recall (%) | 83.48 | 77.30 | 77.51 | 76.99 |
| | F1-score | 0.824 | 0.765 | 0.762 | 0.769 |
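For completeness, the sketch below shows one plausible way to compute the four metrics reported in Tables I and II from a confusion matrix. The definitions assumed here (total accuracy as the overall fraction of correct predictions, mean accuracy and mean recall as per-class precision and recall averaged over classes, and F1-score as the mean of per-class F1 values) are common in the retrieval literature, but the paper's exact evaluation protocol should be taken from the full text.

```python
import numpy as np

def evaluation_metrics(conf):
    # conf[t, p] counts samples of true class t predicted as class p.
    # Assumed definitions; they may differ from the paper's protocol.
    conf = np.asarray(conf, dtype=float)
    eps = 1e-12
    total_accuracy = np.trace(conf) / conf.sum()
    precision = np.diag(conf) / np.maximum(conf.sum(axis=0), eps)
    recall = np.diag(conf) / np.maximum(conf.sum(axis=1), eps)
    f1_per_class = 2 * precision * recall / np.maximum(precision + recall, eps)
    return {"total_accuracy": total_accuracy,
            "mean_accuracy": precision.mean(),
            "mean_recall": recall.mean(),
            "f1_score": f1_per_class.mean()}
```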
Fig. Descriptor and 4D neural network responses for a "table" object in the ScanNet dataset. Left: descriptor values. Middle: response of the first filter in the first layer. Right: filter response in the second layer. The rows show slices along the fourth dimension. Transparent bins correspond to constant offset values of the response (or to 0 for the descriptor values); colored bins correspond to varying values, with low values shown in blue and high values in red.
References
- C. R. Qi, H. Su, K. Mo, and L. J. Guibas, "PointNet: Deep learning on point sets for 3D classification and segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
- E. Wahl, U. Hillenbrand, and G. Hirzinger, "Surflet-pair-relation histograms: A statistical 3D-shape representation for rapid classification," in Proceedings of the IEEE International Conference on 3-D Digital Imaging and Modeling (3DIM), 2003, pp. 474–481.
Contact
For any questions or inquiries, please contact Dmytro Bobkov with the subject “Object Descriptor RAL”.
Last updated 24.04.2018