Point-voxel 3D object detection method based on transfer block Transformer and feature pyramid

Original title: 基于转移分块Transformer和特征金字塔的点-体素三维目标检测方法

DOI	10.12208/j.jer.20250007
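Journal	Journal of Engineering Research (工程学研究)
Year, Volume(Issue)	2025, 4(1)
Authors	Liu Liangjie (刘良杰)¹, Ren Hongqian (任宏乾)³, Sun Kai (孙凯)², Xie Guotao (谢国涛)⁴
Affiliations	¹Zhuzhou CRRC Times Software Technology Co., Ltd., Zhuzhou, Hunan; ²Xiwan Open-pit Coal Mine, CHN Energy Shaanxi Shenyan Coal Co., Ltd., Yulin, Shaanxi; ³College of Mechanical and Vehicle Engineering, Hunan University, Changsha, Hunan; ⁴Wuxi Intelligent Control Research Institute of Hunan University, Wuxi, Jiangsu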
Abstract	With advances in environmental perception technology, LiDAR-based 3D object detection has made significant progress. However, voxel-based 3D detectors struggle to capture rich contextual information and fine-grained features when partitioning point clouds; under occlusion and truncation in particular, detail from the raw point cloud is often lost. To address these challenges, this paper proposes a new data augmentation strategy that improves the model's handling of incomplete point clouds, and a point-voxel 3D object detection model, PV-FMRTNet, based on a transfer block Transformer and a feature pyramid, which effectively addresses the loss of position information when point clouds are converted to voxels. In addition, a new 2D feature encoding network is designed to improve the performance of the voxel-based 3D object detection system. Evaluation results show that the model reaches detection accuracies of 84.30%, 61.76% and 78.08% for cars, pedestrians and cyclists respectively, an average improvement of 2.08% over baseline models such as the mainstream PointPillars algorithm, demonstrating strong accuracy and robustness.
Keywords	Autonomous driving; Deep learning; 3D object detection; Feature pyramid; Point-voxel
Pages	46-58
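
The abstract attributes part of the accuracy gap of voxel-based detectors to position information lost when points are converted to voxels. The sketch below is a generic illustration of that effect, not the PV-FMRTNet implementation: the function names, the sample points, and the voxel size are all assumptions chosen for the example. It groups points by voxel index and measures how far each point sits from the centre of the voxel that replaces it.

```python
import math
from collections import defaultdict

def voxelize(points, voxel_size):
    """Group points by the integer index of the voxel that contains them."""
    voxels = defaultdict(list)
    for p in points:
        idx = tuple(math.floor(c / s) for c, s in zip(p, voxel_size))
        voxels[idx].append(p)
    return dict(voxels)

def voxel_center(idx, voxel_size):
    """Centre coordinate of the voxel with the given integer index."""
    return tuple((i + 0.5) * s for i, s in zip(idx, voxel_size))

def quantization_error(points, voxel_size):
    """Largest distance between any point and the centre of its voxel."""
    worst = 0.0
    for idx, members in voxelize(points, voxel_size).items():
        center = voxel_center(idx, voxel_size)
        for p in members:
            worst = max(worst, math.dist(p, center))
    return worst

# Hypothetical sample data: three points and a 1.0 x 1.0 x 0.5 voxel grid.
points = [(0.10, 0.10, 0.10), (0.90, 0.90, 0.30), (1.20, 0.40, 0.10)]
voxel_size = (1.0, 1.0, 0.5)

voxels = voxelize(points, voxel_size)
error = quantization_error(points, voxel_size)
half_diagonal = 0.5 * math.sqrt(sum(s * s for s in voxel_size))
```

Here the first two points fall into the same voxel even though they lie almost a full cell apart, so a representation that keeps only the voxel index can no longer tell them apart. The error is bounded by half the voxel diagonal, which is why coarser grids discard more detail and why point-voxel designs retain point-level coordinates alongside the voxel features.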

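Cite this article

Liu Liangjie, Ren Hongqian, Sun Kai, Xie Guotao. Point-voxel 3D object detection method based on transfer block Transformer and feature pyramid [J]. Journal of Engineering Research, 2025, 4(1): 46-58.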
