
HSU Repository > Graduate School > Department of Applied AI > 2. Thesis

Framework for Multi-view Object Tracking and Segmentation

= 다시점 영상에서 객체 추적 및 세그멘테이션을 위한 프레임워크


Type: Thesis

Abstract: Single-camera video limits object recognition and tracking performance in complex scenes because of its restricted field of view and occlusion. Multi-view video can mitigate these limitations, but because more images must be processed per frame, real-time object tracking and segmentation face high computational complexity and large data volumes. This thesis proposes efficient object tracking and segmentation frameworks for two representative types of multi-view video: plenoptic images and multi-camera images.

First, for plenoptic images, we restructure an existing 2D video-based object tracker to suit the plenoptic structure. We also introduce a focal plane image selection strategy that keeps only the essential focal plane images among the many available in each frame, and we parallelize the deep learning-based feature extraction module and the preprocessing stage across a multi-core CPU and GPU environment to maximize computational efficiency.

Second, for multi-camera images, we propose an efficient segmentation framework that dynamically combines a lightweight Video Multi-Object Segmenter, built with a low-rank projection matrix, and a lightweight Mask Refiner with the original models. Cosine similarity between consecutive frames is used to estimate how much the target objects move or change, and the degree of lightweighting applied to the images of the current frame is adjusted adaptively, which allows fine-grained model selection. In a multi-GPU environment, lightweight and original models run side by side, and the resulting execution time imbalance across GPUs can cause frame-level latency. To address this, the framework considers the hardware connectivity between the GPUs in the system at every frame and optimizes inter-GPU data transfers accordingly. This keeps the per-frame segmentation execution time balanced across GPUs and minimizes the average per-frame execution time of the whole system.

In experiments, the proposed plenoptic tracking framework reduced execution time by 81.7% compared with the existing approach, and the multi-camera segmentation framework reduced per-frame execution time by 34.3% while keeping the IoU drop caused by lightweight models within 2.86%.
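
The similarity-driven model selection described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the three level names, and the thresholds 0.99 and 0.95 are all assumptions chosen for illustration, and raw arrays stand in for whatever frame representation the framework actually compares.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two frames (or feature maps), flattened."""
    a = a.ravel().astype(np.float64)
    b = b.ravel().astype(np.float64)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return 0.0  # an all-zero input is treated as "no similarity"
    return float(np.dot(a, b) / denom)

# Hypothetical (threshold, level) pairs, checked from most to least similar.
# A higher similarity means less change, so a lighter model is tolerated.
LEVELS = [(0.99, "light"), (0.95, "medium")]

def select_level(prev: np.ndarray, curr: np.ndarray,
                 levels=LEVELS, fallback: str = "original") -> str:
    """Pick a model variant for one view from its frame-to-frame similarity."""
    sim = cosine_similarity(prev, curr)
    for threshold, name in levels:
        if sim >= threshold:
            return name
    return fallback  # large change: fall back to the original model

def select_levels_per_view(prev_views, curr_views):
    """Apply the selection independently to each camera view of a frame."""
    return [select_level(p, c) for p, c in zip(prev_views, curr_views)]
```

Because each view is scored independently, one frame can mix lightweight and original models across cameras, which is the situation that creates the GPU load imbalance the abstract goes on to address.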

【Keywords】Plenoptic, Thread pool, Multi-stream, Multi-view, Low-rank approximation, Adaptive GPU load redistribution