OAK National Repository

HSU Repository > Graduate School > Department of IT Convergence Engineering > 1. Thesis

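A Study on Optimized Deep Neural Network Computation Techniques for Cloud-Independent Edge Devices (클라우드 비 의존적인 엣지 디바이스 내 최적화된 심층 신경망 연산 기법 연구)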

Abstract: Much research has been conducted on optimizing the computation of deep neural network (DNN) models on edge devices. In autonomous driving in particular, network latency and security concerns make it preferable to run DNN models on the device's own embedded system rather than on cloud servers. The number of DNN models used in autonomous driving also keeps growing in order to process large volumes of raw sensor data. In this thesis, we analyze DNN computation from two perspectives and propose an optimization technique for each. First, we apply a neural architecture search (NAS) approach to find DNN models optimized for edge devices. Building on MnasNet, a NAS system for mobile environments, we focus on its multi-objective reward function, which optimizes for both accuracy and execution time. The goal of MnasNet is to find the DNN model with the highest accuracy that satisfies an inference execution-time constraint. Based on this, we propose a new reward function that finds DNN models with maximum accuracy while ensuring that inference completes within the given time limit.
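For context, the multi-objective reward that MnasNet optimizes can be sketched as follows. This is a minimal illustration of the published MnasNet objective, ACC(m) × (LAT(m)/T)^w, where w = α when the latency is within the target T and w = β otherwise. The function and parameter names are chosen here for illustration. The new reward function proposed in the thesis is not specified in this abstract, so it is not reproduced.

```python
def mnasnet_reward(accuracy, latency_ms, target_ms, alpha=-0.07, beta=-0.07):
    """MnasNet-style reward: accuracy scaled by a latency penalty factor.

    alpha applies when the model meets the latency target, beta when it
    exceeds it. alpha = beta = -0.07 gives a soft constraint; alpha = 0,
    beta = -1 approximates a hard constraint by sharply penalizing any
    model that is slower than the target.
    """
    if latency_ms <= 0 or target_ms <= 0:
        raise ValueError("latency_ms and target_ms must be positive")
    exponent = alpha if latency_ms <= target_ms else beta
    return accuracy * (latency_ms / target_ms) ** exponent


if __name__ == "__main__":
    # Same accuracy, three latencies around an 80 ms target (soft constraint).
    for latency in (40.0, 80.0, 160.0):
        print(latency, mnasnet_reward(0.75, latency, target_ms=80.0))
```

With the soft-constraint setting, a model that is faster than the target earns a reward slightly above its raw accuracy, and a slower model earns slightly less, so the search trades accuracy against latency rather than rejecting slow models outright.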
Second, we propose a CPU-GPU Co-Scheduling framework that uses the CPU as well as the GPU when multiple DNNs are executed concurrently. To develop the framework, we analyze the overhead incurred when multiple DNNs run in a multi-context environment and in a single-context environment. The framework then uses otherwise idle CPU cycles for DNN computation to reduce the computational load on the GPU. For seamless communication between the CPU and the GPU, we propose a low-overhead data synchronization method. Finally, we build a layer execution-time prediction model to select the appropriate core each time a DNN layer is issued for execution. With these solutions applied, multi-DNN execution improves significantly, and execution time is reduced by up to 46.6% compared with a GPU-only solution.
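The idea of selecting a core per layer from predicted execution times can be illustrated with a small sketch. Everything here is hypothetical: the abstract does not describe the framework's actual scheduling policy, prediction model, or synchronization mechanism, so the `Core` class, the `pick_core` function, the greedy earliest-finish rule, and the example timings are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Core:
    """Hypothetical processing unit with the time at which it becomes free."""
    name: str
    busy_until_ms: float = 0.0


def pick_core(cores, predicted_ms, now_ms=0.0, sync_overhead_ms=None):
    """Assign one layer to the core with the earliest predicted finish time.

    predicted_ms maps a core name to the layer's predicted execution time on
    that core. sync_overhead_ms optionally maps a core name to an extra data
    synchronization cost (assumed to be zero when omitted).
    """
    sync = sync_overhead_ms or {}

    def finish_time(core):
        start = max(now_ms, core.busy_until_ms)
        return start + predicted_ms[core.name] + sync.get(core.name, 0.0)

    best = min(cores, key=finish_time)
    best.busy_until_ms = finish_time(best)  # the core is now reserved until then
    return best


if __name__ == "__main__":
    cores = [Core("cpu"), Core("gpu")]
    # Five identical layers, each predicted at 7 ms on the CPU and 2 ms on the GPU.
    for i in range(5):
        chosen = pick_core(cores, {"cpu": 7.0, "gpu": 2.0})
        print(f"layer {i} -> {chosen.name}")
```

In this toy run the GPU takes the first three layers, and the fourth goes to the CPU because the GPU queue would finish it later (8 ms versus 7 ms). This reflects the abstract's point that otherwise idle CPU cycles can relieve a loaded GPU.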