Title: | Promoting Usage of Deep Learning Object Detection in Ecology by Improving Performance and Accessibility |
Contributor(s): | Shepley, Andrew Jason (author) ; Falzon, Gregory (supervisor) ; Kwan, Paul (supervisor) |
Conferred Date: | 2021-10-06 |
Copyright Date: | 2021 |
Handle Link: | https://hdl.handle.net/1959.11/56752 |
Related DOI: | 10.1002/ece3.7344 10.48550/arXiv.2012.00257 10.3390/s21082611 |
Related Research Outputs: | https://hdl.handle.net/1959.11/56753 |
Abstract: | Artificially intelligent computer vision systems are becoming increasingly prevalent in an ever-expanding range of applications, providing greater automation in data-driven tasks, reducing resource expenditure, and enabling new insights to be gained.
Underpinning these systems are Deep Convolutional Neural Networks (DCNNs), which learn discriminative features present in data, allowing classification and object detection tasks to be performed. For object detection tasks, which are the focus of this thesis, DCNNs are usually trained by computer scientists on large numbers of domain-specific images, using site-specific and target object features to locate objects in images. Greedy Non-Maxima Suppression (NMS) is then used to return one optimal bounding box per object in a given image.
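As a rough illustration of the greedy NMS step described above, the following is a minimal sketch and not code from the thesis. The function names `iou` and `greedy_nms`, the (x1, y1, x2, y2) box format, and the default threshold of 0.5 are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of boxes kept by greedy NMS (threshold is an assumed default)."""
    # Visit candidates from highest to lowest confidence score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Discard every remaining box that overlaps the kept box too strongly.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


# Hypothetical detections: two overlapping boxes and one distant box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
```

In this example the second box overlaps the first with an IoU of about 0.68, so it is suppressed, while the distant third box is retained.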
Although the capabilities and benefits of DCNNs have heralded a new age of automation, access to and performance of these networks in small projects is often inadequate, inhibiting widespread adoption. Individuals who are not trained in computer science and do not have access to copious quantities of annotated images struggle to train robust object detectors, often resorting to time-consuming manual processing, or to resource-expensive collaboration with or employment of computer scientists. This is particularly true of ecological image processing tasks, which are characterized by large volumes of complex image data that must be classified and interpreted to enable effective ecological monitoring and management.
This thesis aims to facilitate broader adoption of DCNN-based computer vision in ecology and beyond, by improving object detection performance and bridging the technical gap between ecology and computer science. We identified three obstacles: poor domain adaptability caused by reliance on large numbers of similar camera trap images in training, inadequate performance of NMS in the task of bounding box retention and removal, and technical barriers to object detector development. Accordingly, a training protocol was developed that leverages high inter- and intra-class variability in training data to enable the development of robust object detectors using relatively small, publicly available image datasets with minimal ‘infusion’ of domain-specific images for optimisation. A novel non-Intersection over Union (IoU) alternative to NMS, dubbed Confluence, is proposed, which uses the normalised Manhattan Distance between confluent candidate bounding boxes to reach a better balance between retention of true positives and removal of false positives. These contributions were brought together in the development of an open-source desktop application dubbed U-Infuse, which allows those not trained in computer science to use the location invariance training protocol and Confluence to develop and use their own high-performance custom object detectors. Confluence was evaluated on standardised object detection benchmarks including MS COCO and PASCAL VOC, using multiple DCNN architectures, achieving state-of-the-art results and validating its use in replacing NMS in any object detection application. The proposed training protocol was extensively evaluated out-of-sample and in-sample on a range of challenging datasets, including Snapshot Serengeti, Wildlife Conservation Society datasets and Camera CATalogue, demonstrating its robustness in single-class and multi-class object detection for any species.
Finally, U-Infuse was evaluated in a real-life case study: the task of feral cat detection in camera trap data collected from the New England Gorges.
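The normalised Manhattan Distance that the abstract names as the basis of Confluence can be illustrated with a short sketch. The abstract does not state how coordinates are normalised; min-max scaling over the combined coordinates of the two boxes is assumed here, and the function name `normalised_manhattan` is hypothetical. The sketch computes only a pairwise distance between two candidate boxes and is not the full Confluence algorithm.

```python
def normalised_manhattan(a, b):
    """Manhattan distance between two boxes (x1, y1, x2, y2) after scaling.

    Assumption: x and y coordinates are min-max scaled to [0, 1] over the
    pair of boxes, so the result does not depend on absolute box size.
    """
    xs = [a[0], a[2], b[0], b[2]]
    ys = [a[1], a[3], b[1], b[3]]

    def scale(value, values):
        lo, hi = min(values), max(values)
        return (value - lo) / (hi - lo) if hi > lo else 0.0

    return (
        abs(scale(a[0], xs) - scale(b[0], xs))    # left edges
        + abs(scale(a[2], xs) - scale(b[2], xs))  # right edges
        + abs(scale(a[1], ys) - scale(b[1], ys))  # top edges
        + abs(scale(a[3], ys) - scale(b[3], ys))  # bottom edges
    )


# Hypothetical candidate boxes: a near-duplicate pair and a distant pair.
near = normalised_manhattan((0, 0, 10, 10), (1, 1, 11, 11))
far = normalised_manhattan((0, 0, 10, 10), (50, 50, 60, 60))
```

Under this assumed scaling, near-duplicate boxes around the same object give a small distance and well-separated boxes give a large one, which is the kind of signal a non-IoU method can use to decide which candidates belong together.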
This thesis has succeeded in its aims by providing a unique opportunity to advance the democratisation of artificially intelligent object detection: it delivers an open-source, freely available app that leverages the power of the location invariance training method and the optimal performance of Confluence, allowing non-computer scientists to develop and deploy their own object detectors using their own data, on their own devices. Furthermore, the findings of this research indicate that broad adoption of Confluence would have extensive benefits in applications ranging from autonomous vehicles to aerial surveying and crowd counting.
Publication Type: | Thesis Doctoral |
Fields of Research (FoR) 2008: | 070702 Veterinary Anatomy and Physiology 080104 Computer Vision 080108 Neural, Evolutionary and Fuzzy Computation |
Socio-Economic Objective (SEO) 2008: | 830301 Beef Cattle 830311 Sheep - Wool 890299 Computer Software and Services not elsewhere classified |
HERDC Category Description: | T2 Thesis - Doctorate by Research |
Description: | Please contact rune@une.edu.au if you require access to this thesis for the purpose of research or study.
Appears in Collections: | School of Science and Technology Thesis Doctoral |