“How many images do I need?” Understanding how sample size per class affects deep learning model performance metrics for balanced designs in autonomous wildlife monitoring

Shahinfar, Saleh; Meek, Paul; Falzon, Gregory

Please use this identifier to cite or link to this item: https://hdl.handle.net/1959.11/58355

Title:	“How many images do I need?” Understanding how sample size per class affects deep learning model performance metrics for balanced designs in autonomous wildlife monitoring
Contributor(s):	Shahinfar, Saleh (author); Meek, Paul (author); Falzon, Gregory (author)
Publication Date:	2020
DOI:	10.1016/j.ecoinf.2020.101085
Handle Link:	https://hdl.handle.net/1959.11/58355
Abstract:	Deep learning (DL) algorithms are the state of the art in automated classification of wildlife camera trap images. The challenge is that ecologists cannot know in advance how many images per species they need to collect for model training in order to achieve their desired classification accuracy. In fact, there is limited empirical evidence in the context of camera trapping to demonstrate that increasing sample size will lead to improved accuracy. In this study we explore in depth the issues of deep learning model performance for progressively increasing per-class (species) sample sizes. We also provide ecologists with an approximation formula to estimate a priori how many images per animal species they need for a certain accuracy level. This will help ecologists with optimal allocation of resources and work, and with efficient study design. In order to investigate the effect of the number of training images, seven training sets with 10, 20, 50, 150, 500, 1000 images per class were designed. Six deep learning architectures, namely ResNet-18, ResNet-50, ResNet-152, DenseNet-121, DenseNet-161, and DenseNet-201, were trained and tested on a common exclusive testing set of 250 images per class. The whole experiment was repeated on three similar datasets from Australia, Africa and North America and the results were compared. Simple regression equations for use by practitioners to approximate model performance metrics are provided. Generalised additive models (GAM) are shown to be effective in modelling DL performance metrics based on the number of training images per class, tuning scheme and dataset. Overall, our trained models classified images with 0.94 accuracy (ACC), 0.73 precision (PRC), 0.72 true positive rate (TPR), and 0.03 false positive rate (FPR). Variation in model performance metrics among datasets, species and deep learning architectures exists and is shown distinctively in the discussion section.
The ordinary least squares regression models explained 57%, 54%, 52%, and 34% of expected variation of ACC, PRC, TPR, and FPR according to the number of images available for training. Generalised additive models explained 77%, 69%, 70%, and 53% of deviance for ACC, PRC, TPR, and FPR respectively. Predictive models were developed linking the number of training images per class, model, and dataset to performance metrics. The ordinary least squares regression and generalised additive models developed provide a practical toolbox to estimate model performance with respect to different numbers of training images.
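As a rough illustration of the kind of estimate the abstract describes, the sketch below fits an ordinary least squares line to accuracy against the logarithm of the number of training images per class, then inverts the line to approximate the images needed for a target accuracy. The log-linear form, the function names, and the accuracy values are all assumptions made for this example; the abstract gives neither the fitted equations nor per-size results, so none of the numbers below come from the paper.

```python
import numpy as np

# Training set sizes per class as listed in the abstract.
n_images = np.array([10, 20, 50, 150, 500, 1000], dtype=float)

# HYPOTHETICAL accuracies, invented purely for illustration.
# These are NOT results reported by the study.
accuracy = np.array([0.80, 0.83, 0.87, 0.91, 0.95, 0.97])


def fit_log_linear(n, y):
    """Ordinary least squares fit of y = a + b * log10(n).

    The log10 transform is an assumption of this sketch, not
    necessarily the functional form used by the authors.
    """
    X = np.column_stack([np.ones_like(n), np.log10(n)])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]


def images_needed(target, a, b):
    """Invert the fitted line: images per class giving the target value."""
    return 10 ** ((target - a) / b)


a, b = fit_log_linear(n_images, accuracy)
estimate = images_needed(0.90, a, b)
print(f"intercept={a:.3f}, slope={b:.3f}, images for 0.90 accuracy ~ {estimate:.0f}")
```

A practitioner would replace the invented accuracy values with metrics measured on their own data, and would need to check that a straight line in log space is an adequate fit before trusting the inverted estimate.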
Publication Type:	Journal Article
Source of Publication:	Ecological Informatics, v.57, p. 1-16
Publisher:	Elsevier BV
Place of Publication:	The Netherlands
ISSN:	1878-0512 1574-9541
Fields of Research (FoR) 2020:	3003 Animal production not elsewhere classified
Socio-Economic Objective (SEO) 2020:	tbd
Peer Reviewed:	Yes
HERDC Category Description:	C1 Refereed Article in a Scholarly Journal
Appears in Collections:	Journal Article; School of Environmental and Rural Science; School of Science and Technology

SCOPUS Citations:	101 (checked on Jul 6, 2024)
