Introduction

Hunters and conservationists alike delight in estimating the age and health of wildlife. Since capturing and tagging animals in the wild can prove difficult, tracking the wellness of species and individuals is typically performed by analyzing trail camera (“trail cam”) photographs of the wildlife. From the images, the age, health, and overall wellbeing of different species can be discerned. The goal of this project is to develop a machine learning (ML) algorithm to accurately predict the age of male whitetail deer (“bucks”) based on trail camera images.

Aspects of Aging

Whitetail buck ages are always estimated in half years (ex. 1.5 years, 2.5 years, etc.) due to the species' life cycle. Deer are typically born in the spring and harvested in the fall (September through January), roughly halfway through their current age year. Additionally, bucks shed their antlers near the end of deer hunting season, between January and March. For this reason, estimating buck age during the winter is considerably more difficult.

Newborns (fawns) are the easiest to identify — they are skinny, awkward-looking, and have white spots on their sides for the first 3-4 months of their lives. Needless to say, many hunters are interested in aging deer beyond the newborn stage, so most images will feature deer at 1.5 years or older. Considering the average lifetime of a deer in the wild ranges from three to six years, we expect the ages of the deer in our image database to lie in the set of values 1.5, 2.5, 3.5, 4.5, or 5.5 years. In turn, the job of our ML algorithm becomes very straightforward: classify each image into one of these discrete age values.

In many of the articles we collect our data from, the authors provide some insight into how a buck's body changes throughout its lifetime; although not directly applicable to the ML model, their insights are helpful in building intuition about which features or patterns the model will likely learn. For instance, much like humans, each deer's body grows and changes with age, and beyond peak maturity, the buck's body may actually decrease in stature. Other common body features are compared below for young and mature bucks.

Feature | Young | Mature
Hind quarter width | Thin | Wide
Belly | Above the brisket | Below the brisket
Muscular definition | Little definition | Significant definition
Antler spread | Width of the ears | Wider than the ears
Tine length | Short | Long
Relative leg length | Long | Short
Neck width | Thin | Wide
Table 1. Feature comparison of young male deer versus mature male deer.

Understanding the data

Image sources

There are thousands of trail camera images taken by hunters across the United States each year, and many of those hunters want to know how old the deer in their photos are. Sadly, very few of these photographs are seen by deer aging professionals; as such, it is left to the hunter to predict the deer's age. Of the images that are seen by professionals, only a very small fraction are published online or in print for other hunters and enthusiasts to see.

While the abundance of trail cam imagery may seem like a good thing for our ML model, we still need a validated age for each image; since these are in short supply, our database suffers. To alleviate our data drought, images used in this project are taken from a wide variety of sources, including the National Deer Association (NDA), Field & Stream (F&S), state agencies, universities, and other conservation resources. If we chose to use images from a single source (ex. NDA's "Age This!" competition), we could ensure consistency across the panel of experts. Instead, we've chosen to use multiple sources; not only does this allow us to grow our database faster, it also allows a wider swath of professionals to weigh in on the variables within an image that indicate the deer's age, resulting in an overall more robust algorithm.

When a whitetail buck's image is accompanied by a validated age estimate, the deer's predicted age and other information (geographic location, date/time of the image capture, etc.) are stored in the image's metadata and ultimately used to create the truth labels for the ML model.

Image Standardization

The images captured by trail cameras can differ wildly; in bright lighting conditions, trail cameras typically produce color imagery, while dim conditions produce grayscale images. Furthermore, not all trail cameras are created equal — their sensors and optics result in different aspect ratios, pixel resolutions, memory capacities, motion sensitivities, and other characteristics. Additionally, some of our data is pulled directly from websites or PDFs, which introduce adjustments to an image's size and color.

As with any ML problem, we begin by cleaning our dataset. In our case, each image is cropped to remove extraneous information (ex. background clutter) and to maximize the amount of space taken up by the deer. Each image is then proportionally resized and cropped to fit inside a pre-determined square.

Figure 1. Pre-conditioning the images
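To make that pre-conditioning step concrete, here is a minimal sketch using Pillow. The bounding box around the deer is a hypothetical input (in practice it would come from a hand-drawn crop or a detector), and the "resize shorter edge, then center-crop" recipe is an assumption about how the square is produced; the project's exact code may differ.

```python
from PIL import Image

TARGET = 288  # square edge length used throughout the pipeline

def precondition(path, bbox, target=TARGET):
    """Crop a trail-cam photo to the deer, then fit it into a target x target square."""
    img = Image.open(path).crop(bbox)   # bbox = (left, upper, right, lower) around the deer
    # Proportional resize so the shorter edge matches the target size...
    scale = target / min(img.size)
    img = img.resize((round(img.width * scale), round(img.height * scale)))
    # ...then center-crop the longer edge down to the square.
    left = (img.width - target) // 2
    top = (img.height - target) // 2
    return img.crop((left, top, left + target, top + target))
```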

Once shaped, each image is stored under a multi-part filename of the form XXXXXX_ZZZZZZ_SS_NpN_P, where XXXXXX and ZZZZZZ denote the date the image was collected and the date the image was originally taken, respectively; both dates use the format YYMMDD, where Y, M, and D stand for the year, month, and day (ex. 250331 for March 31, 2025). SS denotes the state the image was taken in (ex. "KY" for Kentucky), and NpN stands for the age of the deer (ex. 3p5 stands for 3.5 years). Lastly, P represents the provider name (ex. "RLT" for Realtree, "NDA" for National Deer Association, etc.).
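As an illustration of how these fields get recovered downstream, a small parsing sketch is shown below; the example filename is hypothetical and the real pipeline may read some of this from stored metadata rather than the filename alone.

```python
import os
from datetime import datetime

def parse_filename(filename):
    """Split a name like '250331_240817_KY_3p5_NDA.jpg' into its metadata fields."""
    stem = os.path.splitext(os.path.basename(filename))[0]
    collected, taken, state, age, provider = stem.split("_")
    return {
        "date_collected": datetime.strptime(collected, "%y%m%d").date(),
        "date_taken": datetime.strptime(taken, "%y%m%d").date(),
        "state": state,
        "age": float(age.replace("p", ".")),  # '3p5' -> 3.5
        "provider": provider,
    }

# Hypothetical example filename
print(parse_filename("250331_240817_KY_3p5_NDA.jpg"))
```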

Ingesting data

Images

We use glob to identify image files within our data folder, and matplotlib to read in each image. Each image is converted to grayscale, normalized, and stacked in a 3D array. At the same time, each deer’s age is extracted from the respective filename, creating our supervised learning labels. Querying the size of our data produces:

241 images found
Sample size: (241, 288, 288)
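A minimal sketch of that ingestion step might look like the following; the folder name, file extension, and grayscale conversion by channel averaging are assumptions rather than the project's exact code.

```python
import glob
import os
import numpy as np
import matplotlib.pyplot as plt

images, ages = [], []
for path in sorted(glob.glob("data/*.jpg")):      # hypothetical folder and extension
    img = plt.imread(path)                        # H x W x 3 (or H x W) pixel array
    if img.ndim == 3:
        img = img.mean(axis=2)                    # collapse RGB to grayscale
    img = img.astype("float32") / img.max()       # normalize to [0, 1]
    images.append(img)
    # The age field sits fourth in XXXXXX_ZZZZZZ_SS_NpN_P, e.g. '3p5' -> 3.5
    stem = os.path.splitext(os.path.basename(path))[0]
    ages.append(float(stem.split("_")[3].replace("p", ".")))

X = np.stack(images)   # shape (n_samples, 288, 288)
y = np.array(ages)
print(f"{len(images)} images found")
print(f"Sample size: {X.shape}")
```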

Labels

Based on the way a deer's age is estimated, the output of our ML model will be discrete — that is, we're asking our model to guess which age category a given picture belongs to. In ML terms, this means we're trying to solve a classification problem, which determines the type of approach we take and the algorithms we use.

As a visualization exercise, imagine we have a stack of physical images, one for each deer. We're holding the stack of images in our hands while we stand in front of five buckets. Our task in this scenario is to look at each image and place it in the correct bucket. The kicker here is that we're allowed to know some of the answers in advance: we can look through the first 80% of the images and see how old each deer is from the age written on the back of the image. The last 20% of the dataset lacks the age information because the age has been smudged out.

Based on what we know of the deer in the images, we label each bucket 0 through 4, knowing that each will represent a different age range. For instance, bucket "0" will hold the images of 1.5 year old deer, bucket "1" will hold pictures of the 2.5 year old deer, and so on.

Figure 2. Imagining the computer vision problem.

In machine learning, the process of representing one value (ex. 1.5 years) by another value (ex. "0") is accomplished by mapping our data labels to integers. In our particular problem, there's a catch. Although whitetail deer have been known to live as long as 22 years, many deer experts simply list a buck as "mature" once the deer reaches or exceeds an age of 5.5 years. This means that a buck aged 8.5 years will likely be judged by experts to be "5.5 years", "mature", or simply "old". We only know a deer has reached an age beyond 5.5 years from an assessment of its teeth post mortem.

This can be confusing, not only because it ensures our age distribution will be non-Gaussian, but also because the deer's body continues to change over time. For this reason, we group all mature bucks 5.5 years or older into the "5.5+" category and sanity check this by returning a list of the converted ages. This also sharpens the task we're asking our model to solve: predict each deer's age class between 1.5 and 5.5+ years. After grouping all bucks 5.5 years or older into the same category, we get the following data distributions.

Merged these ages into the 'mature' (5.5+) class: [np.float64(5.5), np.float64(12.5), np.float64(6.5), np.float64(7.5), np.float64(8.5)]
New label mapping: {np.float64(1.5): 0, np.float64(2.5): 1, np.float64(3.5): 2, np.float64(4.5): 3, np.float64(5.5): 4}

Class distribution after first split:
Label 0 (1.5): 29 samples
Label 1 (2.5): 41 samples
Label 2 (3.5): 44 samples
Label 3 (4.5): 29 samples
Label 4 (5.5): 49 samples

Training set class distribution (after both splits):
Label 0 (1.5): 23 samples
Label 1 (2.5): 33 samples
Label 2 (3.5): 35 samples
Label 3 (4.5): 23 samples
Label 4 (5.5): 39 samples

Validation set class distribution:
Label 0 (1.5): 6 samples
Label 1 (2.5): 8 samples
Label 2 (3.5): 9 samples
Label 3 (4.5): 6 samples
Label 4 (5.5): 10 samples
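The sketch below mirrors those steps under a few assumptions: an 80/20 first split, a further 80/20 train/validation split, and scikit-learn's stratified splitting. The exact fractions and random seeds used in the project may differ.

```python
import numpy as np
from sklearn.model_selection import train_test_split

MATURE = 5.5

# Collapse every age at or beyond maturity into the single 5.5+ class.
print(f"Merged these ages into the 'mature' (5.5+) class: {sorted(set(y[y >= MATURE]))}")
y_capped = np.minimum(y, MATURE)

# Map the remaining half-year ages onto integer class labels 0-4.
label_map = {age: i for i, age in enumerate(np.unique(y_capped))}
print(f"New label mapping: {label_map}")
labels = np.array([label_map[a] for a in y_capped])

# The first split holds out a test set; the second carves a validation set out
# of what remains. stratify preserves each class's share of the data.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, labels, test_size=0.2, stratify=labels, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.2, stratify=y_trainval, random_state=0)
```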

Data Augmentation

Machine learning models typically deal with tens of thousands of datapoints, and our current dataset limps in at 241. Although we strive to find more data in an ever-dwindling supply, we can also augment our data via Keras' ImageDataGenerator. Based on a handful of parameters, ImageDataGenerator applies well-known transformations to each image, resulting in a new image of the same deer. Transformations like rotations, horizontal flips, zooms, shifts, and brightness adjustments preserve the deer's relative proportions (ex. leg length to body thickness, snout length, etc.). In doing so, we buy ourselves extra data that we never had to spend time collecting (although we continue to collect more data on the side anyway). In this study, we initially set a multiplier of 30, meaning each image has the capacity to produce 30 total images, ultimately boosting our 241-image dataset to 7,230 images.
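A sketch of that augmentation setup is shown below. The specific parameter values are illustrative assumptions rather than the settings actually used, and brightness shifts (available via the generator's brightness_range option) are omitted here for simplicity.

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

AUG_PER_IMAGE = 30   # multiplier from the text: 241 images -> 7,230 images

datagen = ImageDataGenerator(
    rotation_range=15,        # small rotations (degrees)
    width_shift_range=0.1,    # horizontal shift as a fraction of width
    height_shift_range=0.1,   # vertical shift as a fraction of height
    zoom_range=0.1,
    horizontal_flip=True,
    fill_mode="nearest",
)

# The generator expects a channel axis, so the grayscale stack (n, 288, 288)
# becomes (n, 288, 288, 1) before transforming.
aug_images, aug_labels = [], []
for x, label in zip(X_train[..., np.newaxis], y_train):
    aug_images.append(x)                  # the original counts as one of the 30
    aug_labels.append(label)
    for _ in range(AUG_PER_IMAGE - 1):
        aug_images.append(datagen.random_transform(x))
        aug_labels.append(label)

X_train_aug = np.stack(aug_images)
y_train_aug = np.array(aug_labels)
```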

By augmenting each class separately, we can homogenize our data to ensure each age class within our training set contains the same number of images. The bar chart below shows the number of samples in each age class pre- and post-augmentation.

Following our split into training, validation, and test sets, we begin to compare the "canned" classifiers. As shown in the figure below, six classifiers were compared: K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Random Forest, Logistic Regression, Gradient Boosting, and Decision Tree.
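A comparison along these lines can be set up in a few lines of scikit-learn. The sketch below uses default hyperparameters and flattened pixel vectors as features, both of which are assumptions about the baseline rather than the project's exact configuration.

```python
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Classical models expect flat feature vectors, so each 288 x 288 image
# becomes one row of 82,944 pixel values.
X_train_flat = X_train.reshape(len(X_train), -1)
X_val_flat = X_val.reshape(len(X_val), -1)

models = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
}
for name, model in models.items():
    model.fit(X_train_flat, y_train)
    print(f"{name}: {model.score(X_val_flat, y_val):.3f}")
```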

Of these, KNN performed significantly better with an accuracy of ~32.6%. While this may not seem like a high accuracy, randomly guessing an answer would give you an accuracy of 20%. Furthermore, we haven’t begun tuning our model’s hyperparameters yet! Let’s take a look at that using the following set of parameters as our baseline.
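The grid below is a hypothetical stand-in for that baseline parameter set (the project's actual values aren't reproduced here); it simply illustrates what tuning KNN with a cross-validated grid search looks like.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical baseline grid over common KNN hyperparameters.
param_grid = {
    "n_neighbors": [3, 5, 7, 9, 11],
    "weights": ["uniform", "distance"],
    "metric": ["euclidean", "manhattan"],
}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5, scoring="accuracy")
search.fit(X_train_flat, y_train)
print("Best parameters:", search.best_params_)
print("Best cross-validated accuracy:", round(search.best_score_, 3))
```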

Transfer Learning

Comparing different models and tuning their hyperparameters became a large part of this effort. After engineering a variety of optimization routines, we turned to transfer learning. By building on pre-trained models, we benefit not only from much larger datasets, but also from trained models far larger than anything we could have built ourselves.

Although the model has since changed and matured, KNN, SVM, and the other classical algorithms were initially done away with in favor of ResNet-18, an 18-layer member of the residual network (ResNet) family pre-trained on the ImageNet dataset. The original ResNet work evaluated residual nets up to 152 layers deep and, using an ensemble of residual nets, achieved 3.57% error on the ImageNet test set.
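A minimal transfer-learning sketch with torchvision might look like the following. Freezing the backbone, the Adam optimizer, and the learning rate are assumptions, not necessarily the project's configuration; grayscale inputs would also need to be replicated across three channels to match the ImageNet-trained stem.

```python
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 5  # 1.5, 2.5, 3.5, 4.5, 5.5+

# Start from ImageNet weights and swap the 1000-way head for our 5 age classes.
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)

# Optionally freeze the pre-trained backbone and train only the new head at first.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("fc")

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3)
criterion = nn.CrossEntropyLoss()
```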

To achieve the most generalized model, ResNet-18 was employed in a five-fold ensemble. In doing so, we end up with five individual models, each optimized on a different subset of the data, allowing each model to pick up its own patterns and insights. The training set is augmented via rotations, affine transforms, horizontal flipping, and color noise; since augmentation is done on a per-class basis, each class is perfectly balanced across the train, validation, and test datasets (although the test dataset contains no augmented images).
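One way to realize that five-fold ensemble is sketched below. Here build_resnet18 and train_one_fold are hypothetical helpers standing in for the model definition above and a training loop, neither of which is spelled out in the article.

```python
import torch
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_models = []

for fold, (tr_idx, va_idx) in enumerate(skf.split(X_trainval, y_trainval)):
    model = build_resnet18()                       # hypothetical helper (see sketch above)
    train_one_fold(model,                          # hypothetical training loop
                   X_trainval[tr_idx], y_trainval[tr_idx],
                   X_trainval[va_idx], y_trainval[va_idx])
    torch.save(model.state_dict(), f"resnet18_fold{fold}.pth")
    fold_models.append(model)

def ensemble_predict(models, batch):
    """Average the softmax outputs of the fold models for a batch tensor (N, 3, H, W)."""
    for m in models:
        m.eval()
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(batch), dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)
```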

During the training process, the test set is held out so that none of its images are ever seen during training or validation. After the ensemble is tuned, the test data is run through the final model to gauge its overall accuracy. The model weights are saved to a .pth file and stored on Dropbox for later use.

Accuracy / Analysis

The model's accuracy for all five age classes is shown below (left), along with the F1-score, precision, and recall. In each age group, accuracy exceeded 90%. Expanded into a confusion matrix, the CNN's accuracy illustrates near-perfect performance, with the exception of 4.5 year old deer.
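Metrics like these come straight out of scikit-learn. In the sketch below, y_pred is assumed to be the ensemble's predictions on the held-out test images (see the ensemble_predict sketch above).

```python
from sklearn.metrics import classification_report, confusion_matrix

# y_test: true labels for the held-out images; y_pred: ensemble predictions on the same images.
class_names = ["1.5", "2.5", "3.5", "4.5", "5.5+"]
print(classification_report(y_test, y_pred, target_names=class_names))
print(confusion_matrix(y_test, y_pred))
```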

The analysis can be further explored via attention maps from the CNN. Illustrated below, the figure consists of four columns: the deer's true age (first column), the original image (second column), the low-resolution attention map (third column), and the overlay (final column). Each row represents one deer from each age class.

In the first row, the attention map of the 1.5 year old deer is concentrated in the deer's chest area, exactly where you'd expect, given the NDA's emphasis on where the neck meets the body and how smoothly that transition occurs. The analysis for the 2.5 year old deer concentrates in a similar region, near the neck. The attention map of the 5.5 year old deer is centered on the stomach region; again, this is expected, since older deer tend to have a sagging belly relative to the rest of their body.
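The article doesn't name the exact attention-map technique; the sketch below is a Grad-CAM-style approximation for a ResNet-18 backbone, which produces the kind of low-resolution heat map described above.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_class):
    """Produce a coarse attention map for one image tensor of shape (1, 3, H, W)."""
    activations, gradients = {}, {}

    def fwd_hook(module, inputs, output):
        activations["value"] = output
    def bwd_hook(module, grad_input, grad_output):
        gradients["value"] = grad_output[0]

    # layer4 is the last convolutional block of ResNet-18.
    h1 = model.layer4.register_forward_hook(fwd_hook)
    h2 = model.layer4.register_full_backward_hook(bwd_hook)

    model.zero_grad()
    score = model(image)[0, target_class]
    score.backward()
    h1.remove(); h2.remove()

    # Weight each activation channel by its average gradient, then collapse
    # to a single low-resolution map and upsample to the image size.
    weights = gradients["value"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * activations["value"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    return (cam / cam.max()).squeeze().detach()
```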

Conclusion

In this study, we explored the use of ensemble convolutional neural networks for predicting the age of whitetail bucks in both color and grayscale images. Although the current model has excelled beyond the point illustrated in this article, the results highlight the undeniable potential of computer vision and CNNs in the biology field, regardless of background, deer pose, age, date, time, or location.