(Under construction)

Introduction

Hunters and conservationists alike delight in estimating the age and health of wildlife. Since capturing and tagging animals in the wild can prove difficult, tracking the wellness of species and individuals is typically performed by analyzing trail camera (“trail cam”) photographs of the wildlife. From the images, the age, health, and overall wellbeing of different species can be discerned. The goal of this project is to develop a machine learning (ML) algorithm to accurately predict the age of male whitetail deer (“bucks”) based on trail camera images.

Aspects of Aging

Whitetail buck age predictions are always estimated in half years (ex. 1.5 years, 2.5 years, etc.) due to the deer's life cycle. Deer are typically born in the spring and harvested in the fall (September through January), roughly halfway through their current age year. Additionally, bucks shed their antlers near the end of deer hunting season, between January and March. For this reason, estimating buck age during the winter is considerably more difficult.

Newborns (fawns) are the easiest to identify: they are skinny, awkward-looking, and have white spots on their sides for the first 3-4 months of their lives. Needless to say, many hunters are interested in aging deer beyond the newborn stage, so many images will feature deer at 1.5 years or older. Considering the average lifetime of a deer in the wild ranges from three to six years, we expect the ages of the deer in our image database to lie in the set of values 1.5, 2.5, 3.5, 4.5, or 5.5 years. In turn, the job of our ML algorithm becomes very straightforward: classify each image at a discrete age value.

In many of the articles we collect our data from, the authors provide some insight into how a buck's body changes throughout its lifetime; although not directly applicable to the ML model, their insights are helpful for building intuition about which features or patterns the ML model will likely learn. For instance, much like humans, each deer's body grows and changes with age, and beyond peak maturity, the buck's body may actually decrease in stature. Other common body features are compared below for young and mature bucks.

Feature               Young                Mature
Hind quarter width    Thin                 Wide
Belly                 Above the brisket    Below the brisket
Muscular definition   Little definition    Significant definition
Antler spread         Width of the ears    Wider than the ears
Tine length           Short                Long
Relative leg length   Long                 Short
Neck width            Thin                 Wide
Table 1. Feature comparison of young male deer versus mature male deer.

Understanding the data

Image sources

There are thousands of trail camera images taken by hunters across the United States each year, and many of those hunters want to know how old the deer in their photo is. Sadly, very few of these photographs are seen by deer aging professionals; as such, it is left to the hunter to predict the deer’s age. Of the images that are seen by professionals, a very small fraction are printed online or in publications for other hunters and enthusiasts to see.

While the abundance of trail cam imagery may seem like a good thing for our ML model, we still need a validated age for each image; since these are in short supply, our database suffers. To alleviate our data drought, images used in this project are taken from a wide variety of sources, including the National Deer Association (NDA), Field & Stream (F&S), state agencies, universities, and other conservation resources. If we chose to use images from a single source (ex. NDA's "Age This!" competition), we could ensure consistency across the panel of experts. Instead, we've chosen to utilize multiple sources; not only does this allow us to grow our database faster, but it also allows a wider swath of professionals to weigh in on the variables within an image that indicate the deer's age, resulting in an overall more robust algorithm.

When a whitetail buck's image is accompanied by a validated age estimate, the deer's predicted age and other information (geographic location, date/time of the image capture, etc.) are stored in the image's metadata and ultimately used to create the truth labels within the ML model.

Image Standardization

The images captured by trail cameras can differ wildly; in bright lighting conditions, trail cameras typically produce color imagery, while in dim conditions the same cameras will produce grayscale images. Furthermore, not all trail cameras are created equal; their sensors and optics result in different aspect ratios, pixel resolutions, memory, motion sensitivity, and other features. Additionally, some of our data is pulled directly from websites or PDFs, which introduce adjustments to the image's size and color.

Like any ML problem, we begin by cleaning our dataset. In our case, each image is cropped to remove extraneous information (ex. background clutter) and to maximize the amount of space taken up by the deer. Each image is then proportionally resized to fit inside a pre-determined square. If necessary, the remaining image is padded with white to maintain the overall square shape, as shown in Figure 1.

Figure 1. Pre-conditioning the images
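
The exact crop varies image to image, but the resize-and-pad step can be automated. Below is a minimal sketch using Pillow, assuming a 288x288 target canvas (the size our arrays use later); standardize is a hypothetical helper, not production code.

from PIL import Image

TARGET = 288  # square canvas size; matches the (288, 288) arrays used later

def standardize(path, target=TARGET):
    """Proportionally resize an already-cropped image and pad it to a white square."""
    img = Image.open(path).convert("RGB")
    # Scale so the longer side exactly fits the target square
    scale = target / max(img.size)
    img = img.resize((round(img.width * scale), round(img.height * scale)))
    # Paste onto a white square to maintain the overall shape (Figure 1)
    canvas = Image.new("RGB", (target, target), "white")
    canvas.paste(img, ((target - img.width) // 2, (target - img.height) // 2))
    return canvas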

Once shaped, each image is stored with a multi-part filename with the format XXXXXX_ZZZZZZ_SS_NpN_P, where XXXXXX and ZZZZZZ denote the date the image was collected and the date the image was originally taken, respectively; both dates use the format YYMMDD where Y, M, and D stand for the Year, Month, and Day (ex. 250331 for March 31, 2025). SS denotes the state the image was taken in (ex. “KY” for Kentucky), and NpN stands for the age of the deer (ex. 3p5 stands for 3.5 years). Lastly, P represents the provider name (ex. “RLT” for Realtree, “NDA” for National Deer Association, etc.).
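
Given this convention, the metadata can be recovered directly from each filename. Below is a minimal sketch, assuming the exact format described above; parse_filename is a hypothetical helper.

import re

# XXXXXX_ZZZZZZ_SS_NpN_P, ex. "250331_241102_KY_3p5_NDA.jpg"
FILENAME_PATTERN = re.compile(
    r"(?P<collected>\d{6})_(?P<taken>\d{6})_(?P<state>[A-Z]{2})"
    r"_(?P<age>\d+p\d+)_(?P<provider>\w+)"
)

def parse_filename(name):
    """Split a filename into its metadata fields, converting NpN to a float age."""
    meta = FILENAME_PATTERN.match(name).groupdict()
    meta["age"] = float(meta["age"].replace("p", "."))  # "3p5" -> 3.5
    return meta

# ex. parse_filename("250331_241102_KY_3p5_NDA.jpg")
# -> {'collected': '250331', 'taken': '241102', 'state': 'KY', 'age': 3.5, 'provider': 'NDA'}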

Ingesting data

Images

We use glob to identify image files within our data folder, and matplotlib to read in each image. Each image is converted to grayscale, normalized, and stacked in a 3D array. At the same time, each deer’s age is extracted from the respective filename, creating our supervised learning labels. Querying the size of our data produces:

241 images found
Sample size: (241, 288, 288)
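
A minimal sketch of this ingestion step, assuming the images live in a hypothetical data/ folder and follow the filename convention described earlier:

import glob
import os
import numpy as np
import matplotlib.pyplot as plt

files = sorted(glob.glob("data/*.jpg"))
print(f"{len(files)} images found")

images, ages = [], []
for f in files:
    img = plt.imread(f).astype("float32")
    if img.ndim == 3:                                 # convert RGB to grayscale
        img = img[..., :3] @ [0.2989, 0.5870, 0.1140]
    images.append(img / img.max())                    # normalize to [0, 1]
    # The NpN field of the filename carries the validated age (ex. "3p5" -> 3.5)
    ages.append(float(os.path.basename(f).split("_")[3].replace("p", ".")))

X = np.stack(images)   # 3D array: (n_images, height, width)
y = np.array(ages)     # supervised learning labels
print("Sample size:", X.shape)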

Labels

Based on the way deer’s ages are estimated, the output of our ML model will be discrete — that is, we’re asking our model to guess which age category a given picture belongs to. In ML terms, this means we’re trying to solve a classification problem, and this will determine the type of approach we take and algorithms we use.

As a visualization exercise, imagine we have a stack of physical images for each deer. We're holding the stack of images in our hands while we stand in front of five buckets. Our task in this scenario is to look at each image and place it in the correct bucket. The kicker here is that we're allowed to know some of the answers in advance; we can look through the first 80% of the images and see how old each deer is based on the age written on the back of each one. The last 20% of the dataset lacks the age information because the age has been smudged out.

Based on what we know of the deer in the images, we label each bucket 0 through 4, knowing that each will represent a different age range. For instance, bucket "0" will hold the images for 1.5-year-old deer, bucket "1" will hold pictures of the 2.5-year-old deer, and so on.

Figure 2. Imagining the computer vision problem.

In machine learning, the process of representing one value (ex. 1.5 years) by another value (ex. "0") is accomplished by mapping our data labels to integers. In our particular problem, there's a catch. Although whitetail deer have been known to live as long as 22 years, many deer experts simply list a buck as "mature" once the deer reaches or exceeds an age of 5.5 years. This means that a buck aged 8.5 years will likely be labeled by experts as "5.5 years", "mature", or simply "old". In practice, a deer can only be confirmed to be older than 5.5 years through post-mortem assessment of its teeth.

This can be confusing, not only because it ensures our age distribution will be non-Gaussian, but also because the deer's body continues to change over time. For this reason, we group all mature bucks 5.5 years or older into a "5.5+" category and sanity check the result by returning a list of the converted ages. This also sharpens the task we're asking our model to solve: predict each deer's age between 1.5 and 5.5+ years. A sketch of the grouping follows; after merging all bucks 5.5 years or older into the same category, we get the data distributions shown below.
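
The sketch below assumes y is the array of float ages extracted during ingestion; the variable names are illustrative.

import numpy as np

MATURE_AGE = 5.5  # bucks at or beyond this age are grouped as "mature"

merged = [a for a in np.unique(y) if a >= MATURE_AGE]
print(f"Merged these ages into the 'mature' (5.5+) class: {merged}")

y_grouped = np.minimum(y, MATURE_AGE)     # ex. 6.5, 8.5, 12.5 -> 5.5
label_mapping = {a: i for i, a in enumerate(np.unique(y_grouped))}
print("New label mapping:", label_mapping)

labels = np.array([label_mapping[a] for a in y_grouped])  # integer truth labels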

Merged these ages into the 'mature' (5.5+) class: [np.float64(5.5), np.float64(12.5), np.float64(6.5), np.float64(7.5), np.float64(8.5)]
New label mapping: {np.float64(1.5): 0, np.float64(2.5): 1, np.float64(3.5): 2, np.float64(4.5): 3, np.float64(5.5): 4}

Class distribution after first split:
Label 0 (1.5): 29 samples
Label 1 (2.5): 41 samples
Label 2 (3.5): 44 samples
Label 3 (4.5): 29 samples
Label 4 (5.5): 49 samples

Training set class distribution (after both splits):
Label 0 (1.5): 23 samples
Label 1 (2.5): 33 samples
Label 2 (3.5): 35 samples
Label 3 (4.5): 23 samples
Label 4 (5.5): 39 samples

Validation set class distribution:
Label 0 (1.5): 6 samples
Label 1 (2.5): 8 samples
Label 2 (3.5): 9 samples
Label 3 (4.5): 6 samples
Label 4 (5.5): 10 samples

Data Augmentation

Machine learning models typically deal with tens of thousands of datapoints, and our current dataset limps in at 241. Although we strive to find more data in an ever-dwindling supply, we can also augment our data via Keras' ImageDataGenerator. Based on a handful of parameters, ImageDataGenerator applies well-known transformations to each image, resulting in a new image of the same deer. Using transformations like rotation, horizontal flipping, zooming, shifting, and brightness adjustment, the deer's relative dimensions are preserved (ex. leg length to body thickness, snout length, etc.). In doing so, we buy ourselves extra data that we never had to spend time collecting (although we continue to collect more data on the side anyway). In this study, we initially set a multiplier of 30, meaning each image has the capacity to produce 30 total images, ultimately boosting our 241-image dataset to 7,230 images.

By augmenting each class separately, we can homogenize our data to ensure each age class within our training set contains the same number of images. The bar chart below shows the number of samples in each age class pre- and post-augmentation.
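
A sketch of this per-class balancing, assuming X_train has shape (N, 288, 288, 1) and y_train holds integer labels; balance_classes is a hypothetical helper, and the generator shown uses an illustrative subset of the augmentation parameters configured later.

import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rotation_range=20, zoom_range=0.2, horizontal_flip=True)

def balance_classes(X, y, target_per_class):
    """Augment each class independently until every class has target_per_class samples."""
    X_out, y_out = [X], [y]
    for cls in np.unique(y):
        X_cls = X[y == cls]
        needed = target_per_class - len(X_cls)
        if needed <= 0 or len(X_cls) == 0:
            continue
        # Draw augmented batches from this class only, in original order
        flow = datagen.flow(X_cls, batch_size=len(X_cls), shuffle=False)
        extra = np.concatenate([next(flow) for _ in range(int(np.ceil(needed / len(X_cls))))])
        X_out.append(extra[:needed])
        y_out.append(np.full(needed, cls))
    return np.concatenate(X_out), np.concatenate(y_out)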

Following our split into training, validation, and test sets, we begin to compare the "canned" classifiers. As shown in the figure below, six classifiers were compared: K-Nearest Neighbors (KNN), Support Vector Machines (SVM), Random Forest, Logistic Regression, Gradient Boosting, and Decision Tree classifiers.
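
A sketch of this comparison, assuming X_train/X_test hold the grayscale image stacks and y_train/y_test the integer labels; each classifier runs with near-default settings (LogisticRegression simply gets extra iterations to converge).

from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Flatten each 288x288 image into a 1D feature vector
X_tr = X_train.reshape(len(X_train), -1)
X_te = X_test.reshape(len(X_test), -1)

classifiers = {
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Random Forest": RandomForestClassifier(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Gradient Boosting": GradientBoostingClassifier(),
    "Decision Tree": DecisionTreeClassifier(),
}

for name, clf in classifiers.items():
    clf.fit(X_tr, y_train)
    print(f"{name}: accuracy = {clf.score(X_te, y_test):.3f}")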

Of these, KNN performed best with an accuracy of ~32.6%. While this may not seem high, randomly guessing across our five classes would yield an accuracy of only 20%. Furthermore, we haven't begun tuning our model's hyperparameters yet! Let's take a look at that using the following set of parameters as our baseline.

Hyperparameter tuning

Hyperparameter tuning begins with understanding the arguments passed to the classifier. For scikit-learn's KNN, these include the number of neighbors considered (n_neighbors), how neighbor weighting is performed (weights), the algorithm used to search for neighbors (algorithm), the leaf size passed to tree-based search algorithms (leaf_size), the power parameter of the Minkowski metric (p), the distance metric itself (metric), and any associated metric parameters (metric_params). The final parameter, n_jobs, controls parallelism and is useful for speeding up the algorithm.

from sklearn.neighbors import KNeighborsClassifier

# n holds the number of neighbors under test (swept in the next section)
classifier = KNeighborsClassifier(n_neighbors=n,
                                  weights='distance',
                                  algorithm='auto',
                                  leaf_size=30,
                                  p=2,
                                  metric='minkowski',
                                  metric_params=None,
                                  n_jobs=None)

Number of Neighbors

Keeping all other variables constant, we find a significant increase in both model metrics at N=9 (n_neighbors=9). We'll keep this value in our classifier model and move on to weights.

Figure 3. Model accuracy and F1 scores versus number of nearest neighbors.
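
The sweep behind Figure 3 can be sketched as follows, reusing the flattened X_tr/X_te arrays (and labels) assumed in the classifier comparison above:

import matplotlib.pyplot as plt
from sklearn.metrics import f1_score
from sklearn.neighbors import KNeighborsClassifier

ns = range(1, 21)
accuracies, f1_scores = [], []
for n in ns:
    clf = KNeighborsClassifier(n_neighbors=n, weights='distance').fit(X_tr, y_train)
    y_pred = clf.predict(X_te)
    accuracies.append((y_pred == y_test).mean())
    f1_scores.append(f1_score(y_test, y_pred, average='weighted'))

plt.plot(ns, accuracies, label='Accuracy')
plt.plot(ns, f1_scores, label='Weighted F1')
plt.xlabel('Number of neighbors (n_neighbors)')
plt.ylabel('Score')
plt.legend()
plt.show()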

But there’s another step that needs to be performed to enable categorical prediction, and that is encoding the labels, commonly via “one hot” encoding. This encoding method simply breaks up a list of N categories into N separate columns and assigns each column a binary value. To illustrate, consider a dataset of four images (below, left), where each image contains one animal: the first image contains a cat, the second contains a dog, the third contains another cat, and the fourth contains a squirrel. After applying one hot encoding, our “Animal” column is replaced by three columns, one for each animal. The first image would be assigned a “1” under “Cat” because it is an image of a cat, and a “0” under the “Dog” and “Squirrel” columns because it contains neither a dog nor a squirrel. The same logic continues for the other three images.
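
A tiny demonstration of the idea, using pandas on the hypothetical four-image animal dataset:

import pandas as pd

df = pd.DataFrame({"Animal": ["Cat", "Dog", "Cat", "Squirrel"]})
print(pd.get_dummies(df, columns=["Animal"], dtype=int))
#    Animal_Cat  Animal_Dog  Animal_Squirrel
# 0           1           0                0
# 1           0           1                0
# 2           1           0                0
# 3           0           0                1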

# One-hot encode labels AFTER the first split but BEFORE the next split
import numpy as np
from tensorflow import keras
from sklearn.model_selection import train_test_split

num_classes = len(label_mapping)
y_train_val_onehot = keras.utils.to_categorical(y_train_val, num_classes)
y_test_onehot = keras.utils.to_categorical(y_test, num_classes)

# Normalize and reshape images
X_train_val = X_train_val.astype("float32") / 255.0
X_test = X_test.astype("float32") / 255.0
X_train_val = X_train_val.reshape(X_train_val.shape[0], 288, 288, 1)
X_test = X_test.reshape(X_test.shape[0], 288, 288, 1)

# Second split (without stratification) to carve out a validation set
X_train_orig, X_valid, y_train_orig, y_valid = train_test_split(
    X_train_val, 
    y_train_val_onehot, 
    test_size=0.2, 
    random_state=42
)

# Print the class distribution to check
print("nTraining set class distribution (after both splits):")
train_class_dist = np.argmax(y_train_orig, axis=1)
for label in np.unique(train_class_dist):
    count = np.sum(train_class_dist == label)
    print(f"Label {label} ({list(label_mapping.keys())[list(label_mapping.values()).index(label)]}): {count} samples")

print("nValidation set class distribution:")
valid_class_dist = np.argmax(y_valid, axis=1)
for label in np.unique(valid_class_dist):
    count = np.sum(valid_class_dist == label)
    print(f"Label {label} ({list(label_mapping.keys())[list(label_mapping.values()).index(label)]}): {count} samples")

The same encoding is carried out on our dataset, except our data contains ages instead of animals. But why? One-hot encoding is a common technique for transforming categorical data (ex. “1.5 years”) into a numerical format the machine learning algorithm can handle efficiently; comparing patterns of 0s and 1s is much simpler than matching decimal values.

Data Augmentation

Our dataset is meager compared to datasets commonly used to train ML models. Typically, a dataset will include tens of thousands of datapoints instead of our 40; as we’ve discussed, this is due to the problem of availability. To alleviate this problem, we can use a technique called “data augmentation”, which does exactly what the name suggests: generate more data based on our original dataset.

# Assumes the initial train/test split produced X_train_val, X_test, y_train_val, y_test;
# X_train_val and y_train_val are used here in place of X_train and y_train

# Get the number of classes
num_classes = len(label_mapping)

# One-hot encode labels BEFORE splitting into train/validation
y_train_val_onehot = keras.utils.to_categorical(y_train_val, num_classes)
y_test_onehot = keras.utils.to_categorical(y_test, num_classes)

# Reshape data to add channel dimension
X_train_val = X_train_val.reshape(X_train_val.shape[0], 288, 288, 1)
X_test = X_test.reshape(X_test.shape[0], 288, 288, 1)

# Create a validation set (without stratification)
X_train_orig, X_valid, y_train_orig, y_valid = train_test_split(
    X_train_val, 
    y_train_val_onehot, 
    test_size=0.2, 
    random_state=42
)





import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Print original sizes
print("nBefore augmentation:")
print(X_train_orig.shape[0], "train samples")
print(X_test.shape[0], "test samples")
print(X_valid.shape[0], "validation samples")

# Set up diverse but moderate data augmentation
datagen = ImageDataGenerator(
    rotation_range=20,              # rotate up to +/-20 degrees
    width_shift_range=0.2,          # shift horizontally by up to 20%
    height_shift_range=0.2,         # shift vertically by up to 20%
    zoom_range=0.2,                 # zoom in or out by up to 20%
    horizontal_flip=True,           # mirroring preserves relative proportions
    brightness_range=[0.7, 1.3],    # vary brightness between 70% and 130%
    shear_range=15,                 # shear by up to 15 degrees
    fill_mode='nearest',            # fill exposed pixels with nearest values
    channel_shift_range=0.1,        # slight intensity shifts
    vertical_flip=False,            # deer won't be upside down in real images
)

# Generate augmented data in advance
# Number of augmented samples per original sample
augmentation_factor = 10
num_to_generate = X_train_orig.shape[0] * augmentation_factor

# Initialize empty arrays for augmented data
augmented_images = []
augmented_labels = []

# Create augmented images batch by batch
batch_size = 32
generated_count = 0

# Create a flow from the original data (without shuffling)
aug_gen = datagen.flow(
    X_train_orig, 
    y_train_orig,
    batch_size=batch_size,
    shuffle=False  # Important: keep the same order as labels
)

while generated_count < num_to_generate:
    # Get the next batch
    x_batch, y_batch = next(aug_gen)
    
    # Add to our collections
    augmented_images.append(x_batch)
    augmented_labels.append(y_batch)
    
    # Update the count
    generated_count += len(x_batch)
    
    # Break if we've generated enough
    if generated_count >= num_to_generate:
        break

# Concatenate all batches
augmented_images = np.concatenate(augmented_images)
augmented_labels = np.concatenate(augmented_labels)

# Trim excess (due to batch size)
augmented_images = augmented_images[:num_to_generate]
augmented_labels = augmented_labels[:num_to_generate]

# Combine with original data
X_train_combined = np.concatenate([X_train_orig, augmented_images])
y_train_combined = np.concatenate([y_train_orig, augmented_labels])

# Print new sizes after augmentation
print("nAfter augmentation:")
print("Original training samples:", X_train_orig.shape[0])
print("Augmented training samples:", augmented_images.shape[0])
print("Combined training samples:", X_train_combined.shape[0])
print("Augmentation multiplier:", X_train_combined.shape[0] / X_train_orig.shape[0])
print("X_train_combined shape:", X_train_combined.shape)
print("y_train_combined shape:", y_train_combined.shape)





from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dense, Dropout, GlobalAveragePooling2D

# Redefine the model as a compact CNN, regularized with aggressive dropout
model = Sequential()
# Start with a small number of filters
model.add(Conv2D(8, kernel_size=3, padding='same', activation='relu', input_shape=(288, 288, 1)))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(16, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
model.add(Conv2D(32, kernel_size=3, padding='same', activation='relu'))
model.add(MaxPooling2D(pool_size=2))
# Global pooling instead of more conv layers
model.add(GlobalAveragePooling2D())
# Single dense layer with higher dropout
model.add(Dense(32, activation='relu'))
model.add(Dropout(0.6))  # Higher dropout to prevent overfitting
model.add(Dense(num_classes, activation='softmax'))
model.summary()

# Compile with a lower learning rate
from keras.optimizers import RMSprop
model.compile(
    loss='categorical_crossentropy', 
    optimizer=RMSprop(learning_rate=0.0001), 
    metrics=['accuracy']
)




# Calculate class weights based on the training class distribution
from sklearn.utils.class_weight import compute_class_weight
from keras.callbacks import ModelCheckpoint, EarlyStopping
import numpy as np

# Extract the class labels from one-hot encoded y_train_orig
y_integers = np.argmax(y_train_orig, axis=1)

# Compute balanced class weights
class_weights = compute_class_weight(
    class_weight='balanced',
    classes=np.unique(y_integers),
    y=y_integers
)

# Convert to dictionary format for Keras
class_weight_dict = {i: weight for i, weight in enumerate(class_weights)}

print("Class weights:", class_weight_dict)

# Now train the model with properly calculated class weights
checkpointer = ModelCheckpoint(
    filepath='model_augmented.weights.best.hdf5.keras',
    verbose=1, 
    save_best_only=True
)

early_stopping = EarlyStopping(
    monitor='val_loss',
    patience=15,  # Give it more time to learn with class weights
    restore_best_weights=True,
    verbose=1
)

print("nTraining with class weights:")
hist_augmented = model.fit(
    X_train_combined, 
    y_train_combined,
    batch_size=16,
    epochs=100,  # Increase epochs to give more training time
    validation_data=(X_valid, y_valid),
    callbacks=[checkpointer, early_stopping],
    verbose=1,
    shuffle=True,
    class_weight=class_weight_dict  # Use the properly calculated weights
)






import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import precision_score, recall_score, f1_score, confusion_matrix

# Evaluate with the best weights
model.load_weights('model_augmented.weights.best.hdf5.keras')

# Create a reverse mapping to get original labels
reverse_mapping = {i: label for label, i in label_mapping.items()}

# Make predictions
y_pred_prob = model.predict(X_test, verbose=0)
y_pred = np.argmax(y_pred_prob, axis=1)
y_true = np.argmax(y_test_onehot, axis=1)

# Create a comprehensive evaluation table that includes all classes
all_classes = list(range(num_classes))
all_class_names = [reverse_mapping[i] for i in all_classes]

# Create a comprehensive DataFrame
results_df = pd.DataFrame({
    'Class Index': all_classes,
    'Original Label': all_class_names,
    'In Test Set': [i in y_true for i in all_classes],
    'In Predictions': [i in y_pred for i in all_classes]
})

# Add metrics where applicable
precision_values = []
recall_values = []
f1_values = []
for cls in all_classes:
    if cls in y_true and cls in y_pred:
        # We can calculate metrics for this class
        true_binary = (y_true == cls).astype(int)
        pred_binary = (y_pred == cls).astype(int)
        precision_values.append(precision_score(true_binary, pred_binary, zero_division=0))
        recall_values.append(recall_score(true_binary, pred_binary, zero_division=0))
        f1_values.append(f1_score(true_binary, pred_binary, zero_division=0))
    else:
        # Class not present in test set or predictions
        precision_values.append(float('nan'))
        recall_values.append(float('nan'))
        f1_values.append(float('nan'))

results_df['Precision'] = precision_values
results_df['Recall'] = recall_values
results_df['F1 Score'] = f1_values

print("Comprehensive Class Evaluation:")
print(results_df.to_string(index=False))

# Create a confusion matrix (will only show classes present in test set)
cm = confusion_matrix(y_true, y_pred)
present_classes = sorted(set(np.concatenate([y_true, y_pred])))
plt.figure(figsize=(5, 4))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
           xticklabels=[f"{reverse_mapping[i]}" for i in present_classes],
           yticklabels=[f"{reverse_mapping[i]}" for i in present_classes])
plt.xlabel('Predicted')
plt.ylabel('True')
plt.title('Confusion Matrix (Only Classes Present in Test Set)')
plt.show()

# Show overall accuracy
accuracy = np.mean(y_pred == y_true)
print(f"nOverall Test Accuracy: {accuracy:.4f}")

Before augmentation:
25 train samples
8 test samples
7 validation samples

After augmentation:
Original training samples: 25
Augmented training samples: 250
Combined training samples: 275
Augmentation multiplier: 11.0
X_train_combined shape: (275, 288, 288, 1)
y_train_combined shape: (275, 5)

Model: "sequential_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┓
┃ Layer (type)                          Output Shape                         Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━┩
│ conv2d (Conv2D)                      │ (None, 288, 288, 16)        │             160 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d (MaxPooling2D)         │ (None, 144, 144, 16)        │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_1 (Conv2D)                    │ (None, 144, 144, 32)        │           4,640 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_1 (MaxPooling2D)       │ (None, 72, 72, 32)          │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_2 (Conv2D)                    │ (None, 72, 72, 64)          │          18,496 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_2 (MaxPooling2D)       │ (None, 36, 36, 64)          │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ conv2d_3 (Conv2D)                    │ (None, 36, 36, 64)          │          36,928 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ max_pooling2d_3 (MaxPooling2D)       │ (None, 18, 18, 64)          │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ global_average_pooling2d             │ (None, 64)                  │               0 │
│ (GlobalAveragePooling2D)             │                             │                 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense (Dense)                        │ (None, 64)                  │           4,160 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dropout (Dropout)                    │ (None, 64)                  │               0 │
├──────────────────────────────────────┼─────────────────────────────┼─────────────────┤
│ dense_1 (Dense)                      │ (None, 5)                   │             325 │
└──────────────────────────────────────┴─────────────────────────────┴─────────────────┘
 Total params: 64,709 (252.77 KB)
 Trainable params: 64,709 (252.77 KB)
 Non-trainable params: 0 (0.00 B)

Training with pre-generated augmented data:
Epoch 1/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 351ms/step - accuracy: 0.1925 - loss: 1.9045
Epoch 1: val_loss improved from inf to 1.97243, saving model to model_augmented.weights.best.hdf5.keras
5/5 ━━━━━━━━━━━━━━━━━━━━ 8s 2s/step - accuracy: 0.1916 - loss: 1.8951 - val_accuracy: 0.1429 - val_loss: 1.9724
Epoch 2/20
5/5 ━━━━━━━━━━━━━━━━━━━━ 0s 329ms/step - accuracy: 0.3223 - loss: 1.7746
Epoch 2: val_loss improved from 1.97243 to 1.94029, saving model to model_augmented.weights.best.hdf5.keras

Comprehensive Class Evaluation:
 Class Index  Original Label  In Test Set  In Predictions  Precision  Recall  F1 Score
           0             1.5         True           False        NaN     NaN       NaN
           1             2.5         True            True      0.375     1.0  0.545455
           2             3.5         True           False        NaN     NaN       NaN
           3             4.5         True           False        NaN     NaN       NaN
           4             5.5         True           False        NaN     NaN       NaN
           5             6.5        False           False        NaN     NaN       NaN
           6            12.5        False           False        NaN     NaN       NaN

Overall Test Accuracy: 0.3750

At a fundamental level, each datapoint comprises an image and a known age. In turn, the age estimate is based on the relative size of different parts — or features — of the deer in the image. This means we can compare the same features of a buck regardless of the direction the buck is facing; flipping an image horizontally, for instance, does not change the relative feature sizes of the deer, nor does it change the deer’s age. By mirroring each image, then, we can double the size of our dataset.

As a second example, the same can be said for how large the deer appears in each image; nose to tail, the relative proportions of the deer do not change based on how much the camera has “zoomed in” on the deer, so scaling becomes another parameter we can vary. Even more data can be extracted by slightly rotating each image, and so on. Taken together, data augmentation allows us to rotate, zoom, flip, and otherwise perturb each image randomly, quickly providing a larger dataset on which to train our ML model. These changes were implemented in the ImageDataGenerator configuration above, producing the output below.
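
The mirroring idea alone can be sketched in a few lines, assuming X is the (N, 288, 288) image stack and y the matching age labels:

import numpy as np

X_flipped = X[:, :, ::-1]                  # mirror each image left-to-right
X_doubled = np.concatenate([X, X_flipped])
y_doubled = np.concatenate([y, y])         # a flip does not change the deer's age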

Before augmentation:
25 train samples
8 test samples
7 validation samples

After augmentation:
Original training samples: 25
Augmented training samples: 125
Combined training samples: 150
Augmentation multiplier: 6.0
X_train_combined shape: (150, 288, 288, 1)
y_train_combined shape: (150, 7)

(in progress…)

Other Considerations

There is a natural “maximum” age on which most deer experts agree; although whitetail deer have been known to live as long as 22 years, many experts label any buck older than 5.5 years as “mature” and move on with their day.

Conclusion

(In progress…)