Mushroom Species Classification 1.0

Using a Convolutional Neural Network to Identify Mushrooms

Elby
7 min read · Sep 13, 2020

Project Summary

This post is a working report of an ongoing project to build an image recognition model capable of identifying 20 different mushroom species with 90% accuracy over the testing data.

The model is a convolutional neural network built with Keras (on the TensorFlow backend) and trained using FloydHub's cloud GPU computation. The baseline model consisted of 9 hidden layers and achieved a test accuracy of 43.39% (compared with 5% chance). The baseline was then iterated upon, with a full training run for each individual adjustment. The best performing model consisted of 14 hidden layers, with approximately 25 million trainable parameters, and achieved a testing accuracy of 81.18%.

While the experimentation and iterative adjustment yielded a significant improvement over the baseline, the 90% goal was ultimately not reached. The project will therefore continue, and other approaches will be considered in order to reach the goal.

Introduction

There are over 4,500 species of mushroom growing wild in the British Isles, so there are plenty of opportunities for cooks, hikers, and nature enthusiasts to forage for some delicious fungi. The problem is that many of these species look remarkably similar to the untrained (and even the trained) eye. Many of them are poisonous, and a handful are even lethally so. Given that a tasty addition to an omelette may in fact be a one-way ticket to the morgue, the ability to identify the species of a particular mushroom is clearly extremely important.

While mycologists can use laboratory equipment and DNA testing to identify species, it is unlikely that a forager will be able to tell the difference between a chanterelle and a false chanterelle, for instance, without years of experience and a handy guidebook or two. Furthermore, mushrooms of the same species can vary dramatically in appearance, and can only be identified through examination of finer details.

Left: tasty chanterelle (Cantharellus cibarius); right: poisonous false chanterelle (Hygrophoropsis aurantiaca)

Without expensive and expansive equipment, how is the layman supposed to differentiate between safe and poisonous mushrooms? This is precisely the problem that this project seeks to address. The aim was to deploy a deep learning algorithm (specifically a convolutional neural network) capable of recognising mushroom images and classifying them into species.

Given the number of mushroom species that exist, the project scope was limited to include only 20 species that are common to the United Kingdom. Of course, the resulting species list is a mere fraction of the total; however, it is still extensive enough to cover the species one is most likely to encounter whilst rambling through the British countryside. The resulting problem, technically speaking, is a 20-class image classification problem.

Step One: Data Collection

In order to train the convolutional neural network (CNN), I first needed to obtain a large number of images of different mushroom species. The species on the list were selected because of their range and commonness across the British Isles; however, a sufficiently large number of images for each species is not easy to come by.

The bulk of the images were obtained from two primary sources:

  • the image dataset from the Atlas of Danish Fungi classification competition
  • the image databases maintained by Mushroom Observer

The images obtained from the Atlas of Danish Fungi competition were conveniently packaged into neat folders, classed by species name. However, an additional and recurring problem in the world of mushrooms is that many species have multiple names, or synonyms. Each of the species names therefore had to be cross-referenced against a list of synonyms, which was achieved through the databases maintained by Mushroom Observer, and these proved to be extremely helpful. The images hosted in the Mushroom Observer image databases themselves were collected via a URL scrape and downloaded using a bulk tab downloader for Google Chrome (Tabsave).
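
(For anyone reproducing this step without a browser extension, a bulk download of this kind can also be scripted. The sketch below is a minimal alternative; the URL file name and output folder are hypothetical.)

```python
import os
import requests

# Hypothetical input: a text file of scraped image URLs, one per line.
URL_FILE = "mushroom_observer_urls.txt"
OUT_DIR = "raw_images"

os.makedirs(OUT_DIR, exist_ok=True)

with open(URL_FILE) as f:
    urls = [line.strip() for line in f if line.strip()]

for i, url in enumerate(urls):
    try:
        response = requests.get(url, timeout=30)
        response.raise_for_status()
    except requests.RequestException as exc:
        print(f"Skipping {url}: {exc}")
        continue
    # Name files by index; the species label is recovered later
    # from the synonym cross-reference.
    with open(os.path.join(OUT_DIR, f"image_{i:05d}.jpg"), "wb") as out:
        out.write(response.content)
```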

Once the images were obtained, they needed to be inspected manually (one by one…) in order to remove certain ‘trash’ images; the datasets included, for instance, botanical drawings, microscopic images of spores, and so on. The resulting image set contained 10,135 images, which was deemed suitable (but not ideal) for the project.

Step Two: Data Preparation

The images were loaded and processed using the OpenCV (cv2) library and resized to a resolution of 200 by 200 pixels. This size was chosen because it was large enough to preserve the important image features, but small enough to avoid excessive model training time. Class labels were encoded with scikit-learn’s LabelEncoder and then one-hot encoded. Overall, there was some class imbalance in the image data, but it was not deemed to be too severe.
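
A simplified sketch of this loading step might look as follows (the folder layout is an assumption, and to_categorical is one way to produce the one-hot labels):

```python
import os
import cv2
import numpy as np
from sklearn.preprocessing import LabelEncoder
from tensorflow.keras.utils import to_categorical

IMG_SIZE = 200
DATA_DIR = "images"  # assumed layout: one sub-folder per species

images, labels = [], []
for species in sorted(os.listdir(DATA_DIR)):
    species_dir = os.path.join(DATA_DIR, species)
    for fname in os.listdir(species_dir):
        img = cv2.imread(os.path.join(species_dir, fname))
        if img is None:  # skip unreadable files
            continue
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
        images.append(img)
        labels.append(species)

X = np.array(images, dtype="float32") / 255.0  # scale pixels to [0, 1]
encoder = LabelEncoder()
y = to_categorical(encoder.fit_transform(labels))  # 20-class one-hot
```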

The data was then split into training, validation, and testing sets (stratifying the splits to preserve the class proportions). The training set contained 8,108 images (80% of the total), the validation set contained 810 images (8%), and the testing set contained 1,217 images (12%); the three sets were saved as separate NumPy arrays.
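
A stratified three-way split can be produced with two successive calls to scikit-learn’s train_test_split, roughly as follows:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# First carve off the 80% training set, stratifying on class labels.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.8, stratify=y.argmax(axis=1), random_state=42)

# Split the remaining 20% into validation (8%) and testing (12%).
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, train_size=0.4, stratify=y_rest.argmax(axis=1),
    random_state=42)

np.save("X_train.npy", X_train); np.save("y_train.npy", y_train)
np.save("X_val.npy", X_val);     np.save("y_val.npy", y_val)
np.save("X_test.npy", X_test);   np.save("y_test.npy", y_test)
```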

Step Three: Baseline Model

The baseline model was a convolutional neural network (CNN) consisting of 9 hidden layers, with all hyper-parameters set to Keras defaults.

The baseline network architecture was loosely based on VGG16, with expanding convolutional and max-pooling layer pairs, followed by a fully connected layer, trained for 15 epochs. No data augmentation was used.
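
In Keras, a nine-hidden-layer VGG-style baseline of this kind looks something like the sketch below. The filter and unit counts are illustrative rather than the exact values used, and the data arrays are those from the earlier snippets:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Nine hidden layers: four expanding Conv/MaxPool pairs plus one
# fully connected layer (filter counts are illustrative).
model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(200, 200, 3)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Conv2D(128, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Conv2D(256, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(512, activation="relu"),
    Dense(20, activation="softmax"),  # one output per species
])

model.compile(optimizer="rmsprop",  # Keras's compile default
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(X_train, y_train, epochs=15,
          validation_data=(X_val, y_val))
```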

The baseline did not perform particularly well, with the validation accuracy plateauing around 40% after 4 epochs and the loss increasing almost immediately.

The model achieved a testing accuracy of 43.39% with an overall loss of 2.6885, setting a benchmark against which the subsequent models were measured.

Step Four: Iterative Adjustments

With the benchmark accuracy of 43% set by the baseline, the strategy for improvement was to experiment with individual parameters, subjecting each experiment to a full training run. The results of each training run were reviewed, and the best performing model from each experiment was carried forward to the next stage. The results are summarised below:

The final model consisted of 14 hidden layers, with expanding convolution and max-pooling layer pairs and two fully connected layers. The model used LeakyReLU activations throughout, a learning rate of 9e-4, and the Adamax optimiser.
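
In Keras terms, a model of this shape reduces to something like the following. Only the reported hyper-parameters (LeakyReLU, Adamax, learning rate 9e-4) come from the experiments; the filter counts, unit counts, and LeakyReLU alpha are illustrative:

```python
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dense, LeakyReLU, InputLayer)
from tensorflow.keras.optimizers import Adamax

# Fourteen hidden layers: six expanding Conv/MaxPool pairs plus two
# fully connected layers.
model = Sequential([InputLayer(input_shape=(200, 200, 3))])
for filters in (32, 64, 128, 256, 512, 512):
    model.add(Conv2D(filters, (3, 3), padding="same"))
    model.add(LeakyReLU(alpha=0.1))  # alpha is illustrative
    model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(2048))
model.add(LeakyReLU(alpha=0.1))
model.add(Dense(512))
model.add(LeakyReLU(alpha=0.1))
model.add(Dense(20, activation="softmax"))

model.compile(optimizer=Adamax(learning_rate=9e-4),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```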

The result was a testing accuracy of 81.18% and an overall loss of 0.6675, which unfortunately does not meet the 90% accuracy target. An intermediate goal of the iteration phase of the project, however, was for the model to achieve an accuracy above 50% for each of the 20 classes. The final model was the first to achieve this goal, as can be seen in the confusion matrix for the testing set.
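
Per-class accuracy can be read from the diagonal of a row-normalised confusion matrix; a sketch of that check (reusing names from the earlier snippets) is:

```python
from sklearn.metrics import confusion_matrix

# Predicted and true class indices for the test set.
y_pred = model.predict(X_test).argmax(axis=1)
y_true = y_test.argmax(axis=1)

cm = confusion_matrix(y_true, y_pred)
# Normalising each row makes the diagonal the per-class recall.
per_class_acc = cm.diagonal() / cm.sum(axis=1)

for cls, acc in zip(encoder.classes_, per_class_acc):
    flag = "" if acc > 0.5 else "  <-- below 50% target"
    print(f"{cls}: {acc:.2%}{flag}")
```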

A few key points for potential improvement were noted during the iterative phase:

  • Data augmentation significantly improved the generalisability of the models, but also greatly increased the training time. Furthermore, because the Keras ImageDataGenerator performs its transformations on the CPU, it appeared to bottleneck the input pipeline, so the training phase was not able to benefit from the high-performance GPU available on FloydHub and the standard GPU had to be used instead (see the sketch after this list). Future work will seek to rectify this by adjusting the pipeline.
  • Initialising the weights using a Glorot uniform distribution produced better results than He initialisation. This was unexpected, given that the models in that trial were using ReLU activation.
  • The learning rate of 9e-4 was settled upon after only a few trials; due to time constraints, it was not possible to make use of a learning rate searcher. It is entirely possible, therefore, that the optimal learning rate was missed.
  • The Swish activation function was experimented with, but did not outperform LeakyReLU. However, because the activation function experiment took place earlier in the process, it is possible that the Swish trial did not see the benefits of an extended training time.
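
For reference, the augmentation in question was of the kind sketched below. ImageDataGenerator applies these transformations on the CPU batch by batch, which is where the GPU can end up starved; the parameter values here are illustrative, not the exact settings used:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation settings; each batch is transformed on
# the CPU at training time, which is the source of the bottleneck.
datagen = ImageDataGenerator(
    rotation_range=30,
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.2,
    horizontal_flip=True,
)

model.fit(datagen.flow(X_train, y_train, batch_size=32),
          steps_per_epoch=len(X_train) // 32,
          validation_data=(X_val, y_val),
          epochs=15)
```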

An interesting lesson from the process was the way in which the parameters interact with each other. Over the course of the project, many models were trained and evaluated, and the results appeared to depend on the order in which the adjustments were made. This suggests that a step-by-step approach, in which single parameters are tuned one at a time, is not optimal and may lead to missed opportunities for further improvement.

In any case, the best-performing model, model_13, is not that far from the goal of 90%. The project will be revisited in the future, with this model serving as the baseline for more experimentation.
