Open access peer-reviewed article
This Article is part of Artificial Intelligence Section
Version of Record (VOR)
*This version of record replaces the original advanced online publication published on 03/02/2026

Article Type: Research Paper
Date of acceptance: February 2026
Date of publication: February 2026
DOI: 10.5772/acrt20250101
Copyright: ©2025 The Author(s), Licensee IntechOpen, License: CC BY 4.0
Tomatoes are an important crop worldwide for both their nutritional and economic value. However, diseases affecting tomato leaves can reduce yield and quality if they are not detected early, and accurate classification of these diseases is especially critical for small- and medium-sized producers. This study comparatively assessed deep learning models employing transfer learning for the visual diagnosis of tomato leaf diseases. Several pretrained models were used: DenseNet121, MobileNetV2, ResNet50, EfficientNetB0, VGG16, VGG19, and InceptionV3. Each model was trained with one of four optimization algorithms: Adam, Stochastic Gradient Descent, Nadam, or AdamW. The experiments utilized the publicly available Tomato Leaf Disease Detection dataset, comprising 10 classes (9 diseases and 1 healthy class) and 10,000 images equally distributed across categories. An 80/20 training–testing split was applied, and an independent external test set was used to assess model generalization. Data augmentation, dropout, early stopping, and learning rate schedulers were used during training. Performance was measured by accuracy, loss values, and confusion matrices. The best result was obtained by the EfficientNetB0 model trained with the AdamW optimizer, which achieved 99.72% accuracy on the validation set and 99.68% accuracy on the test set. These results indicate that transfer learning techniques can support fast, accurate, and reliable diagnostic systems in agricultural technology.
Keywords: deep learning; optimization algorithms; tomato leaf diseases; transfer learning
Agriculture is one of the largest sectors providing food for people globally, and the world economy depends heavily on it; a sustainable and efficient world is hard to imagine without it. The rapid growth of the global population therefore requires the sector to expand at a comparable pace. The United Nations Development Programme (UNDP) [1] projects that the world population will reach 9.9 billion by 2050, implying that food production must increase by as much as 98% to meet future needs. Both product quality and the control of plant diseases are critical for achieving sustainable agriculture. In this context, detecting a plant disease at an early stage is of utmost importance to save crops, reduce pesticide use, and thereby minimize the economic impact [2]. Tomato (Solanum lycopersicum) is a widely grown crop around the world because it is suited to both fresh consumption and industrial use. However, the tomato plant is very vulnerable to many foliar diseases caused by bacteria, fungi, and viruses [3]. Bacterial Spot, Early Blight, Late Blight, Leaf Mold, Septoria Leaf Spot, Target Spot, Spider Mite damage, and Tomato Yellow Leaf Curl Virus are among the most common. If these diseases are not diagnosed correctly, the wrong chemicals may be applied, wasting time and money and directly reducing yield and quality [4]. Traditional diagnostic methods rest on expert agronomists' field observations, microscopic examinations, and laboratory tests; these steps are often time-consuming, expensive, and prone to error. For this reason, automated diagnostic systems based on image processing, machine learning, and especially deep learning have become increasingly popular in recent years [1, 5].
Convolutional neural networks (CNNs) are known for their strong visual recognition ability, and their classification performance has made them increasingly useful in agriculture [6]. Because training such models from scratch requires large amounts of labeled data, transfer learning and related methods have become very popular. In transfer learning, a model that has already been trained on a large dataset is reused for a new but similar task, making it a powerful tool for mitigating the shortage of training data [7].
The primary aim of this study is to systematically evaluate and compare the performance of multiple transfer learning-based deep learning architectures for the automated classification of tomato leaf diseases. Specifically, the study investigates the joint effect of different CNN architectures and optimization algorithms on classification accuracy and model robustness. By conducting a large-scale comparative analysis across diverse model–optimizer combinations, this research aims to identify the most effective configurations for accurate, reliable, and practical tomato leaf disease detection.
In this study, a dataset comprising 10,000 images, including nine diseased classes and one healthy class of tomato leaves, was obtained from the Kaggle platform for the visual classification of tomato leaf diseases. Based on this dataset, seven pretrained CNN models (DenseNet121, MobileNetV2, ResNet50, EfficientNetB0, VGG16, VGG19, and InceptionV3) were trained using four distinct optimization algorithms (Adam, Stochastic Gradient Descent [SGD], Nadam, and AdamW), and their performance was assessed through a series of comparative experiments. Overfitting was reduced and overall model performance enhanced through techniques such as data augmentation, dropout, learning rate scheduling, and early stopping during training. Among the combinations evaluated, the EfficientNetB0 and AdamW pairing performed best in both validation and testing, producing 99.65% validation accuracy and 99.60% test accuracy. Finally, the trained models were integrated into a Streamlit-based web interface that allows users to upload leaf images and receive real-time disease predictions.
The classification of tomato leaf diseases has been the focus of many papers; however, most recent research still relies on a single deep learning technique or a small number of models. Moreover, the effect of optimization algorithms on model performance remains underexplored. Past studies have mostly compared models, while the interaction between model architecture and optimizer choice, which directly shapes classification accuracy, has not been examined in depth. This study fills that gap by conducting a comparative analysis of seven transfer learning-based CNN architectures with four different optimization algorithms, resulting in 28 model–optimizer combinations. The authors thus offer a broad, well-rounded assessment and a significant contribution to the literature by emphasizing the joint influence of architectural and optimizer choices. In this way, the paper provides a comprehensive picture of how model and optimizer selection impact classification performance, identifying the most effective model–optimizer pairs for the automatic diagnosis of tomato leaf diseases. In addition, the integration of the developed system into an interactive web-based interface shows that the solution is not only of scholarly interest but also a practical tool that agricultural producers can readily use.
The second section gives a detailed account of the methods and techniques applied in the experimental process. The third section presents the experimental results along with their assessment and discussion. The final section summarizes the overall findings and suggests directions for future research.
Previous studies on tomato leaf disease detection face several limitations. Many approaches rely on single-model evaluations, restricting their generalizability across different architectures. Furthermore, insufficient attention has been given to optimizer selection, despite its critical role in training stability and convergence behavior. Numerous studies have applied CNN-based models to the classification of diseases in tomato leaves. Among them, Subramanian and Kumar presented a custom CNN model using multi-class classification for the efficient and precise detection of tomato leaf diseases; their model achieved a success rate of 94% in correctly diagnosing tomato leaf infections, indicating excellent classification performance [8]. Tm et al. used a lightweight CNN variant called LeNet and obtained an average accuracy of 94%–95%, demonstrating that even basic neural-network configurations can work under limited computing resources and difficult conditions [9]. Transfer learning has been heavily researched through comparative studies. Peyal and associates used transfer learning-based CNN architectures such as VGG-16, VGG-19, and Inception-V3 to classify tomato leaf diseases; a dataset of 11,000 images was used for the comparative analysis, which indicated that Inception-V3 achieved the highest classification accuracy [10]. Too et al. evaluated a wide range of CNN models, including VGG16, Inception V4, Residual Network (ResNet; with 50, 101, and 152 layers), and DenseNet121, on the comprehensive PlantVillage dataset. Their results showed that DenseNet121 outperformed the other models in both training stability and test accuracy [11].
In another study, researchers in Bangladesh compared a custom CNN model trained on a local tomato dataset with the YOLOv5, MobileNetV2, and ResNet18 architectures. The proposed CNN was designed for preprocessed tomato leaf images of 224 × 224 pixels with three color channels and contained four convolutional layers, each with a 2 × 2 kernel and "same" padding. The custom CNN produced the most accurate results at 95.2%, indicating that specialized architectures can perform very well when trained on tailored datasets [12]. Another paper reported that transfer learning based on ResNet could raise accuracy to 98.9%; the authors used multilayer perceptrons to classify features extracted from different layers of the ResNet101 and ResNet152 models. The mean deviation method improved accuracy further, and the results were considered useful for railway camera systems [13]. Sanida et al. created a model using a VGGNet pretrained on ImageNet together with two Inception blocks; this model reached a high accuracy of 99.23%, showing that their approach worked [14]. Moussafir et al. suggested a hybrid method combining transfer learning and fine-tuning to detect tomato leaf diseases. They tested the proposed model with several pretrained architectures, including ResNet50, VGG16, EfficientNetB0, EfficientNetB1, EfficientNetB2, EfficientNetB3, and EfficientNetB4; in tests, the hybrid model achieved 98.01% accuracy [15]. Rangarajan et al. employed VGG16 and AlexNet architectures on a tomato leaf dataset comprising six disease classes and one healthy class, using 13,262 images during training and testing. VGG16 achieved an accuracy of 97.29%, while AlexNet achieved a slightly higher 97.49% [16]. In another study, Ujawe et al.
compared the performance of various CNN models, including VGG16 Net, ResNet, GoogleNet, AlexNet, UNet, and SqueezeNet, for disease detection. AlexNet and VGG16 Net were the best models, with accuracy rates of 97.49% and 97.23%, respectively [17]. Mim et al. performed classification using a CNN architecture over seven classes, comprising six disease classes and one healthy class; the model was trained on almost 15,000 tomato leaf images and achieved 96.55% accuracy [18]. In a separate study, Kılıçarslan and Paçal employed Densely Connected Convolutional Network (DenseNet), ResNet50, and MobileNet architectures for detecting diseases in tomato leaves, with DenseNet performing best at an accuracy of 99.00% [19]. Recent research on lightweight real-time disease detection has highlighted the LT-YOLOv10n architecture, a streamlined variant of YOLOv10 augmented with attention mechanisms (CBAM) and enhanced C3f layers. The model was reported to reach 98.7% mAP50 for tomato leaf disease detection at an inference speed of 87 Frames Per Second (FPS) on the Jetson Orin Nano, and to suit mobile and IoT-based agricultural decision-support systems [20]. A pest detection study targeting Cydia pomonella created a real-time monitoring system by combining a Raspberry Pi-based trap, the Ubidots IoT platform, and the YOLOv10m model; it showed that YOLOv10m is highly accurate (89%) and works well even on resource-limited devices, while also helping reduce pesticide use and encouraging more environmentally friendly farming [21]. A study combining deep learning and IoT technologies found that using the YOLOv10n model with a remotely controlled robot system enabled real-time monitoring of diseases in strawberry fields.
The study, which used a Raspberry Pi 4B and an RGB camera setup, showed that diseased leaves could be found with a high level of accuracy, with a mAP-50 score of 96.78% [22]. To address challenges such as leaf occlusion, small disease regions, and complex backgrounds that hinder tomato leaf disease detection in intelligent agriculture applications, a deep learning-based disease detection network called “E-TomatoDet” was proposed. The model employs CSWinTransformer to capture global contextual information, multi-kernel modules to learn multi-scale local features, and a specialized feature pyramid structure to enhance feature integration. Experimental results demonstrate that the proposed approach improves the mAP50 by 4.7%, reaching 97.2% on the tomato leaf disease dataset compared to the baseline model, and outperforms advanced real-time detection models such as YOLOv10s [23].
Table 1 summarizes the deep learning methods applied in the literature to agricultural diagnostics, such as plant disease and pest detection, together with their reported accuracy rates.
| Publication | Year | Proposed Model | Accuracy (%) |
|---|---|---|---|
| [8] | 2024 | Custom CNN | 94 |
| [9] | 2018 | LeNet (lightweight CNN) | 94.00–95.00 |
| [10] | 2021 | InceptionV3 | 90 |
| [11] | 2018 | DenseNet121 | 99.75 |
| [12] | 2025 | Custom CNN | 95.2 |
| [13] | 2022 | ResNet101, ResNet152 + MLP | 98.9 |
| [14] | 2023 | VGGNet + Inception blocks | 99.23 |
| [15] | 2022 | Hybrid Model | 98.01 |
| [16] | 2018 | AlexNet | 97.49 |
| [17] | 2023 | AlexNet | 97.49 |
| [18] | 2019 | CNN | 96.55 |
| [19] | 2023 | DenseNet | 99 |
| [20] | 2025 | LT-YOLOv10n (lightweight YOLOv10 with CBAM + C3f) | 98.7 |
| [21] | 2024 | YOLOv10m (Raspberry Pi + Ubidots IoT) | 89 |
| [22] | 2024 | YOLOv10n (robot-integrated IoT) | 96.78 |
| [23] | 2025 | YOLOv10s (E-TomatoDet) | 97.2 |
Table 1. Proposed models and their accuracy rates reported in the literature.
The open-access “Tomato Leaf Disease Detection” dataset [24] from Kaggle was used in this study to sort tomato leaf diseases. There are 10 different classes in the dataset, 9 of which are for different leaf diseases and 1 for healthy leaves. There are pictures of leaves in each class that show different signs of the condition. The dataset has the following classes:
Tomato___Bacterial_spot;
Tomato___Early_blight;
Tomato___Late_blight;
Tomato___Leaf_Mold;
Tomato___Septoria_leaf_spot;
Tomato___Spider_mites Two-spotted_spider_mite;
Tomato___Target_Spot;
Tomato___Tomato_Yellow_Leaf_Curl_Virus;
Tomato___Tomato_mosaic_virus;
Tomato___healthy.
The dataset is organized into two primary components: a directory used during model development and another reserved for independent evaluation. The development directory contains a total of 10,000 images, equally allocated across 10 classes with 1,000 samples per category. To structure the learning workflow, this directory was internally partitioned, assigning 80% of the labeled images to the training phase and the remaining 20% to an internal test subset. The dataset’s original val directory was retained as an external test resource to assess the model’s ability to generalize beyond the data encountered during training. This configuration enabled an objective examination of model performance on entirely unseen samples. A class-wise breakdown of the samples across the training, internal validation, and final test sets is illustrated in Figure 1.
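The 80/20 partitioning described above can be sketched in plain Python. This is an illustrative, class-balanced version under our own assumptions (the study may have used framework utilities instead, and the helper name `stratified_split` is ours):

```python
import random

def stratified_split(labels, train_frac=0.8, seed=42):
    """Split sample indices into train/test subsets, preserving class balance."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for label, indices in by_class.items():
        rng.shuffle(indices)                      # randomize within each class
        cut = int(len(indices) * train_frac)      # 80% boundary per class
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test

# 10 classes with 1,000 images each, as in the dataset described above
labels = [c for c in range(10) for _ in range(1000)]
train_idx, test_idx = stratified_split(labels)
# yields 8,000 training and 2,000 internal test indices, 800/200 per class
```

Splitting per class rather than globally keeps each subset equally distributed across the 10 categories, matching the balanced design of the dataset.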

Figure 1. Image distribution per class.
Figure 2 shows example images belonging to each category of the dataset. The samples are visually clear, so the typical leaf-level symptom patterns can be easily recognized. Moreover, the large visual diversity among the categories benefits the model development pipeline, as it supports more reliable feature extraction and sharpens the distinction between classes.

Figure 2. Sample images of the classes in the tomato leaf disease dataset.
The quality of image preparation has a direct impact on how well the models learn. In this study, several adjustments were applied to the images so they would match the technical requirements of the transfer learning models used. As a first step, the images were resized according to the input size expected by each network: InceptionV3 works with images of 299 × 299 pixels, while DenseNet121, MobileNetV2, ResNet50, EfficientNetB0, VGG16, and VGG19 operate with 224 × 224 pixels. To make the models less sensitive to variations in the data, visual modifications were applied only to the training set. These modifications consisted of random flips, rotations varying from −15° to +15°, and random crops of various sizes that expose different parts of the leaf in each image. To widen the variety of input conditions, the training images were also subjected to brightness and contrast changes. These transformations allowed the models to handle tomato leaf symptoms captured under different lighting, angles, and scales. Afterward, normalization was performed using the statistics associated with the ImageNet dataset, applying RGB channel means of [0.485, 0.456, 0.406] and standard deviations of [0.229, 0.224, 0.225], thereby aligning the data with what ImageNet-based models expect. Random visual changes were not applied to the validation and test sets; only resizing and normalization were performed, so accuracy and loss could be evaluated under identical conditions for every image. As a result, the training-set modifications did not affect the evaluation phase. The transformations are summarized in Table 2, and Figure 3 illustrates how they are reflected in an image of a diseased leaf.
| Phase | Applied Transformations |
|---|---|
| Training | Resize → Random Flip → Rotation → Color Jitter → Normalize |
| Validation | Resize → Normalize |
| Test | Resize → Normalize |
Table 2. Image preprocessing transformations.

Figure 3. Augmented image samples.
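As an illustration of the normalization step described above, the following plain-Python sketch applies the ImageNet statistics to a single RGB pixel. Real pipelines typically chain torchvision transforms (Resize, RandomHorizontalFlip, RandomRotation, ColorJitter, Normalize); the helper `normalize_pixel` is our own, for clarity only:

```python
# Per-channel ImageNet statistics used for normalization, as described above
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)

def normalize_pixel(rgb):
    """Scale an 8-bit RGB pixel to [0, 1], then apply the per-channel
    ImageNet normalization (x - mean) / std."""
    return tuple(
        (channel / 255.0 - m) / s
        for channel, m, s in zip(rgb, IMAGENET_MEAN, IMAGENET_STD)
    )

# A pure-white pixel (255, 255, 255) maps to roughly (2.25, 2.43, 2.64)
```

Applying identical normalization to the training, validation, and test images is what keeps the evaluation conditions consistent with the distribution the ImageNet-pretrained backbones expect.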
Transfer learning is an approach in machine learning that allows knowledge gained from one task to be applied to another, related task. It removes the need to design and train a completely new model for every situation. Training a model from scratch often demands substantial computational resources, long processing times, and large, carefully labeled datasets, something that is rarely feasible in practice, as most datasets do not come with ready-made annotations. In transfer learning, a model that has already been trained on a labeled dataset is adapted to a new problem whose data may be similar in nature but lack labels [25]. In deep CNNs, the early layers usually detect simple visual patterns such as edges, color changes, and textures, whereas the deeper layers capture more abstract patterns tied directly to the specific task being learned [26]. For this reason, it is common to keep the earlier layers fixed and update only the final layers, which act as the classifier. This reduces the overall training effort and helps prevent the model from fitting too closely to small datasets, which can be an issue when working with limited data [27]. In this study, transfer learning was applied using this feature-extraction approach.
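A minimal PyTorch sketch of this feature-extraction style of transfer learning, using a tiny stand-in backbone of our own invention (in practice one would load a pretrained network, e.g. `torchvision.models.efficientnet_b0` with its ImageNet weights, and freeze it the same way):

```python
import torch
import torch.nn as nn

# Stand-in "pretrained" backbone: in a real setup this would be an
# ImageNet-pretrained network whose early layers already extract edges,
# colors, and textures.
backbone = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
for p in backbone.parameters():
    p.requires_grad = False          # early layers stay fixed

# New classifier head for the 10 tomato leaf classes; only this part trains.
head = nn.Linear(16, 10)
model = nn.Sequential(backbone, head)

trainable = [p for p in model.parameters() if p.requires_grad]
# Only the head's weight matrix and bias vector remain trainable.
```

Freezing the backbone means the optimizer touches only the small classifier, which is why this approach trains quickly and resists overfitting on modest datasets.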
Deep learning models are increasingly becoming the driving force behind research in various fields of computation, with computer vision being the most favored one in terms of the extent of their application. The list of tasks in which deep learning models have significantly improved performance includes image classification, animal and object detection, and face recognition, among others. CNNs, in this regard, have claimed the first spot as the model category and are still the most sought-after method for many vision-related issues [28]. The ability of CNNs to automatically extract features from images by learning meaningful visual patterns directly from the raw data is the major reason for their universal acceptance. The authors of the present work scrutinized some of the most celebrated CNN architectures, specifically, DenseNet121, MobileNetV2, ResNet50, EfficientNetB0, VGG16, VGG19, and InceptionV3. Such networks vary a lot in terms of their depth, the connections, the internal structure of their computational elements, and the input image sizes [29].
ResNet is a CNN architecture developed by He et al. (2016) to address the vanishing gradient problem, which becomes increasingly prominent as deep learning models grow deeper. In traditional CNN architectures, model accuracy tends to degrade after a certain network depth. ResNet overcomes this limitation by introducing residual connections, which allow for much deeper networks without suffering from training degradation [30]. Residual connections introduce shortcut paths that pass over one or more layers, enabling the network to retain the original input while simultaneously computing new representations. This design supports a steadier gradient flow and makes it practical to train much deeper networks, such as ResNet50 and ResNet101. The ResNet family relies on the idea of identity mapping, where the input of a block is added directly to its output. Instead of expressing the target function as given in Equation (1), the formulation is rewritten as shown in Equation (2).
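Equations (1) and (2) referenced above are not reproduced in the text; presumably they are the standard residual formulation from He et al., in which the target mapping $H(x)$ is either learned directly by the stacked layers or decomposed into a learned residual plus the identity:

```latex
% Equation (1): the stacked layers learn the target mapping directly
H(x) = \mathcal{F}(x)

% Equation (2): with a shortcut connection, the layers learn only the
% residual \mathcal{F}(x) = H(x) - x, and the input x is added back
H(x) = \mathcal{F}(x) + x
```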
In such a formulation, the neural network is supposed to learn only the residual signal; in other words, the residual signal is the part that distinguishes the input from the full output mapping. This approach enables the creation of networks that are much deeper in terms of layers, for instance, with 50, 101, or 152 layers, without the usual drop in performance that deeper models tend to suffer from. A 50-layer version of the series named ResNet50 was the chosen one for this scientific investigation.
DenseNet was developed by Huang et al. (2017) as a new type of CNN; its main drawbacks are the considerable training time it requires and the large size of the model [31]. In contrast to conventional CNNs, DenseNet connects every layer to all the others in a feed-forward manner, so the output of each layer is passed as input to all subsequent layers. This not only keeps the flow of information strong but also makes gradient propagation more efficient. In a DenseNet, each layer receives the feature maps of all preceding layers and passes its own feature maps to all following layers. This dense connectivity has several advantages:
Reduces the number of parameters, as it reuses previously learned features instead of relearning them,
Mitigates overfitting, by encouraging feature sharing across the network, and
Improves gradient flow, making it easier to train very deep networks.
In this study, we employed the DenseNet121 model, which is the 121-layer version of the architecture.
MobileNet models are compact CNNs introduced by Google for use in settings where memory and processing capacity are limited, such as mobile hardware and embedded platforms. They were designed with the intention of keeping the model footprint small and lowering the amount of computation required, while still producing competitive predictive results [32]. The main element of the MobileNet design is the depthwise separable convolution. This operation breaks the usual convolution into two stages: a depthwise step, in which each channel of the input is processed separately, and a pointwise step, where a 1 × 1 convolution brings the channel information together. Dividing the operation in this manner reduces both parameter count and floating-point computations – often by as much as 85% – with only a modest drop in model performance. The first version of the family, MobileNetV1, was introduced by Howard et al. (2017) [32]. A later revision, MobileNetV2, presented by Sandler et al. (2018), brought several important changes to the structure of the model [33]. The key updates included:
The use of inverted residual blocks, where a narrow input is first expanded, then processed through depthwise convolution, and finally projected back to a lower-dimensional output.
A shortcut connection is added from the input to the output.
Instead of using nonlinear activation functions like ReLU at the block’s end, a linear transformation is applied to reduce information loss.
MobileNetV3, the newest version of the MobileNet family, was introduced by Howard and his colleagues in 2019 [34]. This version uses Squeeze-and-Excitation (SE) blocks that give priority to the more informative features through channel-level weighting. In addition, the Hard-Swish activation function is used, chosen with mobile hardware constraints in mind, and the architecture is available in two versions: Small and Large. For the present work, the MobileNetV2 model was used for the experimental analysis.
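The parameter savings from the depthwise separable convolution described earlier can be checked with simple arithmetic. The channel sizes below (64 in, 128 out) are illustrative choices of ours, not values taken from MobileNet itself:

```python
# Parameter counts for one convolutional layer, bias terms omitted.
in_ch, out_ch, k = 64, 128, 3

standard = in_ch * out_ch * k * k      # one full 3x3 convolution
depthwise = in_ch * k * k              # per-channel 3x3 step
pointwise = in_ch * out_ch             # 1x1 channel-mixing step
separable = depthwise + pointwise

reduction = 1 - separable / standard   # fraction of parameters saved
# For this configuration the separable form uses about 88% fewer parameters,
# consistent with the "as much as 85%" figure quoted above.
```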
EfficientNet, introduced by Tan and Le in 2019, is a family of convolutional networks developed to make more deliberate use of computational resources in image classification tasks [35]. Unlike earlier CNN approaches that expand depth or width without a clear rule set, EfficientNet applies a coordinated method called compound scaling. Through this method, the depth, width, and input resolution of the network are increased according to shared scaling coefficients, resulting in a more orderly way of growing the architecture while keeping the number of parameters relatively low. The design is built around two important components:
MBConv blocks (Mobile Inverted Bottleneck Convolutions), which were originally introduced for mobile settings and rely on an inverted bottleneck structure.
SE blocks, which assign varying importance to channels to strengthen feature weighting.
EfficientNet also replaces the common ReLU activation with Swish, a smoother nonlinear function that supports better gradient behavior during training. In this study, the version used for experimentation was EfficientNetB0.
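The compound scaling rule can be sketched as follows. The base coefficients (α = 1.2, β = 1.1, γ = 1.15) are the values reported in the original EfficientNet paper, chosen under the constraint α · β² · γ² ≈ 2 so that each unit increase of the compound coefficient φ roughly doubles the FLOPs:

```python
# Base scaling coefficients from the EfficientNet paper (Tan & Le, 2019)
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def scale(phi):
    """Return (depth, width, resolution) multipliers for compound coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

d, w, r = scale(1)    # one step: ~1.2x deeper, ~1.1x wider, ~1.15x resolution
flops_growth = ALPHA * BETA ** 2 * GAMMA ** 2   # ~1.92, close to the target of 2
```

EfficientNetB0 corresponds to φ = 0 (the unscaled baseline), which is why it is the lightest member of the family and a natural choice for the experiments here.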
VGGNet was presented by Simonyan and Zisserman, and the model attracted much attention due to its excellent performance in the 2014 ImageNet Large Scale Visual Recognition Challenge (ILSVRC) [36]. The core idea behind the model is simple: make the network deeper so it can recognize more sophisticated visual patterns, while maintaining the same architectural structure across all layers. To achieve this uniformity, every convolutional layer uses 3 × 3 filters with a stride of 1. With this setting, the network builds a visual hierarchy: the first layers deal with very small local details, and successive convolutional layers progressively represent more abstract visual concepts. VGGNet is usually characterized by two main versions that differ only in the total number of layers:
VGG16, which consists of 13 convolutional layers and 3 fully connected layers, totaling 16 layers.
VGG19, which includes 16 convolutional layers and 3 fully connected layers, totaling 19 layers.
A max-pooling layer is inserted after every few convolutional layers to progressively reduce the spatial dimensions. At the end of the network, there are three fully connected layers followed by a softmax classifier for final prediction. The ReLU activation function is applied throughout all layers. In this study, both VGG16 and VGG19 architectures were employed as part of the transfer learning approach.
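The effect of stacking small filters, as in VGG's uniform 3 × 3 design, can be verified with a short calculation (the helper `receptive_field` is ours, for illustration):

```python
def receptive_field(num_convs, kernel=3):
    """Receptive field of a stack of stride-1 convolutions: it grows by
    (kernel - 1) pixels per layer."""
    rf = 1
    for _ in range(num_convs):
        rf += kernel - 1
    return rf

# Two stacked 3x3 convolutions see the same 5x5 region as a single 5x5
# filter, and three see a 7x7 region, while using fewer parameters and
# inserting extra nonlinearities between the layers.
```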
The Inception architecture was introduced by Szegedy et al. (2015) through the GoogLeNet model, one of the top performers at the 2014 ILSVRC. Its design introduced parallel branches operating with different-sized filters, a departure from the single, conventional layer-by-layer structure of previous CNNs; these components became known as Inception blocks [37]. The main idea is to apply different filter sizes simultaneously, for example 1 × 1, 3 × 3, and 5 × 5, in the same layer, covering both small-scale details and wider spatial patterns. The outputs of these parallel operations are then concatenated along the channel axis. In this work, an InceptionV3 model pretrained on the ImageNet dataset was used in a transfer learning setup. InceptionV3 builds on the previous versions with several architectural refinements; one illustration is the replacement of the computationally intensive 5 × 5 convolution with successive asymmetric 1 × 5 and 5 × 1 convolutions, a choice that lessens computation while still providing good-quality predictions. In this way, the model strikes a good compromise between depth, accuracy, and computational requirements.
Optimization methods are an integral part of training neural networks, since they control how the model's parameters are adjusted during the learning phase. Parameter updates are steered by the loss value, an indicator of the difference between the model's predicted output and the expected output; as this loss decreases, the model's outputs move closer to the target [38]. The optimization methods used in this study were SGD, Adam, AdamW, and Nadam, through which the impact of different update rules on transfer learning was examined. Each method has its own update pattern and hyperparameters, which can affect not only the speed of convergence but also the quality of the final prediction. The following subsections summarize the principal features of the optimization techniques included in this research.
SGD is one of the earliest optimizers used in neural network training. The model's weights are updated using gradients calculated from small random subsets of the training data. Although the technique is simple and generally produces dependable learning behavior, its convergence can be slow, especially when the learning rate is kept constant throughout training [39].
Adam, proposed by Kingma and Ba in 2015, is an optimization technique that uses gradient statistics to decide how to adjust each parameter [40]. It tracks two quantities: an exponential moving average of past gradients (often interpreted as momentum) and an average of the squared gradients, which indicates how variable the gradients are. Using these two statistics, Adam assigns each parameter its own effective learning rate, which changes over time rather than remaining constant. A downside of this method is that it may overfit the training data when the dataset is small.
AdamW is an Adam variant that addresses the original algorithm's improper handling of weight decay. Adam applied L2 regularization in a way that interfered with the weight update, whereas AdamW decouples weight decay from the gradient update, improving generalization and training stability [41]. A drawback is that careful hyperparameter tuning is needed to obtain the best performance.
Nadam combines Nesterov momentum with the Adam optimizer. In contrast to the regular momentum updates in Adam, Nadam applies the momentum term by anticipating the upcoming gradient. This look-ahead permits better-informed, more balanced updates, resulting in a smoother convergence curve [42].
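A minimal PyTorch sketch of how these four optimizers can be instantiated is shown below. The `nn.Linear` model is a stand-in for a CNN backbone, and the AdamW weight-decay value of 0.01 is the PyTorch default, assumed here for illustration.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in for a CNN backbone
lr = 0.001                # a common starting learning rate

optimizers = {
    # Classical SGD with momentum
    "SGD": torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9),
    # Adaptive methods; betas are the PyTorch defaults (beta1=0.9, beta2=0.999)
    "Adam": torch.optim.Adam(model.parameters(), lr=lr, betas=(0.9, 0.999)),
    "Nadam": torch.optim.NAdam(model.parameters(), lr=lr, betas=(0.9, 0.999)),
    # AdamW decouples weight decay from the gradient update;
    # weight_decay=0.01 is the PyTorch default, assumed here
    "AdamW": torch.optim.AdamW(model.parameters(), lr=lr,
                               betas=(0.9, 0.999), weight_decay=0.01),
}
```

Swapping the optimizer while keeping all other training settings fixed, as done in this study, isolates the effect of the update rule itself.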
All experiments in this research were carried out on a single NVIDIA A100 GPU made available via Google Colab. The study considered seven CNN models (DenseNet121, MobileNetV2, ResNet50, EfficientNetB0, VGG16, VGG19, and InceptionV3), all starting from ImageNet-pretrained weights and fine-tuned through transfer learning. Each network was trained with four different optimizers (Adam, SGD, AdamW, and Nadam), yielding 28 model–optimizer combinations in total. A uniform training procedure was maintained for all models so that comparisons between settings remained valid. The models were trained with a batch size of 32 and a learning rate of 0.001 for all optimizers. The moment coefficients for Adam, AdamW, and Nadam were set to the PyTorch defaults (β1 = 0.9 and β2 = 0.999), while SGD was used with a momentum of 0.9. A ReduceLROnPlateau scheduler controlled the learning rate with a reduction factor of 0.5 and a patience of three epochs. Training was stopped early if the validation loss did not improve for five successive epochs, which saved computational resources and reduced the risk of overfitting. Keeping the computational environment and training conditions identical gave each setting a fair test. The framework for the tomato leaf disease classification process is shown in Figure 4.
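The scheduler and early-stopping logic described above can be sketched as follows. The per-epoch validation losses are dummy values used only to make the example runnable; in the actual pipeline they would come from evaluating the model after each epoch.

```python
import torch

model = torch.nn.Linear(10, 10)          # stand-in model
optimizer = torch.optim.AdamW(model.parameters(), lr=0.001)
# Halve the learning rate after 3 epochs without validation improvement
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3)

# Dummy per-epoch validation losses, standing in for real evaluation
val_losses = [1.0, 0.8, 0.7, 0.7, 0.71, 0.72, 0.73, 0.74, 0.75]

best_loss, patience, wait = float("inf"), 5, 0
for epoch, val_loss in enumerate(val_losses):
    scheduler.step(val_loss)
    if val_loss < best_loss:
        best_loss, wait = val_loss, 0    # improvement: reset the counter
    else:
        wait += 1
        if wait >= patience:             # early stop after 5 stagnant epochs
            break

print(f"stopped at epoch {epoch}, best validation loss {best_loss}")
```

With these dummy losses, training stops five epochs after the last improvement, mirroring the early-stopping criterion used in the study.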

Model workflow diagram.
Each model was trained for up to 100 epochs during the experiments. Table 3 presents the best validation accuracy and the final test accuracy for each model and optimizer combination. The data in the table indicate that the choice of model architecture and optimization algorithm has a direct impact on classification performance.
| Model | Optimizer | Best Validation Accuracy | Test Accuracy |
|---|---|---|---|
| EfficientNetB0 | AdamW | 0.9972 | 0.9968 |
| VGG19 | SGD | 0.9935 | 0.9955 |
| DenseNet121 | Adam | 0.9960 | 0.9948 |
| MobileNetV2 | SGD | 0.9965 | 0.9945 |
| ResNet50 | AdamW | 0.9948 | 0.9942 |
| ResNet50 | Adam | 0.9942 | 0.9940 |
| EfficientNetB0 | SGD | 0.9968 | 0.9925 |
| InceptionV3 | SGD | 0.9950 | 0.9915 |
| ResNet50 | SGD | 0.9958 | 0.9905 |
| EfficientNetB0 | Adam | 0.9988 | 0.9895 |
| MobileNetV2 | Nadam | 0.9960 | 0.9892 |
| DenseNet121 | Nadam | 0.9940 | 0.9885 |
| InceptionV3 | AdamW | 0.9940 | 0.9875 |
| ResNet50 | Nadam | 0.9945 | 0.9872 |
| DenseNet121 | SGD | 0.9940 | 0.9865 |
| DenseNet121 | AdamW | 0.9945 | 0.9862 |
| MobileNetV2 | AdamW | 0.9948 | 0.9855 |
| VGG16 | SGD | 0.9845 | 0.9820 |
| EfficientNetB0 | Nadam | 0.9890 | 0.9785 |
| InceptionV3 | Nadam | 0.9825 | 0.9760 |
| InceptionV3 | Adam | 0.9795 | 0.9725 |
| MobileNetV2 | Adam | 0.9810 | 0.9680 |
| VGG16 | Adam | 0.8780 | 0.8520 |
| VGG16 | AdamW | 0.8210 | 0.8050 |
| VGG19 | Adam | 0.7420 | 0.7350 |
| VGG19 | Nadam | 0.1410 | 0.1320 |
| VGG16 | Nadam | 0.1250 | 0.1180 |
| VGG19 | AdamW | 0.1050 | 0.1020 |
Performance metrics for each model–optimizer combination.
The most effective configuration was EfficientNetB0 + AdamW, which achieved the highest test accuracy (99.68%) alongside a validation accuracy of 99.72%. It was followed by VGG19 + SGD (99.55%), DenseNet121 + Adam (99.48%), MobileNetV2 + SGD (99.45%), and ResNet50 + AdamW (99.42%). These findings indicate that the EfficientNetB0 model owes its high accuracy to its efficient use of parameters, while the AdamW optimizer promotes generalization through proper application of weight decay. The strong test results of SGD in certain configurations, such as VGG19 and VGG16, confirm that traditional optimization methods remain viable choices. In contrast, the adaptive algorithms Adam and AdamW produced better results with deeper, more modern designs (DenseNet, ResNet, and EfficientNet). This suggests that the choice of optimizer should be made in conjunction with the model architecture. The Nadam optimizer was usually inferior to the other three in terms of accuracy, and it particularly struggled with the VGG architectures; test accuracy dropped to 13.20% for VGG19 + Nadam and 11.80% for VGG16 + Nadam. These results suggest that Nadam may not be suitable for the dataset and architectural configurations used in this study.
Beyond accuracy, the study also examined how the best-performing models operate in practice. Several practical indicators were recorded for the top model–optimizer pairs: the number of trainable parameters, the on-disk size of the model, and the average processing time for a single image. These metrics were measured on an NVIDIA A100 GPU in Google Colab with a batch size of one, and results were averaged over 100 forward passes. The findings reveal a substantial gap between small networks suited for mobile use and the large architectures. For instance, the MobileNetV2–SGD pairing stands out as the fastest and most compact option: it has approximately 2.2 million trainable parameters, occupies about 8.8 MB of storage, and processes one image in around 25 ms. In contrast, VGG19 trained with SGD has more than 30 times the parameters of EfficientNetB0, takes proportionally more storage, and its inference is significantly slower, around 12 times slower than EfficientNetB0. These differences highlight the practical limitations of large CNNs in real-time decision-making under the resource constraints typical of field-level agricultural systems. Among the tested models, the EfficientNetB0 + AdamW and DenseNet121 + Adam combinations offer a feasible compromise: they deliver excellent prediction performance while remaining computationally frugal, making them good candidates for edge-level deployment. The results indicate that computational costs must be considered alongside accuracy when selecting deep learning models for operational precision-agriculture environments. Table 4 presents the computational efficiency comparison of the top-performing model–optimizer combinations in terms of parameter count, model size, and inference time per image.
| Model–Optimizer | Parameters | Model Size (MB) | Inference Time (ms/image) | Input Size |
|---|---|---|---|---|
| EfficientNetB0 + AdamW | 4,020,358 | 15.68 | 42.164 | 224 × 224 |
| VGG19 + SGD | 139,611,210 | 532.60 | 536.307 | 224 × 224 |
| DenseNet121 + Adam | 6,964,106 | 27.20 | 114.867 | 224 × 224 |
| MobileNetV2 + SGD | 2,236,682 | 8.80 | 24.591 | 224 × 224 |
Computational efficiency comparison of the top-performing model-optimizer combinations.
Overall, the results clearly demonstrate that transfer learning-based models can achieve high classification accuracy even with limited data. However, this performance is strongly influenced by both the selected architecture and the optimization algorithm. According to our experimental findings, the most successful model–optimizer combination in terms of test accuracy was EfficientNetB0 + AdamW. The accuracy and loss values related to the training process are presented in Figures 5 and 6. The confusion matrix generated for the EfficientNetB0 + AdamW combination is shown in Figure 7.

Epoch-wise accuracy progression for the EfficientNetB0 + AdamW model.

Epoch-wise loss progression for the EfficientNetB0 + AdamW model.
Figure 5 compares the training and validation accuracies per epoch for the EfficientNetB0 + AdamW combination. The model quickly reached high accuracy in the early epochs, and the validation accuracy consistently improved over time. The accuracy values exceeded 99%, indicating strong performance in both the training and validation phases. The parallel trend of the curves suggests that the model maintained its generalization ability without overfitting. Examining the loss values presented in Figure 6, both the training and validation loss curves showed a steady downward trend. The validation loss curve closely followed the training loss curve, indicating that the learning process was stable and well balanced. This also demonstrates the effectiveness of the applied techniques such as data augmentation, dropout, and learning rate scheduling.

Confusion matrix of the EfficientNetB0 + AdamW model on the test data.
Figure 7 presents the confusion matrix generated for the EfficientNetB0 + AdamW combination. The model classified each class with high accuracy, achieving over 99% success across all 10 classes. The most notable confusion was observed between the classes "Tomato___Bacterial_spot," "Tomato___Target_Spot," and "Tomato___Tomato_Yellow_Leaf_Curl_Virus." The principal cause of these confusions is the strong visual resemblance of the symptoms these diseases display under certain conditions. Both Bacterial Spot and Target Spot produce necrotic lesions on the leaf surface, and even experienced pathologists can find it difficult to distinguish lesions that vary in color, size, and shape. Likewise, at the earliest stage of Yellow Leaf Curl Virus infection, chlorosis can appear that closely resembles some fungal diseases. These visual overlaps, together with the limited intra-class variation in the affected dataset categories, caused the model to be uncertain in ambiguous cases. The slight misclassifications are therefore attributed to intrinsic visual similarities and overlapping symptoms in the dataset, rather than to any insufficiency of the model architecture. Moreover, these misclassifications were limited to a single occurrence each and had little effect on the model's generalization performance. The model achieved 100% accuracy in categories where the distinctions were clear, such as healthy leaves and Tomato mosaic virus. Table 5 presents the class-wise classification report for the best-performing model, EfficientNetB0 + AdamW, including precision, recall, F1-score, and support values. The results indicate consistently high performance across all disease classes, with only minor variations observed in visually similar categories.
| Class | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Tomato___Bacterial_spot | 0.9802 | 0.99 | 0.9851 | 100 |
| Tomato___Early_blight | 1 | 1 | 1 | 100 |
| Tomato___Late_blight | 0.9901 | 1 | 0.995 | 100 |
| Tomato___Leaf_Mold | 1 | 1 | 1 | 100 |
| Tomato___Septoria_leaf_spot | 1 | 1 | 1 | 100 |
| Tomato___Spider_mites | 1 | 0.99 | 0.995 | 100 |
| Tomato___Target_Spot | 0.99 | 0.99 | 0.99 | 100 |
| Tomato___Tomato_Yellow_Leaf_Curl_Virus | 1 | 0.99 | 0.995 | 100 |
| Tomato___Tomato_mosaic_virus | 1 | 1 | 1 | 100 |
| Tomato___healthy | 1 | 1 | 1 | 100 |
| Macro average | 0.996 | 0.996 | 0.996 | 1,000 |
| Weighted average | 0.996 | 0.996 | 0.996 | 1,000 |
Class-wise classification report for EfficientNetB0 + AdamW.
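Per-class reports like Table 5 are commonly produced with scikit-learn. The sketch below uses toy labels for three hypothetical classes to show the mechanics; it is not the study's actual predictions, which came from the 1,000-image test split (100 images per class).

```python
from sklearn.metrics import classification_report, confusion_matrix

# Toy ground truth and predictions for three illustrative classes,
# with a single bacterial_spot -> target_spot confusion.
labels = ["bacterial_spot", "target_spot", "healthy"]
y_true = ["bacterial_spot"] * 4 + ["target_spot"] * 4 + ["healthy"] * 4
y_pred = (["bacterial_spot"] * 3 + ["target_spot"]
          + ["target_spot"] * 4 + ["healthy"] * 4)

# Rows: true class, columns: predicted class
print(confusion_matrix(y_true, y_pred, labels=labels))
# Precision, recall, F1-score, and support per class
print(classification_report(y_true, y_pred, digits=4))
```

The `digits=4` argument matches the four-decimal precision used in Table 5.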
The VGG19 model coupled with the Nadam optimizer was, by a wide margin, the worst of all models assessed. Figure 8 depicts the loss values associated with both training and validation for the model that was trained utilizing the VGG19 architecture and the Nadam optimization algorithm, whereas Figure 9 shows the change in accuracy values over the epochs. The confusion matrix for the VGG19 + Nadam model is presented in Figure 10.

Epoch-wise variation of loss values for the VGG19 + Nadam model.

Epoch-wise variation of accuracy values for the VGG19 + Nadam model.
The loss graph in Figure 8 shows considerable fluctuation in the training loss during the first few epochs, after which it stabilized fairly quickly. The validation loss remained low throughout training, suggesting the model was not strongly prone to overfitting. However, the early fluctuations indicate that the model's learning process was unstable. The accuracy graph for the VGG19 model with the Nadam optimizer in Figure 9 shows that both training and validation accuracy were very low and varied considerably across epochs. The VGG19 model, particularly when paired with the Nadam optimizer, classified the dataset poorly and had difficulty distinguishing the target classes.

Confusion matrix of the VGG19 + Nadam model on the test data.
An examination of the confusion matrix presented in Figure 10 reveals that the model achieved moderate classification performance in some classes, such as Tomato___Target_Spot, Tomato___Tomato_mosaic_virus, and Tomato___Septoria_leaf_spot. However, significant misclassifications were observed, particularly in the Tomato___Bacterial_spot, Tomato___Early_blight, and Tomato___Late_blight classes. For instance, only 21 of the 100 test samples in the Bacterial Spot class were correctly classified, while 71 instances were incorrectly predicted as Leaf Mold. This indicates that the model failed to learn discriminative features between visually similar classes.

The results obtained in this study show that the Nadam optimization algorithm leads to significant performance degradation when used in combination with VGG architectures. Although VGG architectures are deep, they are highly homogeneous, repetitive, and based on an older structural design; the long sequence of consecutive convolutional layers makes these models particularly sensitive to gradient flow. Nadam, meanwhile, combines Nesterov momentum with Adam's adaptive learning-rate mechanism, which tends to produce aggressive and highly variable parameter updates, especially during the early epochs. Taken together, these two characteristics suggest the following hypothesis: adaptive learning rates may trigger excessive gradient updates within the flat, repetitive structure of VGG networks, leading to gradient oscillation and instability during learning. In VGG architectures, where low-level filters are heavily concentrated, the large-step updates produced by adaptive methods may prevent the model from converging toward optimal weight regions. Therefore, Nadam's fast yet aggressive optimization strategy does not form a good synergy with older, structurally simpler architectures such as VGG.
These findings indicate that modern adaptive optimizers do not perform equally well across all architectures, and that the interaction between the optimizer and the network architecture plays a critical role in determining model performance.
Considering all performance metrics together in our experimental study, it was concluded that the EfficientNetB0 + AdamW combination demonstrated the most balanced and superior performance in terms of both learning capability and generalization ability.
To verify that the accuracy values from different runs were consistent, the strongest model–optimizer combinations were each trained three times, with a different random seed used to initialize the weights in each run. Rather than reporting a single value, the validation and test accuracies of the three runs were characterized by their mean and standard deviation. As shown in Table 6, the best validation summary belongs to EfficientNetB0 with AdamW, with a mean of 0.9947 and a standard deviation of 0.0015. Two configurations were very close behind: MobileNetV2 with SGD, which produced identical accuracy in all three runs (0.9935 ± 0.0000), and VGG19 with SGD, with a mean accuracy of 0.9917 ± 0.0006. DenseNet121 with Adam also performed well but less stably (0.9892 ± 0.0067), while the validation accuracy of ResNet50 with Adam fluctuated the most (0.9740 ± 0.0262). Test accuracies largely mirrored the validation trends. MobileNetV2–SGD and VGG19–SGD showed excellent and very stable generalization, yielding 0.9880 ± 0.0000 and 0.9890 ± 0.0017, respectively. EfficientNetB0–AdamW and DenseNet121–Adam reached the same average test accuracy (0.9847), though with different spreads around that mean. Collectively, the results indicate that the ranking of the models is not an artifact of random seeds; the differences among models are reproducible, which lends the evaluation statistical reliability.
| Model | Optimizer | Runs | Mean Val Accuracy | Std Val Accuracy | Mean Test Accuracy | Std Test Accuracy |
|---|---|---|---|---|---|---|
| DenseNet121 | Adam | 3 | 0.9892 | 0.0067 | 0.9847 | 0.0117 |
| EfficientNetB0 | AdamW | 3 | 0.9947 | 0.0015 | 0.9847 | 0.0062 |
| MobileNetV2 | SGD | 3 | 0.9935 | 0.0000 | 0.9880 | 0.0000 |
| ResNet50 | Adam | 3 | 0.9740 | 0.0262 | 0.9765 | 0.0219 |
| VGG19 | SGD | 3 | 0.9917 | 0.0006 | 0.9890 | 0.0017 |
Mean ± standard deviation of validation and test accuracy across three random seed runs for the top five model–optimizer combinations.
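The summary statistics in Table 6 can be computed from per-seed accuracies as below. The per-run values shown are hypothetical: they are chosen so that the printed summaries match the reported MobileNetV2 + SGD and VGG19 + SGD test figures, but the paper does not publish the raw per-run numbers.

```python
import statistics

# Hypothetical per-seed test accuracies (illustrative values only;
# the raw per-run numbers are not published in the paper)
runs = {
    "MobileNetV2 + SGD": [0.9880, 0.9880, 0.9880],
    "VGG19 + SGD":       [0.9910, 0.9880, 0.9880],
}
for name, accs in runs.items():
    mean = statistics.mean(accs)
    std = statistics.stdev(accs)   # sample standard deviation
    print(f"{name}: {mean:.4f} ± {std:.4f}")
# MobileNetV2 + SGD: 0.9880 ± 0.0000
# VGG19 + SGD: 0.9890 ± 0.0017
```

Note that the sample standard deviation (`statistics.stdev`) rather than the population form reproduces the reported ± values.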
To further validate the reliability of the observed performance differences, a statistical significance analysis was conducted on the top-performing model–optimizer combinations. EfficientNetB0 + AdamW, MobileNetV2 + SGD, and VGG19 + SGD were selected for comparison, as they achieved the highest and most stable test accuracies across multiple runs. A paired t-test was applied to the test accuracy values obtained from three independent runs with different random seeds in order to assess whether the observed differences were statistically meaningful. The results indicated that the performance differences between these models were not statistically significant (P > 0.05), suggesting that they exhibit comparable generalization capability. This finding implies that the small accuracy differences among the top models fall within the variation expected from random initialization, while each model's behavior across runs remained stable and consistent.
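A paired t-test over per-seed accuracies can be run with SciPy as sketched below. The two accuracy lists are hypothetical illustrations, since the raw per-run values are not published; only the procedure mirrors the one described above.

```python
from scipy import stats

# Hypothetical per-seed test accuracies for two configurations
# (illustrative values; not the study's raw per-run numbers)
effnet_adamw  = [0.9910, 0.9790, 0.9840]
mobilenet_sgd = [0.9880, 0.9880, 0.9880]

# Paired t-test: runs are matched by random seed
t_stat, p_value = stats.ttest_rel(effnet_adamw, mobilenet_sgd)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
```

With only three runs the test has very low power, which is one reason the paper suggests more repetitions in future work.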
The results obtained in this study are largely consistent with the trends reported in previous research on tomato leaf disease classification using deep learning models, while also extending the current literature in several important aspects. Earlier studies have demonstrated that CNN-based and transfer learning-based approaches can achieve high classification accuracy, typically ranging between 94% and 99% depending on the model architecture and dataset characteristics [8-19]. For instance, custom CNN architectures and lightweight models such as LeNet have achieved accuracies around 94%–95% [8, 9], whereas transfer learning-based approaches employing VGG, DenseNet, ResNet, and Inception architectures have reported higher performance, in some cases exceeding 98% accuracy [10–16, 19]. The high accuracy achieved by EfficientNetB0 + AdamW in this study is therefore in line with these findings, confirming the effectiveness of modern transfer learning architectures for tomato leaf disease detection.
One of the key strengths of the proposed framework is its emphasis on robustness, reproducibility, and practical deployment considerations. The use of multiple random seed runs, statistical significance testing, and the inclusion of computational metrics such as model size and inference time allows for a more realistic evaluation of model suitability in real-world agricultural settings. In particular, the findings highlight that lightweight models such as MobileNetV2 + SGD can achieve performance comparable to more complex architectures while offering substantial advantages in efficiency and deployability. Despite these strengths, the proposed approach has certain limitations. The experiments were conducted on a single benchmark dataset with controlled imaging conditions, which may not fully capture the variability encountered in real field environments. Additionally, although three independent runs provide a baseline assessment of robustness, future studies could strengthen statistical reliability by increasing the number of repetitions and incorporating cross-dataset validation. Addressing these limitations by evaluating the framework on diverse, real-world datasets and integrating additional environmental factors constitutes an important direction for future research.
This study investigated the identification and classification of tomato leaf diseases from image data using convolutional models based on transfer learning. A reference dataset of 10 annotated classes was used, and CNN backbones including DenseNet121, MobileNetV2, ResNet50, EfficientNetB0, VGG16, VGG19, and InceptionV3 were trained with different parameter-update schemes, namely Adam, SGD, Nadam, and AdamW. The pairing of EfficientNetB0 with AdamW consistently produced the best configuration: its validation accuracy reached 99.72%, and the test accuracy was just under that at 99.68%, showing consistent performance across both evaluation phases. The confusion matrix analysis showed that almost all classes were detected accurately, with only a very few misclassifications, meaning the model could recognize subtle differences between disease symptoms. These results imply that the proposed system could serve as a real-time analytical tool for farmers, making its adoption valuable for agricultural decision-making applications, especially early disease detection. They also point to the capability of transfer learning-based CNN models in scenarios where training data are limited. Adapting the system for mobile or real-time field applications appears within reach, and evaluating the method on other plant species or disease groups could help make it a more widely applicable digital agriculture solution.
Özden Havadar: Conceptualization, Methodology, Software, Data Curation, Investigation, Formal Analysis, Visualization, Writing – Original Draft; Serhat Kiliçarslan: Supervision, Methodology, Validation, Writing – Review & Editing.
Not applicable.
Not applicable.
The source data used in this study are publicly available through the Kaggle dataset repository.
The authors declare no conflict of interest.
© The Author(s) 2025. Licensee IntechOpen. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.