A Convolutional Neural Network for the Classification of Different Skin Lesions

A CNN for Skin Cancer Detection

Skin cancer is a global health concern, with millions of cases diagnosed each year. Basal and squamous cell carcinomas are the most common types and are highly treatable. Melanoma, though a rarer type of skin cancer, accounts for the majority of skin cancer-related deaths. If caught early, however, the 5-year survival rate for melanoma is over 99%, highlighting the critical importance of timely and accurate diagnosis.

Dermatologists are highly skilled at classifying skin lesions as benign or malignant and at determining the type of skin cancer. However, visually inspecting the entire surface of the skin is time-consuming and remains challenging even for experienced professionals, as the different lesion types present with a high degree of visual similarity. Still, each lesion type is characterized by unique features and subtle patterns. This makes skin cancer classification an ideal task for a deep learning model: by leveraging large datasets of medical images, such models can be trained to recognize the subtle, intricate patterns that differentiate cancerous from benign lesions. This has the potential not only to assist clinicians and improve diagnostic accuracy, but also to increase cost efficiency.

We used PyTorch to develop and train a CNN for the classification of skin lesions.

Data Preprocessing

The training of a CNN requires a large amount of data. We therefore downloaded the HAM10000 dataset from kaggle.com. It contains 10015 images of skin lesions, each belonging to one of seven categories: actinic keratoses (327), basal cell carcinoma (514), benign keratosis-like lesions (1099), dermatofibroma (115), melanoma (1113), melanocytic nevi (6705), and vascular lesions (142). Of these seven categories, three lesion types are benign (melanocytic nevi, dermatofibroma, and benign keratosis-like lesions). A key challenge immediately became apparent: the severe class imbalance, with melanocytic nevi making up the vast majority of the dataset. While our initial goal was a seven-class classification, this imbalance made it an unrealistic target. We therefore decided to simplify the problem to a binary classification: "benign" versus "malignant."
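As a minimal sketch, this relabeling into two classes could look as follows, assuming the standard HAM10000_metadata.csv file and its dx column from the Kaggle download (the diagnosis codes are the dataset's abbreviations for the seven categories listed above):

```python
import pandas as pd

# Load the lesion metadata (file and column names follow the public HAM10000 layout)
metadata = pd.read_csv("HAM10000_metadata.csv")

# The three benign categories named above: melanocytic nevi, dermatofibroma,
# benign keratosis-like lesions; every other diagnosis is treated as malignant.
BENIGN_CODES = {"nv", "df", "bkl"}

metadata["label"] = metadata["dx"].apply(lambda dx: 0 if dx in BENIGN_CODES else 1)

# Inspect the remaining benign/malignant imbalance
print(metadata["label"].value_counts())
```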

Still, the classes "malignant" and "benign" were imbalanced. To prevent the model from becoming biased towards the majority class ("benign"), we employed several preprocessing techniques, for example:

Data Augmentation: We applied various transformations (horizontal and vertical flips, rotation, color jitter, resizing, and cropping) to the images of the underrepresented class; see the augmentation sketch after this list.

Weighted Loss Function: We implemented a weighted loss function that assigns a higher penalty to errors made on the underrepresented class ("malignant"). By making it more "costly" for the model to misclassify a malignant lesion, we encouraged it to pay more attention to these cases; a minimal example follows below.
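A sketch of such an augmentation pipeline with torchvision could look like this (the parameter values shown are illustrative placeholders, not our tuned settings):

```python
from torchvision import transforms

# Augmentations applied to images of the underrepresented ("malignant") class
augment_malignant = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.RandomVerticalFlip(p=0.5),
    transforms.RandomRotation(degrees=20),
    transforms.ColorJitter(brightness=0.1, contrast=0.1, saturation=0.1),
    transforms.Resize(256),          # rescale the shorter side
    transforms.RandomCrop(224),      # random crop to the network's input size
    transforms.ToTensor(),
])
```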
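The weighted loss can be realized with the class weights built into PyTorch's cross-entropy loss; the weight values below are placeholders, e.g. derived from inverse class frequencies:

```python
import torch
import torch.nn as nn

# Index 0 = "benign", index 1 = "malignant"; the higher weight makes
# misclassifying a malignant lesion more costly.
class_weights = torch.tensor([1.0, 4.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

# Inside the training loop:
# loss = criterion(logits, targets)   # logits: (batch, 2), targets: (batch,)
```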

Model Architecture

Our model was designed to handle three distinct types of input: the lesion image, the patient's age, and the localization of the lesion on the body. This allows the model to leverage additional patient information that can be relevant for diagnosis.

The core of our model is the image-processing lane, which consists of multiple convolutional layers. These layers learn hierarchical features from the image data, from simple edges to more complex patterns. To enhance performance and training stability, we incorporated batch normalization and skip connections.

Batch normalization allows for larger learning rates and helps reduce the risk of overfitting. Skip connections allow information to bypass one or more layers, creating a shortcut for the gradient during backpropagation and effectively mitigating the problem of vanishing or exploding gradients.
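A minimal residual block combining these two ideas might look like this (a sketch of the pattern, not our exact layer configuration):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two 3x3 convolutions with batch normalization and an identity skip connection."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                                  # the "shortcut" for the gradient
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)              # skip connection
```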

The image-processing lane produces a feature vector, which is combined with the patient's age and the localization of the lesion. This combined representation is passed to a final linear layer for the binary classification into "benign" or "malignant."
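Putting the pieces together, the fusion of the image features with age and localization could be sketched as follows (layer sizes, the one-hot encoding of the localization, and the reuse of the ResidualBlock from above are illustrative assumptions, not our exact architecture):

```python
import torch
import torch.nn as nn

class SkinLesionNet(nn.Module):
    """Sketch: CNN image lane fused with age and localization for binary classification."""

    def __init__(self, num_localizations: int = 15):
        super().__init__()
        self.image_lane = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2),
            ResidualBlock(32),                # residual block as sketched above
            nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),          # -> (batch, 32, 1, 1)
            nn.Flatten(),                     # -> (batch, 32)
        )
        # 32 image features + 1 age value + one-hot encoded localization
        self.classifier = nn.Linear(32 + 1 + num_localizations, 2)

    def forward(self, image, age, localization):
        # image: (batch, 3, H, W), age: (batch, 1), localization: (batch, num_localizations)
        features = self.image_lane(image)
        combined = torch.cat([features, age, localization], dim=1)
        return self.classifier(combined)
```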

Results

The overall model accuracy reaches 83.9%. While this is a reasonable starting point for model development, it is clearly insufficient for any real-world application. In addition, while the metrics for assessing model quality are acceptable for the majority class "benign," accuracy and recall are significantly worse for the class "malignant." This imbalance in performance persisted even after applying data augmentation and a weighted loss function.
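Per-class metrics of this kind can be obtained, for example, with scikit-learn's classification report (a generic snippet with placeholder predictions, not our evaluation code):

```python
from sklearn.metrics import classification_report

# y_true / y_pred would be collected over the held-out test set;
# the short lists below are placeholders so the snippet runs stand-alone.
y_true = [0, 0, 0, 1, 1, 0, 1, 0]
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]

# Reports precision, recall and F1 separately for "benign" and "malignant"
print(classification_report(y_true, y_pred, target_names=["benign", "malignant"]))
```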

We also tested the model on a new dataset and found that classification performance was significantly worse than on the training dataset, indicating persisting issues with generalization.

These results highlight that imbalanced datasets remain a challenge for deep learning models. Although we applied data augmentation, the amount of information on the different classes of skin lesions in the dataset remains unevenly distributed. Other tools we applied, such as the weighted loss function, improved model performance but were not sufficient to close the gap. This project therefore underlines that issues inherent to the training data cannot always be compensated for.

Source
Tschandl, P., Rosendahl, C. & Kittler, H. The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Sci. Data 5, 180161 (2018). doi:10.1038/sdata.2018.161. Test set: https://www.kaggle.com/datasets/jaiahuja/skin-cancer-detection

Team & Roles

Mahboubeh Abdighara

Dataset selection, data preprocessing, data augmentation, simple model architecture and training, testing the finalized model, saving the model

Frauke Heins

Dataset selection, data preprocessing, data augmentation, initial model architecture & training, model evaluation & optimization, generalization testing, drafting & finalizing the blog post

Mentor

Maximilian Hahn
