This repository contains my project for the Technion’s EE 046211 course “Deep Learning”.
Introduction
This project attempts to use transfer learning to classify CT scans as positive or negative for COVID-19. Two approaches are considered for this task: Convolutional Neural Networks (CNNs) and Vision Transformers (ViT). The effectiveness of each approach as a feature-extraction model is examined and compared.
The Code
The code for this project is split into two parts:
1. Data exploration and feature extraction:
- The notebook can be found on Kaggle (the dataset is large, ~28 GB, so it is more convenient to access the data from Kaggle directly):
- Used to export the feature files consumed in the next part:
  - `train_predictions_vgg.csv`
  - `train_true_values_vgg.csv`
  - `train_predictions_vit.csv`
  - `train_true_values_vit.csv`
  - `val_predictions_vgg.csv`
  - `val_true_values_vgg.csv`
  - `val_predictions_vit.csv`
  - `val_true_values_vit.csv`
2. Train classifier models on the generated features (see the sketch after this list):
- The notebook can be found in this repository at `Project/train_classifier.ipynb`.
- Alternatively, after obtaining the feature files from the previous part, the code can be run in Google Colab:
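To make this second part concrete, here is a minimal sketch of the training step. It assumes the exported CSV files hold one feature vector (or one label) per row; the actual layout of the files produced by the first notebook may differ.

```python
import pandas as pd
import torch
import torch.nn as nn

# Load the exported feature files (VGG shown; swap "vgg" for "vit").
X = torch.tensor(pd.read_csv("train_predictions_vgg.csv").values, dtype=torch.float32)
y = torch.tensor(pd.read_csv("train_true_values_vgg.csv").values.ravel(), dtype=torch.float32)

# A simple linear head trained on top of the frozen features.
clf = nn.Linear(X.shape[1], 1)
optimizer = torch.optim.Adam(clf.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for epoch in range(10):
    optimizer.zero_grad()
    loss = loss_fn(clf(X).squeeze(1), y)  # binary cross-entropy on logits
    loss.backward()
    optimizer.step()
```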
Dataset
The COVIDx CT dataset from Kaggle is used for this project. It contains CT scans of both positive and negative COVID-19 cases.
Example images from the dataset:
- The original dataset contains a Pneumonia class, which was not considered here for simplicity.
- The number of samples in the positive and negative classes is not balanced, so the custom dataloader keeps an equal number of positive and negative samples (one way to do this is sketched below).
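One simple way to implement that balancing (not necessarily the exact dataloader used in the notebook) is to undersample the majority class. A minimal sketch, with dummy tensors standing in for the CT-scan dataset:

```python
import torch
from torch.utils.data import DataLoader, Subset, TensorDataset

# Dummy imbalanced data standing in for the real dataset (hypothetical shapes).
images = torch.randn(100, 3, 8, 8)
labels = torch.cat([torch.zeros(80, dtype=torch.long), torch.ones(20, dtype=torch.long)])
dataset = TensorDataset(images, labels)

# Truncate each class to the size of the smaller one.
pos_idx = (labels == 1).nonzero(as_tuple=True)[0]
neg_idx = (labels == 0).nonzero(as_tuple=True)[0]
n = min(len(pos_idx), len(neg_idx))
balanced = Subset(dataset, torch.cat([pos_idx[:n], neg_idx[:n]]).tolist())
loader = DataLoader(balanced, batch_size=32, shuffle=True)
```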
Preprocessing steps:
- All the images are provided with a bounding box, allowing one to crop them to contain only the relevant area.
- In order to use the ViT model, the images must be resized to 384x384 (a sketch of both steps follows this list).
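A minimal sketch of these two preprocessing steps, assuming the bounding box comes as (left, upper, right, lower) pixel coordinates; the dataset's actual annotation format may differ:

```python
from PIL import Image
from torchvision import transforms

def preprocess(image_path, bbox):
    """Crop a scan to its bounding box and resize it for ViT."""
    img = Image.open(image_path).convert("RGB")
    img = img.crop(bbox)  # keep only the relevant area
    pipeline = transforms.Compose([
        transforms.Resize((384, 384)),  # required ViT input size
        transforms.ToTensor(),
    ])
    return pipeline(img)
```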
Image Classification
For the classification, two models are compared as feature extractors:
- ViT [1] - a pretrained model from lukemelas/PyTorch-Pretrained-ViT on GitHub (it has a very detailed and useful readme). This model is trained on ImageNet-21K.
- VGG [2] - a pretrained model from `torchvision.models`. This model is trained on ImageNet.
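Loading the two backbones might look as follows; the ViT import and model name follow the readme of lukemelas/PyTorch-Pretrained-ViT, so check that repository for the exact variant names available:

```python
from pytorch_pretrained_vit import ViT
from torchvision import models

vit = ViT('B_16_imagenet1k', pretrained=True)  # ViT-B/16, expects 384x384 inputs
vgg = models.vgg16(pretrained=True)            # VGG-16 trained on ImageNet
```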
Feature Extraction
For each model, the features are obtained by replacing the last layer of the original model with an identity layer: these models were trained on ImageNet and therefore output 1000 class scores, while at this stage only the model's learned features are wanted (not its classification).
Example for VGG:
```python
import torch.nn as nn
from torchvision import models

# Load VGG-16 pretrained on ImageNet and freeze all of its weights.
model = models.vgg16(pretrained=True)
for param in model.parameters():
    param.requires_grad = False

# Replace the final 1000-way classification layer with an identity, so a
# forward pass returns the penultimate features instead of class scores.
num_features = model.classifier[6].in_features  # 4096 for VGG-16
model.classifier[6] = nn.Identity()
```
Running inference with `model` will now yield features instead of a classification.
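The analogous step for ViT replaces its classification head the same way. That the head is exposed as `model.fc` is an assumption based on the PyTorch-Pretrained-ViT code; verify the attribute name against the version you install:

```python
import torch.nn as nn
from pytorch_pretrained_vit import ViT

model = ViT('B_16_imagenet1k', pretrained=True)
for param in model.parameters():
    param.requires_grad = False  # freeze the backbone
model.fc = nn.Identity()  # assumed head attribute; drops the classification layer
```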
Results
VGG
Time | Epoch | Learning rate | Loss | Val accuracy | Val (TP%, TN%, FP%, FN%) |
---|---|---|---|---|---|
00:00:16 | 01/10 | 5.0e-05 | 0.4137 | 80.89% | 42.4%, 38.5%, 11.5%, 07.6% |
00:00:32 | 02/10 | 5.2e-04 | 0.3501 | 84.06% | 41.8%, 42.3%, 07.7%, 08.2% |
00:00:48 | 03/10 | 1.0e-03 | 0.2412 | 84.47% | 39.7%, 44.7%, 05.3%, 10.3% |
00:01:04 | 04/10 | 5.2e-04 | 0.1306 | 84.56% | 38.9%, 45.6%, 04.4%, 11.1% |
00:01:20 | 05/10 | 5.0e-05 | 0.0594 | 85.71% | 41.7%, 44.0%, 06.0%, 08.3% |
00:01:36 | 06/10 | 2.9e-04 | 0.0479 | 85.45% | 41.9%, 43.5%, 06.5%, 08.1% |
00:01:52 | 07/10 | 5.2e-04 | 0.0259 | 85.30% | 43.5%, 41.8%, 08.2%, 06.5% |
00:02:09 | 08/10 | 2.9e-04 | 0.0105 | 85.58% | 42.0%, 43.5%, 06.5%, 08.0% |
00:02:25 | 09/10 | 5.0e-05 | 0.0066 | 85.78% | 41.5%, 44.3%, 05.7%, 08.5% |
00:02:41 | 10/10 | 1.7e-04 | 0.0060 | 85.75% | 41.5%, 44.3%, 05.7%, 08.5% |
ViT
Time | Epoch | Learning rate | Loss | Val accuracy | Val (TP%, TN%, FP%, FN%) |
---|---|---|---|---|---|
00:00:10 | 01/10 | 5.0e-04 | 0.6887 | 53.07% | 15.4%, 37.6%, 12.4%, 34.6% |
00:00:20 | 02/10 | 5.2e-03 | 0.5822 | 67.68% | 31.9%, 35.8%, 14.2%, 18.1% |
00:00:31 | 03/10 | 1.0e-02 | 0.4923 | 70.24% | 32.4%, 37.8%, 12.2%, 17.6% |
00:00:41 | 04/10 | 5.2e-03 | 0.4508 | 72.05% | 34.6%, 37.4%, 12.6%, 15.4% |
00:00:51 | 05/10 | 5.0e-04 | 0.4225 | 72.31% | 33.5%, 38.8%, 11.2%, 16.5% |
00:01:02 | 06/10 | 2.9e-03 | 0.4265 | 72.66% | 34.0%, 38.7%, 11.3%, 16.0% |
00:01:13 | 07/10 | 5.2e-03 | 0.4229 | 74.11% | 32.5%, 41.6%, 08.4%, 17.5% |
00:01:23 | 08/10 | 2.9e-03 | 0.4042 | 74.37% | 35.9%, 38.5%, 11.6%, 14.1% |
00:01:34 | 09/10 | 5.0e-04 | 0.3863 | 75.15% | 36.8%, 38.4%, 11.6%, 13.2% |
00:01:44 | 10/10 | 1.7e-03 | 0.3872 | 76.01% | 39.9%, 36.1%, 13.8%, 10.1% |
Conclusion
Although ViT was pretrained on a larger dataset, and achieves better accuracy than CNNs on most common benchmarks, on this dataset the VGG model (CNN-based) performed much better than the ViT model.
For future work, it would be interesting to:
- Train both models from scratch (instead of using transfer learning) and see if the CNN model still achieves superior results.
- Use data augmentation in order to provide more general features (a sketch follows this list).
- Try other datasets from the medical field and check whether CNN-based models still outperform ViT.
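For the augmentation point, a hedged sketch of what such a pipeline could look like with standard torchvision transforms; the specific transforms and parameters here are illustrative, not part of the project:

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=0.5),  # simple geometric augmentation
    transforms.RandomRotation(degrees=10),   # small rotations to vary orientation
    transforms.Resize((384, 384)),           # keep the ViT input size
    transforms.ToTensor(),
])
```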