Star Wars Classifier

This notebook builds a Star Wars classifier from scratch. It doesn't use any predefined dataset; instead, the images are scraped from the internet on the fly. To keep things simple, it's a 3-class classifier covering Yoda, Luke, and Wookie. The model is built with fastai. If you like the notebook, consider giving it an upvote. ✅

1. Downloads

I'm using a Python package named icrawler to scrape the images.

!pip install icrawler

Collecting icrawler
  Downloading icrawler-0.6.4.tar.gz (26 kB)
Collecting beautifulsoup4>=4.4.1
  Using cached beautifulsoup4-4.9.3-py3-none-any.whl (115 kB)
Collecting lxml
  Using cached lxml-4.6.3-cp38-cp38-macosx_10_9_x86_64.whl (4.6 MB)
Collecting requests>=2.9.1
  Downloading requests-2.25.1-py2.py3-none-any.whl (61 kB)
     |████████████████████████████████| 61 kB 5.1 MB/s  eta 0:00:01
Requirement already satisfied: six>=1.10.0 in /Users/namanmanchanda/miniconda3/envs/nishu/lib/python3.8/site-packages (from icrawler) (1.15.0)
Requirement already satisfied: Pillow in /Users/namanmanchanda/miniconda3/envs/nishu/lib/python3.8/site-packages (from icrawler) (8.1.1)
Collecting soupsieve>1.2
  Using cached soupsieve-2.2.1-py3-none-any.whl (33 kB)
Collecting idna<3,>=2.5
  Downloading idna-2.10-py2.py3-none-any.whl (58 kB)
     |████████████████████████████████| 58 kB 9.3 MB/s  eta 0:00:01
Requirement already satisfied: certifi>=2017.4.17 in /Users/namanmanchanda/miniconda3/envs/nishu/lib/python3.8/site-packages (from requests>=2.9.1->icrawler) (2020.12.5)
Collecting chardet<5,>=3.0.2
  Downloading chardet-4.0.0-py2.py3-none-any.whl (178 kB)
     |████████████████████████████████| 178 kB 19.4 MB/s eta 0:00:01
Collecting urllib3<1.27,>=1.21.1
  Downloading urllib3-1.26.5-py2.py3-none-any.whl (138 kB)
     |████████████████████████████████| 138 kB 9.4 MB/s eta 0:00:01
Building wheels for collected packages: icrawler
  Building wheel for icrawler (setup.py) ... done
  Created wheel for icrawler: filename=icrawler-0.6.4-py2.py3-none-any.whl size=35063 sha256=bd50c3f1d07da534fa4b7615bd47053e2d92da940734f7757b3c9d172fc18b9d
  Stored in directory: /Users/namanmanchanda/Library/Caches/pip/wheels/5b/a5/50/db28e1726fdc127cb6c5a757a4350af44a32b4e5d2c5d45dac
Successfully built icrawler
Installing collected packages: urllib3, soupsieve, idna, chardet, requests, lxml, beautifulsoup4, icrawler
Successfully installed beautifulsoup4-4.9.3 chardet-4.0.0 icrawler-0.6.4 idna-2.10 lxml-4.6.3 requests-2.25.1 soupsieve-2.2.1 urllib3-1.26.5

2. Packages

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# List any files shipped with the Kaggle environment (standard Kaggle boilerplate)
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))
        
# icrawler 
from icrawler.builtin import GoogleImageCrawler   

# fastai (fastai.vision.all re-exports everything needed for vision work)
from fastai.vision.all import *

# widgets
import ipywidgets as widgets

# ignore warnings
import warnings
warnings.filterwarnings("ignore")

3. Pre-model Building

3.1 Create folder

I'm creating three folders in /kaggle/working, one per class, to hold the downloaded images.

!mkdir yoda
!mkdir luke
!mkdir wookie
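
If you prefer plain Python over shell commands, the same folders can be created from a cell (a small equivalent sketch):

import os

# Create one folder per class; exist_ok=True avoids errors on re-runs
for label in ('yoda', 'luke', 'wookie'):
    os.makedirs(label, exist_ok=True)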

3.2 Scrape images

The way icrawler works is that it creates a folder named images inside whatever directory the crawl is run from. Right now we are in /kaggle/working, so I'll cd into each label directory one at a time and run the crawler there, which downloads the images into that label's images folder. The resulting folder structure will look something like

  • /kaggle/working/yoda/images
  • /kaggle/working/luke/images
  • /kaggle/working/wookie/images

After the downloads, I'll move the images out of each images folder into its label folder (for example, from /kaggle/working/yoda/images to /kaggle/working/yoda) and delete the then-empty images folders. (If you'd rather skip the console cd steps entirely, see the sketch at the end of this subsection.)

If you'd like to reproduce this exactly, alternate between the console and the notebook in the order given below: run each console command first, then the notebook cell that follows it.

Run the following in the console

cd yoda

After the above command, run the following cell

google_crawler = GoogleImageCrawler()
google_crawler.crawl(keyword='baby yoda', max_num=50)

Run the following in the console, one line at a time

cd ..
cd luke

After the above command, run the following cell

google_crawler = GoogleImageCrawler()
google_crawler.crawl(keyword='luke skywalker', max_num=50)

Run the following in the console, one line at a time

cd ..
cd wookie

After the above command, run the following cell

google_crawler = GoogleImageCrawler()
google_crawler.crawl(keyword='wookie', max_num=50)
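
As an aside, icrawler can also write straight into a target folder through its storage argument (storage={'root_dir': ...} is part of icrawler's documented API), which avoids the console cd steps altogether. A minimal sketch mirroring the three cells above:

# Download directly into each label folder; no directory switching needed
for label, keyword in [('yoda', 'baby yoda'), ('luke', 'luke skywalker'), ('wookie', 'wookie')]:
    google_crawler = GoogleImageCrawler(storage={'root_dir': label})
    google_crawler.crawl(keyword=keyword, max_num=50)

With this variant the images land in /kaggle/working/<label> directly, so the move-and-delete step in 3.3 wouldn't be needed.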

3.3 Move images

Run the following in the console, one line at a time

cd ..

Now I'll move the images out of the images folders into their respective label folders and delete the empty images folders. Run the following in the console, one line at a time

mv -v yoda/images/* yoda
mv -v luke/images/* luke
mv -v wookie/images/* wookie
rmdir yoda/images
rmdir luke/images
rmdir wookie/images
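
If you'd rather do the moves from a notebook cell than the console, here's an equivalent sketch with shutil:

import shutil
from pathlib import Path

for label in ('yoda', 'luke', 'wookie'):
    img_dir = Path(label) / 'images'
    if img_dir.exists():
        for f in img_dir.iterdir():
            shutil.move(str(f), label)  # move each image up into its label folder
        img_dir.rmdir()                 # delete the now-empty images folder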

Once done, run pwd in the console to check your current working directory. It should show /kaggle/working.

4. Data Loaders

4.1 For a single label

As a quick sanity check that the scraping worked, I first load just the yoda folder into a dataloader and look at a few of the downloaded images.

path = Path('/kaggle/working/yoda')
dls = ImageDataLoaders.from_folder(path, valid_pct=0.5, batch_size=10, item_tfms=Resize(224))
dls.valid.show_batch(max_n=4, nrows=1)

4.2 For model building

characters = DataBlock(
    blocks=(ImageBlock, CategoryBlock), 
    get_items=get_image_files, 
    splitter=RandomSplitter(valid_pct=0.2, seed=42),
    get_y=parent_label,
    item_tfms=Resize(256))

# Creating the dataloader
path = Path('/kaggle/working')
dls = characters.dataloaders(path)

# checking the images
dls.valid.show_batch(max_n=18, nrows=3)
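
Before training, it's worth a quick (optional) check that the DataBlock picked up exactly the three classes and split the data as expected:

# Sanity checks: class vocabulary and train/valid split sizes
print(dls.vocab)                             # expected: ['luke', 'wookie', 'yoda']
print(len(dls.train_ds), len(dls.valid_ds))  # roughly an 80/20 split of ~150 images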

5. Model Building

5.1 Training

learn = cnn_learner(dls, resnet18, metrics=error_rate)
learn.fine_tune(4)
Downloading: "https://download.pytorch.org/models/resnet18-5c106cde.pth" to /root/.cache/torch/hub/checkpoints/resnet18-5c106cde.pth
epoch  train_loss  valid_loss  error_rate  time
0      1.834176    2.258008    0.600000    00:07

epoch  train_loss  valid_loss  error_rate  time
0      1.978312    1.446693    0.566667    00:06
1      1.585535    0.461595    0.133333    00:05
2      1.268723    0.311160    0.066667    00:05
3      1.044068    0.287985    0.066667    00:05
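
With only around 50 images per class, the error rate above is noisy, so it helps to see where the model actually goes wrong. fastai's built-in interpretation tools make this easy (an optional follow-up):

# Inspect the mistakes: confusion matrix and the highest-loss images
interp = ClassificationInterpretation.from_learner(learn)
interp.plot_confusion_matrix()
interp.plot_top_losses(4, nrows=1)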

5.2 Prediction

uploader = widgets.FileUpload()
uploader

def helper():
    # Build a fastai image from the uploaded bytes (ipywidgets FileUpload API)
    img = PILImage.create(uploader.data[0])
    img.show()
    # predict returns (decoded class, class index, per-class probabilities)
    pred, pred_idx, probs = learn.predict(img)
    lbl_pred = widgets.Label()
    lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:.04f}'
    print(lbl_pred)

helper()
Label(value='Prediction: yoda; Probability: 0.9999')
helper()
Label(value='Prediction: luke; Probability: 0.9567')
helper()
Label(value='Prediction: wookie; Probability: 0.9970')
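
If you don't want to go through the upload widget, learn.predict also works on an image created from a file path. A sketch (the filename below is a hypothetical placeholder; point it at any image on disk):

# Predict directly from a file path instead of the upload widget
# ('000001.jpg' is a placeholder name; substitute a real file)
pred, pred_idx, probs = learn.predict(PILImage.create('/kaggle/working/yoda/000001.jpg'))
print(f'Prediction: {pred}; Probability: {probs[pred_idx]:.4f}')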
If you liked the notebook, please drop an upvote. Thank you. ✅