Train a simple CNN using free computing resources, then deploy it in the browser to classify cats and dogs
We will learn to create a simple Convolutional Neural Network for binary classification using Tensorflow, train the model with free resources using Google Colab, and deploy the trained model in the browser using Tensorflow.js.
Find a simple demo here (not mobile-friendly yet) and code for the demo here. Find the code for model training and saving here on Colab.
a simple demo
Note that most of the code used for the classification task in this tutorial comes from the Google Machine Learning practica; read it if you want a much more detailed tutorial on image classification. Also, if you are not already familiar with them, read more about CNNs here and Colab here (a detailed tutorial on Colab usage can be found here).
Train CNN using Tensorflow in Google Colab
The model will train on a filtered version of the Kaggle Dogs vs. Cats dataset. The filtered dataset contains 3,000 labeled color images of cats and dogs: 1,000 cat examples for training, 1,000 dog examples for training, 500 cat examples for validation, and 500 dog examples for validation.
Open a Colab notebook and type the following into a code cell to download the filtered dataset as a zip file:
!wget --no-check-certificate -O /tmp/
Unzip the file and obtain addresses to the training and validation datasets:
import os
import zipfile

# extract all files from the downloaded zip file
# the resulting root folder is named 'tmp' and it contains
# a folder named 'cats_and_dogs_filtered', which
# contains two separate folders: 'train' and 'validation'
local_zip = '/tmp/'
zip_ref = zipfile.ZipFile(local_zip, 'r')
zip_ref.extractall('/tmp')
zip_ref.close()

# construct addresses for two datasets
base_dir = '/tmp/cats_and_dogs_filtered'
train_dir = os.path.join(base_dir, 'train')
validation_dir = os.path.join(base_dir, 'validation')

# construct addresses for two classes of examples
train_cats_dir = os.path.join(train_dir, 'cats')
train_dogs_dir = os.path.join(train_dir, 'dogs')
val_cats_dir = os.path.join(validation_dir, 'cats')
val_dogs_dir = os.path.join(validation_dir, 'dogs')
These addresses will later be used to generate batched examples.
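Before generating batches, it can help to confirm that each folder holds the expected number of images. The following is a minimal sketch, not part of the original notebook: the helper names `count_images` and `summarize_split` are hypothetical, and the list of image extensions is an assumption about how the files are named.

```python
import os


def count_images(directory, extensions=(".jpg", ".jpeg", ".png")):
    """Count files in `directory` whose names end with an image extension."""
    return sum(
        1
        for name in os.listdir(directory)
        if name.lower().endswith(extensions)
    )


def summarize_split(base_dir):
    """Return {(split, label): count} for the train/validation class folders."""
    counts = {}
    for split in ("train", "validation"):
        for label in ("cats", "dogs"):
            folder = os.path.join(base_dir, split, label)
            # a missing folder is reported as 0 instead of raising an error
            counts[(split, label)] = (
                count_images(folder) if os.path.isdir(folder) else 0
            )
    return counts


if __name__ == "__main__":
    # path taken from the tutorial; adjust it if you unzipped elsewhere
    for key, value in summarize_split("/tmp/cats_and_dogs_filtered").items():
        print(key, value)
```

If the download and extraction worked, the four counts should match the split described earlier (1,000 images per training class and 500 per validation class).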
Besides reading about the data here, let’s display a random image from the datasets to get more familiar with our data 😉
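import random

import matplotlib.pyplot as plt
from tensorflow.keras.preprocessing.image import load_img

# list the image file names in each training class folder
train_cats_fnames = os.listdir(train_cats_dir)
train_dogs_fnames = os.listdir(train_dogs_dir)

# build full paths and pick one image at random
cat_img_files = [os.path.join(train_cats_dir, f) for f in train_cats_fnames]
dog_img_files = [os.path.join(train_dogs_dir, f) for f in train_dogs_fnames]
img_path = random.choice(cat_img_files + dog_img_files)
img = load_img(img_path)

# show the image without grid lines or axis ticks
plt.imshow(img)
plt.grid(False)
plt.xticks([])
plt.yticks([])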
an example from the cat class
Data pre-processing is an important part of any learning system, especially when the dataset is small, which is true in our case. Pre-process the images in the training set using data augmentation techniques, including rescaling, rotation, shifting, zooming, flipping, and shearing:
# The ImageDataGenerator class generates batches of tensor image data
# and their labels with real-time data augmentation
# (imported from tensorflow.keras to match the other imports in this tutorial)
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# apply data augmentation to the training set
train_datagen = ImageDataGenerator(
    # normalize pixel values to be between [0,1]
    # note that the images use the byte image pixel format,
    # which stores each of the three color values (i.e. RGB) as an 8-bit integer,
    # giving a range of possible values from 0 to 255 (because 2^8 = 256)
    rescale=1./255,
    # the remaining arguments control the random transformations
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

# do not augment the validation set,
# just normalize the pixel values
val_datagen = ImageDataGenerator(rescale=1./255)

# flow_from_directory() takes the path to a directory
# and generates batches of augmented data
train_generator = train_datagen.flow_from_directory(
    train_dir,
    # resize all images to 150x150, will give reasons later
    target_size=(150, 150),
    # 20 examples will be used in each iteration, i.e.
    # one gradient update of model training
    batch_size=20,
    # binary labels for binary cross entropy loss
    class_mode='binary')

val_generator = val_datagen.flow_from_directory(
    validation_dir,
    target_size=(150, 150),
    batch_size=20,
    class_mode='binary')
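To make two of these transformations concrete, here is a minimal pure-Python sketch of rescaling and horizontal flipping on a tiny nested-list "image". It only illustrates the idea; it is not how ImageDataGenerator is implemented, and the function names `rescale` and `horizontal_flip` are hypothetical.

```python
def rescale(image, factor=1.0 / 255):
    """Multiply every channel value by `factor` (255 maps to about 1.0)."""
    return [[[value * factor for value in pixel] for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right by reversing each row of pixels."""
    return [list(reversed(row)) for row in image]


# a tiny 1x2 RGB image: one red pixel followed by one white pixel
tiny = [[[255, 0, 0], [255, 255, 255]]]

scaled = rescale(tiny)             # channel values now lie in [0, 1]
flipped = horizontal_flip(scaled)  # the white pixel now comes first
```

The real generator applies such transformations randomly to each batch, so the model rarely sees exactly the same picture twice.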
Build a simple model using Keras.
The input layer of the model accepts raw pixels from images that have a width and height of 150 and three color channels; this is why we resized all images in our dataset to 150 by 150.
After the input layer, the model contains three Convolutional layers with ReLU activation and three Max Pooling layers. The details of the structures of these hidden layers are included in the comments of the code snippets below.
The model then uses a Flatten layer to map the resulting features to a 1D tensor in order to feed them into a fully-connected layer of 512 hidden units. After that, dropout regularization with a rate of 0.5 is applied to further prevent over-fitting.
Finally, the output layer of the model uses a Sigmoid function as the activation function; it squeezes the output to a class score between 0 and 1, which is perfect for our task since the dataset labels the cat class with 0 and the dog class with 1.
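As a small illustration of how the Sigmoid squeezes any real number into a score between 0 and 1, here is a sketch in plain Python. It is independent of Keras, and the two-branch form is simply one common way to avoid overflow for large negative inputs.

```python
import math


def sigmoid(z):
    """Map a real number to a score between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # for negative z, use the equivalent form that never exponentiates
    # a large positive number
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


# large negative inputs approach 0 (cat), large positive inputs approach 1 (dog)
scores = [sigmoid(z) for z in (-4.0, 0.0, 4.0)]
```

An input of 0 lands exactly on 0.5, the midpoint between the two class labels.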
from tensorflow.keras import layers
from tensorflow.keras import Model
from tensorflow.keras.optimizers import RMSprop

# the input images are required to have width and height of 150
# and 3 channels, one for each color
# this is the reason why all images were resized above
input_layer = layers.Input(shape=(150, 150, 3))

# note that every layer from now on is a function of the previous layer

# the first convolutional layer has 16 3x3 filters
# output volume = (W-F+2P)/S+1 = (150-3+2(0))/1+1 = 148 --> (148, 148, 16), where
# W:=input size, F:=filter size, S:=stride (moving filter 1 pixel at a time), P:=padding
# num of parameters = input x F x F x output + biases = 3x3x3x16 + 16 = 432 + 16 = 448
x = layers.Conv2D(16, 3, activation='relu')(input_layer)  # relu := max(0, x)
# max pooling layer with 2x2 window
# output volume = (W-F)/2+1 = (148-2)/2+1 = 74 --> (74, 74, 16)
x = layers.MaxPooling2D(2)(x)
# the second convolutional layer has 32 3x3 filters
# output volume = (W-F+2P)/S+1 = (74-3+2(0))/1+1 = 72 --> (72, 72, 32)
# num of parameters = input x F x F x output + biases = 16x3x3x32 + 32 = 4640
x = layers.Conv2D(32, 3, activation='relu')(x)
# max pooling layer with 2x2 window
# output volume = (W-F)/2+1 = (72-2)/2+1 = 36 --> (36, 36, 32)
x = layers.MaxPooling2D(2)(x)
# the third convolutional layer has 64 3x3 filters
# output volume = (W-F+2P)/S+1 = (36-3+2(0))/1+1 = 34 --> (34, 34, 64)
# num of parameters = input x F x F x output + biases = 32x3x3x64 + 64 = 18496
x = layers.Conv2D(64, 3, activation='relu')(x)
# max pooling layer with 2x2 window
# output volume = (W-F)/2+1 = (34-2)/2+1 = 17 --> (17, 17, 64)
x = layers.MaxPooling2D(2)(x)
# the shape is really (None, 17, 17, 64), where None is the batch dimension

# flatten feature map to a 1D tensor
# output shape = (None, 17x17x64) = (None, 18496)
x = layers.Flatten()(x)
# fully connected layer with relu activation and 512 hidden units
# output shape = (None, 512)
# num of parameters = 18496 * 512 + 512 = 9470464
x = layers.Dense(512, activation='relu')(x)

# dropout regularization with dropout rate of 0.5
# output shape = (None, 512)
x = layers.Dropout(0.5)(x)
# output layer with a single node and a sigmoid activation,
# since we have a binary classification problem
# output shape = (None, 1)
# num of parameters = 1x512 + 1 = 513
output = layers.Dense(1, activation='sigmoid')(x)  # [0,1]

# the model is defined by input and output tensor(s)
model = Model(input_layer, output)

# print out summary of the model to confirm our calculations above
model.summary()
define model structure and training process
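The shape and parameter arithmetic in the code comments can be double-checked without TensorFlow. The sketch below re-derives the numbers from the formulas quoted in the comments; the helper names are hypothetical, and it assumes no padding and a stride of 1 for the convolutions and a 2x2 window with stride 2 for pooling, as in the model above.

```python
def conv_output_size(w, f, p=0, s=1):
    """Spatial size after a convolution: (W - F + 2P) / S + 1."""
    return (w - f + 2 * p) // s + 1


def pool_output_size(w, f=2, s=2):
    """Spatial size after max pooling with an FxF window and stride S."""
    return (w - f) // s + 1


def conv_params(in_channels, f, out_channels):
    """Weights (in x F x F x out) plus one bias per output filter."""
    return in_channels * f * f * out_channels + out_channels


def dense_params(in_units, out_units):
    """Weights (in x out) plus one bias per output unit."""
    return in_units * out_units + out_units


size, channels, total = 150, 3, 0
for filters in (16, 32, 64):
    total += conv_params(channels, 3, filters)
    size = pool_output_size(conv_output_size(size, 3))
    channels = filters

flat = size * size * channels  # 17 * 17 * 64 = 18496
total += dense_params(flat, 512) + dense_params(512, 1)

print(size, flat, total)  # 17 18496 9494561
```

The printed total should agree with the "Total params" line that the model summary reports.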
After defining the model, we define the training configuration and plot the learning curves. The loss function will be binary cross entropy and the optimizer will be RMSprop. The model will be trained on data generated batch by batch (we defined each batch to contain 20 examples) for 100 epochs, each containing 100 batches of examples (because 2,000/20 = 100). At the end of each epoch, the model will also be validated on data generated batch by batch (also 20 examples per batch), using 50 batches of examples (because 1,000/20 = 50).
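The batch arithmetic in this paragraph can be written down as a tiny helper. This is only a sketch; `steps_per_epoch` here is a hypothetical function, named after the Keras argument it is meant to compute.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Number of batches needed to see every example once."""
    return math.ceil(num_examples / batch_size)


train_steps = steps_per_epoch(2000, 20)  # 100 batches per training epoch
val_steps = steps_per_epoch(1000, 20)    # 50 batches per validation pass
total_updates = train_steps * 100        # gradient updates over 100 epochs

print(train_steps, val_steps, total_updates)  # 100 50 10000
```

Rounding up matters only when the dataset size is not a multiple of the batch size, which is not the case here.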
# define the model training configuration
# (newer TensorFlow releases name these learning_rate and model.fit)
model.compile(loss='binary_crossentropy', optimizer=RMSprop(lr=0.001), metrics=['acc'])

# train the model and keep the training history
history = model.fit_generator(train_generator,
                              steps_per_epoch=100,
                              epochs=100,
                              validation_data=val_generator,
                              validation_steps=50,
                              verbose=2)

# visualize the training process
# all data are pulled from 'history'
acc = history.history['acc']
val_acc = history.history['val_acc']

loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, label='training')
plt.plot(epochs, val_acc, label='validation')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()

plt.plot(epochs, loss, label='training')
plt.plot(epochs, val_loss, label='validation')
plt.title('Training and validation loss')
plt.legend()
train the model and plot the learning curves
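For readers who want to see what the binary cross entropy loss measures, here is a minimal pure-Python sketch. It is not the Keras implementation; in particular, the clipping constant `eps` is an assumption added here to keep the logarithm finite.

```python
import math


def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean of -[y*log(p) + (1-y)*log(1-p)] over all examples."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        # clip the prediction so that log(0) can never occur
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [0, 1]  # one cat (0) and one dog (1)
confident = binary_cross_entropy(labels, [0.1, 0.9])
unsure = binary_cross_entropy(labels, [0.5, 0.5])
```

Confident correct predictions give a smaller loss than predictions of 0.5, which is why the loss curve is expected to fall as training progresses.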
the learning curves
As you can see, the classification model we constructed works but is not yet a great model: the accuracy is only around 80% and the losses are still high. After you get the model to work in the browser, it would be a good idea to test a few model structures of your own or tune the hyper-parameters of the current model to achieve a higher accuracy. You could also try to train the model with the full Kaggle dataset or leverage a pre-trained classification model such as Inception-V3. The details of training a better model for our task can all be found in the Google Image Classification Practica linked above.
Save the trained model structure and weights to Google Drive
Establish connection with your Google Drive:
# install necessary libraries and perform authorization
!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

# mount your Google Drive
!mkdir -p ggdrive
!google-drive-ocamlfuse ggdrive
Save the model structure and weights and move the files into your Google Drive:
# save the model
model.save('cats_dogs.h5')

# install tensorflow.js
!pip install tensorflowjs

# create weight files and a json file containing structure of the model
!mkdir model
!tensorflowjs_converter --input_format keras cats_dogs.h5 model/

# zip the model up
!zip -r model

# move file
!mv ggdrive
The method on line #2 saves a Keras model into a single HDF5 file that contains the architecture, weights, training configuration, and the state of the optimizer of the model; see more details here. Your resulting ‘model’ folder should contain files similar to the following:
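A quick way to check the folder yourself is sketched below. It assumes that the converter wrote a `model.json` whose `weightsManifest` entries list the weight shard files under `paths`; treat that layout as an assumption and compare it with your own output. The helper name `list_model_files` is hypothetical.

```python
import json
import os


def list_model_files(model_dir):
    """Return (weight_paths, missing) for a converted model folder.

    Assumes model.json contains a 'weightsManifest' list whose items
    each carry a 'paths' list of weight file names.
    """
    with open(os.path.join(model_dir, "model.json")) as f:
        spec = json.load(f)
    paths = [
        path
        for group in spec.get("weightsManifest", [])
        for path in group.get("paths", [])
    ]
    # report any weight file that model.json mentions but the folder lacks
    missing = [
        path for path in paths
        if not os.path.exists(os.path.join(model_dir, path))
    ]
    return paths, missing


if __name__ == "__main__":
    if os.path.exists(os.path.join("model", "model.json")):
        weight_files, missing_files = list_model_files("model")
        print("weight files:", weight_files)
        print("missing:", missing_files)
```

An empty `missing` list suggests that the JSON file and all of its weight files ended up in the same folder, which is what the browser needs when it loads the model.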
Reconstruct the pre-trained model in the browser using TensorFlow.js
Make sure you have a place to host your project. After that, include the following in your HTML file to enable the usage of TensorFlow.js:
Download the files from Google Drive to your project folder, then load the saved model from the files:
model = await tf.loadModel('cat_dog_model/model.json');
Create an element to display the image meant to be classified and another to show the classification result:
Your image contains a .
Use the model we loaded to perform classification on inputImage and display the result on the HTML label tag:
// step 1: grab the raw input image pixels;
// unfortunately we cannot get pixel data directly from the html img element

// get the html element that contains the input image
var img = document.getElementById('inputImage');
// create an html canvas element
// the canvas element is a container for graphics that allows javascript to draw graphics on the fly
var canvas = document.createElement('canvas');
// context is a CanvasRenderingContext2D object (which represents a 2D rendering context)
var context = canvas.getContext('2d');
// set the width and the height of the canvas to be
// the same as inputImage
canvas.width = img.width;
canvas.height = img.height;
// position the image on the canvas
context.drawImage(img, 0, 0);
/* CanvasRenderingContext2D.getImageData(sx, sy, sw, sh) returns an ImageData that
   represents the underlying pixel data for the area of the canvas starting at (sx, sy)
   and has width of sw and height of sh */
var imgData = context.getImageData(0, 0, img.width, img.height);

// step 2: turn raw pixels into tensors, so our model can work with the data
// use tidy() to dispose of any intermediate tensors
const preprocessed_imgData = tf.tidy(() => {
  // convert the image data into a tensor
  // tensor.shape = (height, width, 3)
  const tensor = tf.fromPixels(imgData, 3);
  // resize the tensor into (150, 150, 3)
  // because the input layer of our model requires it
  const resized = tf.image.resizeBilinear(tensor, [150, 150]).toFloat();
  // now the tensor has shape (150, 150, 3)
  // reshape it to (1, 150, 150, 3) because the input layer takes 4 dimensions (i.e. batch size=1)
  const input_tensor = resized.reshape([1, 150, 150, 3]);
  // normalize the pixel values to be in the range of [0, 1];
  // divide by 255 so the values match the rescale=1./255 used in training
  const offset = tf.scalar(255.0);
  const normalized = input_tensor.div(offset);
  // return the normalized image
  return normalized;
});

// step 3: make prediction
// dataSync() synchronously downloads the values from the tf.Tensor
// this could be why the demo has performance issues on mobile devices
const pred = model.predict(preprocessed_imgData).dataSync();

// step 4: display the prediction for users
// pred holds a single score, so read its first element
var catOrDog = pred[0] <= 0.5 ? 'cat' : 'dog';
document.getElementById('catOrDogLabel').innerHTML = catOrDog;
I set the decision threshold to be 0.5, but it could be reset to anything reasonable.
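To see how the threshold changes the outcome, the same decision rule can be written as a small Python function. This is only a sketch mirroring the comparison used in the browser code; the name `classify` is hypothetical.

```python
def classify(score, threshold=0.5):
    """Return 'cat' when the score is at or below the threshold, else 'dog'."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie between 0 and 1")
    return "cat" if score <= threshold else "dog"


# the same score can flip class when the threshold moves
default_label = classify(0.6)                # threshold 0.5
strict_label = classify(0.6, threshold=0.7)  # demands more evidence for 'dog'
```

Raising the threshold makes the classifier more reluctant to answer "dog", which trades one kind of mistake for the other.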
