{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Zfh8tpTFS9Mv" }, "source": [ "# **Transfer Learning: Vision Transformers**\n", "## **Image Classification**\n", "\n", "---\n", "\n", "Transfer learning is a technique in which a pre-trained model, which has already learned features on one task, is reused as the starting point for a related task. This saves time and compute by leveraging the model's existing knowledge instead of training a new model from scratch.\n", "\n", "In this tutorial, we will look at how to apply transfer learning for image classification with a Vision Transformer (ViT) on a dataset of our choice.\n", "\n", "In transfer learning, we do not need to update the parameters of the entire model. Because the ViT has already learned feature representations from millions of images, we can train only its final layers and still achieve good performance on the new dataset.\n", "\n", "For this tutorial, we will use the [google/vit-base-patch16-224](https://huggingface.co/google/vit-base-patch16-224) model from the Hugging Face Hub."
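, "\n", "\n", "As a minimal sketch of the idea described above (using the standard `transformers` API; the exact training setup comes later in this notebook), loading the pre-trained checkpoint and freezing everything except the classification head could look like this:\n", "\n", "```python\n", "from transformers import ViTForImageClassification\n", "\n", "# Load the pre-trained ViT checkpoint\n", "model = ViTForImageClassification.from_pretrained(\"google/vit-base-patch16-224\")\n", "\n", "# Freeze the ViT backbone; only the classifier head remains trainable\n", "for param in model.vit.parameters():\n", "    param.requires_grad = False\n", "```"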
] }, { "cell_type": "markdown", "metadata": { "id": "PdBfBalXS9Mw" }, "source": [ "### Let's begin by importing some necessary modules and functions\n", "---" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "execution": { "iopub.execute_input": "2023-12-14T19:33:40.620106Z", "iopub.status.busy": "2023-12-14T19:33:40.619322Z", "iopub.status.idle": "2023-12-14T19:33:47.299161Z", "shell.execute_reply": "2023-12-14T19:33:47.298218Z", "shell.execute_reply.started": "2023-12-14T19:33:40.620063Z" }, "id": "GCdKLbahS9Mx" }, "outputs": [], "source": [ "import io\n", "from PIL import Image\n", "\n", "import numpy as np\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "\n", "import torch\n", "import torch.nn as nn\n", "\n", "from huggingface_hub import notebook_login\n", "\n", "from datasets import load_dataset, DatasetDict\n", "\n", "from transformers import AutoImageProcessor, ViTForImageClassification\n", "\n", "from transformers import Trainer, TrainingArguments\n", "\n", "import evaluate" ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 160, "referenced_widgets": [ "c0054da808034b4fa38a2798549a6464", "348a56fa955d4da2b74ab27c9e98cb3a", "08b2f5eb1a4b44f7b79bce900ff8e605", "4a1fb9f1c87249b79ef40e3f26e7fbb8", "24d6a46a49b7497eaf47b39909d12341", "6bd25641f34e48f8a5ac90535dd63b92", "c4fe8672e0de4c2b95b951f93b46dcfa", "f4f2072e43e3465591cfb6461f2fcb51", "45fa4f434a35441bbb5feba85d7415cc", "bb858bbb9e964377b0bb6067576f6374", "68624a5f14644a1bb96b5afe1c547ca8", "2aae1881e3eb4d39aa9a2b10dc1408ed", "43e76a9f3a6649c5b7ce805351c5c84f", "777054a52e044264bb9ba385c94797fc", "260ba6716e60447c8e398782fb60a8e7", "da8b18811a3e4d70819bc6a8fac9b0f2", "2282051ebe9b439fa1d68f5be3203799", "0a6d640b6dd9458d8228dc7f400e40a3", "841acd1224444c8180ca0a97f1cdf1eb", "1b21ab0bd37a489b904fca5224e410b4", "67187fa277164483996074d7d016eb37", "179d7b08227f42c081f063e4d22c3963", 
"93962dd494d448068f4f54deeb4aa8c3", "a0cc301509ee4900ab809ac170c91a51", "ce06c8cab8e945749fac9f9b04f2c5bd", "f256357ec1f94d86861fa73d4604439f", "b698852709894d16ab5026a9b5a01837", "24037cd9a16c49348a77fedc69a7cee3", "4818d2cea6e04d2a9b319064c096a346", "724c848f98fe4ef1a8a8b21bd259e509", "55430e1a092640d9b7aaf7292a267177", "49f6f3d57a2a4573882e01a948379506", "2f3c5b9cf8ba4bf5a32d432681169b8c" ] }, "execution": { "iopub.execute_input": "2023-12-14T19:33:47.301085Z", "iopub.status.busy": "2023-12-14T19:33:47.300396Z", "iopub.status.idle": "2023-12-14T19:33:47.323947Z", "shell.execute_reply": "2023-12-14T19:33:47.323048Z", "shell.execute_reply.started": "2023-12-14T19:33:47.301054Z" }, "id": "d2WXhBFGS9My", "outputId": "6a3452a7-aa35-41b2-892e-95dd7f4511d6" }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "506c7c0306354b9c9fb123cafcaa1521", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(HTML(value='