"It is important to maintain a conda dependency file and/or MLstudio environment. \r\n",
"\r\n",
"Every user of the workspace will use their own compute instance, with conda files and environments it is easy to install dependencies on these different compute instances."
"Create an experiment to track the runs in your notebook. A workspace can have muliple experiments. We will create an experiment specifically for training our model."
"By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU or CPU support. "
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "code",
"source": [
"# list compute targets\r\n",
"print(ws.compute_targets.keys())"
],
"outputs": [],
"execution_count": null,
"metadata": {
"jupyter": {
"source_hidden": false,
"outputs_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
},
"gather": {
"logged": 1646660707664
}
}
},
{
"cell_type": "markdown",
"source": [
"We will use our cluster. It is better to keep the training compute (which probably has better specs) seperate from the notebook compute. This ensure a lower cost (only use heavy compute in the place where it is needed) and a central compute instance for every user of the workspace."
"To submit the job to the cluster, first create a training script. Run the following code to create the training script called `train.py` in the directory you just created. "
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "code",
"source": [
"%%writefile $script_folder/train.py\r\n",
"\r\n",
"import os\r\n",
"import argparse\r\n",
"import joblib\r\n",
"\r\n",
"from azureml.core import Run\r\n",
"from azureml.core import Dataset as DatasetAzure\r\n",
"Anything written in this directory is automatically uploaded into your workspace. You'll access your model from this directory later in the tutorial."
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "markdown",
"source": [
"### Configure the training job\n",
"\n",
"Create a **ScriptRunConfig** object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on. Configure the **ScriptRunConfig** by specifying:\n",
"\n",
"* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution. \n",
"* The compute target. In this case you will use the \"cpu-cluster\"\n",
"* The training script name, train.py\n",
"* An environment that contains the libraries needed to run the script\n",
"* Arguments required from the training script. "
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "code",
"source": [
"from azureml.core import ScriptRunConfig\r\n",
"\r\n",
"args = ['--image-folder', image_dataset.as_mount(), # it is also possible to download image dataset on compute (as_download(), because mounting load files at the time of processing, it is usually faster than download.)\r\n",
"Run the experiment by submitting the ScriptRunConfig object. And you can navigate to Azure portal to monitor the run.\r\n",
"\r\n",
"Since the call is asynchronous, it returns a **Preparing** or **Running** state as soon as the job is started."
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "code",
"source": [
"run = exp.submit(config=src)\r\n",
"run"
],
"outputs": [],
"execution_count": null,
"metadata": {
"jupyter": {
"source_hidden": false,
"outputs_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
},
"gather": {
"logged": 1646662413760
}
}
},
{
"cell_type": "markdown",
"source": [
"\r\n",
"## Monitor a remote run\r\n",
"\r\n",
"Here is what's happening while you wait:\r\n",
"\r\n",
"- **Image creation**: A Docker image is created matching the Python environment specified by the Azure ML environment. The image is built and stored in the ACR (Azure Container Registry) associated with your workspace. Image creation and uploading takes **about 5 minutes**. \r\n",
"\r\n",
" This stage happens once for each Python environment since the container is cached for subsequent runs. During image creation, logs are streamed to the run history. You can monitor the image creation progress using these logs. If you prebuild the image this step will be much quicker.\r\n",
"\r\n",
"- **Scaling**: If the remote cluster requires more nodes to execute the run than currently available, additional nodes are added automatically. Scaling typically takes **about 5 minutes.**\r\n",
"\r\n",
"- **Running**: In this stage, the necessary scripts and files are sent to the compute target, then data stores are mounted/copied, then the entry_script is run. While the job is running, stdout and the files in the ./logs directory are streamed to the run history. You can monitor the run's progress using these logs.\r\n",
"\r\n",
"- **Post-Processing**: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.\r\n",
"\r\n",
"\r\n",
"You can check the progress of a running job in multiple ways. This workshop uses a Jupyter widget it is also possible to use the `wait_for_completion` method. \r\n",
"\r\n",
"### Jupyter widget\r\n",
"\r\n",
"Watch the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes."
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "code",
"source": [
"from azureml.widgets import RunDetails\r\n",
"RunDetails(run).show()"
],
"outputs": [],
"execution_count": null,
"metadata": {
"jupyter": {
"source_hidden": false,
"outputs_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
},
"gather": {
"logged": 1646662414044
}
}
},
{
"cell_type": "markdown",
"source": [
"By the way, if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run)."
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "markdown",
"source": [
"## View Experiment\n",
"In the left-hand menu in Azure Machine Learning Studio, select __Experiments__ and then select your experiment. An experiment is a grouping of many runs from a specified script or piece of code. Information for the run is stored under that experiment. If the name doesn't exist when you submit an experiment, if you select your run you will see various tabs containing metrics, logs, explanations, etc.\n",
"\n",
"## Register model\n",
"\n",
"The last step in the training script wrote the file `outputs/model.pth` in a directory named `outputs` in the VM of the cluster where the job is executed. `outputs` is a special directory in that all content in this directory is automatically uploaded to your workspace. This content appears in the run record in the experiment under your workspace. Hence, the model file is now also available in your workspace.\n",
"\n",
"You can see files associated with that run."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"print(run.get_file_names())"
],
"outputs": [],
"execution_count": null,
"metadata": {
"gather": {
"logged": 1646400935407
},
"jupyter": {
"outputs_hidden": false,
"source_hidden": false
},
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "markdown",
"source": [
"Register the model in the workspace so that you (or other collaborators) can later query, examine, and deploy this model."
"In machine learning, models are trained to predict unknown labels for new data based on correlations between known labels and features found in the training data. Depending on the algorithm used, you may need to specify hyperparameters to configure how the model is trained. For example, the logistic regression algorithm uses a regularization rate hyperparameter to counteract overfitting; and deep learning techniques for convolutional neural networks (CNNs) use hyperparameters like learning rate to control how weights are adjusted during training, and batch size to determine how many data items are included in each training batch.\r\n",
"\r\n",
"The choice of hyperparameter values can significantly affect the resulting model, making it important to select the best possible values for your particular data and predictive performance goals.\r\n",
"\r\n",
"Hyperparameter tuning is accomplished by training the multiple models, using the same algorithm and training data but different hyperparameter values. The resulting model from each training run is then evaluated to determine the performance metric for which you want to optimize (for example, accuracy), and the best-performing model is selected.\r\n",
"\r\n",
"In Azure Machine Learning, you achieve this through an experiment that consists of a hyperdrive run, which initiates a child run for each hyperparameter combination to be tested. Each child run uses a training script with parameterized hyperparameter values to train a model, and logs the target performance metric achieved by the trained model."
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "markdown",
"source": [
"## Defining a search space\r\n",
"\r\n",
"The set of hyperparameter values tried during hyperparameter tuning is known as the search space. The definition of the range of possible values that can be chosen depends on the type of hyperparameter.\r\n",
"\r\n",
"### Discrete hyperparameters\r\n",
"\r\n",
"Some hyperparameters require discrete values - in other words, you must select the value from a particular set of possibilities. You can define a search space for a discrete parameter using a choice from a list of explicit values, which you can define as a Python list (choice([10,20,30])), a range (choice(range(1,10))), or an arbitrary set of comma-separated values (choice(30,50,100))\r\n",
"\r\n",
"### Continuous hyperparameters\r\n",
"\r\n",
"Some hyperparameters are continuous - in other words you can use any value along a scale. To define a search space for these kinds of value, you can use any of the following distribution types:\r\n",
"\r\n",
"- normal\r\n",
"- uniform\r\n",
"- lognormal\r\n",
"- loguniform\r\n",
"\r\n",
"### Defining a search space\r\n",
"\r\n",
"To define a search space for hyperparameter tuning, create a dictionary with the appropriate parameter expression for each named hyperparameter. For example, the following search space indicates that the learning rate hyperparameter can have the value 5e-5 or 4e-5. The learning_rate hyperparameter can also have any value from a normal distribution with a mean of 5e-5 and a standard deviation of 1e-5.\r\n",
"The specific values used in a hyperparameter tuning run depend on the type of sampling used.\r\n",
"\r\n",
"### Grid sampling\r\n",
"\r\n",
"Grid sampling can only be employed when all hyperparameters are discrete, and is used to try every possible combination of parameters in the search space.\r\n",
"\r\n",
"### Random sampling\r\n",
"\r\n",
"Random sampling is used to randomly select a value for each hyperparameter, which can be a mix of discrete and continuous values.\r\n",
"\r\n",
"### Bayesian sampling\r\n",
"\r\n",
"Bayesian sampling chooses hyperparameter values based on the Bayesian optimization algorithm, which tries to select parameter combinations that will result in improved performance from the previous selection. \r\n",
"With a sufficiently large hyperparameter search space, it could take many iterations (child runs) to try every possible combination. Typically, you set a maximum number of iterations, but this could still result in a large number of runs that don't result in a better model than a combination that has already been tried.\r\n",
"\r\n",
"To help prevent wasting time, you can set an early termination policy that abandons runs that are unlikely to produce a better result than previously completed runs. The policy is evaluated at an evaluation_interval you specify, based on each time the target performance metric is logged. You can also set a delay_evaluation parameter to avoid evaluating the policy until a minimum number of iterations have been completed.\r\n",
"\r\n",
"## Bandit policy\r\n",
"\r\n",
"You can use a bandit policy to stop a run if the target performance metric underperforms the best run so far by a specified margin.\r\n",
"\r\n",
"## Median stopping policy\r\n",
"\r\n",
"A median stopping policy abandons runs where the target performance metric is worse than the median of the running averages for all runs.\r\n",
"\r\n",
"## Truncation selection policy\r\n",
"\r\n",
"A truncation selection policy cancels the lowest performing X% of runs at each evaluation interval based on the truncation_percentage value you specify for X."
"This example applies the policy for every iteration after the first one, and abandons runs where the reported target metric is 0.2 or more worse than the best performing run after the same number of intervals.\r\n",
"\r\n",
"You can also apply a bandit policy using a slack factor, which compares the performance metric as a ratio rather than an absolute value."
],
"metadata": {
"nteract": {
"transient": {
"deleting": false
}
}
}
},
{
"cell_type": "markdown",
"source": [
"## Running hyperparameter tuning\r\n",
"\r\n",
"To run a hyperdrive experiment, you need to create a training script just the way you would do for any other training experiment, except that your script must:\r\n",
"\r\n",
"Include an argument for each hyperparameter you want to vary.\r\n",
"Log the target performance metric. This enables the hyperdrive run to evaluate the performance of the child runs it initiates, and identify the one that produces the best performing model.\r\n",
"\r\n",
"We will use the previous training script and use the 'valid score per epoch' as a tracking metric."
%% Cell type:markdown id: tags:
It is important to maintain a conda dependency file and/or an Azure ML Studio environment.
Every user of the workspace will use their own compute instance; with conda files and environments it is easy to install dependencies on these different compute instances.
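The environment itself can be created from such a conda specification; a minimal sketch (the environment name and conda file name are assumptions for this workshop) looks like this:
``` python
from azureml.core import Environment

env_name = 'training-env'  # assumed environment name
# create an Environment object from a conda dependency file
env = Environment.from_conda_specification(name=env_name, file_path='environment.yml')
```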
%% Cell type:code id: tags:
``` python
from azureml.core import Environment

# We can directly build the environment - this will create a new Docker
# image in Azure Container Registry (ACR), and directly 'bake in' our dependencies
# from the conda definition. When we later use the Environment, all AML will need to
# do is pull the image for the environment, thus saving the time of a potentially
# long-running conda environment creation.
if env_name not in Environment.list(workspace=ws):  # reconstructed condition: build only if not registered yet
    build = env.build(workspace=ws)
    build.wait_for_completion(show_output=True)
else:
    # load existing environment
    env = Environment.get(workspace=ws, name=env_name)
```
%% Cell type:markdown id: tags:
## Create experiment
Create an experiment to track the runs in your notebook. A workspace can have multiple experiments. We will create an experiment specifically for training our model.
%% Cell type:code id: tags:
``` python
from azureml.core import Experiment

experiment_name = 'train_model_name'
exp = Experiment(workspace=ws, name=experiment_name)
```
%% Cell type:markdown id: tags:
### Attach existing compute resource
By using Azure Machine Learning Compute, a managed service, data scientists can train machine learning models on clusters of Azure virtual machines. Examples include VMs with GPU or CPU support.
%% Cell type:code id: tags:
``` python
# list compute targets
print(ws.compute_targets.keys())
```
%% Cell type:markdown id: tags:
We will use our cluster. It is better to keep the training compute (which probably has better specs) separate from the notebook compute. This ensures a lower cost (heavy compute is only used where it is needed) and a central compute instance for every user of the workspace.
%% Cell type:code id: tags:
``` python
from azureml.core.compute import AmlCompute
from azureml.core.compute import ComputeTarget
import os

# choose compute target. Look at compute tab -> clusters for options OR look at list in cell above.
compute_name = "cpu-cluster"

if compute_name in ws.compute_targets:
    compute_target = ws.compute_targets[compute_name]
    print("found compute target: " + compute_name)
else:
    print("Compute not found, create compute in compute tab (cluster) with subnet in advanced settings if working in production subscription.")
```
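If the cluster does not exist and you prefer to create it from code rather than the Studio compute tab, a sketch along the following lines could replace the print in the else branch; the VM size and node counts here are assumptions:
``` python
from azureml.core.compute import AmlCompute, ComputeTarget

# provision a small CPU cluster (adjust vm_size / node counts to your subscription)
provisioning_config = AmlCompute.provisioning_configuration(vm_size='STANDARD_DS3_V2',
                                                            min_nodes=0,
                                                            max_nodes=4)
compute_target = ComputeTarget.create(ws, compute_name, provisioning_config)
compute_target.wait_for_completion(show_output=True)
```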
%% Cell type:markdown id: tags:
## Import Data
Before you train a model, you need to understand the data you're using to train it. In this section we will:
* Load datasets created in the ETL.ipynb notebook
* Display some sample images
Let's connect to the dataset by mounting it on the compute, as sketched below. It is also possible to download it onto the compute.
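A sketch of reconnecting to the registered image dataset follows; the registered dataset name is an assumption, while the `image_dataset` variable is what the training configuration below expects:
``` python
from azureml.core import Dataset

# retrieve the dataset registered by the ETL.ipynb notebook - the name is an assumption
image_dataset = Dataset.get_by_name(ws, name='image_dataset')
```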
For this task, you submit the job to run on the remote training cluster you set up earlier. To submit a job you:
* Create a directory
* Create a training script
* Create a script run configuration
* Submit the job
### Create a directory
Create a directory to deliver the necessary code from your computer to the remote resource.
%% Cell type:code id: tags:
``` python
import os

script_folder = os.path.join(os.getcwd(), "scripts")
os.makedirs(script_folder, exist_ok=True)
```
%% Cell type:markdown id: tags:
### Create a training script
To submit the job to the cluster, first create a training script. Run the following code to create the training script called `train.py` in the directory you just created.
Anything written in this directory is automatically uploaded into your workspace. You'll access your model from this directory later in the tutorial.
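The full training script is not reproduced here; the skeleton below sketches its overall shape (argument parsing, metric logging, and saving to `outputs/`), mirroring the `--image-folder` argument, the 'valid score per epoch' metric and the `outputs/model.pth` file referenced elsewhere in this notebook. The model definition and training loop are omitted and would need to be filled in:
``` python
%%writefile $script_folder/train.py

import os
import argparse

from azureml.core import Run

# parse the arguments passed in by the ScriptRunConfig
parser = argparse.ArgumentParser()
parser.add_argument('--image-folder', type=str, dest='image_folder',
                    help='mounted folder containing the training images')
parser.add_argument('--learning-rate', type=float, dest='learning_rate', default=5e-5,
                    help='learning rate (varied later by the hyperdrive run)')
args = parser.parse_args()

# get the run context so metrics are logged to the workspace
run = Run.get_context()

# ... build and train the model on the data under args.image_folder (omitted) ...
# log the metric that is tracked (and later optimized by hyperdrive), e.g.:
# run.log('valid score per epoch', valid_score)

# anything saved to ./outputs is uploaded to the run automatically
os.makedirs('outputs', exist_ok=True)
# torch.save(model.state_dict(), os.path.join('outputs', 'model.pth'))
```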
%% Cell type:markdown id: tags:
### Configure the training job
Create a **ScriptRunConfig** object to specify the configuration details of your training job, including your training script, environment to use, and the compute target to run on. Configure the **ScriptRunConfig** by specifying:
* The directory that contains your scripts. All the files in this directory are uploaded into the cluster nodes for execution.
* The compute target. In this case you will use the "cpu-cluster"
* The training script name, train.py
* An environment that contains the libraries needed to run the script
* Arguments required by the training script.
%% Cell type:code id: tags:
``` python
from azureml.core import ScriptRunConfig

# mounting is usually faster than as_download(), since files are loaded only when processed
args = ['--image-folder', image_dataset.as_mount()]

# configure the run with the directory, script, arguments, compute target and environment described above
src = ScriptRunConfig(source_directory=script_folder, script='train.py', arguments=args,
                      compute_target=compute_target, environment=env)
```
%% Cell type:markdown id: tags:
Run the experiment by submitting the ScriptRunConfig object. You can navigate to the Azure portal to monitor the run.
Since the call is asynchronous, it returns a **Preparing** or **Running** state as soon as the job is started.
%% Cell type:code id: tags:
``` python
run = exp.submit(config=src)
run
```
%% Cell type:markdown id: tags:
## Monitor a remote run
Here is what's happening while you wait:
- **Image creation**: A Docker image is created matching the Python environment specified by the Azure ML environment. The image is built and stored in the ACR (Azure Container Registry) associated with your workspace. Image creation and uploading takes **about 5 minutes**. This stage happens once for each Python environment since the container is cached for subsequent runs. During image creation, logs are streamed to the run history. You can monitor the image creation progress using these logs. If you prebuild the image this step will be much quicker.
- **Scaling**: If the remote cluster requires more nodes to execute the run than currently available, additional nodes are added automatically. Scaling typically takes **about 5 minutes**.
- **Running**: In this stage, the necessary scripts and files are sent to the compute target, then data stores are mounted/copied, then the entry_script is run. While the job is running, stdout and the files in the ./logs directory are streamed to the run history. You can monitor the run's progress using these logs.
- **Post-Processing**: The ./outputs directory of the run is copied over to the run history in your workspace so you can access these results.
You can check the progress of a running job in multiple ways. This workshop uses a Jupyter widget, but it is also possible to use the `wait_for_completion` method.
### Jupyter widget
Watch the progress of the run with a Jupyter widget. Like the run submission, the widget is asynchronous and provides live updates every 10-15 seconds until the job completes.
%% Cell type:code id: tags:
``` python
from azureml.widgets import RunDetails
RunDetails(run).show()
```
%% Cell type:markdown id: tags:
By the way, if you need to cancel a run, you can follow [these instructions](https://aka.ms/aml-docs-cancel-run).
%% Cell type:markdown id: tags:
## View Experiment
In the left-hand menu in Azure Machine Learning Studio, select __Experiments__ and then select your experiment. An experiment is a grouping of many runs from a specified script or piece of code. Information for the run is stored under that experiment. If the name doesn't exist when you submit an experiment, a new experiment is automatically created with that name. If you select your run you will see various tabs containing metrics, logs, explanations, etc.
## Register model
The last step in the training script wrote the file `outputs/model.pth` in a directory named `outputs` in the VM of the cluster where the job is executed. `outputs` is a special directory in that all content in this directory is automatically uploaded to your workspace. This content appears in the run record in the experiment under your workspace. Hence, the model file is now also available in your workspace.
You can see files associated with that run.
%% Cell type:code id: tags:
``` python
print(run.get_file_names())
```
%% Cell type:markdown id: tags:
Register the model in the workspace so that you (or other collaborators) can later query, examine, and deploy this model.
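A minimal sketch of registering the trained model from the run is shown below; the model name is an assumption for this workshop, while the file path matches the `outputs/model.pth` written by the training script:
``` python
# register the model file that the run uploaded to its outputs directory
model = run.register_model(model_name='trained_model',  # assumed name
                           model_path='outputs/model.pth')
print(model.name, model.version)
```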
## Hyperparameter tuning
In machine learning, models are trained to predict unknown labels for new data based on correlations between known labels and features found in the training data. Depending on the algorithm used, you may need to specify hyperparameters to configure how the model is trained. For example, the logistic regression algorithm uses a regularization rate hyperparameter to counteract overfitting; and deep learning techniques for convolutional neural networks (CNNs) use hyperparameters like learning rate to control how weights are adjusted during training, and batch size to determine how many data items are included in each training batch.
The choice of hyperparameter values can significantly affect the resulting model, making it important to select the best possible values for your particular data and predictive performance goals.
Hyperparameter tuning is accomplished by training multiple models, using the same algorithm and training data but different hyperparameter values. The resulting model from each training run is then evaluated to determine the performance metric for which you want to optimize (for example, accuracy), and the best-performing model is selected.
In Azure Machine Learning, you achieve this through an experiment that consists of a hyperdrive run, which initiates a child run for each hyperparameter combination to be tested. Each child run uses a training script with parameterized hyperparameter values to train a model, and logs the target performance metric achieved by the trained model.
%% Cell type:markdown id: tags:
## Defining a search space
The set of hyperparameter values tried during hyperparameter tuning is known as the search space. The definition of the range of possible values that can be chosen depends on the type of hyperparameter.
### Discrete hyperparameters
Some hyperparameters require discrete values - in other words, you must select the value from a particular set of possibilities. You can define a search space for a discrete parameter using a choice from a list of explicit values, which you can define as a Python list (`choice([10,20,30])`), a range (`choice(range(1,10))`), or an arbitrary set of comma-separated values (`choice(30,50,100)`).
### Continuous hyperparameters
Some hyperparameters are continuous - in other words, you can use any value along a scale. To define a search space for these kinds of values, you can use any of the following distribution types:
- normal
- uniform
- lognormal
- loguniform
### Defining a search space
To define a search space for hyperparameter tuning, create a dictionary with the appropriate parameter expression for each named hyperparameter. For example, the following search space indicates that the learning rate hyperparameter can have the value 5e-5 or 4e-5. The learning_rate hyperparameter can also have any value from a normal distribution with a mean of 5e-5 and a standard deviation of 1e-5.
%% Cell type:code id: tags:
``` python
from azureml.train.hyperdrive import choice, normal

param_space = {
    '--learning-rate': choice(5e-5, 4e-5)
    # '--learning_rate': normal(5e-5, 1e-5)
}
```
%% Cell type:markdown id: tags:
## Configuring sampling
The specific values used in a hyperparameter tuning run depend on the type of sampling used.
### Grid sampling
Grid sampling can only be employed when all hyperparameters are discrete, and is used to try every possible combination of parameters in the search space.
### Random sampling
Random sampling is used to randomly select a value for each hyperparameter, which can be a mix of discrete and continuous values.
### Bayesian sampling
Bayesian sampling chooses hyperparameter values based on the Bayesian optimization algorithm, which tries to select parameter combinations that will result in improved performance from the previous selection.
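As an illustration, the search space defined above could be sampled as in the following sketch; random sampling is shown, with the alternatives commented out:
``` python
from azureml.train.hyperdrive import GridParameterSampling, RandomParameterSampling, BayesianParameterSampling

# random sampling over the param_space dictionary defined earlier
param_sampling = RandomParameterSampling(param_space)
# param_sampling = GridParameterSampling(param_space)      # discrete hyperparameters only
# param_sampling = BayesianParameterSampling(param_space)  # note: does not support early termination policies
```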
## Early termination
With a sufficiently large hyperparameter search space, it could take many iterations (child runs) to try every possible combination. Typically, you set a maximum number of iterations, but this could still result in a large number of runs that don't result in a better model than a combination that has already been tried.
To help prevent wasting time, you can set an early termination policy that abandons runs that are unlikely to produce a better result than previously completed runs. The policy is evaluated at an evaluation_interval you specify, based on each time the target performance metric is logged. You can also set a delay_evaluation parameter to avoid evaluating the policy until a minimum number of iterations have been completed.
## Bandit policy
You can use a bandit policy to stop a run if the target performance metric underperforms the best run so far by a specified margin.
## Median stopping policy
A median stopping policy abandons runs where the target performance metric is worse than the median of the running averages for all runs.
## Truncation selection policy
A truncation selection policy cancels the lowest performing X% of runs at each evaluation interval based on the truncation_percentage value you specify for X.
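As an illustration, here is a minimal bandit policy sketch; the interval and slack values are the ones described in the next sentence:
``` python
from azureml.train.hyperdrive import BanditPolicy

# evaluate the policy at every interval, with an absolute slack of 0.2
early_termination_policy = BanditPolicy(evaluation_interval=1, slack_amount=0.2)
```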
This example applies the policy for every iteration after the first one, and abandons runs where the reported target metric is 0.2 or more worse than the best performing run after the same number of intervals.
You can also apply a bandit policy using a slack factor, which compares the performance metric as a ratio rather than an absolute value.
%% Cell type:markdown id: tags:
## Running hyperparameter tuning
To run a hyperdrive experiment, you need to create a training script just as you would for any other training experiment, except that your script must:
- Include an argument for each hyperparameter you want to vary.
- Log the target performance metric. This enables the hyperdrive run to evaluate the performance of the child runs it initiates, and identify the one that produces the best performing model.
We will use the previous training script and use the 'valid score per epoch' as a tracking metric.
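Putting the pieces together, a hyperdrive run could be configured and submitted roughly as follows; the sampling and early termination objects are the ones sketched above, and the run budget values are assumptions:
``` python
from azureml.train.hyperdrive import HyperDriveConfig, PrimaryMetricGoal

# combine the script run configuration, the search space sampling and the early termination policy
hyperdrive_config = HyperDriveConfig(run_config=src,
                                     hyperparameter_sampling=param_sampling,
                                     policy=early_termination_policy,
                                     primary_metric_name='valid score per epoch',
                                     primary_metric_goal=PrimaryMetricGoal.MAXIMIZE,  # assuming a higher score is better
                                     max_total_runs=8,        # assumed budget
                                     max_concurrent_runs=2)   # assumed concurrency

hyperdrive_run = exp.submit(config=hyperdrive_config)
RunDetails(hyperdrive_run).show()
```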