
ClearML demo

Introduction

The ClearML server is set up in the following way:


  • The open-source ClearML server (https://github.com/allegroai/clearml-server) is deployed in a Kubernetes namespace on premises and maintained by FB-IT.
  • The API server and web server are deployed there.
  • An external S3 storage is used as the file storage and provided to the project group. This gives the project group ownership of the data storage.

At this point you should have:

  • A URL to your ClearML webapp
  • Login credentials for the webapp
  • The endpoint, access key, and secret key for the S3 storage

Setup S3 storage interface

S3 interface application [Windows/macOS only]

You can use any S3 interface; as an example, we will show how to set up CloudBerry:

  • Go to https://www.msp360.com/explorer/ (for Windows or macOS users)
  • Download the relevant installer
  • Run the installer
  • Start CloudBerry Explorer
  • Click File -> Add New Account


  • Select S3 Compatible


  • Put in Display name: a name of your choice
  • Put in Server: the endpoint, including https://
  • Put in Access Key ID: the S3 access key
  • Put in Secret Access Key: the S3 secret key
  • Click Test connection
  • Continue on a successful test


  • Select your S3 account via the drop-down menu at Source
  • You can now drag files/folders to the S3 storage, or move files from one source to S3 via the interface.
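If you prefer scripting to a GUI client, the same transfers can be done from Python with boto3 (the AWS SDK for Python, a separate pip install); the endpoint, bucket, and folder names below are placeholders, not values from this setup:

```python
def object_key(folder, filename):
    """Build the S3 object key, keeping platform files under one folder."""
    return f"{folder.strip('/')}/{filename}"

def upload_file(endpoint, access_key, secret_key, bucket, folder, path):
    # deferred import: requires `pip install boto3`
    import boto3
    s3 = boto3.client(
        "s3",
        endpoint_url=endpoint,              # e.g. "https://s3.example.nl"
        aws_access_key_id=access_key,
        aws_secret_access_key=secret_key,
    )
    # upload the local file under <folder>/<filename> in the bucket
    s3.upload_file(path, bucket, object_key(folder, path.rsplit("/", 1)[-1]))

# usage: upload_file("https://s3.example.nl", "KEY", "SECRET",
#                    "my-bucket", "clearml", "model.pt")
```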


Setup ClearML client

NOTE: It is advised to create a new folder in the S3 storage for files coming from the AI platform. This keeps the root folder of the S3 storage cleaner.

  • pip install -r requirements.txt

  • Log in to the webapp (for example: https://app.clearml.containers.wurnet.nl)

  • Create workspace credentials

    • Settings -> Workspace -> Create new credentials (optionally add a label)
    • Copy the API credentials (access_key and secret_key) for later use
  • Copy clearml.conf to your home directory

    • Linux: ~/clearml.conf
    • Mac: $HOME/clearml.conf
    • Windows: C:\Users\<username>\clearml.conf
  • Put your project-specific variables in the clearml.conf inside your home directory:

    api {
        api_server: API_SERVER
        web_server: WEB_SERVER
        files_server: FILE_SERVER
        # input your API credentials generated in the webapp here
        credentials {"access_key": "ACCESSKEY", "secret_key": "SECRETKEY"}
    }
  • FB-IT provides no file server with the ClearML workspace (this would limit scalability and control for researchers), so we need to connect the S3 storage to the ClearML workspace. Open clearml.conf in your home folder again and change it to use your provided S3 credentials:

    aws {
        s3 {
            # S3 credentials, used for read/write access by various SDK elements
            # The following settings will be used for any bucket not specified
            # below in the "credentials" section
            # --------------------------------------------------------------
            # Specify explicit keys
            key: "s3_access_key_here"
            secret: "s3_secret_key_here"
            # --------------------------------------------------------------
            credentials: [
            ]
        }
    }
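With the S3 credentials in place, you can optionally make every task store its artifacts on the S3 storage by default via sdk.development.default_output_uri in the same clearml.conf. This is a sketch; the endpoint, bucket, and folder are placeholders you must replace with your own values:

    sdk {
        development {
            # store all task artifacts/models on the project S3 storage by default
            default_output_uri: "s3://<endpoint>/<bucket>/clearml"
        }
    }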

Setup ClearML Agent

You can deploy agents on your various compute resources (computer, laptop, server, etc.) as long as they have access to a command-line interface.

  1. pip install clearml-agent

NOTE: if run from the same device as the client, update clearml.conf in the user folder on your computer with the agent code and skip to step 6. See here for guidance.

  2. Log in to the webapp (for example: https://app.clearml.containers.wurnet.nl)
  3. Create workspace credentials
  • Settings -> Workspace -> Create new credentials (optionally add a label)
  • Copy the API credentials (access_key and secret_key) for later use
  4. Copy clearml-agent.conf to the home directory of your agent device and rename it to clearml.conf
  • Linux: ~/clearml.conf
  • Mac: $HOME/clearml.conf
  • Windows: C:\Users\<username>\clearml.conf
  5. Set (Windows) or export (Linux) some variables in the terminal. This needs to be done every time the terminal is reopened. We do this in order to keep secrets safe.
    • export GIT_USER=git username
    • export GIT_PASS=git access token
      • NOTE: You can also use Git deploy tokens for user and password. Access and deploy tokens need read-repository rights.
    • export S3_ACCESS_KEY=s3 access key
    • export S3_SECRET_KEY=s3 secret key
  6. Deploy the agent and let it listen to a queue (for example: test). The following command also creates that queue, and service mode makes sure that the agent can run multiple different jobs (needed for pipelines). See here for more options. We also detach to let the agent run in the background.
  • clearml-agent daemon --queue test --create-queue --detach
    • To deploy the agent with a GPU allocated, add the --gpus argument
      • Install CUDA on the agent device: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
      • Install PyTorch with CUDA support on the client device (specify the PyTorch version matching the CUDA version on the agent device)
        • You can specify the PyTorch wheel URL needed for pip install via a variable in clearml.conf on the agent device: agent.package_manager.extra_index_url
      • Use cuda as the torch device in your script
    • To stop the agent running in the background: clearml-agent daemon --stop
  7. [optional] The terminal in Linux can be locked so nobody can read the environment variables.
    • sudo screen
    • Press ctrl+a, then x, and input a password
    • To reopen the terminal, input the password
    • Input exit to exit screen
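As a sketch of the extra_index_url setting mentioned above, the agent-side clearml.conf could contain the following; the wheel URL shown is the public PyTorch CUDA 11.8 index, so match it to the CUDA version installed on your agent device:

    agent {
        package_manager {
            # extra pip index so the agent can resolve CUDA-enabled torch wheels
            extra_index_url: ["https://download.pytorch.org/whl/cu118"]
        }
    }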

Demo scripts

Example task run

The task run can be used for running a single script. It runs locally by default, but if the remote_queue argument is set, the first epoch will run locally and the following epochs will be run by the agent listening to the remote queue.

  • Run task_run.py
    • You can change arguments inside the script
    • Change the remote_queue argument to run the script on an agent that is listening to that queue.
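The pattern task_run.py follows can be sketched as below; the project, task, and argument names are illustrative assumptions, not the repo's actual values, and clearml must be pip-installed (see requirements.txt):

```python
import argparse

def parse_args(argv=None):
    # an empty remote_queue means: run entirely on this machine
    parser = argparse.ArgumentParser()
    parser.add_argument("--remote_queue", default="",
                        help="agent queue to continue the run on")
    return parser.parse_args(argv)

def main():
    args = parse_args()
    # deferred import: requires `pip install clearml`
    from clearml import Task

    # register the run in the ClearML workspace (names are illustrative)
    task = Task.init(project_name="clearml-demo", task_name="task run example")
    if args.remote_queue:
        # stop the local process after setup and enqueue the task
        # for any agent listening to the given queue
        task.execute_remotely(queue_name=args.remote_queue)
    # ... training loop goes here ...

# usage: python task_run.py --remote_queue test
```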

Example pipeline

  • Run stage one script via: python stage_one.py
    • You are able to change arguments inside the script
  • Run stage two via: python stage_two.py
    • You are able to change arguments inside the script
  • This will create drafts of the scripts in the ClearML workspace
  • Now run controller.py to create and run a pipeline that uses the drafts. Run via: python controller.py
    • You can change the pipe.start command at the end to use remote agents
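The controller pattern can be sketched with a PipelineController that clones the two drafts and chains them; the project, task, and queue names here are assumptions, not the repo's actual values:

```python
def build_pipeline(queue=None):
    # deferred import: requires `pip install clearml`
    from clearml import PipelineController

    pipe = PipelineController(name="demo pipeline",
                              project="clearml-demo", version="1.0")
    # each step clones one of the drafts created by stage_one.py / stage_two.py
    pipe.add_step(name="stage_one",
                  base_task_project="clearml-demo", base_task_name="stage_one")
    pipe.add_step(name="stage_two", parents=["stage_one"],
                  base_task_project="clearml-demo", base_task_name="stage_two")
    if queue:
        pipe.start(queue=queue)  # steps run on agents listening to `queue`
    else:
        pipe.start_locally(run_pipeline_steps_locally=True)
    return pipe

# usage: build_pipeline(queue="test")
```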

Add pipeline trigger

There is an example of a dataset trigger in trigger.py (more trigger types can be found at https://clear.ml/docs/latest/docs/references/sdk/trigger). It fires when a mutation happens to the registered dataset.

  • Change the trigger.py file to your specific pipeline and dataset
  • Run trigger.py
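A minimal dataset-trigger sketch, assuming hypothetical project names and task IDs (trigger.py in the repo is the authoritative version):

```python
def watch_dataset(dataset_project, base_task_id, queue="test"):
    # deferred import: requires `pip install clearml`
    from clearml.automation import TriggerScheduler

    # poll the server every few minutes for dataset changes
    trigger = TriggerScheduler(pooling_frequency_minutes=3)
    # enqueue a copy of `base_task_id` whenever a dataset
    # in `dataset_project` is mutated
    trigger.add_dataset_trigger(
        schedule_task_id=base_task_id,
        schedule_queue=queue,
        trigger_project=dataset_project,
        name="dataset-mutation-trigger",
    )
    trigger.start()  # blocks and keeps polling the server

# usage: watch_dataset("clearml-demo", "<pipeline task id>")
```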

Troubleshoot

If you run into any trouble, or if you have any feedback or feature requests, please contact us at:

mdtresearchitsolutions@wur.nl