Experiment tracking is the process of recording all the important components of a machine learning experiment, such as hyperparameters, metrics, models, and artifacts (plots, PNG images, other files, etc.). Because these components are stored, earlier results can be reproduced from the logged parameters. Under one experiment, different runs can be created, and by changing parameter values we can evaluate model performance, compare runs easily, and select the optimal model for production. MLflow is a widely used experiment tracking tool across organizations.
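As a minimal illustration of this idea (not code from the notebook; the experiment name and dataset are just placeholders), the sketch below logs two runs under one experiment that differ only in a single hyperparameter, so their metrics can be compared side by side in the MLflow UI:

```python
import mlflow
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# One experiment groups many runs; each run stores its own parameters and metrics.
mlflow.set_experiment("tracking-demo")

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for C in (0.1, 1.0):  # two runs that differ only in the regularization strength
    with mlflow.start_run(run_name=f"logreg_C={C}"):
        model = LogisticRegression(C=C, max_iter=1000).fit(X_train, y_train)
        mlflow.log_param("C", C)
        mlflow.log_metric("test_accuracy", accuracy_score(y_test, model.predict(X_test)))
```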
To explain how experiment tracking works and how to implement it in Python, I have created a Jupyter notebook: [mlflow live demo]. The notebook covers the following steps:
- Create a virtual environment
- pip install notebook
- pip install numpy
- pip install scikit-learn
- pip install matplotlib
- pip install mlflow
- Train a basic machine learning classifier using logistic regression
- Create an experiment for the basic classifier and record its metrics, parameters, and model
- Fine-tune the model with hyperparameter tuning (RandomizedSearchCV) and log each hyperparameter run
- Load the model back from the model registry
- Deploy the model from the registry locally (see the sketch after this list)
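A condensed sketch of these notebook steps is shown below. It is not the notebook code itself: the experiment name, the registered model name ("demo-logreg"), the dataset, and the hyperparameter grid are placeholders, and the registry steps assume a database-backed tracking store (here a local SQLite file), since MLflow's plain file store does not support the model registry.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# The model registry needs a database-backed store; a local SQLite file is enough for a demo.
mlflow.set_tracking_uri("sqlite:///mlflow.db")
mlflow.set_experiment("basic-classifier")

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 1) Basic classifier: log parameters, a metric, and the fitted model in one run.
with mlflow.start_run(run_name="baseline"):
    clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
    mlflow.log_param("max_iter", 5000)
    mlflow.log_metric("test_accuracy", accuracy_score(y_test, clf.predict(X_test)))
    mlflow.sklearn.log_model(clf, "model", registered_model_name="demo-logreg")

# 2) Hyperparameter tuning with RandomizedSearchCV; log every candidate as a nested run.
search = RandomizedSearchCV(
    LogisticRegression(max_iter=5000),
    param_distributions={"C": [0.01, 0.1, 1.0, 10.0, 100.0]},
    n_iter=5,
    cv=3,
    random_state=0,
)
with mlflow.start_run(run_name="random-search"):
    search.fit(X_train, y_train)
    for params, score in zip(search.cv_results_["params"],
                             search.cv_results_["mean_test_score"]):
        with mlflow.start_run(nested=True):
            mlflow.log_params(params)
            mlflow.log_metric("mean_cv_accuracy", score)
    mlflow.log_metric("best_cv_accuracy", search.best_score_)
    mlflow.sklearn.log_model(search.best_estimator_, "model",
                             registered_model_name="demo-logreg")

# 3) Load the latest registered version back from the model registry and predict.
loaded = mlflow.sklearn.load_model("models:/demo-logreg/latest")
print(loaded.predict(X_test[:5]))
```

The registered model can then be served locally as a REST endpoint, for example with `mlflow models serve -m "models:/demo-logreg/latest" -p 5001` (on recent MLflow versions, `--env-manager local` can be added to reuse the current environment instead of building a new one).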
There is also a Python file, log_simple_experiment.py, that can be used to test MLflow logging by running a script from the command line.
Execute the following steps from the command line:
- set MLFLOW_TRACKING_URI=https://mlflow-rits.containers.wurnet.nl (use export ... on Linux)
- Generate a token via https://sso.containers.wurnet.nl/token
- set MLFLOW_TRACKING_TOKEN=Generated-Token (use export ... on Linux)
- python log_simple_experiment.py
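The contents of log_simple_experiment.py are not reproduced here; a minimal script of the same shape might look like the following (the experiment name and the logged values are purely illustrative). MLflow reads MLFLOW_TRACKING_URI and MLFLOW_TRACKING_TOKEN from the environment, so the script does not need to set them itself:

```python
import os
import mlflow

# MLflow picks up MLFLOW_TRACKING_URI and MLFLOW_TRACKING_TOKEN from the
# environment variables set in the steps above.
print("Tracking server:", os.environ.get("MLFLOW_TRACKING_URI"))

mlflow.set_experiment("cli-logging-test")  # illustrative experiment name

with mlflow.start_run(run_name="simple-cli-run"):
    # Log one parameter, one metric, and a small text artifact with dummy
    # values, just to verify that the remote tracking server accepts requests.
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_metric("accuracy", 0.93)
    mlflow.log_text("hello from log_simple_experiment.py", "notes.txt")
```

If the environment variables are set correctly, the run should appear under the chosen experiment in the remote MLflow UI.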