A comprehensive tutorial on MLflow for MLOps: From experimentation to production

After reading this post you will be able to:

  • Understand how you and your Data Science teams can improve your MLOps practices using MLflow
  • Use all the Components of MLflow (Tracking, Projects, Models, Registry)
  • Use MLflow in an Anaconda Environment
  • Use MLflow in a Docker Environment (Including running an IDE inside of a container)
  • Use Postgres Backend Store and MinIO Artifact Store for Easy Collaboration

The instructions and demos below assume you are using a Linux operating system. Other operating systems can be used with minor modifications.

What is MLflow and Why Should You Use It?

Basic Concepts

MLflow is an MLOps tool that can be used to increase the efficiency of machine learning experimentation and productionalization. MLflow is organized into four components (Tracking, Projects, Models, and Registry). You can use each of these components on their own but they are designed to work well together. MLflow is designed to work with any machine learning library, determine most things about your code by convention, and require minimal changes to integrate into an existing codebase. It aims to take any codebase written in its format and make it reproducible and reusable by multiple data scientists. MLflow lets you train, reuse, and deploy models with any library and package them into reproducible steps that other data scientists can use as a “black box”, without even having to know which library you are using.

Productivity Challenges in Machine Learning

It is difficult to keep track of experiments

If you are just working with a script or notebook, how do you tell which data, code, and parameters went into getting a particular model result?

It is difficult to reproduce code

Even if you have meticulously tracked the code versions and parameters, you need to capture the whole environment (e.g. library dependencies) to get the same result. This is especially challenging if you want another data scientist to use your code, or if you want to run the same code at scale on another platform (e.g. in the cloud).

There’s no standard way to package and deploy models

Every data science team comes up with its own approach for each ML library it uses and the link between a model and the code and parameters that produced it is often lost.

There is no central store to manage models (their version and stage transitions)

A data science team creates many models. In the absence of a central place to collaborate and manage the model lifecycle, data science teams face challenges in how they manage models and stages.

MLflow Components

MLflow Tracking

This is an API and UI for logging parameters, code versions, metrics, and artifacts when running your machine learning code and later for visualizing results. You can use MLflow Tracking in any environment (e.g. script or notebook) to log results to local files or to a server, then compare multiple runs. Teams can use MLflow tracking to compare results from different users.

MLflow Projects

MLflow Projects are a standard format for packaging reusable data science code. Each project is simply a directory with code and uses a descriptor file to specify its dependencies and how to run the code. For example, a project can contain a conda.yaml for specifying a Python Anaconda environment.

MLflow Models

MLflow Models offer a convention for packaging machine learning models in multiple flavors, and a variety of tools to help deploy them. Each model is saved as a directory containing arbitrary files and a descriptor file that lists several “flavors” the model can be used in. For example, a TensorFlow model can be loaded as a TensorFlow DAG, or as a Python function to apply to input data.

MLflow Registry

MLflow Registry offers a centralized model store, set of APIs, and UI, to collaboratively manage the full lifecycle of an MLflow model. It provides model lineage (which MLflow experiment and run produced the model), model versioning, stage transitions (for example from staging to production or archiving), and annotations.

Scalability and Big Data

An individual MLflow run can execute on a distributed cluster. You can launch runs on the distributed infrastructure of your choice and report results to a tracking server to compare them.

MLflow supports launching multiple runs in parallel with different parameters, for example for hyperparameter tuning. You can use the Projects API to start multiple runs and the tracking API to track them.

MLflow Projects can take input from, and write output to, distributed storage systems such as AWS S3. This means that you can write projects that build large datasets, such as featurizing a 100TB file.

MLflow Model Registry offers large organizations a central hub to collaboratively manage a complete model lifecycle. Many data science teams within an organization develop hundreds of models, each model with its experiments, runs, versions, artifacts, and stage transitions.

Example Use Cases

Individual Data Scientists

Individual data scientists can use MLflow Tracking to track experiments locally on their machine, organize code in projects for future reuse, and output models that production engineers can then deploy using MLflow’s deployment tools.

Data Science Teams

Data science teams can deploy an MLflow Tracking server to log and compare results across multiple users working on the same problem (and experimenting with different models). Anyone can download and run another team member’s model.

Large Organizations

Large organizations can share projects, models, and results. Any team can run another team’s code using MLflow Projects, so organizations can package useful training and data preparation steps that another team can use, or compare results from many teams on the same task. Engineering teams can easily move workflows from R&D to staging to production.

Production Engineers

Production engineers can deploy models from diverse ML libraries in the same way, store the models as files in a management system of their choice, and track which run a model came from.

Researchers and Open Source Developers

Researchers and open source developers can publish code to GitHub in the MLflow project format, making it easy for anyone to run their code by pointing the mlflow run command directly to GitHub.

ML Library Developers

ML library developers can output models in the MLflow Model format to have them automatically support deployment using MLflow’s built-in tools. Deployment tool developers (for example, a cloud vendor building a servicing platform) can automatically support a large variety of models.

Using MLflow with a Conda Environment

In this section, we cover how to use the various features of MLflow with an Anaconda environment.

Setting up for the Tutorial

1. Make sure you have Anaconda installed

2. Clone the repository

git clone https://github.com/Noodle-ai/mlflow_part1_condaEnv.git

3. Create a conda environment from the conda.yaml file and activate it

conda env create --file conda.yaml
conda activate mlflow_demos

If you would rather build the environment from scratch than use the provided conda.yaml, the following commands create the environment and export your own conda.yaml.

conda create --name mlflow_demos python=3.8.3
conda activate mlflow_demos
conda install -c anaconda jupyter=1.0.0
conda install -c conda-forge mlflow=1.8.0
conda install scikit-learn=0.22.1
conda install -c anaconda psycopg2=2.8.5
conda install -c anaconda boto3=1.14.12
conda env export --name mlflow_demos > conda.yaml

Examples

Open experiment.ipynb and follow along. The notebook contains examples demonstrating how to use MLflow Tracking and MLflow Models. It also contains descriptions of how to use MLflow Projects.

Using the Tracking API

The MLflow Tracking API lets you log metrics and artifacts (files from your data science code) in order to track a history of your runs.

The code below logs a run with one parameter (param1), one metric (foo) with three values (1,2,3), and an artifact (a text file containing “Hello world!”).

import mlflow

mlflow.start_run()

# Log a parameter (key-value pair)
mlflow.log_param("param1", 5)
# Log a metric; metrics can be updated throughout the run
mlflow.log_metric("foo", 1)
mlflow.log_metric("foo", 2)
mlflow.log_metric("foo", 3)
# Log an artifact (output file)
with open("output.txt", "w") as f:
    f.write("Hello world!")
mlflow.log_artifact("output.txt")

mlflow.end_run()

Viewing the Tracking UI

By default, wherever you run your program, the tracking API writes data into a local ./mlruns directory. You can then run MLflow’s Tracking UI.

Activate the MLflow Tracking UI by typing the following into the terminal. You must be in the same folder as mlruns.

mlflow ui

View the tracking UI by visiting the URL returned by the previous command.

Click on the run to see more details.

Click on the metric to see more details.

Example Incorporating MLflow Tracking, MLflow Models, and MLflow Projects

In this example, MLflow Tracking is used to keep track of different hyperparameters, performance metrics, and artifacts of a linear regression model. MLflow Models is used to store the pickled trained model instance, a file describing the environment the model instance was created in, and a descriptor file that lists several “flavors” the model can be used in. MLflow Projects is used to package the training code. And lastly, MLflow Models is used to deploy the model to a simple HTTP server.

This tutorial uses a dataset to predict the quality of the wine based on quantitative features like the wine’s “fixed acidity”, “pH”, “residual sugar”, and so on. The dataset is from UCI’s machine learning repository.

Training the Model

First, we train the linear regression model that takes two hyperparameters: alpha and l1_ratio.

This example uses the familiar pandas, numpy, and sklearn APIs to create a simple machine learning model. The MLflow Tracking APIs log information about each training run like hyperparameters (alpha and l1_ratio) used to train the model, and metrics (root mean square error, mean absolute error, and r2) used to evaluate the model. The example also serializes the model in a format that MLflow knows how to deploy.
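
To make the three metrics concrete, here is a small worked sketch that computes them by hand in plain Python, using the standard definitions of RMSE, MAE, and R2. The quality scores are made up for illustration and are not taken from the wine dataset; the tutorial itself relies on the sklearn functions instead.

```python
import math


def eval_metrics(actual, pred):
    """Return (rmse, mae, r2) for two equal-length sequences of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, pred)]

    # Root mean square error: square root of the mean squared error
    ss_res = sum(e * e for e in errors)
    rmse = math.sqrt(ss_res / n)

    # Mean absolute error
    mae = sum(abs(e) for e in errors) / n

    # R2: 1 minus (residual sum of squares / total sum of squares)
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    r2 = 1.0 - ss_res / ss_tot

    return rmse, mae, r2


# Made-up quality scores, for illustration only
actual = [5, 6, 7, 5]
pred = [5.5, 6.0, 6.0, 5.5]

rmse, mae, r2 = eval_metrics(actual, pred)
print("RMSE: %.3f  MAE: %.3f  R2: %.3f" % (rmse, mae, r2))
```

For these inputs the RMSE is about 0.612, the MAE is 0.5, and R2 is about 0.455. Lower RMSE and MAE and higher R2 indicate a better fit.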

Each time you run the example MLflow logs information about your experiment runs in the directory mlruns.

There is a script containing the training code called train.py. You can run the example through the .py script using the following command.

python train.py <alpha> <l1_ratio>

There is also a notebook function of the training script. You can use the notebook to run the training (train() function shown below).

# Wine Quality Sample
def train(in_alpha, in_l1_ratio):
    import pandas as pd
    import numpy as np
    from sklearn.metrics import mean_squared_error, \
        mean_absolute_error, r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import ElasticNet
    import mlflow
    import mlflow.sklearn

    def eval_metrics(actual, pred):
        rmse = np.sqrt(mean_squared_error(actual, pred))
        mae = mean_absolute_error(actual, pred)
        r2 = r2_score(actual, pred)
        return rmse, mae, r2

    np.random.seed(40)

    # Read the wine-quality csv file from the URL
    csv_url =\
        'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
    data = pd.read_csv(csv_url, sep=';')

    # Split the data into training and test sets (0.75, 0.25) split
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    # Set default values if no alpha is provided
    # (check for None before converting; float(None) would raise a TypeError)
    if in_alpha is None:
        alpha = 0.5
    else:
        alpha = float(in_alpha)

    # Set default values if no l1_ratio is provided
    if in_l1_ratio is None:
        l1_ratio = 0.5
    else:
        l1_ratio = float(in_l1_ratio)

    # Useful for multiple runs
    with mlflow.start_run():
        # Execute ElasticNet
        lr = ElasticNet(
            alpha=alpha, 
            l1_ratio=l1_ratio, 
            random_state=42
        )
        lr.fit(train_x, train_y)

        # Evaluate Metrics
        predicted_qualities = lr.predict(test_x)
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        # Print out metrics
        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        # Log parameter, metrics, and model to MLflow
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)
        mlflow.sklearn.log_model(lr, "model")

Comparing the Models

Use the MLflow UI (as described above) to compare the models that you have produced.

You can use the search feature to quickly narrow down a long list of models. For example, the query (metrics.rmse < 0.8) returns all of the models with root mean square error less than 0.8. For more complex manipulations, you can download this table as a CSV and use your favorite data munging software to analyze it.
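
As a rough sketch of the CSV route, the snippet below applies the same rmse < 0.8 filter with Python's standard csv module. The column names (run_id, alpha, l1_ratio, rmse) and the values are assumptions made up for illustration; the CSV you download from the UI may use different headers, so adjust metric_column to match.

```python
import csv
import io


def filter_runs(csv_text, max_rmse, metric_column="rmse"):
    """Return rows whose metric is below max_rmse, best (lowest) first."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    kept = [
        row for row in rows
        if row.get(metric_column) not in (None, "")
        and float(row[metric_column]) < max_rmse
    ]
    return sorted(kept, key=lambda row: float(row[metric_column]))


# Hypothetical export with made-up values, for illustration only
sample_csv = """run_id,alpha,l1_ratio,rmse
run_a,0.5,0.5,0.82
run_b,0.1,0.9,0.74
run_c,1.0,1.0,0.86
run_d,0.05,0.5,0.71
"""

best = filter_runs(sample_csv, max_rmse=0.8)
print([row["run_id"] for row in best])  # ['run_d', 'run_b']
```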

Loading a Saved Model

After a model has been saved using MLflow Models within MLflow Tracking you can easily load the model in a variety of flavors (python_function, sklearn, etc.). We need to choose a model from the mlruns folder for the model path.

model_path = './mlruns/0/<run_id>/artifacts/model'
mlflow.<model_flavor>.load_model(model_path)
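
A minimal sketch of what this can look like for two flavors is shown below. The helper names are hypothetical, and the run ID "abc123" is a placeholder rather than a real run; replace it with a run ID from your own mlruns folder. The MLflow imports sit inside the loader functions so that the path helper can be used on its own.

```python
def build_model_path(experiment_id, run_id, tracking_dir="./mlruns"):
    """Build the artifact path of a logged model inside a local mlruns folder."""
    return f"{tracking_dir}/{experiment_id}/{run_id}/artifacts/model"


def load_as_sklearn(model_path):
    """Load the model back as the original scikit-learn estimator."""
    import mlflow.sklearn
    return mlflow.sklearn.load_model(model_path)


def load_as_pyfunc(model_path):
    """Load the model through the generic python_function flavor."""
    import mlflow.pyfunc
    return mlflow.pyfunc.load_model(model_path)


# "abc123" is a placeholder run ID, not a real run
example_path = build_model_path(0, "abc123")
print(example_path)  # ./mlruns/0/abc123/artifacts/model
```

The sklearn flavor gives you back the estimator itself (with its coefficients and sklearn methods), while the python_function flavor exposes a generic predict interface that works the same way regardless of the library that produced the model.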

Packaging the Training Code in a Conda Env with MLflow Projects

Now that you have your training code, you can package it so that other data scientists can easily reuse the training script, or so that you can run the training remotely.

You do this by using MLflow Projects to specify the dependencies and entry points to your code. The MLproject file specifies the project has the dependencies located in a Conda environment (defined by conda.yaml) and has one entry point (train.py) that takes two parameters: alpha and l1_ratio.

To run this project use mlflow run on the folder containing the MLproject file.

mlflow run . -P alpha=1.0 -P l1_ratio=1.0

After running this command, MLflow runs your training code in a new Conda environment with the dependencies specified in conda.yaml.
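
Because the hyperparameters are passed with -P, it is straightforward to script several runs. The sketch below builds one mlflow run command per combination in a small grid; the grid values and helper names are arbitrary choices for illustration. Nothing is executed unless you call run_all, which launches the commands one after another.

```python
import itertools
import subprocess


def build_run_commands(alphas, l1_ratios, project_uri="."):
    """Build one `mlflow run` command per (alpha, l1_ratio) combination."""
    commands = []
    for alpha, l1_ratio in itertools.product(alphas, l1_ratios):
        commands.append([
            "mlflow", "run", project_uri,
            "-P", f"alpha={alpha}",
            "-P", f"l1_ratio={l1_ratio}",
        ])
    return commands


def run_all(commands):
    """Run the commands sequentially, stopping at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# Arbitrary example grid: 2 x 2 = 4 runs
commands = build_run_commands([0.1, 1.0], [0.5, 1.0])
for command in commands:
    print(" ".join(command))
```

Each run then shows up as a separate row in the Tracking UI, where the runs can be compared.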

If a repository has an MLproject file, you can also run a project directly from GitHub. This tutorial lives in the https://github.com/Noodle-ai/mlflow_part1_condaEnv repository, which you can run with the following command. The symbol “#” can be appended to the URL to move into a subdirectory of the repo, and the “--version” argument can be used to run code from a different branch.

mlflow run https://github.com/Noodle-ai/mlflow_part1_condaEnv -P alpha=1.0 -P l1_ratio=0.8

Serving the Model

Now that you have packaged your model using the MLproject convention and have identified the best model, it is time to deploy the model using MLflow Models. An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools — for example, real-time serving through a REST API or batch inference on Apache Spark.

In the example training code above, after training the linear regression model, a function in MLflow saved the model as an artifact within the run.

mlflow.sklearn.log_model(lr, "model")

To view this artifact, you can use the UI again. When you click a date in the list of experiment runs you’ll see this page.

At the bottom, you can see the call to mlflow.sklearn.log_model produced three files in ./mlruns/0/<run_id>/artifacts/model. The first file, MLmodel, is a metadata file that tells MLflow how to load the model. The second file is a conda.yaml that contains the model dependencies from the Conda environment. The third file, model.pkl, is a serialized version of the linear regression model that you trained.
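
If you prefer to check this from code rather than the UI, a small sketch like the following lists which of the three files are missing from a model directory. The function name is hypothetical, and the expected file names are simply the ones described above.

```python
from pathlib import Path

# The three files described above for a model logged with mlflow.sklearn.log_model
EXPECTED_FILES = ("MLmodel", "conda.yaml", "model.pkl")


def missing_model_files(model_dir):
    """Return the expected file names that are not present in model_dir."""
    model_dir = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (model_dir / name).is_file()]
```

Calling missing_model_files('./mlruns/0/<run_id>/artifacts/model') with a real run ID should return an empty list for a model that was logged successfully.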

In this example, you can use this MLmodel format with MLflow to deploy a local REST server that can serve predictions.

To deploy the server, run the following command.

mlflow models serve -m ./mlruns/0/<run_id>/artifacts/model -p 1234

Note: The version of Python used to create the model must be the same as the one running mlflow models serve. If this is not the case, you may see one of the following errors:

UnicodeDecodeError: 'ascii' codec can't decode byte 0x9f in position 1: ordinal not in range(128)

raise ValueError, "unsupported pickle protocol: %d"

Once you have deployed the server, you can pass it some sample data and see the predictions. The following example uses curl to send a JSON-serialized pandas DataFrame with the split orientation to the model server. For more information about the input data formats accepted by the model server, see the MLflow deployment tools documentation.

curl -X POST -H "Content-Type:application/json; format=pandas-split" --data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' http://127.0.0.1:1234/invocations

The server should respond with output similar to:

[3.7783608837127516]
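
The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above: same URL, same Content-Type header, same columns and values. The helper names are hypothetical, and request_prediction only works while the model server from the previous step is running.

```python
import json
import urllib.request


def build_split_payload(columns, rows):
    """Serialize rows in the pandas 'split' orientation expected by the server."""
    return json.dumps({"columns": columns, "data": rows})


def request_prediction(payload, url="http://127.0.0.1:1234/invocations"):
    """POST the payload to the local model server and return the parsed reply."""
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


columns = [
    "alcohol", "chlorides", "citric acid", "density", "fixed acidity",
    "free sulfur dioxide", "pH", "residual sugar", "sulphates",
    "total sulfur dioxide", "volatile acidity",
]
rows = [[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]

payload = build_split_payload(columns, rows)
# With the server running: predictions = request_prediction(payload)
```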

Using MLflow with a Docker Environment

In this section, we cover how to use the various features of MLflow with a Docker environment. Docker has some advantages in terms of scalability compared to Anaconda. If you develop a pipeline in a container, it only takes minor modifications to your Dockerfile to make the pipeline ready for production. For example, you may need to delete aspects of the Dockerfile used for development (Jupyter library, SSH configuration), include your pipeline script as a command, and add a way to output objects of interest (write to a database, serve over REST). You can then orchestrate containers, running your pipeline with tools like Kubernetes on a cluster that scales with traffic. Alternatively, if you are training in a container (as we do below), you can run containers in parallel on a cluster to do hyperparameter tuning.

Setting up for the Tutorial

Note: We will present two options for developing inside a container. The first is running a container locally. The second is running a container remotely (for example in a VM).

  • Make sure you have Anaconda installed
  • Install the MLflow library in a Python 3 environment

conda install -c conda-forge mlflow=1.8.0

1. Install Docker

2. Clone the repository

git clone https://github.com/Noodle-ai/mlflow_part2_dockerEnv.git

3. Build an Image from the Dockerfile

Note: The name of your image (“mlflow_image” in this case) must match the name in your MLproject file.

docker image build -t mlflow_image .

The Dockerfile below is used to build an image that creates a conda environment, then sets up SSH with user “dockeruser” and password “123”.

FROM continuumio/miniconda3

# Create the environment using conda
RUN conda config --set channel_priority false
RUN conda install -c anaconda jupyter
RUN pip install --upgrade cython
RUN pip install mlflow==1.8.0
RUN conda install -c anaconda scikit-learn
RUN conda install -c anaconda psycopg2
RUN conda install -c anaconda boto3

# Set up SSH
RUN apt-get update && apt-get install -y openssh-server
RUN useradd -m -s /bin/bash dockeruser
RUN mkdir /var/run/sshd
RUN echo 'dockeruser:123' | chpasswd
EXPOSE 22
CMD ["/usr/sbin/sshd", "-D"]

4. Build a Container from the Image

We use (-P) to publish all exposed ports in the container to random ports (the argument -P is necessary for the local container workflow, not for the remote container workflow). We use (-d) to run the container in the background. And we use (--mount) to mount the mlflow_part2_dockerEnv repository in the home folder of dockeruser.

docker run -d -P --mount type=bind,source=$(pwd),target=/home/dockeruser --name mlflow_container mlflow_image

5. Determine the Port that Docker Port 22 was published to (necessary for local container workflow, not for the remote container workflow).

docker port mlflow_container 22

6. Get the IP Address of the Container (necessary for the remote container workflow, not for the local container workflow).

This command will return a lot of information about your container. The IP address should be under “NetworkSettings”.

docker inspect mlflow_container

or

docker inspect -f "{{ .NetworkSettings.IPAddress }}" mlflow_container
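
Since docker inspect prints a JSON array with one object per container, the IP address can also be pulled out programmatically. The sketch below is a minimal example; the sample JSON is a heavily trimmed, made-up stand-in for the real output, and the address in it is only illustrative.

```python
import json
import subprocess


def inspect_container(name="mlflow_container"):
    """Return the raw JSON text printed by `docker inspect <name>`."""
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def container_ip(inspect_output):
    """Extract NetworkSettings.IPAddress from `docker inspect` output."""
    containers = json.loads(inspect_output)
    return containers[0]["NetworkSettings"]["IPAddress"]


# Trimmed, made-up sample of the JSON structure, for illustration only
sample_output = (
    '[{"Name": "/mlflow_container", '
    '"NetworkSettings": {"IPAddress": "172.17.0.2"}}]'
)
print(container_ip(sample_output))  # 172.17.0.2
```

With Docker available, container_ip(inspect_container()) returns the address of the running mlflow_container.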

Examples

The Dockerfile has been configured so that you can SSH into the container, which lets you use the container as a development environment. If you are running your container locally, you can SSH directly into it and use your IDE within the container itself. For example, I use the SSH extension in VSCode to work with VSCode and notebooks inside the container. Configure .ssh/config to use the user name “dockeruser” (the user defined in the Dockerfile) and the port returned above in the setup instructions.
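
As an illustration of the local workflow, the sketch below turns the output of docker port mlflow_container 22 into an entry you could paste into .ssh/config. The host alias "mlflow-container", the 127.0.0.1 host address, and the sample port 32768 are assumptions chosen for the example; use the port that Docker actually reports on your machine.

```python
def parse_published_port(docker_port_output):
    """Extract the host port from output such as '0.0.0.0:32768'."""
    first_line = docker_port_output.strip().splitlines()[0]
    return int(first_line.rsplit(":", 1)[1])


def ssh_config_entry(alias, port, user="dockeruser", hostname="127.0.0.1"):
    """Format an entry for ~/.ssh/config that points at the container."""
    return (
        f"Host {alias}\n"
        f"    HostName {hostname}\n"
        f"    User {user}\n"
        f"    Port {port}\n"
    )


# Sample output with a made-up port, for illustration only
port = parse_published_port("0.0.0.0:32768\n")
entry = ssh_config_entry("mlflow-container", port)
print(entry)
```

After adding the entry, ssh mlflow-container (or selecting that host in your IDE's SSH extension) connects as dockeruser with the password set in the Dockerfile.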

If the container is not running locally (for example running in a VM) you can port forward a local port to be connected to the container port in your VM. After you connect your local port to your container you can SSH as if your container was running locally. Choose a <local_port> you would like to use, use the <container_ip_address> that you got in the setup instructions, and lastly use the <vm_ip_address>. After port forwarding, you can SSH to <local_port> in order to develop in your container. It is possible you may encounter a permissions issue if you are attempting to connect using a tool like VSCode SSH extension. This extension creates a .vscode-server folder inside of the destination folder mounted in the container and dockeruser may not have permission to do this depending on the default permissions settings in your VM. If this is the case be sure to change the permissions of the mlflow_part2_dockerEnv directory you are mounting (chmod 777 mlflow_part2_dockerEnv). The command to port forward is below.

ssh -L <local_port>:<container_ip_address>:22 <vm_ip_address>

If the container lives in a VM, we create a tunnel to port 22 in the container and then SSH.

After SSH-ing into the container, if you are using VSCode, you may need to install the extensions you need in the container, select a Python interpreter, and then spawn a new terminal. Open experiment.ipynb and follow along. The notebook contains examples demonstrating how to use MLflow Tracking and MLflow Models. It also contains descriptions of how to use MLflow Projects.

Note: If you encounter the warning Warning: Remote Host Identification Has Changed! this could be due to a new container being on a port that previously hosted a different container. Delete the entry from ~/.ssh/known_hosts to resolve the issue.

Using the Tracking API

The MLflow tracking API lets you log metrics and artifacts (files from your data science code) in order to track a history of your runs.

Note: By default, MLflow Tracking creates an mlruns folder, and within this folder MLflow uses absolute paths. This creates a conflict when experiments created locally and experiments created within a container are tracked together. To get around the issue in this section, I use separate experiments for runs created within the container and runs created from outside the container (named “notebook” and “script” respectively). From the notebook, the experiment can be set using mlflow.set_experiment('notebook'). Keep in mind that the proper way to resolve this issue is to use a database tracking URI (covered in the next sections).

The code below logs a run with one parameter (param1), one metric (foo) with three values (1,2,3), and an artifact (a text file containing “Hello world!”).

import mlflow

mlflow.start_run()

# Log a parameter (key-value pair)
mlflow.log_param("param1", 5)
# Log a metric; metrics can be updated throughout the run
mlflow.log_metric("foo", 1)
mlflow.log_metric("foo", 2)
mlflow.log_metric("foo", 3)
# Log an artifact (output file)
with open("output.txt", "w") as f:
    f.write("Hello world!")
mlflow.log_artifact("output.txt")

mlflow.end_run()

Viewing the Tracking UI

By default, wherever you run your program, the tracking API writes data into a local ./mlruns directory. You can then run MLflow’s Tracking UI.

Activate the MLflow Tracking UI by typing the following into the terminal. You must be in the same folder as mlruns.

mlflow ui

View the tracking UI by visiting the URL returned by the previous command. Then click on “notebook” under the Experiments tab.

Click on the run to see more details.

Click on the metric to see more details.

Example Incorporating MLflow Tracking, MLflow Models, and MLflow Projects

In this example, MLflow Tracking is used to keep track of different hyperparameters, performance metrics, and artifacts of a linear regression model. MLflow Models is used to store the pickled trained model instance, a file describing the environment the model instance was created in, and a descriptor file that lists several “flavors” the model can be used in. MLflow Projects is used to package the training code. Lastly, MLflow Models is used to deploy the model to a simple HTTP server.

This tutorial uses a dataset to predict the quality of wine based on quantitative features like the wine’s “fixed acidity”, “pH”, “residual sugar”, and so on. The dataset is from UCI’s machine learning repository.

Training the Model

First, we train a linear regression model that takes two hyperparameters: alpha and l1_ratio.

This example uses the familiar pandas, numpy, and sklearn APIs to create a simple machine learning model. The MLflow tracking APIs log information about each training run like hyperparameters (alpha and l1_ratio) used to train the model, and metrics (root mean square error, mean absolute error, and r2) used to evaluate the model. The example also serializes the model in a format that MLflow knows how to deploy.

Each time you run the example MLflow logs information about your experiment runs in the directory mlruns.

There is a script containing the training code called train.py. You can run the example through the .py script using the following command.

python train.py <alpha> <l1_ratio>

There is also a notebook function of the training script. You can use the notebook to run the training (train() function shown below).

# Wine Quality Sample
def train(in_alpha, in_l1_ratio):
    import pandas as pd
    import numpy as np
    from sklearn.metrics import mean_squared_error, \
        mean_absolute_error, r2_score
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import ElasticNet
    import mlflow
    import mlflow.sklearn

    def eval_metrics(actual, pred):
        rmse = np.sqrt(mean_squared_error(actual, pred))
        mae = mean_absolute_error(actual, pred)
        r2 = r2_score(actual, pred)
        return rmse, mae, r2

    np.random.seed(40)

    # Read the wine-quality csv file from the URL
    csv_url =\
        'http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv'
    data = pd.read_csv(csv_url, sep=';')

    # Split the data into training and test sets (0.75, 0.25) split
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    # Set default values if no alpha is provided
    # (check for None before converting; float(None) would raise a TypeError)
    if in_alpha is None:
        alpha = 0.5
    else:
        alpha = float(in_alpha)

    # Set default values if no l1_ratio is provided
    if in_l1_ratio is None:
        l1_ratio = 0.5
    else:
        l1_ratio = float(in_l1_ratio)

    # Useful for multiple runs
    with mlflow.start_run():
        # Execute ElasticNet
        lr = ElasticNet(
            alpha=alpha, 
            l1_ratio=l1_ratio, 
            random_state=42
        )
        lr.fit(train_x, train_y)

        # Evaluate Metrics
        predicted_qualities = lr.predict(test_x)
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        # Print out metrics
        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        # Log parameter, metrics, and model to MLflow
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)
        mlflow.sklearn.log_model(lr, "model")

Comparing the Models

Use the MLflow UI (as described above) to compare the models that you have produced.

You can use the search feature to quickly filter out many models. For example, the query (metrics.rmse < 0.8) returns all the models with root mean square error less than 0.8. For more complex manipulations, you can download this table as a CSV and use your favorite data munging software to analyze it.

Loading a Saved Model

After a model has been saved using MLflow Models within MLflow Tracking you can easily load the model in a variety of flavors (python_function, sklearn, etc.). We need to choose a model from the mlruns folder for the model_path.

model_path = './mlruns/1/<run_id>/artifacts/model'
mlflow.<model_flavor>.load_model(model_path)

Packaging the Training Code in a Docker Container with MLflow Projects

Note: If you have been following along and are developing within the container, exit the container now.

Now that you have your training code, you can package it so that other data scientists can easily reuse the training script, or so that you can run the training remotely.

You do this by using MLflow Projects to specify the dependencies and entry points to your code. The MLproject file specifies that the project has the dependencies located in a Docker image named mlflow_image (created from Dockerfile) and has one entry point (train.py) that takes two parameters: alpha and l1_ratio.

To run this project use mlflow run on the folder containing the MLproject file.

mlflow run . -P alpha=1.0 -P l1_ratio=1.0 --experiment-name script

This builds a new Docker image based on “mlflow_image” that also contains our project code. The resulting image is tagged as “mlflow_image-<git-version>”, where <git-version> is the git commit ID. After the image is built, MLflow executes the default (main) project entry point within the container using “docker run”.

Environment variables, such as “MLFLOW_TRACKING_URI”, are propagated inside the container during project execution. When running against a local tracking URI, MLflow mounts the host system’s tracking directory (e.g. a local mlruns directory) inside the container so that metrics and params logged during project execution are accessible afterwards.

If a repository has an MLproject file, you can also run a project directly from GitHub. This tutorial lives in the https://github.com/Noodle-ai/mlflow_part2_dockerEnv repository, which you can run with the following command. The symbol “#” can be appended to the URL to move into a subdirectory of the repo. The “--version” argument can be used to run code from a different branch. The “--experiment-name” argument can be used to choose an experiment name in mlruns. In this case, we must set the experiment to be different from the experiment that was run in the container, because absolute paths in MLflow Tracking will otherwise lead to an error. The image must be built locally for this to work.

mlflow run https://github.com/Noodle-ai/mlflow_part2_dockerEnv -P alpha=1.0 -P l1_ratio=0.8 --experiment-name script
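The same invocation can also be assembled from Python, which may be handy when several parameter combinations need to be launched. The sketch below is only an illustration: build_mlflow_run_command and run_project are hypothetical helper names (not part of MLflow), and they simply shell out to the mlflow CLI with the flags used above.

```python
import shlex
import subprocess


def build_mlflow_run_command(uri, params, experiment_name=None, version=None):
    """Return the argv list for an `mlflow run` invocation (hypothetical helper)."""
    cmd = ["mlflow", "run", uri]
    for key, value in params.items():
        # Each project parameter is passed as its own -P key=value pair.
        cmd += ["-P", f"{key}={value}"]
    if experiment_name is not None:
        cmd += ["--experiment-name", experiment_name]
    if version is not None:
        # --version selects the code version when the URI is a Git repository.
        cmd += ["--version", version]
    return cmd


def run_project(uri, params, **kwargs):
    """Run the project; assumes the mlflow CLI and Docker are available locally."""
    cmd = build_mlflow_run_command(uri, params, **kwargs)
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True)


# Only builds and prints the command; call run_project(...) to execute it.
command = build_mlflow_run_command(
    ".", {"alpha": 1.0, "l1_ratio": 1.0}, experiment_name="script"
)
print(shlex.join(command))
```

Keeping the command construction separate from its execution makes it easy to inspect or log the exact command before anything is started.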

Serving the Model (Local REST API Server)

Now that you have packaged your model using the MLproject convention and have identified the best model, it’s time to deploy the model using MLflow Models. An MLflow Model is a standard format for packaging machine learning models that can be used in a variety of downstream tools — for example, real-time serving through a REST API or batch inference on Apache Spark.

In the example training code above, after training the linear regression model, a function in MLflow saved the model as an artifact within the run.

mlflow.sklearn.log_model(lr, "model")

To view this artifact, you can use the UI again. When you click a date in the list of experiment runs you’ll see this page.

At the bottom, you can see that the call to mlflow.sklearn.log_model produced three files in ./mlruns/1/<run_id>/artifacts/model. The first file, MLmodel, is a metadata file that tells MLflow how to load the model. The second file is a conda.yaml that contains the model dependencies from the Conda environment. The third file, model.pkl, is a serialized version of the linear regression model that you trained.

In this example, you can use this MLmodel format with MLflow to deploy a local REST server that can serve predictions.

To deploy the server, run the following command:

mlflow models serve -m ./mlruns/1/<run_id>/artifacts/model -p 1234

Note: The version of Python used to create the model must be the same as the one running mlflow models serve. If this is not the case, you may see the error:

UnicodeDecodeError: ‘ascii’ codec can’t decode byte 0x9f in position 1: ordinal not in range(128) or raise ValueError, “unsupported pickle protocol: %d”.

Once you have deployed the server, you can pass it some sample data and see the predictions. The following example uses curl to send a JSON-serialized pandas DataFrame with the split orientation to the model server. For more information about the input data formats accepted by the model server, see the MLflow deployment tools documentation.

curl -X POST -H "Content-Type:application/json; format=pandas-split" --data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' http://127.0.0.1:1234/invocations

The server should respond with output similar to:

[3.7783608837127516]
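If you prefer to call the server from Python rather than curl, the request can be sketched with the standard library alone. This is a minimal illustration under the same assumptions as the curl command (server on port 1234, split-oriented JSON, the Content-Type header shown above); build_split_payload and request_predictions are hypothetical helper names, and other MLflow versions may expect a different payload format.

```python
import json
import urllib.request


def build_split_payload(columns, rows):
    """Serialize columns and rows in the pandas 'split' orientation."""
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row needs exactly one value per column")
    return json.dumps({"columns": list(columns), "data": [list(r) for r in rows]})


def request_predictions(url, payload, timeout=10):
    """POST the payload to the /invocations endpoint and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json; format=pandas-split"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


COLUMNS = [
    "alcohol", "chlorides", "citric acid", "density", "fixed acidity",
    "free sulfur dioxide", "pH", "residual sugar", "sulphates",
    "total sulfur dioxide", "volatile acidity",
]
ROW = [12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]

payload = build_split_payload(COLUMNS, [ROW])

# Uncomment once the model server is running:
# print(request_predictions("http://127.0.0.1:1234/invocations", payload))
```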

Serving the Model (As a Docker Image)

Note: This command is experimental (it may be changed or removed in a future release without warning), and there is no guarantee that its arguments or the format of the Docker container will remain the same.

Here we build a Docker image whose default entry point serves the specified MLflow model at port 8080 within the container.

The command below builds a docker image named “serve_model” that serves the model in ./mlruns/1/<run_id>/artifacts/model.

mlflow models build-docker -m "./mlruns/1/<run_id>/artifacts/model" -n "serve_model"

We can then serve the model, exposing it at port 5001 on the host with the following command:

docker run -p 5001:8080 "serve_model"

Once you have created a container that serves the model with the above command, you can pass it some sample data and see the predictions. Similar to the above, the following example uses curl to send a JSON-serialized pandas DataFrame with the split orientation to the model server.

curl -X POST -H "Content-Type:application/json; format=pandas-split" --data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' http://127.0.0.1:5001/invocations

Again, the server should respond with an output similar to:

[3.7783608837127516]

Running MLflow Tracking with PostgreSQL Database and MinIO Artifact Store

In this section, we show how to configure MLflow in a way that allows multiple data scientists using different machines to collaborate by logging experiments in the same location. We build on top of the examples in Anaconda Environment. Instead of having a local mlruns folder for storing the information from MLflow Tracking, we store the parameters and metrics in a PostgreSQL Database, while storing the artifacts in MinIO object storage.

Note: This is the preferred solution to the issue we encountered in Docker Environment when we observed the absolute path conflict between the container and the local environment in MLflow Tracking. If you plan on sharing the same tracking experiment across devices a DB should be used for the tracking URI.

Setting up a PostgreSQL Database Tracking URI and MinIO Artifact URI

Clone the repository, and navigate to the downloaded folder.

git clone https://github.com/iamirmasoud/mlflow_postgres_minio.git
cd mlflow_postgres_minio

Running PostgreSQL and MinIO containers using Docker Compose

1. Run the following docker-compose command to start a PostgreSQL and a MinIO server:

docker-compose up -d

Here is the content of docker-compose.yml:

version: '3'

services:
  mlflow_postgres:
    image: postgres:13
    container_name: postgres_db
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=mlflow_db
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"
  minio:
    image: minio/minio
    ports:
      - "9000:9000"
      - "9001:9001"
    volumes:
      - minio_data:/data
    environment:
      MINIO_ROOT_USER: masoud
      MINIO_ROOT_PASSWORD: Strong#Pass#2022
    command: server --console-address ":9001" /data
volumes:
  postgres_data: { }
  minio_data: { }

2. Run the docker ps command to check whether the containers were created.

3. Verify that the new database was created:

docker exec -it postgres_db psql -U postgres -c "\l"

You should now be able to access the MinIO Console. Open your browser and go to http://127.0.0.1:9001 to open the MinIO Console login page. Log in with the MINIO_ROOT_USER and MINIO_ROOT_PASSWORD values you set in the docker-compose.yml file (masoud and Strong#Pass#2022 in my case).

From the MinIO UI, create a bucket called mlflow by clicking the create bucket button in the bottom right corner.
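Before moving on, it can help to confirm that the published ports actually accept connections. The following is a small sketch using only the standard library; is_port_open and report are hypothetical helper names, and the port numbers are the ones published in docker-compose.yml. A successful TCP connection only shows that something is listening, not that the service is fully initialized.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports published in docker-compose.yml.
SERVICES = {"PostgreSQL": 5432, "MinIO API": 9000, "MinIO Console": 9001}


def report(host="127.0.0.1"):
    """Print one reachability line per service."""
    for name, port in SERVICES.items():
        state = "reachable" if is_port_open(host, port) else "not reachable"
        print(f"{name} on {host}:{port} is {state}")


# Call report() after `docker-compose up -d` has finished.
```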

MLflow Examples

Open experiment.ipynb and follow along. This is identical to the notebook in Anaconda Environment except that it uses a PostgreSQL DB as the tracking server and MinIO as the artifact URI.

Using the Tracking API

In order to use a PostgreSQL DB, we must set a new tracking URI that points to the database we configured above. The database URI is encoded as: <dialect>+<driver>://<username>:<password>@<host>:<port>/<database>. We must also set the S3 endpoint URL to the MinIO API address (port 9000 in docker-compose.yml, not the console on port 9001). Lastly, our environment must know the Access Key and Secret Key, which here are the MinIO root user and password.

import os

os.environ['MLFLOW_TRACKING_URI'] = 'postgresql+psycopg2://postgres:postgres@localhost/mlflow_db'
os.environ['MLFLOW_S3_ENDPOINT_URL'] = "http://127.0.0.1:9000"
os.environ['AWS_ACCESS_KEY_ID'] = 'masoud'
os.environ['AWS_SECRET_ACCESS_KEY'] = 'Strong#Pass#2022'
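To make the URI template concrete, here is a small sketch that assembles it from its parts. build_tracking_uri is a hypothetical helper, not an MLflow function; the default port 5432 is the one published in docker-compose.yml. Percent-encoding the credentials is an added precaution for passwords that contain characters such as "@" or "#".

```python
from urllib.parse import quote_plus


def build_tracking_uri(username, password, host, database,
                       port=5432, dialect="postgresql", driver="psycopg2"):
    """Assemble <dialect>+<driver>://<username>:<password>@<host>:<port>/<database>."""
    # Percent-encode the credentials so characters such as '@', ':' or '#'
    # cannot be mistaken for URI delimiters.
    user = quote_plus(username)
    secret = quote_plus(password)
    return f"{dialect}+{driver}://{user}:{secret}@{host}:{port}/{database}"


uri = build_tracking_uri("postgres", "postgres", "localhost", "mlflow_db")
print(uri)
```

With the values used in this tutorial the result differs from the URI above only by the explicit port.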

We create a new experiment, setting the artifact location to be the “mlflow” bucket we created in the MinIO UI (Note: an experiment can only be created once). We then set this as our current experiment.

mlflow.create_experiment('exp', artifact_location='s3://mlflow')
mlflow.set_experiment('exp')
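Because an experiment can only be created once, re-running the cell above fails on the second attempt. One way around this, sketched here as an assumption rather than as the notebook's actual code, is to look the experiment up first and create it only when it is missing. get_or_create_experiment is a hypothetical helper; it expects a client object offering get_experiment_by_name() and create_experiment(), which MlflowClient provides.

```python
def get_or_create_experiment(client, name, artifact_location=None):
    """Return the experiment ID for `name`, creating the experiment if needed.

    `client` is expected to offer get_experiment_by_name() and
    create_experiment(), as mlflow.tracking.MlflowClient does.
    """
    existing = client.get_experiment_by_name(name)
    if existing is not None:
        # The experiment already exists, so reuse it instead of failing.
        return existing.experiment_id
    return client.create_experiment(name, artifact_location=artifact_location)


# Usage sketch (needs mlflow installed and the environment variables set):
# import mlflow
# from mlflow.tracking import MlflowClient
# experiment_id = get_or_create_experiment(MlflowClient(), "exp", "s3://mlflow")
# mlflow.set_experiment("exp")
```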

Viewing the Tracking UI

We have configured MLflow to use a PostgreSQL DB for tracking. Because of this, we must use the “--backend-store-uri” argument to tell MLflow where to find the experiments. We must set our environment variables in the terminal before opening the MLflow UI (similar to above in the notebook).

export MLFLOW_TRACKING_URI=postgresql+psycopg2://postgres:postgres@localhost/mlflow_db
export MLFLOW_S3_ENDPOINT_URL=http://127.0.0.1:9000
export AWS_ACCESS_KEY_ID=masoud
export AWS_SECRET_ACCESS_KEY=Strong#Pass#2022
mlflow ui --backend-store-uri 'postgresql+psycopg2://postgres:postgres@localhost/mlflow_db'

Loading a Saved Model

After a model has been saved using MLflow Models within MLflow Tracking, you can easily load the model in a variety of flavors (python_function, sklearn, etc.). We need to choose a model from the mlflow bucket in MinIO.

model_path = 's3://mlflow/<run_id>/artifacts/model'
mlflow.<model_flavor>.load_model(model_path)

Packaging the Training Code in a Conda Environment with MLflow Projects

For more detailed information on packaging with MLflow Projects, see the previous sections. To run this project, use mlflow run on the folder containing the MLproject file. To designate the correct experiment, use the --experiment-name argument. We must set our environment variables in the terminal before running the command.

export MLFLOW_TRACKING_URI=postgresql+psycopg2://postgres:postgres@localhost/mlflow_db
export MLFLOW_S3_ENDPOINT_URL=http://127.0.0.1:9000
export AWS_ACCESS_KEY_ID=masoud
export AWS_SECRET_ACCESS_KEY=Strong#Pass#2022
mlflow run . -P alpha=1.0 -P l1_ratio=1.0 --experiment-name demo_experiment

If a repository has an MLproject file, you can also run a project directly from GitHub. This tutorial lives in the https://github.com/iamirmasoud/mlflow_postgres_minio repository, which you can run with the following command. The symbol “#” can be used to move into a subdirectory of the repo. The “--version” argument can be used to run code from a different branch. To designate the correct experiment, use the --experiment-name argument. We must set our environment variables in the terminal before running the command.

export MLFLOW_TRACKING_URI=postgresql+psycopg2://postgres:postgres@localhost/mlflow_db
export MLFLOW_S3_ENDPOINT_URL=http://127.0.0.1:9000
export AWS_ACCESS_KEY_ID=masoud
export AWS_SECRET_ACCESS_KEY=Strong#Pass#2022
mlflow run https://github.com/iamirmasoud/mlflow_postgres_minio -P alpha=1.0 -P l1_ratio=0.8 --experiment-name demo_experiment

You can add the --no-conda option to the mlflow run command to use the current environment instead of creating a new conda environment for this run.

Serving the Model

For more detailed information on serving the model look at the Anaconda section. We must set our environment variables in the terminal before running the command. To deploy the server, run the following commands:

export MLFLOW_TRACKING_URI=postgresql+psycopg2://postgres:postgres@localhost/mlflow_db
export MLFLOW_S3_ENDPOINT_URL=http://127.0.0.1:9000
export AWS_ACCESS_KEY_ID=masoud
export AWS_SECRET_ACCESS_KEY=Strong#Pass#2022
mlflow models serve -m s3://mlflow/<run_id>/artifacts/model -p 1234

Once you have deployed the server, you can pass it some sample data and see the predictions. The following example uses curl to send a JSON-serialized pandas DataFrame with the split orientation to the model server. For more information about the input data formats accepted by the model server, see the MLflow deployment tools documentation.

curl -X POST -H "Content-Type:application/json; format=pandas-split" --data '{"columns":["alcohol", "chlorides", "citric acid", "density", "fixed acidity", "free sulfur dioxide", "pH", "residual sugar", "sulphates", "total sulfur dioxide", "volatile acidity"],"data":[[12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]]}' http://127.0.0.1:1234/invocations

The server should respond with output similar to:

[3.7783608837127516]

MLflow Model Registry

The MLflow Model Registry is a centralized model store, set of APIs, and UI for collaboratively managing the full lifecycle of an MLflow model. It provides model lineage (which MLflow experiment and run produced the model), model versioning, stage transitions (for example from staging to production), and annotations. In the following sections, I assume that you have already set up MLflow with the PostgreSQL tracking server and MinIO artifact URI.

Basic Concepts

Model

An MLflow Model is created from an experiment or a run that is logged with a model flavor’s log_model method (mlflow.<model_flavor>.log_model()). Once logged, this model can then be registered with the Model Registry.

Registered Model

An MLflow Model can be registered with the Model Registry. A registered model has a unique name, versions, associated transitional stages, model lineage, and other metadata.

Model Version

Each registered model can have one or many versions. When a new model is added to the Model Registry, it is added as version 1. Each new model registered to the same model name increments the version number.

Model Stage

Each distinct model version can be assigned one stage at any given time. MLflow provides predefined stages for common use-cases such as Staging, Production, or Archived. You can transition a model version from one stage to another stage.

Annotations and Descriptions

You can annotate the top-level model and each version individually using Markdown, including a description and any other information useful to the team, such as algorithm descriptions, the dataset employed, or the methodology.

Model Registry Workflows

If running your own MLflow server, you must use a database-backed backend store in order to access the Model Registry via the UI or API. Before you can add a model to the Model Registry, you must log it using the log_model methods of the corresponding model flavors. Once a model has been logged, you can add, modify, update, transition, or delete a model in the Model Registry through the UI or the API.

UI Workflow

  1. From the MLflow Runs detail page, select a logged MLflow Model in the Artifacts section.
  2. Click the “Register Model” button.
  3. If you are adding a new model, specify a unique name (ElasticnetWineModel for example) to identify the model. If you are registering a new version to an existing model, pick the existing model name from the dropdown.

Once the model is added to the Model Registry you can:

  • Go to the “Artifacts” section of the run detail page, click the model, and then click the model version at the top right to view the version you created.
  • This opens the “version detail” page where you can see model version details and the current stage of the model version.
  • Click the “Stage” drop-down at the top right to transition the model version to one of the other valid stages.
  • From the “version detail” page you can navigate to the “Registered Models” page and view the model properties by clicking “Registered Models” in the top left.
  • You can click on one of the listed model names in the “Registered Models” page to open the “model overview” page that lists the active versions.
  • You can then navigate back to the “version detail” page by clicking a model version on the “model overview” page.

API Workflow

Adding an MLflow Model to the Model Registry

There are three programmatic ways to add a model to the registry.

First, you can use the mlflow.<model_flavor>.log_model() method by populating the registered_model_name input.

def train_with_model_registry(in_alpha, in_l1_ratio):
    np.random.seed(40)

    # Read the wine-quality csv file from the URL
    csv_url = "http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/winequality-red.csv"
    data = pd.read_csv(csv_url, sep=";")

    # Split the data into training and test sets. (0.75, 0.25) split.
    train, test = train_test_split(data)

    # The predicted column is "quality" which is a scalar from [3, 9]
    train_x = train.drop(["quality"], axis=1)
    test_x = test.drop(["quality"], axis=1)
    train_y = train[["quality"]]
    test_y = test[["quality"]]

    # Fall back to the default when no alpha is provided
    if in_alpha is None:
        alpha = 0.5
    else:
        alpha = float(in_alpha)

    # Fall back to the default when no l1_ratio is provided
    if in_l1_ratio is None:
        l1_ratio = 0.5
    else:
        l1_ratio = float(in_l1_ratio)

    # Useful for multiple runs
    with mlflow.start_run():
        # Execute ElasticNet
        lr = ElasticNet(alpha=alpha, l1_ratio=l1_ratio, random_state=42)
        lr.fit(train_x, train_y)

        # Evaluate Metrics
        predicted_qualities = lr.predict(test_x)
        (rmse, mae, r2) = eval_metrics(test_y, predicted_qualities)

        # Print out metrics
        print("Elasticnet model (alpha=%f, l1_ratio=%f):" % (alpha, l1_ratio))
        print("  RMSE: %s" % rmse)
        print("  MAE: %s" % mae)
        print("  R2: %s" % r2)

        # Log parameter, metrics, and model to MLflow
        mlflow.log_param("alpha", alpha)
        mlflow.log_param("l1_ratio", l1_ratio)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("r2", r2)
        mlflow.log_metric("mae", mae)
        # Log the model and register it under the given name in one call
        mlflow.sklearn.log_model(
            sk_model=lr,
            artifact_path="model",
            registered_model_name="ElasticnetWineModel"
        )

If a registered model with the name does not exist, the method registers a new model, creates Version 1, and returns a ModelVersion MLflow object. If a registered model with the name exists already, the method creates a new model version and returns the version object.

The second way is to use the mlflow.register_model() method after all your experiment runs complete and when you have decided which model is the most suitable to add to the registry. For this method, you will need the “run_id” as part of the URI argument.

result = mlflow.register_model(
  model_uri="runs:/<run_id>/model",
  name="ElasticnetWineModel"
)

If a registered model with the name doesn’t exist, the method registers a new model, creates Version 1, and returns a ModelVersion MLflow object. If a registered model with the name exists already, the method creates a new model version and returns the version object.
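Once a version exists in the registry, it can be referenced by name through a models:/ URI instead of a storage path. The sketch below only builds such URIs; registry_model_uri is a hypothetical helper, and the commented usage assumes mlflow is installed and the registry is reachable.

```python
def registry_model_uri(name, version=None, stage=None):
    """Build a models:/ URI pointing at a version or a stage of a registered model."""
    if (version is None) == (stage is None):
        raise ValueError("pass exactly one of version or stage")
    selector = str(version) if version is not None else stage
    return f"models:/{name}/{selector}"


print(registry_model_uri("ElasticnetWineModel", version=1))
print(registry_model_uri("ElasticnetWineModel", stage="Production"))

# Usage sketch (needs mlflow and a configured tracking/registry backend):
# import mlflow.pyfunc
# model = mlflow.pyfunc.load_model(registry_model_uri("ElasticnetWineModel", version=1))
```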

And finally, you can use the create_registered_model() method to create a new registered model. If the model name already exists, this method throws an MlflowException, because creating a new registered model requires a unique name.

from mlflow.tracking import MlflowClient

client = MlflowClient()
client.create_registered_model("ElasticnetWineModel")

While the method above creates an empty registered model with no version associated, the method below creates a new version of the model.

client = MlflowClient()
result = client.create_model_version(
    name="ElasticnetWineModel",
    source="s3://mlflow/<run_id>/artifacts/model",
    run_id="<run_id>"
)

Adding or Updating an MLflow Model Description

At any point in a model’s lifecycle development, you can update a model version’s description using update_model_version().

client = MlflowClient()
client.update_model_version(
    name="ElasticnetWineModel",
    version=1,
    description="This model version is a scikit-learn elastic net"
)

Renaming an MLflow Model

In addition to adding or updating a description of a specific version of the model, you can rename an existing registered model using rename_registered_model().

client = MlflowClient()
client.rename_registered_model(
    name="ElasticnetWineModel",
    new_name="ElasticnetWineModel2"
)

Transitioning an MLflow Model’s Stage

Over the course of its lifecycle, a model evolves from development to staging or production. You can transition a model version to one of the stages: Staging, Production, or Archived.

client = MlflowClient()
client.transition_model_version_stage(
    name="ElasticnetWineModel",
    version=3,
    stage="Production"
)

The accepted values for “stage” are: Staging|Archived|Production|None.
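A typo in the stage name is only reported once the request reaches the server, so a thin wrapper that validates the value first can save a round trip. This is a sketch, not part of MLflow: transition_stage and VALID_STAGES are hypothetical names, and the wrapper simply delegates to the client method shown above. Note that "None" is passed as a string.

```python
VALID_STAGES = ("Staging", "Archived", "Production", "None")


def transition_stage(client, name, version, stage):
    """Validate `stage`, then delegate to client.transition_model_version_stage()."""
    if stage not in VALID_STAGES:
        raise ValueError(f"stage must be one of {VALID_STAGES}, got {stage!r}")
    return client.transition_model_version_stage(
        name=name, version=version, stage=stage
    )


# Usage sketch (needs mlflow and a configured registry):
# from mlflow.tracking import MlflowClient
# transition_stage(MlflowClient(), "ElasticnetWineModel", 3, "Production")
```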

Listing and Searching MLflow Models

You can fetch a list of all registered models in the registry with a simple method.

from pprint import pprint

client = MlflowClient()
for rm in client.list_registered_models():
  pprint(dict(rm), indent=4)
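For a more compact overview than the full dictionaries, the listing can be reduced to model names with their latest versions and stages. The sketch below is an assumption-laden illustration: summarize_registered_models is a hypothetical helper, and it prefers search_registered_models() when the client offers it, since newer MLflow releases appear to favor that method over list_registered_models(). Only the first page of results is read.

```python
def summarize_registered_models(client):
    """Return {model name: [(version, stage), ...]} for the registered models."""
    # Prefer search_registered_models() when available and fall back to
    # list_registered_models() otherwise.
    fetch = getattr(client, "search_registered_models", None) or client.list_registered_models
    summary = {}
    for model in fetch():
        summary[model.name] = [
            (mv.version, mv.current_stage) for mv in model.latest_versions
        ]
    return summary


# Usage sketch (needs mlflow and a configured registry):
# from mlflow.tracking import MlflowClient
# print(summarize_registered_models(MlflowClient()))
```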

Deleting MLflow Models

Note: Deleting registered models or model versions is irrevocable, so use it judiciously.

You can either delete specific versions of a registered model or you can delete a registered model and all its versions.

# Delete versions 1,2, and 3 of the model
client = MlflowClient()
versions=[1, 2, 3]
for version in versions:
    client.delete_model_version(
        name="ElasticnetWineModel", 
        version=version
    )

# Delete a registered model along with all its versions
client.delete_registered_model(name="ElasticnetWineModel")

Amir Masoud Sefidian
Machine Learning Engineer
