Master MLflow: Track Experiments and Deploy Models
This guide provides a comprehensive introduction to MLflow, a powerful tool for managing the machine learning lifecycle. You’ll learn how to track experiments, version models, and integrate MLflow into your professional workflows, ultimately building reproducible and scalable ML systems. We’ll cover everything from local experiment tracking to deploying production-ready models.
Overview of What You’ll Learn
- The fundamental reasons why experiment tracking is crucial for ML systems.
- How MLflow addresses the limitations of traditional methods like Jupyter notebooks and Git for ML development.
- Setting up MLflow on your local system and understanding its core components: tracking server, backend store, and artifact store.
- Creating MLflow experiments and logging parameters, metrics, and artifacts.
- Exploring the MLflow UI to visualize experiment runs and their associated details.
Prerequisites
- Python installed on your system.
- Basic understanding of machine learning concepts.
- Familiarity with using the terminal or command prompt.
Step 1: Understanding the Need for ML Experiment Tracking
Before diving into MLflow, it’s essential to understand why traditional development methods fall short for machine learning. ML projects often start with a single Jupyter notebook, a dataset, and one model. While this works for individual research or very small teams, it quickly becomes unmanageable in larger organizations.
The Problem with Ad Hoc Experiments
- Lack of Reproducibility: Without proper tracking, it’s difficult to recall the exact parameters, code versions, and environment settings used to train a specific model.
- Confusion and Inconsistency: As more data scientists work on a project, individual naming conventions and manual tracking methods lead to chaos.
- Memory Fallibility: Humans tend to overestimate their ability to remember the details of past experiments. Relying on memory is unreliable.
- Probabilistic Nature of ML: Unlike traditional software, ML model outputs are probabilistic due to data and randomness. This means versioning goes beyond just code; it includes the entire decision history.
What Constitutes an ML Experiment?
An ML experiment is an encapsulation of several key components:
- Code: The training scripts.
- Data: The datasets used.
- Parameters: Hyperparameters and other configuration settings.
- Randomness: The inherent randomness in training processes.
- Environment: The software packages and their versions (e.g., Python libraries).
Git effectively tracks code changes, but it doesn’t capture the data, parameters, or environment, which are critical for ML reproducibility.
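To make this concrete, here is a minimal sketch of the record a single run would need to capture; the function name and fields are illustrative only (this is not an MLflow API — MLflow records this kind of information for you):

```python
import hashlib
import platform
import sys

# Illustrative only: the metadata one training "run" needs for
# reproducibility -- parameters, data identity, randomness, environment.
def snapshot_run(params: dict, data_bytes: bytes, seed: int) -> dict:
    return {
        "params": params,                                       # hyperparameters
        "data_sha256": hashlib.sha256(data_bytes).hexdigest(),  # dataset fingerprint
        "seed": seed,                                           # source of randomness
        "python": sys.version.split()[0],                       # interpreter version
        "platform": platform.platform(),                        # OS / architecture
    }

record = snapshot_run({"lr": 0.01}, b"fake-dataset-bytes", seed=42)
print(sorted(record))
```

Notice that Git would version only the code that calls this function; everything else in the record lives outside version control.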
Why Notebooks Don’t Scale
- No Structured Metadata: Notebooks lack a systematic way to track multiple runs, parameters, or metrics within a single notebook or across different notebooks.
- Difficult Run Comparison: Comparing different training runs is cumbersome and error-prone.
The Dangers in Production
Without proper tracking, ML systems lose their decision history. This is dangerous in production because:
- Retraining is Frequent: Data changes, team members change, and infrastructure evolves, necessitating frequent model retraining.
- Lack of Auditability: It becomes impossible to definitively state why a particular model was deployed, which is crucial for compliance and debugging.
- No Safe Rollbacks: Reverting to a previous stable model version becomes a manual and risky process.
Expert Note: Excuses like “we’ll clean up the code later” or “this is just research” often lead to technical debt. Tracking doesn’t slow you down; it enhances future productivity.
Step 2: Setting Up MLflow Locally
This section guides you through installing MLflow and setting up a local tracking server.
Installation Steps
- Create a Project Directory: Create a new folder for your MLflow project (e.g., `MLflow_YouTube`).
- Open Terminal: Navigate to your project directory using your terminal or command prompt.
- Create a Virtual Environment: It's best practice to use a virtual environment to manage dependencies. Use Python's built-in `venv` module: `python -m venv venv`
- Activate the Virtual Environment:
  - On macOS/Linux: `source venv/bin/activate`
  - On Windows: `venv\Scripts\activate`
- Install MLflow: With the virtual environment activated, install MLflow using pip: `pip install mlflow`
Running the MLflow Tracking Server
MLflow provides a command-line interface (CLI) to manage its features. To start the local tracking server:
- Run the MLflow Command: In your activated virtual environment and project directory, run: `mlflow server`
- Access the UI: By default, the server runs on `http://127.0.0.1:5000`. Open this URL in your web browser to access the MLflow UI.
Tip: When you run mlflow server, MLflow creates an mlruns directory in your project folder. This directory stores your experiment artifacts by default.
Step 3: Creating Your First Experiment and Run
Now that MLflow is set up, let’s create an experiment and log some data.
Creating an Experiment
You can set an experiment name using the MLflow Python API. If the experiment doesn’t exist, MLflow will create it.
- Create a Python Script: Create a new Python file (e.g.,
lecture_2.py) in your project directory. - Import MLflow: Start by importing the MLflow library.
import mlflow - Set the Experiment: Use
mlflow.set_experiment()to define your experiment.mlflow.set_experiment("Demo Experiment")
Logging Parameters and Artifacts within a Run
Within an experiment, you can create multiple runs, each representing a single execution of your training process. You can log parameters, metrics, and artifacts for each run.
- Start a Run: Use the `with mlflow.start_run():` context manager. Any logging commands within this block will be associated with the current run: `with mlflow.start_run(run_name="My First Run"):`
- Log Parameters:
  - Log individual parameters: `mlflow.log_param("learning_rate", 0.01)`
  - Log a dictionary of parameters: `params = {"epochs": 100, "batch_size": 32}; mlflow.log_params(params)`
- Log Artifacts: Artifacts are any files produced during a run (e.g., models, plots, data files). Use `mlflow.log_artifact()`, assuming the file already exists: `mlflow.log_artifact("my_model.pkl")`
Example Script (`lecture_2.py`):

```python
import mlflow

# Set the experiment name
mlflow.set_experiment("Demo Experiment")

# Start a new run
with mlflow.start_run(run_name="My First Run"):
    # Log individual parameters
    mlflow.log_param("learning_rate", 0.01)
    mlflow.log_param("epochs", 100)

    # Log a dictionary of parameters
    more_params = {"batch_size": 32, "optimizer": "adam"}
    mlflow.log_params(more_params)

    # Log an artifact (here, a dummy file created on the fly)
    with open("my_model.pkl", "w") as f:
        f.write("This is a dummy model file.")
    mlflow.log_artifact("my_model.pkl")

print("MLflow run completed.")
```
Viewing Runs in the MLflow UI
- Run the Script: Execute your Python script (e.g., `python lecture_2.py`).
- Refresh the UI: Go back to your MLflow UI (`http://127.0.0.1:5000`) and navigate to the "Experiments" tab. You should see "Demo Experiment" and, within it, "My First Run".
- Explore Run Details: Click on the run name to see the logged parameters, metrics, and artifacts.
Tip: The mlruns directory now contains subdirectories for experiment IDs and run IDs, storing the actual artifact files.
Step 4: Understanding MLflow’s Backend and Artifact Stores
MLflow separates metadata storage (backend store) from file storage (artifact store).
Key Concepts
- Backend Store: Stores metadata such as parameters, metrics, tags, and run history. When running locally, MLflow defaults to file-based storage inside the `mlruns` directory; to use a SQLite database instead, start the server with `mlflow server --backend-store-uri sqlite:///mlflow.db`.
- Artifact Store: Stores the actual files generated during a run (e.g., model weights, datasets, plots). By default, this is the `mlruns` directory itself.
- Tracking Server: The service that hosts the MLflow UI and serves the MLflow APIs.
Exploring the Default Stores
You can inspect the contents of a SQLite backend store using SQL queries. The snippet below assumes the server was started with `mlflow server --backend-store-uri sqlite:///mlflow.db`, so an `mlflow.db` file exists in the project directory (it is created once you first log data).

```python
import sqlite3

import pandas as pd

# Connect to the MLflow backend database
connection = sqlite3.connect("mlflow.db")

# Get all table names
tables_df = pd.read_sql_query(
    "SELECT name FROM sqlite_master WHERE type='table';", connection
)
print("Tables in the database:")
print(tables_df)

# Explore the 'runs' table
runs_df = pd.read_sql_query("SELECT * FROM runs;", connection)
print("\nFirst 5 rows of the runs table:")
print(runs_df.head())

connection.close()
```
Best Practices for Production
Storing artifacts and metadata locally is suitable for development but not for production environments.
- Remote Artifact Storage: Use cloud object storage like AWS S3, Google Cloud Storage, or Azure Blob Storage for your artifacts.
- Remote Backend Store: Use a robust database like PostgreSQL, MySQL, or a managed database service for your backend store.
Configuring MLflow to use remote stores involves passing CLI flags such as `--backend-store-uri` and `--default-artifact-root` to `mlflow server` (or setting the corresponding environment variables), which is a more advanced topic.
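For illustration only, a production-style server invocation might look like the following; the database host, credentials, and bucket name are placeholders, not values from this guide:

```shell
# Placeholder hosts/credentials -- adjust to your own infrastructure.
mlflow server \
  --backend-store-uri postgresql://mlflow_user:SECRET@db-host:5432/mlflow \
  --default-artifact-root s3://my-mlflow-artifacts/ \
  --host 0.0.0.0 \
  --port 5000
```

With this setup, run metadata lands in Postgres while artifact files go to S3, and the server itself stays stateless.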
Step 5: Comprehensive Logging with MLflow
MLflow allows you to log various types of information to track your experiments thoroughly.
Logging Parameters
As demonstrated earlier, you can log individual parameters or entire dictionaries.
- Individual Parameters: `mlflow.log_param("key", "value")`
- Dictionary of Parameters: `mlflow.log_params({"key1": "value1", "key2": "value2"})`
Logging Metrics
Metrics are typically numerical values that change over time or represent performance indicators (e.g., accuracy, loss). Metrics can be logged at different steps during training.
- Log a Single Metric: `mlflow.log_metric("accuracy", 0.95)`
- Log Metrics with Steps: Useful for plotting training progress; `mlflow.log_metric("loss", 0.1, step=10)` logs the metric at step 10.
- Log a Dictionary of Metrics: Similar to parameters: `mlflow.log_metrics({"precision": 0.92, "recall": 0.97})`
Logging Artifacts
Artifacts are files generated by your run. This includes:
- Trained model files (e.g., `.pkl`, `.h5`, `.pt`)
- Data files (e.g., processed datasets)
- Images and plots (e.g., confusion matrices, training curves)
- Configuration files
- `requirements.txt` or environment files
Use `mlflow.log_artifact("path/to/your/file")` for a single file or `mlflow.log_artifacts("path/to/directory")` for an entire directory.
Example: Comprehensive Logging Script
Here’s an example script demonstrating logging parameters, metrics, and artifacts.
```python
import mlflow
import os
import random

# Set the experiment name
mlflow.set_experiment("YouTube Tutorial")

# Start a run
with mlflow.start_run(run_name="Comprehensive Logging Demo") as run:
    run_id = run.info.run_id
    print(f"Started run: {run_id}")

    # Log parameters
    params = {
        "learning_rate": 0.001,
        "epochs": 50,
        "batch_size": 64,
        "optimizer": "adam",
    }
    mlflow.log_params(params)

    # Log metrics (simulating training)
    for step in range(10):
        accuracy = 0.7 + (step * 0.02) + random.uniform(-0.01, 0.01)
        loss = 0.5 - (step * 0.03) + random.uniform(-0.01, 0.01)
        mlflow.log_metric("accuracy", accuracy, step=step)
        mlflow.log_metric("loss", loss, step=step)
        print(f"Step {step}: Accuracy={accuracy:.4f}, Loss={loss:.4f}")

    # Log an artifact (a dummy model file)
    model_filename = "model.pkl"
    with open(model_filename, "w") as f:
        f.write("Dummy model content")
    mlflow.log_artifact(model_filename)
    os.remove(model_filename)  # Clean up dummy file

    # Log another artifact (a dummy plot image)
    plot_filename = "accuracy_plot.png"
    with open(plot_filename, "w") as f:
        f.write("Dummy plot content")
    mlflow.log_artifact(plot_filename)
    os.remove(plot_filename)  # Clean up dummy file

print(f"Finished run: {run_id}")
print("Comprehensive logging demo complete.")
```
Viewing Comprehensive Logs
After running the script, check the MLflow UI. You will see:
- The “YouTube Tutorial” experiment.
- The “Comprehensive Logging Demo” run.
- Tabs for "Parameters", "Metrics" (showing plots of accuracy and loss over steps), and "Artifacts" (listing `model.pkl` and `accuracy_plot.png`).
This structured approach ensures that all critical information from your ML experiments is captured, making them reproducible, auditable, and manageable.
Source: Learn MLOps with MLflow and Databricks – Full Course for Machine Learning Engineers (YouTube)