Enhancements in PyCaret 3.0: Key Features Unveiled
Introduction
PyCaret is a low-code, open-source machine learning library in Python designed to streamline machine learning processes. It serves as a comprehensive tool for managing models and automating workflows, significantly accelerating the experimental cycle and enhancing productivity.
Unlike many other open-source libraries, PyCaret takes a low-code approach: a few lines can replace what would otherwise be hundreds of lines of code, which makes experimentation faster and more efficient. Under the hood, the package wraps several machine learning libraries and frameworks behind a single interface.
The design of PyCaret is influenced by the growing presence of citizen data scientists—individuals who analyze data and derive insights without formal training in data science or statistics.
1. Time Series Module
With the release of PyCaret 3.0, the Time Series module is now stable and available to all users. The module is built for time series analysis, that is, for data recorded at successive points in time, such as monthly airline passenger counts.
The Time Series module in PyCaret 3.0 currently targets forecasting tasks, providing an easy-to-use interface that lets users of all skill levels produce forecasts with minimal coding.
Future releases are expected to add time-series anomaly detection (identifying unusual patterns in time-series data) and time-series clustering (grouping similar series by how they behave over time).
# Load dataset
from pycaret.datasets import get_data
data = get_data('airline')
# Initialize setup
from pycaret.time_series import *
s = setup(data, fh=12, session_id=123)
# Compare models
best = compare_models()
# Forecast plot
plot_model(best, plot='forecast')
# Forecast plot 36 periods into the future (36 months for the monthly airline data)
plot_model(best, plot='forecast', data_kwargs={'fh': 36})
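To make the fh (forecast horizon) argument more concrete, here is a minimal sketch that does not use PyCaret at all. It implements a seasonal naive forecaster in plain Python: the forecast simply repeats the last observed season for as many periods as requested. The function name, the season_length parameter, and the sample values are illustrative assumptions, not part of PyCaret's API.

```python
def seasonal_naive_forecast(history, fh, season_length=12):
    """Forecast `fh` future periods by repeating the last observed season.

    Hypothetical helper for illustration only; not a PyCaret function.
    """
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    # Cycle through the last season until the horizon is covered.
    return [last_season[i % season_length] for i in range(fh)]


# Twelve illustrative monthly observations (one full season).
monthly = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]

forecast_12 = seasonal_naive_forecast(monthly, fh=12)  # one year ahead
forecast_36 = seasonal_naive_forecast(monthly, fh=36)  # three years ahead

print(len(forecast_12), len(forecast_36))
```

A larger fh only extends how far ahead the forecast reaches; the unit is always one period of the input series, so fh=36 on monthly data means 36 months.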
2. Object-Oriented API
PyCaret has established its value in the data science community. Its original functional API, however, does not follow the object-oriented programming practices that Python developers commonly use. To address this, we revisited some of the foundational design decisions made for the initial PyCaret 1.0 release.
This transition to a new object-oriented API will demand significant effort but is essential for aligning with Python's best programming practices, ensuring PyCaret remains a dependable tool for data scientists.
This modification will enhance accessibility for a broader audience, facilitating smoother integration with other Python libraries and frameworks, ultimately enabling more efficient data science workflows.
# Functional API (Existing)
# Load dataset
from pycaret.datasets import get_data
data = get_data('juice')
# Initialize setup
from pycaret.classification import *
s = setup(data, target='Purchase', session_id=123)
# Compare models
best = compare_models()
Running several experiments in the same notebook is convenient, but the functional API makes it awkward: each call to setup overwrites the configuration left by the previous one, so experiments with different setup parameters cannot exist side by side.
With the new object-oriented API, users can easily manage multiple experiments in a single notebook without conflicts, as parameters are tied to specific objects linked to various modeling and preprocessing choices.
# Load dataset
from pycaret.datasets import get_data
data = get_data('juice')
# Setup experiment 1
from pycaret.classification import ClassificationExperiment
exp1 = ClassificationExperiment()
exp1.setup(data, target='Purchase', session_id=123)
# Compare models for experiment 1
best = exp1.compare_models()
# Setup experiment 2
exp2 = ClassificationExperiment()
exp2.setup(data, target='Purchase', normalize=True, session_id=123)
# Compare models for experiment 2
best2 = exp2.compare_models()
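The difference between the two styles can be illustrated with a toy sketch that does not use PyCaret. It contrasts a function that writes to shared module-level state with a class that keeps its configuration on the instance. The names toy_setup and ToyExperiment are hypothetical and only mimic the shape of the two APIs; this is not how PyCaret is implemented internally.

```python
# Functional style: a single module-level configuration shared by all calls.
_shared_config = {}


def toy_setup(**params):
    """Hypothetical stand-in for a functional setup(); replaces shared state."""
    _shared_config.clear()
    _shared_config.update(params)
    return _shared_config


class ToyExperiment:
    """Hypothetical stand-in for an experiment object with its own state."""

    def __init__(self):
        self.config = {}

    def setup(self, **params):
        # Configuration lives on the instance, so experiments do not collide.
        self.config = dict(params)
        return self


# Two functional calls: the second one replaces the first configuration.
toy_setup(target="Purchase")
toy_setup(target="Purchase", normalize=True)
functional_config = dict(_shared_config)

# Two objects: each keeps the parameters it was set up with.
exp1 = ToyExperiment().setup(target="Purchase")
exp2 = ToyExperiment().setup(target="Purchase", normalize=True)

print(functional_config)
print(exp1.config)
print(exp2.config)
```

After both functional calls only the second configuration survives, while the two objects still hold their own parameters.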
Once the experiments have run, each experiment's get_leaderboard method returns its leaderboard, and the leaderboards can then be combined for easier comparison.
import pandas as pd
# Generate leaderboard
leaderboard_exp1 = exp1.get_leaderboard()
leaderboard_exp2 = exp2.get_leaderboard()
lb = pd.concat([leaderboard_exp1, leaderboard_exp2])
# Print pipeline steps
print(exp1.pipeline.steps)
print(exp2.pipeline.steps)
3. Experiment Logging
In PyCaret 2, experiments could be logged automatically with MLflow. MLflow remains the default logger in PyCaret 3, which also adds support for wandb, cometml, and dagshub.
Switching from the default MLflow logger to one of the new options is easy: pass the name of the logger to the log_experiment parameter of setup. The available options are mlflow, wandb, cometml, and dagshub.
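As a rough sketch of how this might look with the object-oriented API, the helper below passes a logger name through setup's log_experiment and experiment_name parameters. The helper names, the default experiment name, and the lazy import are assumptions made for illustration; running it requires pycaret plus the chosen logging backend, and the exact string accepted for each backend should be checked against the installed PyCaret version.

```python
def logging_kwargs(logger="mlflow", experiment_name="juice-demo"):
    """Build the setup() keyword arguments that switch on experiment logging.

    Hypothetical helper; `logger` is passed through unchanged.
    """
    return {"log_experiment": logger, "experiment_name": experiment_name}


def run_logged_experiment(data, target, logger="mlflow"):
    """Set up a classification experiment with logging, then compare models.

    Assumes pycaret and the selected logging backend are installed. The
    import is done lazily so this sketch can be loaded without them.
    """
    from pycaret.classification import ClassificationExperiment

    exp = ClassificationExperiment()
    exp.setup(data, target=target, session_id=123, **logging_kwargs(logger))
    best = exp.compare_models()
    return exp, best
```

With the juice dataset from the earlier examples, a call such as run_logged_experiment(data, 'Purchase', logger='wandb') would log the run to Weights & Biases instead of MLflow, provided that backend is installed and configured.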
This upgrade in logging capabilities enhances user flexibility in tracking and managing machine learning experiments, allowing data scientists to choose the tools that best fit their requirements.
Liked the blog? Connect with Moez Ali
Moez Ali is a forward-thinking innovator and technologist. Transitioning from data scientist to product manager, he is committed to developing cutting-edge data products and fostering vibrant open-source communities.
As the creator of PyCaret, he has authored over 100 publications with more than 500 citations and is recognized globally for his contributions to open-source projects in Python.