pdpbox.pdp.pdp_plot

pdpbox.pdp.pdp_plot(pdp_isolate_out, feature_name, center=True, plot_pts_dist=False, plot_lines=False, frac_to_plot=1, cluster=False, n_cluster_centers=None, cluster_method='accurate', x_quantile=False, show_percentile=False, figsize=None, ncols=2, plot_params=None, which_classes=None)

Plot a partial dependence plot

Parameters:
pdp_isolate_out: (list of) instance(s) of PDPIsolate

for a multi-class problem, it is a list

feature_name: string

name of the feature, not necessarily a column name

center: bool, default=True

whether to center the plot
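
As a rough illustration of what centering usually means for PDP and ICE plots, the sketch below shifts every line so that it starts at zero at the first grid point. This reflects the common convention and is an assumption, not a copy of pdpbox internals; `center_lines` is a hypothetical helper and the numbers are made up.

```python
def center_lines(lines):
    """Shift each line so its value at the first grid point is 0.

    `lines` is a list of lists: one list of predictions per line,
    one prediction per grid point. Hypothetical helper for illustration.
    """
    return [[value - line[0] for value in line] for line in lines]


# Two made-up lines evaluated at three grid points
ice_lines = [
    [0.20, 0.35, 0.50],
    [0.60, 0.55, 0.70],
]
centered = center_lines(ice_lines)
# Every centered line now starts at 0, so lines are compared by their
# change relative to the first grid point rather than by absolute level.
```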

plot_pts_dist: bool, default=False

whether to show data points distribution

plot_lines: bool, default=False

whether to plot out the individual lines

frac_to_plot: float or integer, default=1

how many lines to plot, can be an integer or a float
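
One plausible reading of the float/integer distinction is that a float below 1 is a fraction of all lines, an integer above 1 is an absolute count, and 1 means all lines. The sketch below encodes that reading. It is an assumption for illustration; `n_lines_to_plot` is a hypothetical helper, and capping the count at the number of available lines is a choice made here, not documented behavior.

```python
def n_lines_to_plot(n_total, frac_to_plot=1):
    """Resolve frac_to_plot into a number of lines (assumed semantics)."""
    if frac_to_plot < 1:
        # float in (0, 1): treat as a fraction of all available lines
        return int(n_total * frac_to_plot)
    if frac_to_plot > 1:
        # integer > 1: treat as an absolute count, capped at n_total
        return min(int(frac_to_plot), n_total)
    # exactly 1: plot every line
    return n_total
```

Under this reading, `frac_to_plot=0.25` on 1000 lines would draw 250 of them, while `frac_to_plot=100` would draw 100.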

cluster: bool, default=False

whether to cluster the individual lines and only plot out the cluster centers

n_cluster_centers: integer, default=None

number of cluster centers

cluster_method: string, default='accurate'

clustering method to use; KMeans is used by default, and MiniBatchKMeans is used if 'approx' is passed

x_quantile: bool, default=False

whether to construct x axis ticks using quantiles
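
A minimal sketch of the idea, assuming that quantile-based ticks place the grid points at evenly spaced positions and label them with the actual grid values, so that a skewed feature does not bunch up on one side of the axis. `x_positions` is a hypothetical helper and the grid values are made up.

```python
def x_positions(grid_values, x_quantile=False):
    """Return (positions, labels) for the x axis (assumed behavior)."""
    labels = [str(v) for v in grid_values]
    if x_quantile:
        # evenly spaced: one slot per grid point, regardless of value
        positions = list(range(len(grid_values)))
    else:
        # positions follow the raw feature values
        positions = list(grid_values)
    return positions, labels


grid = [0.0, 7.9, 14.5, 31.0, 512.3]  # made-up, heavily skewed grid
even_positions, tick_labels = x_positions(grid, x_quantile=True)
```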

show_percentile: bool, optional, default=False

whether to display the percentile buckets, for numeric features when grid_type='percentile'

figsize: tuple or None, optional, default=None

size of the figure, (width, height)

ncols: integer, optional, default=2

number of subplot columns, used when it is a multi-class problem
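
For a multi-class problem with one subplot per class, the number of rows follows from the number of classes and `ncols`. The sketch below shows that arithmetic under the assumption that the grid is filled row by row and never uses more columns than there are classes; `subplot_grid` is a hypothetical helper.

```python
import math


def subplot_grid(n_classes, ncols=2):
    """Return (nrows, ncols) for one subplot per class (assumed layout)."""
    ncols = min(ncols, n_classes)          # never more columns than plots
    nrows = math.ceil(n_classes / ncols)   # enough rows to fit every class
    return nrows, ncols
```

With nine classes, `ncols=3` would give a 3 x 3 grid, while the default `ncols=2` would need five rows.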

plot_params: dict or None, optional, default=None

parameters for the plot; the possible parameters and their defaults are listed below:

plot_params = {
    # plot title and subtitle
    'title': 'PDP for feature "%s"' % feature_name,
    'subtitle': "Number of unique grid points: %d" % n_grids,
    'title_fontsize': 15,
    'subtitle_fontsize': 12,
    'font_family': 'Arial',
    # matplotlib color map for ICE lines
    'line_cmap': 'Blues',
    'xticks_rotation': 0,
    # pdp line color, highlight color and line width
    'pdp_color': '#1A4E5D',
    'pdp_hl_color': '#FEDC00',
    'pdp_linewidth': 1.5,
    # horizontal zero line color and width
    'zero_color': '#E75438',
    'zero_linewidth': 1,
    # pdp std fill color and alpha
    'fill_color': '#66C2D7',
    'fill_alpha': 0.2,
    # marker size for pdp line
    'markersize': 3.5,
}
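
To change only a few settings, it is presumably enough to pass just those keys. The sketch below shows the fallback pattern with a plain dictionary merge. It assumes that unspecified keys keep the defaults listed above; `resolve_plot_params` is a hypothetical helper, not a pdpbox function.

```python
# Subset of the defaults listed above
DEFAULTS = {
    'title_fontsize': 15,
    'subtitle_fontsize': 12,
    'font_family': 'Arial',
    'line_cmap': 'Blues',
    'pdp_color': '#1A4E5D',
    'pdp_linewidth': 1.5,
}


def resolve_plot_params(plot_params=None):
    """Merge user overrides on top of the defaults (assumed behavior)."""
    merged = dict(DEFAULTS)            # copy so DEFAULTS is not mutated
    merged.update(plot_params or {})   # user-supplied keys win
    return merged


custom = resolve_plot_params({'pdp_color': '#000000', 'pdp_linewidth': 2})
```

The same partial dictionary, `{'pdp_color': '#000000', 'pdp_linewidth': 2}`, is what would be passed as `plot_params`.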
which_classes: list, optional, default=None

which classes to plot, only used when it is a multi-class problem

Returns:
fig: matplotlib Figure
axes: a dictionary of matplotlib Axes

Returns the Axes objects for further tweaking
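
Because `axes` is returned as a dictionary, a small helper can collect every Axes object before tweaking them. The sketch below assumes the dictionary may nest further dictionaries or lists (for example, one entry per class); `iter_axes` is a hypothetical helper and does not rely on any particular key names.

```python
def iter_axes(axes):
    """Yield every leaf object from a possibly nested dict/list of Axes."""
    if isinstance(axes, dict):
        for value in axes.values():
            yield from iter_axes(value)
    elif isinstance(axes, (list, tuple)):
        for value in axes:
            yield from iter_axes(value)
    else:
        # anything that is not a container is treated as an Axes object
        yield axes


# Usage with the value returned by pdp_plot would look like:
#   for ax in iter_axes(axes):
#       ax.tick_params(labelsize=9)
```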

Examples

Quick start with pdp_plot

from pdpbox import pdp, get_dataset

test_titanic = get_dataset.titanic()
titanic_data = test_titanic['data']
titanic_target = test_titanic['target']
titanic_features = test_titanic['features']
titanic_model = test_titanic['xgb_model']

pdp_sex = pdp.pdp_isolate(model=titanic_model,
                          dataset=titanic_data,
                          model_features=titanic_features,
                          feature='Sex')
fig, axes = pdp.pdp_plot(pdp_isolate_out=pdp_sex, feature_name='sex')

With One-hot encoding features

pdp_embark = pdp.pdp_isolate(model=titanic_model, dataset=titanic_data,
                             model_features=titanic_features,
                             feature=['Embarked_C', 'Embarked_S', 'Embarked_Q'])
fig, axes = pdp.pdp_plot(pdp_isolate_out=pdp_embark,
                         feature_name='Embark',
                         center=True,
                         plot_lines=True,
                         frac_to_plot=100,
                         plot_pts_dist=True)

With numeric features

pdp_fare = pdp.pdp_isolate(model=titanic_model,
                           dataset=titanic_data,
                           model_features=titanic_features,
                           feature='Fare')
fig, axes = pdp.pdp_plot(pdp_isolate_out=pdp_fare,
                         feature_name='Fare',
                         plot_pts_dist=True)

With multi-class

from pdpbox import pdp, get_dataset

test_otto = get_dataset.otto()
otto_data = test_otto['data']
otto_features = test_otto['features']
otto_model = test_otto['rf_model']
otto_target = test_otto['target']

pdp_feat_67_rf = pdp.pdp_isolate(model=otto_model,
                                 dataset=otto_data,
                                 model_features=otto_features,
                                 feature='feat_67')
fig, axes = pdp.pdp_plot(pdp_isolate_out=pdp_feat_67_rf,
                         feature_name='feat_67',
                         center=True,
                         x_quantile=True,
                         ncols=3,
                         plot_lines=True,
                         frac_to_plot=100)