pdpbox.pdp.pdp_plot
pdpbox.pdp.pdp_plot(pdp_isolate_out, feature_name, center=True, plot_pts_dist=False, plot_lines=False, frac_to_plot=1, cluster=False, n_cluster_centers=None, cluster_method='accurate', x_quantile=False, show_percentile=False, figsize=None, ncols=2, plot_params=None, which_classes=None)
Plot a partial dependence plot (PDP), optionally with the individual conditional expectation (ICE) lines.
Parameters:
- pdp_isolate_out: (list of) instance of PDPIsolate
the output of pdp_isolate; for a multi-class problem, it is a list of PDPIsolate instances
- feature_name: string
name of the feature shown in the plot title; not necessarily a column name
- center: bool, default=True
whether to center the plot, i.e. shift the PDP (and the ICE lines) so that they start at zero at the first grid point
- plot_pts_dist: bool, default=False
whether to show the distribution of the data points for the feature
- plot_lines: bool, default=False
whether to plot the individual conditional expectation (ICE) lines, one per data point
- frac_to_plot: float or integer, default=1
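how many ICE lines to plot: a float below 1 is read as a fraction of the lines, an integer above 1 as a number of lines, and the default 1 plots all of them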
- cluster: bool, default=False
whether to cluster the individual lines and plot only the cluster centers
- n_cluster_centers: integer, default=None
number of cluster centers; must be set when cluster=True
- cluster_method: string, default='accurate'
clustering method to use: the default 'accurate' uses KMeans, while 'approx' uses MiniBatchKMeans
- x_quantile: bool, default=False
whether to construct x axis ticks using quantiles
- show_percentile: bool, optional, default=False
whether to display the percentile buckets; applies only to numeric features when grid_type='percentile' was used in pdp_isolate
- figsize: tuple or None, optional, default=None
size of the figure, (width, height)
- ncols: integer, optional, default=2
number of subplot columns; used only for multi-class problems
- plot_params: dict or None, optional, default=None
parameters for the plot; the supported keys and their defaults are listed below (feature_name is the argument above, n_grids the number of unique grid points):
    plot_params = {
        # plot title and subtitle
        'title': 'PDP for feature "%s"' % feature_name,
        'subtitle': "Number of unique grid points: %d" % n_grids,
        'title_fontsize': 15,
        'subtitle_fontsize': 12,
        'font_family': 'Arial',
        # matplotlib color map for ICE lines
        'line_cmap': 'Blues',
        'xticks_rotation': 0,
        # PDP line color, highlight color and line width
        'pdp_color': '#1A4E5D',
        'pdp_hl_color': '#FEDC00',
        'pdp_linewidth': 1.5,
        # horizontal zero line color and width
        'zero_color': '#E75438',
        'zero_linewidth': 1,
        # PDP std fill color and alpha
        'fill_color': '#66C2D7',
        'fill_alpha': 0.2,
        # marker size for the PDP line
        'markersize': 3.5,
    }
- which_classes: list, optional, default=None
which classes to plot; used only for multi-class problems
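The center and frac_to_plot options both act on the ICE lines before they are drawn. The pure-Python sketch below illustrates them under two assumptions: centering subtracts each line's value at the first grid point, and frac_to_plot keeps all lines when it is exactly 1, a fraction of them below 1, and a count of them above 1. center_lines and select_lines are hypothetical helpers, not pdpbox functions.

```python
import random


def center_lines(lines):
    """Shift each line so that its value at the first grid point becomes 0."""
    return [[y - line[0] for y in line] for line in lines]


def select_lines(lines, frac_to_plot=1, seed=0):
    """Choose the ICE lines to draw (hypothetical helper, assumed semantics).

    frac_to_plot == 1 keeps every line, a value below 1 is a fraction of
    the lines, and a value above 1 is an absolute number of lines.
    """
    if frac_to_plot <= 0:
        raise ValueError("frac_to_plot must be positive")
    if frac_to_plot == 1:
        return list(lines)
    if frac_to_plot < 1:
        n_keep = int(len(lines) * frac_to_plot)
    else:
        n_keep = min(int(frac_to_plot), len(lines))
    rng = random.Random(seed)  # seeded so the selection is reproducible
    return rng.sample(lines, n_keep)


ice = [[2, 5, 9], [1, 1, 4], [3, 2, 2], [0, 4, 6]]
centered = center_lines(ice)  # [[0, 3, 7], [0, 0, 3], [0, -1, -1], [0, 4, 6]]
# averaging the centered ICE lines gives the centered PDP
pdp = [sum(col) / len(centered) for col in zip(*centered)]
print(pdp)  # [0.0, 1.5, 3.75]
print(len(select_lines(ice, 0.5)))  # 2
```

Because the PDP is a pointwise average, centering the ICE lines first and then averaging gives the same result as centering the averaged line.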
Returns:
- fig: matplotlib Figure
- axes: a dictionary of matplotlib Axes
the Axes objects, returned for further tweaking
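The cluster option summarizes many ICE lines by a few representative curves. As a stand-in for scikit-learn's KMeans (cluster_method='accurate'), the sketch below runs plain Lloyd's k-means on the lines in pure Python; MiniBatchKMeans ('approx') would instead update the centers from random subsets of the lines. cluster_centers is a hypothetical helper, not part of pdpbox.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two lines."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(lines):
    """Pointwise mean of a group of lines."""
    return [sum(col) / len(lines) for col in zip(*lines)]


def cluster_centers(lines, n_cluster_centers, n_iter=20, seed=0):
    """Lloyd's k-means over ICE lines; returns the cluster centers."""
    if n_cluster_centers is None:
        raise ValueError("n_cluster_centers must be set when clustering")
    rng = random.Random(seed)
    # initialize the centers with randomly chosen lines
    centers = [list(c) for c in rng.sample(lines, n_cluster_centers)]
    for _ in range(n_iter):
        groups = [[] for _ in centers]
        for line in lines:
            nearest = min(range(len(centers)), key=lambda k: _sq_dist(line, centers[k]))
            groups[nearest].append(line)
        # keep the previous center if a cluster ends up empty
        centers = [_mean(g) if g else c for g, c in zip(groups, centers)]
    return centers


lines = [[0, 0, 0], [0, 1, 0], [10, 10, 10], [10, 11, 10]]
print(sorted(cluster_centers(lines, 2)))  # [[0.0, 0.5, 0.0], [10.0, 10.5, 10.0]]
```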
Examples
Quick start with pdp_plot
    from pdpbox import pdp, get_dataset

    # load the bundled Titanic data and its pre-trained XGBoost model
    test_titanic = get_dataset.titanic()
    titanic_data = test_titanic['data']
    titanic_target = test_titanic['target']
    titanic_features = test_titanic['features']
    titanic_model = test_titanic['xgb_model']

    # compute the partial dependence on 'Sex', then plot it
    pdp_sex = pdp.pdp_isolate(model=titanic_model, dataset=titanic_data,
                              model_features=titanic_features, feature='Sex')
    fig, axes = pdp.pdp_plot(pdp_isolate_out=pdp_sex, feature_name='sex')
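Conceptually, the object that pdp_isolate hands to pdp_plot holds one ICE line per row (the model's prediction with the feature overwritten by each grid value) and the PDP, their average. The sketch below illustrates only that definition in pure Python; ice_lines, pdp_line and toy_predict are hypothetical names, and the toy model stands in for titanic_model.

```python
def ice_lines(predict, rows, feature, grid):
    """One ICE line per row: the prediction with `feature` set to each grid value."""
    lines = []
    for row in rows:
        line = []
        for value in grid:
            modified = dict(row)  # copy, so the caller's row is left unchanged
            modified[feature] = value
            line.append(predict(modified))
        lines.append(line)
    return lines


def pdp_line(lines):
    """The PDP is the pointwise average of the ICE lines."""
    return [sum(col) / len(lines) for col in zip(*lines)]


# Hypothetical stand-in for titanic_model: a linear score on two columns.
def toy_predict(row):
    return 2 * row["Sex"] + row["Pclass"]


rows = [{"Sex": 0, "Pclass": 1}, {"Sex": 1, "Pclass": 3}]
lines = ice_lines(toy_predict, rows, "Sex", grid=[0, 1])
print(lines)            # [[1, 3], [3, 5]]
print(pdp_line(lines))  # [2.0, 4.0]
```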
With one-hot encoded features
    pdp_embark = pdp.pdp_isolate(model=titanic_model, dataset=titanic_data,
                                 model_features=titanic_features,
                                 feature=['Embarked_C', 'Embarked_S', 'Embarked_Q'])
    fig, axes = pdp.pdp_plot(pdp_isolate_out=pdp_embark, feature_name='Embark',
                             center=True, plot_lines=True, frac_to_plot=100,
                             plot_pts_dist=True)
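When the feature is a list of one-hot columns, as with Embarked here, each grid point is one of the columns rather than a number. A minimal sketch, assuming each grid point is built by setting that column to 1 and the other columns of the group to 0; onehot_variants is a hypothetical helper and the passenger row is made up.

```python
def onehot_variants(row, onehot_columns):
    """For each column in the one-hot group, a copy of `row` with only that column set to 1."""
    variants = []
    for active in onehot_columns:
        modified = dict(row)  # leave the original row untouched
        for col in onehot_columns:
            modified[col] = 1 if col == active else 0
        variants.append(modified)
    return variants


embarked = ["Embarked_C", "Embarked_S", "Embarked_Q"]
passenger = {"Fare": 7.25, "Embarked_C": 0, "Embarked_S": 1, "Embarked_Q": 0}
for variant in onehot_variants(passenger, embarked):
    print([variant[col] for col in embarked])
# [1, 0, 0]
# [0, 1, 0]
# [0, 0, 1]
```

Setting the whole group at once keeps every modified row a valid one-hot encoding; changing a single column would produce rows with zero or two active categories.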
With numeric features
    pdp_fare = pdp.pdp_isolate(model=titanic_model, dataset=titanic_data,
                               model_features=titanic_features, feature='Fare')
    fig, axes = pdp.pdp_plot(pdp_isolate_out=pdp_fare, feature_name='Fare',
                             plot_pts_dist=True)
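For a numeric feature such as Fare, the grid points come from the feature's distribution. The sketch below assumes grid_type='percentile' with 10 grid points (assumed pdp_isolate defaults): evenly spaced percentiles from 0 to 100 with duplicates dropped, computed with the same linear interpolation as numpy.percentile's default. percentile and percentile_grid are hypothetical helpers.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (0 <= q <= 100) of an already sorted list."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    frac = pos - lo
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * frac


def percentile_grid(values, num_grid_points=10):
    """Evenly spaced percentiles from 0 to 100, with duplicates removed."""
    if num_grid_points < 2 or not values:
        raise ValueError("need at least 2 grid points and a non-empty feature")
    ordered = sorted(values)
    qs = [100 * i / (num_grid_points - 1) for i in range(num_grid_points)]
    return sorted(set(percentile(ordered, q) for q in qs))


print(percentile_grid([0, 10, 20, 30, 40], num_grid_points=5))  # [0.0, 10.0, 20.0, 30.0, 40.0]
# a heavily skewed feature collapses to few unique grid points
print(percentile_grid([1] * 9 + [100], num_grid_points=5))      # [1.0, 100.0]
```

The deduplication is why a skewed feature can end up with fewer grid points than requested, which the default subtitle ("Number of unique grid points") reports.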
With a multi-class model
    from pdpbox import pdp, get_dataset

    test_otto = get_dataset.otto()
    otto_data = test_otto['data']
    otto_features = test_otto['features']
    otto_model = test_otto['rf_model']
    otto_target = test_otto['target']

    pdp_feat_67_rf = pdp.pdp_isolate(model=otto_model, dataset=otto_data,
                                     model_features=otto_features, feature='feat_67')
    fig, axes = pdp.pdp_plot(pdp_isolate_out=pdp_feat_67_rf, feature_name='feat_67',
                             center=True, x_quantile=True, ncols=3,
                             plot_lines=True, frac_to_plot=100)
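In the multi-class case, each selected class gets its own subplot, laid out ncols per row. The arithmetic below is a sketch of one plausible layout rule (never more columns than panels, rows rounded up) and assumes which_classes holds class indices; subplot_layout is a hypothetical helper, not pdpbox's internal code.

```python
import math


def subplot_layout(n_classes, ncols=2, which_classes=None):
    """Classes to draw and the (nrows, ncols) grid of subplots (assumed layout rule)."""
    classes = list(range(n_classes)) if which_classes is None else list(which_classes)
    if not classes:
        raise ValueError("no classes selected")
    ncols = min(ncols, len(classes))  # never more columns than panels
    nrows = math.ceil(len(classes) / ncols)
    return classes, (nrows, ncols)


print(subplot_layout(9, ncols=3))  # ([0, 1, 2, 3, 4, 5, 6, 7, 8], (3, 3))
print(subplot_layout(9, ncols=3, which_classes=[0, 4]))  # ([0, 4], (1, 2))
```

For a nine-class target such as Otto's, ncols=3 gives a 3 x 3 grid, while the default ncols=2 would need five rows.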