pdpbox.info_plots.actual_plot_interact

pdpbox.info_plots.actual_plot_interact(model, X, features, feature_names, num_grid_points=None, grid_types=None, percentile_ranges=None, grid_ranges=None, cust_grid_points=None, show_percentile=False, show_outliers=False, endpoint=True, which_classes=None, predict_kwds={}, ncols=2, figsize=None, annotate=False, plot_params=None)

Plot prediction distribution across different feature value combinations (feature grid combinations)

Parameters:
model: a fitted sklearn model
X: pandas DataFrame

data set to investigate; should contain at least the features to investigate as well as the target

features: list

two features to investigate

feature_names: list

display names for the two features, in the same order as features

num_grid_points: list, optional, default=None

number of grid points for each feature

grid_types: list, optional, default=None

type of grid points for each feature

percentile_ranges: list of tuple, optional, default=None

percentile range to investigate for each feature

grid_ranges: list of tuple, optional, default=None

value range to investigate for each feature

cust_grid_points: list of (Series, 1d-array, list), optional, default=None

customized list of grid points for each feature

show_percentile: bool, optional, default=False

whether to display the percentile buckets for both features

show_outliers: bool, optional, default=False

whether to display the out of range buckets for both features

endpoint: bool, optional, default=True

If True, stop is the last grid point; otherwise, it is not included

which_classes: list, optional, default=None

which classes to plot, only used for multi-class problems

predict_kwds: dict, default={}

keywords to be passed to the model’s predict function

figsize: tuple or None, optional, default=None

size of the figure, (width, height)

ncols: integer, optional, default=2

number of subplot columns, used for multi-class problems

annotate: bool, default=False

whether to annotate the points

plot_params: dict or None, optional, default=None

parameters for the plot

Returns:
fig: matplotlib Figure
axes: a dictionary of matplotlib Axes

Returns the Axes objects for further tweaking

summary_df: pandas DataFrame

Graph data in data frame format

Notes

  • Parameters are consistent with those of the function actual_plot
  • For this function, however, each per-feature parameter must be given as a list with one entry per feature
  • For example, percentile_ranges = [(0, 90), (5, 95)] means:
    • percentile_range = (0, 90) for feature 1
    • percentile_range = (5, 95) for feature 2
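
The list convention described in these notes can be sketched as follows. This is only an illustration: build_interact_kwargs is a hypothetical helper, not part of pdpbox, and the grid sizes and grid types shown are assumed example values. It keeps the per-feature arguments that are set and checks that each one has exactly two entries, one per feature.

```python
def build_interact_kwargs(**per_feature_params):
    """Hypothetical helper (not part of pdpbox).

    Keeps only the per-feature arguments that are not None and checks
    that each of them has exactly one entry per feature (two in total).
    """
    kwargs = {}
    for name, values in per_feature_params.items():
        if values is None:
            continue  # leave the library default in place
        if len(values) != 2:
            raise ValueError(
                f"{name} needs one entry per feature, got {len(values)}"
            )
        kwargs[name] = list(values)
    return kwargs


# Assumed example values: first entry applies to feature 1, second to feature 2.
interact_kwargs = build_interact_kwargs(
    num_grid_points=[10, 5],
    grid_types=["percentile", "percentile"],
    percentile_ranges=[(0, 90), (5, 95)],
    grid_ranges=None,
)
print(interact_kwargs)
```

The resulting dictionary could then be unpacked into the call alongside model, X, features and feature_names, for example with **interact_kwargs.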

Examples

Quick start with actual_plot_interact

from pdpbox import info_plots, get_dataset

test_titanic = get_dataset.titanic()
titanic_data = test_titanic['data']
titanic_features = test_titanic['features']
titanic_target = test_titanic['target']
titanic_model = test_titanic['xgb_model']

fig, axes, summary_df = info_plots.actual_plot_interact(
    model=titanic_model, X=titanic_data[titanic_features],
    features=['Fare', ['Embarked_C', 'Embarked_Q', 'Embarked_S']],
    feature_names=['Fare', 'Embarked'])