pdpbox.info_plots.actual_plot

pdpbox.info_plots.actual_plot(model, X, feature, feature_name, num_grid_points=10, grid_type='percentile', percentile_range=None, grid_range=None, cust_grid_points=None, show_percentile=False, show_outliers=False, endpoint=True, which_classes=None, predict_kwds={}, ncols=2, figsize=None, plot_params=None)

Plot prediction distribution across different feature values (feature grid)

Parameters:
model: a fitted sklearn model
X: pandas DataFrame

data set on which the model is trained

feature: string or list

feature or feature list to investigate; for one-hot encoding features, a feature list is required

feature_name: string

name of the feature, not necessarily a column name

num_grid_points: integer, optional, default=10

number of grid points for numeric feature

grid_type: string, optional, default='percentile'

'percentile' or 'equal', type of grid points for numeric feature

percentile_range: tuple or None, optional, default=None

percentile range to investigate, for numeric feature when grid_type='percentile'

grid_range: tuple or None, optional, default=None

value range to investigate, for numeric feature when grid_type='equal'

cust_grid_points: Series, 1d-array, list or None, optional, default=None

customized list of grid points for numeric feature

show_percentile: bool, optional, default=False

whether to display the percentile buckets, for numeric feature when grid_type='percentile'

show_outliers: bool, optional, default=False

whether to display the out-of-range buckets, for numeric feature when percentile_range or grid_range is not None

endpoint: bool, optional, default=True

If True, stop is the last grid point; otherwise, it is not included

which_classes: list, optional, default=None

which classes to plot, only used when it is a multi-class problem

predict_kwds: dict, default={}

keywords to be passed to the model’s predict function

figsize: tuple or None, optional, default=None

size of the figure, (width, height)

ncols: integer, optional, default=2

number of subplot columns, used when it is a multi-class problem

plot_params: dict or None, optional, default=None

parameters for the plot
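
To illustrate the difference between the two grid_type options, here is a minimal sketch using NumPy. It assumes that a 'percentile' grid places points at evenly spaced percentiles of the feature and that an 'equal' grid places them at evenly spaced values between the minimum and maximum. The array `values` is hypothetical data, and this is not pdpbox's internal code, which may differ in details such as de-duplicating grid points.

```python
import numpy as np

# Hypothetical feature values standing in for one numeric column of X.
values = np.array([1.0, 2.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0, 34.0, 55.0])
num_grid_points = 5

# Assumed behaviour of grid_type='percentile': grid points sit at evenly
# spaced percentiles, so each bucket holds roughly the same number of rows.
percentile_grid = np.percentile(values, np.linspace(0, 100, num_grid_points))

# Assumed behaviour of grid_type='equal': grid points are evenly spaced
# between the minimum and maximum, so each bucket has the same width.
equal_grid = np.linspace(values.min(), values.max(), num_grid_points)

print(percentile_grid)
print(equal_grid)
```

On skewed data such as this, the percentile grid concentrates points where most rows lie, while the equal grid leaves the upper buckets sparsely populated.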

Returns:
fig: matplotlib Figure
axes: a dictionary of matplotlib Axes

Returns the Axes objects for further tweaking

summary_df: pandas DataFrame

Graph data in data frame format
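
One way to picture the data behind the plot is as predictions grouped into feature buckets and summarized per bucket. The sketch below uses pandas on hypothetical data; the column names (`fare`, `prediction`, `bucket`, `q1`, `median`, `q3`) and the bucketing rule are assumptions for illustration only, and the actual columns of summary_df may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical feature values and model predictions (not from a real model).
df = pd.DataFrame({
    "fare": [5.0, 7.0, 8.0, 12.0, 15.0, 30.0, 50.0, 80.0],
    "prediction": [0.10, 0.20, 0.15, 0.30, 0.35, 0.60, 0.70, 0.90],
})

# Grid points act as bucket edges: [5, 10), [10, 30), [30, 80].
grid = np.array([5.0, 10.0, 30.0, 80.0])

# Assign each row to a bucket; clip so that a value equal to the last
# grid point falls into the final bucket instead of a bucket of its own.
bucket = np.searchsorted(grid, df["fare"].to_numpy(), side="right") - 1
df["bucket"] = np.clip(bucket, 0, len(grid) - 2)

# Summarize the prediction distribution within each bucket.
summary = (
    df.groupby("bucket")["prediction"]
    .agg(
        count="count",
        q1=lambda s: s.quantile(0.25),
        median="median",
        q3=lambda s: s.quantile(0.75),
    )
    .reset_index()
)
print(summary)
```

Each row of `summary` describes one bucket, which is the kind of per-bucket information a distribution plot over a feature grid would display.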

Examples

Quick start with actual_plot

from pdpbox import info_plots, get_dataset

test_titanic = get_dataset.titanic()
titanic_data = test_titanic['data']
titanic_features = test_titanic['features']
titanic_target = test_titanic['target']
titanic_model = test_titanic['xgb_model']
fig, axes, summary_df = info_plots.actual_plot(
    model=titanic_model, X=titanic_data[titanic_features],
    feature='Sex', feature_name='Sex')

With One-hot encoding features

fig, axes, summary_df = info_plots.actual_plot(
    model=titanic_model, X=titanic_data[titanic_features],
    feature=['Embarked_C', 'Embarked_Q', 'Embarked_S'], feature_name='Embarked')

With numeric features

fig, axes, summary_df = info_plots.actual_plot(
    model=titanic_model, X=titanic_data[titanic_features],
    feature='Fare', feature_name='Fare')

With multi-class

from pdpbox import info_plots, get_dataset

test_otto = get_dataset.otto()
otto_data = test_otto['data']
otto_model = test_otto['rf_model']
otto_features = test_otto['features']
otto_target = test_otto['target']

fig, axes, summary_df = info_plots.actual_plot(
    model=otto_model, X=otto_data[otto_features],
    feature='feat_67', feature_name='feat_67', which_classes=[1, 2, 3])