pdpbox.info_plots.target_plot

pdpbox.info_plots.target_plot(df, feature, feature_name, target, num_grid_points=10, grid_type='percentile', percentile_range=None, grid_range=None, cust_grid_points=None, show_percentile=False, show_outliers=False, endpoint=True, figsize=None, ncols=2, plot_params=None)

Plot the average target value across different feature values (feature grids).
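The idea behind the plot can be sketched in plain pandas. The helper below, `average_target_by_grid`, is hypothetical and simplified; it is not pdpbox's implementation. It builds percentile grid points for a numeric feature, assigns each row to the bucket between two consecutive grid points, and averages the target within each bucket. The toy data is made up for illustration.

```python
import numpy as np
import pandas as pd


def average_target_by_grid(df, feature, target, num_grid_points=10):
    """Hypothetical helper: mean target and row count per feature bucket."""
    # Evenly spaced percentiles (0%, ..., 100%) of the feature as grid points;
    # np.unique drops duplicates that appear when many rows share a value.
    grid = np.unique(np.percentile(df[feature], np.linspace(0, 100, num_grid_points)))
    # Each row falls into the bucket between two consecutive grid points;
    # include_lowest=True keeps the minimum value in the first bucket.
    buckets = pd.cut(df[feature], bins=grid, include_lowest=True)
    return df.groupby(buckets, observed=False)[target].agg(['mean', 'count'])


# Made-up toy data, for illustration only
toy = pd.DataFrame({'Fare': [5, 10, 20, 40, 80, 160],
                    'Survived': [0, 0, 1, 0, 1, 1]})
print(average_target_by_grid(toy, 'Fare', 'Survived', num_grid_points=3))
```

With three grid points (the 0th, 50th and 100th percentiles: 5, 30 and 160), the six rows split into two buckets of three rows each.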

Parameters:
df: pandas DataFrame

Dataset to investigate; it should contain at least the feature(s) to investigate and the target column(s).

feature: string or list

Feature or list of features to investigate; for one-hot encoded features, the list of one-hot columns is required.

feature_name: string

Name of the feature, not necessarily a column name.

target: string or list

Target column name; for a multi-class problem, a list of one-hot encoded target column names.

num_grid_points: integer, optional, default=10

Number of grid points for a numeric feature.

grid_type: string, optional, default='percentile'

Type of grid points for a numeric feature: 'percentile' or 'equal'.

percentile_range: tuple or None, optional, default=None

Percentile range to investigate for a numeric feature when grid_type='percentile'.

grid_range: tuple or None, optional, default=None

Value range to investigate for a numeric feature when grid_type='equal'.

cust_grid_points: Series, 1d-array, list or None, optional, default=None

Customized list of grid points for a numeric feature.

show_percentile: bool, optional, default=False

Whether to display the percentile buckets for a numeric feature when grid_type='percentile'.

show_outliers: bool, optional, default=False

Whether to display the out-of-range buckets for a numeric feature when percentile_range or grid_range is not None.

endpoint: bool, optional, default=True

If True, stop (the upper end of the grid range) is the last grid point; otherwise, it is not included.

figsize: tuple or None, optional, default=None

Size of the figure, as (width, height).

ncols: integer, optional, default=2

Number of subplot columns, used for multi-class problems.

plot_params: dict or None, optional, default=None

Parameters for the plot.
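For one-hot encoded features (the list form of feature), each row has exactly one active column, so the columns together behave like a single categorical feature. A minimal pandas sketch of that idea, assuming exactly one column is 1 in every row; `average_target_one_hot` is a hypothetical helper, not part of pdpbox, and the toy data is made up:

```python
import pandas as pd


def average_target_one_hot(df, one_hot_cols, target):
    """Hypothetical helper: mean target per one-hot category."""
    # idxmax(axis=1) returns, for each row, the name of the column holding
    # the largest value, i.e. the active one-hot column (assumes one 1 per row).
    category = df[one_hot_cols].idxmax(axis=1)
    return df.groupby(category)[target].mean()


# Made-up toy data, for illustration only
toy = pd.DataFrame({'Embarked_C': [1, 0, 0, 1],
                    'Embarked_Q': [0, 1, 0, 0],
                    'Embarked_S': [0, 0, 1, 0],
                    'Survived': [1, 0, 0, 0]})
print(average_target_one_hot(toy, ['Embarked_C', 'Embarked_Q', 'Embarked_S'], 'Survived'))
```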
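With a one-hot encoded multi-class target (the list form of target), the mean of each 0/1 target column within a group is the share of rows in that group belonging to that class. A minimal pandas sketch for a categorical feature; the helper name, feature and column names are hypothetical:

```python
import pandas as pd


def class_rates_by_value(df, feature, target_cols):
    """Hypothetical helper: per-class share of rows for each feature value."""
    # Mean of a 0/1 indicator column = fraction of rows in that class
    return df.groupby(feature)[target_cols].mean()


# Made-up toy data, for illustration only
toy = pd.DataFrame({'feat': ['a', 'a', 'b', 'b'],
                    'target_0': [1, 0, 0, 0],
                    'target_1': [0, 1, 1, 0],
                    'target_2': [0, 0, 0, 1]})
rates = class_rates_by_value(toy, 'feat', ['target_0', 'target_1', 'target_2'])
print(rates)
```

Each row of `rates` sums to 1 because every sample belongs to exactly one class.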
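The difference between the two grid_type options, and how percentile_range and grid_range narrow them, can be sketched with NumPy. The `make_grid_points` helper is hypothetical and simplified; pdpbox's own grid logic may differ in details such as rounding or de-duplication.

```python
import numpy as np


def make_grid_points(values, num_grid_points=10, grid_type='percentile',
                     percentile_range=None, grid_range=None):
    """Hypothetical helper: grid points for a numeric feature."""
    values = np.asarray(values, dtype=float)
    if grid_type == 'percentile':
        # Evenly spaced percentiles: grid points are denser where data is dense
        low, high = percentile_range if percentile_range is not None else (0, 100)
        points = np.percentile(values, np.linspace(low, high, num_grid_points))
    elif grid_type == 'equal':
        # Evenly spaced values between the range limits (default: min and max)
        low, high = grid_range if grid_range is not None else (values.min(), values.max())
        points = np.linspace(low, high, num_grid_points)
    else:
        raise ValueError("grid_type must be 'percentile' or 'equal'")
    # Repeated values can yield duplicate percentiles; keep unique points only
    return np.unique(points)


skewed = [1, 2, 3, 4, 100]
print(make_grid_points(skewed, 3, 'percentile'))  # follows the bulk of the data
print(make_grid_points(skewed, 3, 'equal'))       # spacing dominated by the outlier
print(make_grid_points(np.arange(101), 3, 'percentile', percentile_range=(5, 95)))
print(make_grid_points(np.arange(101), 3, 'equal', grid_range=(10, 30)))
```

On the skewed sample, percentile grid points are 1, 3 and 100, while equal spacing gives 1, 50.5 and 100, leaving most rows in the first bucket.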
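When percentile_range or grid_range restricts the grid, some rows fall below the first grid point or above the last one; show_outliers controls whether those rows get their own buckets. One way to picture this is to add open-ended bin edges. The sketch below uses `pd.cut` with infinite edges as an illustration under that assumption; it is not pdpbox's code.

```python
import numpy as np
import pandas as pd

values = pd.Series([0, 3, 5, 7, 10, 50])
grid = np.array([2.0, 6.0, 10.0])  # e.g. grid points from a restricted range

# Without outlier buckets: rows outside [2, 10] get no bucket (NaN)
inner = pd.cut(values, bins=grid, include_lowest=True)

# With outlier buckets: open-ended edges collect rows below 2 and above 10
with_outliers = pd.cut(values, bins=np.concatenate(([-np.inf], grid, [np.inf])))

print(inner.value_counts(sort=False))
print(with_outliers.value_counts(sort=False))
```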
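With one subplot per class in a multi-class problem, ncols fixes the width of the subplot grid and the number of rows follows from it. A sketch of that arithmetic, assuming subplots are filled row by row and never use more columns than there are classes; `subplot_grid_shape` is hypothetical, and pdpbox's actual layout code may differ.

```python
import math


def subplot_grid_shape(n_classes, ncols=2):
    """Hypothetical helper: (rows, cols) for one subplot per class."""
    # Never use more columns than there are classes to plot
    ncols = min(ncols, n_classes)
    # Fill row by row; the last row may be partly empty
    nrows = math.ceil(n_classes / ncols)
    return nrows, ncols


print(subplot_grid_shape(4))           # four target columns, as in the multi-class example
print(subplot_grid_shape(5, ncols=3))
```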

Returns:
fig: matplotlib Figure

axes: dict of matplotlib Axes

Axes objects, returned for further tweaking.

summary_df: pandas DataFrame

Graph data in DataFrame format.

Examples

Quick start with target_plot

from pdpbox import info_plots, get_dataset

test_titanic = get_dataset.titanic()
titanic_data = test_titanic['data']
titanic_target = test_titanic['target']
fig, axes, summary_df = info_plots.target_plot(
    df=titanic_data, feature='Sex', feature_name='Sex', target=titanic_target)

With one-hot encoded features

fig, axes, summary_df = info_plots.target_plot(
    df=titanic_data, feature=['Embarked_C', 'Embarked_Q', 'Embarked_S'],
    feature_name='Embarked', target=titanic_target)

With numeric features

fig, axes, summary_df = info_plots.target_plot(
    df=titanic_data, feature='Fare', feature_name='Fare',
    target=titanic_target, show_percentile=True)

With a multi-class target

from pdpbox import info_plots, get_dataset

test_otto = get_dataset.otto()
otto_data = test_otto['data']
otto_target = test_otto['target']
fig, axes, summary_df = info_plots.target_plot(
    df=otto_data, feature='feat_67', feature_name='feat_67',
    target=['target_0', 'target_2', 'target_5', 'target_8'])