pdpbox.info_plots.target_plot_interact

pdpbox.info_plots.target_plot_interact(df, features, feature_names, target, num_grid_points=None, grid_types=None, percentile_ranges=None, grid_ranges=None, cust_grid_points=None, show_percentile=False, show_outliers=False, endpoint=True, figsize=None, ncols=2, annotate=False, plot_params=None)

Plot the average target value across different combinations of feature values (feature grid combinations).

Parameters:
df: pandas DataFrame

data set to investigate; it should contain at least the two features to investigate as well as the target

features: list

two features to investigate

feature_names: list

names of the two features, in the same order as features

target: string or list

column name of the target value; for a multi-class problem, a list of one-hot encoded target column names

num_grid_points: list, optional, default=None

number of grid points for each feature

grid_types: list, optional, default=None

type of grid points for each feature

percentile_ranges: list of tuple, optional, default=None

percentile range to investigate for each feature

grid_ranges: list of tuple, optional, default=None

value range to investigate for each feature

cust_grid_points: list of (Series, 1d-array, list), optional, default=None

customized list of grid points for each feature

show_percentile: bool, optional, default=False

whether to display the percentile buckets for both features

show_outliers: bool, optional, default=False

whether to display the out of range buckets for both features

endpoint: bool, optional, default=True

If True, stop is the last grid point; otherwise, it is not included

figsize: tuple or None, optional, default=None

size of the figure, (width, height)

ncols: integer, optional, default=2

number of subplot columns, used for multi-class problems

annotate: bool, optional, default=False

whether to annotate the points

plot_params: dict or None, optional, default=None

parameters for the plot

Returns:
fig: matplotlib Figure
axes: a dictionary of matplotlib Axes

Returns the Axes objects for further tweaking

summary_df: pandas DataFrame

Graph data in data frame format
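
To make "average target value per feature value combination" concrete, the following is a minimal plain-Python sketch of the aggregation idea. It is not pdpbox's implementation; the toy rows and the helper name average_target_by_combination are made up for illustration, and the exact columns of summary_df are not specified here.

```python
from collections import defaultdict

# Toy rows: (sex, embarked, survived). Made-up values, for illustration only.
rows = [
    ("female", "C", 1),
    ("female", "C", 1),
    ("female", "S", 0),
    ("male", "S", 0),
    ("male", "S", 1),
    ("male", "Q", 0),
]


def average_target_by_combination(rows):
    """Return {(value_1, value_2): (count, mean target)} per combination.

    Hypothetical helper, not part of pdpbox.
    """
    totals = defaultdict(lambda: [0, 0.0])  # combination -> [count, target sum]
    for value_1, value_2, target in rows:
        totals[(value_1, value_2)][0] += 1
        totals[(value_1, value_2)][1] += target
    return {
        combo: (count, total / count)
        for combo, (count, total) in totals.items()
    }


summary = average_target_by_combination(rows)
for combo in sorted(summary):
    count, mean = summary[combo]
    print(combo, count, mean)
```

Each key is one combination of the two feature values, and each value holds the number of rows in that combination together with their mean target.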

Notes

  • Parameters are consistent with those of the function target_plot.
  • For this function, however, each per-feature parameter must be given as a list with one value per feature.
  • For example, percentile_ranges = [(0, 90), (5, 95)] means:
    • percentile_range = (0, 90) for feature 1
    • percentile_range = (5, 95) for feature 2
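
The list format described above can be sketched as follows. The helper per_feature_options is hypothetical (it is not part of pdpbox), and the feature names 'Fare' and 'Age' and the option values are assumptions chosen for illustration. The sketch only checks that each option carries one entry per feature before it would be passed on.

```python
def per_feature_options(features, **options):
    """Check that every per-feature option has one entry per feature.

    Hypothetical helper, not part of pdpbox.
    """
    for name, value in options.items():
        if value is not None and len(value) != len(features):
            raise ValueError(
                "%s needs %d entries, got %d" % (name, len(features), len(value))
            )
    return options


# Assumed feature names, for illustration only.
features = ['Fare', 'Age']

options = per_feature_options(
    features,
    num_grid_points=[10, 10],
    percentile_ranges=[(0, 90), (5, 95)],
)

# Entry i of each list applies to features[i].
for position, feature in enumerate(features):
    print(feature, options['percentile_ranges'][position])
```

The checked options could then be forwarded as keyword arguments to target_plot_interact, alongside df, features, feature_names and target.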

Examples

Quick start with target_plot_interact

from pdpbox import info_plots, get_dataset

# Load the Titanic example data set that ships with pdpbox
test_titanic = get_dataset.titanic()
titanic_data = test_titanic['data']
titanic_target = test_titanic['target']

# 'Embarked' is one-hot encoded, so it is passed as a list of its columns
fig, axes, summary_df = info_plots.target_plot_interact(
    df=titanic_data, features=['Sex', ['Embarked_C', 'Embarked_Q', 'Embarked_S']],
    feature_names=['Sex', 'Embarked'], target=titanic_target)