pdpbox.info_plots.target_plot_interact
pdpbox.info_plots.target_plot_interact(df, features, feature_names, target, num_grid_points=None, grid_types=None, percentile_ranges=None, grid_ranges=None, cust_grid_points=None, show_percentile=False, show_outliers=False, endpoint=True, figsize=None, ncols=2, annotate=False, plot_params=None)

Plot the average target value across different feature value combinations (feature grid combinations).
Parameters: - df: pandas DataFrame
dataset to investigate; should contain at least the features to investigate and the target column(s)
- features: list
two features to investigate
- feature_names: list
display names for the two features, used in the plot
- target: string or list
target column name; for a multi-class problem, a list of the one-hot encoded target columns
- num_grid_points: list, optional, default=None
number of grid points for each feature
- grid_types: list, optional, default=None
type of grid points for each feature
- percentile_ranges: list of tuple, optional, default=None
percentile range to investigate for each feature
- grid_ranges: list of tuple, optional, default=None
value range to investigate for each feature
- cust_grid_points: list of (Series, 1d-array, list), optional, default=None
customized list of grid points for each feature
- show_percentile: bool, optional, default=False
whether to display the percentile buckets for both features
- show_outliers: bool, optional, default=False
whether to display the out-of-range buckets for both features
- endpoint: bool, optional, default=True
if True, the stop value (the upper end of the grid) is the last grid point; otherwise, it is not included
- figsize: tuple or None, optional, default=None
size of the figure, (width, height)
- ncols: integer, optional, default=2
number of subplot columns; used for multi-class problems
- annotate: bool, optional, default=False
whether to annotate the points
- plot_params: dict or None, optional, default=None
parameters for the plot
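The grid-related parameters above (num_grid_points, grid_ranges, endpoint) can be illustrated with a small pure-Python sketch. It is not pdpbox's implementation: equal_grid and find_bucket are hypothetical helpers, and the bucketing rule is an assumption (half-open buckets [g[i-1], g[i]), one extra bucket on each side for out-of-range values, and endpoint deciding whether a value exactly equal to the last grid point stays in the last in-range bucket).

```python
from bisect import bisect_right
from typing import List, Sequence


def equal_grid(lo: float, hi: float, num_points: int) -> List[float]:
    """Hypothetical helper: evenly spaced grid points from lo to hi, both included (num_points >= 2)."""
    step = (hi - lo) / (num_points - 1)
    return [lo + i * step for i in range(num_points)]


def find_bucket(x: float, grids: Sequence[float], endpoint: bool = True) -> int:
    """Hypothetical helper: map x to a bucket index for sorted grid points g[0] < ... < g[n-1].

    Assumed layout:
      0        -> x < g[0]             (below the range)
      1..n-1   -> g[i-1] <= x < g[i]   (in-range bucket i)
      n        -> x > g[n-1]           (above the range)
    A value equal to g[n-1] goes to bucket n-1 if endpoint is True, else to bucket n.
    """
    n = len(grids)
    if x < grids[0]:
        return 0
    if x > grids[-1]:
        return n
    if x == grids[-1]:
        return n - 1 if endpoint else n
    # bisect_right counts the grid points <= x, which is the in-range bucket index
    return bisect_right(grids, x)


grids = equal_grid(0, 20, 3)  # [0.0, 10.0, 20.0]
for value in (-1, 0, 10, 20, 25):
    # columns: value, bucket with endpoint=True, bucket with endpoint=False
    print(value, find_bucket(value, grids), find_bucket(value, grids, endpoint=False))
```

Under these assumptions, buckets 0 and n play the role of the out-of-range buckets that show_outliers toggles.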
Returns: - fig: matplotlib Figure
- axes: a dictionary of matplotlib Axes
the Axes objects, returned for further customization
- summary_df: pandas DataFrame
the plotted data as a DataFrame
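As a rough illustration of what the plotted data contains, the hypothetical summarize_interact below groups records by their (feature 1 bucket, feature 2 bucket) pair and reports the record count and the mean of each target column. With a list of one-hot target columns (multi-class), it yields one mean per class. This is a stdlib-only sketch, not pdpbox's code; the column names x1, x2 and count are illustrative and may not match the actual summary_df columns.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, List, Sequence

Row = Dict[str, Any]


def summarize_interact(
    rows: Sequence[Row],
    bucket_1: Callable[[Row], Hashable],
    bucket_2: Callable[[Row], Hashable],
    targets: Sequence[str],
) -> List[Row]:
    """Hypothetical helper: count and mean target(s) for every bucket combination.

    targets holds one column name (binary/regression) or several one-hot
    target columns (multi-class); one mean is reported per target column.
    """
    counts: Dict[Hashable, int] = defaultdict(int)
    sums: Dict[Hashable, List[float]] = defaultdict(lambda: [0.0] * len(targets))
    for row in rows:
        key = (bucket_1(row), bucket_2(row))
        counts[key] += 1
        for j, t in enumerate(targets):
            sums[key][j] += row[t]

    summary = []
    for key in sorted(counts):  # one output record per observed combination
        record: Row = {"x1": key[0], "x2": key[1], "count": counts[key]}
        for j, t in enumerate(targets):
            record[t] = sums[key][j] / counts[key]
        summary.append(record)
    return summary


# Hypothetical records: a binary feature, a categorical feature, a binary target
rows = [
    {"sex": 0, "port": "C", "survived": 1},
    {"sex": 0, "port": "C", "survived": 0},
    {"sex": 1, "port": "S", "survived": 1},
    {"sex": 1, "port": "S", "survived": 1},
    {"sex": 0, "port": "S", "survived": 0},
]
summary = summarize_interact(rows, lambda r: r["sex"], lambda r: r["port"], ["survived"])
for record in summary:
    print(record)
```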
Notes
- Parameters are consistent with those of target_plot
- However, for this function each per-feature parameter must be given as a list with one value per feature
- For example, percentile_ranges = [(0, 90), (5, 95)] means:
    - percentile_range = (0, 90) for feature 1
    - percentile_range = (5, 95) for feature 2
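To make the list convention concrete, here is a minimal sketch, assuming grid_types='percentile', of how each feature could get its own grid from its own entries in num_grid_points and percentile_ranges. percentile and percentile_grid are hypothetical helpers; percentiles use linear interpolation (NumPy's default method), which may differ from what pdpbox does internally.

```python
from typing import List, Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """q-th percentile (0 <= q <= 100) of an already sorted list, by linear interpolation."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def percentile_grid(
    values: Sequence[float], num_points: int, percentile_range: Tuple[float, float] = (0, 100)
) -> List[float]:
    """Hypothetical helper: grid points at evenly spaced percentiles within percentile_range."""
    start, stop = percentile_range
    qs = [start + i * (stop - start) / (num_points - 1) for i in range(num_points)]
    s = sorted(values)
    # Several percentiles can land on the same value (e.g. discrete data); keep unique points
    return sorted(set(percentile(s, q) for q in qs))


# Hypothetical data: two numeric features, each taking the values 0..100
feature_values = [list(range(101)), list(range(101))]

# One entry per feature, as described in the Notes
num_grid_points = [3, 3]
percentile_ranges = [(0, 90), (5, 95)]

grids = [
    percentile_grid(values, n, rng)
    for values, n, rng in zip(feature_values, num_grid_points, percentile_ranges)
]
print(grids)  # [[0.0, 45.0, 90.0], [5.0, 50.0, 95.0]]
```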
Examples
Quick start with target_plot_interact
from pdpbox import info_plots, get_dataset

# Load the Titanic sample dataset bundled with pdpbox
test_titanic = get_dataset.titanic()
titanic_data = test_titanic['data']
titanic_target = test_titanic['target']

# 'Sex' is a single column; 'Embarked' is one-hot encoded across three columns
fig, axes, summary_df = info_plots.target_plot_interact(
    df=titanic_data,
    features=['Sex', ['Embarked_C', 'Embarked_Q', 'Embarked_S']],
    feature_names=['Sex', 'Embarked'],
    target=titanic_target,
)
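In the example, 'Embarked' is passed as a list of one-hot columns and treated as a single feature. A minimal sketch of how such a feature could be collapsed into one bucket per row, assuming exactly one column is active per row (onehot_bucket and the sample records are hypothetical, not part of pdpbox):

```python
from typing import Mapping, Sequence


def onehot_bucket(row: Mapping[str, int], columns: Sequence[str]) -> int:
    """Hypothetical helper: position of the active column among the one-hot columns.

    Assumes exactly one column is 1; if several tie, the first one wins
    (the same tie-breaking as an argmax).
    """
    values = [row[c] for c in columns]
    return values.index(max(values))


EMBARKED = ["Embarked_C", "Embarked_Q", "Embarked_S"]

# Hypothetical records shaped like the Titanic columns used in the example
rows = [
    {"Sex": 0, "Embarked_C": 1, "Embarked_Q": 0, "Embarked_S": 0},
    {"Sex": 1, "Embarked_C": 0, "Embarked_Q": 0, "Embarked_S": 1},
]

# Each row becomes a (Sex value, Embarked bucket) pair
pairs = [(row["Sex"], onehot_bucket(row, EMBARKED)) for row in rows]
print(pairs)  # [(0, 0), (1, 2)]
```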