pdpbox.info_plots.target_plot
pdpbox.info_plots.target_plot(df, feature, feature_name, target, num_grid_points=10, grid_type='percentile', percentile_range=None, grid_range=None, cust_grid_points=None, show_percentile=False, show_outliers=False, endpoint=True, figsize=None, ncols=2, plot_params=None)
Plot average target value across different feature values (feature grids).
Parameters:
- df: pandas DataFrame
dataset to investigate; should contain at least the feature to investigate and the target
- feature: string or list
feature or list of features to investigate; a list is required for one-hot encoded features
- feature_name: string
name of the feature, not necessarily a column name
- target: string or list
column name or list of column names for the target value; for a multi-class problem, a list of one-hot encoded target columns
- num_grid_points: integer, optional, default=10
number of grid points for numeric feature
- grid_type: string, optional, default='percentile'
'percentile' or 'equal', type of grid points for numeric feature
- percentile_range: tuple or None, optional, default=None
percentile range to investigate for numeric feature when grid_type='percentile'
- grid_range: tuple or None, optional, default=None
value range to investigate for numeric feature when grid_type='equal'
- cust_grid_points: Series, 1d-array, list or None, optional, default=None
customized list of grid points for numeric feature
- show_percentile: bool, optional, default=False
whether to display the percentile buckets for numeric feature when grid_type='percentile'
- show_outliers: bool, optional, default=False
whether to display the out-of-range buckets for numeric feature when percentile_range or grid_range is not None
- endpoint: bool, optional, default=True
If True, stop is the last grid point. Otherwise, it is not included.
- figsize: tuple or None, optional, default=None
size of the figure, (width, height)
- ncols: integer, optional, default=2
number of subplot columns, used for multi-class problems
- plot_params: dict or None, optional, default=None
parameters for the plot
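The difference between grid_type='percentile' and grid_type='equal' can be illustrated with a small pure-Python sketch. This is not pdpbox's own code: the helper names (linspace, percentile, equal_grid, percentile_grid) and the toy fares list are assumptions made for illustration, and the library's exact grid and bucket handling may differ.

```python
def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop, inclusive."""
    if num == 1:
        return [float(start)]
    step = (stop - start) / (num - 1)
    return [start + i * step for i in range(num)]


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in [0, 100]) of pre-sorted values."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * frac


def equal_grid(values, num_grid_points, grid_range=None):
    """Hypothetical helper: grid points evenly spaced over the value range."""
    low, high = grid_range if grid_range is not None else (min(values), max(values))
    return linspace(low, high, num_grid_points)


def percentile_grid(values, num_grid_points, percentile_range=None):
    """Hypothetical helper: grid points at evenly spaced percentiles of the data."""
    start, end = percentile_range if percentile_range is not None else (0, 100)
    ordered = sorted(values)
    return [percentile(ordered, q) for q in linspace(start, end, num_grid_points)]


# Toy, made-up values with one large outlier (not real Titanic fares).
fares = [1, 2, 3, 4, 5, 6, 7, 8, 100]

equal_points = equal_grid(fares, 5)            # spread over the value range
percentile_points = percentile_grid(fares, 5)  # follow the data density

print(equal_points)
print(percentile_points)
```

With skewed data such as this, equally spaced points leave most buckets nearly empty, while percentile-based points put a similar number of rows in each bucket.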
Returns:
- fig: matplotlib Figure
- axes: a dictionary of matplotlib Axes
Returns the Axes objects for further tweaking
- summary_df: pandas DataFrame
Graph data in data frame format
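The quantity being plotted, the average target value per feature value, can be sketched in plain Python. The function name average_target_by_value, the output keys, and the toy rows below are assumptions for illustration only; they are not the column names or layout of the summary_df that pdpbox returns.

```python
from collections import defaultdict


def average_target_by_value(rows, feature, target):
    """Hypothetical helper: count rows and average the target per feature value."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[feature]] += row[target]
        counts[row[feature]] += 1
    return {
        value: {"count": counts[value], "mean_target": sums[value] / counts[value]}
        for value in sorted(counts)
    }


# Toy, made-up rows (not the real Titanic data).
rows = [
    {"Sex": "female", "Survived": 1},
    {"Sex": "female", "Survived": 1},
    {"Sex": "female", "Survived": 0},
    {"Sex": "male", "Survived": 0},
    {"Sex": "male", "Survived": 1},
    {"Sex": "male", "Survived": 0},
    {"Sex": "male", "Survived": 0},
]

summary = average_target_by_value(rows, "Sex", "Survived")
for value, stats in summary.items():
    print(value, stats["count"], round(stats["mean_target"], 3))
```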
Examples
Quick start with target_plot
from pdpbox import info_plots, get_dataset
test_titanic = get_dataset.titanic()
titanic_data = test_titanic['data']
titanic_target = test_titanic['target']
fig, axes, summary_df = info_plots.target_plot(
    df=titanic_data, feature='Sex', feature_name='Sex', target=titanic_target)
With one-hot encoded features
fig, axes, summary_df = info_plots.target_plot(
    df=titanic_data, feature=['Embarked_C', 'Embarked_Q', 'Embarked_S'],
    feature_name='Embarked', target=titanic_target)
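One way to think about a one-hot feature list is as a single categorical feature whose value is the name of the active column. The sketch below shows that idea in plain Python; collapse_one_hot and the toy rows are hypothetical, and this is not necessarily how pdpbox handles the list internally.

```python
def collapse_one_hot(row, columns):
    """Hypothetical helper: return the name of the single column set to 1."""
    active = [name for name in columns if row[name] == 1]
    if len(active) != 1:
        raise ValueError("expected exactly one active column, got %d" % len(active))
    return active[0]


embarked_columns = ["Embarked_C", "Embarked_Q", "Embarked_S"]

# Toy, made-up rows (not the real Titanic data).
toy_rows = [
    {"Embarked_C": 1, "Embarked_Q": 0, "Embarked_S": 0},
    {"Embarked_C": 0, "Embarked_Q": 0, "Embarked_S": 1},
    {"Embarked_C": 0, "Embarked_Q": 0, "Embarked_S": 1},
]

labels = [collapse_one_hot(row, embarked_columns) for row in toy_rows]
print(labels)
```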
With numeric features
fig, axes, summary_df = info_plots.target_plot(
    df=titanic_data, feature='Fare', feature_name='Fare',
    target=titanic_target, show_percentile=True)
With a multi-class target
from pdpbox import info_plots, get_dataset
test_otto = get_dataset.otto()
otto_data = test_otto['data']
otto_target = test_otto['target']
fig, axes, summary_df = info_plots.target_plot(
    df=otto_data, feature='feat_67', feature_name='feat_67',
    target=['target_0', 'target_2', 'target_5', 'target_8'])