pdpbox.pdp.pdp_isolate

pdpbox.pdp.pdp_isolate(model, dataset, model_features, feature, num_grid_points=10, grid_type='percentile', percentile_range=None, grid_range=None, cust_grid_points=None, memory_limit=0.5, n_jobs=1, predict_kwds={}, data_transformer=None)

Calculate partial dependence (PDP) data for a single feature, or for one group of one-hot encoded columns, in isolation.
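Partial dependence for one feature is the model's average prediction when that feature is forced to each grid value in every row. A minimal pure-Python sketch of that computation follows. It assumes a hypothetical `predict` callable that takes a list of row dicts and returns one prediction per row; `pdp_isolate` itself works on a pandas DataFrame and a fitted model.

```python
from statistics import mean
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]


def partial_dependence(
    predict: Callable[[List[Row]], List[float]],
    rows: Sequence[Row],
    feature: str,
    grid: Sequence[float],
) -> List[float]:
    """Return the average prediction for each grid value of `feature`."""
    pdp = []
    for value in grid:
        # Copy every row and overwrite only the investigated feature.
        modified = [{**row, feature: value} for row in rows]
        # Each prediction is one point on an ICE line; the PDP is their mean.
        pdp.append(mean(predict(modified)))
    return pdp


# Hypothetical toy model: prediction = 2 * x + z.
def toy_predict(rows: List[Row]) -> List[float]:
    return [2 * r["x"] + r["z"] for r in rows]


rows = [{"x": 1.0, "z": 0.0}, {"x": 3.0, "z": 4.0}]
print(partial_dependence(toy_predict, rows, "x", [0.0, 1.0, 2.0]))  # [2.0, 4.0, 6.0]
```

Averaging over every row, rather than predicting on a single "typical" row, is what separates partial dependence from a simple one-row what-if query.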

Parameters:
model: a fitted sklearn model
dataset: pandas DataFrame

dataset on which the model was trained

model_features: list or 1-d array

list of model features

feature: string or list

feature to investigate; for a one-hot encoded feature, pass the list of its one-hot columns instead

num_grid_points: integer, optional, default=10

number of grid points for a numeric feature

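grid_type: string, optional, default='percentile'

'percentile' or 'equal': how grid points are placed for a numeric feature ('percentile' places them at percentiles of the data, 'equal' spaces them evenly over the value range)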

percentile_range: tuple or None, optional, default=None

range of percentiles to investigate, for a numeric feature when grid_type='percentile'

grid_range: tuple or None, optional, default=None

range of values to investigate, for a numeric feature when grid_type='equal'

cust_grid_points: Series, 1d-array, list or None, optional, default=None

custom grid points for a numeric feature

memory_limit: float in (0, 1), optional, default=0.5

fraction of memory to use

n_jobs: integer, optional, default=1

number of jobs to run in parallel. Set n_jobs=1 when using an XGBoost model; see:
1. https://pythonhosted.org/joblib/parallel.html#bad-interaction-of-multiprocessing-and-third-party-libraries
2. https://github.com/scikit-learn/scikit-learn/issues/6627

predict_kwds: dict, optional, default={}

keyword arguments passed to the model's predict function

data_transformer: function or None, optional, default=None

function applied to the dataset after the investigated feature's values are changed, so that other features that depend on it can be updated accordingly

Returns:
pdp_isolate_out: instance of PDPIsolate
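When `feature` is a list of one-hot columns, each grid point corresponds to one of those columns: that column is set to 1 and the rest of the group to 0, so every modified row remains a valid encoding. A minimal sketch of that substitution, using hypothetical column names and plain dicts in place of a DataFrame:

```python
from typing import Dict, List, Sequence

Row = Dict[str, float]


def set_one_hot(rows: Sequence[Row], columns: Sequence[str], active: str) -> List[Row]:
    """Return copies of `rows` where only `active` is 1 within the one-hot group."""
    if active not in columns:
        raise ValueError(f"{active!r} is not one of {list(columns)}")
    return [{**row, **{col: int(col == active) for col in columns}} for row in rows]


# Hypothetical one-hot group for a "color" feature.
group = ["color_red", "color_blue", "color_green"]
rows = [{"color_red": 1, "color_blue": 0, "color_green": 0, "size": 3.0}]

# In this sketch the grid for a one-hot feature is simply the list of its columns.
for column in group:
    print(column, set_one_hot(rows, group, column))
```

Changing the whole group together is why a single column name is not enough for one-hot features: setting one dummy to 1 without zeroing the others would produce rows the model never saw during training.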
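For a numeric feature, `num_grid_points`, `grid_type`, `percentile_range`, `grid_range` and `cust_grid_points` together decide where the feature is evaluated. The sketch below is one plausible reading of those parameters, not the library's source. It assumes linear-interpolation percentiles (NumPy's default rule), that repeated percentile values collapse into a single grid point, and that custom points override the other settings.

```python
from typing import List, Optional, Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between ranks; q is in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def make_grid(
    values: Sequence[float],
    num_grid_points: int = 10,
    grid_type: str = "percentile",
    percentile_range: Optional[Tuple[float, float]] = None,
    grid_range: Optional[Tuple[float, float]] = None,
    cust_grid_points: Optional[Sequence[float]] = None,
) -> List[float]:
    """Build sorted, de-duplicated grid points for a numeric feature."""
    if cust_grid_points is not None:
        # In this sketch, custom points take precedence over every other setting.
        return sorted(set(cust_grid_points))
    if num_grid_points < 2:
        raise ValueError("num_grid_points must be at least 2")
    xs = sorted(values)
    if grid_type == "percentile":
        start, end = percentile_range if percentile_range is not None else (0, 100)
        step = (end - start) / (num_grid_points - 1)
        points = [percentile(xs, start + i * step) for i in range(num_grid_points)]
    elif grid_type == "equal":
        start, end = grid_range if grid_range is not None else (xs[0], xs[-1])
        step = (end - start) / (num_grid_points - 1)
        points = [start + i * step for i in range(num_grid_points)]
    else:
        raise ValueError("grid_type must be 'percentile' or 'equal'")
    # Skewed data can produce equal percentiles, leaving fewer unique points.
    return sorted(set(points))


data = list(range(11))  # 0, 1, ..., 10
print(make_grid(data, num_grid_points=3))                             # [0.0, 5.0, 10.0]
print(make_grid(data, num_grid_points=3, percentile_range=(10, 90)))  # [1.0, 5.0, 9.0]
print(make_grid([1, 2, 10], num_grid_points=4, grid_type="equal"))    # [1.0, 4.0, 7.0, 10.0]
```

Percentile grids follow the data's density, so heavily populated regions get more points; equal grids space points uniformly, which is easier to read but can spend points on sparse ranges.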
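`data_transformer` and `predict_kwds` both act at prediction time. The transformer can rebuild columns that depend on the modified feature, and the keywords are forwarded to the model's predict call. A minimal sketch of that assumed order of operations follows, with a hypothetical toy model that accepts a `scale` keyword and a hypothetical derived `ratio` column. In `pdp_isolate` the transformer presumably operates on the dataset (a pandas DataFrame) rather than a list of dicts.

```python
from typing import Any, Callable, Dict, List, Optional

Row = Dict[str, float]


def predict_rows(
    predict: Callable[..., List[float]],
    rows: List[Row],
    data_transformer: Optional[Callable[[List[Row]], List[Row]]] = None,
    predict_kwds: Optional[Dict[str, Any]] = None,
) -> List[float]:
    """Optionally transform the modified rows, then predict with extra keywords."""
    if data_transformer is not None:
        # Keep derived columns consistent with the feature that was just changed.
        rows = data_transformer(rows)
    # Forward any extra keywords; None means no extra keywords.
    return predict(rows, **(predict_kwds or {}))


# Hypothetical transformer: recompute a ratio derived from the changed feature.
def add_ratio(rows: List[Row]) -> List[Row]:
    return [{**r, "ratio": r["income"] / r["debt"]} for r in rows]


# Hypothetical model whose predict accepts an extra `scale` keyword.
def toy_predict(rows: List[Row], scale: float = 1.0) -> List[float]:
    return [scale * r["ratio"] for r in rows]


# "income" has been set to a grid value of 200.0, so "ratio" is now stale.
rows = [{"income": 200.0, "debt": 50.0, "ratio": 2.0}]
print(predict_rows(toy_predict, rows))                              # [2.0]
print(predict_rows(toy_predict, rows, add_ratio, {"scale": 10.0}))  # [40.0]
```

Without the transformer, the model sees the stale `ratio` and the changed `income` has no effect on the prediction, which would flatten the resulting PDP.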