pdpbox.pdp.pdp_isolate
pdpbox.pdp.pdp_isolate(model, dataset, model_features, feature, num_grid_points=10, grid_type='percentile', percentile_range=None, grid_range=None, cust_grid_points=None, memory_limit=0.5, n_jobs=1, predict_kwds={}, data_transformer=None)
Calculate PDP isolation plot
Parameters:
- model: a fitted sklearn model
- dataset: pandas DataFrame
data set on which the model is trained
- model_features: list or 1-d array
list of model features
- feature: string or list
feature or feature list to investigate; for one-hot encoded features, a feature list is required
- num_grid_points: integer, optional, default=10
number of grid points for numeric feature
- grid_type: string, optional, default='percentile'
'percentile' or 'equal', type of grid points for numeric feature
- percentile_range: tuple or None, optional, default=None
percentile range to investigate, for numeric feature when grid_type='percentile'
- grid_range: tuple or None, optional, default=None
value range to investigate, for numeric feature when grid_type='equal'
- cust_grid_points: Series, 1d-array, list or None, optional, default=None
customized list of grid points for numeric feature
- memory_limit: float in (0, 1), optional, default=0.5
fraction of memory to use
- n_jobs: integer, optional, default=1
number of jobs to run in parallel. Make sure n_jobs=1 when using an XGBoost model; see: 1. https://pythonhosted.org/joblib/parallel.html#bad-interaction-of-multiprocessing-and-third-party-libraries 2. https://github.com/scikit-learn/scikit-learn/issues/6627
- predict_kwds: dict, optional, default={}
keywords to be passed to the model’s predict function
- data_transformer: function or None, optional, default=None
function applied to the data set after the investigated feature's values are changed, for example to update other features that depend on it
Returns:
- pdp_isolate_out: instance of PDPIsolate
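
The two grid types and the averaging behind an isolation plot can be illustrated without PDPbox. The following is a minimal pure-Python sketch, not PDPbox's implementation: `percentile`, `make_grid`, `partial_dependence`, and the toy `predict` function are hypothetical names introduced here. It assumes that a 'percentile' grid is taken at evenly spaced percentiles of the feature and an 'equal' grid at evenly spaced values between its minimum and maximum.

```python
def percentile(sorted_vals, q):
    """Linearly interpolated q-th percentile (0 <= q <= 100) of a sorted list."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def make_grid(values, num_grid_points=10, grid_type="percentile"):
    """Hypothetical helper: grid points for one numeric feature."""
    if num_grid_points < 2:
        raise ValueError("num_grid_points must be at least 2")
    vals = sorted(values)
    # Evenly spaced fractions 0.0 .. 1.0, one per grid point.
    steps = [i / (num_grid_points - 1) for i in range(num_grid_points)]
    if grid_type == "percentile":
        # Assumption: evenly spaced percentiles of the observed values.
        points = [percentile(vals, 100.0 * s) for s in steps]
    elif grid_type == "equal":
        # Assumption: evenly spaced values between min and max.
        points = [vals[0] + (vals[-1] - vals[0]) * s for s in steps]
    else:
        raise ValueError("grid_type must be 'percentile' or 'equal'")
    return sorted(set(points))  # drop duplicate grid points


def partial_dependence(predict, rows, feature, grid):
    """For each grid value, overwrite `feature` in every row and average the predictions."""
    averages = []
    for g in grid:
        preds = [predict({**row, feature: g}) for row in rows]
        averages.append(sum(preds) / len(preds))
    return averages


def predict(row):
    """Toy stand-in for a fitted model's predict function."""
    return 2.0 * row["x"] + row["z"]


rows = [
    {"x": 0.0, "z": 1.0},
    {"x": 1.0, "z": 2.0},
    {"x": 2.0, "z": 3.0},
    {"x": 10.0, "z": 6.0},
]
xs = [r["x"] for r in rows]

equal_grid = make_grid(xs, num_grid_points=3, grid_type="equal")
pct_grid = make_grid(xs, num_grid_points=3, grid_type="percentile")
pdp_values = partial_dependence(predict, rows, "x", equal_grid)
```

With this skewed toy feature, the 'equal' grid is `[0.0, 5.0, 10.0]` while the 'percentile' grid is `[0.0, 1.5, 10.0]`: percentile grids place more points where the data are dense, which is one reason to prefer them for features with outliers.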