pdpbox.pdp.PDPIsolate

class pdpbox.pdp.PDPIsolate(model, df, model_features, feature, feature_name, pred_func=None, n_classes=None, memory_limit=0.5, chunk_size=-1, n_jobs=1, predict_kwds=None, data_transformer=None, cust_grid_points=None, grid_type='percentile', num_grid_points=10, percentile_range=None, grid_range=None)

Performs Partial Dependence Plot (PDP) analysis on a single feature.
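The sketch below illustrates roughly what the analysis computes. It is not PDPbox's implementation. For each grid value, the feature is overwritten in every row and the model predicts. Each row's predictions across the grid form one ICE line, and the pointwise average of the ICE lines is the partial dependence. The helper names and the toy `predict` function are hypothetical, and rows are plain dicts rather than a DataFrame.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, float]

def ice_lines(rows: Sequence[Row], feature: str, grid: Sequence[float],
              predict: Callable[[List[Row]], List[float]]) -> List[List[float]]:
    """Return one ICE line per row: that row's prediction at each grid value."""
    lines = [[0.0] * len(grid) for _ in rows]
    for j, value in enumerate(grid):
        # Copy every row and overwrite the analyzed feature with the grid value.
        modified = [{**row, feature: value} for row in rows]
        for i, pred in enumerate(predict(modified)):
            lines[i][j] = pred
    return lines

def partial_dependence(lines: List[List[float]]) -> List[float]:
    """Average the ICE lines pointwise to obtain the PDP curve."""
    n = len(lines)
    return [sum(line[j] for line in lines) / n for j in range(len(lines[0]))]

# Toy regression model (hypothetical): y = 2*x + z.
def predict(rows: List[Row]) -> List[float]:
    return [2 * r["x"] + r["z"] for r in rows]

data = [{"x": 1.0, "z": 0.0}, {"x": 5.0, "z": 4.0}]
lines = ice_lines(data, "x", [0.0, 1.0, 2.0], predict)
print(lines)                      # [[0.0, 2.0, 4.0], [4.0, 6.0, 8.0]]
print(partial_dependence(lines))  # [2.0, 4.0, 6.0]
```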

Attributes:
model : object

A trained model object. It should have a predict or predict_proba method; otherwise, a custom prediction function should be provided through pred_func.

n_classes : int

Number of classes. If None, it is inferred from model.n_classes_. Set it to 0 for regression.

pred_func : callable

A custom prediction function. If not provided, the model's predict or predict_proba method is used to generate the predictions.

model_features : list of str

A list of features used in model prediction.

memory_limit : float

The maximum proportion of memory that can be used by the calculation process.

chunk_size : int

The number of samples to predict at each iteration. -1 means all samples at once.
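Here is a minimal sketch of what chunk_size controls. It assumes the rows are simply split into consecutive batches before each call to the prediction function. The helper `predict_in_chunks` is hypothetical, not part of PDPbox.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")

def predict_in_chunks(rows: Sequence[T], predict: Callable[[Sequence[T]], List[float]],
                      chunk_size: int = -1) -> List[float]:
    """Predict rows in batches of chunk_size; -1 predicts all rows in one call."""
    if chunk_size == -1:
        return list(predict(rows))
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer or -1")
    preds: List[float] = []
    for start in range(0, len(rows), chunk_size):
        preds.extend(predict(rows[start:start + chunk_size]))
    return preds

calls = []

def double(batch):
    calls.append(len(batch))  # record each batch size to show the chunking
    return [2 * v for v in batch]

print(predict_in_chunks(list(range(7)), double, chunk_size=3))  # [0, 2, 4, 6, 8, 10, 12]
print(calls)  # [3, 3, 1]
```

Smaller chunks trade more prediction calls for a lower peak memory footprint per call.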

n_jobs : int

The number of jobs to run in parallel for computation. If set to -1, all CPUs are used.

predict_kwds : dict

Additional keyword arguments to pass to the model’s predict function.

data_transformer : callable

A function to transform the input data before prediction.
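A transformer matters when the model consumes a derived column. Changing only the analyzed feature would leave that column stale. The sketch assumes the transformer receives rows after the grid value has been written in and returns rows for the prediction function. The column names and `recompute_area` are hypothetical, and rows are plain dicts rather than a DataFrame.

```python
from typing import Dict, List

Row = Dict[str, float]

def recompute_area(rows: List[Row]) -> List[Row]:
    """Hypothetical transformer: rebuild the derived 'area' column."""
    return [{**r, "area": r["width"] * r["height"]} for r in rows]

def predict(rows: List[Row]) -> List[float]:
    # Toy model that only reads the derived column.
    return [r["area"] for r in rows]

rows = [{"width": 2.0, "height": 3.0, "area": 6.0}]
# Set 'width' to 10.0, as one PDP grid step would.
modified = [{**r, "width": 10.0} for r in rows]
print(predict(modified))                  # [6.0]  derived column is stale
print(predict(recompute_area(modified)))  # [30.0] consistent with the new width
```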

dist_num_samples : int

The number of samples to use for estimating the distribution of the data. This is used to handle large datasets by sampling a smaller subset for efficiency.

plot_type : str

The type of the plot to be generated.

feature_info : FeatureInfo

An instance of the FeatureInfo class.

count_df : pd.DataFrame

A DataFrame that contains the count as well as the normalized count (percentage) of samples within each feature bucket.
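The sketch below illustrates the two kinds of values count_df holds. It assigns values to buckets bounded by sorted grid points, then reports each bucket's count and its share of all counted samples. The bucketing rule (left-closed intervals, with the last bucket also including the top grid point) and the field names are assumptions, not necessarily PDPbox's exact convention.

```python
from bisect import bisect_right
from typing import Dict, List, Sequence

def bucket_counts(values: Sequence[float], grid: Sequence[float]) -> List[Dict[str, float]]:
    """Count samples per bucket [grid[i], grid[i+1]); the last bucket includes grid[-1]."""
    n_buckets = len(grid) - 1
    counts = [0] * n_buckets
    for v in values:
        if v < grid[0] or v > grid[-1]:
            continue  # outside the grid range; ignored in this sketch
        idx = min(bisect_right(grid, v) - 1, n_buckets - 1)
        counts[idx] += 1
    total = sum(counts)
    return [{"bucket": i, "count": c, "count_norm": c / total if total else 0.0}
            for i, c in enumerate(counts)]

for row in bucket_counts([1, 2, 2, 3, 7, 10], grid=[0, 5, 10]):
    print(row)
# {'bucket': 0, 'count': 4, 'count_norm': 0.6666666666666666}
# {'bucket': 1, 'count': 2, 'count_norm': 0.3333333333333333}
```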

n_grids : int

The number of feature grids. For an interact plot, it is the product of the n_grids of the two features.

dist_df : pandas.Series

The distribution of the data points.

from_model : bool

A flag indicating whether the prediction function was obtained from the model or provided as input.

target : list of int

List of target indices. For binary and regression problems, the list will be just [0]. For multi-class targets, the list is the class indices.

results : list of PDResults

The results of the Partial Dependence Plot (PDP) analysis. For binary and regression problems, the list will contain a single PDResults object. For multi-class targets, the list will contain a PDResults object for each class.
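The rule for target, which also fixes how many entries results holds (one PDResults per index), can be written as a small function. It assumes n_classes follows the convention above: 0 for regression and 2 for binary classification. The helper name and the rejection of n_classes == 1 are hypothetical.

```python
from typing import List

def target_indices(n_classes: int) -> List[int]:
    """Regression (0) and binary (2) problems use a single target [0];
    multi-class problems use one index per class."""
    if n_classes < 0 or n_classes == 1:
        raise ValueError("n_classes must be 0 (regression) or >= 2")
    return [0] if n_classes <= 2 else list(range(n_classes))

print(target_indices(0))  # [0]
print(target_indices(2))  # [0]
print(target_indices(4))  # [0, 1, 2, 3]
```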

Methods

plot(**kwargs)

Generates the PDP plot.

plot(center=True, plot_lines=False, frac_to_plot=1, cluster=False, n_cluster_centers=None, cluster_method='accurate', plot_pts_dist=False, to_bins=False, show_percentile=False, which_classes=None, figsize=None, dpi=300, ncols=2, plot_params=None, engine='plotly', template='plotly_white')

Generates the Partial Dependence Plot (PDP).

Parameters:
center : bool, optional

If True, the PDP is centered by subtracting its value at the first grid point (grids[0]). Default is True.
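Centering shifts the whole curve by its value at the first grid point, so the curve starts at zero. Only the relative change along the grid remains. Here is a minimal sketch, assuming ICE lines are shifted the same way when they are plotted. The helper name is hypothetical.

```python
from typing import List

def center(curve: List[float]) -> List[float]:
    """Shift a PDP or ICE curve so its value at grids[0] becomes 0."""
    base = curve[0]
    return [v - base for v in curve]

pdp = [0.25, 0.5, 0.75, 1.5]
print(center(pdp))  # [0.0, 0.25, 0.5, 1.25]
```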

plot_lines : bool, optional

If True, ICE lines will be plotted. Default is False.

frac_to_plot : int or float, optional

Fraction of ICE lines to plot. Default is 1.

cluster : bool, optional

If True, ICE lines will be clustered. Default is False.

n_cluster_centers : int or None, optional

Number of cluster centers. Required when cluster is True. Default is None.

cluster_method : {‘accurate’, ‘approx’}, optional

Method for clustering. If ‘accurate’, KMeans is used; if ‘approx’, MiniBatchKMeans is used. Default is ‘accurate’.

plot_pts_dist : bool, optional

If True, the distribution of data points will be plotted. Default is False.

to_bins : bool, optional

If True, the axis will be converted to bins. Only applicable to numeric features. Default is False.

show_percentile : bool, optional

If True, percentiles are shown in the plot. Default is False.
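The percentiles shown by this option relate to how grid points are chosen under the constructor defaults grid_type='percentile' and num_grid_points=10: grid points sit at evenly spaced percentiles of the feature. The sketch below makes three assumptions. It interpolates linearly between sorted values, which is NumPy's default percentile rule. It spaces percentiles evenly from 0 to 100. It drops duplicate grid values. `percentile_grid` is hypothetical, not PDPbox's API.

```python
from typing import List, Sequence, Tuple

def percentile(sorted_vals: Sequence[float], q: float) -> float:
    """Linear-interpolation percentile (q in [0, 100]) of pre-sorted values."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)

def percentile_grid(values: Sequence[float], num_grid_points: int = 10
                    ) -> Tuple[List[float], List[float]]:
    """Return (grid points, their percentiles), with duplicate grid values dropped."""
    if num_grid_points < 2:
        raise ValueError("num_grid_points must be at least 2")
    s = sorted(values)
    qs = [100 * i / (num_grid_points - 1) for i in range(num_grid_points)]
    grid: List[float] = []
    pcts: List[float] = []
    for q in qs:
        g = percentile(s, q)
        if not grid or g != grid[-1]:  # sorted input, so duplicates are adjacent
            grid.append(g)
            pcts.append(q)
    return grid, pcts

grid, pcts = percentile_grid([1, 2, 3, 4, 5], num_grid_points=5)
print(grid)  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(pcts)  # [0.0, 25.0, 50.0, 75.0, 100.0]
```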

which_classes : list of int, optional

List of class indices to plot. If None, all classes will be plotted. Default is None.

figsize : tuple or None, optional

The figure size for the matplotlib or plotly figure. If None, the default figure size is used. Default is None.

dpi : int, optional

The resolution of the plot, measured in dots per inch. Only applicable when engine is ‘matplotlib’. Default is 300.

ncols : int, optional

The number of columns of subplots in the figure. Default is 2.

plot_params : dict or None, optional

Custom plot parameters that control the style and aesthetics of the plot. Default is None.

engine : {‘matplotlib’, ‘plotly’}, optional

The plotting engine to use. Default is ‘plotly’.

template : str, optional

The template to use for plotly plots. Only applicable when engine is ‘plotly’. Default is ‘plotly_white’. Reference: https://plotly.com/python/templates/

Returns:
matplotlib.figure.Figure or plotly.graph_objects.Figure

A Matplotlib or Plotly figure object depending on the plot engine being used.

dict of matplotlib.axes.Axes or None

A dictionary of Matplotlib axes objects, where the keys are the axes names and the values are the axes objects. If engine is ‘plotly’, it is None.