pdpbox.pdp.PDPIsolate
- class pdpbox.pdp.PDPIsolate(model, df, model_features, feature, feature_name, pred_func=None, n_classes=None, memory_limit=0.5, chunk_size=-1, n_jobs=1, predict_kwds=None, data_transformer=None, cust_grid_points=None, grid_type='percentile', num_grid_points=10, percentile_range=None, grid_range=None)
Performs Partial Dependence Plot (PDP) analysis on a single feature.
- Attributes:
- model : object
A trained model object. The model should have a predict or predict_proba method; otherwise, provide a custom prediction function through pred_func.
- n_classes : int
Number of classes. If None, it is inferred from model.n_classes_. Set it to 0 for regression.
- pred_func : callable
A custom prediction function. If not provided, the model’s predict or predict_proba method is used to generate the predictions.
- model_features : list of str
A list of features used in model prediction.
- memory_limit : float
The maximum proportion of memory that can be used by the calculation process.
- chunk_size : int
The number of samples to predict at each iteration. -1 means all samples at once.
- n_jobs : int
The number of jobs to run in parallel for computation. If set to -1, all CPUs are used.
- predict_kwds : dict
Additional keyword arguments to pass to the model’s predict function.
- data_transformer : callable
A function to transform the input data before prediction.
- dist_num_samples : int
The number of samples used to estimate the distribution of the data. For large datasets, a smaller subset is sampled for efficiency.
- plot_type : str
The type of the plot to be generated.
- feature_info : FeatureInfo
An instance of the FeatureInfo class.
- count_df : pd.DataFrame
A DataFrame containing the count and the normalized count (percentage) of samples in each feature bucket.
- n_grids : int
The number of feature grids. For an interaction plot, it is the product of the n_grids of the two features.
- dist_df : pandas.Series
The distribution of the data points.
- from_model : bool
A flag indicating whether the prediction function was obtained from the model or provided as input.
- target : list of int
List of target indices. For binary and regression problems, the list is just [0]. For multi-class targets, the list contains the class indices.
- results : list of PDResults
The results of the Partial Dependence Plot (PDP) analysis. For binary and regression problems, the list contains a single PDResults object. For multi-class targets, the list contains one PDResults object per class.
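To make the quantities above concrete, here is a minimal pure-Python sketch of a single-feature partial dependence calculation. It is a conceptual illustration under stated assumptions, not pdpbox's implementation. The helper names (percentile, percentile_grid, partial_dependence), the toy predict function, and the toy rows are all hypothetical. The grid is built from evenly spaced percentiles, loosely mirroring grid_type='percentile'.

```python
from statistics import mean


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of pre-sorted values, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def percentile_grid(values, num_grid_points=10):
    """Evenly spaced percentiles of `values`, deduplicated and sorted.

    Assumes num_grid_points >= 2 (hypothetical helper, not a pdpbox function).
    """
    sorted_vals = sorted(values)
    qs = [100 * i / (num_grid_points - 1) for i in range(num_grid_points)]
    return sorted({percentile(sorted_vals, q) for q in qs})


def partial_dependence(predict, rows, feature, grid):
    """Return (ice_lines, pdp).

    Each ICE line holds one row's predictions as `feature` is swept over the
    grid; the PDP is the pointwise mean of all ICE lines.
    """
    ice_lines = [[predict({**row, feature: g}) for g in grid] for row in rows]
    pdp = [mean(column) for column in zip(*ice_lines)]
    return ice_lines, pdp


def predict(row):
    # Toy stand-in for a trained regression model.
    return 0.5 * row["age"] + 10 * row["income"]


rows = [
    {"age": 20, "income": 1.0},
    {"age": 40, "income": 3.0},
    {"age": 60, "income": 2.0},
]
grid = percentile_grid([r["age"] for r in rows], num_grid_points=3)
ice_lines, pdp = partial_dependence(predict, rows, "age", grid)
print(grid)  # grid points for "age"
print(pdp)   # average prediction at each grid point
```

In this sketch, len(grid) plays the role of n_grids, and a regression problem yields a single result, which corresponds to target being [0].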
Methods
plot(**kwargs)
Generates the PDP plot.
- plot(center=True, plot_lines=False, frac_to_plot=1, cluster=False, n_cluster_centers=None, cluster_method='accurate', plot_pts_dist=False, to_bins=False, show_percentile=False, which_classes=None, figsize=None, dpi=300, ncols=2, plot_params=None, engine='plotly', template='plotly_white')
Generates the Partial Dependence Plot (PDP).
- Parameters:
- center : bool, optional
If True, the PDP is centered by subtracting the values at grids[0]. Default is True.
- plot_lines : bool, optional
If True, ICE lines will be plotted. Default is False.
- frac_to_plot : int or float, optional
Fraction of ICE lines to plot. Default is 1.
- cluster : bool, optional
If True, ICE lines will be clustered. Default is False.
- n_cluster_centers : int or None, optional
Number of cluster centers. Must be provided when cluster is True. Default is None.
- cluster_method : {‘accurate’, ‘approx’}, optional
Method for clustering. If ‘accurate’, KMeans is used. If ‘approx’, MiniBatchKMeans is used. Default is ‘accurate’.
- plot_pts_dist : bool, optional
If True, the distribution of points will be plotted. Default is False.
- to_bins : bool, optional
If True, the axis will be converted to bins. Only applicable to numeric features. Default is False.
- show_percentile : bool, optional
If True, percentiles are shown in the plot. Default is False.
- which_classes : list of int, optional
List of class indices to plot. If None, all classes will be plotted. Default is None.
- figsize : tuple or None, optional
The figure size for the matplotlib or plotly figure. If None, the default figure size is used. Default is None.
- dpi : int, optional
The resolution of the plot, measured in dots per inch. Only applicable when engine is ‘matplotlib’. Default is 300.
- ncols : int, optional
The number of columns of subplots in the figure. Default is 2.
- plot_params : dict or None, optional
Custom plot parameters that control the style and aesthetics of the plot. Default is None.
- engine : {‘matplotlib’, ‘plotly’}, optional
The plotting engine to use. Default is ‘plotly’.
- template : str, optional
The template to use for plotly plots. Only applicable when engine is ‘plotly’. See https://plotly.com/python/templates/ for reference. Default is ‘plotly_white’.
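As an illustration of the center option, the sketch below subtracts each line's value at the first grid point from the whole line, so that every line starts at zero. This is one plausible reading of "subtracting the values at grids[0]", not pdpbox's code. The helper name center_lines and the toy numbers are hypothetical.

```python
from statistics import mean


def center_lines(lines):
    """Shift every line so that its value at the first grid point is 0."""
    return [[value - line[0] for value in line] for line in lines]


# Two toy ICE lines evaluated on a three-point grid.
ice_lines = [
    [20.0, 30.0, 40.0],
    [40.0, 45.0, 70.0],
]
# The uncentered PDP is the pointwise mean of the ICE lines.
pdp = [mean(column) for column in zip(*ice_lines)]

centered_ice = center_lines(ice_lines)
centered_pdp = center_lines([pdp])[0]
print(centered_ice)
print(centered_pdp)
```

Centering removes the per-sample offset, which makes differences in the shape of the lines easier to compare. In this sketch the centered PDP equals the mean of the centered ICE lines.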
- Returns:
- matplotlib.figure.Figure or plotly.graph_objects.Figure
A Matplotlib or Plotly figure object depending on the plot engine being used.
- dict of matplotlib.axes.Axes or None
A dictionary of Matplotlib axes objects. The keys are the names of the axes and the values are the axes objects. If engine is ‘plotly’, it is None.
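A typical call sequence, based only on the signatures documented above, might look like the following sketch. The wrapper name plot_isolated_pdp is hypothetical, and the model, DataFrame, and feature names are assumed to come from the caller. pdpbox is imported inside the function so that the helper can be defined even where the library is not installed.

```python
def plot_isolated_pdp(model, df, model_features, feature, feature_name,
                      n_classes=None):
    """Build a PDPIsolate object and return the (figure, axes) pair from plot().

    Hypothetical convenience wrapper; it only forwards arguments that appear
    in the documented signatures.
    """
    # Lazy import: requires pdpbox to be installed when the function is called.
    from pdpbox import pdp

    pdp_iso = pdp.PDPIsolate(
        model=model,
        df=df,
        model_features=model_features,
        feature=feature,
        feature_name=feature_name,
        n_classes=n_classes,  # per the docs above, use 0 for regression
    )
    # With engine="matplotlib", the second return value is documented as a
    # dict of axes; with engine="plotly" it is None.
    fig, axes = pdp_iso.plot(center=True, plot_lines=False, engine="matplotlib")
    return fig, axes
```

Returning both values lets the caller adjust individual axes afterwards when the matplotlib engine is used.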