References and Notes

References

[R1]

Friedman, J. (2001). Greedy Function Approximation: A Gradient Boosting Machine. The Annals of Statistics, 29(5):1189–1232. (https://statweb.stanford.edu/~jhf/ftp/trebst.pdf)

[R2]

Goldstein, A., Kapelner, A., Bleich, J., and Pitkin, E. (2015). Peeking Inside the Black Box: Visualizing Statistical Learning With Plots of Individual Conditional Expectation. Journal of Computational and Graphical Statistics, 24(1):44–65. (https://arxiv.org/abs/1309.6392)

[R3]

Molnar, C. (2018). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Section 5.1: Partial Dependence Plot (PDP). (https://christophm.github.io/interpretable-ml-book/pdp.html)

[R4]

Molnar, C. (2018). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Section 5.2: Individual Conditional Expectation (ICE). (https://christophm.github.io/interpretable-ml-book/ice.html)

Notes and Highlights

  • The PDP assumes that the features in \(X_{C}\) are uncorrelated with the features in \(X_{S}\). If this assumption is violated, the averages computed for the partial dependence plot incorporate data points that are very unlikely or even impossible.

    For example, it is unreasonable to claim that height and weight are uncorrelated. If height is the feature to plot, varying height alone while leaving every other feature fixed creates data points such as a person who is 2 meters tall but weighs less than 50 kg. Because the PDP is calculated by averaging over all data points, including these kinds of unrealistic points, the result might not be trustworthy. [R3]

    Note

    Check the data_transformer parameter in pdp_isolate and pdp_interact.


  • Some PD visualisations don’t include the feature distribution. Omitting the distribution can be misleading, because you might over-interpret the line in regions with almost no feature values. [R3]

    Note

    Check the plot_pts_dist parameter in pdp_plot.


  • There is one issue with ICE plots: It can be hard to see if the individual conditional expectation curves differ between individuals, because they start at different \(\hat{f} (x)\). [R4]

    Note

    Check the center parameter in pdp_plot and pdp_interact_plot.


  • When many ICE curves are drawn, the plot can become overcrowded and you can no longer see anything. [R4]

    Note

    Check the frac_to_plot and cluster parameters in pdp_plot and pdp_interact_plot.
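
The notes above can be illustrated with a minimal NumPy sketch. It does not use PDPbox and is not how PDPbox is implemented. The helper names (ice_curves, center_curves), the toy height/weight data, and the linear predict function are all assumptions made for illustration. It shows how ICE curves are built by overwriting one feature, how the PD line is their average, how forcing a correlated feature creates implausible rows, and how centering and subsampling make the curves easier to read.

```python
import numpy as np


def ice_curves(predict, X, feature_idx, grid):
    """Return an (n_samples, n_grid) array: one ICE curve per row of X."""
    curves = np.empty((X.shape[0], len(grid)))
    for j, value in enumerate(grid):
        X_mod = X.copy()
        X_mod[:, feature_idx] = value  # overwrite the feature for every row
        curves[:, j] = predict(X_mod)
    return curves


def center_curves(curves):
    """Shift every curve so that it starts at zero at the first grid point."""
    return curves - curves[:, [0]]


# Toy data (assumed): height in meters, weight in kg, positively correlated.
rng = np.random.default_rng(0)
height = rng.normal(1.70, 0.10, size=200)
weight = 70 + 90 * (height - 1.70) + rng.normal(0, 5, size=200)
X = np.column_stack([height, weight])


def predict(X):
    """Hypothetical fitted model, linear in both features."""
    return 2.0 * X[:, 0] + 0.01 * X[:, 1]


grid = np.linspace(1.5, 2.0, 6)
curves = ice_curves(predict, X, feature_idx=0, grid=grid)

# The partial dependence line is the average of the ICE curves.
pd_line = curves.mean(axis=0)

# Forcing height to 2.0 m keeps each row's original weight, so some
# rows become implausible combinations (2 m tall, under 50 kg).
X_forced = X.copy()
X_forced[:, 0] = 2.0
n_implausible = int((X_forced[:, 1] < 50).sum())

# Centering removes the different starting levels of the curves.
centered = center_curves(curves)

# Plotting only a random fraction of the curves reduces overcrowding.
keep = rng.choice(curves.shape[0], size=int(0.1 * curves.shape[0]), replace=False)
subset = centered[keep]
```

Because the toy model is additive, all centered curves coincide here. With a model that contains interactions, the centered curves would fan out, which is exactly the heterogeneity that centering is meant to reveal.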