Merged
2 changes: 1 addition & 1 deletion README.md
@@ -104,7 +104,7 @@ The library includes:
- *Permutation importances* (how much does the model metric deteriorate when you shuffle a feature?)
- *Partial dependence plots* (how does the model prediction change when you vary a single feature?)
- *Shap interaction values* (decompose the shap value into a direct effect and interaction effects)
- For Random Forests and xgboost models: visualisation of individual decision trees
- For Random Forest, XGBoost, and LightGBM models: visualisation of individual decision trees
- Plus for classifiers: precision plots, confusion matrix, ROC AUC plot, PR AUC plot, etc.
- For regression models: goodness-of-fit plots, residual plots, etc.
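The partial dependence idea from the list above is easy to show in a few lines of plain Python. This is an illustrative sketch only, not the library's PDP implementation; the `model` callable, feature names, and grid below are invented for the example:

```python
# Minimal partial-dependence sketch: average the model's prediction over
# the dataset while forcing one feature to each value on a grid.

def partial_dependence(model, rows, feature, grid):
    """Return [(value, mean prediction with `feature` forced to `value`)]."""
    curve = []
    for value in grid:
        preds = [model({**row, feature: value}) for row in rows]
        curve.append((value, sum(preds) / len(preds)))
    return curve

def toy_model(row):
    # Toy linear "model": prediction = 2 * x + z
    return 2 * row["x"] + row["z"]

rows = [{"x": 0, "z": 1}, {"x": 5, "z": 3}]
curve = partial_dependence(toy_model, rows, "x", grid=[0, 1, 2])
# With x forced to v, the mean prediction is 2*v + mean(z) = 2*v + 2
```

Because the toy model is linear, the curve is a straight line in the varied feature; real PDP curves reveal nonlinearities in the fitted model.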

3 changes: 3 additions & 0 deletions RELEASE_NOTES.md
@@ -12,6 +12,8 @@
- Fix XGBoost multiclass decision-path summary wording to display `prediction (logodds)` when explainer `model_output='logodds'`.
- Fix issue #256: add robust multiclass probability fallback for classifiers that expose `decision_function` but not `predict_proba` (e.g. `LinearSVC`), and use it consistently across kernel SHAP, prediction helpers, PDP, and permutation scorer paths.
- Prevent multiclass class-count mismatches when user-provided/broken `predict_proba` outputs do not match model class count by falling back to `decision_function`-based probabilities.
- Fix issue #118: add LightGBM decision-tree visualization support (dtreeviz) across explainer auto-detection, tree plotting, and decision-path rendering in dashboard tree tabs.
- Fix dtreeviz callback rendering on macOS by switching matplotlib to a non-interactive backend for off-main-thread tree rendering to prevent dashboard 500 errors.
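The `decision_function` probability fallback referenced in the #256 fixes amounts to squashing raw scores onto the probability simplex. A minimal sketch of the idea (the helper name is illustrative, not the library's internal API):

```python
import math

def scores_to_proba(scores):
    """Softmax a row of multiclass decision_function scores into
    probabilities, as a stand-in when predict_proba is unavailable."""
    shift = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

proba = scores_to_proba([2.0, 0.0, -1.0])
# probabilities sum to 1 and preserve the ordering of the raw scores
```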

### Tests
- Add regression tests for LightGBM with string categorical features covering dashboard initialization, `get_shap_row(...)`, unseen categorical values in `X_row`, and regression dashboard initialization.
@@ -22,6 +24,7 @@
- Add explainer-method unit tests for binary-like onehot detection, transformed feature-name deduping, inferred pipeline cats, and pipeline extraction warning text.
- Add regression tests for issue #256 covering multiclass `LinearSVC` with kernel SHAP, PDP, and permutation-importances flows using `decision_function` fallback.
- Add guard tests to confirm multiclass `predict_proba` models (logistic regression) keep working for PDP and permutation-importances paths.
- Add LightGBM tree-visualization regression tests (shadow trees, decision paths, plot_trees, and dtreeviz render contracts) in the boosting-model test suite.

### Improvements
- Add pipeline feature-name cleanup options: `strip_pipeline_prefix=True` and `feature_name_fn=...` for sklearn/imblearn pipeline transformed output columns.
10 changes: 5 additions & 5 deletions TODO.md
@@ -12,22 +12,23 @@
- [S][Hub][#146/#342] hub.to_yaml integrate_dashboard_yamls honors pickle_type and dumps integrated explainer artifacts.
- [M][Explainers][#294] align/explain multiclass logodds between Contributions Plot and Prediction Box (+ PDP highlight and XGBoost decision path wording alignment).
- [M][Explainers/Methods/Docs][#213] improve sklearn/imblearn pipeline support: feature-name cleanup (`strip_pipeline_prefix`, `feature_name_fn`), auto-detect onehot groups (`auto_detect_pipeline_cats`), accept binary-like scaled onehot columns in `cats`, preserve transformed index, add warnings/docs/tests.
- [M][Explainers/Methods/Tests/Docs][#256] improve multiclass LinearSVC support/docs with decision_function probability fallback and regression coverage for SHAP/PDP/permutation flows.
- [M][Explainers/Methods/Components/Tests][#118] add LightGBM tree visualization support (dtreeviz), including tree explainer wiring, dashboard tree tabs, and regression coverage.

**Now**
- [M][Explainers][#118] add LightGBM tree visualization support (dtreeviz).
- [M][Dashboard][#161] more flexible instantiate_component (no explainer needed for non-ExplainerComponents).

**Next**
- [M][Dashboard][#263/#161] more flexible instantiate_component (no explainer needed for non-ExplainerComponents).
- [M] add ExtraTrees and GradientBoostingClassifier to tree visualizers.

**Backlog: Explainers**
- [M] add plain language explanations for plots (in_words + UI toggle).
- [S] pass n_jobs to pdp_isolate.
- [M] add ExtraTrees and GradientBoostingClassifier to tree visualizers.
- [M][#118] add LightGBM tree visualization support (dtreeviz).

**Backlog: Dashboard**
- [S] make poweredby right-aligned.
- [M][#263/#161] more flexible instantiate_component (no explainer needed for non-ExplainerComponents).
- [M][#161] more flexible instantiate_component (no explainer needed for non-ExplainerComponents).
- [M] add TablePopout.
- [M][#247] add EDA-style feature histograms/bar charts/correlation graphs.
- [M/L] add cost calculator/optimizer for classifier models (confusion matrix weights, Youden J).
@@ -54,7 +55,6 @@
- [M] support SamplingExplainer, PartitionExplainer, PermutationExplainer, AdditiveExplainer.
- [M] support LimeTabularExplainer.
- [M] investigate method from https://arxiv.org/abs/2006.04750.
- [M][#256] improve multiclass LinearSVC support/docs (class-count mismatch with SHAP output).
- [M][#229] clarify/add support path for Poisson and Gamma regression explainers.

**Backlog: Plots**
2 changes: 1 addition & 1 deletion docs/source/deployment.rst
@@ -126,7 +126,7 @@ And you need to tell heroku how to start your server in ``Procfile``::
Graphviz buildpack
------------------

If you want to visualize individual trees inside your ``RandomForest`` or ``xgboost``
If you want to visualize individual trees inside your ``RandomForest``, ``xgboost``, or ``lightgbm``
model using the ``dtreeviz`` package you will
need to make sure that ``graphviz`` is installed on your ``heroku`` dyno by
adding the following buildpack (as well as the ``python`` buildpack):
24 changes: 12 additions & 12 deletions docs/source/explainers.rst
@@ -456,10 +456,10 @@ plot_residuals_vs_feature
DecisionTree Plots
------------------

There are additional mixin classes specifically for ``sklearn`` ``RandomForests``
and for xgboost models that define additional methods and plots to investigate and visualize
individual decision trees within the ensemblke. These
uses the ``dtreeviz`` library to visualize individual decision trees.
There are additional mixin classes specifically for ``sklearn`` ``RandomForests``,
``xgboost``, and ``lightgbm`` models that define additional methods and plots to
investigate and visualize individual decision trees within the ensemble. These
use the ``dtreeviz`` library to visualize individual decision trees.

You can get a pd.DataFrame summary of the path that a specific index row took
through a specific decision tree.
@@ -476,9 +476,9 @@ And for dtreeviz visualization of individual decision trees (svg format)::
explainer.decisiontree_file(tree_idx, index)
explainer.decisiontree_encoded(tree_idx, index)

These methods are part of the ``RandomForestExplainer`` and XGBExplainer`` mixin
classes that get automatically loaded when you pass either a RandomForest
or XGBoost model.
These methods are part of the ``RandomForestExplainer``, ``XGBExplainer``, and
``LGBMExplainer`` mixin classes that get automatically loaded when you pass a
RandomForest, XGBoost, or LightGBM model.
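The auto-loading described above can be pictured as dispatch on the model's class. This is a hypothetical sketch only; the library's actual detection logic is more involved, and the stand-in ``LGBMClassifier`` class below is defined locally for the example:

```python
# Hypothetical sketch of picking a tree-visualization mixin by model type.
# Names mirror the mixin classes mentioned above; dispatch logic is invented.

def pick_tree_mixin(model):
    name = type(model).__name__
    if name in ("RandomForestClassifier", "RandomForestRegressor"):
        return "RandomForestExplainer"
    if name in ("XGBClassifier", "XGBRegressor"):
        return "XGBExplainer"
    if name in ("LGBMClassifier", "LGBMRegressor"):
        return "LGBMExplainer"
    return None  # no tree-visualization mixin for this model family

class LGBMClassifier:  # local stand-in class, not the lightgbm package
    pass

mixin = pick_tree_mixin(LGBMClassifier())
```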


plot_trees
@@ -661,12 +661,12 @@ restrict candidate rows by feature values before selecting a random index::
.. automethod:: explainerdashboard.explainers.RegressionExplainer.random_index


RandomForest and XGBoost outputs
--------------------------------
RandomForest, XGBoost, and LightGBM outputs
-------------------------------------------

For RandomForest and XGBoost models mixin classes that visualize individual
decision trees will be loaded: ``RandomForestExplainer`` and ``XGBExplainer``
with the following additional methods::
For RandomForest, XGBoost, and LightGBM models mixin classes that visualize
individual decision trees will be loaded: ``RandomForestExplainer``,
``XGBExplainer``, and ``LGBMExplainer`` with the following additional methods::

decisiontree_df(tree_idx, index, pos_label=None)
decisiontree_summary_df(tree_idx, index, round=2, pos_label=None)
3 changes: 2 additions & 1 deletion docs/source/index.rst
@@ -12,7 +12,8 @@ with just two lines of code.

It allows you to investigate SHAP values, permutation importances,
interaction effects, partial dependence plots, all kinds of performance plots,
and even individual decision trees inside a random forest. With ``explainerdashboard`` any data
and even individual decision trees inside random forest, XGBoost, and LightGBM models.
With ``explainerdashboard`` any data
scientist can create an interactive explainable AI web app in minutes,
without having to know anything about web development or deployment.

12 changes: 10 additions & 2 deletions explainerdashboard/dashboard_components/decisiontree_components.py
@@ -8,7 +8,7 @@
from dash.exceptions import PreventUpdate
import dash_bootstrap_components as dbc

from ..explainers import RandomForestExplainer, XGBExplainer
from ..explainers import RandomForestExplainer, XGBExplainer, LGBMExplainer
from ..dashboard_methods import *
from .. import to_html

@@ -94,12 +94,20 @@ def __init__(
elif isinstance(self.explainer, XGBExplainer):
if self.description is None:
self.description = """
Shows the marginal contributions of each decision tree in an
Shows the marginal contributions of each decision tree in an
xgboost ensemble to the final prediction. This demonstrates that
an xgboost model is simply a sum of individual decision trees.
"""
if self.subtitle == "Displaying individual decision trees":
self.subtitle += " inside xgboost model"
elif isinstance(self.explainer, LGBMExplainer):
if self.description is None:
self.description = """
Shows the marginal contributions of each decision tree in a
LightGBM ensemble to the final prediction.
"""
if self.subtitle == "Displaying individual decision trees":
self.subtitle += " inside LightGBM model"
else:
if self.description is None:
self.description = ""
100 changes: 99 additions & 1 deletion explainerdashboard/explainer_methods.py
@@ -40,6 +40,7 @@
"get_xgboost_path_df",
"get_xgboost_path_summary_df",
"get_xgboost_preds_df",
"get_lgbm_preds_df",
"get_multiclass_logodds_scores",
"get_xgboost_output_label",
"_ensure_numeric_predictions", # Internal helper for XGBoost 3.0+ compatibility
@@ -2165,7 +2166,14 @@ def node_pred_proba(node):
else:

def node_mean(node):
return decision_tree.tree_model.tree_.value[node.id].item()
try:
return decision_tree.tree_model.tree_.value[node.id].item()
except Exception:
node_samples = decision_tree.get_node_samples()
sample_idxs = node_samples.get(node.id, [])
if len(sample_idxs) == 0:
return np.nan
return float(np.asarray(decision_tree.y_train)[sample_idxs].mean())

for node in nodes:
if not node.isleaf():
@@ -2549,3 +2557,93 @@ def get_xgboost_preds_df(xgbmodel, X_row, pos_label=1):
0, "pred_proba"
]
return xgboost_preds_df


def get_lgbm_preds_df(lgbmodel, X_row, pos_label=1):
"""Returns cumulative per-tree predictions for a LightGBM model.

Args:
lgbmodel: fitted LightGBM sklearn-compatible model
(i.e. LGBMClassifier or LGBMRegressor)
        X_row: a single row of data, e.g. X_train.iloc[0]
        pos_label: for classifiers, the label to use as the positive label.
            Defaults to 1.

Returns:
pd.DataFrame
"""
if safe_isinstance(lgbmodel, "lightgbm.sklearn.LGBMClassifier"):
is_classifier = True
n_classes = len(lgbmodel.classes_)
n_trees = lgbmodel.booster_.num_trees()
if n_classes > 2:
n_trees = int(n_trees / n_classes)
elif safe_isinstance(lgbmodel, "lightgbm.sklearn.LGBMRegressor"):
is_classifier = False
n_trees = lgbmodel.booster_.num_trees()
else:
raise ValueError("Pass either an LGBMClassifier or LGBMRegressor!")

if is_classifier:
if n_classes == 2:
if pos_label not in (0, 1):
raise ValueError("pos_label should be either 0 or 1!")

margins = []
for i in range(1, n_trees + 1):
margin_raw = lgbmodel.predict(X_row, raw_score=True, num_iteration=i)[0]
margin_raw = _ensure_numeric_predictions(margin_raw)
if isinstance(margin_raw, np.ndarray):
margin_raw = (
margin_raw.item()
if margin_raw.ndim == 0
else float(margin_raw[0])
)
margin = float(margin_raw)
margins.append(margin if pos_label == 1 else -margin)

pred_probas = (np.exp(margins) / (1 + np.exp(margins))).tolist()
base_score = 0.0
base_proba = 0.5
preds = margins
else:
if pos_label < 0 or pos_label >= n_classes:
raise ValueError(
f"pos_label={pos_label}, but should be >= 0 and <= {n_classes - 1}!"
)
margins = []
for i in range(1, n_trees + 1):
margin_raw = lgbmodel.predict(X_row, raw_score=True, num_iteration=i)[0]
margin_raw = _ensure_numeric_predictions(margin_raw)
margin = np.asarray(margin_raw, dtype=float)
margins.append(margin)

preds = [float(margin[pos_label]) for margin in margins]
pred_probas = [
float((np.exp(margin) / np.exp(margin).sum())[pos_label])
for margin in margins
]
base_score = 0.0
base_proba = 1.0 / n_classes
else:
preds = []
for i in range(1, n_trees + 1):
pred_raw = lgbmodel.predict(X_row, raw_score=True, num_iteration=i)[0]
pred_raw = _ensure_numeric_predictions(pred_raw)
if isinstance(pred_raw, np.ndarray):
pred_raw = pred_raw.item() if pred_raw.ndim == 0 else float(pred_raw[0])
preds.append(float(pred_raw))
base_score = 0.0

lgbm_preds_df = pd.DataFrame(
dict(tree=range(-1, n_trees), pred=[base_score] + preds)
)
lgbm_preds_df["pred_diff"] = lgbm_preds_df.pred.diff()
lgbm_preds_df.loc[0, "pred_diff"] = lgbm_preds_df.loc[0, "pred"]

if is_classifier:
lgbm_preds_df["pred_proba"] = [base_proba] + pred_probas
lgbm_preds_df["pred_proba_diff"] = lgbm_preds_df.pred_proba.diff()
lgbm_preds_df.loc[0, "pred_proba_diff"] = lgbm_preds_df.loc[0, "pred_proba"]

return lgbm_preds_df
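The binary-classifier bookkeeping in `get_lgbm_preds_df` can be checked in isolation: cumulative raw margins pass through a sigmoid, and per-tree contributions are first differences against a base of 0.0 (margin) and 0.5 (probability). A pure-Python sketch of that arithmetic, no LightGBM required (the helper name is invented for illustration):

```python
import math

def margins_to_rows(margins, base_proba=0.5):
    """Sigmoid each cumulative margin and take first differences,
    mirroring the pred_diff/pred_proba_diff columns built above."""
    probas = [1 / (1 + math.exp(-m)) for m in margins]
    pred_diffs = [margins[0]] + [b - a for a, b in zip(margins, margins[1:])]
    proba_diffs = [probas[0] - base_proba] + [
        b - a for a, b in zip(probas, probas[1:])
    ]
    return probas, pred_diffs, proba_diffs

# Cumulative margins after trees 1, 2, 3 (made-up values):
probas, pred_diffs, proba_diffs = margins_to_rows([0.0, 1.0, 0.5])
```

A useful sanity check on this construction: the per-tree differences telescope, so they sum back to the final margin (and, for probabilities, to the final probability minus the 0.5 base).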
7 changes: 5 additions & 2 deletions explainerdashboard/explainer_plots.py
@@ -2930,6 +2930,7 @@ def plotly_xgboost_trees(
target="",
units="",
higher_is_better=True,
model_name="xgboost",
):
"""Generate a plot showing the prediction of every single tree inside an XGBoost model

@@ -2944,6 +2945,8 @@
units (str, optional): Units of target variable. Defaults to "".
higher_is_better (bool, optional): up is green, down is red. If False then
flip the colors.
model_name (str, optional): model family label used in chart titles.
Defaults to "xgboost".

Returns:
Plotly fig
@@ -3041,10 +3044,10 @@
)

if target:
title = f"Individual xgboost decision trees predicting {target}"
title = f"Individual {model_name} decision trees predicting {target}"
yaxis_title = f"Predicted {target} {f'({units})' if units else ''}"
else:
title = "Individual xgboost decision trees"
title = f"Individual {model_name} decision trees"
yaxis_title = f"Predicted outcome ({units})" if units else "Predicted outcome"

layout = go.Layout(
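The effect of the new `model_name` parameter on chart titles is easy to see in isolation. A sketch of just the title logic from the diff above (extracted into a standalone function for the example):

```python
def tree_plot_title(model_name="xgboost", target=""):
    """Mirror the title branching above: mention the target when given."""
    if target:
        return f"Individual {model_name} decision trees predicting {target}"
    return f"Individual {model_name} decision trees"

title = tree_plot_title(model_name="lightgbm", target="price")
```

Keeping the model family as a parameter (defaulting to `"xgboost"`) lets the LightGBM code path reuse the same plotting function without changing existing XGBoost titles.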