
[WIP] Add CPU QuadratureSHAP prototype alongside TreeSHAP#12106

Draft
RAMitchell wants to merge 10 commits into dmlc:master from RAMitchell:shapley-value-algorithms

Conversation

@RAMitchell
Member

Summary

This draft PR introduces a new CPU SHAP algorithm, quadratureshap, alongside the existing CPU TreeSHAP implementation.

The intent of this change is to make it possible to evaluate the new algorithm side by side with the current baseline, focusing on:

  • correctness relative to TreeSHAP
  • CPU runtime
  • numerical behavior across different tree shapes

What this adds

  • a new CPU SHAP selector:
    • shap_algorithm=treeshap
    • shap_algorithm=quadratureshap
  • a CPU QuadratureSHAP implementation under src/predictor/interpretability/
  • tests comparing QuadratureSHAP against TreeSHAP
  • documentation for the new selector
  • internal cleanup so the CPU SHAP implementations live in the same module
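The actual selector lives in the C++ predictor code, but as a rough sketch of how a string-keyed algorithm selector like this typically dispatches (the `compute_shap_*` functions below are illustrative placeholders, not the PR's implementation; only the `treeshap`/`quadratureshap` names come from this PR):

```python
# Illustrative Python sketch of a string-keyed SHAP algorithm selector.
# The real implementation in this PR is C++ under
# src/predictor/interpretability/; these functions are placeholders.

def compute_shap_treeshap(X):
    """Placeholder for the existing TreeSHAP path."""
    return [[0.0] * len(row) for row in X]

def compute_shap_quadratureshap(X):
    """Placeholder for the new QuadratureSHAP path."""
    return [[0.0] * len(row) for row in X]

_SHAP_ALGORITHMS = {
    "treeshap": compute_shap_treeshap,
    "quadratureshap": compute_shap_quadratureshap,
}

def compute_shap(X, shap_algorithm="treeshap"):
    # Unknown names fail loudly rather than silently falling back.
    try:
        impl = _SHAP_ALGORITHMS[shap_algorithm]
    except KeyError:
        raise ValueError(f"unknown shap_algorithm: {shap_algorithm!r}")
    return impl(X)
```

Failing loudly on an unknown selector value keeps configuration typos from silently producing baseline results.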

Current status

This PR is intentionally a draft.

At this stage, the goal is not to replace TreeSHAP, but to introduce QuadratureSHAP in a form that can be reviewed and benchmarked against the existing implementation.

The current CPU implementation uses an endpoint-adapted quadrature rule with 16 points.
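The PR does not spell out the rule, but for intuition about what a fixed 16-point quadrature on the unit interval looks like, here is a Gauss-Legendre rule shifted from [-1, 1] to [0, 1]. This specific rule is an assumption for illustration only, not necessarily the "endpoint-adapted" rule used here:

```python
import numpy as np

# Gauss-Legendre nodes/weights on [-1, 1], shifted to [0, 1].
# NOTE: this is a stand-in for illustration; the actual
# "endpoint-adapted" 16-point rule in the PR may differ.
n = 16
x, w = np.polynomial.legendre.leggauss(n)
t = 0.5 * (x + 1.0)   # nodes mapped into [0, 1]
wt = 0.5 * w          # weights rescaled for the unit interval

# An n-point Gauss rule integrates polynomials up to degree 2n - 1
# exactly, so this reproduces the integral of t^5 on [0, 1].
approx = float(np.sum(wt * t**5))
exact = 1.0 / 6.0
print(abs(approx - exact))
```

The appeal of a fixed-point rule is that the per-node cost is a constant number of weighted evaluations, independent of tree depth.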

Preliminary results

On a local CPU benchmark with:

  • 40k training rows
  • 50 features
  • tree_method=hist
  • max_depth=8
  • 200 boosting rounds
  • SHAP prediction on 5k rows

Measured means:

  • TreeSHAP: 2.107s
  • QuadratureSHAP: 0.805s

This is about a 2.62x speedup relative to TreeSHAP on that workload.

For the same run, the output remained very close to TreeSHAP:

  • max abs diff: 7.45e-08
  • mean abs diff: 2.50e-09
  • max additivity error: 4.77e-07

These results are preliminary and intended to support further evaluation.
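The three agreement metrics above can be computed from two contribution matrices of shape `(n_rows, n_features + 1)` (bias in the last column, as `pred_contribs` returns in XGBoost) plus the raw margin predictions. A minimal sketch with synthetic stand-in arrays rather than real model output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two algorithms' outputs:
# (n_rows, n_features + 1), last column holding the bias term.
shap_tree = rng.normal(size=(5000, 51))
shap_quad = shap_tree + rng.normal(scale=1e-8, size=shap_tree.shape)
margin = shap_tree.sum(axis=1)  # stand-in for the raw margin predictions

max_abs_diff = float(np.abs(shap_quad - shap_tree).max())
mean_abs_diff = float(np.abs(shap_quad - shap_tree).mean())
# Additivity: contributions (including bias) should sum to the margin.
max_additivity_err = float(np.abs(shap_quad.sum(axis=1) - margin).max())

print(max_abs_diff, mean_abs_diff, max_additivity_err)
```

Checking additivity separately matters because two algorithms can agree per feature to within tolerance while the accumulated row sums drift further.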

Testing

/home/nfs/rorym/anaconda3/bin/conda run -n xgboost ./build-cpu-v8/testxgboost --gtest_filter='Predictor.QuadratureShapPrototypeMatchesTreeShapCPU:Predictor.QuadratureShapSelectorMatchesTreeShapCPU:Predictor.ShapOutputCasesCPU'

Review focus

Feedback is especially useful on:

  • selector/API naming
  • implementation placement
  • test coverage
  • benchmark methodology
  • numerical tolerances versus TreeSHAP

@RAMitchell force-pushed the shapley-value-algorithms branch from c73c829 to 0888129 on March 19, 2026 at 09:18
@RAMitchell
Member Author

Added a reproducible QuadratureSHAP benchmark harness in demo/guide-python/quadratureshap_benchmark.py together with a compact N=8 summary.

The main result is that the observed speedup depends much more on tree size than on whether the data is “real” or “synthetic”. For small trees, TreeSHAP is often competitive or faster. For medium trees, QuadratureSHAP is typically around 1.2x to 1.9x faster. For large trees, the gap widens substantially.

| Workload | Family | Depth | Mean nodes/tree | Speedup vs TreeSHAP | Max abs diff |
|---|---|---|---|---|---|
| breast_cancer | real | 30 | 37.37 | 0.42x | 4.77e-07 |
| digits | real | 30 | 46.52 | 1.58x | 2.50e-06 |
| diabetes | real | 30 | 191.01 | 1.60x | 1.53e-05 |
| easy_linear | synthetic | 4 | 30.61 | 0.65x | 1.91e-06 |
| easy_linear | synthetic | 8 | 370.47 | 1.19x | 2.86e-06 |
| easy_linear | synthetic | 16 | 1467.43 | 2.37x | 5.72e-06 |
| easy_linear | synthetic | 30 | 1928.48 | 3.46x | 3.40e-03 |
| random_labels | synthetic | 4 | 30.54 | 0.74x | 5.96e-08 |
| random_labels | synthetic | 8 | 321.22 | 1.90x | 1.79e-07 |
| random_labels | synthetic | 16 | 4273.45 | 4.59x | 2.88e-06 |

Two practical takeaways from this:

  • The earlier ~2.6x speedup numbers appear to come from fuller, larger-tree regimes, not from typical small real-data trees.
  • quadratureshap_points=8 still looks like the best current operating point: it preserves good accuracy in most cases while retaining a clear speed advantage once trees get moderately large.
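For context on the "Mean nodes/tree" column: a complete binary tree of depth d has 2^(d+1) - 1 nodes, so the depth-4 workloads (~30.5 nodes/tree) are growing nearly complete trees, while the depth-30 real-data rows stop far short of that ceiling. A quick sanity check:

```python
def complete_tree_nodes(depth: int) -> int:
    # A complete binary tree with the root at depth 0 has
    # 2^(depth + 1) - 1 nodes in total.
    return 2 ** (depth + 1) - 1

print(complete_tree_nodes(4))  # ceiling for the depth-4 rows
print(complete_tree_nodes(8))  # ceiling for the depth-8 rows
```

So the depth-8 runs (~320-370 nodes/tree against a 511-node ceiling) are already fairly full, which is where the speedup starts to appear in the table.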

@trivialfis
Member

Is there a reference for the algorithm you are working on?

@RAMitchell
Member Author

There are no public materials yet; this is still very much in flight.
