
[WIP] Add CPU QuadratureSHAP prototype alongside TreeSHAP#12106

Draft
RAMitchell wants to merge 10 commits into dmlc:master from RAMitchell:shapley-value-algorithms

Conversation

@RAMitchell
Member

Summary

This draft PR introduces a new CPU SHAP algorithm, quadratureshap, alongside the existing CPU TreeSHAP implementation.

The intent of this change is to make it possible to evaluate the new algorithm side by side with the current baseline, focusing on:

  • correctness relative to TreeSHAP
  • CPU runtime
  • numerical behavior across different tree shapes

What this adds

  • a new CPU SHAP selector:
    • shap_algorithm=treeshap
    • shap_algorithm=quadratureshap
  • a CPU QuadratureSHAP implementation under src/predictor/interpretability/
  • tests comparing QuadratureSHAP against TreeSHAP
  • documentation for the new selector
  • internal cleanup so the CPU SHAP implementations live in the same module
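The actual selector lives in the C++ predictor code, but as a rough sketch of how a string-keyed algorithm selector like this typically dispatches (the `compute_shap_*` functions below are illustrative placeholders, not the PR's implementation; only the `treeshap`/`quadratureshap` names come from this PR):

```python
# Illustrative Python sketch of a string-keyed SHAP algorithm selector.
# The real implementation in this PR is C++ under
# src/predictor/interpretability/; these functions are placeholders.

def compute_shap_treeshap(X):
    """Placeholder for the existing TreeSHAP path."""
    return [[0.0] * len(row) for row in X]

def compute_shap_quadratureshap(X):
    """Placeholder for the new QuadratureSHAP path."""
    return [[0.0] * len(row) for row in X]

_SHAP_ALGORITHMS = {
    "treeshap": compute_shap_treeshap,
    "quadratureshap": compute_shap_quadratureshap,
}

def compute_shap(X, shap_algorithm="treeshap"):
    # Unknown names fail loudly rather than silently falling back.
    try:
        impl = _SHAP_ALGORITHMS[shap_algorithm]
    except KeyError:
        raise ValueError(f"unknown shap_algorithm: {shap_algorithm!r}")
    return impl(X)
```

Failing loudly on an unknown selector value keeps configuration typos from silently producing baseline results.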

Current status

This PR is intentionally a draft.

At this stage, the goal is not to replace TreeSHAP, but to introduce QuadratureSHAP in a form that can be reviewed and benchmarked against the existing implementation.

The current CPU implementation uses an endpoint-adapted quadrature rule with 16 points.
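The PR does not spell out the rule, but for intuition about what a fixed 16-point quadrature on the unit interval looks like, here is a Gauss-Legendre rule shifted from [-1, 1] to [0, 1]. This specific rule is an assumption for illustration only, not necessarily the "endpoint-adapted" rule used here:

```python
import numpy as np

# Gauss-Legendre nodes/weights on [-1, 1], shifted to [0, 1].
# NOTE: this is a stand-in for illustration; the actual
# "endpoint-adapted" 16-point rule in the PR may differ.
n = 16
x, w = np.polynomial.legendre.leggauss(n)
t = 0.5 * (x + 1.0)   # nodes mapped into [0, 1]
wt = 0.5 * w          # weights rescaled for the unit interval

# An n-point Gauss rule integrates polynomials up to degree 2n - 1
# exactly, so this reproduces the integral of t^5 on [0, 1].
approx = float(np.sum(wt * t**5))
exact = 1.0 / 6.0
print(abs(approx - exact))
```

The appeal of a fixed-point rule is that the per-node cost is a constant number of weighted evaluations, independent of tree depth.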

Preliminary results

On a local CPU benchmark with:

  • 40k training rows
  • 50 features
  • tree_method=hist
  • max_depth=8
  • 200 boosting rounds
  • SHAP prediction on 5k rows

Measured means:

  • TreeSHAP: 2.107s
  • QuadratureSHAP: 0.805s

This is about a 2.62x speedup relative to TreeSHAP on that workload.

For the same run, the output remained very close to TreeSHAP:

  • max abs diff: 7.45e-08
  • mean abs diff: 2.50e-09
  • max additivity error: 4.77e-07

These results are preliminary and intended to support further evaluation.
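The three agreement metrics above can be computed from two contribution matrices of shape `(n_rows, n_features + 1)` (bias in the last column, as `pred_contribs` returns in XGBoost) plus the raw margin predictions. A minimal sketch with synthetic stand-in arrays rather than real model output:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for the two algorithms' outputs:
# (n_rows, n_features + 1), last column holding the bias term.
shap_tree = rng.normal(size=(5000, 51))
shap_quad = shap_tree + rng.normal(scale=1e-8, size=shap_tree.shape)
margin = shap_tree.sum(axis=1)  # stand-in for the raw margin predictions

max_abs_diff = float(np.abs(shap_quad - shap_tree).max())
mean_abs_diff = float(np.abs(shap_quad - shap_tree).mean())
# Additivity: contributions (including bias) should sum to the margin.
max_additivity_err = float(np.abs(shap_quad.sum(axis=1) - margin).max())

print(max_abs_diff, mean_abs_diff, max_additivity_err)
```

Checking additivity separately matters because two algorithms can agree per feature to within tolerance while the accumulated row sums drift further.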

Testing

/home/nfs/rorym/anaconda3/bin/conda run -n xgboost ./build-cpu-v8/testxgboost --gtest_filter='Predictor.QuadratureShapPrototypeMatchesTreeShapCPU:Predictor.QuadratureShapSelectorMatchesTreeShapCPU:Predictor.ShapOutputCasesCPU'

Review focus

Feedback is especially useful on:

  • selector/API naming
  • implementation placement
  • test coverage
  • benchmark methodology
  • numerical tolerances versus TreeSHAP

@RAMitchell force-pushed the shapley-value-algorithms branch from c73c829 to 0888129 on March 19, 2026 at 09:18
@RAMitchell
Member Author

Added a reproducible QuadratureSHAP benchmark harness in demo/guide-python/quadratureshap_benchmark.py together with a compact N=8 summary.

The main result is that the observed speedup depends much more on tree size than on whether the data is “real” or “synthetic”. For small trees, TreeSHAP is often competitive or faster. For medium trees, QuadratureSHAP is typically around 1.2x to 1.9x faster. For large trees, the gap widens substantially.

| Workload | Family | Depth | Mean nodes/tree | Speedup vs TreeSHAP | Max abs diff |
|---|---|---|---|---|---|
| breast_cancer | real | 30 | 37.37 | 0.42x | 4.77e-07 |
| digits | real | 30 | 46.52 | 1.58x | 2.50e-06 |
| diabetes | real | 30 | 191.01 | 1.60x | 1.53e-05 |
| easy_linear | synthetic | 4 | 30.61 | 0.65x | 1.91e-06 |
| easy_linear | synthetic | 8 | 370.47 | 1.19x | 2.86e-06 |
| easy_linear | synthetic | 16 | 1467.43 | 2.37x | 5.72e-06 |
| easy_linear | synthetic | 30 | 1928.48 | 3.46x | 3.40e-03 |
| random_labels | synthetic | 4 | 30.54 | 0.74x | 5.96e-08 |
| random_labels | synthetic | 8 | 321.22 | 1.90x | 1.79e-07 |
| random_labels | synthetic | 16 | 4273.45 | 4.59x | 2.88e-06 |

Two practical takeaways from this:

  • The earlier ~2.6x speedup numbers appear to come from fuller, larger-tree regimes, not from typical small real-data trees.
  • quadratureshap_points=8 still looks like the best current operating point: it preserves good accuracy in most cases while retaining a clear speed advantage once trees get moderately large.
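For context on the "Mean nodes/tree" column: a complete binary tree of depth d has 2^(d+1) - 1 nodes, so the depth-4 workloads (~30.5 nodes/tree) are growing nearly complete trees, while the depth-30 real-data rows stop far short of that ceiling. A quick sanity check:

```python
def complete_tree_nodes(depth: int) -> int:
    # A complete binary tree with the root at depth 0 has
    # 2^(depth + 1) - 1 nodes in total.
    return 2 ** (depth + 1) - 1

print(complete_tree_nodes(4))  # ceiling for the depth-4 rows
print(complete_tree_nodes(8))  # ceiling for the depth-8 rows
```

So the depth-8 runs (~320-370 nodes/tree against a 511-node ceiling) are already fairly full, which is where the speedup starts to appear in the table.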

@trivialfis
Member

Is there a reference for the algorithm you are working on?

@RAMitchell
Member Author

There are no public materials yet; this is still very much in flight.
